Monday, 18 May 2015

Python - data structure design: store or parse

I'm creating (or rather, have created) a word suggestion app based on bigrams to get familiar with Python, and I'd like some advice on designing programs in general.

Input data:

 0;1;1;2
 1;1;2;1

Each row holds the id, the baseWord, the followingWord and the frequency, separated by semicolons.

So if it gets, for instance, 'I' as input, it returns all possible following words of 'I', such as 'am', 'want', etc.
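A minimal sketch of that lookup, assuming the rows arrive as semicolon-separated strings in the layout shown above. The names `parse_line`, `build_index` and `suggest` are hypothetical, and the dict-of-lists index is only one possible layout, not the list of dicts used in the program itself.

```python
from collections import defaultdict

# Sample rows in the "id;baseWord;followingWord;freq" layout from the post.
RAW_LINES = ["0;1;1;2", "1;1;2;1"]


def parse_line(line):
    """Split one semicolon-separated row into its four fields."""
    row_id, base_word, following_word, freq = line.strip().split(";")
    return int(row_id), base_word, following_word, int(freq)


def build_index(lines):
    """Map each base word to a list of (following word, frequency) pairs."""
    index = defaultdict(list)
    for line in lines:
        _, base_word, following_word, freq = parse_line(line)
        index[base_word].append((following_word, freq))
    return index


def suggest(index, base_word):
    """Return the following words for base_word, most frequent first."""
    pairs = index.get(base_word, [])
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked]


index = build_index(RAW_LINES)
print(suggest(index, "1"))
```

With an index like this, a lookup is a single dictionary access instead of a scan over every row, at the cost of building the index once after loading.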

For the moment I decided to create a list of dicts populated with the raw data and save it with pickle, so I can decide later how to store the data.

After coding the main parts of the program I profiled it to look for speed improvements, and it appears that loading the data structure from the pickle file takes 75% of the execution time. So I thought: what would happen if I stored only the raw data in the pickle file and wrote a parse function to build the data structure? The result was that the pickle file shrank by 60% and the time spent in the loading function dropped significantly, but the overall execution time increased by 20%.
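The comparison described above can be reproduced in isolation with a sketch like the following. The rows are synthetic stand-ins (an assumption, not the real bigram data), the function names are hypothetical, and the numbers it prints will differ from the percentages quoted above depending on the data and the Python version.

```python
import pickle
import timeit

KEYS = ["id", "baseWord", "followingWord", "freq"]

# Synthetic stand-in for the real bigram rows (an assumption, not real data).
raw_rows = [(i, i % 500, i % 700, 1 + i % 9) for i in range(20000)]
dict_rows = [dict(zip(KEYS, row)) for row in raw_rows]

# Serialize both variants in memory so disk speed does not affect the timing.
raw_blob = pickle.dumps(raw_rows, protocol=pickle.HIGHEST_PROTOCOL)
dict_blob = pickle.dumps(dict_rows, protocol=pickle.HIGHEST_PROTOCOL)


def load_dicts():
    """Variant A: unpickle the ready-made list of dicts."""
    return pickle.loads(dict_blob)


def load_raw_then_parse():
    """Variant B: unpickle the raw rows, then build the dicts."""
    return [dict(zip(KEYS, row)) for row in pickle.loads(raw_blob)]


print("pickle size, dicts:", len(dict_blob), "bytes")
print("pickle size, raw:  ", len(raw_blob), "bytes")
print("load dicts:          ", timeit.timeit(load_dicts, number=5))
print("load raw, then parse:", timeit.timeit(load_raw_then_parse, number=5))
```

Timing the two variants side by side like this separates the cost of unpickling from the cost of rebuilding the dicts, which the overall execution time mixes together.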

I looked at whether I could tweak the parse function:

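 def _parseGramData(self):
     """Parse the raw gram data into a list of dictionaries.

     Stores the result in self.gramData, one dict per raw row.
     """
     keys = ['id', 'baseWord', 'followingWord', 'freq']
     # Pair each raw row with the field names to build one dict per row.
     self.gramData = [dict(zip(keys, value)) for value in self.rawGramData]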

Besides creating a list of lists instead of a list of dicts, I can't find an improvement.
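One middle ground between a list of lists and a list of dicts is a list of named tuples, which keeps access by field name. The sketch below uses `collections.namedtuple` from the standard library; `GramRow` and `raw_gram_data` are hypothetical names, and whether this is actually faster for the real data would need to be measured.

```python
from collections import namedtuple

# Field names taken from the keys list in _parseGramData.
GramRow = namedtuple("GramRow", ["id", "baseWord", "followingWord", "freq"])

# Hypothetical raw rows; the real ones would come from the pickle file.
raw_gram_data = [(0, 1, 1, 2), (1, 1, 2, 1)]

# _make builds one GramRow from each existing sequence of four values.
gram_data = [GramRow._make(row) for row in raw_gram_data]

first = gram_data[0]
print(first.baseWord, first.followingWord, first.freq)
```

Each row is still a plain tuple underneath, so it can be indexed or unpacked like the list-of-lists variant while the rest of the program refers to fields by name.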

I'm wondering whether this is simply a design decision between file size and execution time. What are the options in general?
