Wednesday, January 31, 2018

Is it a good practice to preallocate an empty dataframe with types?

I'm trying to load around 3 GB of data into a Pandas dataframe, and I figured that I would save some memory by first declaring an empty dataframe while enforcing that its float columns be 32-bit instead of the default 64-bit. However, the Pandas dataframe constructor does not allow specifying the types of multiple columns on an empty dataframe.

I found a bunch of workarounds in the replies to this question, but they made me realize that Pandas is not designed to be used this way.
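For concreteness, here is a minimal sketch of one workaround of this kind (not necessarily one from the replies mentioned above): building the empty dataframe from a dict of empty, typed Series. The column names and the `schema` dict are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column names and dtypes are assumptions for illustration.
schema = {"id": np.int64, "x": np.float32, "y": np.float32}

# The constructor's dtype argument takes a single dtype, so each column
# is created as its own empty Series with the dtype it should have.
empty_df = pd.DataFrame({name: pd.Series(dtype=dt) for name, dt in schema.items()})

print(empty_df.dtypes)
```

Note that this only fixes the dtypes of the empty frame; whether they survive depends on how rows are added afterwards.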

This made me wonder whether it was a good strategy at all to declare the empty dataframe first, instead of reading the file and then downcasting the float columns (which seems inefficient memory-wise and processing-wise).
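To make the comparison concrete, the sketch below contrasts reading and then downcasting with a third option: passing the dtypes to the reader itself. It assumes the data is a CSV file, which is not stated above; the inline `csv_text` and the column names are hypothetical stand-ins for the real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file (assumed here to be CSV).
csv_text = "id,x,y\n1,0.5,1.5\n2,2.5,3.5\n"

# Option A: read with default dtypes (floats become float64), then downcast.
# Both the float64 and float32 versions of the columns exist during the cast.
df_a = pd.read_csv(io.StringIO(csv_text))
float_cols = df_a.select_dtypes(include="float64").columns
df_a = df_a.astype({col: np.float32 for col in float_cols})

# Option B: tell the reader the dtypes up front via the dtype parameter,
# so the float columns are stored as float32 as soon as they are parsed.
df_b = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"x": np.float32, "y": np.float32},
)

print(df_a.dtypes)
print(df_b.dtypes)
```

Option B needs neither an empty preallocated dataframe nor a separate downcasting pass, at the cost of knowing the column names in advance.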

What would be the best strategy to design my program?
