Wednesday, August 16, 2017

Design pattern for handling large datasets for machine learning

I'm currently scraping data from websites and building a large (and potentially growing) dataset from it. I'm wondering whether there are any good practices to adopt when processing, saving, and loading large datasets.

More concretely, what should I do when the dataset I want to save is too large to hold in RAM and write to disk in one go, but writing it one data point at a time is too inefficient? Is there a smarter approach than writing a moderately sized batch to file at a time?
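For reference, one common shape the batch-at-a-time approach takes is an appendable, chunked on-disk dataset. Below is a minimal sketch using h5py, assuming the data points are fixed-size float vectors; the `scrape_batches` generator, the feature dimension, and the batch size are all hypothetical placeholders, not anything from the question itself.

```python
import h5py
import numpy as np

FEATURE_DIM = 128   # assumed fixed-size feature vectors
BATCH_SIZE = 1000   # a moderately sized batch that fits comfortably in RAM

def scrape_batches():
    """Hypothetical stand-in for the scraper: yields one batch array at a time."""
    for _ in range(10):
        yield np.random.rand(BATCH_SIZE, FEATURE_DIM).astype("float32")

with h5py.File("dataset.h5", "w") as f:
    # Resizable, chunked dataset: it grows on disk, so the full dataset
    # never has to be held in memory at once.
    dset = f.create_dataset(
        "features",
        shape=(0, FEATURE_DIM),
        maxshape=(None, FEATURE_DIM),
        chunks=(BATCH_SIZE, FEATURE_DIM),
        dtype="float32",
    )
    for batch in scrape_batches():
        start = dset.shape[0]
        dset.resize(start + batch.shape[0], axis=0)  # extend along the first axis
        dset[start:] = batch                         # write only this batch to disk

# Later, slices can be read back without loading the whole file into memory:
with h5py.File("dataset.h5", "r") as f:
    first_rows = f["features"][:32]
```

The same pattern (append a batch, flush, repeat) also works with other chunked formats; the point of the sketch is only that writes and reads both happen in batch-sized pieces rather than all at once or one record at a time.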

Thank you for your time!
