I'm currently scraping data from websites and building a large (and potentially ever-growing) dataset from it. I'm wondering whether there are any good practices to adopt when processing, saving, and loading large datasets.

More concretely, what should I do when the dataset I want to save is too large to hold in RAM and write to disk in one go, yet writing it one data point at a time is too inefficient? Is there an approach smarter than writing a moderately sized batch to file at a time, along the lines of the sketch below?
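To make the batch idea concrete, here is a rough sketch of what I currently have in mind (the file name, batch size, and record format are just placeholders; I'm not claiming this is the right way to do it):

```python
import json
from pathlib import Path


class BatchedWriter:
    """Accumulate scraped records in memory and flush them to a JSON Lines
    file once the buffer reaches a fixed size, so neither the whole dataset
    nor every single record goes to disk on its own."""

    def __init__(self, path, batch_size=1000):
        self.path = Path(path)
        self.batch_size = batch_size
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Append one JSON object per line; appending avoids rewriting the whole file.
        with self.path.open("a", encoding="utf-8") as f:
            for record in self.buffer:
                f.write(json.dumps(record) + "\n")
        self.buffer.clear()


writer = BatchedWriter("dataset.jsonl", batch_size=500)
for i in range(2000):  # stand-in for records coming from the scraper
    writer.add({"id": i, "text": f"example record {i}"})
writer.flush()  # don't forget the final partial batch
```

This keeps memory bounded and reduces the number of writes, but I'm not sure whether the batch-and-append pattern (or the JSON Lines format) is what people actually use for datasets that keep growing over time.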
Thank you for your time!