Tuesday, October 29, 2019

How to lazy-load the same big dataset from multiple Python modules

I have multiple Python modules that use the same input data, with the variables sharing the same names across modules.

I created a module, data_loading.py, where all these variables are instantiated, and I then import the ones I need in each data_analysis_xx module.

For example,

" Module data_analysis_1 "
from data_loading import var_1, var_2,…, var_k 

" Module data_analysis_2 "
from data_loading import var_1, var_3

This way I avoid copying and pasting the same 200 lines of loading code into every module that needs the same, or partially the same, set of data.
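To make the setup concrete, here is a minimal sketch of what such a data_loading.py might look like today. The file name, column names, and the specific pandas operations are all hypothetical, standing in for the real 200 lines:

# data_loading.py -- hypothetical sketch of the current, eager version:
# every variable is built at import time
import pandas as pd

_raw = pd.read_csv("big_dataset.csv")    # hypothetical input file

# expensive preprocessing runs on every import of this module,
# even if the importer only needs one of these variables
var_1 = _raw.sort_values("timestamp")    # assumed column name
var_2 = _raw[_raw["value"] > 0]          # assumed column name
var_3 = _raw.dropna()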

First question:

Is using a single source module for data loading the right approach? Is there a standard, or at least better, way to import the same variables into multiple modules?

In data_loading I also do some basic data manipulation, such as integrity checks, splitting, cutting, and sorting. Problem: this can be time consuming. When I import data_loading, every variable in it is loaded and processed, even if I only need one or a few of them.

Second question:

How can the data_loading module be made to load and process only the variables that are actually requested?

Possible solutions
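One possible approach, assuming Python 3.7+, is a module-level __getattr__ (PEP 562): each variable is built only the first time it is imported or accessed, and the result is cached. A sketch, reusing the hypothetical file and column names from above:

# data_loading.py -- lazy version using a module-level __getattr__ (PEP 562, Python 3.7+)
import functools

import pandas as pd

@functools.lru_cache(maxsize=None)
def _raw():
    # the expensive read happens at most once per process
    return pd.read_csv("big_dataset.csv")    # hypothetical input file

# map each public variable name to the function that builds it;
# the column names below are assumptions for illustration
_builders = {
    "var_1": lambda: _raw().sort_values("timestamp"),
    "var_2": lambda: _raw()[_raw()["value"] > 0],
    "var_3": lambda: _raw().dropna(),
}

def __getattr__(name):
    # called by Python only when `name` is not already defined in this module
    if name in _builders:
        value = _builders[name]()
        globals()[name] = value    # cache it, so each variable is built only once
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this version the importing modules do not change at all: from data_loading import var_1 still works, but it now triggers only the work needed to build var_1. On Python < 3.7 a similar effect can be had by exposing accessor functions (e.g. a hypothetical get_var_1()) decorated with functools.lru_cache instead of module-level variables.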
