Thursday, 29 March 2018

How to split procedural code in Python

I am writing a machine learning program and I am not sure how to split up the code. The program takes two files and outputs a classifier. The steps it performs are:

  1. Read two files into 2 dataframes
  2. Merge dataframes
  3. Fix columns, preprocess some strings and datetimes
  4. Extract features from objects
  5. Mark these features
  6. Drop infrequent features
  7. Perform feature selection and get ranking of features
  8. Mark the features that remain after feature selection
  9. Drop objects that are no longer needed
  10. Build a model
  11. Check accuracy; if it is acceptable, return the model; if not, go back to step 6 and select other features.
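
To make the control flow of steps 6-11 concrete, here is a minimal sketch of the retry loop. Everything in it is hypothetical: `run_pipeline`, the injected callables, and the idea of varying a minimum-frequency threshold between attempts are my assumptions for illustration, not existing code.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple, TypeVar

Data = TypeVar("Data")
Model = TypeVar("Model")


def run_pipeline(
    data: Data,
    candidate_min_counts: Iterable[int],
    select_features: Callable[[Data, int], Sequence[str]],
    build_model: Callable[[Data, Sequence[str]], Model],
    evaluate: Callable[[Model], float],
    target_accuracy: float,
) -> Optional[Tuple[Model, Sequence[str], float]]:
    """Repeat steps 6-11 with different settings until accuracy is acceptable.

    All names here are hypothetical placeholders for the real steps.
    """
    for min_count in candidate_min_counts:
        features = select_features(data, min_count)  # steps 6-9
        model = build_model(data, features)          # step 10
        accuracy = evaluate(model)                   # step 11
        if accuracy >= target_accuracy:
            return model, features, accuracy
    return None  # no candidate setting reached the target accuracy
```

Passing the steps in as functions keeps the loop itself free of pandas or model details, so each step could live in its own file and be tested separately.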

This is quite a lot of code and I am not sure how to split it into files/classes.

I thought about creating a file such as "preprocessing" where I put steps 1-5, "feature_selection" for steps 6, 7, 8 and 9, and "model_building" for step 10. Do you think that is OK?
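
As a rough illustration of that split, the sketch below shows a few functions and marks which file each would live in. The function names, the CSV input format, the inner join, and the column arguments are all assumptions made for the example; only a handful of the steps are shown.

```python
import pandas as pd


# --- preprocessing.py (a few of the early steps) ---
def load_and_merge(left_file, right_file, key):
    """Read two CSV sources and inner-join them on a shared key column."""
    left = pd.read_csv(left_file)
    right = pd.read_csv(right_file)
    return left.merge(right, on=key, how="inner")


def clean_columns(df, text_col, date_col):
    """Normalize a string column and parse a datetime column; returns a copy."""
    out = df.copy()
    out[text_col] = out[text_col].str.strip().str.lower()
    out[date_col] = pd.to_datetime(out[date_col])
    return out


# --- feature_selection.py (step 6 shown) ---
def drop_infrequent(features, min_count):
    """Keep only feature columns that are non-zero in at least min_count rows."""
    counts = (features != 0).sum(axis=0)
    return features.loc[:, counts >= min_count]
```

Each function takes a DataFrame and returns a new one, so a top-level script can chain them in step order without any of the files needing to know about the others.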

Are there any patterns or techniques for structuring procedural code like this?

The code is written in Python using pandas DataFrames.
