I am writing a machine learning program and I am not sure how to split up the code. The program takes two files and outputs a classifier. The steps it performs are:
- Read two files into 2 dataframes
- Merge dataframes
- Fix columns, preprocess some strings and datetimes
- Extract features from objects
- Mark these features
- Drop infrequent features
- Perform feature selection and get ranking of features
- Mark the features that remain after feature selection
- Drop objects that are no longer needed
- Build a model
- Check accuracy: if it is acceptable, return the model; if not, go back to step 6 and select other features.
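To make the last step concrete, here is a rough sketch of the "check accuracy, otherwise try other features" loop as I imagine it. The names `search_model`, `build_model`, `evaluate` and `min_accuracy` are hypothetical placeholders, not code I already have, and the scores in the usage part are made-up toy values.

```python
from typing import Any, Callable, Iterable, Optional, Sequence, Tuple


def search_model(
    candidate_feature_sets: Iterable[Sequence[str]],
    build_model: Callable[[Sequence[str]], Any],
    evaluate: Callable[[Any], float],
    min_accuracy: float,
) -> Optional[Tuple[Any, Sequence[str], float]]:
    """Try feature sets in order and return the first acceptable model.

    Returns (model, features, accuracy), or None if no candidate
    reaches min_accuracy.
    """
    for features in candidate_feature_sets:
        model = build_model(features)      # "Build a model"
        accuracy = evaluate(model)         # "Check accuracy"
        if accuracy >= min_accuracy:
            return model, features, accuracy
        # Otherwise fall through and try the next feature set.
    return None


# Toy stand-ins (assumptions for illustration only): the "model" is just
# the tuple of feature names, and its accuracy is looked up in a table.
toy_scores = {("a",): 0.6, ("a", "b"): 0.9}

result = search_model(
    candidate_feature_sets=[("a",), ("a", "b")],
    build_model=lambda features: tuple(features),
    evaluate=lambda model: toy_scores[model],
    min_accuracy=0.8,
)
```

Passing the build and evaluate steps in as functions keeps the loop itself independent of pandas and of whichever model library ends up being used.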
This is quite a lot of code and I am not sure how to split it into files/classes.
I thought about creating files such as "preprocessing", where I would put steps 1-5, "feature_selection" for steps 6, 7, 8 and 10, and "model_building" for step 9. Do you think that is OK?
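As a minimal sketch of that split, each step could be a small function that takes a dataframe and returns a new one, grouped by the file it would live in. Everything here is an assumption for illustration: the function names, the columns `id`, `text`, `created` and `label`, and the toy data are hypothetical, and the two "modules" are shown in one block only so that it runs on its own.

```python
import pandas as pd


# --- preprocessing.py (hypothetical module) ---
def merge_frames(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Merge the two input dataframes on a shared key column."""
    return left.merge(right, on=key, how="inner")


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise strings and parse datetimes without mutating the input."""
    out = df.copy()
    out["text"] = out["text"].str.strip().str.lower()
    out["created"] = pd.to_datetime(out["created"])
    return out


def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build one 0/1 indicator column per word in the text column."""
    return df["text"].str.get_dummies(sep=" ")


# --- feature_selection.py (hypothetical module) ---
def drop_infrequent(features: pd.DataFrame, min_count: int) -> pd.DataFrame:
    """Keep only features that occur in at least min_count rows."""
    counts = features.sum(axis=0)
    return features.loc[:, counts >= min_count]


# --- main script: chain the steps ---
# Toy in-memory data standing in for the two input files.
a = pd.DataFrame({"id": [1, 2, 3],
                  "text": [" Red apple", "green apple ", "Red pear"]})
b = pd.DataFrame({"id": [1, 2, 3],
                  "created": ["2020-01-01", "2020-01-02", "2020-01-03"],
                  "label": [1, 0, 1]})

features = (
    merge_frames(a, b, key="id")
    .pipe(clean)
    .pipe(extract_features)
    .pipe(drop_infrequent, min_count=2)
)
```

With the steps written this way, the main script reads like the list of steps, and each function can be tested on a small dataframe by itself.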
Are there any patterns or techniques for designing this kind of procedural code?
The code is written in Python using pandas dataframes.