Friday, October 11, 2019

Selecting an adequate architecture/design pattern for a minimal working example

I have been tasked with developing a script that reads some data in CSV format and some statistical parameters in .ini format, performs statistical tests on that data, and writes a new CSV file with extra columns containing the test results for each row. One important feature is the ability to select a "range" of columns from the data, so that the user controls which columns are used for the statistical tests.

I have dumbed down my program (which is much more complex) to the following toy example. The toy example makes no sense from a scientific point of view but I think it illustrates the functionality I am trying to achieve really well:

The Python code is as follows:

from configobj import ConfigObj
import pandas as pd
from scipy import stats

config = ConfigObj('input_mwe.ini')
crit_slope = config.get('data').as_float('critical_slope')

start_col = config['subset']['start_col']
end_col = config['subset']['end_col']

t = list(range(int(start_col), int(end_col)+1))

intercept_list = []
slope_list = []
decision_list = []

df = pd.read_csv('input.csv')

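for idx, row in df.iterrows():
    # start_col and end_col are strings here, so this is a label-based slice
    # on the column names and includes both end points (columns '3' to '8').
    y = list(row[start_col:end_col])
    # Fit y against the column numbers; only slope and intercept are used.
    slope, intercept, r_value, p_value, std_err = stats.linregress(t, y)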
    intercept_list.append(intercept)
    slope_list.append(slope)
    if slope<crit_slope:
        decision_list.append('accept')
    else:
        decision_list.append('reject')

df['intercept'] = intercept_list
df['slope'] = slope_list
df['decision'] = decision_list

df.to_csv("output.csv")

input_mwe.ini has the following content:

[data]
    critical_slope=0.2
[subset]
    start_col=3
    end_col=8
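
As a side note on the configuration step, the values come out of the .ini file as strings and are converted at the point of use. A minimal sketch of gathering them into one typed object is shown below. It uses the standard-library configparser as a stand-in for ConfigObj so that it is self-contained, and the names Settings and load_settings are hypothetical, not part of the original script.

```python
import configparser
from dataclasses import dataclass

# Same content as input_mwe.ini, inlined so the sketch runs on its own.
INI_TEXT = """\
[data]
    critical_slope=0.2
[subset]
    start_col=3
    end_col=8
"""


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the parameters, already converted to numbers."""
    critical_slope: float
    start_col: int
    end_col: int


def load_settings(text: str) -> Settings:
    """Parse the .ini text and convert each value to its intended type."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return Settings(
        critical_slope=parser.getfloat("data", "critical_slope"),
        start_col=parser.getint("subset", "start_col"),
        end_col=parser.getint("subset", "end_col"),
    )


settings = load_settings(INI_TEXT)
print(settings)
```

Converting once at load time means the rest of the code never has to remember which values are still strings.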

input.csv has the following content:

Point_id,0,1,2,3,4,5,6,7,8,9
1,5,4,6,8,5,4,1,8,7,4
2,2,5,6,8,7,4,5,6,2,4
3,5,7,8,4,5,6,8,7,4,1
4,5,4,6,6,6,8,7,8,6,5
5,5,4,4,4,5,8,7,9,5,2

The output is as follows:

,Point_id,0,1,2,3,4,5,6,7,8,9,intercept,slope,decision
0,1,5,4,6,8,5,4,1,8,7,4,5.3428571428571425,0.02857142857142857,accept
1,2,2,5,6,8,7,4,5,6,2,4,10.36190476190476,-0.9142857142857143,accept
2,3,5,7,8,4,5,6,8,7,4,1,4.40952380952381,0.22857142857142856,reject
3,4,5,4,6,6,6,8,7,8,6,5,6.0476190476190474,0.14285714285714282,accept
4,5,5,4,4,4,5,8,7,9,5,2,3.819047619047619,0.45714285714285713,reject

Now in reality I am performing many more statistical tests, and the decision to accept or reject is based on other parameters. Also the method to subset the columns from the input csv is slightly different.

However, this is irrelevant to my question. I would like to find a simple, well-recognized and well-tested design pattern that lets me replicate the above functionality in object-oriented code. I have very little experience choosing design patterns (although I have worked extensively with classes and objects before and know how to build them).

What would be a good, practical starting point to refactor the above into object-oriented code? What is a suitable design pattern to use? Can I use more than one design pattern?
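
To make the question concrete, here is one possible shape, offered only as a rough sketch and not as a settled design. It follows the Strategy pattern: each statistical test is a small class with a common run method, and a pipeline object applies a list of such tests to the selected columns of every row. All class names (RowTest, SlopeTest, ColumnRange, Pipeline) are hypothetical. To keep it self-contained, it uses the csv module and a hand-written least-squares fit in place of pandas and scipy.stats.linregress.

```python
from __future__ import annotations

import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass

# First three data rows of input.csv, inlined so the sketch runs on its own.
SAMPLE_CSV = """\
Point_id,0,1,2,3,4,5,6,7,8,9
1,5,4,6,8,5,4,1,8,7,4
2,2,5,6,8,7,4,5,6,2,4
3,5,7,8,4,5,6,8,7,4,1
"""


class RowTest(ABC):
    """Strategy interface: one statistical test applied to one row."""

    @abstractmethod
    def run(self, t: list[float], y: list[float]) -> dict[str, object]:
        """Return the new column names mapped to their values for this row."""


@dataclass
class SlopeTest(RowTest):
    """Least-squares line fit; accepts the row if the slope is below a threshold."""
    critical_slope: float

    def run(self, t, y):
        n = len(t)
        t_mean, y_mean = sum(t) / n, sum(y) / n
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        slope = sxy / sxx
        intercept = y_mean - slope * t_mean
        decision = "accept" if slope < self.critical_slope else "reject"
        return {"intercept": intercept, "slope": slope, "decision": decision}


@dataclass
class ColumnRange:
    """Inclusive range of numbered columns, e.g. 3 to 8."""
    start: int
    end: int

    def positions(self) -> list[int]:
        return list(range(self.start, self.end + 1))


class Pipeline:
    """Applies every configured test to the selected columns of each row."""

    def __init__(self, column_range: ColumnRange, tests: list[RowTest]):
        self.column_range = column_range
        self.tests = tests

    def process(self, rows) -> list[dict[str, object]]:
        t = [float(p) for p in self.column_range.positions()]
        labels = [str(p) for p in self.column_range.positions()]
        results = []
        for row in rows:
            y = [float(row[label]) for label in labels]
            out = dict(row)  # keep the original columns
            for test in self.tests:
                out.update(test.run(t, y))  # append each test's columns
            results.append(out)
        return results


pipeline = Pipeline(ColumnRange(3, 8), [SlopeTest(critical_slope=0.2)])
results = pipeline.process(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for r in results:
    print(r["Point_id"], r["slope"], r["decision"])
```

With this layout, adding another statistical test would mean writing one more RowTest subclass and appending it to the list passed to Pipeline, without touching the loop. Whether that is the right trade-off for the real program is exactly what I am unsure about.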

Thanks a lot for your help as I am eager to learn more about organizing my code in this manner.
