design-patterns: Pandas flexible determination of metrics

Thursday, March 5, 2020

Pandas flexible determination of metrics

Imagine we have dataframes with different structures in Pandas:

import pandas as pd

# creating the first dataframe
df1 = pd.DataFrame({
  "width": [1, 5, 7, 8],
  "height": [5, 8, 4, 3]})

# creating the second dataframe
df2 = pd.DataFrame({
  "a": [7, 8, 9, 10],
  "b": [11, 23, 1, 5],
  "c": [1, 3, 4, 5]})

In general there might be more than two dataframes. I want to build logic that maps sets of column names to specific functions, each of which creates a new column "metric" (think of it as the area for two columns and the volume for three columns). I want to specify the column-name ensembles like this:

column_name_ensembles = {
    "1": {
        "ensemble": ['height', 'width'],
        "method": area},
    "2": {
        "ensemble": ['a', 'b', 'c'],
        "method": volume}}
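The mapping above refers to area and volume without defining them. A minimal sketch of what they might look like, assuming each method receives the dataframe plus its ensemble (the list of column names) and returns the new column as a Series. This signature is an assumption made for illustration, not something the question fixes:

```python
import pandas as pd

def area(df, ensemble):
    """Assumed signature: multiply exactly two columns element-wise."""
    first, second = ensemble  # raises ValueError if the ensemble is not a pair
    return df[first] * df[second]

def volume(df, ensemble):
    """Assumed signature: multiply exactly three columns element-wise."""
    first, second, third = ensemble
    return df[first] * df[second] * df[third]

df1 = pd.DataFrame({"width": [1, 5, 7, 8], "height": [5, 8, 4, 3]})
df2 = pd.DataFrame({"a": [7, 8, 9, 10], "b": [11, 23, 1, 5], "c": [1, 3, 4, 5]})

print(area(df1, ["height", "width"]).tolist())  # [5, 40, 28, 24]
print(volume(df2, ["a", "b", "c"]).tolist())    # [77, 552, 36, 250]
```

Because both functions share one signature, they can be stored as plain values in a dictionary and called without knowing which one is which.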

Here, the area function creates a new column for the first dataframe, df1['metric'] = df1['height'] * df1['width'], and the volume function creates a new column for the second dataframe, df2['metric'] = df2['a'] * df2['b'] * df2['c']. Note that the functions can have arbitrary form, but each takes its ensemble as parameters. The perfect solution would be a function that takes a dataframe and column_name_ensembles as parameters and returns the dataframe with the appropriate 'metric' column added to it.
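One way to sketch such a function is a dispatch table: loop over the ensembles, pick the first one whose columns are all present in the dataframe, and call its method. The name add_metric, the (df, ensemble) method signature, and the choice to raise KeyError when nothing matches are all assumptions made for this illustration:

```python
import pandas as pd

# Hypothetical metric functions; both take (df, ensemble) and return a Series.
def area(df, ensemble):
    return df[ensemble[0]] * df[ensemble[1]]

def volume(df, ensemble):
    return df[ensemble].prod(axis=1)  # row-wise product of the listed columns

column_name_ensembles = {
    "1": {"ensemble": ["height", "width"], "method": area},
    "2": {"ensemble": ["a", "b", "c"], "method": volume},
}

def add_metric(df, ensembles):
    """Return a copy of df with a 'metric' column from the first matching ensemble."""
    for spec in ensembles.values():
        columns = spec["ensemble"]
        if set(columns).issubset(df.columns):
            # assign() returns a new dataframe and leaves df untouched
            return df.assign(metric=spec["method"](df, columns))
    raise KeyError(f"no ensemble matches columns {list(df.columns)}")

df1 = pd.DataFrame({"width": [1, 5, 7, 8], "height": [5, 8, 4, 3]})
df2 = pd.DataFrame({"a": [7, 8, 9, 10], "b": [11, 23, 1, 5], "c": [1, 3, 4, 5]})

print(add_metric(df1, column_name_ensembles)["metric"].tolist())  # [5, 40, 28, 24]
print(add_metric(df2, column_name_ensembles)["metric"].tolist())  # [77, 552, 36, 250]
```

Supporting a new dataframe layout then means adding one dictionary entry rather than another if/else branch. If a dataframe could match several ensembles, this sketch silently takes the first match in dictionary order, which may or may not be the desired rule.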

I know this can be achieved with multiple if/else statements, but that does not seem to be the most elegant solution. Maybe there is a design pattern that solves this problem, but I am not an expert in design patterns.

Thank you for reading my question! I am looking forward to your answers.

