I am a scientist trying to learn better software engineering practices, so that my analysis scripts are more robust and easily useable by others. I am having trouble finding the best pattern for the following problem. Is there an OOP framework for easy subsetting by instance attributes? For instance:
- I have a large table of vehicle trajectories over time
[[x, y]]
. These are different drivers in different cars, time-trialing on a track.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Form': {0:'SUV', 1:'Truck', 2:'SUV', 3:'Sedan', 4:'SUV', 5:'Truck'},
'Make': {0:'Ford', 1:'Toyota', 2:'Honda', 3:'Ford', 4:'Honda', 5:'Toyota'},
'Color': {0:'White', 1:'Black', 2:'Gray', 3:'White', 4:'White', 5:'Black'},
'Driver age': {0:25, 1:37, 2:21, 3:54, 4:50, 5:67},
'Trajectory': {0: np.array([[0, 0], [0.25, 1.7], [1.2, 1.8], [4.5, 4.0]]),
1: np.array([[0, 0], [0.15, 1.3], [1.6, 1.3], [4.2, 4.1]]),
2: np.array([[0, 0], [0.24, 1.2], [1.3, 1.6], [4.1, 3.9]]),
3: np.array([[0, 0], [0.45, 1.6], [1.8, 1.8], [4.2, 4.6]]),
4: np.array([[0, 0], [0.85, 1.9], [1.5, 1.7], [4.5, 4.3]]),
5: np.array([[0, 0], [0.35, 1.8], [1.5, 1.8], [4.6, 4.1]]),
}
})
- A function takes as input a trajectory, and analyzes it over time. eg.
def avg_velocity(trajectory):
v = []
for t in range(len(trajectory) - 1):
v.append(trajectory[t+1] - trajectory[t])
return np.mean(v)
- I'd like to write a program that can analyze the trajectories of a particular subset eg. the average velocities of all drives on SUVs, the average velocities of all drives on white vehicles.
My current solution is to:
- I use a Pandas Dataframe to store the table.
- I subset by different criteria (ie.
df.groupby(by=['Form'])
)
- Iterating over this list of dataframes, I pass each trajectory to a function
avg_velocity(trajectory)
.
- I store the results in a nested dictionary ie.
results[vehicle_make][vehicle_form][vehicle_color][driver_age]
.
- To retrieve information, I access the nested dictionary by key.
A natural OOP way might be to create a class Drive
with many attributes (ie. Drive.make, Drive.form, Drive.age, ...
etc.).
class Drive:
def __init__(self, form, make, color, age, trajectory):
self.form = form
self.make = make
self.color = color
self.age = age
self.trajectory = trajectory
...
However, I am not sure how to quickly subset by a particular criteria when each drive has been separated into different instances. Say I suddenly want to plot the average velocity of all drives by Toyotas
. I'd have to iterate through a list of Drive
instances, check if Drive.make == 'Toyota'
. Is this a common problem with OOP?