mercredi 17 mai 2023

How to best query from a csv repeatedly throughout the day in a simulation code

I'm trying to write a simulation code in python. This simulation code relies on inputs for a large csv file, and there is a separate csv file for each day in the simulation. I need to make numerous queries (the queries are based on time, which are columns in the csv file) each simulation day.

I'm thinking of using pandas.read_csv to read this in as a dataframe, and store the result and then query from this dataframe. One coding requirement is I don't want the dataframe stored at the query site.

I think the easiest way to do this is with a class, e.g.,

import pandas as pd
class DailyCSVLoader:
  def __init__(filepath):
    self.df = pd.read_csv(filepath)
  def query(time):
    # return the rows corresponding to time

with usage:

import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
loader = DailyCSVLoader(path)
loader.query()

However, for my particular codebase, it might be slightly easier to do this outside of a class and with just a function and perhaps a static variable that holds the dataframe, e.g.,

import pandas as pd

# because I don't want the calling site to store df, I decided to keep it as a static variable here
def daily_csv_loader(filepath):
  daily_csv_loader.df = pd.read_csv(filepath)


def query(time, df):
  # return rows from df corresponding to time

with usage

import datetime

path = "/path/to/csv/file/filename.csv"
time = datetime.datetime(year=2020, month=1, day=1, hour=12, minute=2, second=0)
daily_csv_loader(filepath)
query(time, daily_csv_loader.)

Are there any other approaches here?

Aucun commentaire:

Enregistrer un commentaire