Sunday, 26 June 2016

Ruby pattern for file data collection and individual records

I'm looking for a suitable pattern for an application that uses various data files. The basic process is: run the script, load the data file(s), do stuff, and output a report or a new data file (the input files are never altered).

The files come in various formats, but for argument's sake say they are CSV. The format is not a factor here.

I am attracted to the ActiveRecord-style pattern, where you have a class representing a dataset. There are class methods through which you can retrieve data, and each record is an instance, with instance methods for each field.

In my case, that would look something like:

class People
  attr_accessor :first_name, :last_name, :address, :city, :country
  def self.load(file)
    rtn = new
    # @cache must be set on rtn; a bare @cache here would belong to the class itself
    rtn.instance_variable_set(:@cache, [])
    # load data into rtn's @cache, each row of data is an instance of self
    rtn
  end
  def all_people
    @cache
  end
  def people_in_city(city)
    # search @cache for matching records
  end
  def people_with_last_name(name)
    # search @cache for matching records
  end
  # etc, etc
end

This kind of works, but it feels clunky. Each instance holds not only an individual record's data but also a reference to all the data in the file (i.e. @cache). That is a big break from ActiveRecord, where the actual data is stored elsewhere (i.e. in the database). In my case I want to load all the data, query it this way and that, and then exit.

My other approach is to use two classes, one for the individual record and another for the collection, e.g.

class Person
  attr_accessor :first_name, :last_name, :address, :city, :country
end
class PeopleCollection
  def initialize(file)
    @cache = []
    # load file and place each record into @cache as a Person instance
  end
  def all_people
    @cache
  end
  def people_in_city(city)
    # search @cache for matching records
  end
  def people_with_last_name(name)
    # search @cache for matching records
  end
  # etc, etc
end
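
To make the two-class idea concrete, here is a minimal runnable sketch. It assumes the CSV headers match the attribute names exactly. The `parse` class method, the `initialize(attrs)` constructor, the use of `Enumerable`, and the sample rows are all illustrative additions of mine, not part of the design above.

```ruby
require "csv"

# One instance per row; attributes mirror the (assumed) CSV headers.
class Person
  attr_accessor :first_name, :last_name, :address, :city, :country

  # Hypothetical constructor: assigns each key through its writer, so an
  # unknown header raises NoMethodError instead of being silently accepted.
  def initialize(attrs = {})
    attrs.each { |name, value| public_send("#{name}=", value) }
  end
end

# Holds all records and owns the query methods.
class PeopleCollection
  include Enumerable

  # Hypothetical helper: takes CSV text so the sketch needs no file on disk.
  def self.parse(csv_text)
    new(CSV.parse(csv_text, headers: true).map { |row| Person.new(row.to_h) })
  end

  def initialize(people)
    @cache = people
  end

  # Enumerable builds select, find, map, etc. on top of each.
  def each(&block)
    @cache.each(&block)
  end

  def all_people
    @cache
  end

  def people_in_city(city)
    select { |person| person.city == city }
  end

  def people_with_last_name(name)
    select { |person| person.last_name == name }
  end
end

# Made-up sample data, for illustration only.
SAMPLE_ROWS = <<~ROWS
  first_name,last_name,address,city,country
  Ada,Lovelace,1 Main St,London,UK
  Alan,Turing,2 High St,London,UK
  Grace,Hopper,3 Navy Rd,Arlington,USA
ROWS

people = PeopleCollection.parse(SAMPLE_ROWS)
puts people.people_in_city("London").map(&:first_name).inspect
```

With this split, each Person carries only its own fields, and the collection is the single place that knows about the full dataset.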

I realize I could just use CSV::Row or Hash for my record class, but I want to provide dot-notation accessors. I also looked at OpenStruct, but it relies on method_missing, which bugs me in terms of performance impact (not a deal breaker), and I also want an exception raised when I hit record.some_missspelled_attribute_accezzor.
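
One candidate that seems to fit those two requirements is Ruby's built-in Struct. It defines real accessor methods up front, so a misspelled accessor raises NoMethodError rather than returning nil as OpenStruct does. The sketch below is only an illustration: the `PersonRecord` name and the sample values are hypothetical, and `keyword_init:` assumes Ruby 2.5 or later.

```ruby
# Hypothetical record type; Struct generates a reader and writer per field.
PersonRecord = Struct.new(:first_name, :last_name, :address, :city, :country,
                          keyword_init: true)

person = PersonRecord.new(first_name: "Ada", last_name: "Lovelace",
                          address: "1 Main St", city: "London", country: "UK")

puts person.city # dot-notation access to a defined field

begin
  person.some_missspelled_attribute_accezzor
rescue NoMethodError => e
  # No method_missing fallback: unknown accessors fail loudly.
  puts "raised #{e.class}"
end
```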

Any tips appreciated...
