The overall objective of my task is to select a subgroup of people from a larger group, based on some criteria.
Say the entire group of people comes from two different databases:
1) Data Hosp: contains all hospitalization records of patients from 2000 to 2010. Each row represents one hospitalization record, and one patient can have multiple records (multiple rows, one per occasion).
2) Data Pharm: contains all drug prescription records of patients from 2000 to 2010. Each row represents one prescription record, and one patient can have multiple records.
I want to use different rules for the selection process at different times. For example, we can select the subgroup if patients meet any of the following rule-based conditions:
1) Select if a patient has three or more hospitalizations in 2000-2010,
2) Select if a patient has three or more prescriptions in 2000-2010,
3) Select if a patient has 1) but not 2),
4) Select if a patient has 1) and 2),
5) Select if a patient has 2) but not 1)… etc
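To make the rules above concrete, here is a rough sketch of how I picture them on toy data. The record layout (tuples of patient id and year), the names `hosp_records`, `pharm_records` and `frequent_ids`, and the default thresholds are all placeholders I made up for illustration, not the real databases.

```python
from collections import Counter

# Hypothetical toy records: (patient_id, year). The real tables have more columns.
hosp_records = [("A", 2001), ("A", 2003), ("A", 2009), ("B", 2002),
                ("C", 2000), ("C", 2004), ("C", 2005)]
pharm_records = [("A", 2001), ("A", 2002), ("A", 2010), ("B", 2005),
                 ("B", 2006), ("B", 2007), ("C", 2003)]


def frequent_ids(records, min_count=3, start=2000, end=2010):
    """Return the set of patient ids with at least min_count records in [start, end]."""
    counts = Counter(pid for pid, year in records if start <= year <= end)
    return {pid for pid, n in counts.items() if n >= min_count}


rule1 = frequent_ids(hosp_records)    # 1) three or more hospitalizations
rule2 = frequent_ids(pharm_records)   # 2) three or more prescriptions
rule3 = rule1 - rule2                 # 3) meets 1) but not 2)
rule4 = rule1 & rule2                 # 4) meets 1) and 2)
rule5 = rule2 - rule1                 # 5) meets 2) but not 1)
```

Once rules 1) and 2) are sets of patient ids, rules 3) to 5) reduce to set difference and intersection. My real rules are more involved than this.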
I am simplifying the rules drastically here. In practice, I would like to create my own Python module called “select_patient” so that I don’t have to rewrite similar code many times.
I have a fairly good understanding of basic and intermediate Python concepts, including writing functions and simple modules, but I haven’t created very complex modules yet, so I don’t know the best way to structure this one. I may also be unaware of some of the more advanced Python concepts that are needed to build what I want.
Also, how do I create a module that adapts to slight variations? For example, in the above conditions, criterion 1) only involves Data Hosp, criterion 2) only involves Data Pharm, and criteria 3) to 5) involve both databases. Furthermore, I might want to tighten condition 1) by adding a further requirement, for instance that the hospitalizations must have occurred in urban (not rural) areas.
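As an illustration of the kind of variation I mean, here is a minimal sketch in which the urban restriction is passed in as an optional row filter. The dict layout, the `area` field, and the `keep` parameter are assumptions made for this example; my actual data may be organised differently.

```python
from collections import Counter

# Hypothetical hospitalization rows; "area" is an assumed column name.
hosp_records = [
    {"patient_id": "A", "year": 2001, "area": "urban"},
    {"patient_id": "A", "year": 2003, "area": "urban"},
    {"patient_id": "A", "year": 2009, "area": "urban"},
    {"patient_id": "C", "year": 2000, "area": "urban"},
    {"patient_id": "C", "year": 2004, "area": "rural"},
    {"patient_id": "C", "year": 2005, "area": "urban"},
]


def frequent_ids(records, min_count=3, keep=lambda row: True):
    """Patients with at least min_count rows for which keep(row) is true."""
    counts = Counter(row["patient_id"] for row in records if keep(row))
    return {pid for pid, n in counts.items() if n >= min_count}


def is_urban(row):
    """Extra requirement: the hospitalization happened in an urban area."""
    return row["area"] == "urban"


any_area = frequent_ids(hosp_records)                   # original condition 1)
urban_only = frequent_ids(hosp_records, keep=is_urban)  # condition 1), urban stays only
```

Here patient C qualifies under the original condition but drops out once only urban hospitalizations are counted. What I am unsure about is whether passing small filter functions around like this scales to many such variations.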
One design approach I am currently trying is to create one big class whose parameters spell out all the conditions. For example:
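    class select_patients:
        def __init__(self, param1, param2, param3, param4, param20):  # param5 to param19 omitted
            ...  # store the parameters
        def test_cond1(self):
            ...  # check condition 1)
        def test_cond2(self):
            ...  # check condition 2)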
The large number of parameters lets me specify under which user-driven circumstances each condition applies. However, I find it very burdensome to have to specify so many parameters right from the start.
Another approach I am considering is a piecemeal one, where I break the task down and write smaller functions first. For example:
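    def test_cond1():
        ...  # check condition 1)
    def test_cond2():
        ...  # check condition 2)
    def select_patients(param1, param2):  # further parameters omitted
        if XYZ:  # XYZ is a placeholder for whatever decides which rule applies
            test_cond1()
        else:
            test_cond2()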
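Fleshed out a little, the piecemeal approach might look like the sketch below. Everything here is hypothetical: `cond1` and `cond2` play the role of test_cond1() and test_cond2(), the `rule` argument and its string values are a switch I invented for the example, and each table is reduced to a plain list with one patient id per row.

```python
from collections import Counter


def cond1(hosp_ids, min_count=3):
    """Patients with at least min_count hospitalization rows."""
    counts = Counter(hosp_ids)
    return {pid for pid, n in counts.items() if n >= min_count}


def cond2(pharm_ids, min_count=3):
    """Patients with at least min_count prescription rows."""
    counts = Counter(pharm_ids)
    return {pid for pid, n in counts.items() if n >= min_count}


def select_patients(hosp_ids, pharm_ids, rule):
    """Combine the small condition functions according to a rule name."""
    met1 = cond1(hosp_ids)
    met2 = cond2(pharm_ids)
    if rule == "1_not_2":
        return met1 - met2
    elif rule == "1_and_2":
        return met1 & met2
    elif rule == "2_not_1":
        return met2 - met1
    raise ValueError(f"unknown rule: {rule!r}")


# One patient id per row, standing in for the two tables.
hosp_ids = ["A", "A", "A", "C", "C", "C", "B"]
pharm_ids = ["A", "A", "A", "B", "B", "B", "C"]
selected = select_patients(hosp_ids, pharm_ids, rule="1_not_2")
```

With this layout each condition can be tested on its own, but the if/elif chain in `select_patients` grows with every new rule.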
Is one approach preferable to the other?
Since I haven’t used inheritance or iterators/generators (or other advanced Python concepts) before, could they be useful for what I am trying to do?