Thursday, 25 February 2016

Python: OOP methods to make code re-usable and readable?

I have a Python script that creates custom filters for pandas dataframes. These dataframes contain event logs with three columns: client, event, and next (the next event). Values of client recur, and each occurrence may have different values of event and next. I want to find all rows of every client that matches a filter value in one (or both) of the other two columns, either to single those rows out or to leave them out. This choice is captured by a FilterType enum, which is 0 for selecting the matching rows (select) or 1 for leaving them out (hide).
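To make the select/hide distinction concrete, here is a minimal sketch on a made-up toy log. The dataframe `log` and its values are assumptions for illustration only, not data from the script below.

```python
import pandas as pd

# Hypothetical toy log; the values are made up for illustration.
log = pd.DataFrame({
    'client': [0, 0, 1, 1, 2, 2],
    'event':  [1, 3, 2, 3, 1, 2],
    'next':   [3, None, 3, None, 2, None],
})

# Clients for which event == 1 occurs in at least one row.
clients_with_event = log.loc[log['event'] == 1, 'client'].unique()

# select keeps every row of those clients; hide keeps every other row.
select_mask = log['client'].isin(clients_with_event)
hide_mask = ~select_mask

selected = log[select_mask]
hidden = log[hide_mask]
```

Note that the mask covers all rows of a matching client, not only the rows in which the event itself occurs.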

So far I have created three sample filters (Filter_HasEvent, Filter_EventToNext, and Filter_EventFirst). Respectively, they find all clients for which the given value of event occurs in any row, for which a given combination of event and next occurs in any row, or for which the given value of event is the client's first event.
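The "first event" case is the least obvious of the three, so here is a small sketch of the idea. It assumes a made-up dataframe `log` whose rows are already in chronological order per client.

```python
import pandas as pd

# Hypothetical toy log; rows are assumed to be in chronological order.
log = pd.DataFrame({
    'client': [0, 0, 1, 1, 2, 2],
    'event':  [1, 3, 2, 1, 1, 2],
})

# First recorded event per client; the result is indexed by client id.
first_event = log.groupby('client')['event'].first()

# Client ids whose first event equals 1.
clients_first = first_event[first_event == 1].index

# Client 1 also has event 1, but not as its first event, so it is excluded.
mask_first = log['client'].isin(clients_first)
```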

I know dataframes have a filter function of their own. However, I would also like to apply multiple filters to the same dataframe using both unions and intersections. That is why each filter produces a boolean mask for the dataframe; the masks can be combined before they are applied.
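As a rough illustration of combining masks, the sketch below uses a hypothetical helper `has_event_mask` and a made-up dataframe `log`. Boolean Series combine with `|` for a union and `&` for an intersection.

```python
import pandas as pd

# Hypothetical toy log; the values are made up for illustration.
log = pd.DataFrame({
    'client': [0, 0, 1, 1, 2, 2],
    'event':  [1, 3, 2, 3, 1, 2],
})


def has_event_mask(df, value):
    """Rows belonging to clients that have `value` as an event at least once."""
    clients = df.loc[df['event'] == value, 'client'].unique()
    return df['client'].isin(clients)


mask_a = has_event_mask(log, 1)  # rows of clients 0 and 2
mask_b = has_event_mask(log, 3)  # rows of clients 0 and 1

union = log[mask_a | mask_b]         # clients with event 1 or event 3
intersection = log[mask_a & mask_b]  # clients with both events
```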

Now, for my question: because I use classes, I currently have to instantiate a filter for every value of event and for every combination of event and next. What's more, the filter type is selected inside each filter with an if-else construction, which becomes increasingly illegible as I add more filter types. Is there a nicer (i.e. more OOP) way of defining these filters and implementing the different types, e.g. using decorators or design patterns?
Another issue is that when I add more filters (e.g. Filter_EventLast), I have to implement every filter type again for that filter, because the implementations are not reusable. Ideally, I would like to write a template that dispatches to the appropriate filter depending on the input it receives, as in Java. Is this possible?
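One possible shape for such a template, as a sketch only: a base class handles the select/hide decision once, and each subclass only states which clients match (the template method pattern). The names `ClientFilter`, `matching_clients`, `HasEvent`, and `EventLast` are hypothetical and do not appear in my current script.

```python
import abc
from enum import Enum

import pandas as pd


class FilterType(Enum):
    select = 0
    hide = 1


class ClientFilter(abc.ABC):
    """Hypothetical base class: subclasses only say which clients match."""

    def __init__(self, filter_type):
        self.filter_type = filter_type

    @abc.abstractmethod
    def matching_clients(self, df):
        """Return the client ids that satisfy this filter."""

    def get_mask(self, df):
        # The select/hide decision is made here once, for every subclass.
        mask = df['client'].isin(self.matching_clients(df))
        return ~mask if self.filter_type is FilterType.hide else mask

    def apply(self, df):
        return df[self.get_mask(df)]


class HasEvent(ClientFilter):
    def __init__(self, value, filter_type):
        super().__init__(filter_type)
        self.value = value

    def matching_clients(self, df):
        return df.loc[df['event'] == self.value, 'client'].unique()


class EventLast(ClientFilter):
    def __init__(self, value, filter_type):
        super().__init__(filter_type)
        self.value = value

    def matching_clients(self, df):
        # Last recorded event per client, indexed by client id.
        last = df.groupby('client')['event'].last()
        return last[last == self.value].index


# Hypothetical toy log; the values are made up for illustration.
log = pd.DataFrame({
    'client': [0, 0, 1, 1, 2, 2],
    'event':  [1, 3, 2, 3, 1, 2],
})
shown = HasEvent(1, FilterType.select).apply(log)
hidden = EventLast(3, FilterType.hide).apply(log)
```

With this layout, a new filter needs only a `matching_clients` method, and a new filter type would be handled in `get_mask` alone. Whether this fits the real script is an open question.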

Below is the sample code as it currently stands:

import abc
from enum import Enum

import numpy as np
import pandas as pd

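np.random.seed(64951)
# 10 clients with 10 events each
client = [j for j in range(10) for i in range(10)]
event = pd.Series(np.random.choice(range(10), len(client)))
# event - event.diff(-1) is the following row's event (NaN on the last row);
# named next_event so the builtin next() is not shadowed
next_event = event - event.diff(-1)
rand_df = pd.DataFrame({
    'client': client,
    'event': event,
    'next': next_event
})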


class FilterType(Enum):
    select = 0
    hide = 1


class FilterTemplate(abc.ABC):
    # Inheriting from abc.ABC is what makes @abc.abstractmethod take effect.
    def __init__(self, value, filter_type):
        self.value = value
        self.filter_type = filter_type
        self.mask = None

    @abc.abstractmethod
    def getMask(self, df):
        raise NotImplementedError('Method needs to be implemented!')

    @abc.abstractmethod
    def apply(self, df):
        raise NotImplementedError('Method needs to be implemented!')


class Filter_HasEvent(FilterTemplate):
    def __init__(self, value, filter_type):
        super().__init__(value, filter_type)

    def getMask(self, df):
        client_in_event = df[df.event == self.value].client.unique()
        if self.filter_type == FilterType.select:
            self.mask = df.client.isin(client_in_event)
        elif self.filter_type == FilterType.hide:
            self.mask = ~df.client.isin(client_in_event)
        return self.mask

    def apply(self, df):
        return df[self.getMask(df)]


class Filter_EventToNext(FilterTemplate):
    def __init__(self, value, next_value, filter_type):
        super().__init__(value, filter_type)
        self.next_value = next_value

    def getMask(self, df):
        client_in_eventtonext = df[(df.event == self.value) & (df.next == self.next_value)].client.unique()
        if self.filter_type == FilterType.select:
            self.mask = df.client.isin(client_in_eventtonext)
        elif self.filter_type == FilterType.hide:
            self.mask = ~df.client.isin(client_in_eventtonext)
        return self.mask

    def apply(self, df):
        return df[self.getMask(df)]


class Filter_EventFirst(FilterTemplate):
    def __init__(self, value, filter_type):
        super().__init__(value, filter_type)

    def getMask(self, df):
        client_unique = pd.Series(sorted(df.client.unique()), index=sorted(df.client.unique()))
        client_has_event_first = client_unique[df.groupby('client').event.first() == self.value]
        if self.filter_type == FilterType.select:
            self.mask = df.client.isin(client_has_event_first)
        elif self.filter_type == FilterType.hide:
            self.mask = ~df.client.isin(client_has_event_first)
        return self.mask

    def apply(self, df):
        return df[self.getMask(df)]

filter1 = Filter_HasEvent(1, FilterType.hide)
mask1 = filter1.getMask(rand_df)
filter2 = Filter_EventToNext(1, 2, FilterType.hide)
mask2 = filter2.getMask(rand_df)

print(filter1.apply(rand_df))
print('')
print(rand_df[mask1 | mask2])
