Tuesday, January 5, 2016

OOP in Python (related to Scrapy)

The question is how to share data between objects in a safe and maintainable manner.

Example: I've built a Scrapy application which spawns numerous spiders. Although each spider is connected to a separate pipeline object, I need to compare and sort data across the different pipelines (e.g. I need the output sorted by different item attributes: price, date, etc.), so I need some shared data area. The same applies to the spiders themselves (e.g. I need to count total requests against a maximum). The first implementation used class variables to share data between spiders/pipelines and instance variables for each object.

class MyPipeline(object):
    max_price = 0  # class variable: shared by every MyPipeline instance

    def process_item(self, item, spider):
        if item['price'] > MyPipeline.max_price:
            MyPipeline.max_price = item['price']
        return item
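
On the spider side, the equivalent class-variable approach for capping total requests would look something like this (a minimal sketch; the name, MAX_TOTAL_REQUESTS and the parse body are illustrative, not the actual code):

import scrapy

class MySpider(scrapy.Spider):
    name = 'my_spider'  # illustrative

    # class variables: shared by every spider instance
    total_requests = 0
    MAX_TOTAL_REQUESTS = 10000  # illustrative cap

    def parse(self, response):
        MySpider.total_requests += 1
        if MySpider.total_requests >= MySpider.MAX_TOTAL_REQUESTS:
            return  # stop following links once the shared cap is reached
        # ... otherwise extract items / follow links as usual ...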

(The actual structures are more complex.) Then I figured out that having a bunch of statics is not really OOP, so the next solution is to have a private data class for each class and use it to store the shared values:

class MyPipelineData:
    def __init__(self):
        self.max_price = 0

class SpidersData:
    def __init__(self, total_requests, pipeline_data):
        self.total_requests = total_requests
        self.pipeline_data = pipeline_data  # the data shared between pipelines

class MyPipeline(object):
    def __init__(self):
        self._data = None  # lazily bound to the shared MyPipelineData instance

    def process_item(self, item, spider):
        if self._data is None:
            self._data = spider.data.pipeline_data  # the data shared between pipelines
        if item['price'] > self._data.max_price:
            self._data.max_price = item['price']
        return item

class MySpider(scrapy.Spider):
    def __init__(self, spider_data, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.data = spider_data  # the same SpidersData object is passed to all spiders
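
For completeness, here is a minimal sketch of how one SpidersData instance could be handed to every spider when the crawl is started from a script; the CrawlerProcess launch and the two example crawl() calls are assumptions for illustration, not part of the original code:

from scrapy.crawler import CrawlerProcess

# one shared data object for the whole run
shared = SpidersData(total_requests=0, pipeline_data=MyPipelineData())

# MyPipeline is assumed to be enabled via the ITEM_PIPELINES setting,
# and MySpider to define the usual name / start_urls / parse logic
process = CrawlerProcess()

# keyword arguments to crawl() are forwarded to the spider's __init__,
# so every spider receives the same SpidersData instance
process.crawl(MySpider, spider_data=shared)
process.crawl(MySpider, spider_data=shared)
process.start()

With this wiring, each pipeline lazily picks up the shared MyPipelineData from its spider on the first process_item call.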

Now I have one instance of the data shared between all pipelines (and the same for the spiders). Am I generally correct with this? Should I apply the same OOP approaches in Python as in C++?
