The question is how to share data between objects in a safe and maintainable manner.
Example: I've built a Scrapy application which spawns numerous spiders. Although each spider is connected to a separate pipeline object, I need to compare and sort the data across different pipelines (e.g. I need outputs sorted by different item attributes: price, date, etc.), so I need some shared data area. The same applies to the spiders themselves (e.g. I need to keep track of the maximum total number of requests). The first implementation used class variables to share data between spiders/pipelines and instance variables for per-object data:
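    class MyPipeline(object):
        max_price = 0  # class variable, shared by every pipeline instance

        def process_item(self, item, spider):
            # Name the class explicitly; a bare max_price would be a local variable
            if item['price'] > MyPipeline.max_price:
                MyPipeline.max_price = item['price']
            return item  # Scrapy expects process_item to return the item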
(The actual structures are more complex.) Then I decided that having a bunch of statics is hardly OOP, so the next solution was to give each class its own private data class and use it to store the values:
    import scrapy

    class MyPipelineData:
        def __init__(self):
            self.max_price = 0

    class SpidersData:
        def __init__(self, total_requests, pipeline_data):
            self.total_requests = total_requests
            self.pipeline_data = pipeline_data  # the shared data between pipelines

    class MyPipeline(object):
        _data = None  # filled in lazily from the first spider seen

        def process_item(self, item, spider):
            if self._data is None:
                self._data = spider.data.pipeline_data  # the shared data between pipelines
            if item['price'] > self._data.max_price:
                self._data.max_price = item['price']
            return item

    class Spider(scrapy.Spider):
        name = 'my_spider'  # placeholder; scrapy.Spider requires a name

        def __init__(self, spider_data, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.data = spider_data  # read by the pipeline as spider.data

    # and the same SpidersData object is passed to all spiders
Now I have one instance of the data shared between all pipelines (and the same for spiders). Is this approach generally correct? Should I apply the same OOP approaches in Python as in C++?
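To make the question concrete, here is a minimal sketch of the second approach with Scrapy stripped out. `PipelineData` and `PricePipeline` are hypothetical stand-ins, not Scrapy classes, and passing the shared object through the constructor is only my assumption about the wiring; in a real project Scrapy creates the pipeline instances itself, so the hand-off would look different.

```python
class PipelineData:
    """Hypothetical shared state; one instance is handed to every pipeline."""

    def __init__(self):
        self.max_price = 0


class PricePipeline:
    """Stand-in for a Scrapy pipeline; this sketch does not import Scrapy."""

    def __init__(self, shared):
        # The shared object is injected instead of living in a class variable
        self.shared = shared

    def process_item(self, item):
        if item["price"] > self.shared.max_price:
            self.shared.max_price = item["price"]
        return item


shared = PipelineData()
first, second = PricePipeline(shared), PricePipeline(shared)

first.process_item({"price": 10})
second.process_item({"price": 25})
first.process_item({"price": 7})

print(shared.max_price)  # prints 25
```

Both pipelines see the same maximum because they hold a reference to the same object, which is the behaviour I am after; the difference from the class-variable version is only that the sharing is explicit.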