I'm trying to learn Python development, and I have been reading about architectural patterns and code design because I want to stop hacking my code together. I am currently implementing an application, and as you'll see, I know it has a problematic structure, but I don't know how to change it for the better.
I'm implementing a web crawler that will store its information in a MongoDB instance.
This is my general structure:
```
Spiders/
    crawlers.py
    connections.py
    utils.py
    __init__.py
```
`crawlers.py` implements a class of type `Crawler`, and each specific crawler inherits from it. Each `Crawler` has a `table_name` attribute and a `crawl` method. In `connections.py`, I implemented a pymongo driver to connect to the DB; its `write` method expects a crawler as a parameter. Now here comes the tricky part: `crawler2` depends on the results of `crawler1`, so I end up with something like this:
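```python
from pymongo import InsertOne

# Crawler is the base class from crawlers.py
class Crawler1(Crawler):
    def __init__(self):
        super().__init__('Crawler 1', 'table_A')

    def crawl(self):
        operations = []  # one InsertOne(document) per crawled item
        ...  # crawling logic goes here
        return operations

class Crawler2(Crawler):
    def __init__(self):
        super().__init__('Crawler 2', 'table_B')  # assumed: writes to its own table

    def crawl(self, list_of_codes):
        operations = []  # one InsertOne(document) per crawled code/link
        ...  # crawl each code/link in list_of_codes
        return operations
```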
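The `Crawler` base class itself isn't shown. A minimal sketch consistent with the description (a `table_name` attribute and a `crawl` method) and with the `super().__init__(name, table_name)` calls could look like this; making `crawl` abstract with the `abc` module is an assumption, and `DemoCrawler` is a hypothetical subclass used only to show the contract.

```python
from abc import ABC, abstractmethod


class Crawler(ABC):
    """Base class shared by all crawlers (sketch)."""

    def __init__(self, name, table_name):
        self.name = name              # human-readable label, e.g. 'Crawler 1'
        self.table_name = table_name  # collection where this crawler's results go

    @abstractmethod
    def crawl(self, *args, **kwargs):
        """Return a list of write operations, e.g. pymongo InsertOne objects."""


class DemoCrawler(Crawler):
    """Hypothetical subclass, only here to show the contract."""

    def __init__(self):
        super().__init__('Demo', 'table_demo')

    def crawl(self):
        return ['op-1', 'op-2']  # stand-ins for InsertOne(...) objects
```

Because `crawl` is abstract, instantiating `Crawler` directly raises `TypeError`, so a subclass that forgets to implement it fails early.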
Then, in `connections.py`, I create a class that expects a crawler:
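```python
from pymongo import MongoClient

class MongoDriver:
    def __init__(self):
        # uses the database named in the connection URI
        self.db = MongoClient(...).get_default_database()

    def write(self, crawler, *crawl_args, **kwargs):
        # crawl_args (e.g. list_of_codes) are forwarded to crawler.crawl()
        self.db[crawler.table_name].bulk_write(crawler.crawl(*crawl_args), **kwargs)

    def get_list_of_codes(self):
        query = {}
        # assumed: the codes live in Crawler1's table
        return [x['field'] for x in self.db['table_A'].find(query)]
```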
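As written, `write` has to know that a crawler has a `table_name` and a `crawl` method, so `connections.py` depends on `crawlers.py`. A minimal sketch of a driver that deals only in table names and operations, assuming the database object is passed in (dependency injection) rather than created inside; `find_field` and `InMemoryCollection` are hypothetical names, and the in-memory stand-in stores each operation as a document, which a real pymongo `Collection` would not do.

```python
from collections import defaultdict


class MongoDriver:
    """Sketch: knows about tables and operations, not about crawlers."""

    def __init__(self, db):
        # db is injected: a pymongo Database in production, a stand-in in tests
        self.db = db

    def write(self, table_name, operations, **kwargs):
        # bulk_write raises on an empty list, so skip it when there is nothing to write
        if operations:
            self.db[table_name].bulk_write(operations, **kwargs)

    def find_field(self, table_name, field, query=None):
        # Generic version of get_list_of_codes(): one field from every matching document
        return [doc[field] for doc in self.db[table_name].find(query or {})]


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo Collection (no server needed)."""

    def __init__(self):
        self.docs = []

    def bulk_write(self, operations, **kwargs):
        self.docs.extend(operations)  # simplification: each operation is stored as a document

    def find(self, query):
        return list(self.docs)  # simplification: the query is ignored


# Usage without MongoDB: defaultdict creates a collection on first access
driver = MongoDriver(defaultdict(InMemoryCollection))
driver.write('table_A', [{'field': 'code-1'}, {'field': 'code-2'}])
```

In production, the caller would build the database with something like `MongoClient(...).get_default_database()` and pass it in, so the driver never has to import anything from `crawlers.py`.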
And here comes the biggest problem (I think there are many others, some of which I can barely grasp and others that I'm still totally blind to): my connection code needs the context of the crawler! For example:
```python
mongo_driver = MongoDriver()
crawler1 = Crawler1()
crawler2 = Crawler2()
mongo_driver.write(crawler1)
mongo_driver.write(crawler2, mongo_driver.get_list_of_codes())
```
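The coupling in this snippet comes from the crawler-to-crawler dependency leaking into the driver. One common way to contain it is a separate orchestration step (often called a pipeline or service layer) that alone knows Crawler2 needs Crawler1's output, while the driver and the crawlers stay unaware of each other. A minimal sketch under stated assumptions: `run_pipeline`, `InMemoryDriver`, `find_field`, and the stub crawlers are hypothetical names, and plain dicts stand in for `InsertOne` operations so it runs without MongoDB.

```python
class InMemoryDriver:
    """Hypothetical driver stand-in: table name -> list of stored documents."""

    def __init__(self):
        self.tables = {}

    def write(self, table_name, documents):
        self.tables.setdefault(table_name, []).extend(documents)

    def find_field(self, table_name, field):
        return [doc[field] for doc in self.tables.get(table_name, [])]


class StubCrawler1:
    table_name = 'table_A'

    def crawl(self):
        # Plain dicts stand in for InsertOne operations
        return [{'field': 'code-1'}, {'field': 'code-2'}]


class StubCrawler2:
    table_name = 'table_B'

    def crawl(self, list_of_codes):
        return [{'code': code} for code in list_of_codes]


def run_pipeline(driver, first, second, code_field='field'):
    """The only place that knows `second` depends on the output of `first`."""
    # Step 1: the independent crawler runs and its results are stored
    driver.write(first.table_name, first.crawl())
    # Step 2: read back only the data the dependent crawler needs
    codes = driver.find_field(first.table_name, code_field)
    # Step 3: the dependent crawler receives plain data, not the driver
    driver.write(second.table_name, second.crawl(codes))
    return codes
```

With a real setup, only the objects passed to `run_pipeline` change: the crawlers never import the driver, and the driver never imports the crawlers.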
How would one go about solving this? And what else is particularly worrisome in this design? Thanks for the feedback!