- I am making a script that scrapes many news websites and returns their articles.
- There is a class called `Website` with attributes `name`, `url`, and `articles`. Each `Website` subclass has its own implementation of `get_articles`.
- My goal is to then collect all the articles from the websites and return a dictionary of the form `{name: articles}`.

Here's what I have so far.
Specific questions:
- I believe this is a Factory Pattern problem. Is that the best way to go about this solution?
- Are there any best practices for what `self.name`, `self.url`, and `self.articles` should be for the parent `Website` class? I just set them all to a placeholder value, as they will all be overwritten.
class Website:
    """The parent Website class"""

    def __init__(self):
        # Placeholder values; every subclass overwrites them
        self.name = "X"
        self.url = "X"
        self.articles = "X"

    def get_articles(self):
        return "N/A"

    @classmethod
    def get_all_subclasses(cls):
        # Note: __subclasses__() returns direct subclasses only
        return cls.__subclasses__()


class Website1(Website):
    """Implementation for Website1"""

    def __init__(self):
        self.name = 'Website1'
        self.url = 'www.google.com'
        self.articles = self.get_articles()

    def get_articles(self):
        # insert custom logic to get articles based on url
        articles = ['article 1', 'article 2', 'article 3']
        return articles


class Website2(Website):
    """Implementation for Website2"""

    def __init__(self):
        self.name = 'Website2'
        self.url = 'www.google.com'
        self.articles = self.get_articles()

    def get_articles(self):
        # insert custom logic to get articles based on url
        articles = ['article 1', 'article 2', 'article 3']
        return articles


site_articles = {}
websites = Website.get_all_subclasses()
for site_cls in websites:
    # Instantiate once per site; __init__ already fetches the articles
    site = site_cls()
    site_articles[site.name] = site.articles
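
For comparison with the placeholder-value question above, here is a minimal sketch of one alternative, not a definitive answer: the parent becomes an abstract base class, so it needs no placeholder values, and `name` and `url` become class attributes on each subclass. The `registry` list and the `collect_articles` helper are hypothetical names introduced only for this illustration; the article lists remain stand-ins for real scraping logic.

```python
from abc import ABC, abstractmethod


class Website(ABC):
    """Abstract parent: declares the interface, holds no placeholder data."""

    name: str  # each concrete subclass is expected to set this
    url: str   # each concrete subclass is expected to set this
    registry = []  # hypothetical: filled with every subclass as it is defined

    def __init_subclass__(cls, **kwargs):
        # Called automatically whenever a subclass of Website is created
        super().__init_subclass__(**kwargs)
        Website.registry.append(cls)

    @abstractmethod
    def get_articles(self):
        """Return a list of articles for this site."""


class Website1(Website):
    name = "Website1"
    url = "www.google.com"

    def get_articles(self):
        # stand-in for site-specific scraping logic
        return ["article 1", "article 2", "article 3"]


class Website2(Website):
    name = "Website2"
    url = "www.google.com"

    def get_articles(self):
        # stand-in for site-specific scraping logic
        return ["article 1", "article 2", "article 3"]


def collect_articles():
    """Build the {name: articles} dictionary from all registered subclasses."""
    return {cls.name: cls().get_articles() for cls in Website.registry}


site_articles = collect_articles()
```

With this layout, instantiating `Website` directly raises a `TypeError`, and a subclass that forgets to implement `get_articles` fails at instantiation rather than silently returning a placeholder. Unlike `__subclasses__()`, the registry also picks up subclasses of subclasses, which may or may not be what is wanted.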