I'm trying to extract information from an HTML page, such as its last modification date, in a context where there is more than one way of declaring it, and those ways use non-uniform data (meaning a simple loop over the fetched data is not possible).
The ugly task is as follows:
def get_date(html):
    date = None
    # Approach 1
    time_tag = html.find("time", {"datetime": True})
    if time_tag:
        date = time_tag["datetime"]
    if date:
        return date
    # Approach 2
    mod_tag = html.find("meta", {"property": "article:modified_time", "content": True})
    if mod_tag:
        date = mod_tag["content"]
    if date:
        return date
    # Approach n
    # ...
    return date
I wonder whether Python has some concise and elegant way of achieving this through a `while` logic, so that it runs fast and stays legible and maintenance-friendly:
def method_1(html):
    test = html.find("time", {"datetime": True})
    return test["datetime"] if test else None

def method_2(html):
    test = html.find("meta", {"property": "article:modified_time", "content": True})
    return test["content"] if test else None

...

def get_date(html):
    date = None
    bag_of_methods = [method_1, method_2, ...]
    i = 0
    while not date and i < len(bag_of_methods):
        date = bag_of_methods[i](html)
        i += 1
    return date
I can make that work right now by turning each approach from the first snippet into a function, appending all the functions to the `bag_of_methods` iterable, and running them until one works.
However, those functions would be two lines each and would not be reused later in the program, so it just seems to add more lines of code and pollute the namespace for nothing.
Is there a better way of doing this?
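One possible direction, as a minimal sketch rather than a definitive answer: keep the extractors as inline lambdas in a single list, so no extra function names are introduced, and let `next()` over a lazy generator stop at the first truthy result instead of managing a `while` counter by hand. The names `_attr`, `EXTRACTORS` and `extract` are hypothetical, and the sketch assumes `html` is a BeautifulSoup-style object whose `find()` returns either a tag that supports `tag[key]` or `None`.

```python
def _attr(tag, key):
    """Return tag[key] if a tag was found, else None (hypothetical helper)."""
    return tag[key] if tag else None


# Hypothetical list of extractors. Each one takes the parsed page and returns
# a date string or None; they do not need to share a common shape.
EXTRACTORS = [
    lambda html: _attr(html.find("time", {"datetime": True}), "datetime"),
    lambda html: _attr(
        html.find("meta", {"property": "article:modified_time", "content": True}),
        "content",
    ),
]


def get_date(html, extractors=EXTRACTORS):
    """Return the first truthy extractor result, or None if none matches."""
    # The inner generator is lazy: extractors after the first hit are never called.
    results = (extract(html) for extract in extractors)
    return next((date for date in results if date), None)
```

Adding another approach would then mean appending one entry to the list. Whether lambdas read better than small named functions is a matter of taste; named functions are easier to test and to document individually.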