Friday, 17 April 2020

What Java design patterns are available for parsing a list of articles and the articles themselves?

I've created a Java scraper that currently works absolutely fine. As this is a long-term hobby project, I want to make it more efficient, or perhaps more "pleasing" to work with, and learn some design patterns along the way.

At the moment my application goes to a page and scrapes a list of URLs leading to individual articles. There are many pages, so this is scheduled to run every day.

Page
  |- Article URL
      |- Data
  |- Article URL
      |- Data
  |- Article URL
      |- Data
  |- Article URL
      |- Data
Page
  |- Article URL
      |- Data
  |- Article URL
      |- Data
  |- Article URL
      |- Data
  |- Article URL
      |- Data

At the moment I have a 'Page Scraper' class which obtains the full list of article URLs and saves them to the DB.

Many minutes later, once that has completed, an 'Article Scraper' class loads the full list of URLs and scrapes the article data.
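
The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual code: the names `UrlStore`, `PageScraper`, `ArticleScraper` and `runOnce` are hypothetical, an in-memory list stands in for the DB, and simple lambdas stand in for the real HTTP fetching and HTML parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TwoPhaseScraperSketch {

    /** Hypothetical stand-in for the DB table of article URLs (in memory only). */
    static class UrlStore {
        private final List<String> urls = new ArrayList<>();

        void save(String url) {
            urls.add(url);
        }

        List<String> loadAll() {
            return new ArrayList<>(urls);
        }
    }

    /** Phase 1: visit every list page and store the article URLs found on it. */
    static class PageScraper {
        private final Function<String, List<String>> fetchArticleUrls;
        private final UrlStore store;

        PageScraper(Function<String, List<String>> fetchArticleUrls, UrlStore store) {
            this.fetchArticleUrls = fetchArticleUrls;
            this.store = store;
        }

        void scrape(List<String> pageUrls) {
            for (String pageUrl : pageUrls) {
                for (String articleUrl : fetchArticleUrls.apply(pageUrl)) {
                    store.save(articleUrl);
                }
            }
        }
    }

    /** Phase 2: runs only after phase 1 has finished, reading the stored URLs. */
    static class ArticleScraper {
        private final Function<String, String> fetchData;
        private final UrlStore store;

        ArticleScraper(Function<String, String> fetchData, UrlStore store) {
            this.fetchData = fetchData;
            this.store = store;
        }

        List<String> scrapeAll() {
            List<String> data = new ArrayList<>();
            for (String url : store.loadAll()) {
                data.add(fetchData.apply(url));
            }
            return data;
        }
    }

    /** Runs both phases back to back with fake fetchers instead of real HTTP calls. */
    static List<String> runOnce(List<String> pageUrls) {
        UrlStore store = new UrlStore();
        new PageScraper(page -> List.of(page + "/article-1", page + "/article-2"), store)
                .scrape(pageUrls);
        return new ArticleScraper(url -> "data for " + url, store).scrapeAll();
    }

    public static void main(String[] args) {
        runOnce(List.of("page-1", "page-2")).forEach(System.out::println);
    }
}
```

The point of the sketch is the coupling: the second phase cannot start until the first has written every URL to the store.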

This seems hugely ungainly to me. Is there a design pattern that would lend itself to this structure?
