Wednesday, August 2, 2017

E-commerce store web scraper architecture

I'm building a web scraper in Ruby that gathers product information from different stores. Right now it works like so:

  1. A Request object (we'll call this request) is created with args store and sku.
  2. When the process method is called on the request, the following happens:

    A. An instance of the scraping/parsing library (Nokogiri) is created, and the HTML page for the item matching the store and sku above is parsed into a new object (we'll call this object nokogiri_response).

    B. An instance of Response (we'll call this response) is created with nokogiri_response injected as a dependency by calling Response.new(nokogiri_response).

    C. The to_h method is called on response, which creates a hash by calling the following methods:

    def title
      # Text content of the store-specific title element
      @nokogiri_response.at('div[@id="some-store-title"]').text.strip
    end

    def price
      # Numeric price parsed out of the store-specific price element
      @nokogiri_response.at('div[@id="bby-price-main"]').text[/[\d.]+/].to_f
    end

    def in_stock
      # True when the add-to-cart element is present on the page
      !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
    end
    
    

    and returns a hash that looks like the following:

    {
      title: 'Design Patterns: Elements of Reusable Object-Oriented Software', 
      price: 37.48, 
      in_stock: true
    }
    
    

    This hash is then returned by the process method that was called on request.
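
Putting those steps together, a stripped-down sketch of the flow looks something like this (the require lines, the use of Net::HTTP for fetching, and the example.com URL format are just placeholders for illustration, not my real fetching code):

    require 'net/http'
    require 'uri'
    require 'nokogiri'

    class Request
      def initialize(store, sku)
        @store = store
        @sku   = sku
      end

      # Fetch the product page, parse it with Nokogiri, and hand the parsed
      # document to a Response that knows how to extract the fields.
      def process
        html = Net::HTTP.get(URI("https://#{@store}.example.com/products/#{@sku}"))
        nokogiri_response = Nokogiri::HTML(html)
        Response.new(nokogiri_response).to_h
      end
    end

    class Response
      def initialize(nokogiri_response)
        @nokogiri_response = nokogiri_response
      end

      # Builds the hash shown above from the individual extraction methods.
      def to_h
        { title: title, price: price, in_stock: in_stock }
      end

      private

      # title, price, and in_stock are defined exactly as in the snippet above.
      def title
        @nokogiri_response.at('div[@id="some-store-title"]').text.strip
      end

      def price
        @nokogiri_response.at('div[@id="bby-price-main"]').text[/[\d.]+/].to_f
      end

      def in_stock
        !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
      end
    end

    # Usage:
    #   Request.new('somestore', '1234567').process
    #   # => { title: "...", price: 37.48, in_stock: true }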

My question is: how can I best design this so that it works for more than 300 stores? Keep in mind that each store will have different CSS selectors for the title, price, and in_stock methods on the response object.
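
To make that concrete, here is an invented example of how the same three fields end up needing different selectors for two different stores (the store names and selectors below are made up for illustration):

    # Invented per-store selector table; none of these stores or selectors
    # are real, they only illustrate how the markup differs between stores.
    STORE_SELECTORS = {
      'store_a' => {
        title:    'div#some-store-title',
        price:    'div#bby-price-main',
        in_stock: 'div#add-to-cart'
      },
      'store_b' => {
        title:    'h1.product-name',
        price:    'span.sale-price',
        in_stock: 'button.add-to-bag'
      }
    }

    # A response object given one of these entries could do, for example,
    #   @nokogiri_response.at(selectors[:price])
    # instead of hard-coding a selector per store.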
