Wednesday, August 2, 2017

E-commerce store web scraper architecture

I'm building a web scraper in Ruby that gathers product information from different stores. Right now it works like so:

  1. A Request object (we'll call this request) is created with args store and sku.
  2. When the process method is called on the request, the following happens:

    A. An instance of the scraping/parsing library (Nokogiri) is created, and the HTML page for the item matching the store and sku above is parsed into a new object (we'll call this object nokogiri_response).

    B. An instance of Response (we'll call this response) is created with nokogiri_response injected as a dependency by calling Response.new(nokogiri_response).

    C. The to_h method is called on response, which creates a hash by calling the following methods:

    def title
      # Text content of the store-specific title element
      @nokogiri_response.at('div[@id="some-store-title"]').text.strip
    end

    def price
      # Numeric price parsed out of the store-specific price element
      @nokogiri_response.at('div[@id="bby-price-main"]').text[/[\d.]+/].to_f
    end

    def in_stock
      # True when the add-to-cart element is present on the page
      !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
    end
    
    

    and returns a hash that looks like the following:

    {
      title: 'Design Patterns: Elements of Reusable Object-Oriented Software', 
      price: 37.48, 
      in_stock: true
    }
    
    

    This hash is then returned by the process method that was called on request.
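
Putting those steps together, a stripped-down sketch of the flow looks something like this (the require lines, the use of Net::HTTP for fetching, and the example.com URL format are just placeholders for illustration, not my real fetching code):

    require 'net/http'
    require 'uri'
    require 'nokogiri'

    class Request
      def initialize(store, sku)
        @store = store
        @sku   = sku
      end

      # Fetch the product page, parse it with Nokogiri, and hand the parsed
      # document to a Response that knows how to extract the fields.
      def process
        html = Net::HTTP.get(URI("https://#{@store}.example.com/products/#{@sku}"))
        nokogiri_response = Nokogiri::HTML(html)
        Response.new(nokogiri_response).to_h
      end
    end

    class Response
      def initialize(nokogiri_response)
        @nokogiri_response = nokogiri_response
      end

      # Builds the hash shown above from the individual extraction methods.
      def to_h
        { title: title, price: price, in_stock: in_stock }
      end

      private

      # title, price, and in_stock are defined exactly as in the snippet above.
      def title
        @nokogiri_response.at('div[@id="some-store-title"]').text.strip
      end

      def price
        @nokogiri_response.at('div[@id="bby-price-main"]').text[/[\d.]+/].to_f
      end

      def in_stock
        !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
      end
    end

    # Usage:
    #   Request.new('somestore', '1234567').process
    #   # => { title: "...", price: 37.48, in_stock: true }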

My question is: how can I best design this so that it works for more than 300 stores? Keep in mind that each store will have different CSS selectors for the title, price, and in_stock methods on the response object.
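
To make that concrete, here is an invented example of how the same three fields end up needing different selectors for two different stores (the store names and selectors below are made up for illustration):

    # Invented per-store selector table; none of these stores or selectors
    # are real, they only illustrate how the markup differs between stores.
    STORE_SELECTORS = {
      'store_a' => {
        title:    'div#some-store-title',
        price:    'div#bby-price-main',
        in_stock: 'div#add-to-cart'
      },
      'store_b' => {
        title:    'h1.product-name',
        price:    'span.sale-price',
        in_stock: 'button.add-to-bag'
      }
    }

    # A response object given one of these entries could do, for example,
    #   @nokogiri_response.at(selectors[:price])
    # instead of hard-coding a selector per store.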
