I'm building a web scraper in Ruby that gathers product information from different stores. Right now it works like so:
- A `Request` object (we'll call this `request`) is created with the arguments `store` and `sku`.
- When the `process` method is called on `request`, the following happens:
  - A. A Nokogiri (scraping/parsing library) document is created from the HTML page for the item identified by `store` and `sku` above. We'll call this object `nokogiri_response`.
  - B. An instance of `Response` (we'll call this `response`) is created with `nokogiri_response` injected as a dependency, by calling `Response.new(nokogiri_response)`.
  - C. The `to_h` method is called on `response`. It builds a hash by calling the following methods:

    ```ruby
    def title
      # `at` returns the first matching node (or nil); take its text
      @nokogiri_response.at('div[@id="some-store-title"]')&.text&.strip
    end

    def price
      # Drop currency symbols and other non-numeric characters, e.g. "$37.48" -> 37.48
      @nokogiri_response.at('div[@id="bby-price-main"]')&.text&.delete('^0-9.')&.to_f
    end

    def in_stock
      # Treat the presence of the add-to-cart element as "in stock"
      !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
    end
    ```

    and returns a hash that looks like this:

    ```ruby
    { title: 'Design Patterns: Elements of Reusable Object-Oriented Software', price: 37.48, in_stock: true }
    ```

    This hash is then returned by `request.process`.
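The flow above can be sketched as self-contained Ruby. This is a minimal sketch under a few assumptions not taken from the original: fetching and parsing the page is injected into `Request` as a callable named `page_loader` (a hypothetical name) so the example runs offline, and `Response` relies only on the injected object responding to `at(selector)` and returning either `nil` or a node with `text`, which a Nokogiri document does.

```ruby
# Minimal sketch of the Request -> Response -> Hash flow.
# Assumption: `page_loader` (hypothetical) fetches + parses the page and
# returns a Nokogiri-like document that responds to #at(selector).

class Response
  def initialize(nokogiri_response)
    @nokogiri_response = nokogiri_response
  end

  def title
    @nokogiri_response.at('div[@id="some-store-title"]')&.text&.strip
  end

  def price
    # Keep only digits and the decimal point, e.g. "$37.48" -> 37.48
    node = @nokogiri_response.at('div[@id="bby-price-main"]')
    node && node.text.delete('^0-9.').to_f
  end

  def in_stock
    !@nokogiri_response.at('div[@id="add-to-cart"]').nil?
  end

  def to_h
    { title: title, price: price, in_stock: in_stock }
  end
end

class Request
  attr_reader :store, :sku

  def initialize(store, sku, page_loader:)
    @store = store
    @sku = sku
    @page_loader = page_loader # hypothetical injected loader
  end

  def process
    nokogiri_response = @page_loader.call(store, sku) # step A: get the parsed page
    response = Response.new(nokogiri_response)        # step B: inject it into Response
    response.to_h                                      # step C: build the hash
  end
end
```

In the real scraper, `page_loader` would download the product page and return `Nokogiri::HTML(html)`; in tests it can return a stub document instead, which is one practical benefit of the dependency injection already in this design.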
My question: how can I best design this so it works for more than 300 stores, given that each store uses different CSS selectors for the `title`, `price`, and `in_stock` methods on the `response` object?
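One possible direction (a sketch, not a definitive design) is to treat the selectors as data: keep one `{ title:, price:, in_stock: }` entry per store and have a single generic `Response` look them up, rather than writing hundreds of store-specific classes. `SELECTORS`, `response_for`, the store keys, and the second store's selectors below are hypothetical placeholders; in practice the table could be loaded from a YAML or JSON file with one entry per store.

```ruby
# Sketch: per-store selectors as data instead of hard-coded methods.
# Store keys and selectors are hypothetical placeholders.
SELECTORS = {
  'store_a' => { title: 'div[@id="some-store-title"]', price: 'div[@id="bby-price-main"]',
                 in_stock: 'div[@id="add-to-cart"]' },
  'store_b' => { title: 'h1.product-name', price: 'span.price', in_stock: 'button.buy-now' }
}.freeze

class Response
  def initialize(document, selectors)
    @document = document   # anything responding to #at(selector), e.g. a Nokogiri document
    @selectors = selectors # { title:, price:, in_stock: } for one store
  end

  def title
    node(:title)&.text&.strip
  end

  def price
    # Keep only digits and the decimal point, e.g. "$37.48" -> 37.48
    text = node(:price)&.text
    text && text.delete('^0-9.').to_f
  end

  def in_stock
    !node(:in_stock).nil?
  end

  def to_h
    { title: title, price: price, in_stock: in_stock }
  end

  private

  # Look up this store's selector for the field and query the document with it
  def node(field)
    @document.at(@selectors.fetch(field))
  end
end

# Pick the right selector set for a store; fail loudly for unknown stores
def response_for(store, document)
  selectors = SELECTORS.fetch(store) { raise ArgumentError, "no selectors for store #{store.inspect}" }
  Response.new(document, selectors)
end
```

A store whose page needs more than a single selector (for example, a price split across several elements) could still get a small `Response` subclass that overrides only that method, so most stores remain pure configuration.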