Scraping is a technique whereby a website or tool extracts select information from another website. For example, when you type a difficult word into Google and instantly see a definition for that word from some Web dictionary, that's because Google "scraped" that information from the dictionary. The term (and practice) has some negative connotations, because scrapers can be used to steal blog content and for other nefarious purposes. But as Google (and many others) show, it doesn't have to be used for evil.
Scraper is a Chrome add-on that somewhat simplifies the process of scraping information from a webpage. I say "somewhat" because you still have to know XPath or jQuery selector syntax, but that's pretty easy to master if you know a bit of HTML.
Once the extension is installed, you can right-click any page element and select "Scrape similar". You then get a window with a single table row, and the XPath expression leading to the exact element you've selected. It is now up to you to generalize that expression so as to get a broader match. As you can see in the screenshot, it was quite simple to write an expression that yields all of the post headlines and links on the DLS homepage.
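To make the "generalize the expression" step concrete, here is a minimal sketch in Python rather than in Scraper itself. The sample markup, the `post` class name, and the `scrape` helper are all made up for illustration; a real page will need its own expression. It uses the limited XPath subset in Python's standard library, which is enough to show the difference between an expression pinned to one element and one that matches every similar element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not the real DLS homepage).
SAMPLE_PAGE = """
<html>
  <body>
    <div class="header"><h1>Example blog</h1></div>
    <div class="post"><h2><a href="/posts/1">First post</a></h2></div>
    <div class="post"><h2><a href="/posts/2">Second post</a></h2></div>
    <div class="post"><h2><a href="/posts/3">Third post</a></h2></div>
  </body>
</html>
""".strip()

# Too specific: the positional index pins the match to the second <div>,
# much like the expression a tool fills in for the one element you clicked.
SPECIFIC = "./body/div[2]/h2/a"

# Generalized: match on the shared class attribute instead of the position.
GENERAL = "./body/div[@class='post']/h2/a"


def scrape(root, xpath):
    """Return (headline, link) pairs for every <a> element the XPath matches."""
    return [(a.text, a.get("href")) for a in root.findall(xpath)]


root = ET.fromstring(SAMPLE_PAGE)
one_row = scrape(root, SPECIFIC)
all_rows = scrape(root, GENERAL)

print(len(one_row), "row from the specific expression")
print(len(all_rows), "rows from the generalized expression")
for title, link in all_rows:
    print(title, link)
```

Swapping the positional `div[2]` for the attribute test `div[@class='post']` is the whole trick: the first describes where one element happens to sit, the second describes what all the elements you want have in common.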
Once you have the information you wanted, you can click Export to Google Docs and get a spreadsheet with all of your data in tabular form, ready for further processing.
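If you would rather keep the data local, the same tabular result can be approximated with a CSV file, which any spreadsheet program will open. This is only a sketch under that assumption: it does not talk to Google Docs, and the `rows_to_csv` helper, the column names, and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, header=("title", "link")):
    """Serialize (title, link) pairs as CSV text, one scraped item per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped rows; the comma in the second title gets quoted.
rows = [
    ("First post", "/posts/1"),
    ("Second post, with a comma", "/posts/2"),
]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The `csv` module handles quoting for you, so headlines containing commas or quotes still land in a single cell.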
Scraper is very much a work in progress. jQuery selector syntax is supported, but only the XPath expression gets filled in automatically, and that expression is too specific to be useful without editing. It also does not support saving expressions as templates, so every time you want to use it, you have to rebuild the XPath expression you need. I'm sure this will all get fixed in due time.
Scraper for Chrome lets you easily create spreadsheets from Web data originally appeared on Download Squad on Mon, 22 Nov 2010 14:30:00 EST.