WHAT IS DATA SCRAPING AND HOW CAN YOU USE IT?
Data scraping, also known as web scraping, is the process of importing information from a website into a spreadsheet or local file saved on your computer. It’s one of the most efficient ways to get data from the web, and in some cases to channel that data to another website. Popular uses of data scraping include:
Research for web content/business intelligence
Pricing for travel booking sites/price comparison sites
Finding sales leads/conducting market research by crawling public data sources (e.g. Yell and Twitter)
Sending product data from an e-commerce site to another online vendor (e.g. Google Shopping)
And that list is just scratching the surface. Data scraping has a vast number of applications: it’s useful in just about any case where data needs to be moved from one place to another.
The basics of data scraping are relatively easy to master. Let’s go through how to set up a simple data scraping action using Excel.
Data scraping with dynamic web queries in Microsoft Excel
Setting up a dynamic web query in Microsoft Excel is an easy, versatile data scraping method that enables you to set up a data feed from an external website (or multiple websites) into a spreadsheet.
Watch this excellent tutorial video to learn how to import data from the web to Excel or, if you prefer, use the written instructions below:
Open a new workbook in Excel
Click the cell you want to import data into
Click the ‘Data’ tab
Click ‘Get external data’
Click the ‘From web’ icon
Note the small yellow arrows that appear at the top-left of the web page and alongside certain content
Paste the URL of the web page you want to import data from into the address bar (we recommend choosing a site where data is shown in tables)
Click the yellow arrow next to the data you would like to import
An ‘Import data’ dialogue box appears
Click ‘OK’ (or change the cell selection, if you like)
If you’ve followed these steps, you should now be able to see the data from the website set out in your spreadsheet.
The great thing about dynamic web queries is that they don’t just import data into your spreadsheet as a one-off operation. They feed it in, meaning the spreadsheet is regularly updated with the latest version of the data as it appears on the source website. That’s why we call them dynamic.
To configure how regularly your dynamic web query updates the data it imports, go to ‘Data’, then ‘Properties’, then select a frequency (“Refresh every X minutes”).
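For readers who would rather script this than use Excel, the same idea can be sketched in Python. This is only a minimal illustration under our own assumptions, not part of the Excel workflow above: the sample HTML, the TableParser class and the table_to_csv function are hypothetical names invented for this example, and only Python’s standard library is used. It reads the rows of an HTML table and writes them out as CSV text that a spreadsheet can open.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page; a real script would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

In practice the HTML would come from a live page, and re-running the script on a schedule would play the role of the query’s refresh setting.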
Automated data scraping with tools
Getting to grips with using dynamic web queries in Excel is a useful way to gain a preliminary understanding of data scraping. However, if you intend to use data scraping regularly in your work, you may find a dedicated data scraping tool more effective.
Here are our thoughts on a few of the most popular data scraping tools on the market:
Data Scraper (Chrome plugin)
Data Scraper slots straight into your Chrome browser extensions, allowing you to choose from a range of ready-made data scraping “recipes” to get data from whichever web page is loaded in your browser.
This tool works especially well with popular data scraping sources like Twitter and Wikipedia, as the plugin includes a greater variety of recipe options for such sites.
We tried Data Scraper out by mining a Twitter hashtag, “#jourorequest”, for PR opportunities, using one of the tool’s public recipes. Here’s a flavour of the data we got back:
DataMiner Output example
As you can see, the tool has provided a table with the username of every account which had posted recently on the hashtag, plus their tweet and its URL.
Having the data in this format would be more useful to a PR representative than simply seeing the data in Twitter’s browser view, for a number of reasons:
It could be used to help create a database of press contacts
You could keep referring back to this list and easily find what you’re looking for, whereas Twitter continuously updates
The list is sortable and editable
It gives you ownership of the data, which could otherwise be taken offline or changed at any time
We’re impressed with Data Scraper, even though its public recipes are sometimes slightly rough around the edges. Try installing the free version on Chrome, and have a play around with extracting data. Be sure to watch the introductory video they provide to get an idea of how the tool works and some simple ways to get the data you want.
WebHarvy is a point-and-click data scraper with a free trial version. Its biggest selling point is its flexibility: you can use the tool’s in-built browser to navigate to the data you would like to import, and can then create your own mining specifications to get exactly what you need from the source website.
Import.io is a feature-rich data mining tool suite that does much of the hard work for you. It has some interesting features, including “What’s changed?” reports that can notify you of updates to specified websites, which is ideal for in-depth competitor analysis.
How are marketers using data scraping?
As you will have gathered by this point, data scraping can come in handy just about anywhere information is used. Here are some key examples of how the technology is being used by marketers:
Gathering disparate data
One of the great advantages of data scraping, says Marcin Rosinski, CEO of FeedOptimise, is that it can help you gather different data into one place. “By crawling, we can take unstructured, scattered data from multiple sources, collect it in one place and make it structured,” says Marcin. “If you have multiple websites controlled by different entities, you can combine it all into one feed.
“The range of use cases for this is infinite.”
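To make the idea of combining scattered data into one structured feed more concrete, here is a minimal Python sketch. It is our own illustration rather than a description of how FeedOptimise works: the two source lists, their field names and the normalise and combine functions are hypothetical. Each source describes its products with different field names, and the sketch maps them onto one shared set of fields.

```python
def normalise(record, field_map, source):
    """Rename one source's fields to the shared schema and tag the origin."""
    row = {target: record.get(original, "")
           for original, target in field_map.items()}
    row["source"] = source
    return row


def combine(sources):
    """Merge records from several sources into one list with a common schema.

    `sources` is an iterable of (name, field_map, records) tuples, where
    field_map maps a source's own field names to the shared field names.
    """
    feed = []
    for name, field_map, records in sources:
        feed.extend(normalise(record, field_map, name) for record in records)
    return feed


# Hypothetical scraped data from two sites that label their fields differently.
SITE_A = [{"name": "Widget", "cost": "9.99"}]
SITE_B = [{"title": "Gadget", "price_gbp": "24.50"}]

FEED = combine([
    ("site_a", {"name": "title", "cost": "price"}, SITE_A),
    ("site_b", {"title": "title", "price_gbp": "price"}, SITE_B),
])

if __name__ == "__main__":
    for row in FEED:
        print(row)
```

Tagging every row with its source keeps the combined feed traceable back to the site each record came from.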
FeedOptimise offers a wide variety of data scraping and data feed services, which you can find out about at their website.
The simplest use for data scraping is retrieving data from a single source. If there’s a web page that contains lots of data that could be useful to you, the easiest way to get that information onto your computer in a tidy format will probably be data scraping.
Try finding a list of useful contacts on Twitter, and import the data using data scraping. This will give you a taste of how the process can fit into your everyday work.
Outputting an XML feed to third-party sites
Feeding product data from your site to Google Shopping and other third-party sellers is a key application of data scraping for e-commerce. It allows you to automate the potentially laborious process of updating your product details, which is crucial if your stock changes often.
“Data scraping can output your XML feed for Google Shopping,” says Target Internet’s Marketing Director, Ciaran Rogers. “I have worked with a number of online retailers who were continually adding new SKUs to their site as products came into stock. If your e-commerce solution doesn’t output a suitable XML feed that you can hook up to your Google Merchant Center so you can advertise your best products, that can be an issue. Often your latest products are potentially the best sellers, so you want to get them advertised as soon as they go live. I’ve used data scraping to produce up-to-date listings to feed into Google Merchant Center. It’s a great solution, and actually, there is so much you can do with the data once you have it. Using the feed, you can tag the best converting products on a daily basis so you can share that information with Google AdWords and ensure you bid more competitively on those products. Once you set it up, it’s all quite automated. The flexibility that a feed you control gives you in this way is great, and it can lead to some very definite improvements in those campaigns, which clients love.”
It’s possible to set up a simple data feed into Google Merchant Center for yourself. Here’s how it’s done:
How to set up a data feed to Google Merchant Center
Using one of the techniques or tools described previously, create a file that uses a dynamic web query to import the details of products listed on your site. This file should automatically update at regular intervals.
The details should be set out as specified here.
Upload this file to a password-protected URL
Go to Google Merchant Center and log in (make sure your Merchant Center account is properly set up first)
Go to Products
Click the plus button
Enter your target country and create a feed name
Select the ‘scheduled fetch’ option
Add the URL of your product data file, along with the username and password required to access it
Select the fetch frequency that best matches your product upload schedule
Your product data should now be available in Google Merchant Center. Just make sure you click on the ‘Diagnostics’ tab to check its status and ensure it’s all working smoothly.
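If you would rather generate the product file with a script, the sketch below shows one way it might look in Python. It is an illustration under our own assumptions, not Google’s reference implementation: it builds an RSS 2.0 document that uses the “g” namespace for product fields, and the shop name, URL, product and field names (id, title, price) are hypothetical examples. Check Google’s current product data specification for the fields your feed actually requires.

```python
import xml.etree.ElementTree as ET

# Namespace used for product fields in the feed (prefix "g").
G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)


def build_feed(shop_title, shop_link, products):
    """Return an RSS 2.0 product feed as an XML string.

    `products` is a list of dicts; every key becomes a <g:key> element.
    The keys used here are illustrative, not a complete field list.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = shop_title
    ET.SubElement(channel, "link").text = shop_link
    ET.SubElement(channel, "description").text = "Product feed"
    for product in products:
        item = ET.SubElement(channel, "item")
        for field, value in product.items():
            ET.SubElement(item, f"{{{G_NS}}}{field}").text = str(value)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical product scraped from your own site.
    print(build_feed(
        "Example Shop",
        "https://www.example.com",
        [{"id": "SKU-1", "title": "Widget", "price": "9.99 GBP"}],
    ))
```

Regenerating this file whenever your stock changes, and uploading it to the URL that the scheduled fetch points at, keeps the feed in step with your site.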
The dark side of data scraping
There are many positive uses for data scraping, but it does get abused by a small minority too.
The most prevalent misuse of data scraping is email harvesting: the scraping of data from websites, social media and directories to uncover people’s email addresses, which are then sold on to spammers or scammers. In some jurisdictions, using automated means like data scraping to harvest email addresses with commercial intent is illegal, and it is almost universally considered bad marketing practice.
Many web users have adopted techniques to help reduce the risk of email harvesters getting hold of their email address, including:
Address munging: changing the format of your email address when posting it publicly, e.g. typing ‘patrick[at]gmail.com’ instead of ‘patrick@gmail.com’. This is an easy but slightly unreliable approach to protecting your email address on social media. Some harvesters will search for various munged combinations as well as emails in a normal format, so it’s not entirely airtight.
Contact forms: using a contact form instead of posting your email address(es) on your website.
Images: if your email address is presented in image form on your website, it will be beyond the technological reach of most people involved in email harvesting.
The future of data scraping
Whether or not you intend to use data scraping in your work, it’s advisable to educate yourself on the subject, as it’s likely to become even more important in the next few years.
There are now data scraping AIs on the market that can use machine learning to keep getting better at recognising inputs which only humans have traditionally been able to interpret, like images.
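To see why address munging is described above as not entirely airtight, consider this short Python sketch. It is purely illustrative: the munge and unmunge functions and the pattern are our own hypothetical examples, not taken from any real tool. It shows that a common munged form can be turned back into a normal address with a single pattern.

```python
import re


def munge(address):
    """Replace the @ sign with '[at]' before posting an address publicly."""
    return address.replace("@", "[at]")


# Matches '[at]' or '(at)', with optional spaces inside and around it.
MUNGED_AT = re.compile(r"\s*[\[\(]\s*at\s*[\]\)]\s*", re.IGNORECASE)


def unmunge(text):
    """Undo the common '[at]' and '(at)' substitutions."""
    return MUNGED_AT.sub("@", text)


if __name__ == "__main__":
    hidden = munge("patrick@gmail.com")
    print(hidden)
    print(unmunge(hidden))
```

Because the substitution is so easy to reverse, munging is best treated as a small obstacle rather than real protection, which is why contact forms are often the safer choice.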
Big improvements in data scraping from images and videos will have far-reaching consequences for digital marketers. As image scraping becomes more in-depth, we’ll be able to know far more about online images before we’ve seen them ourselves – and this, like text-based data scraping, will help us do lots of things better.
Then there’s the biggest data scraper of all – Google. The whole experience of web search is going to be transformed when Google can accurately infer as much from an image as it could from a page of copy – and that goes double from a digital marketing perspective.
If you’re in any doubt over whether this can happen in the future, try out Google’s image analysis API, Cloud Vision, and let us know what you think.