A Little Class on the Internet: Part I: Contagion of Scraping

Monday, September 23, 2013

Part I: Contagion of Scraping – Has It Gone Too Far?

According to Internet World Statistics, the number of Internet users grew by more than 550%, from 360 million in 2000 to 2.4 billion in 2012. These statistics may not come as a total surprise, but the explosion in the number of people who access the Internet via mobile devices has caused the number of users to double over the past five years. This proliferation of access has driven exponential growth in the amount of content online, creating a repository of "publicly available" information. The repository includes news, business, and financial information. It also includes personal data in the form of movie, restaurant, and hotel reviews (e.g., Yelp, Angie’s List, and TripAdvisor). The same technological explosion, however, has made it easy to extract this data for commercial use and sale, for free and without authorization. This data extraction, commonly referred to as "scraping," raises legal issues and concerns for both sides: those who want to scrape (and love it) and those who want to protect against it (and hate it).

Scraping data is extremely common in today’s Web environment. Start-ups love scraping because it is a cheap and powerful way to gather data without the need for partnerships. For example, Airbnb built its business by scraping data from Craigslist and posting it to its own site, which led to a formal “cease and desist” letter. In the travel industry, roughly 30% of website traffic can be traced to web-scraping bots, according to a study by Distil Networks. Big corporations also use web scrapers to collect data for their own benefit, yet they don’t want others to turn bots against them. The “contagion” of scraping has also spread to the investment community. Recently, a former colleague employed by a hedge fund admitted that he routinely hires Ph.D.s to scrape company websites. This level of diligence lets him refine investment theses for companies that derive a meaningful share of their sales from online traffic. By scraping a given company’s website daily, he can observe the number of visitors, among other things. He then adjusts his financial forecasts to predict sales more accurately over a given quarter or year.
