Tuesday, April 24, 2012

As unstructured data heats up, will you need a license to webcrawl?



http://ow.ly/atQXd

An article by Stacey Higginbotham posted on the Gigaom website.

The article examines how the rush to collect unstructured data is affecting the internet, and the author raises several areas of concern.

The article states, "Cheap computing and the ability to store a lot of data at a low cost have made the concept of big data a big business. But amid the frenzy to gather that data, especially unstructured information scraped from or accessed via crawling web sites, companies might be pushing the boundaries of polite (or ethical) behavior. They may also be stealing valuable IP. So is it stoppable and could the current solutions lead to the demise of the open web?"

The article looks at the growing practice of companies "scraping" websites to build databases that track information. The article states, "But pinging a web site to grab its information exacts a toll on the site, and an overzealous crawler or hundreds of sites gathering data at any one time could create problems for the crawled site. A bunch of rapid web crawls can look similar to a denial of service attack, simply because the site has to respond to so many requests."
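
To make the load concern concrete, here is a minimal sketch of the kind of "polite" crawling the article alludes to. It is written in Python as an assumption, since the post itself contains no code. The class PoliteScheduler, the helper allowed_by_robots, and the two-second default delay are hypothetical choices for illustration. Only urllib.robotparser and urllib.parse are real standard-library modules. The sketch checks a site's robots.txt rules and spaces out requests to the same host so that a crawler does not hit a server in rapid bursts.

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class PoliteScheduler:
    """Hypothetical helper: spaces requests to the same host by min_delay seconds."""

    def __init__(self, min_delay=2.0, clock=time.monotonic):
        self.min_delay = min_delay  # assumed default; real sites may ask for more
        self.clock = clock          # injectable clock, which makes testing easy
        self._last_request = {}     # host -> time of the most recent request

    def wait_time(self, url):
        """Return how many seconds to wait before requesting this URL's host again."""
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        if last is None:
            return 0.0  # never contacted this host, so no wait is needed
        return max(0.0, self.min_delay - (self.clock() - last))

    def record(self, url):
        """Remember that a request to this URL's host was just made."""
        self._last_request[urlparse(url).netloc] = self.clock()


def allowed_by_robots(robots_txt, user_agent, url):
    """Hypothetical helper: check a robots.txt body (already fetched) for permission."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In a real crawl loop, one would call wait_time(), sleep for that long, fetch the page (the fetching itself is left out here), and then call record(). Keeping the delay per host, rather than global, lets a crawler visit many different sites while never flooding any single one.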

The article discusses a recent trend of service providers blocking others from scraping their data, both to keep control of it and possibly to charge the entities that want access to certain data. The article states:

"Most businesses recognize that their value is in owning the end user and the end users’ data, whether or not the end user herself recognizes that. Twitter’s massive valuation isn’t based on its platform, it’s based on the information it has about the tweets hitting its servers every second.

"Even a company like Yelp, which went public based on the content provided by users — content that it vigorously defended from Google’s indexing — is taking advantage of user-generated data to enrich itself. Protecting that asset from becoming scraped, commoditized and turned into revenue for others seems like a no-brainer."
