I found out something surprising today: there’s no copyright protection for raw data. Here’s my source:
Now, of course, this is only one source, and like most things involving U.S. law, it’s debatable, but the existence of precedent supporting the claim adds some weight to the argument.
If this is true, it opens up some interesting areas of discussion around Big Data. One of the key tenets of most big data architectures is a focus on capturing “raw data”: data that hasn’t been selected, coordinated or arranged. The idea is that this lets you defer decisions about which data you need and which you don’t. By capturing all the data raw, you have everything available if you later want to extract something unexpected or correlate the data in a new way.
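To make the “capture everything, decide later” idea concrete, here is a minimal Python sketch. The event fields, the sample records and the `extract` helper are all made up for illustration; they are not taken from any particular big data product, and a real system would read from files or a message queue rather than an in-memory list.

```python
import json

# Hypothetical raw events, stored exactly as received:
# nothing has been selected, coordinated or arranged yet.
raw_log = [
    '{"user": "a17", "action": "view", "item": "lamp", "ts": 1}',
    '{"user": "a17", "action": "buy", "item": "lamp", "ts": 2}',
    '{"user": "b42", "action": "view", "item": "rug", "ts": 3}',
]


def extract(raw_lines, fields):
    """Parse raw lines at read time and keep only the requested fields."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# A question decided on later: which users bought something?
purchases = [
    r for r in extract(raw_log, ["user", "action"]) if r["action"] == "buy"
]

# A different question, asked later still, against the same raw log:
# which items were looked at?
viewed_items = sorted(
    {r["item"] for r in extract(raw_log, ["item", "action"]) if r["action"] == "view"}
)
```

Because the log is kept untouched, the second question can be answered without having planned for it when the data was collected.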
However, in light of the legal explanation above, this means that companies running Big Data initiatives are amassing large pools of data to which copyright protection may not apply. This data is currently bought and sold for profit among many different companies and agencies; I wonder if they understand that they don’t own it?
The barrier, of course, is access. While you and I might be able to use, say, Target’s customer database without violating copyright, we would first need (legal) access to it. This seems like a pretty secure barrier, but maybe there’s a loophole. If such a loophole exists, the investment these companies have made in collecting and managing this raw data may not provide the return they expect.