Big data is here, and the question is whether we will consume it or it will consume us. Likely, the answer will be a bit of both. The dystopian view of our big data future falls somewhere among those imagined in George Orwell’s 1984 and the films Gattaca (1997) and Minority Report (2002). The Star Trek series and movies offer a more utopian view of our future, at least in some respects: one where medicine is bloodless, fast, and effective, and no one seems to want for anything, yet everyone remains motivated.
We’ve all heard of big data – the term is everywhere today. And we all have a general sense that it refers to huge databases of information. After that, most of us get a little murky in our thinking. And with good reason, because beyond that, things do get much less clear.
The Good, the Bad and the Silly
While not as entertaining as a spaghetti western, big data merits serious consideration and will likely have far bigger consequences for our lives and our culture.
On the one hand, amazing things can be culled from large data sets with thoughtful and sophisticated analysis. For example, as early as August 4, 2014, the Network Dynamics and Simulation Science Laboratory at Virginia Tech was not only tracking the Ebola outbreak in West Africa but also predicting a significant number of cases in the absence of substantial mitigation.
The results of big data analysis aren’t always perfect. A case in point is the rather well-known Google Flu Trends. In March 2014, the New York Times reported on a study published in Science that found that Google Flu Trends consistently overestimated the number of flu cases.
Much in the news of late is Open Payments, the portal that the US Centers for Medicare and Medicaid Services released to the public to much fanfare and much frustration. Charles Ornstein of ProPublica recently took the new website for a spin and gave it rather low marks. His concerns ranged from poor website design and function to problems with the quality of the underlying data.
Then there are the somewhat less serious uses of big data, like the analysis of real estate data showing that Boulder, Colorado has more toilets than people. Boulder has 102 residential toilets for every 100 people, making it the only city to exceed a 1:1 ratio. Conversely, Miami sits at the bottom of the list, with only 62 toilets per 100 people. Perhaps this will discourage folks who are thinking about retiring in Miami.
Closer to home for research administrators is the announcement earlier this week from the NIH of awards totaling $32 million through the BD2K (Big Data to Knowledge) initiative, which is expected to include $650 million over the next seven years. The purpose of the BD2K initiative is to make large biomedical data sets accessible to researchers around the world.
The Future of Privacy
Huge amounts of electronic information exist about each of us in databases associated with our credit cards, stores, and online activities. A previously unknown NSA contractor, Edward Snowden, was catapulted into public awareness in 2013 when he revealed classified documents to a group of reporters, leading to the exposure of several secret NSA programs that, taken together, demonstrate the extensive scope of our government’s data collection efforts. The ensuing conversation has stoked Big Brother-esque fears and done little to reassure the public.
EPIC, the Electronic Privacy Information Center, “is an independent non-profit research center… [it] works to protect privacy, freedom of expression, democratic values, and to promote the Public Voice in decisions concerning the future of the Internet.” As described on EPIC’s website, in analyzing big data, “[w]hat counts is the quantity of the data, rather than its quality. It looks for the correlation rather than the causation, the what rather than the why.” In the world of big data, anonymity is swiftly becoming a quaint concept. Even when each piece of data is anonymous on its own, it is often possible to re-identify individuals once those data are linked. This is particularly problematic today, as previously distinct data sets are increasingly being combined and analyzed to identify new findings.
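For readers curious how re-identification can work in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records are invented, the names are fictional, and the choice of ZIP code, birth year, and sex as the linking fields is hypothetical rather than drawn from EPIC or any real data set. It simply shows how an "anonymous" file can be matched against a named one when the two share a few ordinary attributes.

```python
# All records below are made up for illustration; no real data is used.
# An "anonymous" data set: no names, but a few ordinary attributes remain.
health_records = [
    {"zip": "80301", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "80301", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "33101", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# A separate, named data set (hypothetical, e.g. a public directory).
public_records = [
    {"name": "Alice Example", "zip": "80301", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "80301", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "33101", "birth_year": 1990, "sex": "F"},
]

# Fields assumed (for this sketch) to appear in both data sets.
LINKING_FIELDS = ("zip", "birth_year", "sex")


def linking_key(record):
    """Build a tuple of the shared attributes used to match records."""
    return tuple(record[field] for field in LINKING_FIELDS)


def link(anonymous, named):
    """Return (name, diagnosis) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in named:
        names_by_key.setdefault(linking_key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(linking_key(record), [])
        # Only a unique match re-identifies someone with confidence.
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, public_records)
for name, diagnosis in reidentified:
    print(name, "->", diagnosis)
```

In this toy example two of the three "anonymous" records are matched to a name, while the third finds no match because no named person shares all three attributes. The more data sets that are combined, the more such attributes become available for matching.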
Are we consuming big data, or is it consuming us? The answer today is a bit of both.