The promise and pitfalls of big data in healthcare

An editorial in The Atlantic examines the challenges of creating a so-called unified health database to “collect in one searchable repository all of the parameters that measure or could conceivably reflect human well-being”. A resource on this scale would be highly valuable to researchers.

The article makes only passing mention of the privacy risks inherent in a massive compilation of potentially identifiable data, drawn from sources ranging from mobile health apps to electronic medical records. Numerous studies have shown how easily specific individuals can be re-identified when anonymized data are combined with publicly available sources such as social media and search engine results. In 2000, a researcher at Carnegie Mellon University estimated that 87 percent of the US population was likely to be uniquely identifiable from just three data points: five-digit ZIP code, date of birth and gender. These are pieces of information most people would not consider private.
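To make the re-identification mechanism concrete, here is a minimal Python sketch of a linkage attack on toy data. It assumes a hypothetical "anonymized" health dataset that has had names removed but still carries ZIP code, date of birth and gender, and a hypothetical public dataset (such as a voter roll) that lists names alongside the same three fields. All records, names and function names below are invented for illustration; this is not the Carnegie Mellon researcher's method, only a sketch of the general idea that a rare combination of ordinary attributes acts like a name.

```python
from collections import Counter

# Fields that are not secret on their own but identify people in combination.
QUASI_IDENTIFIERS = ("zip", "dob", "gender")

# Hypothetical "anonymized" health records: names removed, quasi-identifiers kept.
records = [
    {"zip": "15213", "dob": "1975-03-02", "gender": "F", "diagnosis": "asthma"},
    {"zip": "15213", "dob": "1975-03-02", "gender": "F", "diagnosis": "diabetes"},
    {"zip": "15213", "dob": "1981-07-19", "gender": "M", "diagnosis": "hypertension"},
    {"zip": "15217", "dob": "1962-11-30", "gender": "F", "diagnosis": "depression"},
]

# Hypothetical public data pairing names with the same three fields.
public = [
    {"name": "Bob Example", "zip": "15213", "dob": "1981-07-19", "gender": "M"},
    {"name": "Carol Example", "zip": "15217", "dob": "1962-11-30", "gender": "F"},
    {"name": "Dana Example", "zip": "15213", "dob": "1975-03-02", "gender": "F"},
]


def key(row):
    """Project a record onto its quasi-identifier fields."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(rows):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] == 1) / len(rows)


def link(anonymized, public_rows):
    """Join the two datasets on quasi-identifiers.

    A match is treated as a confident re-identification only when the
    combination is unique in both datasets.
    """
    anon_counts = Counter(key(r) for r in anonymized)
    pub_counts = Counter(key(p) for p in public_rows)
    name_by_key = {key(p): p["name"] for p in public_rows}
    matches = []
    for r in anonymized:
        k = key(r)
        if anon_counts[k] == 1 and pub_counts[k] == 1:
            matches.append((name_by_key[k], r["diagnosis"]))
    return matches


print(unique_fraction(records))  # 0.5
print(link(records, public))
```

In this toy example, half of the "anonymized" rows are unique on the three fields, and both unique rows link to a named person along with a diagnosis. The two records sharing a combination are not linked, which is the intuition behind k-anonymity: generalizing fields (for example, keeping only the birth year) until every combination is shared by several people.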

In the US, individually identifiable health information held by health care providers, health plans and health care clearinghouses is protected under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Except in specific situations, the rule generally prohibits these entities from releasing health information that includes “any…unique identifying number, characteristic, or code” that could be used to identify an individual. It does not prohibit the release of de-identified data, or of limited data sets from which direct identifiers have been removed, nor does it govern the further use or re-disclosure of data that have been released in accordance with the Privacy Rule. Even allowing for the trend toward greater self-disclosure that has accompanied the rise of social networking, this is a significant gap that would have to be closed before such a project could move forward.

