
How do you define “health data”? To borrow a phrase from Daniel Solove, it is a concept in disarray and in need of a taxonomy.
Here are the items that fall naturally into the health data basket:
- electronic health record data
- current or past health and disability status, including mental and physical well-being
- medication lists
- genetic data, either collected by direct-to-consumer (DTC) testing or in a clinical setting
- data generated by devices, either medical (such as pacemakers and blood pressure monitors) or consumer (such as fitness trackers and home safety devices)
But how about:
- data generated by loyalty cards (supermarket, pharmacy, etc.)*
- data generated by the use of social media, search engines, mobile apps, etc.
- data related to where someone lives
- death certificate data
What would you add? Where else do you see data related to health and wellbeing being generated, created, hoarded, or shared? Comments are open.
Update on 2/4/19: Check out this “Tapestry of Potentially High-Value Information Sources That May be Linked to an Individual for Use in Health Care” from “Finding the Missing Link for Big Biomedical Data” by Weber, Mandl, Kohane (PDF). Thanks to Paul Wicks for sharing the article!

* Hat tip to Angela Chen and her article on what types of data are not covered by existing laws.
Featured image: “Sorting” by mkreul on Flickr.
Credit history, credit card data = financial security
Gym data (visits/wk)
Police record, if such exists
Thanks, John! When your comment came in I mistook it for spam since it mentions “credit history” and “police record” but then I saw it was you. Great stuff here.
1) Home energy usage data (a power company in Japan is now detecting when an older person living alone has changed their behaviour or fallen over by monitoring changes from baseline in energy usage)
2) Transport data (recorded when people swipe in/out of mass transit)
3) Data from where you work (whether from name badges that track where you go and who you speak with, or from records of how many hours you've been inside the building)
4) Surveillance cameras/CCTV (surely with the newer generation of AI cameras with facial recognition, a city could one day map out the walking routes of every inhabitant, and thus how far each of us has walked today)
In my opinion, all data is health data of some kind, which reflects the need for a Health in All Policies approach.
Incredible — thank you! This is why I love to blog & share, to get perspectives like yours. Of COURSE a power company in Japan is tracking older adults’ behavior and of COURSE that is health data (but I never would’ve thought of it).
Update: I found an article describing the Japanese utility companies’ approach.
“How Power Companies Can Save an Aging Japan: By monitoring utility use, businesses can play a role in reversing the tragic trend of ‘lonely deaths’,” by Nathaniel Bullard and Miho Kurosaki (Bloomberg, April 6, 2018)
A recent article in Science (paywalled) uses the term “shadow health records,” which I thought was quite fascinating. From the abstract: “But amid a flood of new forms of health data, some third parties have figured out ways to avoid some data privacy laws, developing what we call ‘shadow health records’—collections of health data outside the health system that provide detailed pictures of individual health—that allow both innovative research and commercial targeting despite data privacy rules. Now that space for regulatory arbitrage is changing. The long arms of Europe’s new General Data Protection Regulation (GDPR) and California’s new Consumer Privacy Act (CCPA) will reach shadow health records in many companies.”
Link: http://science.sciencemag.org/content/363/6426/448
Complete dental history. Would include whether person schedules regular prophylactic visits, as well as a history of fillings, extractions, surgeries, periodontal disease, crowns/bridges/implants, tooth erosion, oral infections, lesions, orthodonture, endodonture, dry mouth, TMJ, and more.
I find it appalling that my dentist has my complete health history (and her hygienist always asks for updates at every visit), but none of my medical providers (including primary-care doc and surgical teams from 2 joint replacements) has ever asked a single question about my dental/oral history.
Wow. The truth of this — and the missed opportunity for health tracking — is neon bright. Thanks, Peg!
Update from Twitter, where @ManeeshJuneja shared this relevant article:
Linking medical and dental health record data: a partnership with the Rochester Epidemiology Project
From the abstract: “The purpose of this project was to expand the Rochester Epidemiology Project (REP) medical records linkage infrastructure to include data from oral healthcare providers. The goal of this linkage is to facilitate research studies examining the role of oral health in overall health and quality of life.”
It’s open access so we can all read and learn from it. Thanks, Maneesh!
Hi Peg, mine does too. My dentist trained at the Pankey Institute in Florida, and I’ve stuck with this office for 30+ years, with a whole generation of them coming and going.
I’m on an overseas speaking trip and I want to canvass people about whether what you say is the same in their countries.
People need the right to review their “health reports” the same way they can review their credit reports. This applies to both medical and commercial health dossiers. EMRs have errors. Patients are a valuable source of information to improve EMR accuracy and reliability. At the same time, an anonymous “this person is a high health risk” can be as financially damaging as “this person is a high credit risk”. People need to be able to see all of the personal health profiles being assembled on them and need to be able to challenge and correct those the same way they correct errors in credit reports.
Well said. Thank you. What elements would you like to see in a “health credit report”? What should NOT be included?
Great blog and thread, Susannah, as it’s hard to provide solutions when the elements are not fully defined. To affirm what everyone has said, the breadth of “health data” encompasses anything that directly or indirectly involves an individual, so it absolutely includes financial history, buying patterns, energy use, etc. as objective measures, but also subjective elements derived from patient-generated data. Not just regarding specific health issues but also environmental, social, political, etc. And what I think is most important is not necessarily what the data points or responses are, but how these data points and responses change from that individual’s “norm”.
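Howard’s point about watching for changes from an individual’s own “norm” (the same logic as the Japanese power company mentioned above, which watches for departures from a household’s baseline energy use) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any utility or insurer actually does it: the function name `flag_departures`, the seven-day window, the two-standard-deviation threshold, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def flag_departures(readings, window=7, k=2.0):
    """Return indices of readings that depart from the person's own recent baseline.

    Hypothetical rule: a reading is flagged when it lies more than k sample
    standard deviations from the mean of the previous `window` readings.
    """
    if window < 2:
        # statistics.stdev needs at least two values to compute a spread.
        raise ValueError("window must be at least 2")
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]  # this individual's recent "norm"
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any change at all counts as a departure.
            if readings[i] != mu:
                flagged.append(i)
        elif abs(readings[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


# Hypothetical daily kWh readings for one household; the last day drops sharply.
daily_kwh = [8.1, 7.9, 8.3, 8.0, 8.2, 7.8, 8.1, 8.0, 2.5]
print(flag_departures(daily_kwh))
```

Running it prints `[8]`: only the final day, when usage drops to 2.5 kWh, is flagged. The comparison is against each person’s own recent history rather than a population average, so the same reading could be unremarkable in one home and a warning sign in another.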
Howard, you hit the nail on the head with “it’s hard to provide solutions when the elements are not fully defined.” That goes for everyone: entrepreneurs, policymakers, consumer protection advocates, clinicians, and the list goes on.
Another response along similar lines: a tweet from Translation Health included this NPR story from July 2018: Health Insurers Are Vacuuming Up Details About You — And It Could Raise Your Rates
My concern, among many, is that people will read these stories and clench up with fear. They’ll be less likely to be open to the possibility of sharing data, of saying “yes” to something that may help them. Our failure to define and have conversations about what is “health data” — including what should be protected under law — could have a chilling effect on innovation.
If “EHR data” are sourced from the clinical visit, we could also consider the downstream medical and Rx claims data from health insurers (private and public). While HIPAA enables access, practical issues limit widespread safe transmission of these data. Hats off to the Blue Button 2.0 team for enabling consumer-directed transmission of claims data for Medicare Parts A, B, and D.
Hats off indeed! Blue Button 2.0 will have far-reaching effects. Thanks for highlighting this one.
Thanks, all! Michelle Shevin responded to the post on Medium with this comment:
Purchasing behavior (for example, health insurers want to know who is purchasing plus size clothing).
Search history (our searches often reveal our health concerns).
Activity data (location, movement).
Phone usage and typing speed (the ways we use our phones are an indicator of our mental health).
…and much more, given that the value in any data is how it correlates to other (orthogonal) data. All personal data is health-relevant data.
More coming from the National Academies this Spring!
I’d probably add Internet usage and mobile usage: how often, from where, and what you search.
I remember learning from my AIDS.gov colleagues that traffic to their site spiked around 3am all across the country, like a wave at a stadium, as people searched online for answers to questions like “what to do when the condom breaks.” And what you search for on your mobile is often more intimate than what you search for on a bigger screen.
So, yes, what you search for online can be considered health data. The question is: Who gets to look? Etc.
Since comments are being posted on Medium and LinkedIn, too, I’ll use this opportunity to share two more:
Rashid K wrote, “should we include educational history (including test scores), driving history data, as well… perhaps they might prove their relevancy in gaining a holistic view of current state of patients’ health as well as some indication of its trajectory as to where it is going — probably ppl who are better educated with good reflex action (read clean driving history) would be having a good health… :)”
Paul Tarini wrote, “For a primer on the depth of data the likes of Facebook, Google, Amazon collect, see this entertaining series of articles by Kashmir Hill in Gizmodo about when she tried to block from her life each service for a week. https://gizmodo.com/i-tried-to-block-amazon-from-my-life-it-was-impossible-1830565336 ”
Fascinating!
In the spirit of “don’t let great stuff disappear into the ether”:
Paul Wicks shared an article on Twitter with the comment: “Misses links between these and relative value (plus trials are hidden as pink box) but pretty nice conceptual overview.” See: “Finding the Missing Link for Big Biomedical Data” by Weber, Mandl, Kohane (PDF)
Keep those ideas coming!
Data generated by patients themselves (PROs, provider ratings, peer health advice, etc.)!
Technology makes it easier than ever to record/share your health experiences, and social networks allow you to discover useful info from communities/friends.
Data about my caregivers:
– their unique views of my health
– our relationship, including how much I can depend on them and whether I feel like a burden to them
I wholly agree with the second batch of items! I’d echo the same and more (diet and relationship data) from an essay I wrote about widening the scope of medical data collection. https://www.philips.com/a-w/about/news/archive/future-health-index/articles/20180226-narrow-scope-data-collection.html
Great article, Sophie! Thanks for sharing it. Love this:
Medical forms also omit several other important life data questions, such as:
How strong are a patient’s relationships with their family and friends?
What does their current diet look like? Are they a vegan/vegetarian/pescetarian?
How long have they been married or single? Does their marital status contribute to any of their health problems?
I clicked through on the Human Project and indeed it’s a trove of ideas and examples of health data collection. Here’s one key element: informed consent.
A quote from the New York Times article about the launch:
But unlike cellphone and social media companies, Mr. Glimcher said, the Human Project wants to be sure everyone knows exactly what he or she is signing up for.
“The way the industry has gone has been an embarrassment, if not a crime,” Mr. Glimcher said.
For instance, he said, many of the people he and his team have spoken to have no idea that they are being geo-tracked all the time, or that when they go to a hospital they often sign forms that allow their health information to be sold to big health care companies.
“Our subjects have to understand what they are getting into,” he said.
Thank you! I hope we can have holistic medical forms shortly. Informed consent is critical. Henrietta Lacks (HeLa cells) came to mind when I read your final sentences. That’s why I’m grateful for conversations like these!
In your proposed taxonomy I trust there will be clear splits between categories like…
– bodily readings (everything from the familiar weight and BP to data from swabs)
– activity readings that clearly correlate with health one way or another (step count etc)
– the many non-biological things others have listed here that OTHER PEOPLE THINK might correlate with our health, now or in the future. This is where severe and harmful mischief can be done by players (insurers, employers) who sometimes truly don’t care about individual harm as long as their “pool” of risks pans out FOR THEM.
– genomics – things wired into you that you can’t (today) do anything about but which certainly affect your health future AND which can be used against you by those risk players
– behaviors that are health related (tobacco), and others that apparently aren’t but might be misconstrued by risk players
Etc.
This raises the question: in creating a taxonomy, where do we cleave? Are there principles to follow or is it arbitrary? I can easily see it getting too complicated for ordinary people to find useful. Would we care? I would, on general principle.
Thanks, Dave! As is often the case, I’m using my blog to try out an idea and see if it resonates with people. Looks like this one does!
I was inspired by Daniel Solove’s taxonomy of “privacy” (I linked to his original article and also recommend his book, Understanding Privacy.) He is a law professor, so he approaches the taxonomy of a concept from a legal standpoint (while pointing out considerations for ethics, policymaking, etc.)
When thinking about a taxonomy of “health data” I see it from multiple points of view: entrepreneur, policymaker, open data advocate, patient rights advocate, public opinion pollster, advisor to health systems… And the comments so far reflect a diverse set of viewpoints.
Matthew Holt famously rails against attempts to define “mobile health” and “digital health” since it really is all just health (with a gloss of tech on top). And that’s what I’m seeing in “health data” as well. If everything is health data, then what hope do policymakers have when it comes to “freeing it” or putting bounds around it, to protect, promote, or prevent its exchange?
I don’t hope to solve the issue here but I do hope to raise questions and get people to notice when they use the term too broadly.
I also hope (and this is happening) to have a creative discussion of what’s possible, what’s around the corner, what are the opportunities for learning — or creation of new businesses and interventions.
The first thing that came to mind when I read Susannah’s tweet about this was that potentially everything is health-related, but that makes policy difficult. There are (at least) two ways of dealing with this, one of which is to draw a line & say things in category A are health data but things in category B aren’t. The advantage of this is that it gives you clear boundaries, but you’ll always be debating whether something should be in category A or B.
Another way of thinking about it is to think about qualities, not types. In other words, convert it from an either/or question to a question of how much. Your cholesterol levels, for example, would be 100% health data and the number of phone calls you make would be maybe closer to a few percent. How you score it can be based on not just the source of the data or the subject of the data, but also the application to which the data are put, which means you head off attempts to create loopholes by lobbying for something to be miscategorized. I feel like that would give you a somewhat stronger theoretical framework, which would be important if your overall goal is to give policymakers something to work with.
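The “how much, not either/or” idea can be sketched as a score from 0 to 1 built from the three factors the comment names: the source of the data, its subject, and the application it is put to. Everything here is hypothetical, including the `DataUse` fields, the scoring rule, and the example values; it is meant to show the shape of the idea, not a proposed standard.

```python
from dataclasses import dataclass


@dataclass
class DataUse:
    """One data element in one context; every score runs from 0.0 to 1.0 (hypothetical scale)."""
    name: str
    source: float       # how clinical the source is (e.g., a lab result would be 1.0)
    subject: float      # how directly the data describe health or the body
    application: float  # how health-related the use is (e.g., insurance risk pricing)


def health_score(d: DataUse) -> float:
    """Degree to which a data use counts as health data, from 0.0 to 1.0.

    Source and subject are averaged; the application can only raise the
    result, so relabeling a data source cannot lower its score while the
    data are being put to a health-related use.
    """
    for value in (d.source, d.subject, d.application):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{d.name}: scores must be between 0 and 1")
    intrinsic = (d.source + d.subject) / 2
    return max(intrinsic, d.application)


examples = [
    DataUse("cholesterol level", source=1.0, subject=1.0, application=1.0),
    DataUse("phone call count, marketing", source=0.0, subject=0.1, application=0.0),
    DataUse("phone call count, insurer risk model", source=0.0, subject=0.1, application=0.9),
]
for d in examples:
    print(f"{d.name}: {health_score(d):.2f}")
```

Printed, the scores are 1.00 for cholesterol, 0.05 for the phone-call count in a marketing context, and 0.90 for the same count feeding an insurer’s risk model. Taking the maximum means the application can raise a score but never lower it, which is one way to capture the comment’s point about heading off loopholes.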
Thank you! Great comment.
This discussion has helped bring into focus the differences between how health data might be defined in a research context vs. clinical context vs. policymaker context (to name only three). Data that a researcher might find incredibly valuable, a clinician might find irrelevant — and a policymaker might find it unacceptable that it is being collected and tracked.
Again, I’m not hoping to solve the issue here, but rather to shine a spotlight on a fast-moving, complex field with serious implications for multiple stakeholders.
Since this blog is my outboard memory, I’m porting over two more comments from my post on Medium:
Ivan Galanin writes: “Sleep, nutrition, exposome — exposure to sun, nature, noise, crowding. Social interactions. Oral, gut and skin microbiome.”
hegel on healthcare writes: “I believe the concept today objectifies the individual, such that the owners and hoarders of the data are generally not the patients or consumers. If I were a super-patient, the only health data I know of that currently empowers me is my symptom diary or a journal of my health journey. The types of health data we generally think of today perpetuate capitalist interests and exploit individuals as currency. We need to create laws similar to open government data where we can download them all in a machine-readable format; and also protect our privacy by opting to share or not with whom and when.”
Expand home assessment to environment, neighborhood, caregivers, educational level, and care management at home (psychosocial, medical, emotional, financial).
Psychology: adjustment to illness, acceptance, level of coping ability.
Home care records: synchronization with inpatient records, transitional barriers.
All barriers to success.
1. One thing that comes to mind here is Latanya Sweeney’s data map: https://thedatamap.org
2. We all leave a trail of personal ‘digital exhaust’ intentionally and unintentionally, and various data collectors are picking it up and cleaning/ using/ selling it in various ways. Some of that is aggregated and sold as ‘risk scores’ for various things. The latest entry in this category that raises an eyebrow (at the very least) is risk scoring for opioid abuse or addiction. Subscribers to one of the several products on offer can use a simple numerical score to inform a decision about whether or not to prescribe an opioid painkiller. But … what if the data is wrong, or what if the algorithm is wrong?
I understand you are trying to define the top of the funnel, but I think it’s important to think about uses of data as well as sources of data. We sometimes err on the side of seeking to be over-inclusive in defining what is health data, and I guess my point is that we need to be really sure we know what we’re going to do with that further-afield data so that this doesn’t have unintended consequences based on a misunderstanding or misconstruction of the data — if an algorithm kept a family member from getting appropriate pain medication when it was absolutely necessary then I would be angry …
Wow! Thank you!
Everyone, click through on Sweeney’s Data Map when you have a moment — lots there that I didn’t know about and I consider myself pretty well-versed (and slightly paranoid) about what is collected and shared about my own digital usage.
And thanks for pointing out some of the implications for use and misuse of data exhaust.
On a positive note, I wanted to highlight Project HealthDesign, an RWJF-funded project led by Patti Brennan from 2006 to 2014. Researchers worked with patients and caregivers to create data-driven interventions using observations of daily living (ODLs) aimed at issues like obesity among adolescents; sedentary adults; management of conditions such as chronic pain, asthma, diabetes, depression, and hypertension; and improving care for low-birthweight babies, adolescents, and adults. A personal favorite of mine was “Living Profiles: Design of a Health Media Platform for Teens With Special Healthcare Needs” (studies show that teens with chronic conditions experiencing depressive symptoms are less likely to take their meds, so the program monitored, with their permission, teens’ text messages so that if their mood seemed low, they’d get an automatic reminder to take their meds).
Project HealthDesign’s final report (PDF) is worth a click for those interested in learning more about the possibility of what could be included in the definition of “health data.”
Susannah, I hope you can find a “Like” or thumbs-up plugin for this blog. I’d poke it for David Harlow’s contribution, particularly the importance of considering the ultimate use of the categorized data, before/when doing the categorizing.
I trust that your background makes you well aware of the hazards of modeling errors – unintended meanings that get attached to each decision as we try to record “What just happened in this anecdote?? How do we conceptualize it??” It’s so hard later to go back to the raw data with a new filter and re-sort it with new patterns in mind.
Ten years ago this month at TED2009, Tim Berners-Lee gave his talk “The Next Web” and asked the crowd NOT just to read people’s interpretations of their data, but to ask for the raw data.
Anyway back to the point: I’m sure whatever taxonomy comes of this exercise, some people will take it to be a newly discovered immutable reality, which is a risk. So, again, I agree with David in being mindful at the beginning of what will be done with these decisions. And please, whoever uses the taxonomy, publish your raw data (per Tim B-L) so future researchers can see how the picture changes if a different taxonomy is applied.
____________
p.s.
The dialog is so rich on your posts that you almost need a Medium-like ability for people to comment mid-stream, like adding comments in Word or a Google Doc.
Thanks, Dave! Been offline for some R&R — always grateful for your insights. Your emphasis on the call for raw data is key and adds a layer of complexity to gaining people’s permission to collect and keep the most sensitive, granular health data.
I’d love to find a WordPress plugin to let people vote up certain comments. I started a new category of posts, featured commenters, to pluck out the gems I’ve treasured over the years. But there are so many!
Really glad to read all these comments. Sara Riggare (http://www.riggare.se/) pointed me here after learning I’d missed it; she’s been part of an extensive discussion over the past few years about expanding conventional health care taxonomies. For instance she’s discussed how, in managing a complex condition, the most important data may be a synthetic data type such as “feel well time” which will typically not be registered in any health record.
A question I’ve found interesting to think about is how to disentangle conversations about taxonomy from implicit assumptions about data aggregation. Typically, taxonomies are meant to support analysis at an aggregate level. This may be what “hegel on healthcare” is tuning into with the comment that “the concept today objectifies the individual, such that the owners and hoarders of the data are generally not the patients or consumers.”
How can taxonomies be useful without making assumptions about data aggregation? And, related to this, what happens when we focus on the agency of the person about whom observations are made? Maybe this gets us to characterize the observations along different dimensions. For instance: accurate vs. erroneous data in our downloaded medical records; personally inflected observations made by specific friends, family, and caregivers; outcomes of deliberate self-experiments. This gets far enough away from traditional health data that maybe “data” is no longer the right word. “Observations” (Brennan’s term) then starts to make sense as a useful not-quite-synonym.
Thanks Susannah as always for launching this interesting discussion!
Thank you, Gary! The conversation is never over and I’m grateful for your comment.
(If you scroll to the bottom of this page, there’s a way to subscribe to my blog — it says “Don’t Miss a Post” and you just enter your email address. I *only* send an email when I publish a new essay.)
Deloitte Insights published what could be described as a taxonomy of health data market forces: Forces of change: The future of health. Another useful resource: the Deloitte 2018 Health Care Consumer Survey results.