How Much Time Do You Spend Cleaning Your Data?

Posted on June 29, 2015 | Written By

John Lynn is the Founder of the blog network which currently consists of 10 blogs containing over 8000 articles with John having written over 4000 of the articles himself. These EMR and Healthcare IT related articles have been viewed over 16 million times. John also manages Healthcare IT Central and Healthcare IT Today, the leading career Health IT job board and blog. John is co-founder of and John is highly involved in social media, and in addition to his blogs can also be found on Twitter: @techguy and @ehrandhit and LinkedIn.

I recently came across this really great blog post talking about data scientists wasting their time. Here’s a quote from the article (which quotes the NYT):

“Data scientists, according to interviews and expert estimates, spend 50 percent to 80 percent of their time mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.”
– Steve Lohr, NYT

Then, they have this extraordinary quote from Monica Rogati, VP for Data Science at Jawbone:

“Data scientists are forced to act more like data janitors than actual scientists.”

Every data scientist will tell you this is a problem. They spend far too much time cleaning up the data, and they all wish they could spend more time actually looking at the data to find insights. I’ve seen this all over healthcare. In fact, I’d say we have more data janitors than data scientists in healthcare. Sadly, many healthcare data projects clean up the data and then don’t have any budget left to actually do something with it.

The solution to this problem is easy to write and much harder to do. The solution is to create an expectation and a culture of clean data in your organization.

I predict that over the next 5-10 years, healthcare data is going to become the backbone of healthcare decision making. Those organizations whose houses are a mess are going to be torn down and sold off to the hospital that’s kept a clean house. Is your hospital data clean or dirty?