Open Source Tool Offers “Synthetic” Patients For Hospital Big Data Projects

Posted on September 13, 2017 | Written By

Anne Zieger is a veteran healthcare editor and analyst with 25 years of industry experience. Zieger formerly served as editor-in-chief of FierceHealthcare.com and her commentaries have appeared in dozens of international business publications, including Forbes, Business Week and Information Week. She has also contributed content to hundreds of healthcare and health IT organizations, including several Fortune 500 companies. She can be reached at @ziegerhealth or www.ziegerhealthcare.com.

As readers will know, using big data in healthcare comes with a host of security and privacy problems, many of which are thorny.

For one thing, the more patient data you accumulate, the bigger the disaster if and when the database is hacked. Another important concern is that if you decide to share the data, there’s always the chance that your partner will use it inappropriately, violating the terms of the consent under which the data was disclosed. Then there’s the issue of working with incomplete or corrupted data, which, if extensive enough, can interfere with your analysis or even lead to inaccurate results.

But now, there may be a realistic alternative, one that allows you to experiment with big data models without taking on all of these risks. A software project is underway that gives healthcare organizations a chance to scope out big data projects without using real patient data.

The software, Synthea, is an open source synthetic patient generator that models the medical history of synthetic patients. It was built by The MITRE Corporation, a not-for-profit research and development organization sponsored by the U.S. federal government. (This page offers a list of other open source projects in which MITRE is or has been involved.)

Synthea is built on a Generic Module Framework which allows it to model varied diseases and conditions that play a role in the medical history of these patients. The Synthea modules create synthetic patients using not only clinical data, but also real-world statistics collected by agencies like the CDC and NIH. MITRE kicked off the project using models based on the top ten reasons patients see primary care physicians and the top ten conditions that shorten years of life.
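To make the module idea a bit more concrete, here is a minimal Python sketch of a disease module treated as a small state machine with weighted transitions. To be clear, this is not Synthea's code or its module format. The state names, the `TOY_MODULE` table, the `run_module` and `simulate_population` helpers and the 20% onset probability are all invented for illustration, and the probability is a placeholder rather than a CDC or NIH statistic.

```python
import random

# Hypothetical toy module. Each state maps to a list of
# (next_state, probability) pairs; an empty list marks the end state.
# The numbers are placeholders, not real-world prevalence figures.
TOY_MODULE = {
    "Initial": [("Well", 1.0)],
    "Well": [("Condition_Onset", 0.2), ("Terminal", 0.8)],
    "Condition_Onset": [("Encounter", 1.0)],
    "Encounter": [("Terminal", 1.0)],
    "Terminal": [],
}


def run_module(module, rng):
    """Walk one synthetic patient through the module and return the visited states."""
    state = "Initial"
    path = [state]
    while module[state]:
        targets, weights = zip(*module[state])
        # Pick the next state according to the transition weights.
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return path


def simulate_population(module, n, seed=0):
    """Run the module once per patient, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [run_module(module, rng) for _ in range(n)]
```

Calling `simulate_population(TOY_MODULE, 1000)` would yield one path per synthetic patient, with roughly a fifth of them passing through the condition onset and encounter states. Swapping in a different table of states is all it would take to sketch another condition, which is roughly the appeal of a generic framework.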

Its makers were so thorough that each patient’s medical experiences are simulated independently from their “birth” to the present day. Each profile is a full medical history, including medication lists, allergies, physician encounters and social determinants of health. The data can be shared using C-CDA, HL7 FHIR, CSV and other formats.
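For readers who want a feel for what a record like this might look like once exported, here is a small Python sketch that holds a synthetic patient's history in memory and flattens the encounters to CSV. The `SyntheticPatient` and `Encounter` classes, the `encounters_to_csv` helper and the column names are assumptions made up for this example. They are not Synthea's actual export schema.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Encounter:
    year: int
    reason: str


@dataclass
class SyntheticPatient:
    # Hypothetical record layout, loosely following the profile contents
    # described above (medications, allergies, encounters).
    patient_id: str
    birth_year: int
    medications: list = field(default_factory=list)
    allergies: list = field(default_factory=list)
    encounters: list = field(default_factory=list)


def encounters_to_csv(patients):
    """Return one CSV row per encounter, preceded by a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["patient_id", "year", "reason"])
    for patient in patients:
        for enc in patient.encounters:
            writer.writerow([patient.patient_id, enc.year, enc.reason])
    return buf.getvalue()
```

A flat table like this is easy to load into an analytics tool, while a format such as FHIR would carry the same history as structured resources instead.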

On its site, MITRE says its intent in creating Synthea is to provide “high-quality, synthetic, realistic but not real patient data and associated health records covering every aspect of healthcare.” As MITRE notes, having a batch of synthetic patient data on hand can be pretty, well, handy in evaluating new treatment models, care management systems, clinical support tools and more. It’s also a convenient way to predict the impact of public health decisions quickly.

This is such a good idea that I’m surprised nobody else has done something comparable. (Well, at least as far as I know no one has.) Not only that, it’s great to see the software being made available freely via the open source distribution model.

Of course, in the final analysis, healthcare organizations want to work with their own data, not synthetic substitutes. But at least in some cases, Synthea may offer hospitals and health systems a nice head start.