Doing Big Data Driven Research in Medicine

Published: May 6, 2021

Healthcare and AI were among the few industries to see growth in 2020 and are expected to grow further in 2021, according to ZDNet. Growth in AI has been driven in part by its increasing adoption in the healthcare industry.

This isn’t surprising. Medical providers of all types, from hospitals to pharmaceutical companies to private practices, generate enormous amounts of data. So much data, in fact, that the business problem is often less about getting data and more about sifting through the mountains of information these organizations already have access to. This makes AI a critical tool in research.

While doing big data research is a challenge, the insights generated can substantially improve both patient and organizational outcomes across a wide range of metrics. It’s all a matter of using the correct techniques to generate these insights. Without the right approach, all the data in the world is useless.

One type of data that any organization doing this analysis will need to use is EHRs, or Electronic Health Records. These are the record-keeping systems that healthcare providers use to record information about patients. Just about any interaction an individual patient has with a healthcare provider will create or update an EHR entry.

Natural Language Processing for Medical Records

From a technical perspective, EHRs are more complex than many other types of Big Data that are typically analyzed. EHR formats can differ from one another in ways that Excel data typically does not. In addition, the content goes beyond numerical and True/False entries. EHRs very often contain the doctor’s written notes, stored as free text, which means that a specialized Natural Language Processing capability is required to analyze them at scale. These notes contain as much as 70% of the most valuable, actionable data in the records.

However, teaching an algorithm to do effective NLP on EHRs can be a very difficult task. Reasons for this include:

  • Grammar complexities (for example, the period after “Dr.” does not indicate the end of a sentence);
  • Gradations of degree (e.g., is “extremely painful” different from just “painful”?);
  • Many commonly used words refer to the same thing (e.g., “inflammation”, “inflamed”, “swelling”, and “swollen” reflect the same phenomenon);
  • Errors in the written text.
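
The first and third challenges above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical NLP pipeline: the abbreviation list, the synonym map, and the function names are all assumptions made up for the example, and real systems typically rely on trained models and curated medical vocabularies.

```python
import re

# Hypothetical abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "pt.", "vs."}

# Hypothetical synonym map that collapses related surface forms to one concept.
CONCEPTS = {
    "inflammation": "inflammation",
    "inflamed": "inflammation",
    "swelling": "inflammation",
    "swollen": "inflammation",
}


def split_sentences(text):
    """Split after ., ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def normalize_concepts(sentence):
    """Replace each known word with its canonical concept label."""
    def swap(match):
        word = match.group(0)
        return CONCEPTS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, sentence)


note = "Seen by Dr. Lee today. Left knee is swollen and inflamed."
sentences = split_sentences(note)
normalized = [normalize_concepts(s) for s in sentences]
```

Here the period after “Dr.” no longer ends a sentence, and “swollen” and “inflamed” are both mapped to the same concept. Gradations of degree and spelling errors are deliberately left out, since simple rules like these do not handle them well.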

Image by Proxet. Applications of AI in Healthcare

Managing Large Medical Records Data in Research

The starting point is to get the specific data that you need from existing data sets, and then apply more specific tools and techniques to them. Because of the complexity of the information contained in these records, it is necessary to apply a combination of basic and highly advanced data mining techniques. The first step is to make sure you’ve carried out data cleaning on all your data sets.

Clean data is an absolute necessity for doing proper analysis, and in healthcare it can present particular difficulties. In addition to the challenges posed by written texts within the records, another issue is that there is currently no standardized, nationwide EHR format. Different medical organizations, and disciplines within those organizations, may use a wide range of EHR formats and structures, which can confuse an algorithm that is sorting through them all at once.

In addition to formatting issues, there’s also the issue of data that is duplicated, or just contains errors. If the data itself is erroneous, then the conclusions drawn from its analysis cannot be reliable.
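
A minimal Python sketch of these cleaning steps might look like the following. It maps two differently formatted sources onto one schema, drops duplicates, and sets implausible values aside for manual review. The source names, field names, date formats, and the plausibility range are all hypothetical, and the sketch assumes both sources share patient identifiers, which is often not true in practice.

```python
from datetime import datetime

# Hypothetical field mappings for two source systems; real EHR exports differ.
FIELD_MAPS = {
    "clinic_a": {"patient_id": "id", "dob": "birth_date", "hr": "heart_rate"},
    "clinic_b": {"PatientID": "id", "DateOfBirth": "birth_date", "Pulse": "heart_rate"},
}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Try each known date format; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_common_schema(record, source):
    """Rename source-specific fields and normalize value types."""
    mapping = FIELD_MAPS[source]
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    row["birth_date"] = parse_date(str(row.get("birth_date", "")))
    row["heart_rate"] = int(row["heart_rate"])
    return row


def clean(rows):
    """Drop exact duplicates and set aside implausible rows for manual review."""
    seen, kept, flagged = set(), [], []
    for row in rows:
        key = (row["id"], row["birth_date"], row["heart_rate"])
        if key in seen:
            continue
        seen.add(key)
        # Assumed plausibility range, for illustration only.
        plausible = row["birth_date"] is not None and 20 <= row["heart_rate"] <= 250
        (kept if plausible else flagged).append(row)
    return kept, flagged


raw = [
    ({"patient_id": "p1", "dob": "1980-02-14", "hr": "72"}, "clinic_a"),
    ({"PatientID": "p1", "DateOfBirth": "02/14/1980", "Pulse": 72}, "clinic_b"),
    ({"PatientID": "p2", "DateOfBirth": "07/01/1975", "Pulse": 720}, "clinic_b"),
]
kept, flagged = clean([to_common_schema(rec, src) for rec, src in raw])
```

The second record duplicates the first once both are normalized, so only one copy is kept. The third record has an implausible heart rate and is flagged rather than silently dropped, which leaves the final decision to a person.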

Much of the work of data cleansing needs to be done manually, but there are specialized tools on the market that can automate portions of the process and make the parts that cannot be automated easier to manage manually. Remember, medicine is among the most regulation-sensitive industries in the world, and data analysts will need to be certain that the tools they are using are HIPAA compliant (in the United States).

“Regulatory compliance is important in any industry, but especially in medicine. When selecting an organization to help you build custom software development solutions for your healthcare business, it’s important to choose a partner that has deep experience in building compliance-sensitive systems in your market.”

Vlad Medvedovsky, CEO of Proxet

The Importance of a User-Friendly Interface for Medical Workers

Dirty data can be minimized by ensuring that the methods organizations use to collect data are state-of-the-art. One of the best ways to do this is to use a collection method that has a user-friendly interface for medical workers. The harder the UI is to use, the more likely data entry mistakes become.

Intuitive UI isn’t only important for medical professionals. As organizations go through digital transformation, patients are also taking part in managing their own care with apps, which have functions ranging from appointment booking and billing all the way to symptom monitoring and video chat for telemedicine appointments. If the UIs in these systems fall short, patients may hesitate to use them, or make mistakes and enter incorrect data that the organization cannot subsequently use.

In addition to data concerns, there’s also the matter of competition. For example, a pharmacy company trying to capture market share in an increasingly digitized industry may lose out to competitors if its delivery app is difficult to use. And all other things being equal, a healthcare provider that encumbers its patients with clunky apps and interfaces may lose patients to a provider that offers easy access to an intuitive healthcare dashboard.

“Consumers don’t just want to view their health information, they want to understand it.”

Anna Labno, Partner at Boston Consulting Group.

Given the diversity of use-cases, from both the practitioner and patient sides, a one-size-fits-all solution is unlikely to be effective in these instances. Each solution should be tailored to the specific requirements of the type of care being provided.

Another thing to consider in medical UI is burnout. Doctors and other healthcare professionals have many demands on their time. If the software that they need to use for record keeping and other tasks makes the job more, rather than less, difficult, they may feel that clerical work is taking up an inordinate amount of their effort and begin to look for work somewhere the process is more streamlined.

Big Data Analytics in Healthcare

Once you are confident that you have a clean data set, it’s time to apply big data analytics techniques. Exactly which techniques to apply depends on the complexity of the data itself and on the goal of the research.

R vs. Python

Both R and Python can be used for most data analysis tasks, and which one is best to use depends on specific factors. Things to consider include whether or not you already know one of the languages, what your time constraints are, and which language (if any) your colleagues prefer to work with.

Those with a background in programming may find Python to be a more natural choice, whereas academics and others without a computer science background tend to use R. If you’re doing research that will eventually have a client-facing business component, you may also prefer R as it has a wide range of tools for creating elegant graphics, charts, plots and other visual displays. Python also has visual functionality, but it takes more skill to use it than it does with R.

Doing big data analytics in healthcare requires intensive computational resources. Few organizations can dedicate that much of their own computing power solely to this analysis. In these instances, it may be a good idea to look into cloud solutions for data mining.

Image by Proxet. State of Cloud Computing Adoption in Healthcare

Cloud computing enables organizations to use all kinds of computing functionality and applications offsite. The options range from simple file storage, such as Google Drive, to complex computation spread across multiple dedicated machines. The top service providers in the industry include AWS, Microsoft Azure, Google Cloud, and IBM.

Having access to cloud computing can allow your organization to run resource-heavy data mining tools on the datasets you’ve collected. There are quite a few options on the market, many of which are no-code or low-code, meaning that even non-specialists can run basic functions. Now, let’s take a look at the techniques that are used with these tools to generate insights.

Methods for Data Mining in Medicine

Below is a list of methods commonly used in data mining and analysis:

  • Identifying and tracking patterns: Large-scale pattern analysis is the foundation of almost any type of data analysis. Once a pattern of interest has been found, it’s possible to drill down deeper and identify a course of action based on it.
  • Classification: Data mining techniques that involve analyzing the attributes associated with the data. When organizations identify the key characteristics of a given data type, it becomes possible to classify related data as well. There are numerous applications here. For example, classifying personally identifiable information makes it possible to automate redaction, which helps achieve HIPAA compliance more quickly and easily.
  • Association: Association is closely related to classification. After events have been classified, it is easier to identify associations between them.
  • Regression: Regression is one of the fundamental concepts in statistics and data science. Running a regression on your data allows you to investigate the relationship between a dependent variable and one or more independent variables. Regressions can be used for forecasting, modeling, and exploring possible causal relationships between variables.
  • Clustering: Clustering groups similar data points together so that they can be analyzed, and often visualized, more easily. By modifying the parameters, such as the number of clusters, you can change how the data is grouped.
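
To make two of these methods concrete, here is a small Python sketch of regression and clustering using only the standard library. The numbers are made up for illustration and are not clinical data, and the function names are hypothetical. In practice, an analyst would more likely reach for an established statistics or machine learning library than write these by hand.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one independent variable: y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_means_1d(values, centers, rounds=10):
    """Tiny k-means on one feature: assign to nearest center, then recompute centers."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up numbers for illustration only (not clinical data).
age = [30, 40, 50, 60, 70]
systolic = [115, 120, 125, 130, 135]
slope, intercept = fit_line(age, systolic)

length_of_stay = [1, 2, 2, 9, 10, 11]
centers, clusters = k_means_1d(length_of_stay, centers=[0, 5])
```

With this toy data, the fitted line rises by half a unit of the dependent variable per year of age, and the lengths of stay separate into a short-stay group and a long-stay group. Changing the starting centers or their number changes how the values are grouped.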

This is far from an exhaustive list of the techniques that can be used in medical analytics. Data science is a deep and complex field, so much so that some tools employ what are known as “black box” methods.

A “black box” data analysis technique involves running a sophisticated AI on a dataset. The AI will then identify patterns, risks, or recommended courses of action, but will not be able to explain how it reached its conclusions. While running a black box AI on medical big data can generate powerful insights, there are drawbacks as well. For example, it is much more difficult to detect algorithmic bias in a black box AI than in an explainable AI. However, as long as a deliberate approach is taken, these risks can be mitigated.
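
One deliberate check that does not require opening the box is to compare a model’s outputs across patient groups. The Python sketch below treats the model as an opaque function and measures how often it returns a positive prediction for each group. The records, the group labels, and the stand-in model are all invented for the example, and this is only one of several possible checks.

```python
from collections import defaultdict


def positive_rate_by_group(records, predict):
    """Share of positive predictions per group, treating `predict` as an opaque function."""
    totals, positives = defaultdict(int), defaultdict(int)
    for record in records:
        group = record["group"]
        totals[group] += 1
        positives[group] += 1 if predict(record) else 0
    return {g: positives[g] / totals[g] for g in totals}


def opaque_model(record):
    """Stand-in for a black box model; its internals are assumed to be unavailable."""
    return record["score"] >= 0.5


# Made-up records for illustration only.
records = [
    {"group": "A", "score": 0.9},
    {"group": "A", "score": 0.7},
    {"group": "B", "score": 0.6},
    {"group": "B", "score": 0.2},
]
rates = positive_rate_by_group(records, opaque_model)
gap = max(rates.values()) - min(rates.values())
```

A large gap between groups does not prove bias on its own, since the groups may genuinely differ, but it shows where a closer review is needed.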

As we have seen, doing Big Data research in medicine is a complex process that requires intensive investment in computational resources, expertise, and regulatory compliance. But the rewards are well worth it, and organizations that fail to implement a data-driven approach risk being left behind by their competitors.

Proxet has significant expertise in implementing custom software development solutions in big data and healthcare analytics. Our experienced product owners and developers will gladly assist you in deciding what data to collect and how to use it efficiently.
