Data fusion is the integration of data and knowledge from different sources. This blog post will enumerate and explain different classification schemes for data fusion and review the common algorithms. We will explore three different applications: (i) data association, (ii) state estimation, and (iii) decision fusion.
First, let’s look at the difference between data fusion and data integration, and then at best practices for both.
Fusion and integration: The difference
Let’s start with the data fusion definition. Both data fusion and data integration are designed to integrate and organize data that comes from multiple sources. The goal is to present a unified view of data for consumption by various applications, making it easier for analytics to derive actionable insights.
However, there are major differences between data fusion and data integration. The main one is that information fusion focuses on processing real-time streaming data and enriching this stream with semantic context from other Big Data sources. Other differences include:
- Data Reduction
The first and foremost goal of information fusion is data abstraction. Data integration, by contrast, focuses on combining data to create consumable data. Information fusion frequently involves “fusing” data at different abstraction levels and with varying levels of uncertainty to support a narrower set of application workloads.
- Handling Streaming/Real-Time Data
Data integration typically deals with data at rest, processed in batches. Information fusion integrates, transforms, and organizes all manner of data (structured, semi-structured, and unstructured) and uses multiple techniques to discard stateless data and retain only stateful, valuable information.
- Human Interfaces
Information fusion incorporates human contributions into the data and reduces uncertainty. If organizations can capture and save insights that can only be derived through human analysis and support, they will be able to maximize their analytics results.
Data management in modern business
Nowadays, data integration is critical to the success of any company. Yet more than 70% of employees have access to data they shouldn’t, and analysts spend as much as 80% of their time simply discovering and preparing data. Poor data hygiene leads to data breaches and to inconsistent management of business data. The data strategy of most companies combines defensive and offensive approaches. The key objectives of the defensive approach are:
- Ensuring data security, privacy, integrity, quality, regulatory compliance, and governance
- Optimizing data extraction, standardization, storage, and access
- Control
- SSOT (Single source of truth)
The key objectives of the offensive approach are:
- Improving competitive position
- Optimizing data analytics, modeling, visualization, transformation, and enrichment
- Flexibility
However, even organizations that use both defensive and offensive data management approaches and have the analytical setup, infrastructure, and dedicated personnel in place can’t always turn their data into business-driving insights. In most cases, this happens for two reasons: technical limitations of the data analytics platforms and a lack of business-led data analytics operations.
Business users today want real-time, actionable data. However, most legacy data analytics platforms don’t meet this need: they aren’t flexible enough to deliver one-off analyses and ad hoc reports, and so they slow the business down.
Another issue is that companies gather data but don’t know how to analyze it.
“To be able to deliver a good customer experience to individuals, we need to understand those individuals because the more personalized we can be, the more valuable we are to those customers. The way to gain that understanding is through data.”
— Jodie Sangster, CMO for IBM Australia and New Zealand.
To unlock the value of data, you need to have the right tools and use them at the right time. Let’s see how data fusion algorithms can help.
Data fusion algorithms: Where they are used
Sensor and data fusion is used across industries, from earth resource monitoring, weather forecasting, and vehicular traffic control to military target classification and tracking. Data fusion technologies are usually powered by artificial intelligence, which makes the fusion process faster and more efficient.
In general, data fusion techniques involve sensors that collect disparate sets of information. Once the sensors have gathered the intel, data fusion algorithms aggregate the intel into a single comprehensive data set. Here are the main types of data fusion algorithms:
- Algorithms based on the central limit theorem and binary fusion
The central limit theorem states that as the number of samples of anything we measure increases, the average of those samples tends toward a normal distribution (a bell curve). For example, the more times we roll a six-sided die, the closer the average value of the rolls gets to 3.5, the “true” average.
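A quick simulation in plain Python (no external libraries needed) shows the die-roll average settling toward 3.5 as the sample count grows:

```python
import random

# Roll a fair six-sided die n times and return the average of the rolls.
def average_roll(n: int) -> float:
    return sum(random.randint(1, 6) for _ in range(n)) / n

# As the sample size grows, the sample mean converges on the true mean of 3.5.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} rolls -> average {average_roll(n):.3f}")
```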
- Kalman filter
A Kalman filter is often used in navigation and control technology. It takes data inputs from multiple sources and estimates unknown values even when the signals are noisy. By fusing multiple measurements, a Kalman filter can estimate unknown values more accurately than any single method of measurement could on its own.
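A production Kalman filter also includes a motion model and a predict step; the sketch below keeps only the measurement-update step for a static, one-dimensional quantity, with the true value and noise levels chosen purely for illustration:

```python
import random

def kalman_update(estimate, variance, measurement, meas_variance):
    """One 1-D Kalman measurement update: fuse a noisy reading into the estimate."""
    gain = variance / (variance + meas_variance)      # how much to trust the new reading
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance

# Illustrative setup: a stationary quantity with true value 25.0,
# observed by a sensor with noise variance of about 4.0 (all numbers are assumptions).
estimate, variance = 0.0, 1000.0                      # vague initial prior
for _ in range(50):
    reading = random.gauss(25.0, 2.0)                 # simulated noisy sensor
    estimate, variance = kalman_update(estimate, variance, reading, meas_variance=4.0)

print(f"fused estimate: {estimate:.2f} (variance {variance:.4f})")
```

Each update shrinks the estimate’s variance, which is why the fused result ends up more accurate than any single reading.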
- Algorithms based on Bayesian networks
Bayesian networks estimate the likelihood that a chosen hypothesis is the contributing factor in a given event. Algorithms used for learning Bayesian networks include K2, hill climbing, iterative hill climbing, and simulated annealing.
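A full Bayesian network is more than a blog snippet can show, but the core mechanism, Bayes’ rule, fits in a few lines. The prior and the sensor reliabilities below are made-up numbers for illustration only:

```python
# Hypothetical numbers: prior probability a tracked object is a vehicle,
# and how often each of two independent sensors fires for vehicles vs. non-vehicles.
prior = 0.30
p_radar_given_vehicle, p_radar_given_other = 0.90, 0.20
p_camera_given_vehicle, p_camera_given_other = 0.85, 0.10

def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior probability of the hypothesis after observing one piece of evidence."""
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))

# Fuse the two detections sequentially (conditional independence assumed).
posterior = bayes_update(prior, p_radar_given_vehicle, p_radar_given_other)
posterior = bayes_update(posterior, p_camera_given_vehicle, p_camera_given_other)
print(f"P(vehicle | radar + camera) = {posterior:.3f}")  # about 0.94
```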
- Convolutional neural networks
Convolutional neural network-based methods can simultaneously process many channels of sensor data, producing classification results based on image recognition.
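As a rough sketch of that idea (assuming PyTorch is available, and with the channel count, window length, and number of classes chosen arbitrarily), a 1-D convolutional network can consume several sensor channels at once and emit per-class scores:

```python
import torch
import torch.nn as nn

# Assumed shapes for illustration: 3 sensor channels, 128 samples per window, 4 classes.
model = nn.Sequential(
    nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),   # collapse the time axis
    nn.Flatten(),
    nn.Linear(16, 4),          # one score per class
)

window = torch.randn(8, 3, 128)   # a batch of 8 multi-sensor windows
logits = model(window)            # shape (8, 4): fused classification scores
```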
Data integration solutions
The most common data integration solutions are accessing databases (JDBC, ODBC), parsing formatted files (fixed width, delimited, CSV, XML), extracting archives (ZIP, JAR, TAR), retrieving files over transfer protocols (FTP, SCP), messaging (JMS, AMQP), and consuming web services (SOAP, REST).
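To make a couple of these concrete, here is a sketch that combines two of the listed source types, a delimited CSV file and a REST web service, using only the Python standard library. The file name, URL, and field names are placeholders:

```python
import csv
import json
import urllib.request

# Placeholder sources: substitute your own file path and endpoint.
with open("orders.csv", newline="") as f:
    orders = list(csv.DictReader(f))           # parse a delimited file

with urllib.request.urlopen("https://api.example.com/customers") as resp:
    customers = json.load(resp)                # consume a REST web service

# Join the two sources on a shared key (assumed here) to produce one unified view.
by_id = {c["id"]: c for c in customers}
unified = [{**o, **by_id.get(o["customer_id"], {})} for o in orders]
```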
Here are the best practices for data fusion and integration and elements of the data integration process:
- Extract, Transform, Load (ETL) - extract data from the source, transform it into the target structure, and load it into the target system.
- Enterprise Information Integration (EII) - unified view of data and information for an entire organization.
- Enterprise Data Replication (EDR) - synchronization through real-time capture and processing of data changes.
- Enterprise Application Integration (EAI) - middleware that enables integration of systems and applications.
Depending on the business needs, a company can implement the process of integrating data from disparate sources in the following ways:
- Manual
A user manually collects data from disparate source systems and uploads it to the target databases. Every new use case, including the mappings between datasets, has to be hand-coded.
- Middleware
Middleware creates a virtual “pipeline” between multiple systems for bi-directional communication, streamlining integration tasks and improving connectivity.
- Data virtualization/data federation
In data virtualization, special tools lift raw data onto a different level of abstraction. The abstraction layer provides a unified view of the disparate systems while leaving the data in its original location; the data is then accessed through the virtual layer, which holds the metadata needed to reach the sources. With the help of data virtualization, businesses can get real-time access to their data without exposing the technical details of the source systems. They can also introduce enterprise-wide changes on the virtual layer rather than consolidating the data in one place or implementing changes at each source separately. This integration approach can run alongside ETL or ELT processes but doesn’t support bulk data management.
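The mechanics can be sketched with SQLite’s ATTACH command, which lets a single connection query several physical databases as one unified view without moving the data. The database files and table names below are placeholders:

```python
import sqlite3

# Placeholder database files standing in for two disparate source systems;
# the 'orders' and 'products' tables are assumed to already exist in them.
conn = sqlite3.connect("sales.db")
conn.execute("ATTACH DATABASE 'inventory.db' AS inv")

# One query spans both sources; the data never leaves its original location.
rows = conn.execute("""
    SELECT o.order_id, o.sku, i.stock_level
    FROM orders AS o
    JOIN inv.products AS i ON i.sku = o.sku
""").fetchall()
```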
- Data warehouse/physical data integration
With this technique, cloud-based ETL tools move data from the source systems into a data warehouse or another physical destination. This approach is flexible and allows businesses to gather and manage data in a centralized location.
This method includes two main approaches: ETL (extract, transform, load) and ELT (extract, load, transform). Both extract, transform, and load data into a destination; the difference is whether the data is transformed before or after it is loaded.
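As a toy illustration of that difference (using an in-memory SQLite database as a stand-in for the warehouse, with made-up rows), ETL cleans the data before loading, while ELT would load it raw and clean it later inside the warehouse:

```python
import sqlite3

# Toy source rows (assumed): raw extracts with messy product names.
source_rows = [("2024-01-05", " Widget ", 3), ("2024-01-06", "gadget", 5)]

warehouse = sqlite3.connect(":memory:")               # stand-in for a real warehouse
warehouse.execute("CREATE TABLE sales (day TEXT, product TEXT, qty INTEGER)")

# ETL: transform first (trim and normalize fields), then load.
cleaned = [(day, name.strip().lower(), qty) for day, name, qty in source_rows]
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)

# ELT would instead load source_rows as-is into a staging table
# and run the same cleanup later as SQL inside the warehouse.
print(warehouse.execute("SELECT * FROM sales").fetchall())
```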
“In many industries, real-time data capability is becoming a must-have to stay competitive. However, due to the complexity of the systems involved, businesses can’t pivot to real-time overnight. It’s crucial for companies making this transition to find an experienced technical partner to guide them through the process and help them identify the areas where data can best help their business.”
— Vlad Medvedovsky, Founder and CEO at Proxet (ex-Rails Reactor), a software development solutions company.
Examples of data integration: Use cases
Now let’s go straight to some commonly used examples of data integration.
“Data modernization enables new services and processes. The business can transform its front-end user interfaces, create new features, automate previously manual processes and even launch new service offerings faster.”
— Deepali Naair, CMO for IBM in India and South Asia.
What does data integration look like in practice? Here are some examples:
- Data integration helps retail stores improve their customer journeys and provide a consistent brand experience. With the help of data integration, brands can take all their data and combine it into a unified database where it can be analyzed and stored. Data integration helps retailers use the data they produce and transform it into insights for growing their business.
- In healthcare, data application integration platforms allow patient data to be aggregated into one comprehensive record. Integrating patient data helps control costs, improve outcomes, and promote health and wellness.
- Data integration can feed special sales dashboards for measuring and tracking sales and marketing efforts, customer behavior on the website, and customer buying experience.
- With the help of data fusion software, companies can also receive data from suppliers and partners. For example, a manufacturer can transfer shipping lists, invoice information, or general product data, and a hospital can receive patient records from different independent offices.
Companies can use full-stack data integration platforms powered by AI for real-time event-driven application integration, hybrid cloud, and data source integration with business and technical flows. One such platform is Akira, which contains tools for modern data integration and secure data pipelines. Another example is the SnapLogic Intelligent Integration Platform, a data integration tool that focuses on robust and flexible self-service functionality. It’s designed to help organizations achieve their digital transformation goals by connecting data from everywhere.
At Proxet, we have deep experience in helping organizations leverage the power of real-time data. If you are considering upgrading your company’s ability to analyze and act on large volumes of information, please get in touch.