It is safe to say that today, the ability to collect and analyze process manufacturing data is more important than ever before. Aware of that, organizations around the globe are increasingly looking for database tools capable of retrieving, storing and analyzing data from various sensors and IoT devices utilized across industrial facilities. Since data historians are designed precisely for these purposes, the demand for historian software solutions has been on the rise, as has the number of data historian options available on the market. In this article we are going to take a closer look at data historians in general, providing information that should help you get a better understanding of their features and applications, the differences between various data historians, and how to approach the selection of a time series data solution that is right for your specific needs.
A data or process historian (also sometimes called an operational historian or simply a “historian”) is a set of time-series database applications designed and typically used for collecting and storing data related to industrial operations.
Data historians were originally developed in the second half of the 1980s to be used with industrial automation systems such as SCADA (supervisory control and data acquisition). Primarily, they were utilized for the needs of the process manufacturing sector in industries such as oil and gas, chemicals, pharmaceuticals, pipelines and refining, etc.
Today, however, process historians are widely used across industries, serving as an important tool for performance monitoring, supervisory control, analytics, and quality assurance. They allow industrial facility managers and stakeholders, as well as engineers, data scientists and various machinery operators, to access the data collected from a variety of automated systems and sensors. The collected data can be utilized for performance monitoring, process tracking or business analytics. Modern-day historians also often include other features related to the utilization of collected data, such as reporting capabilities that allow users to generate automated or manual reports.
Data historian vs. SQL database. What’s the difference?
As you can see, data historians have quite a lot in common with the regular databases, such as SQL databases, used to store all kinds of system data. Historians, however, are more than just databases: they not only collect the raw data, but are also typically able to process it, organize it into reports and forward it to other storage units. That being said, a typical data historian is essentially a time-series database that has been extensively customized to serve the needs of industrial automation.
The origins of data historian development haven’t been documented as well as the history of many other technologies and solutions used in industrial automation (like SCADA, for example). The first early versions of data historians are considered to have been implemented by Oil Systems Inc., later rebranded as OSIsoft, as part of its process manufacturing real-time data management software suite, now known as the OSIsoft PI System. The original historians, based on DEC VAX/VMS minicomputers, were designed to store data for analytics and regulatory purposes. Naturally, other elements of industrial automation analytics, such as time series databases and spreadsheets, were being developed and evolving in the same time period as historians.
Soon enough, other companies in the industrial automation market followed in OSIsoft’s footsteps and developed their own historians, providing them as parts of SCADA systems or as separate software solutions augmenting hardware platforms and components. While large vendors such as Siemens, Honeywell International and ABB typically maintained data historians as standalone products, many SCADA developers, like Inductive Automation (the developer of Ignition SCADA), simply integrated them as one of the system’s features.
Over the years, data historians gained some traction in other technology sectors where the need to collect, store and analyze time series data was present. But the real explosion in demand for process historians and other time series data solutions didn’t take place until the second part of the 2010s.
The new wave of interest in time series data solutions is driven by the rapidly growing adoption of IIoT (Industrial Internet of Things), Big Data, cloud services and other technological trends that result in the generation of huge (and continuously growing) volumes of real-time data. In response to this rising demand, a number of new projects have emerged, including open-source time-series databases such as TimescaleDB and InfluxDB, and cloud-based data services from technology giants such as Google Cloud IoT, Microsoft’s Azure IoT, and Amazon Timestream. Naturally, the usage of various data historians has also been on the rise in recent years.
In fact, Clarify was among the pioneers of the new wave of time series intelligence solutions that started to emerge in the 2010s, driven by the growing demand for web-based data management tools that are easy to integrate and use. Clarify was founded in Norway in 2012 by four graduates of the Norwegian University of Science and Technology. The idea for the project emerged from their experience working with industrial automation systems and extracting various kinds of data from them for subsequent analysis and visualization. At the time, industrial automation data was extremely hard to extract, and acquiring insights from it was time-consuming.
The process historian market is growing at a steady pace, fueled by consistently increasing demand for industrial automation data for performance improvement, rising use of Big Data analytics across various economic sectors, continuously expanding IoT infrastructure, which generates more and more data that can be collected and analyzed, and other tech market trends. According to a recent report, the global data historian market size is expected to grow from $1.1 bln in 2020 to $1.3 bln by 2025, at a CAGR of 5% during the forecast period. Another market study projects an even higher pace of the historian market growth — at 6.8% during the 2021 - 2026 period.
Industry experts point out that only 20% of the data generated by enterprises today is structured. This means that 80% of the industrial data, coming in various formats, is still unstructured. As the adoption of Industry 4.0 and Industry 5.0 solutions, smart factories, plants, and IIoT machinery continues to grow, more organizations across industrial sectors are looking to implement a data historian. The cloud deployment of data historians is expected to experience the highest growth rates going forward, primarily driven by the implementation of cloud historians by SMEs.
Speculating about the future of data historians, industry experts mostly agree that this market and this field of technology will undergo a major transformation over the next decade.
According to Walker Reynolds, a reputable IIoT and Industry 4.0 solutions architect and online educator, data historians in their current form will eventually — over the course of this decade, presumably — cease to exist, being integrated into a unified industrial automation software environment of the future. Software concepts such as Unified Namespace and Data Lake will instead be in the spotlight.
What is Unified Namespace?
As Walker Reynolds puts it, a Unified Namespace will play a foundational role in the industrial automation infrastructure of the future. Not only that, the industry expert believes it will evolve into an all-encompassing layer, forming the structure of the business and all of its events.
“The Unified Namespace is a single source of truth for all data and information in your business. It’s a place where the current state of the business exists (where it lives). It’s the hub through which the smart things in your business communicate with one another. And it’s the architectural foundation of your Industry 4.0 in digital transformation initiative,” Walker explains.
In a nutshell, unified namespace is a software layer in the industrial automation system of the future, which acts as a centralized repository of all the data collected from sensors, IIoT devices, machines, robotic solutions, and other system components, as well as all its context.
Here’s how Walker Reynolds explains the concept of unified namespace:
“Imagine you’re navigating through a file share that gets you to any data point that you want to view in your organization. This gives you the current value, and it’s structured on a common standard. All of the smart things in your business publish data into that namespace and they consume it from that namespace. So, if I want my MES system to consume raw events from my PLCs, so that they can calculate overall equipment effectiveness, and then write the OEE number, then they’re consuming from the Unified Namespace and they’re writing back to the Unified Namespace. It is the single source of truth for all data and information in your business.”
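The publish/consume pattern Reynolds describes can be sketched as a simple in-memory namespace. In real deployments a Unified Namespace is usually built on an MQTT broker with hierarchical topics; the topic names, payloads and the simplified OEE calculation below are purely illustrative assumptions, not a reference implementation.

```python
# Minimal in-memory sketch of a Unified Namespace (UNS).
# Real deployments typically use an MQTT broker; topics and values
# below are hypothetical examples.

class UnifiedNamespace:
    def __init__(self):
        self._state = {}        # current value of every topic (single source of truth)
        self._subscribers = {}  # topic -> list of consumer callbacks

    def publish(self, topic, value):
        """Update the current state and notify consumers of this topic."""
        self._state[topic] = value
        for callback in self._subscribers.get(topic, []):
            callback(value)

    def read(self, topic):
        """Browse the namespace like a file share: get the current value."""
        return self._state.get(topic)

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)


uns = UnifiedNamespace()

# A hypothetical MES consumes raw PLC counts from the UNS,
# computes a (deliberately simplified) OEE number, and writes it back.
def mes_compute_oee(counts):
    good, total = counts
    uns.publish("plant1/line3/oee", round(good / total, 2))

uns.subscribe("plant1/line3/plc/counts", mes_compute_oee)
uns.publish("plant1/line3/plc/counts", (95, 100))

print(uns.read("plant1/line3/oee"))  # 0.95
```

Note how the MES never talks to the PLC directly: both only publish to and consume from the namespace, which is exactly the "single source of truth" role described in the quote.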
A Unified Namespace can be implemented in a number of ways with a variety of tools. United Manufacturing Hub, an open-source industrial IoT and manufacturing data platform, is an example of a powerful no-cost solution that allows companies to implement unified namespace architecture in their industrial automation systems. The United Manufacturing Hub project includes both software and hardware components to enable the retrofit of production plants by plug-and-play as well as to integrate existing machine PLCs and IT systems. The platform allows organizations to build end-to-end solutions for various questions in manufacturing such as the optimization of production through OEE analysis, preventive maintenance through condition analysis and quality improvement through stop analysis.
The unified namespace, however, records and presents only the current, real-time state of every process, application or data stream. In order to access historical time series data, another system component is required: the data lake.
What is Data Lake?
A data lake collects and stores all the data across all industrial automation system components. It stores and processes information from sensors and infrastructure elements, including SCADAs, ERPs and MES (manufacturing execution systems), in all native formats and structures (structured, semi-structured and unstructured).
Eventually, Data Lake and Unified Namespace are expected to replace data historians in their current form, creating a new generation of data management environment in industrial automation solutions.
That being said, data historian capabilities, in one form or another, will remain an essential part of the industrial automation technology stack.
As the evolution of data historians into a new technological state continues to unfold, let’s review the current state of affairs in this technology field.
Modern-day industrial facilities are incredibly complex structures that include a multitude of systems, hardware components and software solutions running together and exchanging data to support all the work processes. This is why data historians are most often used by industrial engineers, plant supervisors and manufacturing process experts to collect information from machines, instrumentation and various other facility components for supervision, maintenance planning, identification of malfunctions and inefficiencies, as well as other analytical purposes.
The main application of data historians is to automatically collect and store time-series data from various sensors across an industrial automation system. Data collected by historians can then be used by other software solutions for advanced analysis, report generation, visualization, real-time equipment tracking, predictive maintenance alerts and many other needs.
Data historians can be integrated with a variety of software systems and solutions. Most often, the integration and data exchange are conducted through APIs (application programming interfaces) or SDKs (software development kits) provided by the vendors.
The original data historians were basically quite simple proprietary databases, deployed and stored locally on a computer and collecting data from various sources around the plant. Modern data historians, however, are naturally much more complex and diverse. Deployment architectures used in historians today support both cloud and on-premise deployment, which can be centralized or decentralized, with a number of local historians sending data to a main one.
They also utilize various advanced data management and storage technologies that allow historians to provide multiple functionalities beyond just collecting and storing data.
Let’s look at the most common and widely implemented features of modern data historians.
Collecting data from a variety of systems, machines, hardware components and any other instruments equipped with sensors is the main purpose of data historians. There is no shortage of options when it comes to the types of industrial data to be collected by a historian. It can be pretty much any process manufacturing-related metric that can be tracked and measured with one or more sensors. This includes, for example, water levels in coolers, vibrations of motors and turbines, conveyor speed, the amount of product ingredients mixed in a tank, and many other data items.
Data storage and archiving
Since historians typically have to collect very large volumes of data, storing it efficiently, securely and reliably is not a trivial task. Modern historians rely on specialized algorithms to compress data as much as possible. Various compression technologies allow historians to store archives with a lot of data over long time periods using relatively little disk space and few resources. The most recent data is typically cached in local memory before being transferred to permanent hard drive storage. This is done because analysis is typically performed on recent data in order to gain valuable insights. Keeping this data in cache makes it easily available for such applications, while older data is compressed and archived.
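One common family of techniques behind this kind of compression is deadband filtering: a new sample is archived only if it differs from the last archived value by more than a configured tolerance. The sketch below shows the simplest variant (commercial historians often use more sophisticated schemes, such as swinging-door compression); the deadband value and sample data are illustrative assumptions.

```python
def deadband_compress(samples, deadband):
    """Keep a (timestamp, value) sample only when it moves more than
    `deadband` away from the last archived value; the first sample
    is always kept so the archive has a starting point."""
    if not samples:
        return []
    archived = [samples[0]]
    for ts, value in samples[1:]:
        if abs(value - archived[-1][1]) > deadband:
            archived.append((ts, value))
    return archived


# Six raw temperature-like readings; small jitter is discarded.
raw = [(0, 20.00), (1, 20.02), (2, 20.03), (3, 20.50), (4, 20.51), (5, 21.10)]
print(deadband_compress(raw, deadband=0.1))
# Only three of the six samples survive: [(0, 20.0), (3, 20.5), (5, 21.1)]
```

The trade-off is that intermediate values inside the deadband are lost, which is why the tolerance has to be chosen per signal according to the precision the analysis actually needs.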
Data forwarding, in addition to storage and archiving, is an important feature of historians. The purpose of forwarding is to make sure that all the data is collected and saved without any pieces missing, as gaps in data can affect the results of the analysis. Pieces of data can be lost when the connection between an on-premise historian and the remote database is lost or interrupted. When such an event occurs, the historian starts storing the data locally and forwards it to the central server when the connection is available again.
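This store-and-forward behavior can be sketched as a local buffer that fills while the link is down and is flushed in arrival order once the connection returns. `send_to_server` here is a stand-in for whatever remote write the historian actually performs; the sample values are illustrative.

```python
from collections import deque

class StoreAndForward:
    """Buffer samples locally while the remote link is down,
    then flush them oldest-first when it comes back up."""

    def __init__(self, send_to_server):
        self._send = send_to_server  # callable that writes to the central historian
        self._buffer = deque()
        self.connected = True

    def record(self, sample):
        if self.connected:
            self._send(sample)
        else:
            self._buffer.append(sample)  # keep locally so no gap appears

    def reconnect(self):
        self.connected = True
        while self._buffer:              # forward the backlog in order
            self._send(self._buffer.popleft())


received = []                            # stands in for the central server
forwarder = StoreAndForward(received.append)
forwarder.record((0, 1.0))
forwarder.connected = False              # simulate a dropped link
forwarder.record((1, 2.0))
forwarder.record((2, 3.0))
forwarder.reconnect()                    # backlog is flushed
print(received)                          # [(0, 1.0), (1, 2.0), (2, 3.0)]
```

Because every sample carries its own timestamp, the central archive ends up identical whether the data arrived live or from the backlog.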
Modern-day data historians typically include a specialized software interface, separate from the main product. Historian interfaces are designed to serve a number of purposes, either augmenting the data collection functionality of a historian or supporting the process of data forwarding and security. An interface can be used for closer monitoring of certain hardware components and sensors in cases when the main data historian is deployed in a remote location or in the cloud. In such a case, the interface maintains a close and stable connection with the sensor, securing the data collection and sending the data to the central historian, as well as acting as a secondary data archive when needed. Each interface point can be configured either to receive raw data directly from a unique source point or to create archives of this data.
Alarm and event management
Modern-day historians are also frequently used as alarm management solutions in industrial facilities, helping to ensure the safety and efficiency of all operations. Being able to collect alarm and event data from various control systems, a historian can be used to identify and eliminate process errors, malfunctions and incorrectly configured operations. In this application, a data historian is able to monitor various custom alarms, as well as operators’ responses to these events, and produce reports.
Validating time series data is another application enabled by process historians. In addition to retrieving data from a large variety of instrumentation and control equipment, historians normally store metadata that contains information about data sources, measurements and other relevant details. When it comes to analyzing the large data archives gathered by a historian, not all of the data may be relevant or in line with research requirements. To ensure that only the right data is measured and analyzed, data validation is performed based on the metadata (also called tags or points) collected by the historian.
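Tag-based validation like this can be sketched as filtering archived points by their metadata before pulling any values into the analysis. The tag names and metadata fields below (units, source, validated) are illustrative assumptions; real historians each define their own tag schema.

```python
# Hypothetical tag metadata keyed by point name.
tags = {
    "FT-101": {"units": "m3/h", "source": "PLC-1", "validated": True},
    "FT-102": {"units": "m3/h", "source": "PLC-1", "validated": False},
    "TT-201": {"units": "degC", "source": "PLC-2", "validated": True},
}

def select_points(tags, **criteria):
    """Return point names whose metadata matches all given criteria,
    so only relevant, validated data enters the analysis."""
    return sorted(
        name for name, meta in tags.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )

print(select_points(tags, validated=True))                # ['FT-101', 'TT-201']
print(select_points(tags, units="m3/h", source="PLC-1"))  # ['FT-101', 'FT-102']
```

Filtering on metadata first, rather than inspecting raw values, keeps the validation step cheap even when the archive itself holds years of samples.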
Data analysis and visualization
Finally, many data historians today include a variety of features for data analysis, interpolation and visualization. The most common components of this sort that come prebuilt with historians are analysis equations, data sheets, tables, charts and other visualization elements.
There is no shortage of data historians from multiple vendors available on the market today. Here are some of the most popular and widely used products in the industrial automation field.
Canary Historian is another data historian that was among the market pioneers. Created in the late 1980s by Canary Labs, a developer of enterprise data management and trending software, it has been installed over 19,000 times in more than 65 countries since its original release.
Canary Historian is a NoSQL time series database that uses lossless compression algorithms for high performance and data security. According to the developers, Canary Historian can maintain a continuous read speed of more than 2.5 million reads per second. Additionally, it can handle high-speed data logging, with deployments reaching data resolutions as fast as 10 milliseconds.
Canary Historian can be installed locally on-site as well as at remote corporate locations and is capable of automatically moving data from the site level to the corporate level in real time or on a schedule. The Canary Mirror Service feature allows Canary to send dataset snapshots on an hourly, daily, weekly, or monthly schedule for data duplication to offsite historians.
AVEVA Historian (formerly Wonderware) is one of the oldest data historians on the market: the original version of Wonderware was released in 1987 by Wonderware Corporation, co-founded in California by software engineers Dennis Morin and Phil Huber. Wonderware changed owners several times over the years and is now owned by AVEVA Group, following the merger of AVEVA with Schneider Electric Software in 2018.
The modern version of AVEVA Historian is a high-performance data historian that provides advanced data storage and compression capabilities, along with an industry-standard query interface to enable easy access to the data. The solution is able to collect both time-series process data and alarm and event data.
One of AVEVA Historian’s distinctive features is a block technology that captures plant data much faster than a standard database system while using very small amounts of storage space. AVEVA also supports the management of low bandwidth data communications, late coming information, and data from systems with mismatched system clocks.
Over its decades on the market as Wonderware, this historian acquired thousands of customers across economic sectors. Today, AVEVA Historian is widely used in many industries, including food and beverage, chemicals, energy, oil and gas, power generation, automotive and others.
As we mentioned earlier when discussing the origins of data historians, OSIsoft is the pioneer in this field. The OSI PI Data Historian, part of OSIsoft’s real-time data management software suite called the PI System, is considered to be the first process historian on the market. Originally founded back in 1980, OSIsoft was acquired by AVEVA in a deal worth close to $5 bln that was officially completed in 2021. Following the acquisition, the two companies announced plans to gradually combine their product portfolios over time. As of now, however, OSIsoft’s PI System is still available as a standalone product.
PI System is known to be one of the most widely used industrial data management solutions in the world. Analysts estimate that different elements of the PI System are deployed at more than 20,000 sites worldwide, managing data flows from more than 2 billion real-time sensors.
PI Data Historian is one of the core parts of PI System. Composed of multiple software application components, PI System provides open infrastructure to connect sensor-based data from different elements of enterprise infrastructure. Specifically, PI Data Historian is capable of collecting, processing, archiving, finding, analyzing, delivering, and visualizing various kinds of real-time data and events.
Factry Historian is an example of the new generation of historians that started to emerge in the 2010s, fueled by quickly growing market demand for data management solutions in the era of IoT and Big Data. Factry Historian is developed and maintained by Factry, a Belgian software company founded in 2016.
Factry Historian is a cloud-based data management platform for collecting, storing and visualizing industrial process data. The solution is focused on helping businesses transform raw production data into actionable, performance-enhancing visual insights. Factry Historian incorporates an asset-based event module able to automatically create records in a SQL database that correspond with critical production events, such as batches, recipe changes, orders, or downtime. This historian also provides users with built-in access to the Grafana visualization application, allowing them to create custom dashboards and use ready-made dashboard templates.
Proficy Historian by GE Digital, a subsidiary of the General Electric multinational conglomerate focused on providing software and IIoT solutions to industrial companies, is another widely used data historian. Proficy Historian is able to collect industrial time-series and A&E (alarms and events) data at high speed, and to store and distribute it. The solution also includes features enabling fast data retrieval and analysis.
Data analysis and visualization are powered by the Proficy Operations Hub and Historian Analysis run-time applications, which are licensed with Proficy Historian. The combination of Proficy Historian and Proficy Operations Hub allows users to aggregate data across multiple data sources and historians, define an asset model via tag mapping and other methods, and perform advanced trend analysis. To ensure a high level of data security, Proficy Historian offers a common, shared User Account Authentication feature, allowing users to centralize authentication across all GE Digital enterprise applications provided under the Proficy Software brand.
FactoryTalk Historian by Rockwell Automation is based on the PI Server solution from OSIsoft. It was developed by Rockwell as a specialized data historian for use with the company’s wide portfolio of industrial automation software and hardware products. This is why FactoryTalk Historian shares many of its features and technical capabilities with the OSI PI Data Historian.
Specifically, FactoryTalk Historian provides high-speed collection, organization, and storage of critical plant performance data for supervisory control, performance monitoring and quality assurance. This data historian includes data analysis and visualization features, allowing users to easily convert raw machine historical data into dashboards and reports that can be shared among different people who should have access to this information. FactoryTalk Historian also has archiving functionality for long-term data storage with fast and efficient data retrieval.
Designed primarily for integration with Rockwell Automation’s products, FactoryTalk Historian includes a library of process object templates to simplify its deployment. Users only need to define tags within a template for FactoryTalk Historian to automatically search for similar objects and preconfigure them. The solution also supports anonymous connections using the PI-SDK.
Deciding whether you need to install a historian for gathering, storing, and accessing data from sensors installed on industrial machinery and control systems can be quite a non-trivial task. Naturally, a process historian will be of great value for a large number of industrial applications in analytics, investigation, and reporting.
The most typical use cases and applications for data historians, such as performance monitoring, alarm and event management, reporting and analytics, can all be viewed as reasons to think about implementing one.
Let’s say you made up your mind about needing to install a historian, or some other kind of time series data solution. At this point, the task gets even more difficult as today the choice of data historians available for both cloud and on-premise deployment is abundant as never before.
With such diversity, it is very easy to make the wrong choice. At a general level, you face two extremes: either picking an excessively complex and feature-rich data historian that leads to very high license and deployment costs, or choosing a tool that isn’t able to provide the functionality your needs require. Such mistakes typically result in undue expenses and hindered performance.
In light of the above, let’s go through the most important evaluation criteria to use when selecting a data historian.
Ease of deployment and use
Ease of deployment and use is one of the primary and most fundamental criteria to look at when selecting a process historian or an alternative time series data solution. As already mentioned, data historians can differ greatly from each other in terms of complexity in its various forms, be it internal software architecture, UI usability or integration requirements. Make sure you have enough time and/or manpower to properly integrate, maintain and actively use a data historian for all the required purposes and systems.
The excessive complexity of deploying traditional data historians and other time series data solutions, as well as difficulties with extracting and analyzing the data they collect, was one of the primary motivations for the founders of the Clarify platform. And we are proud to say that when it comes to minimizing the time and effort required for installation and configuration, hardly any other data analytics tool can outperform Clarify. You can try a free version of Clarify to see for yourself how easy it is to set up, configure and integrate with data sources across formats and systems.
Integration and data access
Since any time series database needs to be integrated with a variety of systems and external services, both to collect data and to forward it for further use, integration and data access capabilities are another very important aspect of evaluating data historians. Generally speaking, extracting and analyzing industrial automation data today is much easier than it was 10 or even just 5 years ago, thanks to a variety of new SaaS data platforms, such as Clarify, that have been released in recent years and are continuously improving. That being said, you should not expect all data historians to be as focused on simple integration with third-party platforms and effortless import and export of data as Clarify or other modern cloud-based solutions. Some data historians can be quite problematic when it comes to integration with many SCADAs, ERP systems or third-party data analytics tools. So you should check whether a historian supports quick input and output connections via web APIs, as well as SQL queries and other technologies that may be necessary for its deployment.
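To make the SQL-access criterion concrete, here is a sketch of the kind of query a historian's SQL interface typically supports: aggregating one tag's values over a time window. The table and column names are illustrative assumptions, and an in-memory SQLite database stands in for the historian's query endpoint.

```python
import sqlite3

# In-memory stand-in for a historian's SQL-accessible archive;
# the schema (tag, ts, value) is a common but hypothetical layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (tag TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [("FT-101", 0, 10.0), ("FT-101", 60, 12.0), ("FT-101", 120, 14.0),
     ("TT-201", 0, 85.0)],
)

# Average value of one tag over a time window, via a parameterized query.
row = conn.execute(
    "SELECT AVG(value) FROM history WHERE tag = ? AND ts BETWEEN ? AND ?",
    ("FT-101", 0, 120),
).fetchone()
print(row[0])  # 12.0
```

If a candidate historian cannot answer this sort of windowed, per-tag query through SQL or a web API without vendor-specific tooling, integrating it with third-party analytics platforms will be harder.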
On-premise vs. cloud deployment
Whether to deploy a data historian locally on-premise or in the cloud is one more decision that you would need to make. Even though we are living in the era of cloud technologies, and using a remote database that is easily accessible from multiple locations is definitely convenient, both approaches have their sets of advantages and disadvantages. Cloud deployment provides benefits such as fast and easy access to data, simple integration with third-party tools and technologies, lower costs, scalable storage space and computing power, platform-independence, and some others. On-premise deployment also has its strengths: if the security of data is critical, then keeping the data historian on a local server could be a better choice. On-premise deployment also would be the choice to make in cases when a reliable and uninterrupted connection is required — for real-time tracking of equipment components, monitoring of processes or alarming.
Features and technical capabilities, supported platforms
On a more obvious note, you should definitely look at the list of features and technical capabilities supported by a data historian to understand whether the solution fits your specific requirements. Data analytics, report creation and visualization are, in addition to data collection, among the most common components modern-day users look for in time series data solutions. But not every data historian actually has these features, and some of the additional capabilities may not be powerful enough to satisfy your real-time analytics and visualization needs.
The support of multiple platforms, both desktop and mobile, is another potentially important parameter that can be overlooked in the process of a time series data solution evaluation. Most modern-day SaaS data intelligence tools, including Clarify, offer a full web version, which can be accessed from any modern browser, as well as mobile applications for iOS and Android.
User collaboration is a set of capabilities that is frequently either missing from data historians or not provided at full scale. The ability for multiple users in different roles to simultaneously access the same set of data, exchange comments and work on it concurrently shouldn’t be underestimated, as it can save you a lot of time and effort. This is why advanced and extensive collaboration capabilities have been among the primary areas of focus for Clarify developers. For example, Clarify allows users who are part of the same organization to tag each other directly in the data timeline, start a thread, or log incidents. The tool also supports adding media files to data for more context, as well as searching and reviewing previous activities in order to avoid solving the same issue twice. Other collaboration features, such as commenting, sharing and personal notifications, are also available.
Support, documentation and user community
The availability and quality of detailed documentation in various forms, as well as customer support, is especially important for complex, feature-rich data historians that need to collect and organize truly large volumes of data from multiple sources. In such cases, responsive and highly qualified external help is very important, as it can save you a lot of time and money at the deployment stage. The size and activity of the user community is something to look at as well. In this respect, the most popular data historians with the highest numbers of active users typically have bigger user communities, producing more user-created tips and other helpful information. But this isn’t always the case, as sometimes solutions with a relatively small user base have very active and friendly user communities that are a great source of help and advice.
Last but not least, the costs. The final price is always among the most crucial criteria for enterprise tools and solutions. When it comes to data historians, there can be multiple hidden costs related to implementing and using one, and failing to account for some of them frequently leads to going over budget, which is never a good thing. The licensing price is normally the most straightforward part of data historian-related expenses. However, if the licensing model is flexible and tied to the consumption of data capacity, the final price can easily get out of control.
The deployment of a data historian can be another major expense, significantly adding to the overall costs. In cases when the deployment process is complex and problematic, it takes a lot of time and effort, sometimes lasting for months, or even more than a year for large facilities and organizations. Finally, you should consider the costs of long-term maintenance and support of your time series data solution, as they also tend to increase over time. On-premise data historians, for obvious reasons, are typically more expensive and effort-consuming than cloud-based ones.
Some data historian SaaS solutions can end up being extremely expensive, especially for medium and large organizations with serious requirements. Clarify is fully transparent about its pricing and available to all users, both individuals and enterprises of all sizes, at the same affordable subscription fee. The default Team subscription plan is available at €49 per month and includes licenses for 5 users within one organization. There is also a custom Business subscription plan available. The end costs can conveniently be calculated on the same Pricing page on Clarify’s website based on the number of users and data signals required. Additionally, there is a fully functional free version of Clarify available for everyone.
The deployment of a data historian typically requires connecting it to a multitude of data sources and external components, which is why the final decision on which data historian to implement often has a serious and long-lasting effect on an organization. Once implemented, such a solution can remain part of the industrial automation infrastructure for years, and sometimes even decades.
Any modern-day data historian has its limitations and weaknesses.
Vetted by reputable industry experts, such as Walker Reynolds, Clarify is an affordable and simple data intelligence platform that can augment your data historian or enable you to properly utilize process manufacturing data collected over the years.
Clarify can be easily integrated with the majority of data historians from all vendors, including the ones that only support on-premise deployments, allowing you to combine data from multiple time series databases, visualizing and accessing it in real time. Clarify also simplifies the process of connecting third-party data science tools and applications to your data for advanced analysis.
Regardless of your data management requirements, the Clarify platform is a versatile solution that can be used as a universal intermediary tool, augmenting your data management infrastructure and solving challenges with processing, integrating and visualizing time series data across industrial automation systems and software components.