Time series data is omnipresent: you can encounter it in almost any domain. Sensor readings, system monitoring, weather forecasts, stock prices, exchange rates and application performance metrics are just a few examples.
In this article we will summarize the key knowledge you need in order to use time series data to gain business insights or conduct a study. We will cover the main concepts related to collecting, organizing and using it, including its types and formats and the ways to collect and store it, and then dive into the fundamentals of analysis and visualization techniques.
Let’s begin by clearly defining what time series data actually is, as well as what it isn’t. Basically, time series data is any type of information presented as a sequence ordered in time.
It can be defined as a collection of observations for a single subject assembled over different, generally equally spaced, time intervals. Or, to put it simply, a time series is data (observations or behavior) collected at different points in time and organized chronologically.
Time is the central attribute that distinguishes time series from other types of data. The regular interval at which observations are recorded, such as hourly or daily, defines the frequency of a time series. Having time as one of the main axes is therefore the main indicator that a given dataset is a time series.
Now let’s clarify what kinds of data are not time series to get this out of the way. Even though it is very common to use time as an axis in datasets, not all collected data is time series. Data can be divided into several types based on time or, in other words, how and when it was recorded.
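Here are three basic types of data, divided by the role of time in a dataset presentation:
- Time series data: observations of a single subject collected at different points in time.
- Cross-sectional data: observations of various subjects collected at a single point in time.
- Pooled and panel data: a combination of the two, with observations of multiple subjects collected at multiple points in time.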
Since time series are used for multiple purposes and in various fields, the datasets can vary. Here are the three main characteristics that would allow you to easily identify such datasets:
Time series data comes in many different types and categories, which we will discuss in more detail later. Based on the approach to measuring it, however, it can be divided into two main categories: flow and stock.
Flow time series data measures the activity of an attribute over a given period, which is typically an interval of the overall time axis (for example, the total sales made each month).
Stock time series data, on the other hand, measures an attribute at a specific point in time (for example, the inventory level at the end of each month).
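To make the distinction concrete, here is a minimal Python sketch with hypothetical numbers. Daily net deposits into an account are a flow (activity during each day), while the end-of-day balance is a stock (the level at a point in time). Assuming an opening balance, each stock value is the previous one plus that period’s flow, and differencing the stock series recovers the flows.

```python
from itertools import accumulate

# Hypothetical daily net deposits into one account: a flow
# (activity measured over each day).
daily_net_deposits = [100, -40, 250, 0, -60]

# Assumed opening balance before the first day.
opening_balance = 1000

# End-of-day balances: a stock (the level at a point in time).
# Each stock value equals the previous stock plus that day's flow.
end_of_day_balances = list(accumulate(daily_net_deposits, initial=opening_balance))[1:]

# Differencing consecutive stock values recovers the flows.
previous = [opening_balance] + end_of_day_balances[:-1]
recovered_flows = [b - a for a, b in zip(previous, end_of_day_balances)]

print(end_of_day_balances)  # [1100, 1060, 1310, 1310, 1250]
print(recovered_flows)      # [100, -40, 250, 0, -60]
```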
It wouldn’t be an overstatement to say that data, and our ability to collect and organize it with unprecedented speed and efficiency, is a key driver of technological development. Putting the data we gather to use allows us to build better systems, improve the efficiency of virtually any process and forecast future events.
In today’s world, time series models are truly ubiquitous. Datasets with a time axis are widely used across various industries, from finance and economics to the sciences, healthcare, weather forecasting and engineering.
As data rapidly becomes one of the most valuable commodities in today’s business world, the importance of time series data is increasing as well. This is why it is so important to understand what it is, and how using data in the right way can allow an organization to improve and optimize virtually all work processes.
Since time is a fundamental part of our lives, it plays a role in gathering almost any observable information. In other words, it is fair to say that time series data can be found everywhere. And it is becoming even more ubiquitous as the Fourth Industrial Revolution unfolds and organizations across the globe implement more and more IT systems, sensors and tools, all of which produce and/or collect data.
Being able to access and analyze data for insights is quickly becoming a necessity for almost any organization, including businesses as well as social and educational institutions. Time series data is a valuable commodity, and its value keeps growing.
Given how omnipresent it is, time series data has a myriad of applications across industries. To help you understand how it is used in practice, here are several examples.
Time series data can be divided into a number of types based on different criteria. Knowing the differences between the types of data can be very important as it affects the way you can interact with it and how the database can store and compress this data. Let’s go through some of the most common ways to categorize it, with examples and illustrations.
One important characteristic of time series data is the regularity of the measurements collected in the series. Depending on whether the data is collected at regular or irregular time intervals, it can be classified as metrics or events.
Metrics are measurements gathered at regular time intervals.
Events are measurements gathered at irregular time intervals.
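In practice, you can often tell the two apart from the timestamps alone. The sketch below treats a series as metrics-like when the gaps between consecutive timestamps are equal within a tolerance, and events-like otherwise. The function name is_regular(), its tolerance parameter and the sample timestamps are illustrative assumptions.

```python
from datetime import datetime, timedelta

def is_regular(timestamps, tolerance=timedelta(0)):
    """Return True if sorted timestamps are evenly spaced (metrics-like)
    and False if the gaps between them vary (events-like)."""
    if len(timestamps) < 3:
        return True  # fewer than two gaps, so there is nothing to compare
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps) - min(gaps) <= tolerance

start = datetime(2024, 1, 1)
# Hypothetical temperature readings taken every 10 minutes (metrics).
temperature_times = [start + timedelta(minutes=10 * i) for i in range(6)]
# Hypothetical ATM withdrawals, recorded whenever a customer shows up (events).
withdrawal_times = [start + timedelta(minutes=m) for m in (3, 4, 41, 42, 43, 97)]

print(is_regular(temperature_times))  # True
print(is_regular(withdrawal_times))   # False
```

A small tolerance is useful for real sensors, whose sampling clocks often jitter by a few milliseconds or seconds.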
Data that is evenly distributed across time, and can therefore be modeled or used for processes such as forecasting, is called metrics. You need metrics in order to perform modeling, forecasting and aggregation of all kinds.
Here are several examples of metrics.
Heart monitoring (ECG) and brain monitoring (EEG)
The electrocardiogram (ECG), which records the electrical activity of the heart, the electroencephalogram (EEG), which records the electrical activity of the brain, and other similar health monitoring methods are examples of metrics, as they measure the activity of human organs at regular time intervals.
Weather monitoring of all kinds, such as daily temperature, wind, air pressure and the heights of ocean tides, is another example of metrics. Such data is used to build models that allow us to predict future changes in the weather.
Stock price changes
Stock prices are also commonly recorded at regular time intervals, for example as the closing price of each trading day.
Events are gathered at irregular time intervals. Because events occur unpredictably, the intervals between them are inconsistent, which makes raw event data unsuitable for forecasting and modeling methods that assume evenly spaced observations: applying those methods directly would lead to unreliable projections.
In simpler words, events data captures a record whenever something happens, which can be at varying time intervals or in bursts.
Here are several examples of events.
Bank account deposits/withdrawals
Financial monitoring of bank accounts and ATMs is one example of an events time series: the data gathered consists of deposits and/or withdrawals, which typically occur at unpredictable times.
In computing systems, log files record the various kinds of events that occur in an operating system and other software, or the messages exchanged between users of communication software. Since all of this happens at irregular time intervals, the information in logs can be classified as events.
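Raw events are irregular, but a common way to make them usable for aggregation and charting is to count (or sum) them over fixed-width windows, which turns an events series into a regularly spaced metrics series. Here is a minimal standard-library sketch; the helper bucket_counts() and the login timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def bucket_counts(event_times, start, end, width):
    """Count events in consecutive fixed-width windows covering [start, end).
    Empty windows are reported as 0. Assumes end - start is a multiple of width."""
    n_windows = (end - start) // width
    counts = Counter((t - start) // width for t in event_times if start <= t < end)
    return [counts.get(i, 0) for i in range(n_windows)]

start = datetime(2024, 1, 1, 9, 0)
# Hypothetical login events written to a log at irregular times.
logins = [start + timedelta(minutes=m) for m in (2, 5, 7, 31, 58, 59, 125)]

hourly_logins = bucket_counts(logins, start, start + timedelta(hours=3), timedelta(hours=1))
print(hourly_logins)  # [6, 0, 1]
```

Reporting empty windows as 0, rather than skipping them, is what makes the result evenly spaced.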
Another way to classify time series data is by the functional form of the model used to describe it, which divides it into linear and nonlinear time series.
In a linear time series, each data point can be expressed as a linear combination of past values plus a random error term. In other words, a linear time series is generated by a linear equation and can be modeled, for example, as a linear AR (autoregressive) model.
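In general form, an AR(p) model writes each value as X_t = c + φ_1·X_(t-1) + … + φ_p·X_(t-p) + ε_t, where ε_t is random noise. The minimal Python sketch below simulates the simplest case, AR(1), with an assumed coefficient φ = 0.7 and a fixed random seed, then recovers φ by ordinary least squares regression of each value on the previous one. It illustrates the idea only; dedicated statistics libraries provide proper estimation and diagnostics.

```python
import random

def simulate_ar1(phi, n, c=0.0, noise_sd=1.0, seed=42):
    """Simulate X_t = c + phi * X_(t-1) + e_t with Gaussian noise e_t, starting at 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, noise_sd))
    return x

def estimate_ar1(x):
    """Estimate (c, phi) by ordinary least squares regression of X_t on X_(t-1)."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    return mean_curr - phi * mean_prev, phi

series = simulate_ar1(phi=0.7, n=5000)
c_hat, phi_hat = estimate_ar1(series)
print(round(phi_hat, 2))  # close to 0.7; the exact value depends on the random draw
```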
A simple intuition for linearity comes from the relationship between two variables, X and Y: if every time X increases by 1, Y changes by the same constant amount, the relationship is linear and its graph is a straight line.
Here’s an example of a linear graph.
Nonlinear time series, on the other hand, are generated by nonlinear dynamic equations and have features that can’t be modeled by linear processes, such as breaks, thresholds, asymmetric cycles and changing variance. This is why modeling and processing nonlinear time series data is typically much more difficult and requires more complex models.
Here’s an example of nonlinear data:
Moving on, let’s take a look at several key aspects of time series data that you need to understand in order to work with it.
One approach that is fundamental to time series data, and to computing systems in general, is immutability. It is quite a simple concept to understand.
Immutability is the idea that data (or objects, in functional programming languages) should not be changed once a record has been created. Instead, new records are only added to the existing data.
As new time series data arrives in time order, it is recorded as a new entry. In line with the immutability principle, time series entries are usually not changed after they have been recorded; they are simply appended to all the previous records in the database.
It should now be easy to see why immutability is a core concept for time series data. A time series database is typically treated as a time-ordered series of immutable objects. Building in immutability keeps the data consistent: since existing records never change, a query for a particular point in time always returns the same result.
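To illustrate the principle (not how any particular TSDB implements it), here is a hypothetical append-only series in Python: points can only be added in time order, nothing is ever modified or deleted, and a point-in-time query therefore returns the same answer no matter how much data arrives later. Note that Python cannot truly prevent mutation of the internal lists; the leading underscore only signals that they are private.

```python
from bisect import bisect_right
from datetime import datetime

class AppendOnlySeries:
    """Illustrative append-only time series: no updates, no deletes."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, ts, value):
        # Enforce chronological order: a new point may not predate the last one.
        if self._times and ts < self._times[-1]:
            raise ValueError("points must be appended in time order")
        self._times.append(ts)
        self._values.append(value)

    def as_of(self, ts):
        """Return all points recorded at or before ts as an immutable tuple."""
        i = bisect_right(self._times, ts)
        return tuple(zip(self._times[:i], self._values[:i]))

s = AppendOnlySeries()
s.append(datetime(2024, 1, 1, 12, 0), 21.5)
s.append(datetime(2024, 1, 1, 12, 5), 21.7)
snapshot = s.as_of(datetime(2024, 1, 1, 12, 1))
s.append(datetime(2024, 1, 1, 12, 10), 22.0)

# New data does not change what a past point-in-time query returns.
print(snapshot == s.as_of(datetime(2024, 1, 1, 12, 1)))  # True
```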
Immutability as a core principle is something that differentiates time series data from relational data. Unlike time series data, relational data, which is stored in relational databases, is typically mutable, as it is used for online transaction processing and other workloads where new events occur on a generally random basis and existing records are routinely updated. Although these events still occur in chronological order (no event can exist outside of time), time is not central to such data and is therefore not used as an axis. For time series data, on the other hand, changes over time are always essential.
Now let’s talk in more detail about the three basic types of data that we mentioned previously, and how time series data is different from cross-sectional, pooled and panel data. Understanding these concepts is another key to getting the most out of time series data.
As we learnt, time series data is a collection of observations for a single subject assembled over different time intervals. Having time as an axis is a distinctive feature of time series data.
Now let’s compare it to cross-sectional and pooled/panel data.
Cross-sectional data is the opposite concept: it consists of observations of various subjects collected at a single point in time. Because cross-sectional data has no natural ordering of observations, the data can be entered in any order.
Data on the maximum temperature, wind or humidity across different locations on a given day is a good example of cross-sectional data, as are the closing prices of different stocks on a given day, or the sales of each item in a store’s inventory on a given day.
Pooled data (also called cross-sectional time series data) and panel data (also called longitudinal data) are two closely related concepts, both used to describe data that combines time series and cross-sectional measurements.
The only difference between them is in how they treat the units (entities) around which the data is collected, such as companies, cities or industries.
Panel data refers to multi-dimensional data that uses samples of the same cross-sectional units observed at multiple points in time.
Pooled data uses random samples of cross-sectional units in different time periods, so each cross-sectional sample may contain different units.
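One way to see the difference is to look at which units appear in each period. The rough heuristic below labels (unit, period, value) records as cross-sectional, time series, panel or pooled. It is an illustrative sketch with hypothetical store data, not a formal test: an unbalanced panel, where some units are missing in some periods, would also be labeled pooled.

```python
from collections import defaultdict

def describe_layout(records):
    """Roughly classify non-empty (unit, period, value) records by how
    units and periods combine. A heuristic for illustration only."""
    units_by_period = defaultdict(set)
    for unit, period, _value in records:
        units_by_period[period].add(unit)
    all_units = set().union(*units_by_period.values())
    if len(units_by_period) == 1:
        return "cross-sectional"  # many units, one point in time
    if len(all_units) == 1:
        return "time series"  # one unit, many points in time
    unit_sets = list(units_by_period.values())
    if all(units == unit_sets[0] for units in unit_sets):
        return "panel"  # the same units in every period
    return "pooled"  # the units differ between periods

# Hypothetical yearly sales figures for a few stores.
cross_section = [("store_a", 2023, 120), ("store_b", 2023, 95)]
single_store = [("store_a", 2021, 110), ("store_a", 2022, 115), ("store_a", 2023, 120)]
panel = [("store_a", 2022, 115), ("store_b", 2022, 90),
         ("store_a", 2023, 120), ("store_b", 2023, 95)]
pooled = [("store_a", 2022, 115), ("store_b", 2022, 90),
          ("store_c", 2023, 70), ("store_d", 2023, 130)]

for name, data in [("cross_section", cross_section), ("single_store", single_store),
                   ("panel", panel), ("pooled", pooled)]:
    print(name, "->", describe_layout(data))
```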
Undoubtedly, these data type explanations can still leave some people confused. Here’s an easy way to differentiate these data types from each other.
Time series data is typically stored in time series databases (TSDBs), which are specifically built or optimized for working with timestamped data, be it metrics or events. Since time series data is often monitored continuously and collected in huge volumes, it needs a database that can handle massive amounts of data.
TSDBs also need to support functions that are crucial for working efficiently with time series records, such as data summarization, data lifecycle management and time-related queries.
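Each TSDB implements these functions internally, with its own query syntax. Purely to illustrate what data summarization and lifecycle management mean, the sketch below downsamples raw points into per-minute averages and applies a retention window that discards points older than a cutoff. The function names, the one-hour retention period and the epoch-aligned windows are assumptions of this sketch, not any database’s API.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def downsample_mean(points, width):
    """Data summarization: average (timestamp, value) points per window.
    Windows are aligned to the Unix epoch, an assumption of this sketch."""
    sums = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for ts, value in points:
        window_start = EPOCH + ((ts - EPOCH) // width) * width
        sums[window_start][0] += value
        sums[window_start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]

def apply_retention(points, now, max_age):
    """Lifecycle management: keep only points no older than now - max_age."""
    cutoff = now - max_age
    return [(ts, value) for ts, value in points if ts >= cutoff]

# Hypothetical raw readings.
raw = [
    (datetime(2024, 1, 1, 11, 0, 0), 99.0),  # older than the retention window
    (datetime(2024, 1, 1, 12, 0, 0), 1.0),
    (datetime(2024, 1, 1, 12, 0, 20), 2.0),
    (datetime(2024, 1, 1, 12, 0, 40), 3.0),
    (datetime(2024, 1, 1, 12, 1, 10), 10.0),
]
recent = apply_retention(raw, now=datetime(2024, 1, 1, 12, 3), max_age=timedelta(hours=1))
per_minute = downsample_mean(recent, timedelta(minutes=1))
print(per_minute)  # one averaged point for 12:00 (2.0) and one for 12:01 (10.0)
```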
Here is a list of features and capabilities that a purpose-built or optimized database should have in order to handle time series data well:
Now that we have learnt what time series data is and how it is stored, the next logical step is to talk about how it is actually used. This is where time series analysis comes into play.
Time series analysis refers to a specific way of analyzing a time series dataset (a sequence of data points gathered over a period of time) to extract insights, meaningful statistics and other characteristics of the data.
Naturally, time series analysis requires time series data, which has a natural temporal ordering and has to be recorded at consistent time intervals. This is something that differentiates time series analysis from cross-sectional data studies where data is tied to one specific point in time and can be entered in any order.
You should also not confuse time series analysis with forecasting. Forecasting is one type of time series analysis: it uses historical time series data to make predictions about future values.
Time series analysis relies on identifying and observing patterns in the data, which are then used to extract actual information from a dataset with the help of one of the available models.
Here are some of the most common patterns observed in time series data.
Detecting these and other patterns by applying various models to time series data is how you can use time series analysis to achieve different goals.
Here are some of the most common types of time series analysis that can be utilized depending on the end-goal of your study.
As we learnt earlier, forecasting uses historical time series data to make predictions. Historical data serves as the basis for a model of the same data in the future, which predicts various kinds of scenarios as future data points.
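Forecasting methods range from very simple to highly sophisticated. As one small illustration, chosen here for brevity, simple exponential smoothing predicts the next value as a weighted average of past observations, with weights that decay exponentially with age. The smoothing parameter alpha and the sample history are assumed inputs.

```python
def ses_forecast(values, alpha):
    """Simple exponential smoothing: return the forecast for the next period.
    level_t = alpha * x_t + (1 - alpha) * level_(t-1), starting from the first value."""
    if not values:
        raise ValueError("need at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level  # SES forecasts are flat: this value applies to every future step

history = [10, 12, 11, 13]  # hypothetical weekly sales
print(ses_forecast(history, alpha=0.5))  # 12.0
```

A larger alpha makes the forecast react faster to recent changes; a smaller one smooths out more noise.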
Descriptive analysis is the main method used to identify the above-described patterns in time series data (trends, cycles, and seasonality).
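As a sketch of one classic descriptive technique, additive decomposition, the code below splits a hypothetical daily series with a linear trend and a weekly (period 7) pattern into trend, seasonal and residual components. A centered moving average estimates the trend, and averaging the detrended values at each position within the week estimates the seasonal pattern. For brevity it only supports odd periods; statistics libraries provide more robust implementations.

```python
def centered_moving_average(values, window):
    """Centered moving average for an odd window; edge positions are None."""
    if window % 2 == 0:
        raise ValueError("this sketch only supports odd windows")
    half = window // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        result[i] = sum(values[i - half:i + half + 1]) / window
    return result

def decompose_additive(values, period):
    """Classical additive decomposition: values = trend + seasonal + residual."""
    trend = centered_moving_average(values, period)
    # Collect detrended values by their position within the season.
    by_position = [[] for _ in range(period)]
    for i, (x, t) in enumerate(zip(values, trend)):
        if t is not None:
            by_position[i % period].append(x - t)
    pattern = [sum(d) / len(d) for d in by_position]
    # Center the pattern so it sums to zero over one full period.
    offset = sum(pattern) / period
    pattern = [p - offset for p in pattern]
    seasonal = [pattern[i % period] for i in range(len(values))]
    residual = [None if t is None else x - t - s
                for x, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual

# Hypothetical daily data: upward trend plus a weekly pattern that sums to zero.
weekly = [3, 1, 0, -1, -2, -2, 1]
daily = [100 + 0.5 * i + weekly[i % 7] for i in range(28)]
trend, seasonal, residual = decompose_additive(daily, period=7)
print([round(s, 2) for s in seasonal[:7]])  # matches the weekly pattern up to rounding
```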
Explanatory analysis attempts to understand the data, the relationships between data points, what caused them and what their effects were.
Interrupted (intervention) analysis
Interrupted (intervention) time series analysis, also known as quasi-experimental analysis, is used to detect changes in a long-term time series from before to after a specific interruption (any external influence or set of influences) that potentially affects the underlying variable. In simpler words, it studies how a time series changes after a certain event.
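A simplified version of this idea can be sketched in a few lines: fit a trend to the data before the interruption, project it forward as the counterfactual (what would have happened without the event), and measure the average gap between observed and projected values afterwards. The data here is hypothetical and noise-free; a full analysis, such as segmented regression, models changes in both level and slope and accounts for noise and autocorrelation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def average_intervention_effect(values, intervention_index):
    """Fit a trend to the pre-intervention period, project it past the
    intervention as a counterfactual, and average observed minus projected."""
    pre_times = list(range(intervention_index))
    intercept, slope = fit_line(pre_times, values[:intervention_index])
    post_times = range(intervention_index, len(values))
    gaps = [values[t] - (intercept + slope * t) for t in post_times]
    return sum(gaps) / len(gaps)

# Hypothetical noise-free series: trend 2 + 0.5t, with a jump of +5 from t = 10.
series = [2 + 0.5 * t + (5 if t >= 10 else 0) for t in range(20)]
print(round(average_intervention_effect(series, intervention_index=10), 6))  # 5.0
```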
Regression analysis is most commonly used to test relationships between one time series and one or more other time series. In contrast, regular time series analysis tests relationships between different points in a single time series.
Exploratory analysis highlights the main features identified in time series data, typically in a visual format.
Association analysis is used to identify associations between any two features in a time series dataset.
Classification is used to identify and assign properties to time series data.
Segmentation analysis is applied to split the time series data into segments based on assigned properties.
Curve fitting is used to study the relationships between different variables within a dataset by fitting a curve to them.
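For example, exponential growth y = a·e^(bt) can be fitted by taking logarithms, since ln y = ln a + b·t is a straight line, and then applying ordinary least squares. The sketch below uses hypothetical, noise-free data. Keep in mind that fitting in log space minimizes errors on the log scale (roughly, relative errors) rather than absolute errors, so dedicated nonlinear curve-fitting routines may give different estimates on real, noisy data.

```python
import math

def fit_exponential(ts, ys):
    """Fit y = a * exp(b * t) via least squares on ln(y). Requires all y > 0."""
    log_ys = [math.log(y) for y in ys]
    n = len(ts)
    mean_t, mean_log = sum(ts) / n, sum(log_ys) / n
    b = (sum((t - mean_t) * (ly - mean_log) for t, ly in zip(ts, log_ys))
         / sum((t - mean_t) ** 2 for t in ts))
    a = math.exp(mean_log - b * mean_t)
    return a, b

# Hypothetical measurements that grow by roughly 22% per period.
ts = [0, 1, 2, 3, 4, 5]
ys = [3 * math.exp(0.2 * t) for t in ts]
a, b = fit_exponential(ts, ys)
print(round(a, 3), round(b, 3))  # 3.0 0.2
```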
Now let’s talk a little bit about models that are commonly used for time series data. There is a huge variety of models, representing different forms and stochastic processes. To avoid going into too much detail, in this article we will only describe a few of the most popular and common models.
At the process level, there are three broad classes of linear time series models.
Here they are:
Time series analysis normally requires a fairly large number of data points to ensure reliable and consistent results.
One problem that can limit your ability to use time series data for analytics, forecasting and other purposes is the absence of high-quality, extensive datasets. Fortunately, this issue is often easy to solve, as many time series datasets with enormous volumes of data are publicly available online.
Here are several great sources where you can find a wide variety of datasets.
Despite all these wonderful sources of time series data, depending on your specific needs, you may still face the challenge of not having enough data. In such a case, the solution is to create your own time series dataset.
Creating and using time series datasets is a very broad topic that would require an extensive guide or tutorial of its own to cover every aspect of this work. We can say, however, that collecting time series data, generating datasets and using them for various purposes keeps getting easier thanks to a number of libraries and frameworks designed specifically for working with time series data.
Let’s take a look at several great Python libraries and packages that you can use to create time series datasets and then use them to build models, generate predictions and more.
Another crucial element of using time series data is visualization. To extract valuable information and insights, your data has to be presented visually on a time scale that shows how it changes at different points in time.
Time series data is typically visualized using specialized tools that provide users with multiple visualization types and formats to choose from. Let’s take a look at some of the most common data visualization options.
A graph is a visual representation of data in an organized manner. Time series graphs, also called time series charts or time series plots, are probably the most common data visualization instrument. They illustrate data points on a temporal scale, where each point corresponds to both a time and a unit of measurement.
Even though the terms ‘chart’ and ‘graph’ are frequently used interchangeably, they are not exactly the same thing. ‘Chart’ is the general term for any representation of a dataset that makes the information clearer and easier to understand. Graphs, which are mathematical diagrams showing the relationship between data units over a period of time, are a type of chart commonly used in time series data visualization.
Real time graphs, also known as data streaming charts, are used to display time series data in real time. This means that a real time graph automatically updates every few seconds or whenever a new data point is received from the server.
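How a streaming chart manages its data is up to each tool. One common pattern, sketched below with a hypothetical class, is a fixed-size buffer: each new point pushed by the server is appended, the oldest point drops off, and the chart redraws the remaining window.

```python
from collections import deque

class StreamingChartBuffer:
    """Hypothetical helper holding the points a real-time chart displays."""

    def __init__(self, max_points):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self.points = deque(maxlen=max_points)

    def on_new_point(self, timestamp, value):
        """Handle a point pushed by the server and return the window to redraw."""
        self.points.append((timestamp, value))
        return list(self.points)

buffer = StreamingChartBuffer(max_points=3)
for second in range(5):
    visible = buffer.on_new_point(second, second * 10)
print(visible)  # [(2, 20), (3, 30), (4, 40)]
```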
Here are some of the most common types of time series visualizations.
A line graph is probably the simplest way to illustrate time series data. It connects data points with lines to visualize changes. Since time is the independent variable, it is conventionally shown on the horizontal axis.
Histogram charts visualize the data by grouping it into bins, where bins are displayed as segmented columns. All bins in a histogram have equal width, while their height is proportional to the number of data points in the bin.
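The binning behind a histogram is easy to sketch: split the range of values into equal-width bins and count how many values fall into each. Plotting libraries do this for you; the helper below and the latency values are purely illustrative.

```python
def histogram(values, n_bins):
    """Split [min, max] into n_bins equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: put them in the first bin
        else:
            # The maximum value would land one past the end, so clamp it.
            index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts

# Hypothetical request latencies in milliseconds.
latencies = [10, 12, 15, 18, 22, 25, 29, 33, 40, 11]
edges, counts = histogram(latencies, n_bins=3)
print(edges)   # [10.0, 20.0, 30.0, 40.0]
print(counts)  # [5, 3, 2]
```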
Dot plots or dot graphs present data points vertically with dot-like markers, with the height of each marker group representing the frequency of the elements in each interval.
Scatter plots or charts use the same dot-like markers, scattered across the chart area, to represent each data point. Scatter plots are usually used to visualize the relationship between two variables.
A trend line builds on standard line graphs and plots by adding a straight line that connects at least two points on a chart and extends forward into the future, for example to identify areas of support and resistance.
Data visualization tools, which come in the form of installable software or SaaS solutions, translate time series data into charts, graphs and other consumable analytical information that can provide organizations with invaluable insights.
Data visualization tools allow organizations to create dashboards with easy to understand visualizations of key trends and KPIs. It is also increasingly common for modern data visualization solutions to provide users with a simple drag-and-drop interface to allow users with no technical and/or coding skills to work with time series data and create dashboards with visually-presented data. These tools are typically designed specifically to visualize already organized data and are not intended to be used for time series data analysis. Some of them, however, may include additional features, allowing users to perform multiple types of activities, such as exploring, organizing and interacting with data, as well as collaborating and sharing data between members of the same organization.
When choosing a data visualization tool, check whether it meets the main general requirements for such a solution:
- Supports data import from a wide range of sources, including database querying, application connectors, and file uploads.
- Has a user-friendly, visual no-code interface for interacting with data.
- Provides visual presentation of all KPIs in real time.
- Allows multiple users to work on the same datasets and visualizations at the same time.
- Enables easy export of all your data to multiple channels, including directly to Excel and via API.
- Offers multiple kinds of visualizations for time series data metrics.
Here’s a list of the best tools for your time series data visualization based on the requirements listed above.