Time series forecasting is hardly a new problem in data science and statistics. The term is self-explanatory and has been on business analysts’ agenda for decades now: The very first practices of time series analysis and forecasting trace back to the early 1920s.
The underlying idea of time series forecasting is to look at historical data from the time perspective, define the patterns, and yield short or long-term predictions on how — considering the captured patterns — target variables will change in the future. The use cases for this approach are numerous, ranging from sales and inventory predictions to highly specialized scientific works on bacterial ecosystems.
Although an intern analyst today can work with time series in Excel, the growth of computing power and data tools allows for leveraging time series for much more complex problems than before to achieve higher prediction accuracy.
Time Series Problems
Many machine learning and data mining tasks operate with datasets that have a single slice of time or don’t consider the time aspect at all. Natural language processing, image or sound recognition, and numerous classification and regression problems can be solved without time variables. For example, a sound recognition solution that we worked on entailed capturing the specific teeth grinding sounds of patients as they slept. We weren’t interested in how these sounds change over time, but rather in how to distinguish them from ambient sounds.
Time series problems, on the other hand, are always time-dependent and we usually look at four main components: seasonality, trends, cycles, and irregular components.
Source: Forecasting: Principles & Practice, Rob J Hyndman, 2014
Trends and seasonality are clearly visible
The graph above is a clear example of how trends and seasons work.
Trends. The trend component describes how the variable — drug sales in this case — changes over long periods of time. We see that the sales revenues of antidiabetic drugs have substantially increased during the period from the 1990s to 2010s.
Seasons. The seasonal component showcases each year’s wave-like changes in sales patterns. Sales were increasing and decreasing seasonally. Seasonal series can be tied to any time measurement. We can consider monthly or quarterly patterns for sales in midsize or small eCommerce, or track microinteractions across a day.
Cycles. Cycles are long-term patterns that have a wave form and recurring nature similar to seasonal patterns but with variable length. For example, business cycles have recognizable elements of growth, recession, and recovery. But the cycles themselves stretch in time differently for a given country throughout its history.
Irregularities. Irregular components appear due to unexpected events, like cataclysms, or are simply representative of noise in the data.
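To make these components concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is an illustration under stated assumptions rather than the method behind the graph above: the series is synthetic, the seasonal period is assumed to be 12 months, the function names are hypothetical, and cycles are not separated from the trend in this simple form.

```python
import math
import random


def centered_moving_average(series, period):
    """Estimate the trend with a centered moving average (None at the edges)."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the window has period + 1 points, so the two
            # end points get half weight to keep the average centered.
            total = sum(window) - 0.5 * window[0] - 0.5 * window[-1]
        else:
            total = sum(window)
        trend[i] = total / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend + seasonal + irregular parts."""
    trend = centered_moving_average(series, period)

    # Seasonal part: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    pattern = [r - offset for r in raw]  # center so one cycle sums to zero
    seasonal = [pattern[i % period] for i in range(len(series))]

    # Irregular part: whatever trend and seasonality leave unexplained.
    irregular = [
        None if t is None else y - t - s
        for y, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, irregular


# Synthetic monthly data: upward trend + yearly wave + noise (all assumed).
random.seed(0)
period = 12
series = [
    10 + 0.5 * t + 5 * math.sin(2 * math.pi * t / period) + random.gauss(0, 0.5)
    for t in range(48)
]
trend, seasonal, irregular = decompose_additive(series, period)
print("seasonal pattern:", [round(s, 2) for s in seasonal[:period]])
```

The moving average spans exactly one seasonal period, so the wave averages out and only the slow-moving trend remains; that is why the period has to be known or assumed up front.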
Today, time series problems are usually solved with conventional statistical methods (e.g. ARIMA) and machine learning methods, including artificial neural networks (ANN), support vector machines (SVM), and some others. While these approaches have proved their efficiency, the tasks, their scope, and our ability to solve them keep changing, and the set of use cases for time series has the potential to expand. As statistics steps into the era of big data processing, with the Internet of Things providing a vast number of trackable devices and social media offering new material for analysis, analysts look for new approaches to handle this data and convert it into predictions.
So, let’s survey the main things that are happening in the field.
Methods to combat non-stationary data
“Prediction is very difficult, especially if it’s about the future.” — Niels Bohr, Nobel laureate in Physics
Traditional forecasting methods strive to bring stationarity into time series, i.e. to make statistical properties such as the mean and variance stay constant over time. Raw data usually isn’t stationary enough to yield confident predictions. For instance, we must apply multiple mathematical transformations to the antidiabetic drug sales series above to render it at least approximately stationary. Only then can we find patterns and make predictions that are more accurate than coin tossing, which is right in 50 percent of cases.
Source: Forecasting: Principles & Practice, Rob J Hyndman, 2014
Bringing stationarity to data
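The transformations mentioned above typically include a logarithm (to stabilize variance that grows with the level of the series) and differencing (to remove trend and seasonality). The sketch below applies them to a synthetic, noise-free series; the data, the period of 12, and the helper names are assumptions made for illustration, not the exact steps used for the figure.

```python
import math


def log_transform(series):
    """Stabilize variance that grows with the level of the series."""
    return [math.log(y) for y in series]


def difference(series, lag=1):
    """Subtract the value `lag` steps earlier.

    lag=1 removes a linear trend; lag=period removes a repeating season.
    """
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def mean(values):
    return sum(values) / len(values)


# Synthetic series: exponential growth with multiplicative seasonality.
period = 12
series = [
    100 * math.exp(0.02 * t) * (1 + 0.3 * math.sin(2 * math.pi * t / period))
    for t in range(72)
]

logged = log_transform(series)
deseasonalized = difference(logged, lag=period)  # seasonal difference
stationary = difference(deseasonalized, lag=1)   # first difference

# The raw series drifts upward: its two halves have very different means.
half = len(series) // 2
print("raw means:", round(mean(series[:half]), 1), round(mean(series[half:]), 1))

# After the transformations, both halves look the same.
half_s = len(stationary) // 2
print("transformed means:", mean(stationary[:half_s]), mean(stationary[half_s:]))
```

On real data the result is never this clean: noise remains after differencing, and deciding how many differences to take is itself a modeling decision.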
But time series in some fields are very resistant to our efforts, as too many irregular factors impact the changes. Look at travel disruptions, especially those that happen during political unrest or under the threat of terrorism. Traveler streams change, destinations change, and airlines adjust their prices differently, making year-old observations nearly obsolete. The same goes for crude oil prices: they are critical to predict for players across many industries, yet so far nobody has managed to build time series algorithms that are precise enough.
Traditional machine learning methods
The traditional machine learning approach is to split an available historic dataset into two or three smaller sets to train a model and to further validate its performance against data that a machine hasn’t seen before. If we apply machine learning without the time series factor, a data scientist can choose the most relevant records from the available data and fit the model to them, leaving noisy and inconsistent records behind.
In time series, the main difference is that a data scientist needs to use a validation set that directly follows the training set on the time axis to see whether the trained model is good enough. The problem with non-stationary records is that data in the training set might not be homogeneous with the validation set, as time series properties can change substantially over the period that the two sets cover.
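A minimal sketch of such a time-ordered split is shown below, together with a walk-forward variant that repeats the check over several consecutive periods. The function names, the toy monthly records, and the split sizes are all hypothetical.

```python
def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records so validation strictly follows training."""
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def walk_forward_splits(n_records, initial_train, horizon):
    """Yield (train_indices, validation_indices) pairs.

    The training window expands over time, and each validation block
    starts right where its training window ends.
    """
    start = initial_train
    while start + horizon <= n_records:
        yield range(0, start), range(start, start + horizon)
        start += horizon


# Toy monthly records, already sorted by time (values are made up).
records = [(f"2016-{month:02d}", 100 + 3 * month) for month in range(1, 13)]

train, validation = chronological_split(records, train_fraction=0.75)
print("train ends:", train[-1][0], "| validation starts:", validation[0][0])

splits = [
    (list(tr), list(va))
    for tr, va in walk_forward_splits(len(records), initial_train=6, horizon=2)
]
for tr, va in splits:
    print("train on", len(tr), "records, validate on indices", va)
```

Unlike a random split, no record in the validation set is older than any record in the training set, so the model is never evaluated on a past it has already seen the future of.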
Stream learning approach
Here’s when we can use the stream learning technique. Stream learning suggests incremental changes to the algorithm — basically, its re-training. As a new record or a small set of them comes in, it updates the model instead of processing a whole set of data. This approach requires the understanding of two main things:
Data Horizon. How many new training instances are needed to update the model? For example, Shuang Gao and Yalin Lei from the China University of Geosciences recently applied stream learning to increase prediction accuracy in such non-stationary time series as crude oil prices mentioned above. They’ve set the data horizon as small as possible so that every update on the oil prices immediately updates the algorithm.
Data Obsolescence. How long does it take to start considering historic data or some of its elements irrelevant? The answer to this question may be quite tricky as it requires a share of assumptions based on domain expertise, basically, an understanding of how the market you work with changes and how many non-stationary factors bombard it. If your eCommerce business has significantly grown since last year both in terms of customer base and product variety, the data of the same quarter of the previous year may be considered obsolete. On the other hand, if the country experiences economic recession the new short-term data may be less enlightening than that of the previous recession.
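Both parameters can be illustrated with a deliberately simple streaming forecaster. This is a sketch under assumptions, not the model from the crude oil study: the class, its parameter names (`data_horizon`, `max_age`), and the drift-based prediction rule are all made up for the example.

```python
from collections import deque


class StreamingDriftForecaster:
    """Predicts the next value as the last value plus the average recent step."""

    def __init__(self, data_horizon=1, max_age=30):
        # Data horizon: how many new records arrive before the model updates.
        self.data_horizon = data_horizon
        # Data obsolescence: records older than max_age drop out of the window.
        self.window = deque(maxlen=max_age)
        self.pending = 0
        self.drift = 0.0

    def observe(self, value):
        self.window.append(value)
        self.pending += 1
        if self.pending >= self.data_horizon:
            self._update()
            self.pending = 0

    def _update(self):
        # Average step over the window; only recent records take part.
        if len(self.window) >= 2:
            span = len(self.window) - 1
            self.drift = (self.window[-1] - self.window[0]) / span

    def predict_next(self):
        if not self.window:
            raise ValueError("no observations yet")
        return self.window[-1] + self.drift


# A series that rises by 1 for 50 steps, then falls by 2 for 20 steps.
stream = list(range(50)) + [49 - 2 * k for k in range(1, 21)]

short_memory = StreamingDriftForecaster(data_horizon=1, max_age=5)
long_memory = StreamingDriftForecaster(data_horizon=1, max_age=1000)
for value in stream:
    short_memory.observe(value)
    long_memory.observe(value)

# The true next value would be 7 if the decline continues.
print("short memory:", short_memory.predict_next())
print("long memory:", round(long_memory.predict_next(), 2))
```

The short-memory forecaster has forgotten the old growth regime and follows the decline, while the long-memory one still blends both regimes. A window that is too short has the opposite weakness: it reacts to noise, which is why choosing it calls for domain knowledge.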
While crude oil forecasts based on stream learning eventually perform better than conventional methods, their results are still only slightly better than a coin flip, staying in the ballpark of 60 percent accuracy. They are also more complicated to develop and deploy, and they require prior business analysis to figure out the data horizon and obsolescence.
Another way to tackle non-stationarity is ensemble models. Ensembling applies multiple machine learning and data mining methods and then combines their results to increase predictive accuracy. The technique is not a new approach in data science, but it is critical for business decisions related to data science initiatives.
Basically, building robust forecasting is expensive and time-consuming, and it doesn’t come down to making and validating one or two models and then choosing the best performer. In time series, non-stationary components (like different durations of cycles, low weather predictability, and other irregular events that have an impact across multiple industries) make things even harder.
This was the problem for the Google team that was building time series forecasting infrastructure to analyze the business dynamics of its search engine and YouTube, and to further disaggregate these forecasts by region and by small time units like days and weeks. When Google engineers recently disclosed their approach, it became clear that even the Mount Olympus of AI-driven technologies chooses simpler methods over complex ones. They don’t use stream learning yet and settle for ensemble methods. But the main point they make is that you need as many methods as possible to get the best results:
“So, what models do we include in our ensemble? Pretty much any reasonable model we can get our hands on! Specific models include variants on many well-known approaches, such as the Bass Diffusion Model, the Theta Model, Logistic models, bsts, STL, Holt-Winters and other Exponential Smoothing models, Seasonal and other ARIMA-based models, Year-over-Year growth models, custom models, and more,” Eric Tassone and Farzan Rohani write.
By averaging the forecast of many models that perform differently in different time series situations, they achieved better predictability than they could with a single model. While some models work better with their specific non-stationary data, others shine in theirs. The average that they yield acts like an expert opinion and turns out to be very precise.
Source: Our quest for robust time series forecasting at scale, Eric Tassone and Farzan Rohani, 2017, Forecast procedure in Google
However, the authors of the post note that this approach may be the best one only for their specific situation. Google services stretch across many countries, where different factors like electricity, internet speed, and user working cycles add many non-stationary patterns. So, if you aren’t operating across a multitude of locations or a large set of varying data sources, ensemble models may not be for you. But if you track time series patterns across countries or business units in different regions, they might be the best fit.
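The averaging idea can be sketched with a handful of very simple forecasters. This is not Google's procedure: the four models, the toy history, and the plain mean used to combine them are assumptions chosen to keep the example short.

```python
from statistics import mean


def naive(history, season):
    """Tomorrow looks like today."""
    return history[-1]


def seasonal_naive(history, season):
    """Tomorrow looks like the same point in the previous season."""
    return history[-season]


def drift(history, season):
    """Extend the average step observed over the whole history."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + step


def moving_average(history, season):
    """Average of the most recent full season."""
    return mean(history[-season:])


MODELS = [naive, seasonal_naive, drift, moving_average]


def ensemble_forecast(history, season):
    """Return the averaged one-step forecast and the individual forecasts."""
    forecasts = {model.__name__: model(history, season) for model in MODELS}
    return mean(list(forecasts.values())), forecasts


# Toy history with a period of 4 and a mild upward trend (made-up numbers).
history = [10, 12, 14, 12, 11, 13, 15, 13, 12, 14, 16, 14]
combined, forecasts = ensemble_forecast(history, season=4)

for name, value in forecasts.items():
    print(f"{name}: {value:.2f}")
print(f"ensemble: {combined:.2f}")
```

Each model is wrong in its own way: the naive forecast ignores the season, while the seasonal naive forecast ignores the trend. The average lands between them. A more robust combiner, such as a median or a trimmed mean, would limit the influence of a single badly wrong model.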
Automation of time series forecasting
A couple of months ago we published an overview of MLaaS platforms for semi- and fully automated machine learning tasks; these platforms can be used by organizations with limited access to data science and analytics talent.
The problem with automation in prediction and machine learning operations is that the technologies are still in their infancy. Fully automated solutions suffer from a lack of flexibility: they perform many operations under the hood, so they either handle only straightforward, general tasks (like object recognition in pictures) or fail to capture business specifics. On the other hand, hiring a full-blown data science team may be too costly in the early stages of your analytics initiative. A happy medium here is instruments like TensorFlow, which still require some engineering talent on board but provide enough automation and convenient tools to avoid reinventing the wheel.
A great contributor to the operationalization of time series prediction is Prophet, a new open source product from Facebook with an epic name. Facebook has been quite generous in open sourcing the tools it uses: remember React Native, which was released for public use in 2015. But this time the company is giving away a pretty task-specific package.
Prophet is positioned as “Forecasting at Scale”, which according to the authors means mainly three things:
1. A broad variety of people can use the package. Potential users are both data scientists and people who have the domain knowledge to configure data sources and integrate Prophet into their analytics infrastructures.
2. A broad variety of problems can be addressed. Facebook used the tool for social media time series forecasting, but the model is configurable to match various business circumstances.
3. Performance evaluation is automated. Here comes the sweetest part. Evaluation and a number of surface problems are automated and human analysts just have to visually inspect forecasts, do the modeling, and react to situations when the machine thinks that forecasts have a high error probability.
Modeling, in this case, means that analysts use their domain knowledge and external data to tweak the work of Prophet. For instance, you can input market size data or other capacity information to let the algorithm consider these factors and adjust to them. Since you know in advance when you are going to roll out some game-changing updates, like a site redesign or some mind-blowing feature, you can also signal these to the algorithm. And finally, you can define the relevant scale of seasonality and even add holidays as recurring patterns in your time series. All retailers know how different Black Friday or Christmas is from the rest of the year.
Twitter and Microsoft
Back in 2015, the social media giant Twitter released its own time series package. It’s a bit off-topic because the tool is aimed at anomaly detection and is called… AnomalyDetection. It’s also automatic and isn’t aimed at social media only, but extends to any field where time series analysis is applied, especially when you monitor many series for different products, stores, or markets. While AnomalyDetection doesn’t forecast the future, it helps to automatically detect time-dependent anomalies. This is very helpful in distributed architectures with large traffic to spot spammers and bots, or simply to explore what kinds of events impact your business performance in different locations. Have you accounted for all weird holidays?
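To show what automated detection of time-dependent anomalies means in practice, here is a minimal sketch based on a rolling median and the median absolute deviation (MAD). It is a deliberately simpler technique than the one the Twitter package implements (Seasonal Hybrid ESD), and the function name, window size, threshold, and traffic numbers are all assumptions.

```python
from statistics import median


def detect_anomalies(series, window=7, threshold=3.5):
    """Return indices whose value is far from the median of the previous window.

    Distance is measured in robust z-score units: the median absolute
    deviation (MAD) scaled by 1.4826, which makes it comparable to a
    standard deviation for normally distributed data.
    """
    anomalies = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        center = median(recent)
        mad = median([abs(x - center) for x in recent])
        deviation = abs(series[i] - center)
        if mad == 0:
            # Flat window: any departure from it counts as anomalous.
            is_anomaly = deviation > 0
        else:
            is_anomaly = deviation / (1.4826 * mad) > threshold
        if is_anomaly:
            anomalies.append(i)
    return anomalies


# Made-up daily traffic with a single spike at index 8.
traffic = [100, 102, 101, 99, 100, 103, 101, 100, 250,
           102, 99, 101, 100, 98, 102, 101]
anomalies = detect_anomalies(traffic)
print("anomalous indices:", anomalies)  # flags index 8, the 250 spike
```

The median and the MAD are used instead of the mean and the standard deviation so that the spike itself does not distort the baseline for the days that follow it. The sketch ignores seasonality, which a production tool has to account for.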
The reason we’re mentioning Twitter here is that both Prophet and AnomalyDetection are representatives of the emerging automation trend in the time series field. Pretty soon these operations are going to become more affordable and potentially move to the popular cloud infrastructures. For example, Microsoft recently rolled out its Azure Time Series Insights for IoT that doesn’t seem to add the prediction capacity yet but already provides data streaming from devices and allows for anomaly detection.
Time Series Forecasting as a Sales and UX Lever
There has always been a clear distinction between machine learning tasks that you solve to run internal analytics and customer-facing ML-based applications. Good examples of the latter are facial recognition apps, neural networks that process images, or recommendation engines in online content services.
Internal analytics have usually been employed to gain business insights. But things take on a new perspective: giving away some prediction results, especially those that relate to time series, seems a great opportunity to improve and personalize the user experience.
One of those cases is our client Fareboom.com. Fareboom is a flight-booking service that succeeds in finding the lowest airfares possible for its customers. The problem with airfares is that they change rapidly and without obvious reasons. Unless you’re buying tickets right before a trip, future pricing information would be advantageous. A great UX solution was to predict whether the prices are going to drop or increase in the near or distant future and give this information to customers. This encourages customers to return and makes Fareboom their go-to platform for optimizing their travel budgets.
The engine has 75 percent confidence that the fares will rise soon
Giving away at least some of your analytics is a particularly good strategy for the travel industry and generally all businesses that connect people with end-service providers. If you have seasonal or trending data on the hotels that people enjoy during Christmas, why not turn it into recommendations?
The Main Trend
Both time series automation and the growth of available data from endpoint devices define the main trend in time series forecasting. Analytics is becoming increasingly affordable and eventually more critical for business success. Not only can we track business progress, but we can also capture very specific, non-stationary, and sometimes time-dependent events that were missed before. And the emerging power of intermediary services contributes to this availability trend.
The main concern of today’s executives should be defining an analytics strategy, whether it’s going to be customer-facing or internal, and leading the initiative.
Originally published at AltexSoft’s blog: “Time Series Analysis and Forecasting: Novel Business Perspectives”