October 18, 2022

TL;DR â€” In this article, we briefly introduce you to time series analysis, which we then use to identify causal relationships between different events concerning companies as reflected in the news. For instance we will try to identify what events are likely to occur after a company publishes its quarterly earnings. You can view the accompanying R code on my GitHub. You can download the dataset used in this article from Kaggle.

Outline of this article:

- Introduction
- Time Series Analysis
- Time Series Forecasting
- Granger Causality
- Identifying causal links from news events
- Final words (+ code and dataset)

Weâ€™ve all heard the phrase â€ścorrelation does not imply causationâ€ť and intuitively it makes sense: Indeed, the rate of sunburn soaring in the summer months has nothing to do with the increase in ice cream sales. Instead, both phenomena are caused by a common underlying factor, which in this case is an increase in temperature and UV radiation, which itself is caused by the shortened distance between the Earth and the Sun during summer^{1}.

However, this doesnâ€™t mean that we canâ€™t model causality (beyond a strong correlation) and more importantly, leverage statistical methods and data analysis to build models that help us better understand the causal relationships between various phenomena.

Causal models, often simply visualised as graphs, can be a powerful tool and mental model for expressing and communicating causal links among variables within a system. For instance, it is much more helpful to describe the rise in ice cream sales and in the rate of sunburn as independent â€śeffectsâ€ť with a common cause which is a shortened distance between the Earth and the Sun, resulting in an increase in temperature and UV radiation.

The â€śRisk Bow-Tieâ€ť is a practical example of causal thinking commonly used in Risk Management. The Risk Bow-Tie or the Bow-Tie Method is used for managing risks in a wide range of sectors from military to manufacturing and banking. At the most basic level, a Risk Bow-Tie visualises the causes leading to an event, and the consequences of that event, as well as damage mitigation and preventive measures.

Suppose you are a car manufacturer and youâ€™ve identified a global shortage of semiconductor chips as a risk to your ability to build and ship new cars, hopefully long before it actually happens. A Risk Bow-Tie in this case not only allows you to come up with preventative measures to mitigate risks, but causal thinking can also help you identify and better prepare for any consequences arising from this risk should it ever materialise.

In this article we aim to demonstrate how time series analysis could be utilised to help us identify causal links among a large pool of events extracted from the worldâ€™s news. For instance, if we knew that every time a public US company is reported to suffer a data breach, it issues an announcement within a few days, and furthermore the company has a high likelihood of being investigated by the FTC within a few weeks, we could use that knowledge to make better decisions regarding what might happen next every time a new data breach event happens. This exercise can be immensely valuable in risk management processes for scenario analysis, as well as forecasting applications.

Before we demonstrate how this could be done, let us briefly introduce a few concepts that will help you understand whatâ€™s going on when we look at the actual analysis later on.

The relationship between a cause (increase in temperature) and an effect or a series of effects (increase in ice cream sales) takes place in a sequential order over time, so unless you believe that the arrow of time could be reversed (like it does in Christopher Nolanâ€™s *Tenet*), you could expect the effect to always take place some timeâ€“a number of seconds, minutes, days, weeks or monthsâ€“after the cause.

This means that the problem weâ€™re trying to solveâ€“identifying what kind of event causes another kind of eventâ€“lends itself nicely to time series analysis. (Although we should highlight that just because something happened after another thing, it doesnâ€™t mean that it was caused by it. This is a logical fallacy called post hoc ergo propter hoc)

**But wait, what is a time series?**

A time series is simply a sequence of observations made over time. Letâ€™s say you measure your weight every day and record the data in a spreadsheet. By doing so you are creating a time series tracking your weight. Similarly, the temperature in your city in each month of the year forms a time series data, and so does the number of articles written each day about Apple or Google. You get the idea.

Now let us take a brief look at time series forecasting, or in other words, how to predict the future values of a time series given its past values, and any other useful information we may have about the time series. Letâ€™s imagine you continued to weigh yourself every day for a year, and now you would like to predict what your future weight might be in, letâ€™s say, 3 monthsâ€™ time. In this case a good forecast may take into consideration the seasonal fluctuations in your weight, and account for a bump after Thanksgiving, Christmas or wedding season.

As you can imagine, forecasting a time series is a non-trivial task and itâ€™s borderline impossible in some cases, for example when the forecast might impact the thing weâ€™re trying to forecast (e.g.Â predicting the stock market). That being said, luckily a lot of smart folks have been thinking about this problem for decades and perhaps centuries, and we have a wealth of knowledge today on how to go about forecasting time series data where possible^{2}.

Some of the commonly used techniques for time series forecasting are listed below:

- The ARIMA family of models
- Exponential smoothing models
- Dynamic regression models
- Neural networks

We wonâ€™t go into the specifics of each forecasting model in this article, but let us briefly explain how ARIMA models work to help you better understand how it can be used for inferring causal relationships. If youâ€™re interested in learning more about time series forecasting, we highly recommend reading **Forecasting: Principles and Practice** by Rob J Hyndman and George Athanasopoulos.

**ARIMA**

ARIMA family of models are the workhorse of time series forecasting. They are so simple and yet so effective that you would find them in nearly every forecasting handbook and toolbox.

Letâ€™s break down *ARIMA*: ARIMA stands for Autoregressive Integrated Moving Average. Wait, what?

- â€śAutoregressiveâ€ť means we use the
**past values**of the time series to predict its future values. This intuitively makes sense. Your body weight tomorrow is highly likely to be close to your weight in the past few days. - â€śMoving averageâ€ť means we use the
**past forecast errors**to predict the future values of a time series. So for instance, if our weight forecast yesterday was off by 1kg, we factor that into our prediction for tomorrowâ€“perhaps we are likely to be off again by a similar amount. - Finally, â€śIntegratedâ€ť here means that we make our time series
**stationary**, normally using a technique called differencing. For example, rather than recording your weight every day, you would measure the*changes to your weight*like +1kg, -2kg, and so on or even better, the change in %, which makes forecasting a bit easier. Explaining stationarity will make this article too long, so instead weâ€™ll refer you to this chapter from Forecasting: Principles and Practice. https://otexts.com/fpp3/stationarity.html

So when you put all of these pieces together, you get a model that aims to predict the future values of a time series given its past values and forecast errors, after the time series has been made stationary.

Now that we got the basics out of the way, letâ€™s talk about the fun stuffâ€“causality!

**BIG CAVEAT:** If youâ€™ve read this far you must be expecting a big reveal. And indeed there is one, but with a big caveat. Causality is a big serious word in the world of statistics, and one canâ€™t simply throw it around. When we talk about causality in this section, weâ€™re referring to a weak and simplified notion of causality called Granger Causality, named after the late British econometrician, Clive Granger. For a deep philosophical view towards causality, refer to Judea Pearlâ€™s work.

Okay, now letâ€™s get to the fun part *for real*.

Back to our weighing example, letâ€™s imagine that every day you not only measure your weight, but you also measure your intake of calories. Of course your body is a complex system and more calories may not always translate to a jump in your weight. But for simplicity letâ€™s say weâ€™ve eliminated all the other factors, and our hypothesis is that the more calories you take in, the more your weight goes up.

In this scenario, you might notice that the changes to your weight (remember that with differencing we decided to record the changes rather than the absolute values) are strongly dependent on the *changes* to your calorie intake, letâ€™s say a week or a month prior. So thereâ€™s clearly a causal relationship going on here. If you plotted your daily calorie intake and weight change charts, you might get something like this:

Granger causality allows us to statistically and numerically test whether this might be true. In Granger causality, we compare two forecasts obtained for a time series X:

- One forecast that only takes past values of the same time series itself into consideration; and
- Another forecast that also considers a second time series Y in addition to the past values of the target time series.

We then compare these two forecasts and if the second one is more accurate than the first one in predicting the future values of our time series, and this is true only in one direction, we may conclude that Y *Granger-causes* X. In other words, if adding the information about calories helps us predict weight, but adding information about weight doesnâ€™t help me predict calories, then calories Granger-cause weight gain.

(Note that the distinction between â€ścausalityâ€ť and â€śGranger causalityâ€ť is so important that we use the â€śGrangerâ€ť prefix for the causation verb)

Equipped with the basic knowledge of time series forecasting and Granger causality, we can now look at a practical example.

At AYLIEN we collect millions of news articles every day, and we use our NLP models to tag these articles. One type of tag, *entities*, specifies the companies that are mentioned in the article, for instance Apple. Another type of tag, *categories*, specifies various events and topics that are discussed in an article, for example â€śnew product releasesâ€ť (indicated using a code).

Since AYLIEN collects and tags news articles every minute of every day over years and years, our News Intelligence platform could be seen as a giant database of time series data for millions of entities and event types. Weâ€™ve recently open sourced a portion of our time series data called SNES â€” Stock-NewsEventsSentiment which you can download from here.

Letâ€™s have a look at an example. Below we see the volume of English news articles mentioning Apple (the company) over the past 18 months, for various types of events. For instance we can see that the chart for Corporate Earnings spikes very regularly once per quarter. Not only that, we can also see that the volume of positive articles seems to go up a short while after Apple announces a new product (see the â€śNew Productsâ€ť and â€śPositive Sentimentâ€ť rows in the chart below). This is exactly what weâ€™re interested in looking at.

You can probably guess where weâ€™re going with this: By applying a Granger causality test to this data, we can identify causal links among different event types, i.e.Â what event types might cause what other types of events. We can answer this question by running pairwise Granger tests over time series that each reflect the volume of articles published about a specific event type. Note that as discussed above we must first make our time series data stationary, for instance by using differencing.

Also note that Granger causality is a parametric model, meaning it expects us to provide a lag value i.e.Â number of time steps between â€ścauseâ€ť and â€śeffectâ€ť. We can run our tests for various lag values, letâ€™s say 1, 3, 7 and 30 days.

The table below shows a subset of the results obtained using Granger causality applied to the time series data for Apple. Each row represents a single Granger test with a specific order (number of days between â€ścauseâ€ť and â€śeffectâ€ť) and a pair of event types. The P column represents the P value from the Granger causality test, and indicates how confidently we can reject the null hypothesis^{3}, or in other words, how likely it is that there is a causal relationship between the two event types.

There are many ways to visualise the above data. For instance we can use a graph or a Sankey diagram demonstrate the causal chain for a given event type:

Not every relationship we extract this way is going to reflect an actual causal relationship. Indeed our methodology above is prone to all sorts of errors, for instance misidentifying spurious correlations as causal relationships. Therefore it would be useful to navigate through lots of potential causal relationships and evaluate them to identify genuine relationships. For this purpose we can use an interactive heatmap chart like the one below:

We hope you enjoyed this rather long blog post. If youâ€™ve read this far make sure you give yourself a tap on the shoulderâ€“you deserve it!

You can see the R code for this article here, and the dataset can be downloaded from here.

The world of causal inference in time series data is quite vast and weâ€™ve only scratched the surface here. If youâ€™d like to continue educating yourself in this area, this amazing blog post by Shay Palachy nicely curates and summarises a wide range of techniques for causal inference using time series data and their advantages and disadvantages, which could guide your next steps.

Until next time!

In this example ice cream sales and rate of sunburn are conditionally independent.â†©ď¸Ž

The null hypothesis here is that event_type_1 does not Granger-cause event_type_2.â†©ď¸Ž

Previous post