Financial News - Noise or Information? [Part II]


This is part II of the three-part series on building analytical models using financial news data to predict stock price movement (up or down).

So far, we have explored the use of LSTM to predict stock price movement based on trend and momentum indicators. We also tried the n-gram TF-IDF scheme to model financial news data for predicting stock price movement. Unfortunately, both methods failed to pick up any useful signal that would help us trade profitably. For those who missed the two earlier posts, or would like to recap the analyses, you can access them here and here.

One issue with our earlier attempt to predict stock price movement using financial news was that the language model proposed in part I of the series was not sophisticated enough to capture either the meaning of words or the sequential dependencies between words in a sentence. This post introduces word embedding and the Bidirectional Long Short-Term Memory (Bi-LSTM) network to model financial news data, which could overcome the shortcomings of our previous work.

The data setup for this analysis is identical to that used in part I of the series. You may refer to the previous post here for more details. We now proceed to present the main techniques proposed in this instalment.

Instead of transforming words into numeric form using TF-IDF, we use a more advanced technique called word embedding. Word embedding represents each word as a unique real-valued vector, typically with a pre-defined, fixed length of 50 or more. It is a representation that captures the semantic meaning of words by placing words with similar meanings near each other in the vector space, as illustrated by the example in Figure 1. For instance, the word cat lies closer to the word dog in the embedding space, as opposed to the word student, which lies closer to teacher. The representation is learned from word usage in text, based on the hypothesis that words used in similar ways have similar meanings.

 
Figure 1. Illustrative example of word embedding distance
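The "near each other" idea can be made concrete with cosine similarity. The sketch below uses tiny hand-picked 3-dimensional vectors purely for illustration (real embeddings are learned and have 50+ dimensions, and these particular numbers are invented, not taken from any trained model):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors chosen by hand for illustration only -- real embeddings
# are learned from text and typically have 50 or more dimensions.
emb = {
    "cat":     np.array([0.9, 0.8, 0.1]),
    "dog":     np.array([0.8, 0.9, 0.2]),
    "student": np.array([0.1, 0.2, 0.9]),
    "teacher": np.array([0.2, 0.1, 0.8]),
}

print(cosine(emb["cat"], emb["dog"]))      # high: similar meanings
print(cosine(emb["cat"], emb["student"]))  # much lower: dissimilar meanings
```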

There are various ways to incorporate word embedding into our predictive model. One is transfer learning, where we plug embeddings such as Word2Vec or GloVe, pre-trained on massive independent text corpora, into our predictive model. Alternatively, we could construct an embedding layer in our model whose weights are learned jointly with the neural network that predicts stock price movement. In this post, we applied the latter approach, learning the word embedding jointly with a special class of neural network: the Bi-LSTM.
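In Keras, the two options differ only in how the embedding layer is set up. A minimal sketch, assuming an illustrative vocabulary of 10,000 words and 100-dimensional vectors (not the post's actual settings), with random numbers standing in for a real GloVe file:

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10_000, 100  # illustrative sizes only

# Option 1: transfer learning -- initialise with pre-trained vectors
# (e.g. GloVe) and freeze them. Random values stand in for a real file here.
pretrained = np.random.rand(vocab_size, embed_dim).astype("float32")
frozen = tf.keras.layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(pretrained),
    trainable=False)

# Option 2 (the approach taken in this post): a trainable embedding layer
# whose weights are learned jointly with the rest of the network.
learned = tf.keras.layers.Embedding(vocab_size, embed_dim)

print(frozen.trainable, learned.trainable)
```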

Recall the LSTM model we introduced in our inaugural post. A Bi-LSTM is simply two connected LSTM layers that model sequential dependencies in opposite directions. Specifically, the forward LSTM layer models word dependencies in a sentence from left to right, while the backward LSTM layer models word dependencies from right to left. The advantage of such an arrangement over a unidirectional LSTM is that the model can utilise both forward and backward information at every word. For instance, consider the sentence:

Apple is Jimmy's favourite...

Is the sentence talking about Jimmy's favourite fruit, or his preferred smartphone brand? Without information about the rest of the sentence, a unidirectional LSTM would not know the context in which the word Apple was used.
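In Keras, this two-directional arrangement is the Bidirectional wrapper, which runs a second copy of the LSTM right to left and concatenates the two directions' outputs. A small sketch with made-up sizes (6 words, 8-dimensional word vectors, 32 LSTM units):

```python
import tensorflow as tf

# A single forward LSTM over a sequence of word vectors
fwd = tf.keras.layers.LSTM(32)

# Wrapping it in Bidirectional adds a second, right-to-left LSTM;
# the final states of the two directions are concatenated.
bi = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32))

x = tf.random.normal((1, 6, 8))  # (batch, words, embedding dimension)
print(fwd(x).shape)  # (1, 32)  -- forward direction only
print(bi(x).shape)   # (1, 64)  -- forward + backward concatenated
```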

Figure 2 depicts the overall architecture of the Word Embedding + Bi-LSTM model. First, the news data is tokenised and fed into the embedding layer, whose weights are learned jointly with the Bi-LSTM layer that models the sequential word dependencies. The Bi-LSTM layer is in turn connected to a sigmoid layer, which estimates the probability that the stock price will go up (rather than down) one day ahead.

Figure 2. Model architecture
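The architecture in Figure 2 can be sketched as a small Keras model. The layer sizes below (10,000-word vocabulary, sequences padded to 60 tokens, 100-dimensional embedding, 64 LSTM units) are illustrative assumptions, not the post's exact settings:

```python
import tensorflow as tf

vocab_size, seq_len, embed_dim = 10_000, 60, 100  # illustrative sizes only

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),                 # tokenised, padded headlines
    tf.keras.layers.Embedding(vocab_size, embed_dim), # learned jointly with the Bi-LSTM
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(price up one day ahead)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

print(model.output_shape)  # (None, 1): one probability per headline
```

Training would then call `model.fit` on the tokenised news sequences against 0/1 labels for next-day price movement.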

Figure 3 shows box plots of the estimated probabilities by the actual price movement. A model that discriminates effectively between upward and downward price movements would show the blue box plot (actual upward movement one day ahead) sitting higher along the probability scale, and the orange box plot sitting lower. As the figure shows, there was still significant overlap between the two box plots, and the model's estimated probabilities hugged the 0.5 threshold closely. This suggests that predicting stock price movement remains challenging even with these more advanced techniques. Nonetheless, the discriminating power of the current model is slightly better than that of our previous n-gram TF-IDF logistic model.
 
Figure 3. Distribution of the estimated probability by actual price movement one day ahead
 
We again simulate what happens if a trader started with $1,000 on 2nd Jan 2019 and, on each trading day, split his investment evenly across the 10 selected stocks, buying at the previous closing price and selling at that day's closing price. We showed previously that under this naive strategy his portfolio would have grown to about $1,434 by 31st Dec 2019, and that it grew to only $1,429 when trades followed our previous TF-IDF logistic model. This time, a trader who made his purchases by following the Word Embedding + Bi-LSTM model's predictions would have grown his portfolio to $1,456! That is a 5.1% improvement in profit ($456 versus $434) over the naive method of buying all 10 stocks with equal weightage every trading day.
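The compounding arithmetic behind this simulation is simple enough to sketch. The helper below is a minimal illustration with invented daily returns, not the post's actual backtest code or data:

```python
def simulate(daily_returns, start=1_000.0):
    """Grow a portfolio that is split evenly across the stocks each day,
    bought at the previous close and sold at the same day's close.
    `daily_returns` is a list of days, each a list of per-stock returns."""
    value = start
    for day in daily_returns:
        # Equal weightage: the portfolio earns the average of the day's returns
        value *= sum(1 + r for r in day) / len(day)
    return value

# Two illustrative trading days for 10 stocks (returns are invented):
# +1% across the board, then -0.5% across the board.
print(round(simulate([[0.01] * 10, [-0.005] * 10]), 2))  # 1004.95
```

Following a model's predictions simply changes which stocks (and hence which returns) enter each day's list.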

We finally managed to construct a model that was able to sniff out useful information about stock price movement one day ahead. But could we do better? Specifically, the word embedding method used in this post was context-free, meaning it is unable to differentiate the word bull when used to describe the stock market from when it is used to refer to an animal. This is because the numerical representation of the word remains the same regardless of the context in which the word is used. In the next and final instalment of this series, we will introduce a state-of-the-art algorithm that is able to generate context-aware embeddings.


The Python code used for this analysis is available on my GitHub here.

How would you have constructed the predictive model differently to predict stock price movement? What other finance-related analyses would you like to see? Leave a comment below to share your thoughts!

