Big Data is being increasingly used in many spheres of investment, and identifying sources of information which lend themselves to this practice has become a hot topic both in academia and the investment profession.
Social media is an obvious contender here: it can be thought of as a database of society’s behaviour and, through platforms such as Twitter and financial blogs, a medium for capturing investor sentiment.
As behavioural finance continues to challenge the notion of efficient markets, an interesting research question for the investment management profession is whether comments shared on social media are correlated with, or even predictive of, the state of the global economy and the future performance of stocks and markets.
Twittering into the future
One of the first papers on this topic, titled ‘Twitter Moods Predict the Stock Market’, was published in the Journal of Computational Science in 2011 by a trio of academics, who investigated the links between the daily content of 9.7 million tweets posted by 2.7 million users between March and December 2008 and the Dow Jones Industrial Average (DJIA).
They did so by using two tools to assess the mood of a tweet: OpinionFinder, a publicly available software package for sentiment analysis, and GPOMS (Google-Profile of Mood States), which is more sophisticated in that it measures six dimensions of mood rather than just positive or negative.
Their results did show significant correlation between one Twitter sentiment dimension and the direction of the DJIA. However, this study can be easily criticised because of the short length of the data series and a lack of out-of-sample testing.
Since the publication of the above study, other researchers have started investigating social media as a potential factor in predicting stock market returns.
For example, a team from Johns Hopkins University published a study in the Journal of Portfolio Management last year, calling social media the ‘sixth factor’ in an asset pricing model of stock returns.
They argued that social media is a distinct factor on top of the five advocated by the famous academic duo Eugene Fama and Kenneth French, who updated their three-factor model to a five-factor model (market, size, value, profitability and investment) in 2015.
The Johns Hopkins team researched sentiment-based content published on StockTwits, a social media platform that collects views on specific securities generated by the crowd, typically market participants such as traders, analysts and financial information providers.
The peculiarity of this dataset is that each contributor can define the sentiment of their messages by labelling them as ‘bullish’ or ‘bearish’. The authors used this feature, which sets the study apart from others that employ more complex textual analysis techniques.
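To make the idea of self-labelled sentiment concrete, here is a minimal sketch of how ‘bullish’ and ‘bearish’ tags could be turned into a daily score per stock. It is not the authors’ method: the function name bullish_ratio, the tickers, the dates and the messages are all made up for illustration, and the score is simply the share of labelled messages that are bullish.

```python
from collections import defaultdict


def bullish_ratio(messages):
    """Return {(ticker, date): share of labelled messages tagged 'bullish'}.

    `messages` is an iterable of (ticker, date, label) tuples, where label is
    'bullish', 'bearish' or None. Unlabelled messages are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # [bullish count, bearish count]
    for ticker, date, label in messages:
        if label == "bullish":
            counts[(ticker, date)][0] += 1
        elif label == "bearish":
            counts[(ticker, date)][1] += 1
    return {key: bull / (bull + bear) for key, (bull, bear) in counts.items()}


# Hypothetical messages, invented purely for this example.
sample = [
    ("AAPL", "2015-03-02", "bullish"),
    ("AAPL", "2015-03-02", "bullish"),
    ("AAPL", "2015-03-02", "bearish"),
    ("MSFT", "2015-03-02", "bearish"),
    ("MSFT", "2015-03-02", None),
]
ratios = bullish_ratio(sample)
```

A ratio above one half would indicate that labelled opinion on that stock was net bullish on the day; how such a score relates to subsequent returns is the empirical question the study addresses.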
The authors found a statistical relation between positive sentiment on stocks and their future positive return and have documented this factor as distinct from the five proposed by Fama-French.
In terms of econometric rigour, this study is an improvement over prior ones but still lacks a long time series. It analysed data from 2013 to 2015 and was limited to a group of 15 US-based stocks.
A longer data set was studied by Stephen Heston from the University of Maryland and Nitish Sinha from the Federal Reserve in Washington. Their paper, titled ‘News versus Sentiment: Predicting Stock Returns from News Stories’, was published last autumn in the Financial Analysts Journal.
Their study brings a few improvements: it extends the time series to cover 2003 to 2010 and it explores the effect of aggregating news over horizons longer than one day, as well as the importance of understanding the tone of the news.
The authors found that daily aggregation of news sentiment is sub-optimal for predicting future stock returns. It is better to quantify the sentiment over at least a weekly period. They also found that news tone matters. In fact, negative news had the strongest predictive power.
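The difference between daily and weekly aggregation can be illustrated with a short sketch. This is an assumption-laden simplification rather than the procedure used in the paper: it takes a list of made-up daily sentiment scores and averages each day with up to four preceding trading days, so that a single noisy day carries less weight. The function name trailing_mean and the five-day window are illustrative choices.

```python
def trailing_mean(scores, window=5):
    """Average each day's score with up to `window - 1` preceding days.

    Early days, which have fewer than `window` observations behind them,
    are averaged over whatever history is available.
    """
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily sentiment scores for one stock (invented values).
daily = [0.2, -0.4, 0.1, 0.3, -0.2, 0.5, 0.0]
weekly = trailing_mean(daily, window=5)
```

The smoothed series moves more slowly than the daily one; whether it predicts returns better is something only a proper test on real data can establish.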
The bottom line
In terms of the application of these new data sets by investment managers, private conversations I’ve had with some quantitative asset managers reveal an increased interest in studying them but caution in allocating a risk budget to these newer alpha signals.
In the words of Fan et al. (2014), ‘Big Data bring new opportunities to modern society but challenges to data scientists’.
According to the authors, the challenges brought by the high dimensionality of Big Data include noise accumulation, spurious correlations, heavy computational costs and algorithmic instability.
There are interesting implications for investors, but a lot more research work by the PhDs is needed.
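The spurious correlation problem is easy to demonstrate with simulated data. The sketch below is an illustration under stated assumptions, not an analysis from Fan et al.: it draws a random ‘return’ series of 50 observations and then a growing number of random ‘signals’ that are unrelated to it by construction, tracking the largest absolute correlation found so far. The function names, the sample size and the signal counts are all arbitrary choices.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def max_abs_corr_by_count(n_obs, counts, seed=0):
    """Largest |correlation| with a random target after testing k pure-noise signals.

    Returns {k: best |correlation| among the first k signals} for each k in `counts`.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    results, best = {}, 0.0
    for k in range(1, max(counts) + 1):
        signal = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to target
        best = max(best, abs(pearson(signal, target)))
        if k in counts:
            results[k] = best
    return results


result = max_abs_corr_by_count(50, counts=(5, 200, 2000))
for k, best in sorted(result.items()):
    print(f"{k:>5} noise signals tested: best |correlation| = {best:.2f}")
```

Because the best correlation can only rise as more candidates are screened, a researcher who tests thousands of social media signals against a short return history should expect to find some that look impressive purely by chance.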