SentoResearch

Research material & open-source software by and for the community

In a nutshell

Within the growing and fascinating landscape at the frontier of text mining, sentiment analysis, and econometrics, the field of sentometrics has emerged. Researchers in sentometrics investigate how qualitative sentiment embedded in textual data (and other alternative data sources) can be transformed into quantitative sentiment variables, and how those variables can then be used in econometric analysis of the relationships between sentiment and other variables.

Many researchers drive sentometrics forward by doing tremendous work across the domains of economics, finance, politics, and beyond. The objective of this hub is to provide resources and open-source software that help this community of researchers interact and showcase their work, while also introducing newcomers to the field.

This survey paper and the R package sentometrics are perfect starting points to dive into this exciting field.

Data

EPU Belgium

Daily EPU indices for Flanders, Wallonia, and Belgium, updated daily, from 2003 to today.

MCCC

Daily U.S. Media Climate Change Concerns Index from 2003 to 2025.

U.S. Topical Economic Sentiment

Daily Topical U.S. Economic Sentiment Indices from 1996 to 2016.

EPU Quebec

Monthly EPU from French-Canadian sources. Index available from 1913 to 2020.

Posts

2025 Update on the Media Climate Change Concerns Index

Update of the Media Climate Change Concerns (MCCC) index! The index now ranges from January 2003 to June 2025. The MCCC is constructed from major U.S. newspapers and newswires to proxy for climate change concerns.

2024 Update on the Media Climate Change Concerns Index

Update of the Media Climate Change Concerns (MCCC) index! The index now ranges from January 2003 to June 2024. The MCCC is constructed from major U.S. newspapers and newswires to proxy for climate change concerns.

Software

caret

Miscellaneous functions for training and plotting classification and regression models.

glmnet

LASSO and elastic net regularized generalized linear models.

GWP

Sentiment lexicon calibration with the Generalized Word Power methodology.

NLTK

NLTK is a leading platform for building Python programs to work with human language data.

quanteda

A fast, flexible, and comprehensive framework for quantitative text analysis in R.

scikit-learn

Machine learning in Python.

SentimentAnalysis

Dictionary-based sentiment analysis.

sentometrics

An integrated framework for textual sentiment time series aggregation and prediction.

sentometrics.app

A Shiny interface to the R package sentometrics.

sentopics

Tools for estimating and analyzing various classes of sentiment/topic models.

spaCy

Industrial-strength natural language processing in Python.

STM

The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates.

TextBlob

TextBlob is a Python library for processing textual data.

textir

Inverse regression analysis of text.

tidytext

Text mining using tidy tools.

transformers

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

UDPipe

Natural language processing toolkit.

VADER

Sentiment analysis tool that is specifically attuned to sentiments expressed in social media.