Prediction of employment rates from Twitter daily rhythms in the US

Eszter Bokányi, Zoltán Lábszki, Gábor Vattay

Department of Physics of Complex Systems, Eötvös Loránd University, Pf. 32, H-1518 Budapest, Hungary

Abstract

By modeling macroeconomic indicators using digital traces of human activities on mobile or social networks, we can provide important insights into processes previously assessed only via paper-based surveys or polls. We collected aggregated workday activity patterns of US counties from the normalized number of messages sent in each hour on the online social network Twitter. In this paper, we show how the daily rhythm of these patterns can be modeled and linked to county employment statistics. The daily rhythm can be decomposed into a linear combination of two dominant patterns whose mixing ratio defines a measure for each county. Because this measure correlates significantly with employment ($0.46\pm0.02$) and unemployment rates ($-0.34\pm0.02$) in the counties, the two dominant activity patterns can be linked to rhythms signaling the presence or lack of regular working hours of individuals. Thus, the analysis could give policy makers better insight into these processes: problems could be identified not only from the number of officially registered unemployed, but also from the digital footprints people leave on different platforms.

Dataset

The raw geolocated tweet counts per hour and per county for each workday between 01/01/2014 and 31/10/2014 can be downloaded from the following link. The first column is the concatenated state and county FIPS code with the leading zero dropped (e.g. state=01 and county=001 give 1001), the second column is the hour, and the third column is the geolocated tweet count.
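As an illustration of how the three-column layout might be read, the following minimal Python sketch decodes the concatenated FIPS code and builds a normalized hourly profile for each county. It is not the authors' analysis code. The comma delimiter, the absence of a header row, the 0-23 hour encoding, and the function names `split_fips` and `hourly_profiles` are all assumptions made for the example; the sample rows are invented.

```python
import csv
import io
from collections import defaultdict


def split_fips(code):
    """Split a concatenated FIPS code into (state, county) strings."""
    # Restore the leading zero that is lost when e.g. "01001" is stored as 1001.
    padded = str(code).strip().zfill(5)
    return padded[:2], padded[2:]


def hourly_profiles(lines, delimiter=","):
    """Sum counts per county and hour, then normalize each county to sum to 1.

    Assumes rows of the form: fips, hour (0-23), count, with no header row.
    """
    totals = defaultdict(lambda: [0.0] * 24)
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        fips, hour, count = row[0], int(row[1]), float(row[2])
        totals[split_fips(fips)][hour] += count

    profiles = {}
    for county, counts in totals.items():
        total = sum(counts)
        if total > 0:
            profiles[county] = [c / total for c in counts]
    return profiles


# Invented sample rows, for illustration only.
sample = io.StringIO("1001,8,30\n1001,20,60\n1001,8,10\n")
profiles = hourly_profiles(sample)
```

With the sample rows above, `profiles` has a single key, `("01", "001")`, whose 24-element list holds the share of tweets in each hour. For a real download, pass an open file object in place of `sample` and adjust `delimiter` to match the file.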

County unemployment rates were downloaded for the year 2014 from the following page of the BLS.

Population estimates were taken from the US 2010 Census using the American FactFinder.

Code

The analysis code can be viewed here.

Contact

E-mail: bokanyi at complex dot elte dot hu