Overview

We present a study about simulating message routing over the network of Twitter followers. Our simulations are based on publicly available data collected via the Twitter API, which includes the connections between about 6 million users who posted their GPS coordinates in their tweets. The main question we ask is whether we can find a chain of links between randomly selected users in this network by using only local information and not resorting to a global knowledge of network topology. We are if individuals can navigate the network they are part of, simulating the setting of the original Milgram experiments (note that we mean the small-world experiment, not the infamous other one with electric shocks!). In contrast with those experiment however, to simplify the model, we only take into account geographic distance when passing the messages instead of many other clues people may have in real-world settings. We found that it is quite easy to establish a chain up to any target user's city. On the other hand, our routes tend to get lost in the city, indicating that geographical distance plays little role on the formation of social ties on those levels. This is in agreement with Milgram and colleagues' original findings, who noticed that as the letters get closer to the target, the decisions of participants are increasingly based on perceived social factors and not on geographical distances.

Background

The impression of living in a small world has been around since at least the beginning of the industrial age. Already in 1929, Hungarian writer Frigyes Karinthy speculated that any two people in the world can be connected by a chain of personal acquaintances no longer than five. One of the first empirical studies on the subject was by Jeffrey Travers and Stanley Milgram, whose results were popularized as the six degrees of separation. In their experiments, they asked arbitrarily selected participants from the mid-western USA to forward a letter to a target living in Boston via a chain of personal acquaintances, and then analyzed the resulting chains. Although only about 20% of the letters reached the target, the average length of chains was indeed quite low, on average around five steps. While the possible bias due to the experiment requiring active participation of the message-holders is still debated, they did show that short chains exist among a significant fraction of randomly selected individuals. In addition to this, their results also show that participants without a global knowledge of the network of social contacts were able to find these chains by only making decisions about the next link based on their sense of their own social circles. These results gained a revived attention around 2000 when large-scale network datasets started to become available. Studies analyzing the graph of connections in online social networks confirmed that short chains do exist, while the possible mechanism how participants might be able to find these short chains was also studied theoretically. The results of Jon Kleinberg were the first to confirm theoretically that finding short chains based on local information only is possible in models of spatially embedded networks. Now, thanks to the widespread use of GPS-enabled smartphones, we can test Kleinberg's model on a large-scale dataset of geo-located Twitter users, complementing similar studies on more coarse-grained social network data and mobile network datasets.

Dataset

As part of the FUTURICT.hu project, we have been collecting public statuses and connectivity information from Twitter via their public APIs since 2012. Over the course of more than two years, we accumulated over 4 billion tweets; among these, over 1.5 billion contain GPS coordinates. This kind of data opens up many new ways to gain information about society, as many research areas were previously limited by the lack of large-scale datasets (to find more information about our ongoing research projects, check out our related work). To build a geographically embedded network of Twitter users, we identified those who regularly post GPS coordinates with their tweets; we selected the 6 million most active users, and queried Twitter for who they follow publicly. This resulted in 122 million edges among them. We emphasize that all the data we use has been made publicly available to everyone by Twitter with the consent of its users. On the other hand, due to Twitter's policy, we cannot directly redistribute the dataset. However, we are happy to share any information on how to obtain similar datasets; if you are interested, do not hesitate to contact us.

Simulations

What we do is basically the simulation of Travers and Milgram's experiment on the network of Twitter followers. We choose a starting user and a target user, and try to establish a chain of links between them. Instead of using global knowledge of the network, we simulate users making decisions based on local information only. This means that we build up the chain iteratively, beginning at the staring user, and adding one participant in each step. We base our decision on a simple rule: among the neighbors of the current user, we choose the one who is closest to the target in terms of geographical distance. Note that we can easily do this since we know the geographical position of each user based on their geo-tagged tweets. This is of course a simplification of the original Milgram experiment, where other factors, like the occupation of the target or the perceived social status of possible intermediate message recipients could also play a role in the decision of the participants. Of course, we do not allow loops in our chains, and we limit the maximum length to one hundred users; if the chain does not complete by that, we consider the routing between that specific pair of users as unsuccessful.

Results


We find that typically, chains are able to reach the target user's city in only a few steps. Despite this, only about 20% of our chains could reach the exact target; the rest of the chains circle around the target user for more than 100 steps. This is not very surprising, as we would expect that inside a city, the role of spatial distance in the formation of social ties diminishes, meaning that we might not even know our neighbors, and our friends can live scattered out in our city. More formally, the probability of having a friend in our immediate neighborhood is not significantly higher than having a friend anywhere in our hometown. This results in that once we are in the target's city, selecting the geographically closest user as the next recipient does not increase the probability of reaching the target. On this scale, getting closer in terms of geographical distance does not correspond to getting closer in terms of network distance. We can test this reasoning by looking at the distribution of distances between Twitter friends. For distances smaller than 10 km, when we increase the distance the probability of finding a friend increases as the number of available people increases if we consider people from a bigger radius. On the other hand, for distances over 10 km, this probability decreases with increasing distance, as a result of that the cost of creating and maintaining a social relationship increases with the distance. For this part, the distribution can be well approximated with a power-law mathematical function, whose parameters agree with the parameters giving the most effective decentralized routing in Kleinberg's work. This agrees well with the fact that our chains seem to work efficiently on this scale, while getting lost below the 10 km threshold.

About us

János Szüle

szule at complex dot elte dot hu website

Dániel Kondor

dkondor at complex dot elte dot hu Scholar

László Dobos PhD

dobos at complex dot elte dot hu website

István Csabai DSc

csabai at complex dot elte dot hu website Scholar

Gábor Vattay DSc

vattay at elte dot hu Twitter Scholar

All authors work at the Eötvös Loránd University, Budapest, Hungary. JS and DK are PhD students aiming to write their theses in 2015. LD is currently a teaching assistant, IC and GV are full professors. All authors' research interests include data mining and modeling on large network datasets. Along with the large-scale collection and analysis of tweets, the authors are also involved in analysing the Bitcoin transaction graph. DK is also currently doing an internship at Ericsson Research in Budapest, where he is involved in the Signature of Humanity project. LD and IC are also interested in astrophysics, and have a long term collaboration with the Johns Hopkins University in Baltimore regarding the management of the Virtual Observatory of astronomical measurements. IC is also interested in bioinformatics with a focus on large-scale analysis of next genome sequencing data.

Other members of the group, whose help we are grateful to include Tamás Sebők, Tamás Hanyecz, Zsófia Kallus and Norbert Barankai.

This work was supported by the European Union and the European Social Fund through project FuturICT.hu (grant no. TAMOP-4.2.2.C-11/1/KONV-2012-0013). The authors are also thankful for the support of the Hungarian Government, the National Development Agency, the Research and Technology Innovation Fund and the MAKOG Foundation (grant no. EITKIC_12-1-2012-0001).

Citation info for our paper:
Szüle J, Kondor D, Dobos L, Csabai I, Vattay G (2014) Lost in the City: Revisiting Milgram's Experiment in the Age of Social Networks. PLoS ONE 9(11): e111973. doi:10.1371/journal.pone.0111973.

Related work

Data collection and database:
Dobos L, Szüle J et al (2013). A multi-terabyte relational database for geo-tagged social network data. CogInfCom 2013.

Regional features identified in the text of the tweets (still an ongoing project, new results are to be published soon!):
Kondor D, Csabai I et al (2013). Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages. CogInfoCom 2013.

Analysis of the follower and mention graph aggregated by geographical regions, and spreading processes on it:
Kallus Zs, Barankai N et al (2013). Regional properties of global communication as reflected in aggregated Twitter data. CogInfCom 2013.

Efficient geographic indexing employed in our database of tweets:
Kondor D, Dobos L et al (2014). Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh. SSDBM 2014.

Finding geographic communities using the Twitter follower graph:
Kallus Z, Barankai N, Szüle J, Vattay G (2015). Spatial Fingerprints of Community Structure in Human Interaction Network for an Extensive Set of Large-Scale Regions. PLoS ONE 10(5): e0126713.

Modeling information spreading on the aggregated Twitter network (in preparation):
Kallus Zs, Kondor D, Stéger J, Csabai I, Vattay G (2014). Online news propagation: hidden geometry recovered by Twitter?

Downloads

Here you can download some aggregate measures about the Twitter follower graph and some of the generated chains. Each data file is in plain text format.