We present a study about simulating message routing over the network of Twitter followers. Our simulations are based on publicly available data collected via the Twitter API, which includes the connections between about 6 million users who posted their GPS coordinates in their tweets. The main question we ask is whether we can find a chain of links between randomly selected users in this network by using only local information and not resorting to a global knowledge of network topology. We are if individuals can navigate the network they are part of, simulating the setting of the original Milgram experiments (note that we mean the small-world experiment, not the infamous other one with electric shocks!). In contrast with those experiment however, to simplify the model, we only take into account geographic distance when passing the messages instead of many other clues people may have in real-world settings. We found that it is quite easy to establish a chain up to any target user's city. On the other hand, our routes tend to get lost in the city, indicating that geographical distance plays little role on the formation of social ties on those levels. This is in agreement with Milgram and colleagues' original findings, who noticed that as the letters get closer to the target, the decisions of participants are increasingly based on perceived social factors and not on geographical distances.
The impression of living in a small world has been around since at least the beginning of the industrial age. Already in 1929, Hungarian writer Frigyes Karinthy speculated that any two people in the world can be connected by a chain of personal acquaintances no longer than five. One of the first empirical studies on the subject was by Jeffrey Travers and Stanley Milgram, whose results were popularized as the six degrees of separation. In their experiments, they asked arbitrarily selected participants from the mid-western USA to forward a letter to a target living in Boston via a chain of personal acquaintances, and then analyzed the resulting chains. Although only about 20% of the letters reached the target, the average length of chains was indeed quite low, on average around five steps. While the possible bias due to the experiment requiring active participation of the message-holders is still debated, they did show that short chains exist among a significant fraction of randomly selected individuals. In addition to this, their results also show that participants without a global knowledge of the network of social contacts were able to find these chains by only making decisions about the next link based on their sense of their own social circles. These results gained a revived attention around 2000 when large-scale network datasets started to become available. Studies analyzing the graph of connections in online social networks confirmed that short chains do exist, while the possible mechanism how participants might be able to find these short chains was also studied theoretically. The results of Jon Kleinberg were the first to confirm theoretically that finding short chains based on local information only is possible in models of spatially embedded networks. Now, thanks to the widespread use of GPS-enabled smartphones, we can test Kleinberg's model on a large-scale dataset of geo-located Twitter users, complementing similar studies on more coarse-grained social network data and mobile network datasets.
As part of the FUTURICT.hu project, we have been collecting public statuses and connectivity information from Twitter via their public APIs since 2012. Over the course of more than two years, we accumulated over 4 billion tweets; among these, over 1.5 billion contain GPS coordinates. This kind of data opens up many new ways to gain information about society, as many research areas were previously limited by the lack of large-scale datasets (to find more information about our ongoing research projects, check out our related work). To build a geographically embedded network of Twitter users, we identified those who regularly post GPS coordinates with their tweets; we selected the 6 million most active users, and queried Twitter for who they follow publicly. This resulted in 122 million edges among them. We emphasize that all the data we use has been made publicly available to everyone by Twitter with the consent of its users. On the other hand, due to Twitter's policy, we cannot directly redistribute the dataset. However, we are happy to share any information on how to obtain similar datasets; if you are interested, do not hesitate to contact us.
What we do is basically the simulation of Travers and Milgram's experiment on the network of Twitter followers. We choose a starting user and a target user, and try to establish a chain of links between them. Instead of using global knowledge of the network, we simulate users making decisions based on local information only. This means that we build up the chain iteratively, beginning at the staring user, and adding one participant in each step. We base our decision on a simple rule: among the neighbors of the current user, we choose the one who is closest to the target in terms of geographical distance. Note that we can easily do this since we know the geographical position of each user based on their geo-tagged tweets. This is of course a simplification of the original Milgram experiment, where other factors, like the occupation of the target or the perceived social status of possible intermediate message recipients could also play a role in the decision of the participants. Of course, we do not allow loops in our chains, and we limit the maximum length to one hundred users; if the chain does not complete by that, we consider the routing between that specific pair of users as unsuccessful.
szule at complex dot elte dot hu website
dkondor at complex dot elte dot hu Scholar
László Dobos PhD
dobos at complex dot elte dot hu website
István Csabai DSc
Gábor Vattay DSc
All authors work at the Eötvös Loránd University, Budapest, Hungary. JS and DK are PhD students aiming to write their theses in 2015. LD is currently a teaching assistant, IC and GV are full professors. All authors' research interests include data mining and modeling on large network datasets. Along with the large-scale collection and analysis of tweets, the authors are also involved in analysing the Bitcoin transaction graph. DK is also currently doing an internship at Ericsson Research in Budapest, where he is involved in the Signature of Humanity project. LD and IC are also interested in astrophysics, and have a long term collaboration with the Johns Hopkins University in Baltimore regarding the management of the Virtual Observatory of astronomical measurements. IC is also interested in bioinformatics with a focus on large-scale analysis of next genome sequencing data.
Other members of the group, whose help we are grateful to include Tamás Sebők, Tamás Hanyecz, Zsófia Kallus and Norbert Barankai.
Citation info for our paper:
Szüle J, Kondor D, Dobos L, Csabai I, Vattay G (2014) Lost in the City: Revisiting Milgram's Experiment in the Age of Social Networks. PLoS ONE 9(11): e111973. doi:10.1371/journal.pone.0111973.
Data collection and database:
Dobos L, Szüle J et al (2013). A multi-terabyte relational database for geo-tagged social network data. CogInfCom 2013.
Regional features identified in the text of the tweets
(still an ongoing project, new results are to be published soon!):
Kondor D, Csabai I et al (2013). Using Robust PCA to estimate regional characteristics of language use from geo-tagged Twitter messages. CogInfoCom 2013.
Analysis of the follower and mention graph aggregated by geographical
regions, and spreading processes on it:
Kallus Zs, Barankai N et al (2013). Regional properties of global communication as reflected in aggregated Twitter data. CogInfCom 2013.
Efficient geographic indexing employed in our database of tweets:
Kondor D, Dobos L et al (2014). Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh. SSDBM 2014.
Finding geographic communities using the Twitter follower graph:
Kallus Z, Barankai N, Szüle J, Vattay G (2015). Spatial Fingerprints of Community Structure in Human Interaction Network for an Extensive Set of Large-Scale Regions. PLoS ONE 10(5): e0126713.
Modeling information spreading on the aggregated Twitter network (in preparation):
Kallus Zs, Kondor D, Stéger J, Csabai I, Vattay G (2014). Online news propagation: hidden geometry recovered by Twitter?
DownloadsHere you can download some aggregate measures about the Twitter follower graph and some of the generated chains. Each data file is in plain text format.
Data files for degree and distance distribution in the Twitter follower network:
|degrees.txt - 20MB||Follower count per Twitter user in our dataset (mutual following graph degrees)|
|degrees_out.txt - 32MB||Followed user count per Twitter user in our dataset (out degree only)|
|distances_between_users.txt - 296MB||Distances between Twitter friends in our dataset ([km])|
|graph3_dist.txt - 1.7GB||Distances, with node numbers (i.e. the follower graph edge list)|
|d2_corr_dim_data.dat - 3kB||Distance distribution between all user pairs, for correlation dimension calculation|
Results of greedy routing processes !!!big files - ~2.3GB (8GB uncompressed) each!!! :The data is grouped by routing processes (i.e. the randomly selected source and target user) and ordered by the hop number Each routing process has 100 steps (100 lines). The first column is the hop number in a given routing process, the second coulmn gives the distance from the target user ([km]), and the third column describes if the routing process is dead (1 - can go further | 0 - dead )
|World-World.dat||From: Anywhere - To: Anywhere|
|SF-LA.dat||From: San Fransisco - To: Los Angeles|
|Phoe-LA.dat||From: Phoenix - To: Los Angeles|
|SF-Bost.dat||From: San Fransisco - To: Boston|
|Bost-LA.dat||From: Boston - To: Los Angeles|
|Phoe-Bost.dat||From: Phoenix - To: Boston|
|Par-Hamb.dat||From: Paris - To: Hamburg|
|Ist-Paris.dat||From: Istanbul - To: Paris|
|Ath-Par.dat||From: Athens - To: Paris|
|Ath-Hamb.dat||From: Athens - To: Hamburg|
|Wich-Bost.dat||From: Wichita - To: Boston|
|Madrid-Hamb.dat||From: Madrid - To: Hamburg|
|Cop-Rome.dat||From: Copenhagen - To: Rome|