Download the data

Basic dataset -- includes the list of all transactions up to 2013.12.28.

Computed data -- includes basic statistics computed from the transaction data and the reconstructed directed graph between the addresses.

The zip files contain TSV files (UNIX line endings, columns are separated with tabs).
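
As a minimal sketch of reading these files, the following uses Python's standard csv module with a tab delimiter. It assumes the files have no header row, which the description above does not state; the helper name read_tsv and the sample rows are hypothetical.

```python
import csv
import io

def read_tsv(handle):
    """Yield each non-empty row of a tab-separated file as a list of strings."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # skip blank lines
            yield row

# Hypothetical two-row sample in the shape of txhash.txt (txID, txhash).
sample = io.StringIO("0\tabc\n1\tdef\n")
rows = list(read_tsv(sample))
```

For a real file, a handle from open("txhash.txt", newline="") would be passed in place of the in-memory sample.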

Note: to reproduce the results of our first paper (Do the rich get richer?...), limit the analysis to the first 235000 blocks (transactions up to 2013.05.07).
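
One way to apply this limit is to filter the rows of tx.txt by blockID. The sketch below assumes that "the first 235000 blocks" means blockID 0 through 234999, since blockIDs start at 0; the function name and the sample rows are hypothetical.

```python
MAX_BLOCKS = 235000  # cutoff taken from the note above

def filter_tx_rows(rows, max_blocks=MAX_BLOCKS):
    """Keep tx.txt rows (txID, blockID, n_inputs, n_outputs) with blockID < max_blocks."""
    for tx_id, block_id, n_inputs, n_outputs in rows:
        if int(block_id) < max_blocks:  # assumption: blockID 0..234999 are "the first 235000"
            yield int(tx_id), int(block_id), int(n_inputs), int(n_outputs)

# Made-up rows around the cutoff.
sample = [("0", "0", "1", "1"), ("5", "234999", "2", "2"), ("6", "235000", "1", "2")]
kept = list(filter_tx_rows(sample))
```

The txIDs that survive this filter can then be used to select the matching rows of txin.txt and txout.txt.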

Basic dataset


blockhash.txt -- enumeration of all blocks in the blockchain, 277443 rows, 4 columns:
  blockID -- id used in this database (0 -- 277442, continuous)
  bhash -- block hash (identifier in the blockchain, 64 hex characters)
  btime -- creation time (from the blockchain)
  txs -- number of transactions

txhash.txt -- transaction ID and hash pairs, 30048983 rows, 2 columns:
  txID -- id used in this database (0 -- 30048982, continuous)
  txhash -- transaction hash used in the blockchain (64 hex characters)

addresses.txt -- Bitcoin address IDs, 24618959 rows, 2 columns:
  addrID -- id used in this database (0 -- 24618958, continuous; addrID == 0 denotes an invalid, blank address and is not used)
  addr -- string representation of the address (alphanumeric, maximum 35 characters; note that the IDs are NOT ordered by the addr in any way)

tx.txt -- enumeration of all transactions, 30048983 rows, 4 columns:
  txID -- transaction ID (from the txhash.txt file)
  blockID -- block ID (from the blockhash.txt file)
  n_inputs -- number of inputs
  n_outputs -- number of outputs

txin.txt -- list of all transaction inputs (sums sent by the users), 65714232 rows, 3 columns:
  txID -- transaction ID (from the txhash.txt file)
  addrID -- sending address (from the addresses.txt file)
  value -- sum in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)

txout.txt -- list of all transaction outputs (sums received by the users), 73738345 rows, 3 columns:
  txID -- transaction ID (from the txhash.txt file)
  addrID -- receiving address (from the addresses.txt file)
  value -- sum in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)
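
The value columns of txin.txt and txout.txt can be handled as in the sketch below. Python integers are arbitrary precision, so int() parses them safely; in fixed-width settings (for example a typed array or a database column) a 64-bit type would be needed. The helper names and the sample value are hypothetical.

```python
from decimal import Decimal

SATOSHI_PER_BTC = 100_000_000  # 1 Satoshi = 1e-8 BTC

def satoshi_to_btc(value):
    """Convert an integer Satoshi amount to BTC without floating-point rounding."""
    return Decimal(value) / SATOSHI_PER_BTC

def fits_in_int64(value):
    """Check that a parsed value fits in a signed 64-bit integer."""
    return -(2**63) <= value < 2**63

# Made-up amount: 50 BTC in Satoshis, which already exceeds 2**32.
big = int("5000000000")
btc = satoshi_to_btc(big)
```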

txtime.txt -- transaction timestamps (obtained from the blockchain.info site), 30048983 rows, 2 columns:
  txID -- transaction ID (from the txhash.txt file)
  unixtime -- unix timestamp (seconds since 1970-01-01)
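
A small sketch of turning the unixtime column into a calendar date, assuming the timestamps are in UTC as is conventional for Unix time. The function name and the sample value are illustrative and not taken from the dataset.

```python
from datetime import datetime, timezone

def to_utc(unixtime):
    """Convert a Unix timestamp (string or int, in seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(unixtime), tz=timezone.utc)

# Arbitrary sample value: midnight at the start of 2009, in UTC.
stamp = to_utc("1230768000")
```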

Computed data


contraction.txt -- list of addresses possibly belonging to the same user, 24618959 rows, 2 columns:
  addrID -- address ID (from the addresses.txt file)
  userID -- ID of the identified user (not continuous; any two addrIDs that belong to the same "user" appear as inputs in the same transaction at least once)
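
One common way to build a grouping of this kind is a union-find pass over the transaction inputs. The sketch below is not necessarily the procedure used to produce contraction.txt: it merges groups transitively, which may be broader than the pairwise wording above, and it labels each group with its smallest addrID, which is a hypothetical convention.

```python
def group_addresses(txin_rows):
    """Map each addrID to a group label, merging addresses that share a transaction as inputs.

    txin_rows: iterable of (txID, addrID) pairs, as in the first two columns of txin.txt.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        root = a
        while parent[root] != root:
            root = parent[root]
        while parent[a] != root:  # path compression
            parent[a], a = root, parent[a]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest addrID labels the group

    first_input = {}  # txID -> first input address seen for that transaction
    for tx_id, addr_id in txin_rows:
        find(addr_id)  # register the address even if it is never merged
        if tx_id in first_input:
            union(first_input[tx_id], addr_id)
        else:
            first_input[tx_id] = addr_id
    return {a: find(a) for a in list(parent)}

# Made-up inputs: tx 0 spends from 1 and 2, tx 1 from 2 and 3, tx 2 from 4 only.
groups = group_addresses([(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)])
```

In this example addresses 1 and 3 never share a transaction directly but still end up in one group through address 2, which illustrates the transitive merging.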

balances.txt -- balances of nodes after 277,443 blocks (on 2013.12.28.), 24617959 rows, 2 columns:
  addrID -- address ID (from the addresses.txt file)
  balance -- balance in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)

degree.txt -- node degrees (number of distinct transaction partners), 24575385 rows, 3 columns:
  addrID -- address ID (from the addresses.txt file)
  indeg -- indegree (number of distinct addresses which appear as inputs in transactions where this address appears as output)
  outdeg -- outdegree (number of distinct addresses which appear as outputs in transactions where this address appears as input)
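
The two degree definitions can be illustrated with the sketch below, which counts distinct partners from a list of (sending address, receiving address) pairs. How the published file treats special cases such as an address sending to itself is not stated, so this sketch simply counts them; the function name and the sample edges are hypothetical.

```python
from collections import defaultdict

def degrees(edges):
    """Return {addrID: (indeg, outdeg)} counting distinct partners only."""
    senders = defaultdict(set)    # addrID -> distinct addresses that sent to it
    receivers = defaultdict(set)  # addrID -> distinct addresses it sent to
    for addr_in, addr_out in edges:
        receivers[addr_in].add(addr_out)
        senders[addr_out].add(addr_in)
    nodes = sorted(set(senders) | set(receivers))
    return {n: (len(senders.get(n, ())), len(receivers.get(n, ()))) for n in nodes}

# Made-up edges; the repeated (1, 2) pair is counted once.
deg = degrees([(1, 2), (1, 2), (1, 3), (2, 3)])
```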

txedge.txt -- edges constructed from the transactions: a transaction with 2 inputs and 3 outputs results in 6 edges (all possible combinations), an edge may appear multiple times, with the corresponding transaction IDs, 129178908 rows, 3 columns:
  txID -- transaction ID in which this edge appears
  addrin -- sending address
  addrout -- receiving address
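
The "all possible combinations" rule can be sketched as a Cartesian product of a transaction's input and output addresses. Whether repeated addresses within a single transaction are collapsed first is not stated above, so this sketch leaves them as given; the function name and the sample IDs are hypothetical.

```python
from itertools import product

def tx_edges(tx_id, input_addrs, output_addrs):
    """Return one (txID, addrin, addrout) row per input/output address combination."""
    return [(tx_id, a_in, a_out) for a_in, a_out in product(input_addrs, output_addrs)]

# Made-up transaction with 2 inputs and 3 outputs, giving 2 * 3 = 6 edges.
edges = tx_edges(7, [10, 11], [20, 21, 22])
```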

txedgeunique.txt -- edges constructed from the transactions; each edge appears only once, 89220163 rows, 2 columns:
  addrin -- sending address
  addrout -- receiving address
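
Conceptually, this file is the previous one with the txID column dropped and duplicate pairs removed. A minimal in-memory sketch follows; the function name and the sample rows are hypothetical, and the row order of the published file is not assumed.

```python
def unique_edges(txedge_rows):
    """Drop txID from (txID, addrin, addrout) rows and keep each (addrin, addrout) pair once."""
    seen = dict.fromkeys((a_in, a_out) for _, a_in, a_out in txedge_rows)
    return list(seen)  # first-seen order is preserved

# Made-up rows: the pair (1, 2) appears in two different transactions.
uniq = unique_edges([(0, 1, 2), (1, 1, 2), (1, 1, 3)])
```

At the scale of the full file an in-memory set of pairs may be too large, in which case a disk-based sort with duplicate removal would likely be a better fit.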

