ELTE Bitcoin Project website and resources

Downloads

You can download the dataset in plain text (TSV) format or access it inside our database.

Notes:

New dataset, 5.53 GiB: whole blockchain up to 2014.10.19 (326,027 blocks).
Computed data, 3.05 GiB: possible identification of addresses belonging to the same user, and the reconstructed directed graph between the addresses.
Original dataset, 2.7 GiB: the data used in our two papers; it includes the list of all transactions up to 2013.12.28 and is presented here for historical purposes.
Original computed dataset, 1.2 GiB: statistics computed from the transaction data for the original dataset.
Timestamped graph of most active users, 105 MiB: the graphs used in our New Journal of Physics article, created using the "original dataset".

Read the descriptions.

Download the modified bitcoind client we used to extract the data from the blockchain.

Download the modified red-black tree code, and the code built on it, that we used to calculate the test statistics for evaluating the presence of preferential attachment in the transaction network.

Database description

All data is accessible in our database through the CasJobs web interface (you need to register an account which is free of charge): nm.vo.elte.hu/casjobs/login.aspx. Look for the BitCoinPublic database.

The database still contains the 'original' dataset with transactions up to the end of 2013. To obtain a more recent blockchain, use the downloadable datafiles. To reproduce the analysis of our first paper, limit your queries / analysis to blockID < 240,000 (and / or txID < 17354797).
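As a minimal sketch of the restriction described above, the following Python filter keeps only the transactions that fall in blocks below 240,000. It assumes the rows have already been fetched from the tx table (or read from tx.txt) as (txID, blockID, n_inputs, n_outputs) tuples; the constant name, the function name, and the sample rows are illustrative assumptions, not part of the project's code or data.

```python
FIRST_PAPER_BLOCK_LIMIT = 240_000  # from the text: blockID < 240,000


def restrict_to_first_paper(tx_rows):
    """Keep (txID, blockID, n_inputs, n_outputs) rows with blockID below the limit."""
    return [row for row in tx_rows if row[1] < FIRST_PAPER_BLOCK_LIMIT]


# Illustrative rows only (not taken from the dataset).
rows = [
    (0, 0, 1, 1),
    (17354796, 239999, 2, 2),
    (17354797, 240000, 1, 2),
]
first_paper_rows = restrict_to_first_paper(rows)
```

The same condition can be applied to any table that carries a txID column by using txID < 17354797 instead.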

The database includes the following tables:
blockhash enumeration of all blocks in the blockchain
blockID int not null primary key id used in this database (0 -- 239999, continuous)
bhash binary(32) not null block hash (identifier in the blockchain)
btime int not null creation time (from the blockchain)
txs int not null number of transactions
txhash transaction ID and hash pairs
txID int not null primary key id used in this database (starts from 0, continuous)
txhash binary(32) not null transaction hash used in the blockchain
addresses BitCoin address IDs
addrID int not null primary key id used in this database (starts from 0, continuous, the address with addrID == 0 is invalid /blank, not used/)
addr varchar(35) not null string representation of the address (note that the IDs are NOT ordered by the addr in any way)
tx enumeration of all transactions
txID int not null primary key transaction ID (from the txhash table)
blockID int not null block ID (from the blockhash table)
n_inputs int not null number of inputs
n_outputs int not null number of outputs
txin list of all transaction inputs (sums sent by the users)
txID int not null, transaction ID (from the txhash table)
addrID int not null, sending address (from the addresses table)
value bigint not null sum in Satoshis (1e-8 BTC)
clustered index: txID, addrID
txout list of all transaction outputs (sums received by the users)
txID int not null, transaction ID (from the txhash table)
addrID int not null, receiving address (from the addresses table)
value bigint not null sum in Satoshis (1e-8 BTC)
clustered index: txID, addrID
contraction list of addresses possibly belonging to the same user
addrID int not null primary key, address ID (from the addresses table)
userID int not null ID of identified user (not continuous, each two addrID which belong to the same "user" appear as inputs in the same transaction at least once)
balances balances of nodes at some specified times
blockID int not null, ID of last block after which the balance was calculated (valid values: 100000,120000,140000,160000,180000,199956,228931,234999 )
addrID int not null, address ID (from the addresses table)
balance bigint not null, balance in Satoshis (1e-8 BTC)
constraint pk_balances primary key(blockID, addrID)
degree node degrees (number of distinct transaction partners) after 277,443 blocks
addrID int not null primary key, address ID (from the addresses table)
indeg int not null, indegree (number of distinct addresses which appear as inputs in transactions where this address appears as output)
outdeg int not null outdegree (number of distinct addresses which appear as outputs in transactions where this address appears as input)
txedge edges constructed from the transactions: a transaction with 2 inputs and 3 outputs results in 6 edges (all possible combinations); an edge may appear multiple times, once for each transaction ID in which it occurs; JOIN with the txtime table to obtain a timestamped graph
txID int not null, transaction ID in which this edge appears
addrin int not null, sending address
addrout int not null receiving address
clustered index: txID
txedgeunique edges constructed from the transactions; each edge appears only once, edges are indexed by the sending and receiving addresses
addrin int not null, sending address
addrout int not null, receiving address
constraint pk_txedgeu primary key(addrin,addrout)
nonclustered index: addrout,addrin
txtime transaction timestamps (obtained from the blockchain.info site)
txID int not null primary key, transaction ID
unixtime int not null unix timestamp
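The relationship between the txedge, txedgeunique and degree tables described above can be sketched in Python as follows. This is only an illustration of the stated definitions (every input paired with every output, then deduplicated, then counted per address), not the code used to build the database; the function names and the sample address IDs are assumptions.

```python
from collections import Counter
from itertools import product


def edges_for_transaction(tx_id, input_addrs, output_addrs):
    """Return one (txID, addrin, addrout) edge per input/output combination."""
    return [(tx_id, a_in, a_out) for a_in, a_out in product(input_addrs, output_addrs)]


def unique_edges(edges):
    """Collapse repeated (addrin, addrout) pairs, as in txedgeunique."""
    return {(a_in, a_out) for _, a_in, a_out in edges}


def degrees(unique):
    """Count distinct partners per address; returns (indeg, outdeg) counters."""
    indeg = Counter(a_out for _, a_out in unique)
    outdeg = Counter(a_in for a_in, _ in unique)
    return indeg, outdeg


# Illustrative transactions: 2 inputs x 3 outputs, then a repeat of one pair.
edges = edges_for_transaction(7, [1, 2], [3, 4, 5]) + edges_for_transaction(8, [1], [3])
uniq = unique_edges(edges)
indeg, outdeg = degrees(uniq)
```

Here the first transaction contributes 6 edges, and the pair (1, 3) appears twice in the edge list but only once among the unique edges.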

Datafiles description

Notes: a new dataset is now available that contains the blockchain up to 2014.10.19. It includes somewhat more information than the previous dataset. The main difference is that transaction inputs and outputs now have a separate identifier within each transaction. Using these identifiers, a transaction input can be linked to the previous transaction output that it spends; the new dataset also includes these links. Most of the files, however, have the same structure. Files or columns present only in the original dataset are shown with a red background; files or columns present only in the new dataset are shown with a green background. Where two row or column counts are separated by a slash, the first refers to the original dataset and the second to the new dataset.

All files have unix line endings (\n or 0x0a), fields are separated by tabs (\t), and contain no headers or BOM. All files contain only ASCII characters.
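A minimal Python sketch of a reader for this format is shown below: it splits each line on tabs and applies one converter per column. The function name and the sample rows are illustrative assumptions. Python integers are arbitrary-precision, so Satoshi values above 2^32 parse without special handling; in languages with fixed-width integers a 64-bit type is needed.

```python
import io


def read_tsv(stream, converters):
    """Yield one tuple per line, applying one converter per tab-separated field."""
    for line_no, line in enumerate(stream, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(converters):
            raise ValueError(
                f"line {line_no}: expected {len(converters)} fields, got {len(fields)}"
            )
        yield tuple(conv(field) for conv, field in zip(converters, fields))


# Illustrative content in the 4-column txout.txt layout: txID, i, addrID, value.
sample = io.StringIO("0\t0\t1\t5000000000\n1\t0\t2\t5000000000\n")
txout_rows = list(read_tsv(sample, (int, int, int, int)))
```

For a real file, the stream could be opened with open(path, encoding="ascii", newline="\n") and passed to the same function; hash and address columns would use str as the converter.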

Basic datasets: new, original

blockhash.txt enumeration of all blocks in the blockchain, 277443 / 326027 rows, 4 columns:
blockID id used in this database (starts from 0, continuous)
bhash block hash (identifier in the blockchain, 64 hex characters)
btime creation time (from the blockchain)
txs number of transactions
txhash.txt transaction ID and hash pairs, 30048983 / 49236171 rows, 2 columns:
txID id used in this database (starts from 0, continuous)
txhash transaction hash used in the blockchain (64 hex characters)
addresses.txt BitCoin address IDs, 24618959 / 50609574 rows, 2 columns:
addrID id used in this database (starts from 0, not continuous, the address with addrID == 0 is invalid /blank, not used/)
addr string representation of the address (alphanumeric, maximum 35 characters; note that the IDs are NOT ordered by the addr)
tx.txt enumeration of all transactions, 30048983 / 49236171 rows, 4 columns:
txID transaction ID (from the txhash.txt file)
blockID block ID (from the blockhash.txt file)
n_inputs number of inputs
n_outputs number of outputs
txin.txt list of all transaction inputs (sums sent by the users), 65714232 / 119286937 rows, 3 / 4 columns:
txID transaction ID (from the txhash.txt file)
i input ID (same as in the txlinks.txt file, a txID -- i pair will appear only once)
addrID sending address (from the addresses.txt file)
value sum in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)
txout.txt list of all transaction outputs (sums received by the users), 73738345 / 133622442 rows, 3 / 4 columns:
txID transaction ID (from the txhash.txt file)
i output ID (same as in the txlinks.txt file, a txID -- i pair will appear only once)
addrID receiving address (from the addresses.txt file)
value sum in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)
txtime.txt transaction timestamps (obtained from the blockchain.info site), only in the original dataset 30048983 rows, 2 columns:
txID transaction ID (from the txhash.txt file)
unixtime unix timestamp (seconds since 1970-01-01)
txlinks.txt links connecting transaction inputs with the previous transaction outputs they spend, only in the new dataset, 119286937 rows, 4 columns:
txID transaction ID (from the txhash.txt file)
prev_txID ID of previous transaction (whose output is being spent)
i input ID (note: one txID -- i pair appears only once)
prev_i output ID in the previous transaction (note: one prev_txID -- prev_i pair appears only once)
nonstandard.txt a list of transactions with nonstandard outputs, where the receiving address cannot be decoded; in these cases the addrID in the txout.txt and txin.txt files is 0; corresponding outputs and inputs can be linked together with the txlinks.txt file; only in the new dataset, 9701 rows, 1 column:
txID transaction ID
multiple.txt a list of transaction outputs, where multiple addresses receive the sum together (see e.g. the description here); here, only the first address is present in the txin.txt and txout.txt files; only in the new dataset, 360722 rows, 3 columns:
txID transaction ID
i output ID
addrID receiving addresses (a txID -- i pair will appear with multiple addresses)
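The linking of inputs to spent outputs through txlinks.txt, as described in the list above, can be sketched in Python as a lookup keyed on the (txID, i) pair. This is an in-memory illustration under the column orders given above; the function names and sample rows are assumptions, and for the full files (over 100 million rows) a database join or a sorted merge is likely more practical than a dictionary.

```python
def index_outputs(txout_rows):
    """Map (txID, output ID) to (addrID, value) for rows in txout.txt column order."""
    return {(tx_id, i): (addr_id, value) for tx_id, i, addr_id, value in txout_rows}


def resolve_inputs(txlinks_rows, outputs):
    """Map (txID, input ID) to the (addrID, value) of the output being spent.

    txlinks_rows follow the txlinks.txt column order: txID, prev_txID, i, prev_i.
    Inputs whose previous output is not in the index map to None.
    """
    return {
        (tx_id, i): outputs.get((prev_tx_id, prev_i))
        for tx_id, prev_tx_id, i, prev_i in txlinks_rows
    }


# Illustrative rows only: transaction 11 spends output 0 of transaction 10.
txout_rows = [(10, 0, 5, 700), (10, 1, 6, 300), (11, 0, 7, 700)]
txlinks_rows = [(11, 10, 0, 0)]
spent = resolve_inputs(txlinks_rows, index_outputs(txout_rows))
```

Because a prev_txID -- prev_i pair appears only once in txlinks.txt, any output that never shows up as such a pair is unspent within the dataset.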

Computed datasets: new, original

contraction.txt list of addresses possibly belonging to the same user, 24618959 / 50609573 rows, 2 columns:
addrID address ID (from the addresses.txt file)
userID ID of identified user (not continuous, each two addrID which belong to the same "user" appear as inputs in the same transaction at least once)
balances.txt balances of nodes after 277,443 blocks (on 2013.12.28.), only in the original dataset, 24617959 rows, 2 columns:
addrID address ID (from the addresses.txt file)
balance balance in Satoshis (1e-8 BTC -- note that the value can be over 2^32, use 64-bit integers when parsing)
degree.txt node degrees (number of distinct transaction partners), only in the original dataset, 24575385 rows, 3 columns:
addrID address ID (from the addresses.txt file)
indeg indegree (number of distinct addresses which appear as inputs in transactions where this address appears as output)
outdeg outdegree (number of distinct addresses which appear as outputs in transactions where this address appears as input)
txedge.txt edges constructed from the transactions: a transaction with 2 inputs and 3 outputs results in 6 edges (all possible combinations), an edge may appear multiple times, with the corresponding transaction IDs, 129178908 / 361759010 rows, 3 columns:
txID transaction ID in which this edge appears
addrin sending address
addrout receiving address
txedgeunique.txt edges constructed from the transactions; each edge appears only once, 89220163 / 294484230 rows, 2 columns:
addrin sending address
addrout receiving address
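The grouping behind contraction.txt (addresses that appear together as inputs of a transaction are attributed to the same user) can be sketched with a union-find structure. This is an illustration of that heuristic, not the code used to produce the file. Using the smallest addrID of a group as its identifier and skipping addrID 0 (the invalid / nonstandard marker) are assumptions made for this example, as are the function names and sample rows.

```python
def find(parent, x):
    """Return the representative of x, shortening the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def contract(txin_pairs):
    """Group addresses that share a transaction as inputs.

    txin_pairs: iterable of (txID, addrID). Returns {addrID: group ID}, where
    the group ID is the smallest addrID in the group (an arbitrary choice).
    """
    parent = {}
    first_in_tx = {}
    for tx_id, addr in txin_pairs:
        if addr == 0:  # assumption: do not merge through the invalid address
            continue
        parent.setdefault(addr, addr)
        if tx_id in first_in_tx:
            ra = find(parent, first_in_tx[tx_id])
            rb = find(parent, addr)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
        else:
            first_in_tx[tx_id] = addr
    return {addr: find(parent, addr) for addr in parent}


# Illustrative inputs: addresses 10, 11, 12 are chained through two transactions.
groups = contract([(1, 10), (1, 11), (2, 11), (2, 12), (3, 20), (3, 0)])
```

Note that addresses 10 and 12 end up in the same group even though they never appear in the same transaction, because each shares a transaction with address 11.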

Graphs used in our New Journal of Physics study (download).

lt_graph.txt timestamped graph among the long-term users (LT core), 5121024 rows, 3 columns:
ain userID of the sender (corresponds to the userID in the contraction.txt in the original dataset)
aout userID of the recipient
unixtime unix timestamp of the transaction (seconds since 1970-01-01, obtained from the blockchain.info site)
au_graph.txt timestamped graph among the most active users (AU core), 11971481 rows, 3 columns:
ain userID of the sender (corresponds to the userID in the contraction.txt in the original dataset)
aout userID of the recipient
unixtime unix timestamp of the transaction (seconds since 1970-01-01, obtained from the blockchain.info site)
graph_addresses.txt Bitcoin addresses of users appearing in the two graphs, 3468509 rows, 2 columns:
userID userID as appearing in the previous files
addr string representation of Bitcoin addresses (note: multiple addresses can belong to a userID)
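A minimal Python sketch for working with these three files is given below: one helper parses a graph line into sender, recipient and a UTC timestamp, and another collects the addresses of each userID. It assumes the tab-separated layout described earlier on this page; the function names, the sample lines and the placeholder address strings are illustrative and not taken from the data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_edge(line):
    """Parse one lt_graph.txt / au_graph.txt line into (ain, aout, UTC datetime)."""
    ain, aout, unixtime = line.rstrip("\n").split("\t")
    return int(ain), int(aout), datetime.fromtimestamp(int(unixtime), tz=timezone.utc)


def addresses_by_user(lines):
    """Collect all addresses listed for each userID in graph_addresses.txt."""
    result = defaultdict(list)
    for line in lines:
        user_id, addr = line.rstrip("\n").split("\t")
        result[int(user_id)].append(addr)
    return dict(result)


# Illustrative lines only; ADDR_A etc. stand in for real address strings.
edge = parse_edge("12\t34\t1388188800\n")
users = addresses_by_user(["12\tADDR_A\n", "12\tADDR_B\n", "34\tADDR_C\n"])
```

Keeping the timestamps timezone-aware avoids ambiguity when edges are later binned by day or month.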