Ever wondered what goes on inside the block chain? There's an overview on the Bitcoin wiki, but let's take a look at the ultimate truth of the matter: the raw bytes. In this article I intend to walk through the bytes that form the genesis block, explaining the content from the ground up. The target audience knows a little about how Bitcoin works, and is interested in learning more. If you are a computer science major familiar with all the features of Bitcoin, you probably won't learn much you didn't already know.
This post started off as a few notes I made on the protocol, but as the notes expanded I decided to make them publishable. There are many points where I could go off on a tangent to explain in deeper detail, but I have tried to keep those in check by linking to further material instead. Source code references are relative to v0.5.0 of the Satoshi client.
My favorite learning strategy is to strip away as much abstraction as possible, to see the guts and mechanisms that make things work. I hope others find this a useful starting point for deep diving into the Bitcoin protocol and source code (although my personal interest is in understanding the protocol without having to stomach too much C++).
So let's go!
If you look in the data directory of your Bitcoin installation (~/.bitcoin on Ubuntu), you'll see two important files: blkindex.dat and blk0001.dat. These are the block index and the block chain.
(Why blk0001.dat? Because the client tries to keep the block chain store below 2GB to accommodate filesystem limits (main.cpp, line 1461). If you are reading this article in the future, you may see blk0002.dat, blk0003.dat, and so on... or if you are reading it in the far future, legacy filesystems and 32-bit machines might finally be obsolete, allowing the client to use a single file... or maybe the client will use an alternative means of block chain storage entirely.)
The block index is a utility file that allows fast lookups into the actual raw data of the block chain. The index is a Btree-format Berkeley DB file:
$ file ~/.bitcoin/blkindex.dat
/home/james/.bitcoin/blkindex.dat: Berkeley DB (Btree, version 9, native byte-order)
The blk0001.dat file has no database structure; it is just a concatenation of block messages that the client receives. This is the file of primary interest to this article. Now we get out hexdump and the fun begins!
$ hexdump -n 300 -C blk0001.dat
00000000  f9 be b4 d9 1d 01 00 00  01 00 00 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 3b a3 ed fd  |............;...|
00000030  7a 7b 12 b2 7a c7 2c 3e  67 76 8f 61 7f c8 1b c3  |z{..z.,>gv.a....|
00000040  88 8a 51 32 3a 9f b8 aa  4b 1e 5e 4a 29 ab 5f 49  |..Q2:...K.^J)._I|
00000050  ff ff 00 1d 1d ac 2b 7c  01 01 00 00 00 01 00 00  |......+|........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000080  ff ff 4d 04 ff ff 00 1d  01 04 45 54 68 65 20 54  |..M.......EThe T|
00000090  69 6d 65 73 20 30 33 2f  4a 61 6e 2f 32 30 30 39  |imes 03/Jan/2009|
000000a0  20 43 68 61 6e 63 65 6c  6c 6f 72 20 6f 6e 20 62  | Chancellor on b|
000000b0  72 69 6e 6b 20 6f 66 20  73 65 63 6f 6e 64 20 62  |rink of second b|
000000c0  61 69 6c 6f 75 74 20 66  6f 72 20 62 61 6e 6b 73  |ailout for banks|
000000d0  ff ff ff ff 01 00 f2 05  2a 01 00 00 00 43 41 04  |........*....CA.|
000000e0  67 8a fd b0 fe 55 48 27  19 67 f1 a6 71 30 b7 10  |g....UH'.g..q0..|
000000f0  5c d6 a8 28 e0 39 09 a6  79 62 e0 ea 1f 61 de b6  |\..(.9..yb...a..|
00000100  49 f6 bc 3f 4c ef 38 c4  f3 55 04 e5 1e c1 12 de  |I..?L.8..U......|
00000110  5c 38 4d f7 ba 0b 8d 57  8a 4c 70 2b 6b f1 1d 5f  |\8M....W.Lp+k.._|
00000120  ac 00 00 00 00 f9 be b4  d9 d7 00 00              |............|
0000012c
This is the "canonical hex+ascii display" mode of the hexdump utility. Each row represents 16 bytes from the input file. Reading each row left to right, first you see the offset from the beginning of the file, then 16 bytes, each represented as two hexadecimal characters, and then those same 16 bytes decoded into ASCII characters where possible. Where they decode to a non-printable character, a dot is printed instead. The vertical pipe characters are not part of the data; they just serve to delimit the hex view from the ASCII view.
Magic network ID
f9 be b4 d9 is the magic number that identifies what follows as a Bitcoin protocol message. It's worth noting that almost all numbers in the block are represented in little-endian byte order, so the value represented by these four bytes is 0xD9B4BEF9, or 3,652,501,241 in decimal. The source claims this number was chosen so as to be unlikely to occur in normal data (main.cpp, line 1760).
Notably, the magic network ID is not part of the block. It is used solely as a delimiter between blocks. Given that the block length is stored, and that there is also an index file, it could be happily omitted. But even when the block chain reaches a length of 200,000 blocks in 2012, this would save only 4 bytes * 200,000 = 800,000 bytes = 0.76MiB. There are greater space savings to be had elsewhere.
Block length
1d 01 00 00 (0x0000011d hex, 285 decimal) is the length of the block in bytes. Four bytes are available, giving a maximum block length of 2^32 - 1 = 4,294,967,295 bytes ≈ 4GiB. However, the current client will not accept blocks larger than 1MB (main.h, line 30).
The block length is obviously not part of the block either.
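As a rough sketch of how these two prefix fields could be read, the Python below unpacks the first eight bytes of the dump and then steps through a buffer of concatenated blocks. The function name iter_blocks and the toy two-block buffer are hypothetical, made up for illustration; only the magic value and the 4-byte little-endian length come from the dump above.

```python
import struct
from typing import Iterator

MAGIC = 0xD9B4BEF9  # f9 be b4 d9 read as a little-endian 32-bit integer


def iter_blocks(data: bytes) -> Iterator[bytes]:
    """Yield each block payload from a buffer of magic/length/block records."""
    pos = 0
    while pos + 8 <= len(data):
        magic, length = struct.unpack_from("<II", data, pos)
        if magic != MAGIC:
            raise ValueError(f"unexpected magic {magic:#010x} at offset {pos}")
        yield data[pos + 8 : pos + 8 + length]
        pos += 8 + length


# The first eight bytes of the hexdump: magic number, then block length.
magic, length = struct.unpack("<II", bytes.fromhex("f9beb4d91d010000"))
print(hex(magic), length)

# Hypothetical toy buffer: two fake "blocks" of 3 and 2 bytes.
toy = (struct.pack("<II", MAGIC, 3) + b"abc"
       + struct.pack("<II", MAGIC, 2) + b"de")
blocks = list(iter_blocks(toy))
print(blocks)
```

Because every record carries its own length, a reader can hop from block to block without parsing the contents.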
Block format version
01 00 00 00 (0x00000001 hex, 1 decimal) is the block format version. This is distinct from the protocol version and client version, both of which can be incremented independently. But the block format version of existing blocks must not change, or the block hash will change, and the chain of hashes from one block to the next will be broken.
Hash of previous block
This being the genesis block, there is no previous block, and so where there would normally be found a 32-byte hash of the previous block, there is instead 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00.
It is actually possible for a block to hash to zero, but hugely unlikely (though more and more likely as the difficulty increases). The protocol should probably be amended to consider a hash of zero to be invalid, making the zero value used in the genesis block a valid sentinel value.
Merkle root
3b a3 ed fd 7a 7b 12 b2 7a c7 2c 3e 67 76 8f 61 7f c8 1b c3 88 8a 51 32 3a 9f b8 aa 4b 1e 5e 4a is the root hash in the merkle tree of hashes that organize the transactions in this block.
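With only one transaction in the block, the tree has a single leaf, so the root should simply be the hash of that transaction. The sketch below checks this by rebuilding the 204 transaction bytes that are dissected later in this article and hashing them. It assumes the transaction hash is a double SHA-256, which is not stated above; the variable names are purely illustrative.

```python
import hashlib

HEADLINE = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
PUBKEY = bytes.fromhex(
    "04"
    "678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb6"
    "49f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"
)
MERKLE_ROOT = bytes.fromhex(
    "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
)

# Reassemble the transaction field by field, in the order they appear in the dump.
tx = (
    bytes.fromhex("01000000")              # transaction version
    + b"\x01"                              # count of inputs
    + bytes(32)                            # hash of the input transaction (all zero)
    + bytes.fromhex("ffffffff")            # input transaction index
    + b"\x4d"                              # response script length (77)
    + bytes.fromhex("04ffff001d0104")      # two small pushes
    + b"\x45" + HEADLINE                   # push of the 69-byte headline
    + bytes.fromhex("ffffffff")            # sequence number
    + b"\x01"                              # output count
    + bytes.fromhex("00f2052a01000000")    # output value
    + b"\x43"                              # challenge script length (67)
    + b"\x41" + PUBKEY + b"\xac"           # push public key, OP_CHECKSIG
    + bytes(4)                             # lock time
)

# Assumption: a transaction's hash is SHA-256 applied twice.
tx_hash = hashlib.sha256(hashlib.sha256(tx).digest()).digest()
print(len(tx), tx_hash.hex())
print(tx_hash == MERKLE_ROOT)
```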
Timestamp
29 ab 5f 49 (0x495fab29 hex, 1231006505 decimal) is the timestamp of when the block was created. As this value forms part of the block, it does not change during the mining process. Hence it represents the time at which transactions were collected into a block, not the time the completed block was published, although it is updated "every few seconds" (main.cpp, line 3088) so it doesn't fall too far behind.
The format of this field is UNIX epoch time. 1231006505 is Sat, 03 Jan 2009 18:15:05 UTC. (No timezone information is encoded; the times are UTC by fiat). The field is interpreted as an unsigned integer (main.h, line 785) which means its maximum value is 4,294,967,295, or approximately 7th February 2106. The protocol must be upgraded before then!
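To make the conversion concrete, here is a small sketch that decodes the four timestamp bytes and also works out the last moment an unsigned 32-bit field can express. It uses only the Python standard library; the variable names are illustrative.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# 29 ab 5f 49 as an unsigned little-endian 32-bit integer.
(timestamp,) = struct.unpack("<I", bytes.fromhex("29ab5f49"))

created = EPOCH + timedelta(seconds=timestamp)
latest = EPOCH + timedelta(seconds=2**32 - 1)  # largest value the field can hold

print(timestamp, created.isoformat())
print(latest.isoformat())
```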
Bits
ff ff 00 1d (0x1d00ffff hex, 486604799 decimal) is a representation of the target, the value which the hash of the block header must not exceed in order to mine the block. The representation is an encoding particular to Bitcoin.
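The encoding works rather like a floating-point number: the top byte is a base-256 exponent and the low three bytes are a mantissa. A minimal sketch of the decoding follows. The function name bits_to_target is hypothetical, and the sign bit that the format reserves is simply masked off here, which is an assumption that holds for any sensible target.

```python
def bits_to_target(bits: int) -> int:
    """Expand the 4-byte compact 'bits' value into the full 256-bit target."""
    exponent = bits >> 24            # number of bytes in the full value
    mantissa = bits & 0x007FFFFF     # low three bytes, sign bit masked off
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


target = bits_to_target(0x1D00FFFF)
print(f"{target:064x}")
```

For the genesis block this gives 0xffff followed by 26 zero bytes, which is why early block hashes start with at least eight zero hex digits.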
Nonce
1d ac 2b 7c is a random number generated during the mining process. To successfully mine a block, the header is hashed. If the resulting hash value is not less than or equal to the target, the nonce is incremented and the hash is computed again. This typically happens billions of times before a small enough hash is found.
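The check can be reproduced directly from the dump. The sketch below makes a few assumptions that the text above does not spell out: the header is taken to be the 80 bytes from the block format version through the nonce, the hash is SHA-256 applied twice, the digest is compared to the target as a little-endian integer, and the target is expanded from the bits field as mantissa times 256^(exponent - 3).

```python
import hashlib

# The 80 header bytes, in the order they appear in the hexdump.
header = bytes.fromhex(
    "01000000"                                                          # version
    + "00" * 32                                                         # previous block hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"                                                        # timestamp
    + "ffff001d"                                                        # bits
    + "1dac2b7c"                                                        # nonce
)

digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
hash_value = int.from_bytes(digest, "little")

# Target expanded from bits 0x1d00ffff: mantissa 0x00ffff, exponent 0x1d.
target = 0x00FFFF << (8 * (0x1D - 3))

nonce = int.from_bytes(header[76:80], "little")
block_hash = digest[::-1].hex()  # conventional display order is byte-reversed

print(nonce)
print(block_hash)
print(hash_value <= target)
```

Changing any byte of the header, the nonce included, gives a completely different digest, which is what makes the search a matter of brute force.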
Transaction count
01 is a variable length integer representing the number of transactions in this block. The length is not infinitely variable; a maximum of 8 bytes are available for an unsigned integer, so the limit is 18,446,744,073,709,551,615. Frankly, that's a lot of transactions for one block.
A block can never have zero transactions; at the very least there will always be one generating the block reward.
As this field shows, there is only one transaction in this block, so all that follows is part of that transaction.
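The text above does not spell out how the variable length integer is laid out. The sketch below assumes the usual Bitcoin scheme: a first byte below 0xfd is the value itself, while 0xfd, 0xfe and 0xff announce a following 2-, 4- or 8-byte little-endian integer. The function name read_varint is hypothetical.

```python
from typing import Tuple


def read_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset of the next field) for a variable length integer."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    start = offset + 1
    value = int.from_bytes(data[start : start + size], "little")
    return value, start + size


# The transaction count of the genesis block is the single byte 01.
count, next_offset = read_varint(bytes.fromhex("01"))
print(count, next_offset)

# A made-up three-byte example: the fd prefix, then 0x0203 in little-endian order.
print(read_varint(bytes.fromhex("fd0302")))
```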
Transaction version number
01 00 00 00 (0x01 hex, 1 decimal) is the version number of the transaction data format. For exactly the same reasons as the block format version number, this cannot change with protocol or client version number increments.
Count of inputs
01 is the number of transaction inputs. This is another variable length integer, with no practical limit on the number of inputs that can be specified.
Input
I'm going to give another excerpt of the hex dump with links to drill into the input parts of the transaction:
00000050  ff ff 00 1d 1d ac 2b 7c  01 01 00 00 00 01 00 00  |......+|........|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 ff ff  |................|
00000080  ff ff 4d 04 ff ff 00 1d  01 04 45 54 68 65 20 54  |..M.......EThe T|
00000090  69 6d 65 73 20 30 33 2f  4a 61 6e 2f 32 30 30 39  |imes 03/Jan/2009|
000000a0  20 43 68 61 6e 63 65 6c  6c 6f 72 20 6f 6e 20 62  | Chancellor on b|
000000b0  72 69 6e 6b 20 6f 66 20  73 65 63 6f 6e 64 20 62  |rink of second b|
000000c0  61 69 6c 6f 75 74 20 66  6f 72 20 62 61 6e 6b 73  |ailout for banks|
000000d0  ff ff ff ff 01 00 f2 05  2a 01 00 00 00 43 41 04  |........*....CA.|
Hash of the input transaction
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 would be the hash of the transaction being referenced as an input, but of course this is the block reward transaction, so there is none.
Input transaction index
ff ff ff ff is the index of a specific output in the referenced input transaction. As you know, transactions can have many outputs, and subsequent transactions can take none, one or many of those outputs as inputs. A zero here references the first output of the referenced transaction. The value you see here is a representation of -1, a kind of dummy value because there is no input transaction.
It's interesting to note here that a prior transaction could in theory really have 0xffffffff + 1 (decimal 4,294,967,296) outputs, but if the value 0xffffffff is being used as a sentinel value, there would be no way to reference the 4,294,967,296th output and it would be lost. Fortunately the maximum block size of 4GiB makes it impossible to create that many outputs.
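The same four bytes read as 4,294,967,295 or as -1 depending on whether they are treated as unsigned or signed, which a couple of lines of Python can show. The final test, an all-zero hash together with the sentinel index, is my guess at how a generation input might be recognised rather than a quote from the client source.

```python
import struct

# The 36 bytes that reference the previous output: 32-byte hash, 4-byte index.
outpoint = bytes(32) + bytes.fromhex("ffffffff")
prev_hash, index_raw = outpoint[:32], outpoint[32:]

(unsigned_index,) = struct.unpack("<I", index_raw)  # unsigned 32-bit reading
(signed_index,) = struct.unpack("<i", index_raw)    # signed 32-bit reading

# Assumed rule of thumb for spotting a generation input.
is_generation = prev_hash == bytes(32) and unsigned_index == 0xFFFFFFFF

print(unsigned_index, signed_index, is_generation)
```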
Response script length
4d (decimal 77) is the length of the script that follows. This is another variable length integer, so some very long scripts are supported in theory.
Response script
Here is possibly the most interesting and exciting piece of the whole block - the script which proves that this transaction is allowed to use the input it references. If you are not familiar with the capabilities of the Bitcoin scripting system, I urge you to read the wiki page on the subject. Bear in mind that most of the opcodes are disabled pending a secure implementation, and that the client will only recognise a very select set of standard transaction types. Here's the script again; it starts at the 04 at offset 0x83 and runs to the end of the row at 0xc0:
00000080  ff ff 4d 04 ff ff 00 1d  01 04 45 54 68 65 20 54  |..M.......EThe T|
00000090  69 6d 65 73 20 30 33 2f  4a 61 6e 2f 32 30 30 39  |imes 03/Jan/2009|
000000a0  20 43 68 61 6e 63 65 6c  6c 6f 72 20 6f 6e 20 62  | Chancellor on b|
000000b0  72 69 6e 6b 20 6f 66 20  73 65 63 6f 6e 64 20 62  |rink of second b|
000000c0  61 69 6c 6f 75 74 20 66  6f 72 20 62 61 6e 6b 73  |ailout for banks|
First up is 04 which tells the interpreter that the next 4 bytes are data to be pushed onto the stack. Those four bytes are ff ff 00 1d, which happen to be the representation of the target that we saw earlier. 01 means the next 1 byte is also to be pushed onto the stack; that next byte is 04. Then, 45 indicates the next 69 bytes are to be pushed onto the stack. As you can see from the ASCII readout, those 69 bytes represent the string "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks".
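The walk through the script can be mimicked with a tiny parser. This sketch only understands the direct push opcodes used here, where a byte from 1 to 75 means "push that many following bytes"; the function name parse_pushes is hypothetical, and a real interpreter handles many more opcodes.

```python
from typing import List


def parse_pushes(script: bytes) -> List[bytes]:
    """Split a script made only of direct pushes into its data items."""
    items = []
    pos = 0
    while pos < len(script):
        opcode = script[pos]
        if not 1 <= opcode <= 75:
            raise ValueError(f"opcode {opcode:#04x} is not a direct push")
        items.append(script[pos + 1 : pos + 1 + opcode])
        pos += 1 + opcode
    return items


headline = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
script = bytes.fromhex("04ffff001d0104") + b"\x45" + headline

items = parse_pushes(script)
for item in items:
    print(len(item), item)
```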
Why was this string chosen? Perhaps it reveals the watershed moment in financial history when Something Had To Be Done. Or perhaps it is there just to prove that Bitcoin definitely launched after that headline was published.
This is the end of the script. Not much has been achieved, but this is a generation transaction, so it does not contain the normal machinery of signature verification that you would find in a typical transaction. Indeed this script can push any arbitrary data onto the stack, and this has been used to insert messages into the blockchain. It's a privilege of being the one to mine the block.
I've called this the response script, as usually it would be providing a response to the challenge set by the script in a referenced output.
Sequence number
ff ff ff ff is the "sequence number", which supports the transaction replacement feature. The idea is that you broadcast a transaction with a lock time (see below) at some point in the future. You are then free to broadcast replacement transactions (with higher sequence numbers) until that time. If you want to lock the transaction permanently, the client will set the sequence number to 0xffffffff, the largest 4-byte integer. However, the whole transaction replacement and locking feature simply isn't used in any client yet, so all transactions are broadcast locked by default.
Output
Here's another restatement for drilling into:
000000d0  ff ff ff ff 01 00 f2 05  2a 01 00 00 00 43 41 04  |........*....CA.|
000000e0  67 8a fd b0 fe 55 48 27  19 67 f1 a6 71 30 b7 10  |g....UH'.g..q0..|
000000f0  5c d6 a8 28 e0 39 09 a6  79 62 e0 ea 1f 61 de b6  |\..(.9..yb...a..|
00000100  49 f6 bc 3f 4c ef 38 c4  f3 55 04 e5 1e c1 12 de  |I..?L.8..U......|
00000110  5c 38 4d f7 ba 0b 8d 57  8a 4c 70 2b 6b f1 1d 5f  |\8M....W.Lp+k.._|
00000120  ac 00 00 00 00 f9 be b4  d9 d7 00 00              |............|
0000012c
Output count
01 is the number of outputs, in variable length integer format.
Output value
00 f2 05 2a 01 00 00 00 (hex 0x000000012a05f200, decimal 5000000000) is the number of "bitcoins" being sent. Actually, it's the number of base units, as one bitcoin is 100000000 base units. Thus, this transaction is sending 50 BTC. This is the block reward. This field is a fixed 8 bytes, which is what sets the maximum divisibility of a bitcoin. If a single bitcoin ever became so valuable that the granularity of individual base units became too coarse, this is the field that would need updating. I would change it to use some form of arbitrary precision representation.
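The arithmetic is easy to check. The sketch below decodes the eight bytes and splits the result into whole bitcoins and leftover base units using integer maths only, so no precision is lost; the names are illustrative.

```python
BASE_UNITS_PER_BTC = 100_000_000

# 00 f2 05 2a 01 00 00 00 as a little-endian 64-bit integer.
base_units = int.from_bytes(bytes.fromhex("00f2052a01000000"), "little")
whole, remainder = divmod(base_units, BASE_UNITS_PER_BTC)

print(base_units)
print(f"{whole}.{remainder:08d} BTC")
```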
Challenge script length
43 (decimal 67) is the length of the challenge script that follows.
Challenge script
41 04 67 8a fd b0 fe 55 48 27 19 67 f1 a6 71 30 b7 10 5c d6 a8 28 e0 39 09 a6 79 62 e0 ea 1f 61 de b6 49 f6 bc 3f 4c ef 38 c4 f3 55 04 e5 1e c1 12 de 5c 38 4d f7 ba 0b 8d 57 8a 4c 70 2b 6b f1 1d 5f ac is the second half of the script: the challenge that a later response script must satisfy. Whoever wants to spend this output has to supply a response that makes this script succeed. Breaking it down into opcodes:
41 (decimal 65) is the opcode telling the script interpreter that the next 65 bytes are data to be pushed onto the stack. There are only 66 remaining bytes in the script, so skipping forward to the last one we find ac (decimal 172), the opcode OP_CHECKSIG. This tells the interpreter to look at the top two stack items, treat them as a signature and a public key, and verify that the signature was made by the private key corresponding to the public key. But where does this other stack item come from if there was only one item pushed onto the stack (the 65 bytes of data)? It comes from the response script of the transaction which redeems this transaction: it is a signature of the next transaction (broadly speaking; see the detail). The script in this transaction sets the challenge: given a public key, what signature causes OP_CHECKSIG to return true when verified against the next transaction? Only the holder of the corresponding private key can generate such a signature. This is what prevents other people spending your bitcoins.
What is the content of the 65 bytes of data? It's a standard representation of an ECDSA public key: the byte 04 followed by a pair of 32-byte big-endian integers representing the coordinates of a point on the secp256k1 curve. Anyone interested in saving space in their blockchain could probably omit the 04 and imply its presence from the transaction version number.
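One way to convince yourself that these 65 bytes really are a curve point is to plug the two coordinates into the curve equation. The sketch below takes the equation y^2 = x^3 + 7 and the prime modulus 2^256 - 2^32 - 977 from the published secp256k1 parameters rather than from anything in the block, so treat those constants as outside knowledge; the variable names are illustrative.

```python
# Field prime of secp256k1 (from the curve's published parameters).
P = 2**256 - 2**32 - 977

script = bytes.fromhex(
    "41"  # push the next 65 bytes
    "04"  # marker byte for an uncompressed point
    "678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb6"
    "49f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"
    "ac"  # OP_CHECKSIG
)

push_length = script[0]
pubkey = script[1 : 1 + push_length]
final_opcode = script[1 + push_length]

# Two 32-byte big-endian integers follow the 04 marker.
x = int.from_bytes(pubkey[1:33], "big")
y = int.from_bytes(pubkey[33:65], "big")

# The point lies on the curve if y^2 = x^3 + 7 (mod P).
on_curve = (y * y - (x**3 + 7)) % P == 0

print(len(script), hex(final_opcode), on_curve)
```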
This public key is more commonly represented as the RIPEMD-160 hash of the SHA-256 hash of the public key, concatenated with a version number and a checksum - all represented in Base 58 [details]. In this case, 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. This is Satoshi's first Bitcoin address. As you can see on blockexplorer, these bitcoins have not yet been redeemed...
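Going from the public key to the address needs RIPEMD-160, which is not available in every Python build, so this sketch goes the other way: it decodes the address back into its version byte, 20-byte hash and checksum, and verifies the checksum. It assumes the checksum is the first four bytes of a double SHA-256 over the version and hash; the function name base58check_decode is hypothetical.

```python
import hashlib
from typing import Tuple

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_decode(address: str) -> Tuple[int, bytes]:
    """Return (version, payload) after verifying the 4-byte checksum."""
    number = 0
    for char in address:
        number = number * 58 + ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    raw = bytes(leading_zeros) + body
    data, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("checksum mismatch")
    return data[0], data[1:]


version, key_hash = base58check_decode("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(version, len(key_hash), key_hash.hex())
```

A single mistyped character changes the decoded bytes, so the checksum comparison fails and the function raises instead of returning a wrong hash.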
Lock time
00 00 00 00 is the lock time. All transactions currently set this to zero as the feature is not currently implemented. The intention (main.h, line 40) is that a number up to 500000000 represents a block number, and higher numbers represent a UNIX timestamp at which the transaction will become locked and immune to replacement.
Fun fact: block number 500,000,000 is approximately the year 11,515. UNIX timestamp 500,000,000 is 5th November 1985, safely well before the start of the current block chain.
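The intended interpretation can be sketched in a few lines. I am assuming here that the threshold value itself counts as a timestamp, so only numbers strictly below 500,000,000 are block numbers; the function name describe_lock_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple, Union

LOCKTIME_THRESHOLD = 500_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def describe_lock_time(lock_time: int) -> Tuple[str, Union[int, datetime]]:
    """Classify a lock time as a block number or a UTC timestamp."""
    if lock_time < LOCKTIME_THRESHOLD:  # assumed boundary, see lead-in
        return "block", lock_time
    return "time", EPOCH + timedelta(seconds=lock_time)


# The genesis transaction's lock time, 00 00 00 00.
genesis_lock_time = int.from_bytes(bytes.fromhex("00000000"), "little")
print(describe_lock_time(genesis_lock_time))

# The smallest value that would be read as a timestamp.
kind, when = describe_lock_time(LOCKTIME_THRESHOLD)
print(kind, when.isoformat())
```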
What next?
If you look back at the hexdump at the top of this article, you'll see that the next bytes are f9 be b4 d9 again, indicating the start of the next block. As of January 2012, there are more than 160,000 blocks following this. At the nominal rate of one block every ten minutes, block number 1,000,000 will arrive around 2028, by which time about 96% of the bitcoins that will ever exist will have been generated.
Notably absent
There is no block number. When you see a block number on sites like blockexplorer, this is just the offset of a particular block in the longest valid chain.
Conclusion
I hope somebody finds this post useful; writing it helped me achieve my goal of learning about the Bitcoin protocol. Of course, any corrections gratefully received.