There has been a lot of discussion around blockchain in recent times. It’s been touted as a solution for everything from the financial system to aviation maintenance records, but the details of how it works has always been hidden under a haze of magical technomancy. In this article I take a deep dive into what a blockchain is and build a simple (but actual) blockchain from scratch.
The first problem you need to overcome in any research into blockchain is that most of it is mired in the feverish discussion around Bitcoin (or other AltCoins). Bitcoin uses a blockchain implementation for its ledger, but the idea of a blockchain itself is more general. You could use a blockchain for your grocery lists (remember to wear your tin foil hat to the supermarket though).
At it’s heart, a blockchain is a distributed, verifiable list. It’s distributed so there is always more than one copy of it in circulation, much like a pirated Justin Beiber album. However the canonical version of a blockchain is verifiable. This means that you can ensure that there will be only one version of the final chain that is accepted by everyone.
To begin let’s define a simple python class called
Block which creates the core of our Blockchain implementation.
class Block: def __init__(self, block_id, payload, prev_hash): self.block_id = block_id self.timestamp = datetime.now() self.payload = payload self.prev_hash = prev_hash self.generate_hash() def __repr__(self): return str(self.block_id) + str(self.timestamp) + str(self.payload) + str(self.prev_hash) def generate_hash(self): self.hash = sha256(repr(self).encode('utf-8')).hexdigest()
The key feature that this class provides thus far is backward variability. We achieve this by generating a cryptographic hash that positively identifies this Block. The
generate_hash function first gets a textual representation of the block by using the magic python function
__repr__ which gets called when you do a
repr(self). This is the idiomatic Python (or Pythonic) way of doing things and it makes sense when taken together with the standard Python library… but I digress.
The key thing to notice here is that the previous block’s hash is also concatenated to the text that is hashed by this block. This is why the constructor requires the previous block’s hash. So how do you build the first block? Simply by hard coding the hash of the previous (non existent block) to some value (that value could be empty). Let’s call the first block the
genesis = Block(0, 'The Genesis Block', 'InTheBegginingThereWasNoHash') print(genesis.hash) # 22964c842772cd04923da00b0d599c65d63cc63611fd53e6fcb852453ea0f68a
Every subsequent block we build will require the actual hash of the previous block. Let’s add two more blocks.
genesis = Block(0, 'The Genesis Block', 'InTheBegginingThereWasNoHash') first = Block(1, 'The First Block', genesis.hash) second = Block(2, 'The Second Block', first.hash)
Notice how we pass the previous block’s hash into the new block we create? This create a chain of blocks hence “blockchain”. Now that we have a few blocks in our chain we can do something useful with them, like verifying that the chain is valid. Let’s write a function to do that and put our blocks in a dictionary for easy access.
blocks[genesis.hash] = genesis; blocks[first.hash] = first; blocks[second.hash] = second; def validate(block): if block.hash == genesis.hash: return True current_block = block while True: prev_block = blocks.get(current_block.prev_hash, None) if prev_block == None: return False if prev_block.hash == genesis.hash: return True current_block = prev_block
The above function allows us to verify the authenticity of a block all the way back to the genesis block.
print(validate(second)) # True
It also ensures that an adversary can’t fake a block.
badblock = Block(99, 'A Criminal Block', 'AFakeHash') print(validate(badblock)) # False
But it doesn’t stop him from creating a valid (but duplicate) second block off the previous block like so:
doppelganger = Block(2, 'The Real Second Block?', first.hash) print(validate(doppelganger)) # True!
Now, what we need is a method to stop duplicate blocks from being created. This is called a Proof of Work system. In short, we require anyone creating a block to do some specific non trivial calculation. A puzzle of sorts, that is defined in the previous block. In order to create a new block you need to first grab the parent block and recover the puzzle. Let’s add that to the implementation.
class Block: def __init__(self, block_id, payload, prev_hash): self.block_id = block_id self.timestamp = datetime.now() self.payload = payload self.prev_hash = prev_hash self.generate_hash() proof = repr(self) + random.choice([''.join(p) for p in permutations('ABCDEF')]) self.next_proof_hash = sha256(proof.encode('utf-8')).hexdigest() def __repr__(self): return str(self.block_id) + str(self.timestamp) + str(self.payload) + str(self.prev_hash) def generate_hash(self): self.hash = sha256(repr(self).encode('utf-8')).hexdigest()
The key is the two new lines of code added to the constructor which creates an additional field called
proof = repr(self) + random.choice([''.join(p) for p in permutations('ABCDEF')]) self.next_proof_hash = sha256(proof.encode('utf-8')).hexdigest()
What this does is that it first generates a random permutation of the string ‘ABCDEF’ (behold the power of list comprehensions) and then hashes it together with the contents of the current block. This creates an irrefutable puzzle that bears the signature of the block. The challenge then is to generate the same permutation combinatorially (which is O(n!), quite a tough challenge as the length of the permuted string increases). In reality there are specially crafted proof of work algorithms, I’ve used a permutation just to keep things simple.
The important thing to remember at this point is that blockchains are distributed. You might have multiple processors running the code to work out the puzzle in parallel. This creates a sort of race to the finish and it’s this race which ensures that a single adversary can’t game the system. Let’s create a parallel implementation of the proof of work solver to see this in action.
def proof_of_work(new_block): sleep(random.randint(10, 100) / 1000.0) # Simulate randomness parent_block = blocks[new_block.prev_hash] for p in permutations('ABCDEF'): text = repr(parent_block) + ''.join(p) attempt = sha256(text.encode('utf-8')).hexdigest() if parent_block.next_proof_hash == attempt: for key, value in blocks.items(): if value.block_id == new_block.block_id: return blocks[new_block.hash] = new_block third = [Block(3, 'The Third Block - Option A', second.hash), Block(3, 'The Third Block - Option B', second.hash), Block(3, 'The Third Block - Option C', second.hash), Block(3, 'The Third Block - Option D', second.hash)] pool = Pool(4) result = pool.map(proof_of_work, third) # Only 1 third block option will succeed
What would happen if two processors both solve the puzzle at about the same time? They would begin to update the distributed copies of the block chain and since the system is distributed some nodes would accept one option for block 3 and other nodes another option. This is solved by a relatively simple rule that nodes are required to switch to the longest chain available. This means that any duplication is unlikely to remain for long (as the probability of multiple duplication is very low).
This rule, together with the proof of work makes it very difficult for a single adversary to add blocks to the chain. To even stand a 50% chance of adding a single block would require half the computing power of the whole network (and the chances degrade as the number of chained blocks needed increases).
The proof of work system has some interesting side effects on the real world. Bitcoin uses the proof of work system to both ensure verifiability of the chain and to act as a money creation mechanism in the Bitcoin ecosystem. Money is created as a reward for doing the proof of work. It’s what people are doing when you hear that they are mining. This draws energy, actual electricity from the grid and that in turn affects the planet.