Blockchain Explained: How Does Immutability Work?


Blockchains are designed to be immutable. Once a block is written to a blockchain, realistically, it cannot change. In comparison, think about an Excel spreadsheet or a database table, both made up of rows and columns. You can add more rows, add new columns, edit existing rows, and delete entire columns.

In short, spreadsheets and databases are mutable. A blockchain is designed to be immutable; once a piece of information goes in there, you can depend on it never changing. (Note, this is not 100% true, but the complexities involved in rewriting a blockchain are enormous, the details of which we'll address in this article.)

The trusted nature of blockchain is one of its magic pieces. You can believe that data on the blockchain is legitimate, having been validated by multiple participants in the network.

An Intro to Cryptographic Hashes

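To understand how this works, we’ll need to understand a little bit about cryptographic hashes. (Don't let the idea of cryptography scare you off. We'll be staying at a very high level in this discussion.) A hash function is a set of mathematical operations designed to output a compact "fingerprint" of any digital data. This fingerprint is called a checksum (or digest). It is sometimes loosely described as a "digital signature," and this article uses "signature" in that loose sense, although in cryptography proper a digital signature is a different, key-based technique.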

Developers have access to several different "hash functions" in most modern programming languages. Input some set of bytes, and a hash function will output a checksum that is, for all practical purposes, unique to that input. How this works is beyond most people's comprehension (mine included), since mathematicians much smarter than you or I invented these functions.

One flavor of hash functions is SHA-2, or Secure Hash Algorithm 2, which was developed by the NSA. Under this umbrella are several distinct implementations: SHA-224, SHA-256, SHA-384, etc. In the world of blockchains, SHA-256 is one of the more popular. The implementation details are unimportant for our discussion, but just remember that we will feed data into a hash function and some unique signature, a checksum, will pop out.

As an example, let's look at these three lines of Python code. The hashlib module from Python's standard library gives us access to sha256 and other SHA-2 functions. This algorithm is available for pretty much every language under the sun. Into this function, we’ll input the string 'hashes are cool...'

>>> import hashlib
>>> h = hashlib.sha256(b'hashes are cool...')  # Python 3 hashes bytes, hence the b prefix
>>> h.hexdigest()

From this function, we get a string of 64 hexadecimal characters, and that’s it. No matter what size of input you give it, you’re always going to get an output of the same fixed length. If I fed 5 MB of binary data to this same function, I would still receive a 64-character signature. To prove this, I'll put in a completely new string:

>>> h = hashlib.sha256(b'really, I mean it. I am telling the truth!')
>>> h.hexdigest()
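To make the fixed-length claim concrete, here is a small sketch that hashes inputs of very different sizes and checks the length of each result. The 5 MB blob of zero bytes is just a stand-in I made up for "some large binary file."

```python
import hashlib

# Three inputs of very different sizes. The last one is 5 MB of zero bytes,
# used here purely as a stand-in for a large binary file.
inputs = [
    b"",
    b"hashes are cool...",
    b"\x00" * (5 * 1024 * 1024),
]

# hexdigest() returns the checksum as a string of hexadecimal characters.
digest_lengths = [len(hashlib.sha256(data).hexdigest()) for data in inputs]

print(digest_lengths)  # [64, 64, 64]
```

SHA-256 always produces 256 bits, which is 64 characters once written out in hexadecimal.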

An interesting and important property of hash functions is that the output is, for all practical purposes, unique. Strictly speaking, two different inputs can share a checksum (there are more possible inputs than 64-character outputs), but for SHA-256 nobody knows a feasible way to find such a pair. So if you give me two files which appear to be the same (images, big text files, etc.), I could tell you with near certainty whether those files were 100% identical by computing the checksum of both and looking for equality. You could have an image and change just a single pixel by one hex value. Looking at the images side-by-side, you wouldn’t be able to tell the difference, but the hashes would show that they’re different.
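The file comparison idea can be sketched in a few lines. The "image" below is not a real image, just a made-up buffer of bytes, and the `checksum` helper is a name I'm introducing for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# A made-up byte buffer standing in for the contents of an image file.
original = bytes(range(256)) * 1000

# Copy it and nudge a single byte by one value, like changing one pixel slightly.
altered = bytearray(original)
altered[1234] ^= 0x01
altered = bytes(altered)

print(len(original) == len(altered))             # True: same size
print(checksum(original) == checksum(original))  # True: identical data
print(checksum(original) == checksum(altered))   # False: one byte differs
```

For real files you would read the bytes from disk first; comparing the two checksums works the same way.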

A tiny change to the input will produce an entirely different and unrelated hash output. The original string 'hashes are cool...' produced a hash output of '49240b3cc693fd281422bbcabb5f207ae2a390003534989fb55080799ee08d8c'.

I'll now add one more period to the end of that sentence and compare the result.

>>> h = hashlib.sha256(b'hashes are cool....')
>>> h.hexdigest()

We can see that the new hash is entirely different with the addition of a single character (the extra "."). What is even more fascinating, and important for this discussion, is that, given the resulting hash output, it's computationally infeasible to go in the other direction and determine what input produced a given checksum.
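One way to put a number on "entirely different" is to count how many of the 256 output bits change between the two strings. This is only a sketch; `sha256_int` is a helper name I made up, and I'm deliberately not quoting the exact count, since it's easier to run than to guess. For a good hash function you'd expect roughly half the bits to differ.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Hash a string with SHA-256 and return the 256-bit result as an integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


a = sha256_int("hashes are cool...")
b = sha256_int("hashes are cool....")

# XOR leaves a 1 wherever the two hashes disagree; count those 1 bits.
differing_bits = bin(a ^ b).count("1")

print(f"{differing_bits} of 256 bits differ")
```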

If I handed a developer the checksum '6c3e6636db1eb2fa8fcf8d4fdbf33bff4129ccb985c24848ef25311d506dde73' and asked them to give me the input value which produced this, they couldn't.

However, if I gave a developer the string 'hashes are cool....' and asked them to give me the SHA-256 signature, they'd give me that same value. Another important and interesting property of hash functions is that they are relatively fast. Running a SHA-256 function on some body of text is practically instantaneous.

The critical property to remember here is that hashes are easy to verify given some input, but it's practically impossible to find an input which produces a known or preexisting hash value.
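Here is a rough sketch of that asymmetry. Verifying takes one hash computation, while "inverting" comes down to guessing. To keep the guessing feasible I've assumed the secret is a lowercase word of at most three letters; the function names and the secret "cat" are made up for the example.

```python
import hashlib
import itertools
import string


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(candidate: str, expected_checksum: str) -> bool:
    # Verification costs a single hash computation.
    return sha256_hex(candidate) == expected_checksum


def brute_force(expected_checksum: str, max_length: int = 3):
    # Going backwards means guessing: try every lowercase string
    # of up to max_length letters and count the attempts.
    attempts = 0
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            attempts += 1
            candidate = "".join(letters)
            if verify(candidate, expected_checksum):
                return candidate, attempts
    return None, attempts


target = sha256_hex("cat")  # pretend we only know this checksum
found, attempts = brute_force(target)
print(found, attempts)
```

Each extra letter multiplies the search space by 26, and real inputs aren't limited to short lowercase words, which is why guessing stops being an option very quickly.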

How are Facts Guaranteed to be Secure?

A blockchain is a data structure made up of blocks of data that are linked together; hence the chain. Each block is made up of multiple "transactions" or simply, "facts." In the case of Bitcoin, the facts stored describe the transfer of Bitcoin from one address to another. Blockchain software will calculate a checksum for each block. Data fed into this calculation includes many things, including the checksum of the prior block. Including the prior block’s checksum in the calculation for a new block is what makes rewriting a blockchain nearly impossible.
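As a minimal sketch of that linking, the toy chain below feeds each block's facts plus the prior block's checksum into SHA-256. The field names, the JSON encoding, and the all-zeros placeholder for the first block are my own assumptions, not how Bitcoin actually lays out a block.

```python
import hashlib
import json


def block_checksum(index: int, facts: list, previous_checksum: str) -> str:
    """Hash a block's contents together with the prior block's checksum."""
    payload = json.dumps(
        {"index": index, "facts": facts, "previous_checksum": previous_checksum},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(list_of_facts: list) -> list:
    chain = []
    previous_checksum = "0" * 64  # assumed placeholder: the first block has no parent
    for index, facts in enumerate(list_of_facts):
        checksum = block_checksum(index, facts, previous_checksum)
        chain.append(
            {
                "index": index,
                "facts": facts,
                "previous_checksum": previous_checksum,
                "checksum": checksum,
            }
        )
        previous_checksum = checksum
    return chain


chain = build_chain([["A pays B 1"], ["B pays C 2"], ["C pays A 3"]])
for block in chain:
    print(block["index"], block["previous_checksum"][:8], "->", block["checksum"][:8])
```

Each block's "previous_checksum" matches the checksum of the block before it, which is the link that holds the chain together.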

In Bitcoin, every new transaction floats in a "transaction pool" which is a collection of "facts" that are on-deck to be written to the blockchain. Once enough transactions have accumulated, Bitcoin nodes begin the process of creating a new block and adding it to the chain. This action, in effect, creates a permanent record of these previously unvalidated transactions.

You've likely heard the term "mining." Mining means taking several inputs and hashing them together along with some random data to attempt to produce a checksum that conforms to a certain format: it must start with many zeros.

We've already seen that there's no practical way to predict what checksum SHA-256 will produce without actually computing it. To find a checksum with many leading zeros, miners take the data which is to be verified (transactions, block number, previous block's checksum, etc.) and add in random data (often called a nonce), in the hopes of producing a hash that matches this format and "solves the puzzle." Each iteration in the mining loop is a simple matter of changing the random data and calculating a new checksum. We've already seen how a single character will completely change the resulting checksum. For this discussion, the critical point to remember is that the previous block's checksum is used when generating a checksum for the current block.
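The mining loop can be sketched roughly like this. It's a toy: I count the nonce upwards instead of picking random data, ask for only three leading zeros in the hex checksum so it finishes quickly, and hash a simple string. Real Bitcoin mining differs in its details, and the `mine` function and sample transaction are invented for the example.

```python
import hashlib


def mine(block_data: str, previous_checksum: str, difficulty: int = 3):
    """Try nonces until the checksum starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        # The previous block's checksum is part of what gets hashed.
        candidate = f"{previous_checksum}{block_data}{nonce}".encode("utf-8")
        checksum = hashlib.sha256(candidate).hexdigest()
        if checksum.startswith(prefix):
            return nonce, checksum
        nonce += 1  # change the extra data and try again


previous_checksum = "0" * 64  # assumed placeholder for the prior block
nonce, checksum = mine("A pays B 1", previous_checksum)
print(nonce, checksum)
```

Each additional required zero makes the puzzle about 16 times harder on average, since every hex character has 16 possible values. Checking the answer, though, still takes a single hash.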

So now, imagine that a malicious participant in the network would like to alter history in the blockchain. The malicious user would like to add a transaction in Block 2 which states:

User John Doe has sent 10,000 Bitcoin to user Brian Zambrano

To accomplish this task, the user would need to go back to Block 2 and compute a brand new Block 2 checksum. This exercise by itself would require significant computing power since solving the hashing puzzle is computationally expensive by design. Assuming the user was able to complete this successfully, the malicious user would then need to recalculate a valid checksum for Block 3. Remember, Block 3's checksum is based on the original Block 2's checksum. Once Block 2 changes, Block 3 and every block after it are instantly invalid and would need new checksums in order to be considered valid. Every block’s state of validation depends on all the blocks before it. The longer the blockchain gets, the harder it is to change.
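Here is a toy version of that attack. It leaves out the mining puzzle entirely and only tracks the checksum links, so it shows which blocks become invalid, not how expensive they are to repair. The helper names and the chain layout are assumptions of mine.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder checksum before the first block


def checksum_of(facts: str, previous_checksum: str) -> str:
    return hashlib.sha256((previous_checksum + facts).encode("utf-8")).hexdigest()


def build_chain(all_facts):
    chain, previous = [], GENESIS
    for facts in all_facts:
        block = {
            "facts": facts,
            "previous_checksum": previous,
            "checksum": checksum_of(facts, previous),
        }
        chain.append(block)
        previous = block["checksum"]
    return chain


def first_invalid_block(chain):
    """Return the list index of the first block that fails validation, or None."""
    previous = GENESIS
    for position, block in enumerate(chain):
        expected = checksum_of(block["facts"], previous)
        if block["previous_checksum"] != previous or block["checksum"] != expected:
            return position
        previous = block["checksum"]
    return None


chain = build_chain(["block 1 facts", "block 2 facts", "block 3 facts", "block 4 facts"])
untouched = first_invalid_block(chain)  # None: nothing has been altered yet

# Rewrite history in Block 2 (list index 1).
chain[1]["facts"] = "User John Doe has sent 10,000 Bitcoin to user Brian Zambrano"
after_edit = first_invalid_block(chain)  # 1: Block 2 no longer matches its checksum

# The attacker recomputes Block 2's checksum...
chain[1]["checksum"] = checksum_of(chain[1]["facts"], chain[1]["previous_checksum"])
after_repair = first_invalid_block(chain)  # 2: now Block 3 points at a stale checksum

print(untouched, after_edit, after_repair)  # None 1 2
```

Fixing one block just pushes the problem to the next one, all the way to the tip of the chain.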

The Bitcoin network is a public distributed network, meaning nobody "owns" the blockchain. Nodes in the network work on the principle that the longest valid blockchain wins. Once the malicious user starts their attack, it would be practically impossible for them to rewrite history and recalculate new valid blocks after the altered block quickly enough to overtake the rest of the network. The malicious user would need more computational power than the rest of the blockchain network combined. Or, a majority of the current network would need to band together and decide to be evil.

Currently, there is a lot of computing power participating in Bitcoin and other public blockchains. Unless quantum computers become a commodity, blockchains will not be overwritten by a brute force attack like this. There is a very, very small chance that one node could overwrite a couple of blocks at the tip of the chain. However, nodes that play by the rules would quickly outpace the bad node since, together, they can calculate valid blocks faster than the bad node can. Eventually, the correct nodes would produce and agree upon a longer blockchain than what the evil node created, and the network would discard the evil node's blockchain in favor of the correct version.
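The "longest valid blockchain wins" rule can be sketched as a simple selection function. This is a simplification under my own assumptions: chains are lists of (facts, checksum) pairs, there is no mining puzzle, and length is counted in blocks, whereas real Bitcoin nodes compare the total proof-of-work behind each chain.

```python
import hashlib

GENESIS = "0" * 64  # assumed placeholder checksum before the first block


def link(facts: str, previous_checksum: str) -> str:
    return hashlib.sha256((previous_checksum + facts).encode("utf-8")).hexdigest()


def make_chain(all_facts):
    chain, previous = [], GENESIS
    for facts in all_facts:
        previous = link(facts, previous)
        chain.append((facts, previous))
    return chain


def is_valid(chain) -> bool:
    previous = GENESIS
    for facts, checksum in chain:
        if checksum != link(facts, previous):
            return False
        previous = checksum
    return True


def choose_chain(candidates):
    """Discard invalid chains, then keep the longest of what remains."""
    valid = [chain for chain in candidates if is_valid(chain)]
    return max(valid, key=len) if valid else None


# The honest nodes have produced five blocks.
honest = make_chain(["a", "b", "c", "d", "e"])

# An attacker forked after block "b" but can't keep up: valid, yet shorter.
evil_fork = make_chain(["a", "b", "evil c"])

# Another attacker edited block "b" in place and tacked on a block: longer, yet invalid.
forged = list(honest)
forged[1] = ("b edited", honest[1][1])
forged.append(("f", link("f", honest[-1][1])))

winner = choose_chain([evil_fork, forged, honest])
print(winner is honest)  # True
```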

Want to learn more about how blockchains work? Check out this open-source book called Mastering Bitcoin.