Hashing refers to the process of generating a fixed-size output from an input of variable size. This is done through the use of mathematical formulas known as hash functions (implemented as hashing algorithms).
Although not all hash functions involve the use of cryptography, the so-called cryptographic hash functions are at the core of cryptocurrencies. Thanks to them, blockchains and other distributed systems are able to achieve significant levels of data integrity and security.
Both conventional and cryptographic hash functions are deterministic. Being deterministic means that as long as the input doesn’t change, the hashing algorithm will always produce the same output (also known as digest or hash).
Typically, the hashing algorithms of cryptocurrencies are designed as one-way functions, meaning they cannot be easily reversed without large amounts of computing time and resources. In other words, it is quite easy to create the output from the input, but relatively difficult to go in the opposite direction (to generate the input from the output alone). Generally speaking, the more difficult it is to find the input, the more secure the hashing algorithm is considered to be.
How does a hash function work?
Different hash functions produce outputs of differing sizes, but the output size of any given hashing algorithm is always constant. For instance, the SHA-256 algorithm can only produce outputs of 256 bits, while SHA-1 will always generate a 160-bit digest.
To illustrate, let’s run the words “Binance” and “binance” through the SHA-256 hashing algorithm (the one used in Bitcoin).
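The two digests can be reproduced with a minimal Python sketch. It assumes the standard-library hashlib module and UTF-8 encoded input; the helper name sha256_hex is only illustrative.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` (UTF-8 encoded) as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


for word in ("Binance", "binance"):
    digest = sha256_hex(word)
    # 256 bits are written as 64 hexadecimal characters (4 bits per character).
    print(f"{word}: {digest} ({len(digest)} hex characters)")
```

Running the loop twice prints the same two lines both times, which is the deterministic behavior described earlier.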
Note that a minor change (the casing of the first letter) resulted in a totally different hash value. But since we are using SHA-256, the outputs will always have a fixed size of 256 bits (or 64 hexadecimal characters), regardless of the input size. Also, no matter how many times we run the two words through the algorithm, the two outputs will remain constant.
Conversely, if we run the same inputs through the SHA-1 hashing algorithm, we would have the following results:
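One way to produce these results is the short Python sketch below. It again assumes hashlib and UTF-8 input, and sha1_hex is a hypothetical helper name.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of `text` (UTF-8 encoded) as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


for word in ("Binance", "binance"):
    digest = sha1_hex(word)
    # 160 bits are written as 40 hexadecimal characters.
    print(f"{word}: {digest} ({len(digest)} hex characters)")
```

The digests are shorter than the SHA-256 ones (40 hexadecimal characters instead of 64), but they are just as constant for a given input.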
Notably, the acronym SHA stands for Secure Hash Algorithms. It refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms along with the SHA-2 and SHA-3 groups. SHA-256 is part of the SHA-2 group, along with SHA-512 and other variants. Currently, only the SHA-2 and SHA-3 groups are considered secure.
Why do they matter?
Conventional hash functions have a wide range of use cases, including database lookups, large-file analysis, and data management. On the other hand, cryptographic hash functions are extensively used in information-security applications, such as message authentication and digital fingerprinting. When it comes to Bitcoin, cryptographic hash functions are an essential part of the mining process and also play a role in the generation of new addresses and keys.
The real power of hashing comes when dealing with enormous amounts of information. For instance, one can run a big file or dataset through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This is possible because of the deterministic nature of hash functions: the same input will always result in the same condensed output (hash), so any change to the data shows up as a different hash. Such a technique removes the need to store and “remember” large amounts of data.
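As a minimal sketch of such an integrity check, the Python snippet below hashes a byte stream in chunks and compares the result with a previously recorded digest. The function name stream_sha256, the chunk size, and the sample data are assumptions made for illustration; an in-memory stream stands in for a real file.

```python
import hashlib
import io


def stream_sha256(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream chunk by chunk, so the whole input never sits in memory."""
    hasher = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
    return hasher.hexdigest()


# Illustrative data; a real check would open a file in binary mode instead.
original = b"some large dataset " * 10_000
recorded_digest = stream_sha256(io.BytesIO(original))

# Change only the final byte of the data.
tampered = original[:-1] + b"!"

print(stream_sha256(io.BytesIO(original)) == recorded_digest)  # True
print(stream_sha256(io.BytesIO(tampered)) == recorded_digest)  # False
```

Only the 64-character digest has to be stored or transmitted to detect that the data changed.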
Hashing is particularly useful within the context of blockchain technology. The Bitcoin blockchain has several operations that involve hashing, most of them within the process of mining. In fact, nearly all cryptocurrency protocols rely on hashing to link and condense groups of transactions into blocks, and also to produce cryptographic links between each block, effectively creating a blockchain.
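The linking idea can be illustrated with a toy Python sketch. This is not how any real cryptocurrency serializes blocks: the JSON encoding, the field names prev_hash and transactions, and the all-zero starting hash are assumptions chosen to keep the example short.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents using a stable JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(batches):
    """Create one block per batch of transactions, each storing the previous block's hash."""
    chain = []
    prev = "0" * 64  # placeholder hash for the first block
    for transactions in batches:
        block = {"prev_hash": prev, "transactions": transactions}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain([["a->b:1"], ["b->c:2"], ["c->a:3"]])
print(is_valid(chain))  # True

chain[0]["transactions"][0] = "a->b:100"  # tamper with an old block
print(is_valid(chain))  # False
```

Changing an old block changes its hash, which no longer matches the value stored in the next block, so the tampering is visible further down the chain.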
Cryptographic hash functions
Again, a hash function that deploys cryptographic techniques may be defined as a cryptographic hash function. In general, breaking a cryptographic hash function requires a myriad of brute-force attempts. For a person to “revert” a cryptographic hash function, they would need to guess what the input was by trial and error until the corresponding output is produced. However, there is also the possibility of different inputs producing the exact same output, in which case a “collision” occurs.
Technically, a cryptographic hash function needs to follow three properties to be considered effectively secure. We may describe those as collision resistance, preimage resistance, and second-preimage resistance.
Before discussing each property, let’s summarize their logic in three short sentences.

Collision resistance: infeasible to find any two distinct inputs that produce the same hash as output.

Preimage resistance: infeasible to “revert” the hash function (find the input from a given output).

Second-preimage resistance: infeasible to find any second input that collides with a specified input.
Collision resistance
As mentioned, a collision happens when different inputs produce the exact same hash. Thus, a hash function is considered collision-resistant until the moment someone finds a collision. Note that collisions will always exist for any hash function because the possible inputs are infinite, while the possible outputs are finite.
Put another way, a hash function is collision-resistant when the possibility of finding a collision is so low that it would require millions of years of computations. So despite the fact that there are no collision-free hash functions, some of them are strong enough to be considered resistant (e.g., SHA-256).
Among the various SHA algorithms, SHA-0 and SHA-1 are no longer considered secure because collisions have been found. Currently, the SHA-2 and SHA-3 groups are considered resistant to collisions.
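To make the point about finite outputs concrete, the Python sketch below deliberately weakens SHA-256 by keeping only the first few bits of each digest. With so few possible outputs, a collision turns up almost immediately. The truncation and the helper names are assumptions for illustration only; nothing here finds a collision in full SHA-256.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int = 24) -> int:
    """Return only the first `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 24):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated hash."""
    seen = {}
    for i in count():
        message = str(i).encode("utf-8")
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message
        seen[h] = message


first, second = find_collision(bits=24)
print(first, second, truncated_hash(first) == truncated_hash(second))
```

Every extra output bit doubles the number of possible hash values, which is why the same search against all 256 bits is considered infeasible.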
Preimage resistance
The property of preimage resistance is related to the concept of one-way functions. A hash function is considered preimage-resistant when there is a very low probability of someone finding the input that generated a particular output.
Note that this property is different from the previous one because an attacker would be trying to guess what the input was by looking at a given output. A collision, on the other hand, occurs when someone finds two different inputs that generate the same output, but it doesn’t matter which inputs were used.
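The trial-and-error nature of a preimage attack can be sketched in Python. The example assumes the attacker already knows the input is a four-digit PIN, which shrinks the search to 10,000 candidates; brute_force_pin is a hypothetical helper, and the approach only works because the input space is tiny.

```python
import hashlib
from typing import Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force_pin(target_hex: str, digits: int = 4) -> Optional[str]:
    """Try every zero-padded PIN of the given length until one hashes to the target."""
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None  # the input was not a PIN of this length


target = sha256_hex("4821")
print(brute_force_pin(target))  # 4821
```

The hash function itself is never inverted here. The search succeeds only because guessing every possible input is cheap, which is not the case for long, unpredictable inputs.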
The property of preimage resistance is valuable for protecting data because a simple hash of a message can prove its authenticity, without the need to disclose the information. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.
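A hedged sketch of password storage along these lines is shown below. It swaps a single plain hash for PBKDF2-HMAC-SHA256 with a random salt, a common hardening that the text above does not describe; the iteration count and function names are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative value; an assumption, not a vetted setting


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the attempt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, stored)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))  # False
```

The service can confirm a login attempt without ever keeping the plaintext password, which is the benefit preimage resistance is meant to provide.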
Second-preimage resistance
To simplify, we may say that second-preimage resistance is somewhere in between the other two properties. A second-preimage attack occurs when someone is able to find a specific input that generates the same output as another input that they already know.
In other words, a second-preimage attack involves finding a collision, but instead of searching for two random inputs that generate the same hash, the attacker searches for an input that generates the same hash as another specific input.
Therefore, any hash function that is resistant to collisions is also resistant to second-preimage attacks, as the latter will always imply a collision. However, one can still perform a preimage attack on a collision-resistant function as it implies finding a single input from a single output.
Mining
There are many steps in Bitcoin mining that involve hash functions, such as checking balances, linking transaction inputs and outputs, and hashing transactions within a block to form a Merkle Tree. But one of the main reasons the Bitcoin blockchain is secure is the fact that miners need to perform a myriad of hashing operations in order to eventually find a valid solution for the next block.
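The Merkle Tree step can be sketched in Python as follows. The sketch follows the general Bitcoin approach of double SHA-256 and of duplicating the last hash when a level has an odd number of entries, but it hashes arbitrary byte strings and ignores Bitcoin's transaction serialization and byte-order conventions, so it should be read as a simplified illustration.

```python
import hashlib
from typing import Sequence


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does for block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: Sequence[bytes]) -> bytes:
    """Condense a list of transactions into a single 32-byte root hash."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        # Hash each adjacent pair to build the next, shorter level.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


txs = [b"alice->bob:1", b"bob->carol:2", b"carol->dave:3"]
print(merkle_root(txs).hex())
```

Altering any single transaction changes its leaf hash and therefore every hash above it, including the root stored in the block.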
Specifically, a miner has to try several different inputs when creating a hash value for their candidate block. In essence, they will only be able to validate their block if they generate an output hash that starts with a certain number of zeros. The number of zeros is what determines the mining difficulty, and it varies according to the hash rate devoted to the network.
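A toy version of this search can be written in Python. It assumes a candidate block is just a byte string with an 8-byte nonce appended, and it measures difficulty as a number of leading zero hexadecimal digits; real Bitcoin mining hashes a structured block header and compares the result against a numeric target, so the names and encoding here are illustrative only.

```python
import hashlib
from itertools import count


def mine(block_data: bytes, zero_hex_digits: int):
    """Try nonces 0, 1, 2, ... until the double SHA-256 hash starts with enough zeros."""
    prefix = "0" * zero_hex_digits
    for nonce in count():
        candidate = block_data + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(candidate).digest()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = mine(b"candidate block", zero_hex_digits=4)
print(nonce, digest)
```

Each additional zero hexadecimal digit makes a valid hash 16 times rarer, so the expected number of attempts grows by the same factor.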
In this case, the hash rate represents how much computing power is being invested in Bitcoin mining. If the network’s hash rate increases, the Bitcoin protocol will automatically adjust the mining difficulty so that the average time needed to mine a block remains close to 10 minutes. In contrast, if several miners decide to stop mining, causing the hash rate to drop significantly, the mining difficulty will be adjusted, making it easier to mine (until the average block time comes back to 10 minutes).
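The adjustment can be sketched as a simple proportional rule in Python. The 10-minute target comes from the text; the 2,016-block adjustment window and the limit of a factor of four per adjustment are details of the Bitcoin protocol that are not stated above, and the function name retarget is hypothetical.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
RETARGET_INTERVAL = 2016     # blocks per adjustment window (not stated in the text)
EXPECTED_SECONDS = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL


def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how much faster or slower than expected the window was mined."""
    # Limit a single adjustment to a factor of four in either direction.
    clamped = min(max(actual_seconds, EXPECTED_SECONDS / 4), EXPECTED_SECONDS * 4)
    return old_difficulty * EXPECTED_SECONDS / clamped


# Blocks arrived twice as fast as intended, so difficulty doubles.
print(retarget(100.0, EXPECTED_SECONDS / 2))  # 200.0
# Blocks arrived twice as slowly, so difficulty halves.
print(retarget(100.0, EXPECTED_SECONDS * 2))  # 50.0
```

If blocks keep arriving at the intended pace, the ratio is one and the difficulty stays where it was.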
Note that miners don’t have to find collisions because there are multiple hashes they can generate as a valid output (starting with a certain number of zeros). So there are several possible solutions for a certain block, and miners only have to find one of them – according to the threshold determined by the mining difficulty.
Because Bitcoin mining is a cost-intensive task, miners have no reason to cheat the system as it would lead to significant financial losses. The more miners join a blockchain, the bigger and stronger it gets.
Closing thoughts
There is no doubt that hash functions are essential tools in computer science, especially when dealing with huge amounts of data. When combined with cryptography, hashing algorithms can be quite versatile, offering security and authentication in many different ways. As such, cryptographic hash functions are vital to nearly all cryptocurrency networks, so understanding their properties and working mechanisms is certainly helpful for anyone interested in blockchain technology.