Basics in Cryptography: What is a password hash?

What is a password?

Passwords have become a mainstay in the world of technology and in everyday speech. From your email account to children’s story books about buried treasure, the request for a password has become so commonplace that we put little thought into what a password actually is, let alone a password hash.

Passwords are technically defined by NIST as “A string of characters (letters, numbers and other symbols) that is used to authenticate an identity, to verify access authorization or to derive cryptographic keys.” We are all fully up to speed on how to enter a password in order to access our favorite websites and accounts, but the practice of keeping these credentials stored securely can be a bit more tricky.

As users, it is crucial to never reuse passwords, always choose strong passwords, and be sure to change them from time to time. One way of creating and using unique and secure passwords across all of your accounts is by utilizing a password manager. By integrating a password manager into your privacy hygiene practices, you are already more secure than the vast number of internet users who have not yet taken this essential step.

Unfortunately, you cannot protect yourself 100%, as your login credentials are handled by the services you sign up for. This is where data leaks enter the story. We’ve become all too familiar with news articles saying something along the lines of “the breach includes usernames, dates of birth, email addresses, phone numbers, and password hashes.” While most of these terms are easy to understand, that last one may be a source of confusion. To clarify, let’s take a look at how companies and online services actually handle the data you submit when creating a new account.

The problem of storing passwords securely

Imagine you are a growing online forum, and as more and more people rush to sign up for your service, you need to determine a secure way to store your users’ login information. After all, if someone steals their login info, you’re going to have unhappy users.

To better understand this, let’s look at what exactly happens when you try to log in to your favorite website. First, you visit their login page in your browser and are prompted to enter a username/email address and a password. You enter this information and click “Login”. What happens next gets a little complicated.

Ideally, the password you choose is not stored in plaintext by the service in question. If the service stored passwords in clear text, your login credentials would be exposed if the service were ever compromised. Instead, the password you have chosen should be hashed, and the hash should be stored in a secure database operated by the service. When you log in to the account, the email address you submit and the hash of the password you enter are compared to the information stored in the database. If they match, voilà, you have gained access to the account. Someone who hacks the service, however, only gets hashes, which cannot be used directly to log in to your account.
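
The sign-up and login flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular service implements it: the in-memory `users` dictionary stands in for the database, the function names are hypothetical, and PBKDF2 from the standard library with an assumed iteration count is used as the hashing function. The random per-account salt it uses is explained further down.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "database": email -> (salt, password hash).
users = {}

ITERATIONS = 200_000  # assumed work factor, chosen only for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password; the plaintext itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(email: str, password: str) -> None:
    salt = os.urandom(16)  # random and unique per account
    users[email] = (salt, hash_password(password, salt))


def login(email: str, password: str) -> bool:
    record = users.get(email)
    if record is None:
        return False
    salt, stored_hash = record
    candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored_hash)
```

After `register("alice@example.com", "correct horse battery staple")`, calling `login` with the same password returns `True`, while any other password returns `False`. The database only ever holds the salt and the hash.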

At Tuta we use Argon2 to derive the key that unlocks your encrypted data (the password key). When we switched from bcrypt to Argon2, we also increased the length of the generated keys to 256 bits, making them post-quantum secure as well. The password key is not used to authenticate you with the Tuta server. Instead, we hash it using SHA-256 to produce the password verifier, which is transmitted to the server and hashed again before being stored in the database. For more details on how the password is secured in Tuta, check here.

A visualization of how we handle passwords and encryption keys.
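
The chain of steps described above (password, password key, password verifier, stored value) might look roughly like the following sketch. It is not Tuta's code. Python's standard library has no Argon2, so `hashlib.scrypt` is swapped in as a stand-in memory-hard key derivation function; the scrypt parameters, the per-account salt, the function names, and the use of SHA-256 for the server's second hash are all assumptions made for illustration.

```python
import hashlib
import os


def derive_password_key(password: str, salt: bytes) -> bytes:
    """Stand-in for Argon2: derive a 256-bit password key with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32
    )


def password_verifier(password_key: bytes) -> bytes:
    """Client side: hash the password key with SHA-256 before sending it."""
    return hashlib.sha256(password_key).digest()


def server_stored_value(verifier: bytes) -> bytes:
    """Server side: hash the received verifier again before storing it.

    SHA-256 is an assumption here; the article does not name this function.
    """
    return hashlib.sha256(verifier).digest()


salt = os.urandom(16)  # assumed per-account salt
password_key = derive_password_key("correct horse battery staple", salt)
verifier = password_verifier(password_key)  # this is what leaves the device
stored = server_stored_value(verifier)      # this is what the database holds
```

The point of the layering is that the password key, which unlocks the encrypted data, never leaves the client; the server only sees a hash of it.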

So what is it that makes storing a hash more secure than storing a plaintext password? The answer is simple: additional protection in the event of a breach.

What is password hashing?

Password hashing is the process of taking a user-created password and running it through a hashing algorithm to scramble up the bits and create a unique representation of the chosen password. If you do not know the password, it should ideally not be possible to reproduce that exact same hash.

There are many types of hashing algorithms, but they all perform a similar function. They take a plaintext password, which is treated as a string of data of arbitrary length (think of the different lengths of your passwords). This string is converted into bits of data, which are then scrambled deterministically according to the hash function being used, meaning that the same password will always result in the same hash. This hash is then stored in a secure database.
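
The deterministic behavior is easy to see with Python's `hashlib`. The sketch below uses plain SHA-256 purely to illustrate the property; the helper name `sha256_hex` is made up for this example, and a fast general-purpose hash like this is not by itself a good choice for storing passwords.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 digest of a password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


short = sha256_hex("abc")
long = sha256_hex("a much longer passphrase with many characters")

print(short)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha256_hex("abc") == short)  # True: same input, same hash
print(len(short), len(long))       # 64 64: output length does not depend on input length
```

Changing a single character of the input, for example hashing "abd" instead of "abc", produces a completely different digest.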

It is important to remember that hashing is not the same thing as encrypting. There is no key which makes it possible to decrypt a hash. Hashing is a one-way street intended to move only from plaintext to hash. For those who might be culinarily inclined, let’s take the potato as an example. The peeled potato is our plaintext; once we hash it and turn that sucker into a delicious plate of hashbrowns, we have our hash. Just like you can’t transform hashbrowns back into the original potato, you cannot transform a hash back into the original password.

Hashing functions are a one-way street

A hashing function is a one-way street only. Unlike encryption, no keys are generated for returning a hash back to the original plaintext.

When someone talks about cracking passwords, they are not “decrypting” anything, but rather creating their own list of hashes based upon commonly used passwords and then comparing their known results to information leaked in data breaches. This becomes a matter of cross-referencing and doesn’t require deep knowledge of cryptanalysis, just basic logic.
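
The cross-referencing described above can be sketched as follows. The leaked data and the word list are made up for this example, the function names are hypothetical, and the hashes are assumed to be unsalted SHA-256.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack(leaked_hashes: dict[str, str], wordlist: list[str]) -> dict[str, str]:
    """Return username -> recovered password for every leaked hash that matches a guess."""
    # Hash every guess once, then cross-reference the leaked hashes.
    lookup = {sha256_hex(word): word for word in wordlist}
    return {
        user: lookup[digest]
        for user, digest in leaked_hashes.items()
        if digest in lookup
    }


# Made-up example data: unsalted SHA-256 hashes as they might appear in a leak.
leaked = {
    "alice": sha256_hex("123456"),
    "bob": sha256_hex("Xk#9v!qLm2$wZr7p"),
}
common_passwords = ["password", "123456", "qwerty", "letmein"]

print(crack(leaked, common_passwords))  # {'alice': '123456'}
```

Nothing is reversed here: the commonly used password is found because its hash was guessed, while the long random password stays unknown.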

What is a salted password hash?

We spoke earlier about salting hashes. Salting is a process where random data is added to an input before it is processed by a hashing algorithm. Because the salt is unique for every password, it protects against attacks that rely on precomputed hashes: even if the salt values are included in a data breach, the attacker has to precompute values for each individual salt, which makes the attack far more expensive. Additionally, if passwords are not salted and you and I use the same password, our hashes will appear identical in the database due to the deterministic nature of password hashing algorithms. By salting them, we end up with different final outputs, which prevents hackers from quickly picking out accounts with identical passwords.

An example of the difference between hashed passwords with and without salting.

Salting not only makes food taste better, but it also strengthens your password hashes!

Cryptographic salts can be derived from things like a username or other unique identifiers which are specific to each account, although a randomly generated value, as described above, is the more common choice. The element of uniqueness is key here because it makes precomputed table attacks far more resource-demanding than would be reasonable.
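
A small sketch of the effect of salting, under simplifying assumptions: plain SHA-256 with the salt prepended keeps the example short, and the function names are made up. A production system would more likely use a dedicated password hashing function that handles the salt itself.

```python
import hashlib
import os


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    # The salt is prepended to the password before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


password = "hunter2"

# Without a salt, two users with the same password get the same hash.
alice_unsalted = unsalted_hash(password)
bob_unsalted = unsalted_hash(password)

# With a salt, each user gets a random 16-byte value stored next to the hash.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)
alice_salted = salted_hash(password, alice_salt)
bob_salted = salted_hash(password, bob_salt)

print(alice_unsalted == bob_unsalted)  # True
print(alice_salted == bob_salted)      # False, because the salts differ
```

The salt is not a secret. It is stored alongside the hash so the service can recompute `salted_hash(password, alice_salt)` at login and compare the result.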

Problems and Types of Attacks

Hashing is of course not perfect, but there are a number of known attacks which are taken into consideration when these functions are being created by cryptographers.

Birthday Attacks

Every password hashing algorithm is vulnerable to an attack known as a birthday attack. It takes advantage of a mathematical probability problem known as the birthday problem. A more precise writeup of the mathematics behind this problem is available here. Fortunately, these attacks are not considered to be any faster than a brute-force attack.
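
For readers who want to see the numbers, the birthday problem can be computed directly. The sketch below assumes hash outputs behave like uniformly random values; the function names are made up for this example. It shows the classic result that 23 people are enough for a better-than-even chance of a shared birthday, and the matching rule of thumb that an n-bit hash needs on the order of 2^(n/2) hashes before a collision becomes likely.

```python
import math


def collision_probability(samples: int, space: int) -> float:
    """Exact probability that at least two of `samples` uniform draws from `space` values coincide."""
    if samples > space:
        return 1.0
    p_all_distinct = 1.0
    for i in range(samples):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


def hashes_for_even_odds(bits: int) -> float:
    """Approximate number of hashes needed for a 50% collision chance on a `bits`-bit hash."""
    return math.sqrt(2 * math.log(2) * 2**bits)


print(round(collision_probability(23, 365), 3))       # 0.507
print(round(math.log2(hashes_for_even_odds(128)), 1))  # 64.2, i.e. about 2^64 hashes
```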

Collision Attacks

A collision attack occurs when two different passwords (plaintext strings) are run through a hashing algorithm and return a matching hash. This is problematic because it means that an incorrect password, combined with the correct username could result in a successful login.

If a collision attack is discovered to be faster and more efficient than a birthday attack, the password hashing function can be considered broken and should be replaced with a more resilient algorithm.
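
To make the idea of a collision concrete, the following sketch deliberately weakens SHA-256 by keeping only its first 24 bits, which is an assumption made purely so that a collision shows up within milliseconds. The function names and the `passwordN` candidate format are made up. On the full 256-bit output the same search is not feasible.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """Toy hash: the first `n_bytes` bytes of SHA-256 (24 bits by default)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes, int]:
    """Hash candidate inputs until two different ones share a truncated hash."""
    seen = {}  # truncated hash -> input that produced it
    for attempts in count(1):
        candidate = f"password{attempts}".encode("utf-8")
        digest = truncated_hash(candidate, n_bytes)
        if digest in seen:
            return seen[digest], candidate, attempts
        seen[digest] = candidate


first, second, attempts = find_collision()
print(first, second, attempts)
```

The two inputs printed are different strings with the same 24-bit toy hash, so either one would pass a check against the other's stored value. The number of attempts is typically in the thousands, in line with the birthday bound for a 24-bit output.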

Brute-force Attacks

Brute-force attacks are a potential vulnerability for every cryptosystem. Given an infinite amount of time and an infinite amount of computing power, the input behind any hash can be discovered by trying every possible candidate. Because of this, brute force serves as the benchmark for judging the severity of other vulnerabilities. Should an attack be determined to be faster than brute force, the algorithm in question must be either patched or abandoned.
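
A toy brute-force search shows why password length and character variety matter. The sketch assumes an unsalted SHA-256 hash and a lowercase-only alphabet; the function names are made up for this example.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, max_length: int, alphabet: str = ascii_lowercase):
    """Try every string up to `max_length` characters; return the match or None."""
    for length in range(1, max_length + 1):
        for letters in product(alphabet, repeat=length):
            guess = "".join(letters)
            if sha256_hex(guess) == target_hash:
                return guess
    return None


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate strings of exactly `length` characters."""
    return alphabet_size ** length


print(brute_force(sha256_hex("cab"), max_length=3))  # cab
print(keyspace(26, 3))                               # 17576
print(f"{keyspace(94, 12):.2e}")                     # 4.76e+23
```

Three lowercase letters fall instantly, while twelve characters drawn from the 94 printable ASCII symbols give a search space that is more than 19 orders of magnitude larger.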

There are a number of password hashing algorithms which have been developed and they each have their own unique strengths and weaknesses. Unfortunately a number of the most popular hashing algorithms have been shown to be vulnerable, but they are still regularly used (as this YouTube video explains) and are often found in breach datasets.

  1. MD5: Long considered outdated, this hashing function is still a regular presence when new breach datasets reach the public. MD5 was developed by MIT professor Ronald Rivest in 1991 as a replacement for MD4. It produces a 128-bit hash, but was quickly shown to be susceptible to collision attacks. In 2013, researchers were able to prove that the algorithm was no longer secure and should be replaced. Unfortunately, MD5 hashes continue to surface in data breaches, meaning that your data can be exposed through no fault of your own.

  2. SHA: The recommended replacement for MD5 was the Secure Hash Algorithms family of hashing functions. SHA-1 was introduced in 1995 after development by the NSA and generates 160-bit hashes. The algorithm was considered to be insecure against advanced threat actors in 2005, which posed a particular threat in cases where it was used to create digital signatures to verify authenticity. SHA-2 had already been published in 2001 and officially replaced its predecessor in 2011 when NIST introduced new minimum security requirements. Unfortunately, its roll-out was slowed by a lack of backwards compatibility. Most recently, SHA-3 was released in 2015 with an entirely different structure from the previous iterations, which were similar to MD5. SHA-2 and SHA-3 both support the same hash lengths of 224 to 512 bits depending on the variant chosen.

  3. bcrypt: Introduced in 1999, bcrypt is a password hashing algorithm that salts by default and is adaptive: its cost factor can be raised as hardware gets faster, so it can continue to resist brute-force attacks over time. bcrypt builds upon the Blowfish cipher, which was developed by Bruce Schneier.

  4. Argon2: Argon2 won the Password Hashing Competition in 2015. It is not only side-channel resistant but also memory-hard, which makes it more resistant to brute-force attacks than bcrypt.
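
The output lengths mentioned in the list can be checked with Python's `hashlib`, as in the short sketch below. The helper name `digest_bits` is made up for this example. bcrypt and Argon2 are not part of the Python standard library and are therefore left out.

```python
import hashlib

# Algorithm names as understood by hashlib.new().
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"]


def digest_bits(name: str) -> int:
    """Return the output length of a hash function in bits."""
    # usedforsecurity=False (Python 3.9+) lets MD5 and SHA-1 load on restricted builds.
    return hashlib.new(name, usedforsecurity=False).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:9} {digest_bits(name)} bits")
```

MD5 reports 128 bits and SHA-1 reports 160 bits, while the SHA-2 and SHA-3 variants listed here report 256 or 512 bits.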

At Tuta we are leading the way forward in security and privacy. Our move to Argon2 made us the first email provider using the strongest form of key derivation currently available. By putting cryptography and security first, we are building privacy you can trust. Combined with our zero-knowledge architecture, we don’t have access to the keys required to decrypt or view data. Your privacy is in the palm of your hand.

Take back control of your data today by making the switch to Tuta and turning on privacy.