Password Storage Strategies

Written Thursday, October 31st, 2024

Topics: Passwords, Password storage, Hashing, Security

Disclaimer
Preamble
Plain text
Encryption
Hashing
Hashing + Salt
Hashing + Salt + Pepper
Conclusion

Disclaimer

The best way to store passwords for your website is to let someone else handle it. Having options like "Sign in with Google" lets you offload the hard part to the big guys who know what they're doing, and can pay for the legal fees when they screw up. That said, there are always concerns to be had about privacy and reliability when relying on a single/few large companies for logins. However, that is not the subject of this article.

If you do decide to handle password storage yourself, do not use this article as a guide. This is intended to be an introduction and overview of the different ways to securely (or insecurely) store passwords. It is not a how-to guide. Do your research, figure out what the best current standards are, and use those.

Preamble

Password storage is a subject that has interested me for a while, probably since I first watched Computerphile's video ft. Tom Scott on the subject (you'll notice some similarities between this article and that video; that is not a coincidence).

I think what gets me about the subject is the initial question of: How do I store the user's password in a way that will let them sign in, while also being as secure as possible, so that in the event of a data breach, the user is mostly protected?

It's not an easy question to answer, as we'll soon find out. It also doesn't help that people in general have bad passwords. Forgetting passwords is a pain, so people tend to want passwords that are easy to remember. Unfortunately, passwords that are easy to remember are usually not good passwords, as they're typically short, easy to guess, and often re-used. That's why password managers are more secure, even though they represent a fairly significant single point of failure. One strong, hard to crack password is better than a bunch of weak, easy to crack ones, and is WAY better than the same password re-used over and over again. This article isn't about password managers though. It's about a different kind of password storage.

Specifically, I'll be talking about the different options there are for storing your users' passwords for your website. We'll be going in order of least to most secure, and talking about the strengths and weaknesses of each one.

Plain text

This is by far the easiest method to implement, and is also the worst. The concept is dead simple: When the user signs up, take their password, and then put it into the database as-is, in plain text. When the user tries to sign in, just compare the password they give against the password you have stored, and if they match, let them in. This method also has the added bonus of being able to just send the user their password if they forget it, no need to change it to a whole new one!

The disadvantage is that as soon as anyone gains unauthorized access to your database (which is probably quite easy, seeing as how you're clearly not very security-minded if you're storing passwords in plain text), they have immediate access to all of your users' passwords, no extra work required. Heck, it doesn't even have to be unauthorized. A disgruntled (or worse: incompetent) employee could leak them with ease. When that inevitably happens, every other account where the user re-used that password has been compromised too.

It should go without saying, but do not do this. This is the worst possible way to store passwords.

Encryption

If we don't want the passwords to be stored in plain text, then why don't we encrypt them? After all, modern encryption is effectively unbreakable, so even if the encrypted passwords get leaked, nobody will be able to tell what they are, right?

This is also a very bad idea, for a few reasons. First, the password still needs to be decrypted at some point in order to check it during login attempts. This means the passwords will still exist on the system in plain text, even if only for a short amount of time. A bad actor inside the company would have no trouble at all exploiting that.

The other major issue is that since the passwords need to be decrypted, there is a key somewhere to do that. That key is probably on the same server where the passwords are stored. In the event of a breach where the encrypted passwords are leaked, it would likely be trivial for the attacker to also find the key used to encrypt those passwords.

Ultimately, any method that leaves it possible to recover the original password is going to have these same flaws, or at least very similar ones. That brings us to the next method...

Hashing

A hash function takes in data of an arbitrary size, and transforms it into a different, fixed-length value, usually called a hash or a digest. The same input data will always yield the same output hash. These are quite useful in computer science for reasons we won't get into here. A cryptographic hash function is a hash function that has some other special qualities that make it useful for cryptography. The most important of those qualities are:

Any input data should produce a unique output hash, and even very similar inputs should yield entirely different hashes. Strict uniqueness isn't technically possible, since our output is a fixed size, and our input could be literally anything. An infinite number of inputs with only a finite number of outputs means that outputs have to repeat at some point. However, a strong hashing algorithm makes stumbling onto a repeat astronomically unlikely.

It should not be feasible to find two input values that yield the same hash. When two inputs to a hashing algorithm yield the same output, this is called a collision. If it is at all possible to generate a collision intentionally, that hashing algorithm is considered broken, and should not be used for cryptography. Some examples of once widely-used, now broken hashing algorithms are MD5 and SHA-1.

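It shouldn't be too fast, at least when it's used for passwords. General-purpose cryptographic hashes like SHA-256 are actually designed to be fast, which is why dedicated password hashing algorithms exist. If a hash is fast to compute, it's easier for an attacker to brute-force it by just calculating the hash for a bunch of different possible values. We'll talk a bit more about this later.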

Perhaps the most important for our purposes, it should not be possible to derive the original input value from the computed hash. A secure cryptographic hashing algorithm is one-way, so that if somebody wanted to find out what the original input value was for a given hash, their only option would be to try every possible input value until they find the right one.
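To make a few of those properties concrete, here's a minimal sketch using SHA-256 from Python's standard hashlib module. The helper name sha256_hex and the sample inputs are mine, picked purely for illustration. It shows the fixed output length, the determinism, and how a one-character change in the input gives a completely different digest.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Fixed length: SHA-256 always yields 256 bits, i.e. 64 hex characters,
# whether the input is 1 character or 10,000.
short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 10_000)
print(len(short_digest), len(long_digest))  # 64 64

# Deterministic: the same input always gives the same digest.
print(sha256_hex("hunter2") == sha256_hex("hunter2"))  # True

# Similar inputs, entirely different-looking digests.
print(sha256_hex("hunter2"))
print(sha256_hex("hunter3"))
```

Note that nothing here demonstrates the one-way property directly. That one is a claim about what an attacker can't do, which a few lines of code can't show.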

You might already see where we're going with this. What if we take our passwords, and hash them before storing them? When the user wants to log in, we just take the password they're trying to log in with, hash it, and compare it to the hashed password we have stored. If we use a strong cryptographic hashing algorithm, then even if the password hashes did get leaked, there would be no way for an attacker to find out what the passwords are, since they're all hashed!

Alas, this too is insufficient, mainly because of something we talked about earlier: people are bad at making passwords. Because so many people are so bad at coming up with passwords, many different people will all use the same bad password for their accounts. The Pwned Passwords database at haveibeenpwned.com lists over 10,000,000 occurrences of people using the word "password" as their password, and that's only in data breaches that we know about.

Since a hash function will always output the same hash for the same input, this means that an attacker only has to crack the hash of one commonly-used password, and now every account that uses that password has been compromised as well. This is far from ideal.
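Here's a rough sketch of what that looks like from the attacker's side. Everything in it is made up for illustration: the usernames, the "leaked" table, and the tiny list of common passwords standing in for a real wordlist. The point is just that with unsalted hashes, one precomputed lookup table cracks every account that shares a common password.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Plain, unsalted SHA-256 of the password (the scheme being criticized)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked table: username -> unsalted password hash.
leaked = {
    "alice": unsalted_hash("password"),
    "bob": unsalted_hash("password"),
    "carol": unsalted_hash("correct horse battery staple"),
}

# The attacker hashes a list of common passwords once...
common_passwords = ["123456", "password", "qwerty"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# ...and then every matching account falls out of a dictionary lookup.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': 'password', 'bob': 'password'}
```

Alice and Bob have identical hashes, so cracking one cracks both. Carol's longer passphrase isn't in the wordlist, so she survives this particular attack.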

Hashing + Salt

We need a way to deal with this password-reuse problem, and there's actually a fairly simple way to do that. What if, before we hash someone's password, we add a few random characters on the end? Then, we just store those characters alongside the password hash, so that they can be used when checking passwords for login attempts.

Since even a small change in a hash function's input should yield an entirely different hash, this effectively means that every user's password hash will be unique, even if they share the same password. This little string of characters is called a salt, and salting our password hashes like this pretty much eliminates the password-reuse problem we talked about.
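As a sketch of the idea (not something to deploy), here's salting with a single round of SHA-256. The function name hash_with_salt, the 16-byte salt size, and the sample password are all assumptions of mine. The salt is appended to the password as described above, and returned so it can be stored next to the hash.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def hash_with_salt(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Hash `password` with `salt` appended; make a fresh random salt if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)  # 16 random bytes, an assumed size
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return salt, digest


# Two users pick the same bad password but get different salts...
alice_salt, alice_digest = hash_with_salt("password")
bob_salt, bob_digest = hash_with_salt("password")

# ...so their stored hashes differ (barring an astronomically unlikely salt collision).
print(alice_digest == bob_digest)  # False

# Login check: re-hash the attempt with the salt stored for that user.
_, candidate = hash_with_salt("password", alice_salt)
print(candidate == alice_digest)  # True
```

A precomputed table of common password hashes is useless here, since the attacker would need a separate table for every salt.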

This is more or less what modern, secure password hashing algorithms like bcrypt do, although they add one more important step as well. Instead of hashing the password just once and calling it a day, the password is hashed multiple times, taking the output of each cycle and feeding it back into the hash function. Ideally, this is done as many times as you can afford (4,000+ rounds). This drastically increases the cost of calculating the hash. For the server storing the passwords, this usually isn't an issue. Assuming it takes ~0.1 seconds to hash the password, the server can handle this just fine, as it doesn't need to calculate the hash all that often, only when a user creates an account or tries to log in. But for an attacker trying to brute-force the hash, who needs to calculate as many hashes as possible as quickly as possible, this adds a significant time cost.

Think of it as the attacker having to break through a brick wall to get to the password. It's a tough job, but if the attacker has something like explosives (expensive and fast computer hardware), it can be done. By running our hash through multiple rounds, we're adding even more brick walls between the password and the attacker. Even with explosives, that's going to take a while to get through.
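bcrypt isn't in Python's standard library, so as a stand-in here's a sketch of the same "salt plus many rounds" idea using PBKDF2 via hashlib.pbkdf2_hmac. To be clear, PBKDF2 is a different algorithm from bcrypt. The function names and the iteration count of 200,000 are illustrative assumptions, not recommendations, so check current guidance before picking real numbers.

```python
import hashlib
import hmac
import secrets
import time
from typing import Tuple

ITERATIONS = 200_000  # assumed value for illustration; tune to your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) for storage alongside the user record."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


start = time.perf_counter()
record = hash_password("hunter2")
elapsed = time.perf_counter() - start
print(f"one hash took {elapsed:.3f}s")  # slow on purpose; exact time depends on hardware

print(verify_password("hunter2", *record))  # True
print(verify_password("hunter3", *record))  # False
```

Storing the iteration count with each record is a design choice that lets you raise the cost later without invalidating existing hashes.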

Hashing + Salt + Pepper

I'm starting to notice a theme with these names...

A pepper (also called a secret key) serves a very similar role to a salt, but it has one key difference. While a salt is a short string of characters appended to the password before hashing that doesn't need to be kept secret (it's stored in plain text next to the password hash), a pepper is a much longer (112+ bits) string of characters, appended to the password alongside the salt, and is kept secret. Usually this means keeping the pepper in a separate place from the password database, so that if the database gets compromised, the pepper remains a secret.

Now, even if our database of password hashes and salts gets leaked, so long as the attacker doesn't also gain access to the pepper (which, to be fair, is not a given), cracking the hash will be much more difficult, even if the user has an otherwise weak password. Instead of just having to guess the password, the attacker now also has to guess the pepper if they want to crack the hash. And if every password is given a different and unique pepper, now they effectively have to guess two passwords for any one password they want to crack. Granted, this also means that the application owners have to securely store a large amount of secret data (the peppers), entirely separate from their already large store of secret data (their password database). Because of this, oftentimes a single pepper will be used for every password. While this is easier to deal with, it also means that if the pepper is guessed once, then all passwords hashed with that pepper have lost that layer of protection. This is why it's very important that the pepper is long, as it makes it infeasible to guess through brute-force.
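Here's a sketch of a single, application-wide pepper. One deviation from the description above: instead of appending the pepper to the password, this mixes it in with HMAC-SHA256 before the slow hash, which is a common variation. The slow hash is PBKDF2 from the standard library, standing in for something like bcrypt. All names, sizes, and the iteration count are assumptions for illustration, and in a real deployment the pepper would be loaded from somewhere outside the password database rather than generated on the spot.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed value for illustration only


def _derive(password: str, pepper: bytes, salt: bytes) -> bytes:
    """Mix in the secret pepper with HMAC, then run the salted slow hash."""
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", peppered, salt, ITERATIONS)


def hash_password(password: str, pepper: bytes) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Both go in the database; the pepper does not."""
    salt = secrets.token_bytes(16)
    return salt, _derive(password, pepper, salt)


def verify_password(password: str, pepper: bytes, salt: bytes, expected: bytes) -> bool:
    """Check a login attempt using the stored salt and the secret pepper."""
    return hmac.compare_digest(_derive(password, pepper, salt), expected)


# Stand-in for a secret kept outside the database (256 bits here).
pepper = secrets.token_bytes(32)
salt, stored = hash_password("hunter2", pepper)

print(verify_password("hunter2", pepper, salt, stored))  # True

# An attacker with only the database has the salt and digest but not the pepper,
# so even a correct password guess doesn't verify.
wrong_pepper = secrets.token_bytes(32)
print(verify_password("hunter2", wrong_pepper, salt, stored))  # False
```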

Conclusion

As you can see, quite a lot of thought has gone into figuring out how to store passwords in a way that will minimize the damage done in the event of a data breach. Despite this, big and important companies have kept using old and insecure methods for storing passwords. Take LinkedIn: it was breached in 2012, and in 2016 over 160 million account records from that breach were released. To make matters worse, LinkedIn had been storing the passwords as unsalted SHA-1 hashes, making it absolutely trivial to compromise a large number of accounts very quickly.

The moral of the story? Please, for the love of all that is good in the world, if you're going to handle password storage yourself, then make sure you're using the most up to date and secure methods there are. All it takes is one little mistake, and suddenly you've compromised the accounts of all your users, and probably gotten yourself into a heap of legal trouble in the process.