1. What Does “Hashing” Do?
Hashing converts a piece of data (either small or large), into a relatively short piece of data such as a string or an integer.
This is accomplished by using a one-way hash function. “One-way”
means that it is very difficult (or practically impossible) to reverse
it.
A common example of a hash function is
md5(), which is quite popular in many different languages and systems.
$data =
"Hello World";
$hash = md5(
$data);
echo $hash;
With
md5(), the result will always be a 32 character
long string. But, it contains only hexadecimal characters; technically
it can also be represented as a 128-bit (16 byte) integer. You may
md5()
much longer strings and data, and you will still end up with a hash of
this length. This fact alone might give you a hint as to why this is
considered a “one-way” function.
2. Using a Hash Function for Storing Passwords
The usual process during a user registration:
- User fills out registration form, including the password field.
- The web script stores all of the information into a database.
- However, the password is run through a hash function, before being stored.
- The original version of the password has not been stored anywhere, so it is technically discarded.
And the login process:
- User enters username (or e-mail) and password.
- The script runs the password through the same hashing function.
- The script finds the user record from the database, and reads the stored hashed password.
- Both of these values are compared, and the access is granted if they match.
Once we decide on a decent method for hashing the password, we are going to implement this process later in this article.
Note that the original password has never been stored anywhere. If
the database is stolen, the user logins can not be compromised, right?
Well, the answer is “it depends.” Let’s look at some potential problems.
3. Problem #1: Hash Collision
A hash “collision” occurs when two different data inputs generate the
same resulting hash. The likelihood of this happening depends on which
function you use.
How can this be exploited?
As an example, I have seen some older scripts which used
crc32()
to hash passwords. This function generates a 32-bit integer as the
result. This means there are only 2^32 (i.e. 4,294,967,296) possible
outcomes.
Let’s hash a password:
echo crc32('supersecretpassword');
Now, let’s assume the role of a person who has stolen a database, and
has the hash value. We may not be able to convert 323322056 into
‘supersecretpassword’, however, we can figure out another password that
will convert to the same hash value, with a simple script:
set_time_limit(0);
$i = 0;
while (true) {
if (crc32(
base64_encode(
$i)) == 323322056) {
echo base64_encode(
$i);
exit;
}
$i++;
}
This may run for a while, though, eventually, it should return a
string. We can use this returned string — instead of
‘supersecretpassword’ — and it will allow us to successfully login into
that person’s account.
For example, after running this exact script for a few moments on my computer, I was given ‘
MTIxMjY5MTAwNg==‘. Let’s test it out:
echo crc32('supersecretpassword');
echo crc32('MTIxMjY5MTAwNg==');
How can this be prevented?
Nowadays, a powerful home PC can be used to run a hash function
almost a billion times per second. So we need a hash function that has a
very big range.
For example,
md5() might be suitable, as it generates
128-bit hashes. This translates into
340,282,366,920,938,463,463,374,607,431,768,211,456 possible outcomes.
It is impossible to run through so many iterations to find collisions.
However some people have still found ways to do this (see
here).
Sha1
Sha1() is a better alternative, and it generates an even longer 160-bit hash value.
4. Problem #2: Rainbow Tables
Even if we fix the collision issue, we’re still not safe yet.
A rainbow table is built by calculating the hash values of commonly used words and their combinations.
These tables can have as many as millions or even billions of rows.
For example, you can go through a dictionary, and generate hash
values for every word. You can also start combining words together, and
generate hashes for those too. That is not all; you can even start
adding digits before/after/between words, and store them in the table as
well.
Considering how cheap storage is nowadays, gigantic Rainbow Tables can be produced and used.
How can this be exploited?
Let’s imagine that a large database is stolen, along with 10 million
password hashes. It is fairly easy to search the rainbow table for each
of them. Not all of them will be found, certainly, but, nonetheless…some
of them will!
How can this be prevented?
We can try adding a “salt”. Here is an example:
$password = "easypassword";
echo sha1($password);
$salt = "f#@V)Hu^%Hgfds";
echo sha1($salt . $password);
What we basically do is concatenate the “salt” string with the
passwords before hashing them. The resulting string obviously will not
be on any pre-built rainbow table. But, we’re still not safe just yet!
5. Problem #3: Rainbow Tables (again)
Remember that a Rainbow Table may be created from scratch, after the database has been stolen.
How can this be exploited?
Even if a salt was used, this may have been stolen along with the
database. All they have to do is generate a new Rainbow Table from
scratch, but this time they concatenate the salt to every word that they
are putting in the table.
For example, in a generic Rainbow Table, “
easypassword” may exist. But in this new Rainbow Table, they have “
f#@V)Hu^%Hgfdseasypassword”
as well. When they run all of the 10 million stolen salted hashes
against this table, they will again be able to find some matches.
How can this be prevented?
We can use a “unique salt” instead, which changes for each user.
A candidate for this kind of salt is the user’s id value from the database:
$hash = sha1($user_id . $password);
This is assuming that a user’s id number never changes, which is typically the case.
We may also generate a random string for each user and use that as
the unique salt. But we would need to ensure that we store that in the
user record somewhere.
function unique_salt() {
return substr(sha1(mt_rand()),0,22);
}
$unique_salt = unique_salt();
$hash = sha1($unique_salt . $password);
This method protects us against Rainbow Tables, because now every
single password has been salted with a different value. The attacker
would have to generate 10 million separate Rainbow Tables, which would
be completely impractical.
6. Problem #4: Hash Speed
Most hashing functions have been designed with speed in mind, because
they are often used to calculate checksum values for large data sets
and files, to check for data integrity.
How can this be exploited?
As I mentioned before, a modern PC with powerful GPU’s (yes, video
cards) can be programmed to calculate roughly a billion hashes per
second. This way, they can use a brute force attack to try every single
possible password.
You may think that requiring a minimum 8 character long password
might keep it safe from a brute force attack, but let’s determine if
that is, indeed, the case:
- If the password can contain lowercase, uppercase letters and number, that is 62 (26+26+10) possible characters.
- An 8 character long string has 62^8 possible versions. That is a little over 218 trillion.
- At a rate of 1 billion hashes per second, that can be solved in about 60 hours.
And for 6 character long passwords, which is also quite common, it would take under 1 minute.
Feel free to require 9 or 10 character long passwords, however you might start annoying some of your users.
How can this be prevented?
Use a slower hash function.
Imagine that you use a hash function that can only run 1 million
times per second on the same hardware, instead of 1 billion times per
second. It would then take the attacker 1000 times longer to brute force
a hash. 60 hours would turn into nearly 7 years!
One way to do that would be to implement it yourself:
function myhash(
$password,
$unique_salt) {
$salt =
"f#@V)Hu^%Hgfds";
$hash = sha1(
$unique_salt .
$password);
for (
$i = 0;
$i < 1000;
$i++) {
$hash = sha1(
$hash);
}
return $hash;
}
Or you may use an algorithm that supports a "cost parameter," such as BLOWFISH. In PHP, this can be done using the
crypt() function.
function myhash(
$password,
$unique_salt) {
return crypt(
$password,
'$2a$10$'.
$unique_salt);
}
The second parameter to the
crypt() function contains some values separated by the dollar sign ($).
The first value is '$2a', which indicates that we will be using the BLOWFISH algorithm.
The second value, '$10' in this case, is the "cost parameter". This
is the base-2 logarithm of how many iterations it will run (10 =>
2^10 = 1024 iterations.) This number can range between 04 and 31.
Let's run an example:
function myhash(
$password,
$unique_salt) {
return crypt(
$password,
'$2a$10$'.
$unique_salt);
}
function unique_salt() {
return substr(sha1(mt_rand()),0,22);
}
$password =
"verysecret";
echo myhash(
$password, unique_salt());
The resulting hash contains the algorithm ($2a), the cost parameter
($10), and the 22 character salt that was used. The rest of it is the
calculated hash. Let's run a test:
$hash =
'$2a$10$dfda807d832b094184faeu1elwhtR2Xhtuvs3R9J1nfRGBCudCCzC';
$password =
"verysecret";
if (check_password(
$hash,
$password)) {
echo "Access Granted!";
}
else {
echo "Access Denied!";
}
function check_password(
$hash,
$password) {
$full_salt =
substr(
$hash, 0, 29);
$new_hash = crypt(
$password,
$full_salt);
return (
$hash ==
$new_hash);
}
When we run this, we see "Access Granted!"
7. Putting it Together
With all of the above in mind, let's write a utility class based on what we learned so far:
class PassHash {
private static $algo =
'$2a';
private static $cost =
'$10';
public static function unique_salt() {
return substr(sha1(mt_rand()),0,22);
}
public static function hash(
$password) {
return crypt(
$password,
self::
$algo .
self::
$cost .
'$' . self::unique_salt());
}
public static function check_password(
$hash,
$password) {
$full_salt =
substr(
$hash, 0, 29);
$new_hash = crypt(
$password,
$full_salt);
return (
$hash ==
$new_hash);
}
}
Here is the usage during user registration:
require (
"PassHash.php");
$pass_hash = PassHash::hash(
$_POST[
'password']);
And here is the usage during a user login process:
require (
"PassHash.php");
if (PassHash::check_password(
$user[
'pass_hash'],
$_POST[
'password']) {
}
else {
}
8. A Note on Blowfish Availability
The Blowfish algorithm may not be implemented in all systems, even
though it is quite popular by now. You may check your system with this
code:
if (CRYPT_BLOWFISH == 1) {
echo "Yes";
}
else {
echo "No";
}
However, as of PHP 5.3, you do not need to worry; PHP ships with this implementation built in.
Conclusion
This method of hashing passwords should be solid enough for most web
applications. That said, don't forget: you can also require that your
members use stronger passwords, by enforcing minimum lengths, mixed
characters, digits & special characters.