How to manage a PHP application's users and passwords

Alexander Peslyak
Founder and CTO
Openwall, Inc.

better known as

Solar Designer
Openwall Project leader

April 2010
Last revised: August 2010
Some rights reserved

Introduction

Almost all large PHP applications, as well as many small ones, have a notion of user accounts, and, whether we like it or not, they typically use passwords (or at best passphrases) to authenticate the users. How do they store the passwords (to authenticate against)? Reasonable applications don't. Instead, they store password hashes. There have been many short articles, blog posts, even book chapters that try/claim to show you how to properly compute and use password hashes. Older ones will tell you to use the md5() function. Newer ones will tell you to use sha1() or hash() (SHA-256, etc.), add salting (but "forget" to add stretching, which is equally important), and use mysql_real_escape_string() on the username. Unfortunately, while some of these recommendations are steps in the right direction (although not all are), none of the articles on password security in PHP that I saw were "quite it".

Finally, some of the more recent blog posts, forum comments, and the like have started to recommend phpass, the password/passphrase hashing framework for PHP that I wrote, and which has already been integrated into many popular "web applications" including phpBB3, WordPress, and Drupal 7. Obviously, I fully agree with this recommendation. However, I was not aware of an existing step-by-step guide on integrating phpass into a PHP application, and password security is not only about password hashing anyway.

In this article/tutorial, I will guide you through the steps needed to introduce proper (in my opinion at least) user/password management into a new PHP application. I will start by briefly explaining password/passphrase hashing and how to access the database safely. Then we will proceed through several revisions of the sample program. We'll start with a very simple PHP program capable of creating new users only and having some subtle issues. We will gradually improve this program adding functionality (logging in to existing user accounts, changing user passwords, and enforcing a password policy) and "discovering" and dealing with the issues.

We will also briefly touch on many related topics. Sub-headings have been chosen such that you can skip or skim over the topics you think you're already familiar with... or better read those sections anyway. Let's get started.

Password/passphrase hashing

Decent systems/applications do not actually store users' passwords. Instead, they transform new passwords being set/changed into password hashes with cryptographic (one-way) hash functions, and they store those hashes. They should preferably use hash functions intended for password hashing. Direct/naive use of other cryptographic hash functions, such as PHP's md5(), sha1(), or hash('sha256', ...) for that matter, has dire consequences.

When a user authenticates to the application with a username and a previously-set password, the application looks up some auxiliary information (such as the hash type, the salt, and the iteration count - all of which are described below) for the provided username, transforms the provided password into its hash, and compares this hash against the one stored for the user. If the two hashes match, authentication succeeds (otherwise it fails).
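The lookup-hash-compare flow described above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory array stands in for the database, authenticate() is a made-up function name, and PHP's crypt() with a fixed salt is used directly, whereas the sample program later in this article uses phpass and random salts.

```php
<?php
// Illustrative only: an in-memory array stands in for the users table, and
// the function name authenticate() is made up for this sketch.

// Returns TRUE if $pass matches the hash encoding string stored for $user.
function authenticate(array $stored_hashes, $user, $pass)
{
	if (!isset($stored_hashes[$user]))
		return FALSE;
	$stored = $stored_hashes[$user];
	// crypt() reads the hash type, the iteration count, and the salt from
	// the stored string and hashes the supplied password the same way.
	$computed = crypt($pass, $stored);
	// Matching hashes mean the password was right.  hash_equals()
	// (PHP 5.6+) compares the strings in constant time.
	return is_string($computed) && hash_equals($stored, $computed);
}

// Set a password: CRYPT_BLOWFISH, 2^4 iterations, fixed 22-character salt.
// A real application would use a random salt and a higher iteration count.
$stored_hashes = array(
	'myuser' => crypt('mypass', '$2a$04$abcdefghijklmnopqrstuu'),
);

var_dump(authenticate($stored_hashes, 'myuser', 'mypass'));
var_dump(authenticate($stored_hashes, 'myuser', 'wrong'));
```

Note that the plaintext password is never stored: only the hash encoding string is kept, and it carries everything needed to repeat the computation.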

- "Why bother with password hashing when I use (or don't use) SSL (https URLs) anyway?"
(Surprisingly, this question is really being asked both ways. More often, people would make an incorrect statement that you don't need password hashing, or don't need to do it right, because you do or because you don't use SSL.)
- Password hashing, if done right, reduces the risk impact of having the hashes stolen or leaked - an attacker will recover fewer plaintext passwords from the hashes. Also, the cost of recovery from an incident like this may be reduced - rather than change all passwords at once, which may be costly or prohibitive to do, a system's administrator may audit the password hashes with a tool such as John the Ripper and only have the weak passwords changed. With proper password hashing and password policy enforcement in place, the majority of the passwords could be considered "strong enough" and would not need to be changed immediately even after a known and otherwise-resolved security compromise. The use of SSL mitigates the risk of having some plaintext passwords captured while in transit. Clearly, these risks are different. An attacker capable of capturing some of the network traffic is not necessarily capable of getting a copy of the database, and vice versa. Thus, it makes perfect sense to use one of these countermeasures - password hashing and SSL - without the other (which does not address "the other" risk then), and it also makes sense to use both of them together.

Salting

Salts are likely-unique values that are entered into a password hashing method along with the password, which results in the same password hashing into completely different hash values given different salts. Proper use of salts may defeat a number of attacks, including:

Ability to try candidate passwords against multiple hashes at the price of one
Use of pre-hashed lists (or the smarter "rainbow tables") of candidate passwords
Ability to determine whether two users (or two accounts of one user) have the same or different passwords without actually having to guess one of the passwords

Salts are normally stored along with the hashes. They are not secret.
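A small sketch of the effect of salting, assuming PHP 5.3.0 or newer (so that CRYPT_BLOWFISH is available). The two salts are fixed, made-up values chosen only to make the demonstration repeatable. Real salts should be random.

```php
<?php
// Illustrative only: fixed salts are used here so that the effect is easy
// to see.  Real salts should be random.
$password = 'mypass';
$salt_a = 'abcdefghijklmnopqrstuu';
$salt_b = 'ABCDEFGHIJKLMNOPQRSTUu';

// CRYPT_BLOWFISH with a low iteration count (2^4) to keep the demo quick
$hash_a = crypt($password, '$2a$04$' . $salt_a);
$hash_b = crypt($password, '$2a$04$' . $salt_b);

// Same password, different salts: the stored strings do not match, so
// they do not reveal that the two accounts share a password.
var_dump($hash_a === $hash_b);

// The salt is part of each stored string (it is not secret), so either
// hash can still be recomputed from the password for verification.
var_dump(crypt($password, $hash_a) === $hash_a);
```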

Stretching

Offline password cracking (given stolen or leaked password hashes) involves computing hashes of large numbers of candidate passwords. Thus, in order to slow those attacks down, the computational complexity of a good password hashing method must be high - but of course not so high as to render it impractical.

Typical cryptographic hash functions not intended for password hashing were designed for speed. If these are directly misused for password hashing, then offline password cracking attacks may run at speeds of many millions of candidate passwords per second.

These cryptographic hash functions (or even block ciphers) - let's call them "cryptographic primitives" - may be used as building blocks to construct a decent password hashing method, which would use thousands or millions of iterations of the underlying cryptographic primitive. This is called password (or key) stretching (or strengthening). Preferably, the number of iterations should not be hard-coded, but rather it should be configurable by an administrator for use when a new password is set (hashed), and it should be saved along with the hash (to allow the administrator to change the iteration count for newly set/changed passwords, yet not break support for previously-generated password hashes).
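To make the idea concrete, here is a toy sketch of stretching with the iteration count saved along with the hash. It is not the algorithm that phpass or crypt() uses and is not meant for production. The function names toy_stretch() and toy_check(), the output format, and the choice of SHA-256 are all assumptions made for this illustration.

```php
<?php
// A toy illustration of stretching, NOT the algorithm phpass uses and not
// meant for production.  toy_stretch() and toy_check() are made-up names.

// Hash $salt and $password once, then apply 2^$cost_log2 further rounds
// of SHA-256.  The cost is stored in the result so that it can be raised
// later without breaking hashes computed with the old setting.
// $salt must not contain the '$' separator.
function toy_stretch($password, $salt, $cost_log2)
{
	$count = 1 << $cost_log2;
	$hash = hash('sha256', $salt . $password, TRUE);
	do {
		$hash = hash('sha256', $hash . $password, TRUE);
	} while (--$count);
	return $cost_log2 . '$' . $salt . '$' . bin2hex($hash);
}

// Verify by re-reading the cost and the salt from the stored string.
function toy_check($password, $stored)
{
	list($cost_log2, $salt) = explode('$', $stored);
	return toy_stretch($password, $salt, (int)$cost_log2) === $stored;
}

$old = toy_stretch('mypass', 'somesalt', 8);	// 256 rounds
$new = toy_stretch('mypass', 'somesalt', 12);	// 4096 rounds
var_dump(toy_check('mypass', $old), toy_check('mypass', $new));
```

Both stored strings verify with the same toy_check() even though they were produced with different iteration counts, which is the point of saving the count with the hash.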

- "My web application must be fast. I can't afford to use a slow hash function!"
- Actually, you can. No one said it should be taking an entire second to compute a password hash. Is 10 milliseconds fast enough for you? Perhaps it is, but if not you can make it 1 ms or less (which is likely way below other per-request "overhead" that your application incurs anyway) and still benefit from password stretching a lot. Please note that without any stretching a cryptographic primitive could be taking as little as some microseconds or even nanoseconds to compute (at least during an offline attack, which would use an optimal implementation). If you go from one microsecond to one millisecond, which is clearly affordable, you make offline attacks (against stolen or leaked hashes) run 1000 times slower, or you effectively stretch your users' passwords or passphrases by about 10 bits of entropy each. That's significant - it is roughly equivalent to each passphrase containing one additional word, without actually adding that extra word and having the users memorize it. Besides, the password hash is typically only computed when a user logs in (or when a new user is registered or a password is changed), which occurs relatively infrequently (compared to the frequency of other requests). Subsequent requests by the logged-in user will use a session ID instead.
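The "about 10 bits" figure is simply the base-2 logarithm of the slowdown factor. A minimal sketch of the arithmetic follows. The helper name stretch_bits() is made up for this illustration.

```php
<?php
// How many bits of entropy does a given slowdown factor correspond to?
// stretch_bits() is a made-up helper name for this illustration.
function stretch_bits($slowdown_factor)
{
	return log($slowdown_factor, 2);
}

// Going from 1 microsecond to 1 millisecond is a factor of 1000,
// which comes out just under 10 bits.
printf("%.2f bits\n", stretch_bits(1000));

// A factor that is a power of two, such as 256, maps to a whole
// number of bits.
printf("%.2f bits\n", stretch_bits(256));
```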

Choice of the underlying cryptographic primitive

The choice of the underlying cryptographic primitive - such as MD5, SHA-1, SHA-256, or even Blowfish or DES (which are block ciphers, yet they may be used to construct one-way hashes) - does not matter all that much. It's the higher-level password hashing method, employing salting and stretching, that makes a difference.

- "I heard that MD5 has been "broken". Shouldn't we use SHA-1 instead?"
- It is true that MD5 has been broken as it relates to certain attacks (practical). SHA-1 has also been broken in certain other ways (mostly theoretical). However, neither break has anything to do with the uses of these functions for password hashing, especially not as building blocks in a higher-level hashing method. Thus, any possible reasons to move off MD5 or SHA-1 as underlying cryptographic primitives for password hashing "because of the break" are purely "political" rather than technical. (It may be easier to just phase out MD5 and SHA-1 rather than differentiate their affected vs. unaffected uses.)

phpass - the password/passphrase hashing framework for PHP applications

phpass provides an easy to use abstraction layer on top of PHP's cryptographic hash functions suitable for password hashing. As of this writing, it supports three password hashing methods, including two via PHP's crypt() function - these are known in PHP as CRYPT_BLOWFISH and CRYPT_EXT_DES - and one implemented in phpass itself on top of MD5. All three employ salting, stretching, and variable iteration counts (configurable by an administrator, encoded/stored along with the hashes).

PHP 5.3.0 and above are guaranteed to support all three of these hashing methods due to code included in the PHP interpreter itself. Specific builds/installs of older versions of PHP may or may not support the CRYPT_BLOWFISH and CRYPT_EXT_DES methods - this is system-specific. For example, the Suhosin PHP security hardening patch, included in many distributions' packages of PHP, has been adding support for CRYPT_BLOWFISH for years; many operating systems - such as *BSD's, Solaris 10, SUSE Linux, ALT Linux, and indeed Openwall GNU/*/Linux - also provide support for CRYPT_BLOWFISH via the system libraries (which PHP uses); and some operating systems - *BSD's, Openwall GNU/*/Linux - also provide support for CRYPT_EXT_DES.

The MD5-based salted and stretched hashing implemented in phpass itself is supported on all systems - starting with the ancient PHP 3. phpass provides a way for you (the application developer or administrator) to force the use of these "portable" hashes - this is a Boolean parameter to the PasswordHash constructor function.

Unless you force the use of "portable" hashes, phpass' preferred hashing method is CRYPT_BLOWFISH, with a fallback to CRYPT_EXT_DES, and then a final fallback to the "portable" hashes. CRYPT_BLOWFISH and CRYPT_EXT_DES are preferred primarily for the efficiency of the underlying implementations (in C and on some systems in assembly), compared to phpass' own code around MD5 (in PHP, even though the underlying MD5 code is in C). This greater code efficiency allows for more extensive and thus more effective use of password stretching (higher iteration counts). (It is assumed that an attacker would have a near-optimal implementation of any of these hashing methods anyway.)

Besides the actual hashing, phpass transparently generates random salts when a new password or passphrase is hashed, and it encodes the hash type, the salt, and the password stretching iteration count into the "hash encoding string" that it returns. When phpass authenticates a password or passphrase against a stored hash, it similarly transparently extracts and uses the hash type identifier, the salt, and the iteration count out of the "hash encoding string". Thus, you do not need to bother with salting and stretching on your own - phpass takes care of these for you.
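To show what is packed into a "hash encoding string", here is a sketch that splits a CRYPT_BLOWFISH string into its parts. phpass and crypt() do this internally, so an application never needs such code. The helper parse_bcrypt_encoding() is a made-up name, and the sample string is the one from the test run shown later in this article.

```php
<?php
// Illustrative only: phpass and crypt() do this parsing internally, and
// parse_bcrypt_encoding() is a made-up helper for this sketch.
function parse_bcrypt_encoding($encoded)
{
	$re = '/^\$(2[axy])\$(\d\d)\$([.\/A-Za-z0-9]{22})([.\/A-Za-z0-9]{31})\z/';
	if (!preg_match($re, $encoded, $m))
		return FALSE;
	return array(
		'type' => $m[1],		// hash type identifier
		'cost_log2' => (int)$m[2],	// base-2 logarithm of the iteration count
		'salt' => $m[3],		// 22 characters encoding the salt
		'hash' => $m[4],		// 31 characters encoding the hash itself
	);
}

print_r(parse_bcrypt_encoding(
	'$2a$08$Lg5XF1Tt.X5TGyfb43vBBeEFZm4GTXQhKQ6SY6emkcnhAGT8KfxFS'));
```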

- "What source of randomness does phpass use? Does it work on Windows?"
- You might have noticed that phpass uses /dev/urandom, which is a decent supply of randomness on modern Unix-like systems. However, phpass will transparently fall back to its own pseudo-random byte stream generator (which is based primarily on multiple measurements of the current time with up to microsecond precision) when /dev/urandom is unavailable or when it fails. Thus, yes, phpass works on Windows (as well as on Unix-like systems indeed).

Naturally, we'll use phpass for our sample program.

The database (and how to access it safely)

SQL injections

What SQL injections are

In many cases, we will need to pass pieces of untrusted user input into SQL queries. Even with our trivial database and the initial revision of our user management program (which we'll create soon), there will be untrusted user input: the username and password (or passphrase), at least before we've verified them. If we blindly embed the target username string obtained via a website form into an SQL query string, we might alter the SQL query. Since the username is under a potential attacker's control, the attacker may be able to alter our SQL query in a way such that another valid SQL query of the attacker's choice is formed. This may allow the attacker not only to circumvent our program's intended behavior (e.g., have it change another user's password with that altered query), but also to mount all sorts of attacks on the SQL server, as well as on our program (such as via query results that would suddenly become fully untrusted input as well).

How to deal with SQL injections

- "Can't we just enclose the user inputs in single quotes when embedding them in an SQL query string? Wouldn't that do the trick?"
- No. One of the input values can simply close the quotes, braces, etc., do its dirty deed, then provide additional SQL statements (or whatever) to make the rest of the original query "complete" (avoiding a syntax error). Thus, this naive approach alone does not work at all.
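The failure of the naive quoting approach can be seen without even contacting a database. In the sketch below, the query is only built and printed. The function naive_query() is a made-up name showing what not to do, and the query itself is a simplified example.

```php
<?php
// Illustrative only: no database is contacted, the query is only printed.
// naive_query() is a made-up name showing what NOT to do.
function naive_query($user)
{
	return "select pass from users where user='$user'";
}

// An ordinary username yields the intended query
echo naive_query('myuser'), "\n";

// A hostile "username" closes the quote and appends its own condition,
// leaving a quote open so that the trailing ' keeps the syntax valid.
echo naive_query("x' or '1'='1"), "\n";
```

The second query now matches every row in the table, because the attacker-supplied condition is always true.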

There are several real ways to combat SQL injections, of varying effectiveness and with different pros and cons. Most of these can be used together for greater assurance.

Filtering - sanitize the input values rejecting or modifying "bad" ones (preferably using a whitelist of known-safe input values rather than a blacklist of known-unsafe ones)
Escaping - prefix any special characters (most notably the single quote character) with an escape character (preferably using the API functions specific to the target SQL server type)
Encoding - turn any input strings into other strings consisting of safe characters only - e.g., an application may introduce '%' as its own escape character, then URL-encode all characters not from a known-safe set (the '%' character has a special meaning in certain contexts, though, so you might choose another or you might only use this technique along with escaping)
Prepared statements - rather than form SQL query strings with inputs embedded into them (in one way or another), an application may use advanced APIs to pass SQL queries with placeholders to the SQL server and then pass the input values to the SQL server "separately"

In the sample program that we'll be writing during the rest of this article, we'll use filtering (the "rejection" kind of it) and prepared statements in such a way that if any one of these techniques fails to provide its security, the application will nevertheless remain secure.

Prepared statements with PHP and MySQL

As of this writing, PHP offers three main interfaces to MySQL: PHP's MySQL Extension (obsolete, not recommended for new projects - but still widely used), PHP's mysqli (MySQL Improved) Extension ("preferred" for new projects), and PHP Data Objects (PDO) (recommended, but not "preferred" for new projects). The last two of these support prepared statements. Both require PHP 5+. We'll use mysqli.

The separation of code and data achieved with the mysqli PHP extension, the underlying MySQL APIs that it uses, and the (relatively) new MySQL protocol revision can't be perfect - everything is sent over the same socket connection anyway - but it does appear to be way better (simpler, and hence less error-prone) than what could be achieved by escaping. Specifically, in the MySQL binary protocol, the input values are preceded by binary representations of their lengths in bytes and then are sent verbatim.

Beware: apparently, certain interfaces and older/transitional software versions emulate prepared statements on the client end, which makes them susceptible to the risks typical for SQL escaping. This is one of the reasons why we choose not to rely on prepared statements alone for security against SQL injections.

Employ the principle of least privilege

Besides avoiding SQL injections, it makes sense to mitigate any that would potentially occur anyway, as well as possibly some other attacks carried out against or via the database. To this end, it is a good idea to have your PHP application use an SQL server account with the minimum privileges required - not an administrative account and not an account that can also access another database.

This also helps in case your PHP application is somehow fully compromised, such that the attacker gains direct access to the database with the application's access privileges, yet you care not to let this compromise directly "propagate" onto other databases that your application does not use.

Schema

For our sample program, we'll start with just one table in a brand new MySQL database. Connect to the MySQL server (such as with the command-line mysql client program) and issue the following:

create database myapp;
use myapp;
create table users (user varchar(60), pass varchar(60));

(We will need to revise this a little bit to deal with an issue that we'll "discover" further down this article.)

The user column will hold usernames, and the pass column will hold password hashes. Currently, phpass produces hash encoding strings that are at most 60 characters long.

The sample program is born

The code snippets included in this article generally assume that you're familiar with creating HTML web pages and PHP scripts. Thus, any opening and closing tags (such as <html> and <?php), etc. are omitted from here, to keep the article from growing too long. However, the sample files in the archive provided with the article do include all of those essential bits.

How to create new users

First, we need to put the phpass code in place. (We will use it to hash the new password.) We place the PasswordHash.php file from the phpass distribution tarball somewhere within our web "virtual host" "document root" directory and we set proper permissions for the file to be loaded by the web server's PHP setup (typically, the Unix permission bits will need to be 600 or 644 depending on web server setup).

Then we create a subdirectory for our sample program (this is how multiple revisions of the program are included in the archive accompanying this article - in separate subdirectories). Let's call the directory demo (and set its Unix permissions to 711). We'll place two files into this directory: user-man.html (with permissions set to 644) containing the HTML form below, and user-man.php (with permissions set the same as we did for PasswordHash.php).

Let's place the following HTML form into user-man.html:

<form action="user-man.php" method="POST">
Username:<br>
<input type="text" name="user" size="60"><br>
Password:<br>
<input type="password" name="pass" size="60"><br>
<input type="submit" value="Create user">
</form>

This form asks for and submits a username and a password to the user-man.php script. Let's start writing it. First, let's include the phpass code:

require '../PasswordHash.php';

To actually use phpass, we need to decide on and specify the extent of password stretching and whether we want to force the use of "portable" hashes or not (both of these matters were briefly discussed above). Let's place those constants into PHP variables:

// Base-2 logarithm of the iteration count used for password stretching
$hash_cost_log2 = 8;
// Do we require the hashes to be portable to older systems (less secure)?
$hash_portable = FALSE;

(In a real application, these should be in a configuration file included from the actual program code files instead. Alternatively, they may be configurable via the application itself, by an administrative user.)

To obtain the submitted username and password, let's initially use:

$user = $_POST['user'];
// Should validate the username length and syntax here
$pass = $_POST['pass'];

(This is a bit problematic. We will revise it soon.)

Now we can hash the password with:

$hasher = new PasswordHash($hash_cost_log2, $hash_portable);
$hash = $hasher->HashPassword($pass);
if (strlen($hash) < 20)
	fail('Failed to hash new password');
unset($hasher);

This uses the fact that the shortest valid password hash encoding string that phpass can currently return is 20 characters long (this is the case for CRYPT_EXT_DES, whereas other hash types use even longer encoding strings). fail() is a custom function that we'll use in our sample program. Let's define it (earlier in the code) as follows:

function fail($pub, $pvt = '')
{
	$msg = $pub;
	if ($pvt !== '')
		$msg .= ": $pvt";
	exit("An error occurred ($msg).\n");
}

(This function as defined above is a bit problematic. We will revise it soon.)

Note that we don't bother producing proper HTML output in fail(). For our sample program, it is simpler to produce plain text output. Let's set the HTTP header accordingly such that the web browser does not attempt to parse our script's output as HTML:

header('Content-Type: text/plain');

Indeed, we need to do this before our script possibly produces any output. (In a real PHP application, you would likely be producing HTML output instead, which requires more code and extra safety measures.)

Let's also place our database access credentials into PHP variables. For example:

// In a real application, these should be in a config file instead
$db_host = '127.0.0.1';
$db_port = 3306;
$db_user = 'mydbuser';
$db_pass = 'voulDyu0gue$s?';
$db_name = 'myapp';

Let's connect to the database using mysqli, and let's not forget to check for a possible failure:

$db = new mysqli($db_host, $db_user, $db_pass, $db_name, $db_port);
if (mysqli_connect_errno())
	fail('MySQL connect', mysqli_connect_error());

Finally, let's try to create the user by inserting the username and the password hash encoding string (which includes the salt, etc.) into the database table using the prepared statements API:

($stmt = $db->prepare('insert into users (user, pass) values (?, ?)'))
	|| fail('MySQL prepare', $db->error);
$stmt->bind_param('ss', $user, $hash)
	|| fail('MySQL bind_param', $db->error);
$stmt->execute()
	|| fail('MySQL execute', $db->error);

If we got this far, we must have successfully created the user. Let's close the database connection:

$stmt->close();
$db->close();

In fact, it would be nice to do this on failure as well, but that would make the code more complicated (the cleanups to perform would vary depending on where the failure occurs). Instead, we rely on the web server setup to perform any cleanups for terminating PHP scripts, which it needs to do anyway because scripts may sometimes terminate abnormally.

So that's it. Please find the HTML file and the demo program we've just created, complete with all details and with the snippets in the proper order (unlike in this article), in the demo1 subdirectory in the accompanying archive (tar.gz, ZIP).

Let's test the program. Go to the URL for the user-man.html HTML page in a web browser, enter myuser for the username and mypass for the password. If the script completes without error, we should be able to see the new user account in the users table:

mysql> select * from users;
+--------+--------------------------------------------------------------+
| user   | pass                                                         |
+--------+--------------------------------------------------------------+
| myuser | $2a$08$Lg5XF1Tt.X5TGyfb43vBBeEFZm4GTXQhKQ6SY6emkcnhAGT8KfxFS |
+--------+--------------------------------------------------------------+
1 row in set (0.00 sec)

The password hash will look almost completely different, though, due to the random salt and due to the hash possibly being of a different type (if you're using PHP older than 5.3.0 and your build of PHP does not include the Suhosin patch and your operating system does not provide native support for CRYPT_BLOWFISH hashes... or if you edited the code to set $hash_portable to TRUE).

What if the user already exists?

Let's try to create the same user, with the same username, once again. Also enter the same password, just for kicks. The script succeeds again, and we get:

mysql> select * from users;
+--------+--------------------------------------------------------------+
| user   | pass                                                         |
+--------+--------------------------------------------------------------+
| myuser | $2a$08$Lg5XF1Tt.X5TGyfb43vBBeEFZm4GTXQhKQ6SY6emkcnhAGT8KfxFS |
| myuser | $2a$08$7lM07FwQMm5/C8G/urT4z..MudfsS227e8oUEu6T51bNWk/RG//qe |
+--------+--------------------------------------------------------------+
2 rows in set (0.00 sec)

We get a duplicate user record. We'll need to address this shortcoming. Meanwhile, notice how the hash encoding string is indeed almost entirely different, as explained above, even though the same password was supplied.

We could issue a SELECT query to see if the username is already taken before trying to create the user. However, this would involve a race condition: a simultaneous request to our script could create a user of that name after our SELECT query but before we do the INSERT. Then we would end up creating a duplicate user record anyway.

To deal with this, we need to revise our database schema such that the MySQL server would not permit duplicate usernames:

drop table users;
create table users (user varchar(60), pass varchar(60), unique (user));

Now let's repeat the experiment: create the same user twice via our web form. On our second attempt to create the user, the script fails with:

An error occurred (MySQL execute: Duplicate entry 'myuser' for key 1).

Checking the table contents, we see that only one instance of the user was created - just like we wanted. The message printed on an attempt to create a duplicate user is technical, though - not one suitable for an end user. We'll deal with this a bit later. Meanwhile, let's focus on another aspect of it.

Avoid leaking server setup details

A portion of the error message above was produced by MySQL. On another occasion, it could possibly produce a message leaking the details of your setup - such as the database name, the database server address, and/or a full pathname to a database table file. We would want to avoid displaying this information to the user, unless we're "the user" and we're debugging. Also, those error messages may happen to contain characters that would need to be quoted if we were producing HTML output. Let's modify the fail() function to support a non-debugging mode where it would not reveal the potentially-sensitive messages. Also, let's add a comment about the "HTML issue", such that it is hopefully not overlooked if this code is actually made to produce HTML output later.

// Are we debugging this code?  If enabled, OK to leak server setup details.
$debug = TRUE;

function fail($pub, $pvt = '')
{
	global $debug;
	$msg = $pub;
	if ($debug && $pvt !== '')
		$msg .= ": $pvt";
/* The $pvt debugging messages may contain characters that would need to be
 * quoted if we were producing HTML output, like we would be in a real app,
 * but we're using text/plain here.  Also, $debug is meant to be disabled on
 * a "production install" to avoid leaking server setup details. */
	exit("An error occurred ($msg).\n");
}

Please note that similar potential leaks of server setup details typically exist in the default settings of Apache and PHP. As application developers, our responsibility here is to provide a way to avoid those leaks - which we did by introducing the $debug setting. It is up to a server administrator (or someone installing our PHP application on a server) to decide on and configure those settings in a certain way. It is a good idea to document the issue and the settings prominently. Also, it may be desirable to use safe defaults - that is, have $debug default to FALSE in our case. However, for our sample program we'll continue with a default of TRUE.
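As for the "HTML issue" mentioned in the code comment, here is a sketch of how the message might be quoted if the program were producing HTML rather than plain text. The helper error_message_html() is a made-up name and is not part of the sample program. It returns the message instead of exiting so that it is easy to try out.

```php
<?php
// A sketch of how the message could be built for HTML output.
// error_message_html() is a made-up helper, not part of the sample program.
function error_message_html($pub, $pvt = '', $debug = FALSE)
{
	$msg = $pub;
	if ($debug && $pvt !== '')
		$msg .= ": $pvt";
	// Quote &, <, >, and both kinds of quotes so that nothing in the
	// message can be interpreted as markup by the browser.
	$msg = htmlspecialchars($msg, ENT_QUOTES, 'UTF-8');
	return "<p>An error occurred ($msg).</p>\n";
}

// With debugging on, the private part is shown, but quoted
echo error_message_html('MySQL execute', "Duplicate entry '<b>' for key 1", TRUE);
// With debugging off (the default here), the private part is withheld
echo error_message_html('MySQL execute', "Duplicate entry '<b>' for key 1");
```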

How to differentiate MySQL errors

We'd like to determine if the requested username is already taken - and show a user-friendly message if so. However, the attempt to add a user could fail for other reasons as well, so it would be wrong to show the same user-friendly message on all errors.

One of the approaches would be to issue a SELECT query on the username after an attempt to add the user fails. If the SELECT query returns 1 row, then the username is definitely already taken. We can implement this as follows:

if (!$stmt->execute()) {
	$save_error = $db->error;
	$stmt->close();

	// Does the user already exist?
	($stmt = $db->prepare('select user from users where user=?'))
		|| fail('MySQL prepare', $db->error);
	$stmt->bind_param('s', $user)
		|| fail('MySQL bind_param', $db->error);
	$stmt->execute()
		|| fail('MySQL execute', $db->error);
	$stmt->store_result()
		|| fail('MySQL store_result', $db->error);

	if ($stmt->num_rows === 1)
		fail('This username is already taken');
	else
		fail('MySQL execute', $save_error);
}

This works, and it might be the most reliable and portable approach. However, there is a shortcut: the MySQL server returns a very specific error code when the error is an attempt to create a duplicate user record. If we aren't too concerned about a possible change in MySQL's error codes in a future version of it, then we can simply check for the known error code. Even if this stops working, the only impact will be a less friendly message displayed to the user.

if (!$stmt->execute()) {
	if ($db->errno === 1062 /* ER_DUP_ENTRY */)
		fail('This username is already taken');
	else
		fail('MySQL execute', $db->error);
}

We will be using this shortcut approach in further revisions of the sample program.

The "Magic Quotes" issue

Magic Quotes is a deprecated feature of PHP. When enabled, which it is on many web servers, the PHP interpreter will automagically escape many inputs to PHP scripts. This may be desirable to provide some minimal security for poorly-written PHP scripts that fail to defend themselves against SQL injections, but with properly-written PHP scripts this feature may actually be more of a problem.

Specifically, unless we deal with this issue, a password containing the single quote character might or might not reach our PHP application with the character escaped. If the magic_quotes_gpc PHP setting is then toggled, or if our PHP application install is moved to another system where this setting is set differently, the password will stop working.

Thus, we need to check whether magic_quotes_gpc was set and undo its effect at least for specific inputs where this matters. Here's the code:

function get_post_var($var)
{
	$val = $_POST[$var];
	if (get_magic_quotes_gpc())
		$val = stripslashes($val);
	return $val;
}

We'll use this function instead of direct reads from $_POST[].
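To see why this matters for passwords, the sketch below simulates the effect of Magic Quotes on a form field. It uses addslashes() as a stand-in, on the assumption that this approximates what magic_quotes_gpc does to incoming values closely enough for illustration.

```php
<?php
// Illustrative only: addslashes() stands in for what magic_quotes_gpc
// does to incoming form fields.
$typed = "it's a secret";

$seen_with_magic_quotes = addslashes($typed);	// it\'s a secret
$seen_without = $typed;				// it's a secret

// The two strings differ, so their hashes would differ as well
var_dump($seen_with_magic_quotes === $seen_without);

// stripslashes() restores exactly what the user typed
var_dump(stripslashes($seen_with_magic_quotes) === $typed);
```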

Input filtering

Let's sanitize our inputs in order to mitigate some obscure DoS attacks, as well as not to rely on our use of prepared statements alone to prevent SQL injections.

$user = get_post_var('user');
/* Sanity-check the username, don't rely on our use of prepared statements
 * alone to prevent attacks on the SQL server via malicious usernames.
 * The D modifier keeps $ from matching just before a trailing linefeed. */
if (!preg_match('/^[a-zA-Z0-9_]{1,60}$/D', $user))
	fail('Invalid username');

$pass = get_post_var('pass');
/* Don't let them spend more of our CPU time than we were willing to.
 * Besides, bcrypt happens to use the first 72 characters only anyway. */
if (strlen($pass) > 72)
	fail('The supplied password is too long');
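
As a minimal sketch, the two checks above could be wrapped into a helper that returns an error message instead of calling fail(), which makes them easy to exercise in isolation. The function name check_credentials() is hypothetical and not part of the sample program. It uses the D pattern modifier so that $ cannot match just before a trailing linefeed.

```php
<?php
/* Hypothetical helper (not part of the sample program): returns an error
 * message, or '' if both inputs look sane. */
function check_credentials($user, $pass)
{
	/* D makes $ match only at the very end of the string */
	if (!preg_match('/^[a-zA-Z0-9_]{1,60}$/D', $user))
		return 'Invalid username';
	/* strlen() counts bytes, which is what matters for the 72 limit */
	if (strlen($pass) > 72)
		return 'The supplied password is too long';
	return '';
}
```

With this helper, the main script would call fail() only when the returned string is non-empty.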

The sample program with the improvements mentioned so far implemented is found under the demo2 subdirectory in the accompanying archive (tar.gz, ZIP).

How to authenticate existing users

Let's enhance our sample program with support for multiple operations - initially there will be two of them: creating a new user account and logging in to an existing account. We will be passing the operation code via a hidden form field. Let's add it into our existing form:

<input type="hidden" name="op" value="new">

And let's add a login form:

<form action="user-man.php" method="POST">
<input type="hidden" name="op" value="login">
Username:<br>
<input type="text" name="user" size="60"><br>
Password:<br>
<input type="password" name="pass" size="60"><br>
<input type="submit" value="Log in">
</form>

Now let's start to add support into the PHP code. First, validate the operation code:

$op = $_POST['op'];
if ($op !== 'new' && $op !== 'login')
	fail('Unknown request');

Then let's introduce an if statement and move our new user creation code under it:

if ($op === 'new') {
	$hash = $hasher->HashPassword($pass);
	if (strlen($hash) < 20)
		fail('Failed to hash new password');
	unset($hasher);

	($stmt = $db->prepare('insert into users (user, pass) values (?, ?)'))
		|| fail('MySQL prepare', $db->error);
	$stmt->bind_param('ss', $user, $hash)
		|| fail('MySQL bind_param', $db->error);
	if (!$stmt->execute()) {
		if ($db->errno === 1062 /* ER_DUP_ENTRY */)
			fail('This username is already taken');
		else
			fail('MySQL execute', $db->error);
	}

	$what = 'User created';
}

To perform user authentication, we'll need to obtain the user's password hash encoding string using a SELECT query for the supplied username, then use the CheckPassword() method from phpass to check the supplied password against the hash. Let's introduce an else branch with our "login" code in it:

} else {
	$hash = '*'; // In case the user is not found
	($stmt = $db->prepare('select pass from users where user=?'))
		|| fail('MySQL prepare', $db->error);
	$stmt->bind_param('s', $user)
		|| fail('MySQL bind_param', $db->error);
	$stmt->execute()
		|| fail('MySQL execute', $db->error);
	$stmt->bind_result($hash)
		|| fail('MySQL bind_result', $db->error);
	if (!$stmt->fetch() && $db->errno)
		fail('MySQL fetch', $db->error);

	if ($hasher->CheckPassword($pass, $hash)) {
		$what = 'Authentication succeeded';
	} else {
		$what = 'Authentication failed';
	}
	unset($hasher);
}

Finally, let's have our script print the authentication result:

echo "$what\n";

That's all - we can now "log in" as myuser and see the "Authentication succeeded" message. If we enter a wrong password for the user or if the target username does not exist, we see "Authentication failed".
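
The '*' placeholder works because it can never be a valid hash encoding string, so a password check against it always fails. Here is a minimal sketch of the same idea. It uses PHP's built-in password_hash() and password_verify() functions (PHP 5.5+) as a stand-in for phpass's HashPassword() and CheckPassword(). This swaps in a different API purely for illustration and is not what the sample program uses.

```php
<?php
/* Stand-in for $hasher->HashPassword(); the cost of 8 is an arbitrary choice */
$stored = password_hash('correct horse', PASSWORD_BCRYPT, array('cost' => 8));

/* Stand-in for $hasher->CheckPassword() */
$ok = password_verify('correct horse', $stored);    /* right password */
$wrong = password_verify('wrong guess', $stored);   /* wrong password */
$missing = password_verify('correct horse', '*');   /* user not found */
```

Only the first check succeeds. The "user not found" case fails in the same way as a wrong password does, so the script does not need a separate code path for it.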

Please find this revision of the HTML file and the sample program in the demo3 subdirectory in the accompanying archive (tar.gz, ZIP).

How to change user passwords

Let's introduce the proper HTML form:

<form action="user-man.php" method="POST">
<input type="hidden" name="op" value="change">
Username:<br>
<input type="text" name="user" size="60"><br>
Current password:<br>
<input type="password" name="pass" size="60"><br>
New password:<br>
<input type="password" name="newpass" size="60"><br>
<input type="submit" value="Change password">
</form>

and support for the additional operation code into the PHP script:

$op = $_POST['op'];
if ($op !== 'new' && $op !== 'login' && $op !== 'change')
	fail('Unknown request');

Now let's add to the user authentication branch of the existing if / else statement. When authentication fails, reset $op such that we don't take any other action:

	if ($hasher->CheckPassword($pass, $hash)) {
		$what = 'Authentication succeeded';
	} else {
		$what = 'Authentication failed';
		$op = 'fail'; // Definitely not 'change'
	}

Then add our new code:

	if ($op === 'change') {
		$stmt->close();

		$newpass = get_post_var('newpass');
		if (strlen($newpass) > 72)
			fail('The new password is too long');
		$hash = $hasher->HashPassword($newpass);
		if (strlen($hash) < 20)
			fail('Failed to hash new password');
		unset($hasher);

		($stmt = $db->prepare('update users set pass=? where user=?'))
			|| fail('MySQL prepare', $db->error);
		$stmt->bind_param('ss', $hash, $user)
			|| fail('MySQL bind_param', $db->error);
		$stmt->execute()
			|| fail('MySQL execute', $db->error);

		$what = 'Password changed';
	}

That's it - an existing user may now change their password.

This revision of the HTML file and the sample program may be found in the demo4 subdirectory in the accompanying archive (tar.gz, ZIP).

In a real PHP application, you will likely also have other ways to change a user's password - by an administrative user (then authentication of the user is bypassed) or with authentication by a temporary token (for forgotten passwords). These may be implemented in a similar fashion.

How to enforce a password policy

As far as I'm aware, there's currently no decent password/passphrase strength checking module intended specifically for PHP (either written in PHP or implemented as a PHP extension). The Crack extension in PECL, which is an interface to CrackLib, is not quite it, just like CrackLib itself is not good enough these days. There are many regexp-based recipes found on the web, but these disregard/disallow passphrases, have the policy hard-coded, and are mostly untested on real-world passwords.

So we will be invoking the pwqcheck(1) program from the passwdqc package. This program is specifically intended for use from scripts.

Let's introduce a new PHP include file, called pwqcheck.php, defining the following function:

function pwqcheck($newpass, $oldpass = '', $user = '', $aux = '', $args = '')
{
// pwqcheck(1) itself returns the same message on internal error
	$retval = 'Bad passphrase (check failed)';

	$descriptorspec = array(
		0 => array('pipe', 'r'),
		1 => array('pipe', 'w'));
// Leave stderr (fd 2) pointing to where it is, likely to error_log

// Replace characters that would violate the protocol
	$newpass = strtr($newpass, "\n", '.');
	$oldpass = strtr($oldpass, "\n", '.');
	$user = strtr($user, "\n:", '..');

// Trigger a "too short" rather than "is the same" message in this special case
	if (!$newpass && !$oldpass)
		$oldpass = '.';

	if ($args)
		$args = ' ' . $args;
	if (!$user)
		$args = ' -2' . $args; // passwdqc 1.2.0+

	$command = 'exec '; // No need to keep the shell process around on Unix
	$command .= 'pwqcheck' . $args;
	if (!($process = @proc_open($command, $descriptorspec, $pipes)))
		return $retval;

	$err = 0;
	fwrite($pipes[0], "$newpass\n$oldpass\n") || $err = 1;
	if ($user)
		fwrite($pipes[0], "$user::::$aux:/:\n") || $err = 1;
	fclose($pipes[0]) || $err = 1;
	($output = stream_get_contents($pipes[1])) || $err = 1;
	fclose($pipes[1]);

	$status = proc_close($process);

// There must be a linefeed character at the end.  Remove it.
	if (substr($output, -1) === "\n")
		$output = substr($output, 0, -1);
	else
		$err = 1;

	if ($err === 0 && ($status === 0 || $output !== 'OK'))
		$retval = $output;

	return $retval;
}

Please note that this passes any untrusted input via the file descriptor, not via the command-line (which would be unsafe). Please refer to the pwqcheck(1) manual page included in the passwdqc package for information on the command-line options and on the "protocol" used.
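
To illustrate the character replacement step: the function writes the passwords one per line and the user information as a colon-separated line, so a linefeed or colon embedded in untrusted input could otherwise shift the fields. A small sketch of what the strtr() calls do (the sample values are made up):

```php
<?php
/* A linefeed inside a password would otherwise end the line early */
$newpass = strtr("first line\nsecond line", "\n", '.');

/* In the username, both linefeeds and colons are replaced */
$user = strtr("my:user\n", "\n:", '..');
```

After these calls, $newpass is 'first line.second line' and $user is 'my.user.'. The replaced characters still count as "some character" for the purpose of the strength check.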

The function accepts the new password or passphrase, the strength of which is to be checked. It optionally also accepts the old password or passphrase, the username, and any auxiliary user-specific information such as the user's full name and/or e-mail address (multiple items may be separated with spaces). All of this information is treated as untrusted input, and it is used for more accurate checking of the strength of the new password or passphrase.

Finally, the function optionally accepts additional arguments to pass to pwqcheck(1) via the command-line. These may override the default password policy. Obviously, they must not be under an untrusted user's control.

The return value is the string 'OK' if the new password/passphrase passes the requirements. Otherwise the return value is a message explaining one of the reasons why the password/passphrase is rejected.

Let's make use of this in our program. Include the file and define some settings (that we'll use a bit later):

require 'pwqcheck.php';

// Do we have the pwqcheck(1) program from the passwdqc package?
$use_pwqcheck = TRUE;
// We can override the default password policy
$pwqcheck_args = '';
#$pwqcheck_args = 'config=/etc/passwdqc.conf';

Define a wrapper function specific to our program:

function my_pwqcheck($newpass, $oldpass = '', $user = '')
{
	global $use_pwqcheck, $pwqcheck_args;
	if ($use_pwqcheck)
		return pwqcheck($newpass, $oldpass, $user, '', $pwqcheck_args);

/* Some really trivial and obviously-insufficient password strength checks -
 * we ought to use the pwqcheck(1) program instead. */
	$check = '';
	if (strlen($newpass) < 7)
		$check = 'way too short';
	else if (stristr($oldpass, $newpass) ||
	    (strlen($oldpass) >= 4 && stristr($newpass, $oldpass)))
		$check = 'is based on the old one';
	else if (stristr($user, $newpass) ||
	    (strlen($user) >= 4 && stristr($newpass, $user)))
		$check = 'is based on the username';
	if ($check)
		return "Bad password ($check)";
	return 'OK';
}

Please note that this lets you experiment with very basic password strength checking (with a trivial hard-coded policy) even if you have not yet installed passwdqc.

Finally, introduce uses of the function into two places in the program - when creating a new user:

if ($op === 'new') {
	if (($check = my_pwqcheck($pass, '', $user)) !== 'OK')
		fail($check);

and when changing a user's password:

	if ($op === 'change') {
		$stmt->close();

		$newpass = get_post_var('newpass');
		if (strlen($newpass) > 72)
			fail('The new password is too long');
		if (($check = my_pwqcheck($newpass, $pass, $user)) !== 'OK')
			fail($check);

We're done. Now let's test it - strong passwords and passphrases should be accepted, whereas weak ones should be getting rejected with various messages.

The pwqcheck.php file (with a lengthy comment in it) and this revision of the sample program are found in the demo5 subdirectory in the accompanying archive (tar.gz, ZIP).

Future work

Someone should create a PHP extension around passwdqc, making the functions of libpasswdqc available for use from PHP scripts.

Timing attacks

Our sample program is vulnerable to probing for valid usernames via timing attacks: its response time differs for existing vs. non-existent users. (The ability for new users to self-register provides another way for someone to probe for valid usernames, though. For now, we'll assume that either self-registration is disabled (or restricted) or we care about probing for valid usernames via timing attacks anyway, such as because a large number of user registrations is immediately apparent whereas timing attacks might not be.)

We may try to mitigate this by always performing the password hashing step - even if the target username could not be found in the database. First, we need to define a dummy "salt" string (a portion of the hash encoding string) that we'll use for computing the dummy hashes:

/* Dummy salt to waste CPU time on when a non-existent username is requested.
 * This should use the same hash type and cost parameter as we're using for
 * real/new hashes.  The intent is to mitigate timing attacks (probing for
 * valid usernames).  This is optional - the line may be commented out if you
 * don't care about timing attacks enough to spend CPU time on mitigating them
 * or if you can't easily determine what salt string would be appropriate. */
$dummy_salt = '$2a$08$1234567890123456789012';

Then we introduce the following code right before our call to CheckPassword():

// Mitigate timing attacks (probing for valid usernames)
	if (isset($dummy_salt) && strlen($hash) < 20)
		$hash = $dummy_salt;

Alternatively, we could use the HashPassword() method, which would generate a new salt for us, but its processing cost is not exactly the same as that of CheckPassword(), so a (smaller) timing leak would remain.
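
To see why the dummy salt does the job, here is a sketch that assumes the bcrypt hashes are ultimately computed by PHP's crypt() function. This is an assumption about the setup, not something CheckPassword() guarantees on every system.

```php
<?php
$dummy_salt = '$2a$08$1234567890123456789012';

/* The full, purposely slow bcrypt computation is performed here... */
$computed = crypt('some guessed password', $dummy_salt);

/* ...yet a complete hash can never be equal to the shorter dummy salt,
 * so authentication for a non-existent user always fails. */
$match = ($computed === $dummy_salt);
```

The computed string is a complete 60-character bcrypt hash, which means roughly the same CPU time was spent as for an existing user with a hash of the same type and cost.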

A revision of the sample program with the above changes is included in the demo6 subdirectory in the accompanying archive (tar.gz, ZIP).

Unfortunately, this does not fully eliminate timing leaks - e.g., the MySQL server's response time may differ - but those leaks are likely smaller than leaks through properly-stretched (purposely slow) password hashing. Yet it might remain possible to probe for valid usernames with a large number of timings for each potential username.

Moreover, major timing leaks will remain if your database contains hashes of mixed types or with different password stretching iteration counts.

Timing leaks are surprisingly difficult to fully deal with. Naive attempts to deal with them such as by introducing random delays (even those in excess of "the signal") do not work as well as one might expect them to. "Constant time" would do the trick, but it is difficult to achieve, especially when we consider both real and CPU time, as well as other server resource consumption, which could also be indirectly measured by an attacker.

Other related concerns

There are many other security, usability, and implementation issues closely related to the way a web application manages its users and passwords. Discussing those issues in full detail and with sample code is beyond the scope of this article, yet it is important for you to be aware of them.

Randomly-generated passwords/passphrases

As an alternative to forcing the user to come up with a "strong-looking" password or passphrase, an application may generate and offer a random password or passphrase. (Preferably, the user should also be allowed to pick a suitable password or passphrase of their own, which is the case we've been considering so far.)

This is particularly important when new accounts are to be created by an administrator rather than by the users themselves. It would be unrealistic for the administrator to come up with sufficiently different passwords/passphrases for a large number of users, so letting a computer generate those passwords or passphrases is the only reasonable way to go.

One of the ways to implement this is by invoking the pwqgen(1) program from the passwdqc package. Unlike the pwqcheck(1) program discussed above, pwqgen does not require two-way communication via file descriptors, so you can invoke it with the simpler popen() and pclose() functions. Please be sure to check the exit status from the program. Currently, pwqgen only works on systems with /dev/urandom.
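
A minimal sketch of such an invocation follows. The function name my_pwqgen() and its $command parameter are hypothetical. The parameter exists only so that a different command line can be substituted, and it must never be under an untrusted user's control.

```php
<?php
/* Returns a generated passphrase, or FALSE on any error */
function my_pwqgen($command = 'pwqgen')
{
	if (!($fh = @popen($command, 'r')))
		return FALSE;
	$output = stream_get_contents($fh);
	/* pclose() gives us the termination status of the program */
	$status = pclose($fh);
	if ($status !== 0 || !is_string($output))
		return FALSE;
	/* There must be a linefeed character at the end.  Remove it. */
	if (substr($output, -1) !== "\n")
		return FALSE;
	$pass = substr($output, 0, -1);
	return ($pass === '') ? FALSE : $pass;
}
```

The caller should treat a FALSE return as a hard failure rather than fall back to a weaker generator.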

It is also possible to try to generate random passwords or passphrases in PHP code and without a dependency on /dev/urandom, but chances are that those passwords or passphrases won't in fact be as random as they might look - e.g., certain versions of Joomla running with certain versions of PHP are able to generate at most 1 million different initial passwords, which an attacker can test against a stolen or leaked password hash in a second.
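
To illustrate the problem with a deliberately weak, hypothetical generator (this is not Joomla's actual code): if the only input is a seed taken from a range of 1 million values, such as the microseconds part of the current time, then there can be at most 1 million distinct outputs, no matter how random each password looks.

```php
<?php
/* Deliberately weak example - do not use this to generate real passwords */
function weak_password($seed, $length = 8)
{
	$chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
	/* The seed fully determines everything that follows */
	mt_srand($seed);
	$pass = '';
	for ($i = 0; $i < $length; $i++)
		$pass .= $chars[mt_rand(0, strlen($chars) - 1)];
	return $pass;
}
```

The same seed always produces the same password, so an attacker only needs to try each possible seed once.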

Randomness

It is difficult to obtain a significant amount of cryptographically random data in pure and portable PHP code. As it has been mentioned above, phpass uses /dev/urandom with a fallback to its own pseudo-random byte stream generator. The latter is good enough for salting, but it might not be good enough for other purposes because of the limited amount of entropy that it uses as its input. Other than that, it uses a decent approach. Drupal 7 attempts to reuse a revision of the same code (derived from phpass) to generate all sorts of random tokens, not just salts, and the Drupal developers are trying to process more entropy by feeding certain PHP variables and results of certain PHP function calls into the algorithm. It is difficult to say just how much entropy is added in this way and whether it is sufficient for a given purpose or not. Additionally, a related concern is that if not enough entropy is being processed, then it might be possible to infer the inputs to the algorithm from the stream of "random" outputs by testing likely inputs in an offline attack. So a reasonable requirement for the inputs could be that they not only are hopefully sufficiently random, but are also not security-sensitive otherwise.

Other algorithms may have additional undesirable properties - such as leaking of the inputs or/and of the internal state in a more direct way, thereby allowing for further "random" outputs to be predicted. Furthermore, if the size or entropy of the internal state is too small, then the entropy of the resulting "random" outputs will also likely be smaller than that of the inputs, and additionally the internal state itself may be inferred in an offline attack given a few "random" outputs, which would facilitate prediction of further "random" outputs.

Overall, it is easier to get this wrong than to get it right, and you're unlikely to have any assurance of having it done right.

Thus, it is preferable to use a supply of randomness provided by the OS, such as /dev/urandom. Another option is to run and query a "randomness daemon", which would accumulate randomness over a long period of time, but this approach has been largely obsoleted by modern Unix-like systems having implemented a randomness pool in the kernel, which is what /dev/urandom is an interface to.

A PHP-specific issue with accessing an external supply of randomness, such as /dev/urandom, is PHP's lack of support for unbuffered reads. Even if your application only reads, say, 8 bytes, PHP will read an entire buffer worth of data (typically 8192 bytes). This slows things down (albeit not too badly - e.g., it might be taking around 1 millisecond to read 8192 bytes from /dev/urandom on a modern Linux box) and it wastes the precious entropy. A revision of phpass-derived code being considered for Drupal 7 attempts to partially address this by rounding up the number of bytes to read to a multiple of 4096 (considering that PHP would effectively do at least the same anyway) and by maintaining its own buffer for use in subsequent calls to the function. This helps when multiple random "numbers" need to be obtained while servicing a single request, but other than that it is only a partial workaround. It would be best to have the issue fixed in PHP itself (by introducing optional unbuffered reads).

Update (August 2010): PHP 5.3.3+ has in fact introduced support for unbuffered reads. We're not yet making use of it in phpass, though.
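
For reference, here is a sketch of how an unbuffered read might look with stream_set_read_buffer(), the function that PHP 5.3.3 added. The function name get_urandom_bytes() is hypothetical, and this is not how phpass itself reads /dev/urandom.

```php
<?php
/* Returns $count bytes from /dev/urandom, or FALSE on error */
function get_urandom_bytes($count)
{
	if (!($fh = @fopen('/dev/urandom', 'rb')))
		return FALSE;
	/* Ask PHP not to read a whole buffer's worth of data (PHP 5.3.3+) */
	if (function_exists('stream_set_read_buffer'))
		stream_set_read_buffer($fh, 0);
	$output = '';
	while (strlen($output) < $count) {
		$chunk = fread($fh, $count - strlen($output));
		if ($chunk === FALSE || $chunk === '') {
			$output = FALSE;
			break;
		}
		$output .= $chunk;
	}
	fclose($fh);
	return $output;
}
```

On older PHP versions the function_exists() check simply skips the call, and the read falls back to PHP's usual buffered behavior.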

Resetting forgotten passwords/passphrases

Typically, there's a way for a user to reset the password (or passphrase) by having the application send a message to the user's address previously registered in the system. The message may contain the new password (randomly generated) or it may contain a password reset token. The randomness concerns apply.

Here are some questions to consider before implementing this functionality:

Is the new password or passphrase passed in this way permanent or is it temporary (with a forced change upon login and/or an expiration date)?

If a token is passed, is it expiring? How is it to be passed back to the application? (If it is embedded in a URL, then it is likely to get logged by any web proxy servers involved and by the final web server. Manual entry or copy & paste into a form field might be a little bit safer.)

The token may be randomly generated and stored into the database. Alternatively, it may be computed as a MAC of the username, the user's address that the token is to be sent to (e-mail or the like), the current timestamp (with certain granularity - e.g., just the date, with no time of day), and maybe other system- and user-specific information, and with a secret key specific to each instance of the application. In the latter case, when a user returns to the system with a token to be validated, the correct token is simply regenerated using all the same data. If the two tokens match, the user is considered authenticated. If they don't, the check is repeated for the "previous" timestamp (just one time). Thus, these cryptographic tokens automatically expire in 1 to 2 "units of time", but there's no way to deactivate them before they expire (e.g., upon first use) unless an extra parameter (such as a password change count for the user) was included under the MAC. The secret key needs to contain enough entropy (say, 80 bits or more) to withstand offline attacks on it given a valid token, and it needs to be different for each instance of the application. Luckily, it does not need to be easy to memorize, so there's no need to use key stretching, but instead this lack of stretching needs to be compensated for by including more entropy into the key than you would include into a password.
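
The MAC-based variant can be sketched as follows, under several assumptions of mine: HMAC-SHA-256 via hash_hmac() as the MAC, one day as the "unit of time", a linefeed as the field separator, and a per-user password change count as the extra parameter. The function names are hypothetical. hash_equals() requires PHP 5.6 or newer.

```php
<?php
/* Compute the token for the day that contains $time */
function reset_token($key, $user, $email, $count, $time)
{
	$data = implode("\n",
	    array($user, $email, $count, gmdate('Y-m-d', $time)));
	return hash_hmac('sha256', $data, $key);
}

/* Accept a token issued during the current or the previous day */
function check_reset_token($token, $key, $user, $email, $count, $now)
{
	foreach (array($now, $now - 86400) as $time) {
		$good = reset_token($key, $user, $email, $count, $time);
		if (hash_equals($good, (string)$token))
			return TRUE;
	}
	return FALSE;
}
```

Because the password change count is included under the MAC, incrementing it upon a successful password change invalidates any tokens issued earlier.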

Online password guessing

With password hashing, we've been trying to mitigate offline password cracking given stolen or leaked password hashes. However, what about online attacks - where an attacker probes candidate passwords or passphrases by trying to authenticate to our web application with them? Obviously, those attacks would generally run a lot slower than offline ones, yet they might succeed in guessing some of the weakest passwords. For example, simply probing for the password "123456" might result in almost 1% of accounts getting compromised.

Luckily, by enforcing a password policy meant to prevent easy cracking of our password hashes, we also defeat online password guessing. Thus, if you went for this, then implementing any other measures to deal with online password guessing becomes relatively unimportant. In fact, if you're confident in your password policy, then you may use very relaxed account lockout settings or none at all in order to avoid causing problems for legitimate users.

On the other hand, if you neglected or decided not to implement password policy enforcement, if you have old passwords that pre-date the policy, or if you have potentially non-compliant passwords for any other reason (e.g., imported from another system), then it may be important to implement countermeasures against online password probing. A typical countermeasure is temporary or permanent account lockout if more than N consecutive authentication attempts fail in X time. This should be further improved to deal with attacks that don't target a specific username - e.g., if an attacker tries just the password "123456" against 1000 different usernames, no single user account will be locked out, yet the attacker might gain access to around 10 accounts (if all of the usernames exist). A way to mitigate this is to apply per-connecting-address limits on authentication attempts. Yet a distributed attack, such as using a botnet, would defeat that. So enforcing a password policy is a better way to go.
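
The "more than N failures in X time" rule might be sketched like this. The function name, the default limits, and the way failure timestamps are stored are all assumptions made for illustration. The same function could be applied per username or per connecting address.

```php
<?php
/* $failures: Unix timestamps of consecutive failed attempts for one username
 * (or one source address), assumed to be cleared on a successful login. */
function is_locked_out($failures, $now, $max_failures = 5, $window = 900)
{
	$recent = 0;
	foreach ($failures as $time)
		if ($time > $now - $window)
			$recent++;
	/* Locked out if more than N attempts failed within the window */
	return $recent > $max_failures;
}
```

Since old failures drop out of the window over time, this gives a temporary lockout. A permanent one would need an explicit flag in the user record instead.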

Denial of Service (DoS) attacks

Almost any network service is susceptible to certain kinds of resource consumption DoS attacks. As it relates to the topic of this article, online password probing attacks discussed above may also happen to be DoS attacks, typically unintentionally. Currently, this is not common with web applications, but it does happen with other network services.

Intentional DoS attacks may find and use requests that are even more costly for the server to process - e.g., those involving expensive database queries - or they may simply use an even higher intensity stream of non-targeted requests.

- "Doesn't password stretching make my web application more susceptible to DoS attacks?"
- With reasonable settings, no. The application developer should provide a default password stretching iteration count that will not make this a problem in practice, and the system administrator may tune this setting to better match a given system's capacity and expected use pattern. For example, if a web application is not able to handle more than 50 requests per second for other reasons anyway, then having password authentication take 10 ms will not provide a significantly more attractive way of attack. In fact, since different kinds of requests consume server resources differently, perhaps the login request, which does not involve much work besides password hashing on the CPU, won't be the heaviest. Quite often, disk I/O capacity (and not CPU time) is the scarce resource.

Another potential DoS attack relevant to the topic of this article is through registration of too many user accounts, if your web application is meant to permit users to self-register. This kind of attack is currently not common in practice - likely because other DoS attacks, which are of less relevance to this article, are more effective. To mitigate this attack, you could implement all sorts of limits - e.g., no more than N user registrations per source IP address per day, no more than M users with the same domain name portion of e-mail address - and require validation of the e-mail address by a "secure token" implemented using a MAC before the user is added to the database.

Password policy enforcement and usability concerns

It may be inconvenient for many users to have to submit a form only to get their desired password or passphrase rejected (for not being compliant with the policy), then have to come up with another password or passphrase and resubmit.

This may be partially dealt with by summarizing the gist of the policy on the web page with the form - e.g. for passwdqc's default policy as of this writing you can use:

"A valid password should be a mix of upper and lower case letters, digits, and other characters. You can use an 8 character long password with characters from at least 3 of these 4 classes, or a 7 character long password containing characters from all the classes. An upper case letter that begins the password and a digit that ends it do not count towards the number of character classes used.

A passphrase should be of at least 3 words, 11 to 40 characters long, and contain enough different characters."

You may also offer randomly generated passwords and/or passphrases. Just do not give any static examples of passwords or phrases that would pass the check.

Another improvement could be to have the web page check the strength of the password or passphrase as the user types it - such as by submitting it to the server every few seconds via Ajax or by duplicating the most critical password strength checks in JavaScript. (Indeed, this feature would only work if JavaScript is enabled in the user's web browser.) Neither approach is perfect: the former would cause extra server load and the latter would not duplicate passwdqc's actual checks exactly (they are a bit too complicated to fully duplicate them in JavaScript). Yet if you're so inclined, the C function to look at is is_simple() in passwdqc_check.c. A hybrid approach may work best - only bother the server with checking passwords or passphrases that pass the JavaScript checks.

Indeed, actual policy enforcement should be taking place on the server when the form is finally submitted anyway.

Challenge/response authentication

Although the standard way to protect web application passwords while in transit is with SSL (https URLs), it may also be possible to protect them to a very limited extent by implementing challenge/response authentication. On the web browser side, this would be implemented in JavaScript (with a fallback to sending the password in the clear when JavaScript is not available).

Unfortunately, implementing this involves trade-offs, and the implementations I've seen so far are not great - they require that plaintext-equivalents of passwords or at best relatively weak password hashes be stored on the server, which is unacceptable when the number of user accounts is large (because the cost of recovery from a security compromise of the server or, say, of a backup dump becomes prohibitive).

Although storage of plaintext-equivalents on the server can be avoided, certain other issues remain (e.g., it might not be possible to implement much password stretching in this way due to the slowness of JavaScript code).

So I do not currently recommend this approach (major advances in this area would need to occur first), yet I felt that it needed to be mentioned in here.

Sessions

Once a user logs in, a session needs to be created - such as by using PHP's session handling capabilities or otherwise. There are plenty of potential issues related to session management, which could be the subject of a separate article. Since this is not very closely related to user and password management, since this is such a complicated topic, and since this article is too long as it is, I am leaving this topic completely beyond the scope of this article.

Licensing

Permission is hereby granted to reproduce and redistribute the article and its accompanying archive in their original form (unmodified and electronic only).

Non-exclusive rights are hereby granted to SektionEins GmbH to reproduce, distribute, and advertise the article including but not limited to on the http://php-security.org website, in printed and/or electronic advertisements, and in all other media.

Others interested in reproducing and/or redistributing the article other than in its original form and/or other than electronically should contact the copyright holder for an express permission.

No copyright to the source code snippets found in this article and to the sample programs included in the accompanying archive is claimed, and they're hereby placed in the public domain. Please feel free to reuse them in your programs.

In case this attempt to disclaim copyright and place the source code snippets and the sample programs in the public domain is deemed null and void, then the snippets and the programs are Copyright (c) 2010 Alexander Peslyak and they're hereby released to the general public under the following terms:

Redistribution and use in source and binary forms, with or without modification, are permitted.
(This is a heavily cut-down "BSD license", to the point of being copyright only.)
