Introducing GeekCache

I'm happy to announce the beta release of my first open source project, GeekCache. GeekCache is a PHP library for key/value storage (currently implemented for Memcached). Why release a new caching library, when excellent libraries such as Stash and doctrine/cache are already out there? Because I needed features that those libraries did not offer. GeekCache is a complete rewrite of a caching library we've used internally at BoardGameGeek for years. We are dependent on its features, some of which are entirely unavailable in other libraries (as far as I know), and certainly not all in the same library.

In addition to the basic features you are likely to find in almost any caching library, here are the key features that distinguish GeekCache:

  • Invalidation via tags
  • Memoization
  • Regeneration via pass-through regenerator callables
  • Soft invalidation, for returning stale data when regenerating through queued processes
  • A fluent interface for cache item creation

Below are a few quick examples. For more details, including how to use the included service provider to get the cache builder and clearer, see the GeekCache GitHub page.

Basics

I most often use GeekCache to cache the results of MySQL queries. On a busy site such as BoardGameGeek, it would be very slow to pull an item from the database every time we needed it. Instead, we almost always store the results of a query in cache, so that the query itself is run only once.

But storing in cache is the easy part. The hard part is making sure that the cache is cleared when it is no longer valid. This is where tags come in.

Tags

We can add any assortment of tags when storing a value in cache. When a tag is cleared, all items with that tag are cleared as well. Say we are retrieving user 1 from the database.

$cacheitem = $cacheBuilder
    ->memoize()
    ->addTags('users', "user_1")
    ->make("tag_user_1", 3600);

$user = $cacheitem->get();
// $user is false if it has not been stored in cache

if ($user === false) {
    $user = $this->getUser(1);
    $cacheitem->put($user);
}

//later

$result = $cacheitem->get();
// result equal to $user

// now, the user object is changed. The cache must be cleared.
$cacheClearer->clearTags("user_1");

$result = $cacheitem->get();
// result is now false, because the cache has been cleared
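
How GeekCache implements tags internally is not covered here, but one common way to get this behavior from a key/value store is tag versioning: each item remembers the version of every tag it was stored with, and clearing a tag just bumps that version. The sketch below is only an illustration of that idea. The class and method names are hypothetical, and a plain PHP array stands in for Memcached.

```php
<?php
// Hypothetical sketch of tag-based invalidation via tag versions.
// An in-memory array stands in for Memcached; this is not GeekCache's code.
final class TaggedArrayCache
{
    private $items = [];        // key => ['value' => mixed, 'tags' => [tag => version]]
    private $tagVersions = [];  // tag => current version

    public function put($key, $value, array $tags)
    {
        $snapshot = [];
        foreach ($tags as $tag) {
            // Record the version each tag had when the value was stored.
            $snapshot[$tag] = $this->tagVersion($tag);
        }
        $this->items[$key] = ['value' => $value, 'tags' => $snapshot];
    }

    public function get($key)
    {
        if (!isset($this->items[$key])) {
            return false;
        }
        foreach ($this->items[$key]['tags'] as $tag => $version) {
            // A bumped tag version means the item was invalidated.
            if ($this->tagVersion($tag) !== $version) {
                return false;
            }
        }
        return $this->items[$key]['value'];
    }

    public function clearTag($tag)
    {
        // Clearing a tag is a single write, however many items carry it.
        $this->tagVersions[$tag] = $this->tagVersion($tag) + 1;
    }

    private function tagVersion($tag)
    {
        return isset($this->tagVersions[$tag]) ? $this->tagVersions[$tag] : 0;
    }
}

$cache = new TaggedArrayCache();
$cache->put('tag_user_1', ['name' => 'Alice'], ['users', 'user_1']);
$cache->put('user_list', ['Alice', 'Bob'], ['users']);

$before = $cache->get('tag_user_1'); // the stored array
$cache->clearTag('user_1');
$after = $cache->get('tag_user_1');  // false: one of its tags was cleared
$list  = $cache->get('user_list');   // untouched: it only carries 'users'
```

The appeal of this design is that clearing a tag never has to enumerate the items that carry it.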

Memoization

When you add memoize() to a chain of build methods, cache values will be stored in a local array cache for the duration of a PHP process. If a cache item is retrieved a second time over the course of the page load, the value from the local cache will be returned. This can substantially improve performance if some values are looked up multiple times on a page load.
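
To make the idea concrete, here is a minimal sketch of memoization in front of a slower cache backend. It is not GeekCache's implementation: the MemoizingCache class is hypothetical, and a closure stands in for the round trip to Memcached.

```php
<?php
// Hypothetical sketch: a per-process array cache in front of a slower backend.
final class MemoizingCache
{
    private $backend;      // callable taking a key; stands in for Memcached
    private $local = [];   // values already seen during this PHP process

    public function __construct(callable $backend)
    {
        $this->backend = $backend;
    }

    public function get($key)
    {
        // array_key_exists (not isset) so that a memoized null still counts as a hit.
        if (!array_key_exists($key, $this->local)) {
            $this->local[$key] = call_user_func($this->backend, $key);
        }
        return $this->local[$key];
    }
}

$backendCalls = 0;
$cache = new MemoizingCache(function ($key) use (&$backendCalls) {
    $backendCalls++;              // count round trips to the "remote" cache
    return strtoupper($key);
});

$first  = $cache->get('user_1');  // fetched from the backend
$second = $cache->get('user_1');  // served from the local array
```

A real implementation also has to update or drop the local copy whenever the item is stored again or invalidated, which this sketch leaves out.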

Regenerators

Regenerators are quite powerful. They make caching code cleaner by letting GeekCache handle the process that regenerates the cache (via a closure or any other callable). They also let you return stale data to users (rather than blanks) when regenerating the data is too slow to do during page generation.


$cacheitem = $cacheBuilder
    ->memoize()
    ->addTags('users', "user_1")
    ->make("tag_user_1", 3600);

// $this is bound automatically inside closures defined in a class method
// (PHP 5.4+); it cannot be listed in use ()
$regenerator = function () use ($userid) {
    return $this->getUser($userid);
};

$value = $cacheitem->get($regenerator);
// $value is the result of getUser

$value = $cacheitem->get();
// $value is the correct result, because the regenerated value has been put into cache

$cacheClearer->clearTags("user_1");

$value = $cacheitem->get();
// $value is now false, because one of the tags has been cleared

$queuedRegenerator = function () use ($userid) {
    // queue up a process to regenerate the results in another process
    // returning false indicates that a process has been queued, so any stale data
    // that might be available is returned
    return false;
};

$value = $cacheitem->get($queuedRegenerator);
// $value is the original cached user

// Tags are cleared via soft invalidation, so that stale data can be
// returned, if and only if a process is spun off to regenerate the cache

See the documentation for how to add a grace period, so that stale data can also be returned from caches that have expired due to time.
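
For readers who want to see the mechanics of soft invalidation spelled out, here is a rough sketch of the behavior described above. It is an illustration under my own assumptions, not GeekCache's code: the SoftCacheItem class is hypothetical, and object properties stand in for the cache backend.

```php
<?php
// Hypothetical sketch of soft invalidation with a pass-through regenerator.
// Object properties stand in for the cache backend; this is not GeekCache's code.
final class SoftCacheItem
{
    private $value = false;
    private $stale = true;   // nothing has been stored yet

    public function put($value)
    {
        $this->value = $value;
        $this->stale = false;
    }

    public function invalidate()
    {
        // Soft invalidation: keep the old value, only mark it stale.
        $this->stale = true;
    }

    public function get($regenerator = null)
    {
        if (!$this->stale) {
            return $this->value;
        }
        if ($regenerator === null) {
            return false;    // stale, and no way to regenerate
        }
        $fresh = call_user_func($regenerator);
        if ($fresh === false) {
            // The regenerator queued the work elsewhere: serve the stale value.
            return $this->value;
        }
        $this->put($fresh);
        return $fresh;
    }
}

$item = new SoftCacheItem();
$a = $item->get(function () { return 'user v1'; }); // regenerated and stored
$item->invalidate();
$b = $item->get();                                  // false: stale, no regenerator
$c = $item->get(function () { return false; });     // 'user v1': stale data served
$d = $item->get(function () { return 'user v2'; }); // regenerated and stored again
```

The key point is that invalidation never deletes the value, so a regenerator that returns false still has something to fall back on.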

Other Stuff

This is just some of what GeekCache has to offer. In addition to other features, such as a counter (including atomic incrementing), GeekCache also offers fully tested, extensible code. Except where absolutely necessary, all classes are immutable--no need to worry about whether reusing a builder will mean bleed over from a previous use.

I'd love to get some constructive feedback on places people think GeekCache could be further improved, especially (but by no means exclusively) from people who might want to use it in their own projects. In particular, I'm still not fully satisfied with some of the naming choices I've made. Take a look under the hood, and tell me what you think.

I made the GeekCache code and the interfaces as clean and expressive as I was able, and while I still see room for improvement in places, on the whole, I'm pleased with the way it's turning out. I would be quite happy to hear if anyone else found a use for it in their own projects!

Rehashing Password Hashes

Every time a prominent website has their user database stolen, the question of how the passwords were stored comes up. Even properly hashed password tables are vulnerable to some attacks, but the better the hashing algorithm, the less the risk to the users.

Tools have come a long way. In the PHP world, we now have the password_hash() function built into PHP 5.5. However, many of us work on websites which have a table of old passwords hashed by older, less secure tools or by some home-brewed hashing system. When I read about methods of upgrading old systems, the recommendation I sometimes see is that the best you can do is validate each password against the old hash when the user logs in, then silently rehash it with the new, improved algorithm. Unfortunately, this leaves infrequent users, or users who created accounts and never logged in again, more vulnerable, should your database be compromised. But you can't create the correct new hash without the original password, right?

This is absolutely correct; luckily, there is an alternative: you can rehash the old hashes. Here is how it works:

Say your legacy system works roughly as follows:

public function savePassword(User $user, $password)
{
   // store a fresh salt and the salted hash with the user
   $salt = $this->getRandomSalt();
   $hash = $this->hashPassword($password, $salt);
   $user->setPasswordHash($hash);
   $user->setPasswordSalt($salt);
}

public function checkUserPassword(User $user, $password)
{
   return $this->checkPassword($password, $user->getSalt(), $user->getHash());
}

protected function checkPassword($password, $salt, $hash)
{
   return $hash == $this->hashPassword($password, $salt);
}

When you want to upgrade to password_hash, run something resembling the following on each user.

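public function convertUserPassword(User $user)
{
   $user->setPasswordVersion('legacy');
   // password_hash() requires an algorithm argument. This assumes the legacy
   // hash is a printable string (e.g. hex) of at most 72 bytes, since bcrypt
   // ignores anything longer and cannot handle NUL bytes.
   $user->setNewPasswordHash(password_hash($user->getHash(), PASSWORD_DEFAULT));
}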

Once the passwords are converted, you can change your password checking code:

public function checkUserPassword(User $user, $password)
{
   if ($user->getPasswordVersion() == 'legacy')
   {
      // using our old hashPassword function and our old salt
      $oldStyleHash = $this->hashPassword($password, $user->getSalt());
      $valid = password_verify($oldStyleHash, $user->getNewPasswordHash());
      // if $valid is true, now might be a good time to hash the actual
      // password, and upgrade the user's password version.
      return $valid;
   }
   // else, use password_verify() as normal
   return password_verify($password, $user->getNewPasswordHash());
}

Notice what we've done here: We don't have the user's password, but we do have a hash that we know can be generated from the password and the proper salt. We've treated that hash as the password for the new, improved hashing system. We know that when the user does type in their password, we will be able to regenerate that hash because we still have the salt.[1]

Now for the important part: you can delete all of the old password hashes (but not the salts) from the database and any backups. All users are now protected by the new password hashing algorithm, even if they never log in again.

Note that in practice, some legacy systems will have saved the salt and the hash as part of the same string, but these should be separable. The important thing is to keep the salt but discard the old hash.
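
As a small illustration of separating the two, suppose the legacy column holds the salt and a hex hash joined by a colon. That format, the salted SHA-1 used to build the sample value, and the splitLegacyPassword() helper are all assumptions made for this sketch; your legacy system will have its own layout.

```php
<?php
// Hypothetical legacy format: "<salt>:<hex hash>" stored in a single column.
function splitLegacyPassword($stored)
{
    // Limit of 2 so that a ':' inside the hash part would not split it further.
    $parts = explode(':', $stored, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException('Unrecognized legacy password format');
    }
    return ['salt' => $parts[0], 'hash' => $parts[1]];
}

// Sample stored value, built with salted SHA-1 purely for illustration.
$legacy = splitLegacyPassword('a1b2c3:' . sha1('a1b2c3' . 'secret'));
// Keep $legacy['salt']; rehash $legacy['hash'] with password_hash(), then discard it.
```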

Once the original password hashes are deleted, ALL of your users should benefit from the improved hashing algorithm, not just those who log in again.
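
Here is the whole flow in one self-contained script, as a sketch rather than production code. The salted SHA-1 in legacyHash() is just a stand-in for whatever your old system does, and the array-based user record and function names are hypothetical.

```php
<?php
// Stand-in for a legacy scheme (salted SHA-1, an assumption for illustration).
function legacyHash($password, $salt)
{
    return sha1($salt . $password);   // 40 hex characters, safe input for bcrypt
}

// 1. What the old system stored for a user.
$salt    = 'a1b2c3d4';
$oldHash = legacyHash('correct horse', $salt);

// 2. Conversion: treat the legacy hash as the password for password_hash().
$user = [
    'salt'    => $salt,
    'version' => 'legacy',
    'hash'    => password_hash($oldHash, PASSWORD_DEFAULT),
];
unset($oldHash);                      // the old hash is no longer kept anywhere

// 3. Login check against the converted record.
function checkPassword(array &$user, $password)
{
    $candidate = $user['version'] === 'legacy'
        ? legacyHash($password, $user['salt'])
        : $password;
    if (!password_verify($candidate, $user['hash'])) {
        return false;
    }
    if ($user['version'] === 'legacy') {
        // Optional upgrade: we now hold the plain-text password, so hash it directly.
        $user['hash']    = password_hash($password, PASSWORD_DEFAULT);
        $user['version'] = 'current';
    }
    return true;
}

$wrong = checkPassword($user, 'wrong guess');    // false, record unchanged
$first = checkPassword($user, 'correct horse');  // true, record upgraded
$again = checkPassword($user, 'correct horse');  // true, verified directly
```

Note that step 2 never needs the plain-text password, which is why it can be run over the whole user table at once.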

Last minute thought: This functionality could be built into the PHP password hashing API directly for hashes originally created by the API. A password_rehash() function would take a password hash created with a now-deprecated algorithm, rehash the hash as described in this post via the new algorithm, and store both the old and new salt (and algorithm codes) in the new password hash string, such that password_verify() could verify it.


  1. The first time a user logs in under the new system, you can hash the plain text password with password_hash(), and upgrade the password version accordingly. That way, if you ever drop support for the old password system entirely, anyone who's logged in since then will have their password saved in the new format. However, it is not necessary to wait for them to log in to delete the original hash.