Home — Daniel Karp's Blog — Daniel writes about programming, mostly.

Using Edge Side Includes to improve HTTP Caching for your API

At BoardGameGeek, we've been working on a new, mostly RESTish private API for our new Angular front end. In order to improve performance of the front end, as well as to take pressure off of our application servers, we added Varnish caching to our API via Fastly. there is a good summary of Varnish at the link provided, but in short, it is a cache that sits between our server and the API clients (for us, users' web browsers), and caches GET requests based the HTTP response headers provided by our servers.

As with most caching solutions, the caching is the easy part; the hard part is cache invalidation, and making sure that we don't return stale data. I'd like to describe how we've made use of Edge Side Includes (ESI) to ensure cache consistency at our endpoint while minimizing complexity for the API consumer.

Motivation

Let's take a look at a simplified version of our comments API. A GET request to the endpoint /api/comments/[id] returns a single comment by id. It might look something like this:

{
    "type": "comments",
    "id": "8243204",
    "body": "You should submit it to the photo contest",
    "source": {
        "type": "images",
        "id": "4317736"
    },
    "postdate": "2018-09-24T06:17:32+00:00",
    "author": 101597,
    "thumbs": 154,
    "links": [
        {
            "rel": "self",
            "uri": "/api/comments/8243204"
        }
    ]
}

Almost everything returned here is set at the time of creation, and is unlikely to change. We can cache this endpoint by setting the appropriate headers at the origin server, and purging the url (or invalidating with a surrogate key) if the user edits the comment.

However, there is a problem: one value, thumbs, can change at any time, and is controlled by a totally separate system. We could have the thumbs system purge the comment cache, but that would mean constantly purging a comment that receives many thumbs, just to change one number; moreover, I'd rather the thumbs system, which is responsibly for thumbs on all sorts of content, not have to be that smart.

More Motivation: Collections

Now let's look at a second, related problem: a GET request to the endpoint /api/images/4317736/comments returns the first page of comments posted on image 4317736. It might look something like this:

{
    "comments": [
        {
            "type": "comments",
            "id": "8243204",
            "body": "You should submit it to the photo contest",
            "source": {
                "type": "images",
                "id": "4317736"
            },
            "postdate": "2018-09-24T06:17:32+00:00",
            "author": 101597,
            "thumbs": 154,
            "links": [
                {
                    "rel": "self",
                    "uri": "/api/comments/8243204"
                },
            ]
        },
        // 50 More comments
        {},
        {},
        ....
    ],
    "links": []
}

We'd like to cache this collection in Varnish, but each comment in the collection can be edited by a user. Moreover, the API has some flexibility in ordering and pagination--for example, you can ask for comments after (or before) a particular comment (by id). A user adding or editing one comment would require us to purge many API endpoints, each of which would require quite a bit of data from our servers.¹

Solve by loading separate endpoints

In both of these cases, we have mixed content in one endpoint that becomes stale at diffent times. An obvious solution is to say that if the content needs to be purged at diffent times, it should be loaded by different endpoints. This is a perfectly good solution; for the single comment endpoint, we could add a link to the thumbs count in the links section, so that the client can look up this one value. Similarly, the collection of comments, rather than returning the full comments, could return links to the API endpoint for each comment. We could load all of these links over HTTP/2, and it would be fast. In fact, in many cases, we do exactly that. But it can create a lot work for the consumer, and multiplies our HTTP requests (which we pay for by request).

Edge Side Include

Our solution is to use an Edge Side Includes in both of these cases.² It works like this: The comment returned from our server to Varnish acually looks like this:

{
    "type": "comments",
    "id": "8243204",
    "body": "You should submit it to the photo contest",
    "source": {
        "type": "images",
        "id": "4317736"
    },
    "postdate": "2018-09-24T06:17:32+00:00",
    "author": 101597,
    "thumbs": <esi:include src="/api/thumbs?type=comments&id=8243204"/>,
    "links": [
        {
            "rel": "self",
            "uri": "/api/comments/8243204"
        }
    ]
}

Notice the value for thumbs; if you are not familiar with Edge Side Includes, this may look weird. It is not valid JSON, for one thing. But this is not a problem, because, properly configured (Varnish configuration is beyond the scope of this post, but it isn't too hard), Varnish understands ESI tags, and resolves them. It does so by looking up the url in its cache, and if it is present, inserting the content, and otherwise requesting the url from our servers, just as if any other request had come in. In this case, /api/thumbs?type=comments&id=8243204 will return 154 (which is, incidentally, valid JSON, as are values like true, false, and null), and Varnish will insert it into the result and return it to the user.

Varnish stores in cache for each endpoint what is returned from our server. That means that when a user requests a comment, it pulls the main response from comment from cache (or from our server), then separately pulls the thumbs count from cache (or from our server) and combines them. On the back end, the comments system can worry about purging the comments cache when necessary, and the thumbs system can worry about the purging the thumbs cache. The API consumer doesn't have to worry about any of this, and always gets consistent data.

Here is how the data is returned from our server for the collection of comments:

{
    "comments":[
        <esi:include src="/api/comments/8243204"/>,
        <esi:include src="/api/comments/8242210"/>,
        <esi:include src="/api/comments/8238346"/>
    ],
    "links":[],
}

Again, this is not valid JSON until Varnish resolves the ESIs. It looks up each comment, resolves the additional Edge Side includes in each comment (as above), and finally returns the full list of comments. For each request, Varnish again builds up the ESIs from cache, sometimes over multiple levels. This isn't necessarily instantaneous, especially if some of the caches have are stale, but it is pretty fast.³

Benefits

This may all seem a bit complicated, but look at what we gain:

First, we get almost automatic cache consistency. When a single comment is updated, we only have to purge the endpoint for that single comment. When the thumbs system causes the value of thumbs to change for one comment, it purges the endpoint for its value. All of the other comments and other values can stay in cache undisturbed. When a new comment comes in, all we have to purge is the endpoint for the list of comments (in the form of ESIs)--this is much better purging endpoints that return the entire content of each comment.

Second, since all of these ESIs are resolved in Varnish, to the extend that the endpoints are all in cache, one endpoint with many ESIs counts as a single request, for billing purposes.

Caveats

There are trade offs with this solution. It requires your API endpoints for single items to return fairly simple results-- our comments API, for a single comment, returns that comment in the root of the JSON response, and not much else. If we uses some more complex API specification like JSON API, this wouldn't work nearly so cleanly, especially for collections.

Generating those API responses with the ESI tags embedded requires jumping through some hoops--because what we are generating is not valid JSON, standard libraries don't work. Our solution involves using placeholder strings, followed by JSON encoding, followed by some string replacements.

In addition, our API responses are nonsensical without Varnish (or something else to resolve the ESIs) as part of the stack. That means more complexity for testing, either automated or manual. Varnish has to be a part of each developers stack for testing in the browser.

There is a performance penalty for adding ESIs (vs looking up a single cached value)--although it is very fast, we wouldn't want to add ESIs for values that aren't actually needed most of the time when an API endpoint is loaded.

There can be issues when an item is deleted; ideally, you'd purge all caches that might have ESIs for the deleted item, but that is not always practical. The solution we've come up with to deal with this issue is to have our 404 API responses return the text null (which is also valid JSON) so that any endpoint that includes an ESI to an invalid item will still return valid JSON. Then our consumers have to understand how to deal with null values in a collection.

Evaluation

Despite these issues, we've found using ESIs in our API to be well worth the trouble. Some of the biggest benefits are the cache consistency with simple purging code, and the relative simplicity of use for the consumer.

I recommend ESIs for three (related) cases:

When the API endpoint returns a collection, use an ESI for each item in the collection, so each item can be invalidated separately.
When a value that is controlled by a separate system is added to an API endpoint (such as thumbs), use an ESI to include that value, so that each system can control its own cache invalidation.
When an API endpoint has a mix of stable and dynamic data (even if it all orginates from the same system), include the dynamic data by ESI (or vice versa). This allows the dynamic data to stay current, while preventing requests for the more stable data from hitting the servers.

I encourage other API authors to give this approach a try. I know this post is vague related to the details--I'd be happy to answer questions I can about implementation.

We do most of our cache purging manually, and don't expire them based on time, unless absolutely necessary. Because we purge with Surrogate keys, this isn't quite as hard as it might seem, but it would still be much more difficult without ESIs. ↩︎
OK, I'm about to lie. We actually do use a link in the links section to get the number of thumbs. But this was an easier to understand example than what we actually use ESIs for in the comments system. ↩︎
It could be even faster. We use Fastly to provide our Varnish caching services. Unfortunately, the version of Varnish that Fastly uses resolves Edge Side Includes in series. This can be a problem, especially if there are several stale caches. But the newest version of Varnish Plus resolves Edge Side Includes in parallel! On my wish list of things I'd like to do is to try out Varnish Plus. (Are you listening, Fastly? Parallel ESI resolution is a killer feature!) ↩︎

Introducing GeekCache

I'm happy to announce the beta release of my first open source project, GeekCache. GeekCache is a PHP library for key/value storage (currently implemented for Memcached). Why release a new caching library, when excellent libraries such as Stash and doctrine/cache are already out there? Because I needed features that those libraries did not offer. GeekCache is a complete rewrite of a caching library we've used internally at BoardGameGeek for years. We are dependent on its features, some of which are entirely unavailable in other libraries (as far as I know), and certainly not all in the same library.

In addition to the basic features you are likely to find in almost any caching library, here are the key features that distinguish GeekCache:

Invalidation via tags
Memoization
Regeneration via pass-through regenerator callables
Soft invalidation, for returning stale data when regenerating through queued processes.
a fluent interface for cache item creation

Below are a few quick examples. For more details, including how to use the included service provider to get a the cache builder and clearer, see the GeekCache GitHub page.

Basics

I most often use GeekCache to cache the results of MySQL queries. On a busy site such as BoardGameGeek, it would be very slow to pull an item from the database every time we needed it. Instead, we almost always store the results of a query in cache, so that the query itself is run only once.

But storing in cache is the easy part. The hard part is making sure that the cache is cleared when it is no longer valid. This is where tags come in.

Memoization

When you add memoize() to a chain of build methods, cache values will be stored in a local array cache for the duration of a php process. If a cache item is retrieved a second time over the course of the page load, the value from the local cache will be returned. This can substantially improve performance if some values are looked up multiple times on a page load.

Regenerators

Regenerators are quite powerful. Not only do they make caching code cleaner, by allowing the process that regenerates the cache to be handled by GeekCache (via a closure or any other callable), they can also allow you to return stale data to users (rather than blanks) when the process to regenerate the data is too slow to be run on page generation.


$cacheitem = $cacheBuilder
    ->memoize()
    ->addTags('users', "user_1")
    ->make("tag_user_1", 3600);

$regenerator = function () use ($this, $userid) {
    return $this->getUser($userid);
}

$value = $cacheitem->get($regenerator);
// $value is the result of getUser

$value = $cacheitem->get();
// $value is the correct result, because the regenerated value has been put into cache

$cacheClearer->clearTags("user_1");

$value = $cacheitem->get();
// $value is now false, because one of the tags has been cleared 

$queuedRegenerator = function() use ($this, $userid) {
    // queue up a process to regenerate the results in another process
    // returning false indicates that a process has been queued, so any stale data
    // that might be available is returned
    return false;
}

$value = $cacheitem->get($queuedRegenerator);
// $value is the original cached user
// Tag are cleared via soft invalidation, so that stale data can be
// returned, if and only if a process is spun off to regenerate the cache

See the documentation for how to add a grace period, so that stale data can also be returned from caches that have expired due to time.

Other Stuff

This is just some of what GeekCache has to offer. In addition to other features, such as a counter (including atomic incrementing), GeekCache also offers fully tested, extensible code. Except where absolutely necessary, all classes are immutable--no need to worry about whether reusing a builder will mean bleed over from a previous use.

I'd love to get some constructive feedback on places people think GeekCache could be further improved, especially (but by no means exclusively) from people who might want to use it in their own projects. In particular, I'm still not fully satisfied with some of the naming choices I've made. Take a look under the hood, and tell me what you think.

I made the GeekCache code and the interfaces as clean and expressive as I was able, and while I still see room for improvement in places, on the whole, I'm pleased with the way it's turning out. I would be quite happy to hear if anyone else found a use for it in their own projects!