Using Edge Side Includes to improve HTTP Caching for your API 26 September 2018

At BoardGameGeek, we've been working on a new, mostly RESTish private API for our new Angular front end. In order to improve performance of the front end, as well as to take pressure off of our application servers, we added Varnish caching to our API via Fastly. there is a good summary of Varnish at the link provided, but in short, it is a cache that sits between our server and the API clients (for us, users' web browsers), and caches GET requests based the HTTP response headers provided by our servers.

As with most caching solutions, the caching is the easy part; the hard part is cache invalidation, and making sure that we don't return stale data. I'd like to describe how we've made use of Edge Side Includes (ESI) to ensure cache consistency at our endpoint while minimizing complexity for the API consumer.

Motivation

Let's take a look at a simplified version of our comments API. A GET request to the endpoint /api/comments/[id] returns a single comment by id. It might look something like this:

{
    "type": "comments",
    "id": "8243204",
    "body": "You should submit it to the photo contest",
    "source": {
        "type": "images",
        "id": "4317736"
    },
    "postdate": "2018-09-24T06:17:32+00:00",
    "author": 101597,
    "thumbs": 154,
    "links": [
        {
            "rel": "self",
            "uri": "/api/comments/8243204"
        }
    ]
}

Almost everything returned here is set at the time of creation, and is unlikely to change. We can cache this endpoint by setting the appropriate headers at the origin server, and purging the url (or invalidating with a surrogate key) if the user edits the comment.

However, there is a problem: one value, thumbs, can change at any time, and is controlled by a totally separate system. We could have the thumbs system purge the comment cache, but that would mean constantly purging a comment that receives many thumbs, just to change one number; moreover, I'd rather the thumbs system, which is responsibly for thumbs on all sorts of content, not have to be that smart.

More Motivation: Collections

Now let's look at a second, related problem: a GET request to the endpoint /api/images/4317736/comments returns the first page of comments posted on image 4317736. It might look something like this:

{
    "comments": [
        {
            "type": "comments",
            "id": "8243204",
            "body": "You should submit it to the photo contest",
            "source": {
                "type": "images",
                "id": "4317736"
            },
            "postdate": "2018-09-24T06:17:32+00:00",
            "author": 101597,
            "thumbs": 154,
            "links": [
                {
                    "rel": "self",
                    "uri": "/api/comments/8243204"
                },
            ]
        },
        // 50 More comments
        {},
        {},
        ....
    ],
    "links": []
}

We'd like to cache this collection in Varnish, but each comment in the collection can be edited by a user. Moreover, the API has some flexibility in ordering and pagination--for example, you can ask for comments after (or before) a particular comment (by id). A user adding or editing one comment would require us to purge many API endpoints, each of which would require quite a bit of data from our servers.1

Solve by loading separate endpoints

In both of these cases, we have mixed content in one endpoint that becomes stale at diffent times. An obvious solution is to say that if the content needs to be purged at diffent times, it should be loaded by different endpoints. This is a perfectly good solution; for the single comment endpoint, we could add a link to the thumbs count in the links section, so that the client can look up this one value. Similarly, the collection of comments, rather than returning the full comments, could return links to the API endpoint for each comment. We could load all of these links over HTTP/2, and it would be fast. In fact, in many cases, we do exactly that. But it can create a lot work for the consumer, and multiplies our HTTP requests (which we pay for by request).

Edge Side Include

Our solution is to use an Edge Side Includes in both of these cases.2 It works like this: The comment returned from our server to Varnish acually looks like this:

{
    "type": "comments",
    "id": "8243204",
    "body": "You should submit it to the photo contest",
    "source": {
        "type": "images",
        "id": "4317736"
    },
    "postdate": "2018-09-24T06:17:32+00:00",
    "author": 101597,
    "thumbs": <esi:include src="/api/thumbs?type=comments&id=8243204"/>,
    "links": [
        {
            "rel": "self",
            "uri": "/api/comments/8243204"
        }
    ]
}

Notice the value for thumbs; if you are not familiar with Edge Side Includes, this may look weird. It is not valid JSON, for one thing. But this is not a problem, because, properly configured (Varnish configuration is beyond the scope of this post, but it isn't too hard), Varnish understands ESI tags, and resolves them. It does so by looking up the url in its cache, and if it is present, inserting the content, and otherwise requesting the url from our servers, just as if any other request had come in. In this case, /api/thumbs?type=comments&id=8243204 will return 154 (which is, incidentally, valid JSON, as are values like true, false, and null), and Varnish will insert it into the result and return it to the user.

Varnish stores in cache for each endpoint what is returned from our server. That means that when a user requests a comment, it pulls the main response from comment from cache (or from our server), then separately pulls the thumbs count from cache (or from our server) and combines them. On the back end, the comments system can worry about purging the comments cache when necessary, and the thumbs system can worry about the purging the thumbs cache. The API consumer doesn't have to worry about any of this, and always gets consistent data.

Here is how the data is returned from our server for the collection of comments:

{
    "comments":[
        <esi:include src="/api/comments/8243204"/>,
        <esi:include src="/api/comments/8242210"/>,
        <esi:include src="/api/comments/8238346"/>
    ],
    "links":[],
}

Again, this is not valid JSON until Varnish resolves the ESIs. It looks up each comment, resolves the additional Edge Side includes in each comment (as above), and finally returns the full list of comments. For each request, Varnish again builds up the ESIs from cache, sometimes over multiple levels. This isn't necessarily instantaneous, especially if some of the caches have are stale, but it is pretty fast.3

Benefits

This may all seem a bit complicated, but look at what we gain:

First, we get almost automatic cache consistency. When a single comment is updated, we only have to purge the endpoint for that single comment. When the thumbs system causes the value of thumbs to change for one comment, it purges the endpoint for its value. All of the other comments and other values can stay in cache undisturbed. When a new comment comes in, all we have to purge is the endpoint for the list of comments (in the form of ESIs)--this is much better purging endpoints that return the entire content of each comment.

Second, since all of these ESIs are resolved in Varnish, to the extend that the endpoints are all in cache, one endpoint with many ESIs counts as a single request, for billing purposes.

Caveats

There are trade offs with this solution. It requires your API endpoints for single items to return fairly simple results-- our comments API, for a single comment, returns that comment in the root of the JSON response, and not much else. If we uses some more complex API specification like JSON API, this wouldn't work nearly so cleanly, especially for collections.

Generating those API responses with the ESI tags embedded requires jumping through some hoops--because what we are generating is not valid JSON, standard libraries don't work. Our solution involves using placeholder strings, followed by JSON encoding, followed by some string replacements.

In addition, our API responses are nonsensical without Varnish (or something else to resolve the ESIs) as part of the stack. That means more complexity for testing, either automated or manual. Varnish has to be a part of each developers stack for testing in the browser.

There is a performance penalty for adding ESIs (vs looking up a single cached value)--although it is very fast, we wouldn't want to add ESIs for values that aren't actually needed most of the time when an API endpoint is loaded.

There can be issues when an item is deleted; ideally, you'd purge all caches that might have ESIs for the deleted item, but that is not always practical. The solution we've come up with to deal with this issue is to have our 404 API responses return the text null (which is also valid JSON) so that any endpoint that includes an ESI to an invalid item will still return valid JSON. Then our consumers have to understand how to deal with null values in a collection.

Evaluation

Despite these issues, we've found using ESIs in our API to be well worth the trouble. Some of the biggest benefits are the cache consistency with simple purging code, and the relative simplicity of use for the consumer.

I recommend ESIs for three (related) cases:

  • When the API endpoint returns a collection, use an ESI for each item in the collection, so each item can be invalidated separately.
  • When a value that is controlled by a separate system is added to an API endpoint (such as thumbs), use an ESI to include that value, so that each system can control its own cache invalidation.
  • When an API endpoint has a mix of stable and dynamic data (even if it all orginates from the same system), include the dynamic data by ESI (or vice versa). This allows the dynamic data to stay current, while preventing requests for the more stable data from hitting the servers.

I encourage other API authors to give this approach a try. I know this post is vague related to the details--I'd be happy to answer questions I can about implementation.


  1. We do most of our cache purging manually, and don't expire them based on time, unless absolutely necessary. Because we purge with Surrogate keys, this isn't quite as hard as it might seem, but it would still be much more difficult without ESIs. ↩︎

  2. OK, I'm about to lie. We actually do use a link in the links section to get the number of thumbs. But this was an easier to understand example than what we actually use ESIs for in the comments system. ↩︎

  3. It could be even faster. We use Fastly to provide our Varnish caching services. Unfortunately, the version of Varnish that Fastly uses resolves Edge Side Includes in series. This can be a problem, especially if there are several stale caches. But the newest version of Varnish Plus resolves Edge Side Includes in parallel! On my wish list of things I'd like to do is to try out Varnish Plus. (Are you listening, Fastly? Parallel ESI resolution is a killer feature!) ↩︎

Categories: varnish, caching, api