Finding Things - Beautiful Code

Using Algolia’s Search-as-a-Service on Google AppEngine

YapMe went live on the AppStore a few weeks ago, and I thought I’d share some insights on the engineering side of the product.

While some might think this is “just an app”, I decided from day one (as I did for seevl) to build it not as a standalone application, but as a platform. In other words, it’s not an ad-hoc photo app, but an “object-centred sociality” platform, where users:

  • interact (post, like, comment), around “media entities” (photos, sounds, etc.), and;
  • access their data (and create it) on their devices through a set of APIs.

To do so, I decided to rely on the Google Cloud Platform. I’ve used it in the past for several music-tech experiments, but this was the first time for a real, large-scale application.

Google Cloud Platform: Everything but the search

To put it simply, the Google Cloud Platform is Google’s suite of tools designed to build and host systems in the cloud, using their powerful infrastructure. It includes many components, such as AppEngine (for hosting and running your applications), BigQuery (for large-scale analytics), Prediction (for Machine Learning algorithms) and Cloud Endpoints (for building API endpoints and their SDKs), among many others.

While Datastore excels at storing and retrieving entities (Knowledge Graph style), the plain-text query capabilities of the overall platform are, surprisingly, extremely limited. Its search API is disappointing to say the least: there is no easy way to run autocomplete, fuzzy searches are not supported, indexing is very basic, etc. That’s why we’ve decided to use Algolia to implement our search features – starting with user search.

Algolia: Plug-and-Play Search-as-a-Service

If you’re a reader of this blog, you might remember InstaSearch, my previous experiment with Algolia.

Algolia is Search-as-a-Service, and takes away all the burden of managing plain-text search: building and maintaining indexes, facets, fuzzy-search features, etc. – if you’ve deployed Solr in the past, you know what I mean. It puts everything into a friendly – and impressively fast – Web interface, coupled with a REST API and clients for almost every popular language.

Configuring Algolia's typo-tolerance

Setting-up Algolia on Google AppEngine

When using Python on AppEngine, you can easily index your data in Algolia using its official Python library. I’ve recently submitted a patch (available from version 1.7 onwards) that addresses a few issues with earlier versions of the library when using it on GAE:

  • Replacing requests’ sockets with Google’s urlfetch. While sockets are experimentally supported on AppEngine, they’re more costly than the standard urlfetch and, well, experimental (that said, you lose the advantages of socket connections);
  • Not trying to validate Algolia’s domains that are not hosted under algolia.net (since the API uses different fallback servers), as GAE’s Python doesn’t support SNI and raised a 40X error when trying to do so.
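
The fallback behaviour mentioned in the second point can be pictured with a small sketch. This is only an illustration under stated assumptions, not the library’s actual code: `fetch` is a hypothetical callable standing in for the HTTP call (on AppEngine it would wrap urlfetch), and the host names and path are placeholders.

```python
def request_with_fallback(hosts, path, fetch):
    """Try each host in order and return the first successful response.

    `fetch` is a hypothetical callable (url -> response body) that raises
    an exception on failure; on AppEngine it would wrap urlfetch.
    """
    errors = []
    for host in hosts:
        url = 'https://%s%s' % (host, path)
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: any failure moves to the next host
            errors.append((host, exc))
    raise RuntimeError('All hosts failed: %r' % (errors,))


# Placeholder host names, for illustration only.
HOSTS = ['primary.example.net', 'fallback-1.example.com', 'fallback-2.example.com']


def flaky_fetch(url):
    """Simulated transport: the primary host is down, the fallbacks answer."""
    if 'primary' in url:
        raise IOError('connection failed')
    return 'ok from ' + url


result = request_with_fallback(HOSTS, '/1/indexes', flaky_fetch)
```

Here the first host fails, so the response comes from the first fallback host. Passing the transport in as a parameter keeps the retry logic independent of whether sockets or urlfetch sit underneath.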

Since GAE supports virtualenv, you can simply install the library by typing:

virtualenv env
source ./env/bin/activate
pip install algoliasearch

Then, in your appengine_config.py file(s):

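import os
from google.appengine.ext import vendor
from google.appengine.ext.appstats import recording

# Make the packages installed in the virtualenv importable on AppEngine
vendor.add(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'env'))

def webapp_add_wsgi_middleware(app):
  # Wrap the WSGI app with Appstats to record RPC statistics
  app = recording.appstats_wsgi_middleware(app)
  return app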

And in any of your Web-app or models files:

from algoliasearch import algoliasearch
from settings import ALGOLIA_ID, ALGOLIA_KEY, ALGOLIA_INDEX

algolia_index = algoliasearch.Client(ALGOLIA_ID, ALGOLIA_KEY)\
    .init_index(ALGOLIA_INDEX)
algolia_index.do_stuff(...)

Synchronize your data using NDB hooks

Last but not least, if you’re using the NDB Datastore API to model your data (as we do to represent users, media, relations, etc. in YapMe), keep in mind the pre/post-hooks available when you create or delete an entity.

You can easily attach any Algolia operation to those. For instance, each time a new user is created, YapMe’s back-end calls Algolia as follows, directly from our User model.

def _post_put_hook(self, future):
    # Some put() do not require re-indexing
    if hasattr(self, '_algolia_index'):
        self.algolia_index()

def algolia_index(self):
    """Index the user into Algolia."""
    Algolia.index_object({
        'objectID': self.key.urlsafe(),
        'name': self.name,
        'username': self.username,
        'photo': self.get_photo_url(),
        'biography': self.biography,
        # To check if whoever made the search follows users from the result-set
        'followers': [
            u.follower.key.urlsafe() for u in self.get_followers()[0]
        ],
    })

Using this hook-based approach, entities are immediately synced between your Datastore and Algolia’s servers, making them available right away in any client which uses the same Algolia index.
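
To make the flag-guarded flow concrete, here is a self-contained sketch that runs without AppEngine. Everything in it is a stand-in: `FakeIndex` replaces the Algolia index, `User` is a plain Python class rather than an NDB model, and the `reindex` argument is a hypothetical way of setting the `_algolia_index` flag (the snippet above does not show how YapMe sets it). In real NDB the delete hooks are class methods that receive the key; that is simplified here.

```python
class FakeIndex(object):
    """In-memory stand-in for a search index (not the Algolia client)."""

    def __init__(self):
        self.objects = {}

    def save_object(self, obj):
        self.objects[obj['objectID']] = obj

    def delete_object(self, object_id):
        self.objects.pop(object_id, None)


INDEX = FakeIndex()


class User(object):
    """Plain-Python stand-in for an NDB model, to show the hook flow only."""

    def __init__(self, key, name, username):
        self.key = key
        self.name = name
        self.username = username

    def put(self, reindex=True):
        # Assumption: the flag is set only for writes that change searchable fields.
        if reindex:
            self._algolia_index = True
        # ... the real datastore write would happen here ...
        self._post_put_hook(None)

    def delete(self):
        # ... the real datastore delete would happen here ...
        self._post_delete_hook()

    def _post_put_hook(self, future):
        # Same guard as in the model above: skip puts that need no re-indexing.
        if hasattr(self, '_algolia_index'):
            del self._algolia_index  # consume the flag so later puts start clean
            INDEX.save_object({
                'objectID': self.key,
                'name': self.name,
                'username': self.username,
            })

    def _post_delete_hook(self):
        # Keep the index in step with deletions as well.
        INDEX.delete_object(self.key)


user = User('user-1', 'Ada', 'ada')
user.put()               # indexed
user.name = 'Ada L.'
user.put(reindex=False)  # stored, but the index keeps the previous name
```

Calling `user.delete()` afterwards removes the entry from the stand-in index, mirroring what a delete hook would do against the real one.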

Below is an example of searching for users in our app, featuring the typo-tolerance feature.

Using GAE+Algolia in YapMe's iOS app

In “Beautiful Code”, Tim Bray wrote that

There are two different flavors of time that apply to problems of search. The first is the time it takes the search to run […], the second is the time invested by the programmer who builds the search function […]

I’m glad to say that using Algolia on top of AppEngine solves both issues, and we’re very happy with the decision to use it for YapMe. While GAE and the Datastore provide a solid foundation for all our major database operations (managing users and their content, fan-out on social feeds, etc.), Algolia brought a plain-text search layer, definitely required from the user-experience perspective, with minimal effort from the engineering side.
