More and more organisations are embracing mono-repositories because they simplify dependency management (read more here). However, implementing and maintaining a mono-repository at scale has its challenges: building and testing a large repository is no simple feat. At the core is a question of performance: at each commit, the whole repository would need to be re-built and re-tested. That is very expensive, so a caching mechanism must be introduced. The question then becomes: how do we find out which parts need to be rebuilt and retested?
Larger organisations such as Google, Facebook and Twitter have adopted the mono-repository approach as they have the means to maintain such an infrastructure, but unfortunately solutions for smaller organisations are limited. At Electricity Maps, we wanted to embrace the mono-repository structure. This post explains why we built brick, our home-made build tool for mono-repositories.
One of our products is Electricity Maps. It shows, in real time, the origin of the electricity consumed across the world, alongside its carbon footprint. We also offer a paid API that provides forecasts, enabling anyone to optimise their electricity usage to reduce cost and emissions. Electricity Maps is divided into geographical zones that are defined in a definition file called zones.json. This definition file contains bounding boxes, data acquisition parameters and much more (I encourage you to take a look at it here).
In our mono-repository, we have a multitude of micro-services, each represented as a folder in the repository.
We use continuous deployment, which means that each pull request changing the zones.json definition updates several systems in one go.
This map on our API website is automatically updated each time we change the zones definition. The reason is that the map component is re-used between our API website and the Electricity Maps frontend.
This is in stark contrast with a multi-repository setting, where one would first update and release a new version of the zones.json definition, then, for each dependent service, submit a pull request to update the zones.json dependency. This is quite cumbersome and doesn’t scale well. Having a single repository enables us to submit a pull request representing a single atomic change. It enables us to move faster, keep a very modular codebase and increase the amount of code shared.
The issue with mono-repositories is that building and testing takes time: since it is not known which changes affect which builds and tests, every build step and every test has to be run. Several tools address this need, such as Bazel at Google, Buck at Facebook and Pants at Twitter (see a longer list here). The problem is that these tools come with a steep learning curve for new developers. We wanted to simplify that drastically.
When assessing existing build tools and designing our own, we drew up a list of requirements.
It turns out that buildkit, the engine that builds Docker images, already fulfills a lot of these requirements.
We therefore built a small CLI in Python called brick. It searches for a BUILD.yaml build definition file and passes the relevant flags to the buildkit build engine.
As an example, here’s the BUILD.yaml file we use to build and deploy our API website from the ./api folder of our monorepo:
```yaml
# (abbreviated; the image tag and file paths are illustrative)
steps:
  prepare:
    # This is the docker image that will be used (optional)
    image: node:alpine
    commands:
      - yarn
    # Failing to declare those inputs will cause the `yarn`
    # command to fail due to missing files
    inputs:
      - package.json
      - yarn.lock
  build:
    # These are the commands that will be run
    commands:
      - yarn lint
      - yarn build
    inputs:
      - src
    # Outputs will be copied from the image to the host
    outputs:
      - public
```
Running a build is as easy as running brick build from that folder. For each stage, if neither the commands nor the inputs have changed, the build results are read from cache (this is handled by the Docker engine itself, which caches ADD and RUN steps). Under the hood, brick generates a temporary Dockerfile and builds it.
Once the build is done, brick reads the outputs array defined in the BUILD.yaml and copies the files from the image to the host machine. Voilà!
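To make that concrete, here is a minimal sketch of how a stage could be translated into a temporary Dockerfile. This is illustrative only: the stage layout and the default image are assumptions, not brick's actual generator.

```python
def stage_to_dockerfile(stage: dict, base_image: str = "node:alpine") -> str:
    """Translate a BUILD.yaml-like stage into Dockerfile text (illustrative)."""
    lines = [f"FROM {stage.get('image', base_image)}"]
    # Each declared input becomes an ADD step: changing any input file
    # invalidates the Docker layer cache from that point onwards.
    for path in stage.get("inputs", []):
        lines.append(f"ADD {path} {path}")
    # Each command becomes a RUN step, cached as long as the command
    # string and all previous layers are unchanged.
    for cmd in stage.get("commands", []):
        lines.append(f"RUN {cmd}")
    return "\n".join(lines)

dockerfile = stage_to_dockerfile(
    {"inputs": ["package.json", "src"], "commands": ["yarn", "yarn build"]}
)
```

Because undeclared files never make it into an ADD step, a missing inputs entry fails loudly at build time instead of silently producing a stale artifact.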
A no-op build with ~15 targets (i.e. micro-services) runs in 30 seconds on our CI server (an n1-standard-2 on Google Cloud Compute Engine).
As inputs and outputs are clearly defined, dependencies can automatically be detected: if an input intersects the output of another build, then that other build will automatically be triggered.
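As a sketch of that rule (assumed logic, not brick's exact path handling): a target depends on another target when one of its input paths overlaps one of the other's output paths.

```python
import os

def depends_on(target: dict, other: dict) -> bool:
    """True if `target` consumes any output of `other` (illustrative)."""
    def overlaps(a: str, b: str) -> bool:
        a, b = os.path.normpath(a), os.path.normpath(b)
        return a == b or a.startswith(b + os.sep) or b.startswith(a + os.sep)

    return any(
        overlaps(inp, out)
        for inp in target.get("inputs", [])
        for out in other.get("outputs", [])
    )

# Hypothetical targets: a change to the zones build triggers the frontend.
zones = {"outputs": ["config/zones.json"]}
frontend = {"inputs": ["config", "src"], "outputs": ["public"]}
```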
Starting to work on a new target is as simple as running brick develop, given that the right configuration is added to BUILD.yaml:
```yaml
steps:
  # ...
  develop:
    command: yarn develop -H 0.0.0.0
```
What happens under the hood is that a docker run command is issued, re-using the prepare step to make sure every developer has the same dependencies. Using our API website as an example, if a new dependency is added to the package.json file, then brick develop will automatically re-run the prepare step. This avoids the problem of developers having to remember to run yarn if dependencies have changed. Furthermore, it guarantees that developers are always working on a consistent system.
brick develop runs in an isolated container, and automatically mounts volumes to the host filesystem based on the inputs declarations. This ensures that watching file changes works, while still keeping the guarantee that missing inputs will cause errors, forcing the developer to declare them all.
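Conceptually, the container launch could look like the following sketch. The image name, working directory and flags here are illustrative assumptions, not brick's exact invocation.

```python
def develop_argv(image: str, inputs: list[str], command: str) -> list[str]:
    """Build an illustrative `docker run` invocation for a develop step."""
    argv = ["docker", "run", "--rm", "-it", "-w", "/app"]
    # Only declared inputs are mounted; an undeclared file is simply
    # absent inside the container and fails loudly.
    for path in inputs:
        argv += ["-v", f"{path}:/app/{path}"]
    return argv + [image, "sh", "-c", command]

argv = develop_argv("api-prepare", ["src", "package.json"], "yarn develop -H 0.0.0.0")
```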
Tests can also be defined in a similar fashion:
```yaml
steps:
  # ...
  test:
    commands:
      - yarn test
```
Deployments can also be cached. In this example, we define a deployment configuration that copies the output of our build step (the public folder) to a Google Cloud Storage bucket. Secrets are also defined and will be mounted during the deployment stage.
```yaml
steps:
  # ...
  deploy:
    commands:
      - gsutil -m cp -a public-read -r public/* gs://static.electricitymaps.com/api
    # (the secret name is illustrative)
    secrets:
      - gcloud-service-account
```
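The caching behaviour can be sketched as follows. This is an assumed model, not brick's actual cache implementation: a stage re-runs only when a hash over its commands and input contents changes.

```python
import hashlib

def cache_key(commands: list[str], input_digests: dict[str, str]) -> str:
    """Illustrative cache key: hash over commands and input content digests."""
    h = hashlib.sha256()
    for cmd in commands:
        h.update(cmd.encode())
    # Sort paths so the key is independent of dict ordering.
    for path in sorted(input_digests):
        h.update(path.encode() + input_digests[path].encode())
    return h.hexdigest()

unchanged = cache_key(["gsutil cp ..."], {"public/index.html": "abc123"})
changed = cache_key(["gsutil cp ..."], {"public/index.html": "def456"})
```

If the computed key matches the one stored from the previous run, the deploy commands can be skipped entirely.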
brick is still very experimental, and the documentation is sparse. However, it is already used internally for all of our builds and tests. If you'd like to help us make it more robust, please reach out on our Slack or join the conversation on GitHub!