sbt remote caching – intro

sbt is a popular build tool in the Scala community. In this article I want to describe remote caching in sbt.

I’m writing this because I spent some time fixing a related bug in sbt. I want to explain what I’ve learnt about remote caching and, in the next post, describe the bug and how it was fixed.

What is remote caching?

First, let me briefly explain compilation.

What is a compiler?

A compiler translates the code we write in our programs into something the computer can understand and run.

In our case, working on Scala projects, the compiler translates Scala into Java bytecode, which the Java Virtual Machine (JVM) can run.

For a more detailed explanation of the compiler see here, and for more detail on the JVM see here.

How does compilation work in sbt?

sbt uses a compiler named Zinc, which is special because it’s an incremental compiler. Instead of recompiling all our source code every time, Zinc recompiles only the files that have changed since the last compilation, along with the files that depend on them.

Zinc uses something called the Analysis file to keep track of which files have changed. It’s like bookkeeping: it lets Zinc work out which files need to be compiled.

Zinc is kind of like make-up.

In the morning I might apply make-up to my whole face. During the day, instead of redoing all my make-up from scratch several times, I just touch up the parts of my face that need a top-up. That saves me lots of time, because I’m only redoing the make-up that needs it rather than starting again from the beginning. Zinc is similar: it only compiles the code that has changed, which saves us time.
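To make the bookkeeping idea concrete, here is a toy sketch in plain Scala. It is only an illustration under simplifying assumptions: it is not Zinc’s real Analysis format, the names IncrementalSketch, Bookkeeping, record and filesToRecompile are made up for this example, and it ignores dependencies between files, which the real Zinc also tracks.

```scala
import java.security.MessageDigest

// Toy model of incremental-compilation bookkeeping.
// NOT Zinc's real Analysis file; all names here are hypothetical.
object IncrementalSketch {
  // Maps a source file name to a hash of its contents at the last compilation.
  type Bookkeeping = Map[String, String]

  // SHA-256 of the file contents, rendered as a hex string.
  def hash(contents: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(contents.getBytes("UTF-8"))
      .map(b => "%02x".format(b))
      .mkString

  // Record the state of every source file after a compilation.
  def record(sources: Map[String, String]): Bookkeeping =
    sources.map { case (name, contents) => name -> hash(contents) }

  // A file needs recompiling if it is new or its hash differs from the recorded one.
  def filesToRecompile(previous: Bookkeeping, sources: Map[String, String]): Set[String] =
    sources.collect {
      case (name, contents) if !previous.get(name).contains(hash(contents)) => name
    }.toSet
}
```

For example, if we record the bookkeeping for A.scala and B.scala and then edit only B.scala, filesToRecompile returns just Set("B.scala"), so A.scala is left alone.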

To read more about Zinc see here:

What is remote caching?

OK, so now that we’ve discussed incremental compilation, let’s talk about remote caching.

Zinc is great because it only compiles the code that has changed since the last compilation. But what if we don’t want Zinc’s bookkeeping (its understanding of which files have changed) to stay on just one machine? What if we want to share that information across multiple machines?

Here’s an example of this:

Imagine we have two friends, Adeola and Babatunde. They are both working on the same branch of a project, but each on their own machines.

Imagine if Adeola could compile the project on her machine and then Babatunde could pull her latest compilation so that he doesn’t have to re-compile the whole project?

Imagine if we could share incremental compilation across multiple machines? Babatunde would save time, because he benefits from Adeola’s compilation of the same source code.

The above is a simple example, but the impact of remote caching can be much greater. For example, continuous integration (CI) jobs that repeatedly compile the same source code could share their compilation results and only compile the code that has changed.

How does remote caching work?

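With remote caching we can push our compilation output to a cache, and we can also pull it back down. The sbt tasks for this are very simple: pushRemoteCache and pullRemoteCache.

When we run pushRemoteCache, sbt packages up the project’s compiled output, including Zinc’s bookkeeping, into a JAR file and publishes it to the remote cache. When we run pullRemoteCache, sbt fetches that JAR and unpacks it, so the next compile only has to deal with what has changed.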
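As a rough sketch of what the setup can look like, the build.sbt fragment below points the remote cache at a directory, following the pattern in the sbt documentation. It assumes sbt 1.4.0 or later; the cache name "local-cache" and the path /tmp/remote-cache are placeholders, and for two machines to actually share the cache the location would need to be somewhere both can reach, such as a shared Maven repository.

```scala
// build.sbt (sketch; the cache name and directory are placeholders)
// Tells sbt where pushRemoteCache should publish the cache JARs.
ThisBuild / pushRemoteCacheTo := Some(
  MavenCache("local-cache", file("/tmp/remote-cache"))
)
```

With this in place, Adeola would run sbt pushRemoteCache after compiling, and Babatunde would run sbt pullRemoteCache before compiling on his machine.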

Here are more details on how to use remote caching in sbt:

In my next article, I’ll talk about the bug I fixed in sbt’s remote caching feature!

Documentation (I really enjoyed this video on Bazel, which I think is a nice introduction to remote caching. Note that it’s really different from remote caching in sbt, because everything is cached in Bazel and there’s no incremental compilation.)
