Bazel excels at speeding up build times. But what happens when a single task takes a long time to run? When that task is not cached, it can become a bottleneck for the build.
At $dayjob, I converted a large Java project to Bazel. The project had some integration test classes that had 25+ tests and would take 45+ minutes to fully run. Still being fairly new to Bazel, the only way I knew to speed this up was to break the test classes down to multiple smaller classes. This worked, but it was quite a bit of manual effort.
I hadn’t known at the time about a feature in Bazel called test sharding. Test sharding splits up the test cases for a single test target into multiple shards. Each shard can then be run in parallel, reducing the overall test time.
Let’s set up an example test that takes 5 minutes to run. In this example, we have 5 test cases that each take 1 minute to run.
    package com.example;

    import org.junit.jupiter.api.Test;

    public class ShardTest {
        // Each test sleeps for one minute to simulate a slow integration test.
        @Test
        public void test1() throws InterruptedException {
            Thread.sleep(60 * 1000);
        }

        @Test
        public void test2() throws InterruptedException {
            Thread.sleep(60 * 1000);
        }

        @Test
        public void test3() throws InterruptedException {
            Thread.sleep(60 * 1000);
        }

        @Test
        public void test4() throws InterruptedException {
            Thread.sleep(60 * 1000);
        }

        @Test
        public void test5() throws InterruptedException {
            Thread.sleep(60 * 1000);
        }
    }

    java_junit5_test(
        name = "ShardTest",
        srcs = ["src/test/java/com/example/ShardTest.java"],
        test_class = "com.example.ShardTest",
        deps = [
            ":lib",
            "@maven//:org_junit_jupiter_junit_jupiter_api",
            "@maven//:org_junit_jupiter_junit_jupiter_engine",
            "@maven//:org_junit_platform_junit_platform_launcher",
            "@maven//:org_junit_platform_junit_platform_reporting",
        ],
    )

    ➜ bazel test //test-shards-java:ShardTest
    ...
    //test-shards-java:ShardTest TIMEOUT in 300.1s
    Executed 1 out of 1 test: 1 fails locally.

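Run as a single target, the five tests need the full five minutes, which is long enough to hit Bazel’s default 300-second timeout for a medium-sized test. Now, let’s split this test into five shards. We can do this by adding the shard_count attribute to the java_junit5_test target.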
The test cases within the target will be split into the number of shards specified. In this case, I split the test into 5 shards, since there are 5 test cases and each of them takes the same amount of time to run.
    java_junit5_test(
        name = "ShardTest",
        srcs = ["src/test/java/com/example/ShardTest.java"],
        shard_count = 5,
        test_class = "com.example.ShardTest",
        deps = [
            "@maven//:org_junit_jupiter_junit_jupiter_api",
            "@maven//:org_junit_jupiter_junit_jupiter_engine",
            "@maven//:org_junit_platform_junit_platform_launcher",
            "@maven//:org_junit_platform_junit_platform_reporting",
        ],
    )

    ➜ bazel test //test-shards-java:ShardTest
    ...
    //test-shards-java:ShardTest PASSED in 60.9s
    Stats over 5 runs: max = 60.9s, min = 60.9s, avg = 60.9s, dev = 0.0s

Our test now runs in 1 minute instead of 5 minutes. This is a very simplistic example, but it shows how test sharding can reduce the time it takes to run tests by increasing the parallelism.
In a real-world situation there are some things to consider:
- The number of shards should be equal to or less than the number of test cases
- There is overhead in running a java_test target
- Test cases are likely to have variance in run time, so you may not see a linear reduction in time as you increase the number of shards
- Keep an eye on the min time in the test results. If there are shards with a significantly lower time than the others, you may have too many shards
Because of all of these factors, there can be a bit of trial and error to find the optimal number of shards for your tests. In situations where there are a lot of test cases, I like to take a sort of binary search approach to find the optimal number.
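Before changing shard_count and re-running, it can help to estimate what a given shard count might buy you. The sketch below is a rough model and not anything Bazel provides: the ShardEstimate class, the round-robin assignment of tests to shards, and the fixed one-second per-shard overhead are all assumptions made for illustration. The durations are the five 60-second tests from the example above.

```java
import java.util.Arrays;

public class ShardEstimate {
    // Estimated wall-clock seconds for a sharded run: the slowest shard plus a
    // fixed startup overhead. Round-robin assignment and the overhead value are
    // assumptions for illustration, not Bazel's or the runner's actual behavior.
    static double estimate(double[] testSeconds, int shards, double overheadSeconds) {
        double[] shardTotals = new double[shards];
        for (int i = 0; i < testSeconds.length; i++) {
            shardTotals[i % shards] += testSeconds[i];
        }
        return Arrays.stream(shardTotals).max().orElse(0.0) + overheadSeconds;
    }

    public static void main(String[] args) {
        double[] durations = {60, 60, 60, 60, 60}; // the five tests from the example
        for (int shards = 1; shards <= 6; shards++) {
            System.out.printf("%d shard(s): ~%.0fs%n",
                    shards, estimate(durations, shards, 1.0));
        }
    }
}
```

Under this model, 3 and 4 shards both come out at about 121 seconds, because one shard still has to run two tests back to back, and a sixth shard gains nothing over five. With real, uneven durations the plateaus land in different places, which is why some trial and error is still needed.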
Note: It is up to test runners to integrate with Bazel’s sharding feature. In this case, the runner implemented in rules_jvm is already set up to handle sharding.
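To give a feel for what that integration involves: Bazel tells each shard which slice it is through the TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables, and the runner decides which test cases belong to that slice. The following is a minimal sketch of the idea, not the rules_jvm implementation. The ShardFilter class and the round-robin selection are hypothetical, and a real runner is also expected to signal sharding support by touching the file named in TEST_SHARD_STATUS_FILE, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardFilter {
    // Picks the tests one shard should run, using round-robin assignment.
    // The assignment strategy is an assumption; real runners may differ.
    static List<String> testsForShard(List<String> allTests, int shardIndex, int totalShards) {
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < allTests.size(); i++) {
            if (i % totalShards == shardIndex) {
                selected.add(allTests.get(i));
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Bazel sets these variables for sharded tests. Outside of a sharded
        // run they are absent, so fall back to a single shard that runs everything.
        String total = System.getenv("TEST_TOTAL_SHARDS");
        String index = System.getenv("TEST_SHARD_INDEX");
        int totalShards = total == null ? 1 : Integer.parseInt(total);
        int shardIndex = index == null ? 0 : Integer.parseInt(index);

        List<String> tests = List.of("test1", "test2", "test3", "test4", "test5");
        System.out.println(testsForShard(tests, shardIndex, totalShards));
    }
}
```

With shard_count = 5 and five test cases, each shard in this sketch ends up with exactly one test, which matches the one-minute result seen earlier.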
The full code example can be found on my GitHub.