BazelCon 2025 recap
It has been just over two years since I started Blog System/5, and that means it’s time for the now-usual(?) BazelCon 2025 trip report!
The conference, arranged by the Linux Foundation, took place in Atlanta, GA, USA over three days: one for tutorials and two for the main talks. An extra hackathon day, organized by Aspect Build, followed. Unfortunately, a canceled flight meant I missed the tutorials, but I attended the rest of the events. As usual, it was a super-fun time to connect with old acquaintances and an energizing event that left me with plenty of new topics to research.
What follows is not a complete summary of the conference, as there were many talks I did not attend and conversations I missed. If you want the full firehose of videos, see the BazelCon 2025 YouTube playlist. And if you want a TL;DR… I’d pick the following highlights:
The ecosystem is maturing, with bzlmod becoming mandatory and the BUILD Foundation becoming a reality on the near horizon.
Performance remains a key focus of the Bazel core team and the community, with innovative approaches like Skycache for client-side speed, sophisticated RE improvements for backend efficiency, and new rulesets like rules_img that focus on build speed.
Community tooling is expanding Bazel's scope, with projects like Aspect's task runner aiming to solve long-standing workflow gaps.
But the above is just a tiny peek into the conference. So strap in and let's dive in.
Opening words
Google opened by emphasizing their commitment to Bazel, highlighting its growing internal adoption. Their reasoning is that Bazel improves security and hermeticity, in addition to the usual benefits of faster builds and easier open-sourcing of code. This statement seems to be a response to last year’s proposal to create a non-Google Bazel Foundation, which would act as a “backup plan” should Google ever withdraw from the project.
Google provided two examples of its growing Bazel adoption. The first is Quantum AI, primarily written in Python and Rust, which saw an 80% reduction in CI time after a migration to Bazel that was driven by just one SWE. The second is their Google Distributed Cloud (GDC), a version of their cloud product that can run on-premise and in air-gapped instances. The GDC codebase weighs 2.6 GB, is developed by 1,300 engineers, and produces 600 GB of release artifacts. I have to question if the latter is a number to be proud of: when does this madness in bloat stop?
The introduction concluded with a few statistics: Bazel’s #general Slack channel has grown by 18% from the previous year to 8,500 users; there are about 10,800 repos on GitHub with MODULE.bazel files; and there are about 120,000 bzl files on GitHub.
Community updates
The next session was the customary round of community updates, presented by Jay Conrod from EngFlow and Alex Eagle from Aspect Build.
Here are the highlights:
Training day: There were six different sessions on the Sunday before the conference, and EngFlow is leading training efforts worldwide.
Gazelle: C++ support is on the way for this tool. Version 2.0 will simplify the extensions interface and improve performance.
BCR Mirror: Cloudflare is now hosting a mirror for the Bazel Central Registry (BCR). You can use it with Bazel 8.4+ by adding common --module_mirrors=https://bcr.cloudflaremirrors.com/ to your configuration, and this will become a default in a future release.
Documentation: The most common complaint in community surveys remains the documentation and the steep learning curve. To address this, the BCR website now features icons for sponsorship requests, deprecation notices, and provenance attestations. Furthermore, the Starlark documentation has been published and is now easier to read. In a move to empower the community, the documentation has been migrated out of Google's internal infrastructure and is available at https://preview.bazel.build/
BUILD Foundation: The foundation currently has three founding members (Spotify, Uber, and Canva) and is looking for four more. The initial meeting is scheduled for December 4th. More on this in its own section.
Bzlmod was also a significant topic in this talk, but since it was covered at length in other presentations as well, I have dedicated a separate section to it below.
State of the union
As is tradition, the next session was the State of the Union talk, led by John Field and Tobias Werth from Google.
Here are some of the highlights from the update:
Local remote repo caching: This new feature is intended to allow the caching of repository rules across different workspaces.
Experimental WASM support: There is experimental support for WASM tools in repo rules to enable platform-independent tooling, but its future is still uncertain.
Performance improvements:
Changes to NestedSets can save up to 20% of memory.
Optimizations in Merkle tree handling can reduce wall time by up to 30%.
Analysis phase caching is coming soon (see the “Skycache” section below for more details).
There are ongoing efforts to cap disk usage.
Path stripping, a feature announced last year, is now more mature and integrated with more rule sets, offering up to an 84% reduction in build time.
Analysis time on flag changes has been reduced.
Java improvements: Caching has been improved, and method signature changes no longer affect downstream header builds.
Starlark flags: A new scoping API is available for Starlark flags.
PROJECT.scl: A new file is being introduced to provide a canonical place for project owners to map targets to flags. More details in its own section below.
Starlarkification: This effort is almost complete. All rules are now decoupled, with the exception of a few integration points for C++. As of Bazel 9, autoloading of rules is disabled, which means users must now explicitly load() all the rules they use.
Starlark type system: Type annotations and type-checking are coming to Starlark. The syntax will be supported in Bazel 9, with type-checking planned for Bazel 10. The syntax aims for compatibility with Python 3 types, which introduces some limitations on what can be expressed. (See the sketch after this list for an illustration.)
JetBrains Bazel plugin: The JetBrains-owned Bazel plugin has reached general availability, making the Google-owned plugin a legacy tool. This new plugin promises a much-improved user experience, as it is faster and better integrates the Bazel build graph with IntelliJ’s native understanding of the project structure, avoiding expensive “sync” steps.
Internal APIs: There is ongoing work to separate core logic from service interactions (such as remote builds and file system operations), which sounds very similar to how Buck 2 was designed from the start.
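Regarding the type system mentioned above: the concrete syntax is still being finalized, so the snippet below is only my own illustration of what Python-3-style annotations in a .bzl file might look like, not something I have run against a Bazel 9 prerelease.

```starlark
# Hypothetical example of Python-3-style type annotations in Starlark.
# The exact syntax accepted by Bazel 9/10 may differ from this sketch.
def _join_flags(prefix: str, flags: list[str]) -> str:
    """Concatenates flags into a single command-line fragment."""
    return " ".join([prefix + flag for flag in flags])
```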
Much like the community updates talk, this one also opened with bzlmod. Let’s dive into that topic next.
bzlmod
The workspace is dead; long live bzlmod! With Bazel 9, support for old-style workspaces has been removed, and given that the vast majority of rulesets now support bzlmod, it’s time for everyone to complete their migration. This is a positive development because bzlmod, through the Bazel Central Registry (BCR), simplifies rule discovery, project dependencies, and version conflict resolution. Interestingly, but not surprisingly, bzlmod and the BCR have effectively become a package manager for C++.
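If you have not made the jump yet, this is the shape of the new world: a MODULE.bazel file at the root of the workspace declares the module and its dependencies, and versions are resolved against the BCR. The module name and version numbers below are made up for illustration; pick real ones from the registry.

```starlark
# MODULE.bazel -- illustrative only; look up current versions in the BCR.
module(
    name = "my_service",
    version = "0.1.0",
)

# Dependencies come from the Bazel Central Registry (or a mirror such as
# the new Cloudflare one) instead of hand-written http_archive() calls
# scattered across a WORKSPACE file.
bazel_dep(name = "rules_cc", version = "0.0.17")
bazel_dep(name = "protobuf", version = "29.0")
```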
But it’s not all roses. The bzlmod migration has been a significant source of friction for the community due to the intrusive and difficult nature of the change. If you haven’t completed the transition, you can no longer upgrade to newer Bazel versions. The official documentation has also been subpar (which is not surprising), although a great set of articles from EngFlow is now available to clarify the migration process in great detail.
To assist with the migration, an automated migration tool is now available, and various people have reported success using AI tools to help with the transition. In a related development, there is now a Maintainer Community Program (MCP) for the BCR.
One major pain point I have faced, and one that seems to affect many others, is the tendency of the Bazel ecosystem to couple rule versions with library versions. For example, if you are using an old version of protobuf with an equally old version of rules_proto that is not compatible with bzlmod, you must upgrade rules_proto to migrate. However, this in turn forces you to upgrade protobuf itself. Updating a library can introduce API incompatibilities and behavioral changes, making the upgrade to bzlmod and subsequent major Bazel releases much more difficult than it needs to be.
RE action routing
Before we dive into remote execution, let me clarify some terminology. Remote Execution is abbreviated as RE, not RBE. RBE was Google’s now-discontinued cloud product for RE. While the terms are often used interchangeably today (even on Bazel’s own website), it’s a good idea to stick to RE. Similarly, avoid using the term “build farm” as there is a specific RE implementation named Buildfarm.
Remote execution is always a hot topic at BazelCon, and for good reason. One of Bazel’s biggest selling points is the performance boost from distributing builds across multiple machines, and nearly every Bazel-related startup offers some form of remote execution solution.
In the first RE-focused talk, Son Luong Ngoc from BuildBuddy explained how their product routes actions to maximize performance and minimize execution latency.
The talk began with a clear premise: remote builds are spiky and hard to binpack, so how can they be scheduled efficiently for both performance and cost? Here are some of the key features of their RE implementation:
Executor types: BuildBuddy provides both managed (OCI, Firecracker, macOS) and self-hosted (Docker, Windows, GPU, and more) executors, each with different performance, isolation, and cost characteristics.
Multiple action queues: Executors are organized into pools, and actions can specify which pool they should run on. When an action is received, the scheduler enqueues it in up to three different executors to minimize tail latency, based on the Sparrow scheduler research paper.
Work stealing: Dynamic scaling of executor pools is critical for handling spiky build workloads while keeping costs down. To make this more efficient, BuildBuddy allows new executors to steal work from existing ones, which helps redistribute the load.
Action merging: This feature coalesces multiple execution requests for the same action into a single execution. As we learned, this can be problematic if a misbehaving executor stalls multiple clients (e.g., several CI jobs). To address this, BuildBuddy speculatively re-executes a running action on a different executor after a certain threshold has passed.
Action cancellation: When a user presses Ctrl+C, they are likely to modify code and restart the build, so continuing to run in-flight actions is wasteful. For greater efficiency, BuildBuddy catches the finished event from the BEP and attempts to cancel all remotely-queued actions.
Binpacking: Different actions have different resource requirements, and it can be difficult to manually assign them to the right pool and executor. BuildBuddy automatically profiles executed actions (for metrics like peak memory and CPU consumption) and stores this information in the auxiliary_metadata field of the action result, which is then stored on the server. The scheduler uses these details to route actions more effectively.
Cold starts: Executing remote actions is similar to running lambda functions: a worker must be started, a container image fetched, and the action executed. To optimize this, BuildBuddy uses affinity routing, where a key is computed based on the primary output name (which is unique for each action) to extract platform, target, and output details. This allows similar actions to be routed to similar executors.
Recycled runners: Some customers need to maintain heavyweight processes on the remote worker, such as test databases or the Docker daemon. While not hermetic, this is often desirable for high-performance scenarios. The use of these features is customized through execution properties.
Custom resources: Some actions may require access to specialized resources like GPUs, FPGAs, or simulators. For better binpacking, customers can define the “size” of different executor types and annotate actions with the resources they consume.
Fair scheduling: With multi-tenancy, users can set a "priority" for the actions of a build using the --remote_execution_priority flag. A common use case is to define three priority bands: interactive builds, CI, and cron jobs. The BuildBuddy scheduler takes this property into account. (See the sketch after this list for how per-target execution properties feed into this kind of routing.)
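Several of the features above (pools, recycled runners, custom resources) are driven by execution properties attached to targets or platforms. As a rough illustration: exec_properties is a standard Bazel attribute, but the specific keys shown here are examples of vendor-specific properties and may not match what your RE provider expects.

```starlark
load("@rules_cc//cc:defs.bzl", "cc_test")

cc_test(
    name = "integration_test",
    srcs = ["integration_test.cc"],
    exec_properties = {
        # Example keys only; consult your RE provider's documentation
        # for the actual property names it honors.
        "Pool": "highmem",         # route to a specific executor pool
        "recycle-runner": "true",  # keep heavyweight state between actions
    },
)
```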
After detailing these existing features, the talk concluded with a glimpse into the future: extending the RE protocol with a remote build graph API. The current protocol is very chatty, making it difficult to colocate actions (such as a compile-link-test chain). A protocol that understands action relationships could significantly improve this.
This talk left me wondering which of these features are also offered by other major RE vendors and open-source implementations. I briefly chatted with the EngFlow folks at the conference and they told me they have most of these too; it’d be nice to have a comparison chart among vendors and free solutions.
RE cost savings vs. reliability
The next major topic for RE was cost savings and reliability, which was covered in at least two talks.
The first talk, presented by Rahul Roy from Glean, focused on how their adoption of Buildbarn for scalability unexpectedly doubled their CI costs. The primary causes for this increase were an up-to-20% per-action overhead in the Buildbarn worker and a lack of Bazel client caching in their GitHub Actions runners.
To solve these issues, they chose to adopt spot instances in their deployment of Buildbarn on GCP’s Kubernetes offering, but this is not as easy as it seems:
The default Kubernetes autoscaler relies on CPU and memory utilization for its decisions, but these metrics are poor predictors of CI traffic patterns. Action queue length is a much better indicator of developer activity, so a custom autoscaler is needed to achieve reasonable behavior.
Fair scaling can disrupt ongoing builds because GCP only provides a 90-second shutdown notice before preempting instances, which is not enough time to terminate running actions gracefully.
Cold runners are significantly slower than hot ones because they start with empty local caches. To mitigate this, they implemented a solution to reuse runner disks, but only for disks that had been used for builds covering more than 50% of the build graph. This strategy reduced startup times from eight minutes to less than one.
The second talk, given by Gabriel Russo from Databricks and Yuval Kaplan from EngFlow, focused on building at scale and how a naive move to remote execution can actually make CI slower.
They investigated the specific problem of using Docker in actions. The default behavior for remote actions is to bring up a fresh worker for each execution and tear it down afterward, wiping all state. However, Docker is stateful, which meant that actions were performing a great deal of redundant work. To solve this, they moved the snapshotter (a part of containerd) out of the action sandbox and into the execution container, allowing it to be shared across all actions on a given machine.
The takeaway is that you must be careful with RE. Your intuition for how local processes interact, especially with local state, does not always apply, and you can inadvertently make builds much slower and more expensive. But how do you develop such intuition? That question provides a perfect segue to the next talk.
Understanding build behavior
Bazel is a complex piece of software, and its interactions with other systems are not always straightforward. When things go wrong, can the data tell us what happened? Users are often frustrated by unexpected cache misses, frequently rebuilt targets, non-hermetic actions, and flaky tests. This was the topic of a talk co-presented by Eloise Pozzi from Canva and Helen Altshuler from EngFlow.
The answer to the question above is obviously yes, the data can tell us. But it’s not easy because there is a lot of data to comb through. Bazel produces the following datasets:
Build Event Protocol (BEP): A stream of events that Bazel sends to a remote server to publish build metadata and report progress. The metadata is the closest thing you will get to "usage telemetry" from Bazel, as it captures all builds that were executed (who ran them and with which flags, what was executed, etc.). My pet peeve is that the BEP is incredibly complex and really difficult to manipulate post-facto, but I encourage you to generate one locally (via --build_event_json_file) and to spend "a few" minutes understanding what's in there.
Exec log: This log captures everything that happened for actions, regardless of where they ran (the BEP only contains minimal details on local-only actions). It is not captured by default due to its verbosity. The --execution_log_compact_file flag, available since Bazel 7.1, makes it possible to capture this log unconditionally. Note that you need a parser to convert this binary log into something that can be read and compared across versions, and you need to manually build this parser out of Bazel's source tree; yikes.
Exec graph log: This log captures how actions depend on each other. It can help quantify drag on the critical path, determine if the critical path is unique, or identify if there are competing ones. Use it to identify actions to prioritize for end-to-end build optimization.
Query commands: See the reference for bazel query, bazel cquery, and bazel aquery.
JSON profile: Also known as the performance profile, this captures a timeline of all actions executed by Bazel and can help understand build bottlenecks and tune parallelization.
RE profile: Similar to the JSON profile but this is captured server-side by some RE implementations. EngFlow generates one of these with specific details on how the workers executed actions (e.g. which pool ran an action, which is not something that’s visible to Bazel).
Returning to the BEP, it’s worth noting that one of the last messages it emits is buildToolLogs, which contains links to some of the other logs mentioned above. If you are using remote caching, these links will point to remote cache entries, allowing you to fetch them after the fact for any user build you need to investigate.
The talk also included a description of how to debug cache misses between CI and interactive developer builds, and I felt it was an almost-literal rehearsal of the article I wrote months ago on the same topic.
IDE support for monorepos
The next talk I attended, which hits close to my heart, was on exposing developer tools to the PATH. I believe that bazel run provides a terrible user experience, so I was keen to hear about alternatives. The talk was given by Florian Berchtold from Zipline.
One possible solution to this dilemma is to use direnv, a long-standing tool that hooks into the shell’s before-prompt command to run arbitrary code when entering specific directories. Scary? Yes. Useful? Also yes. The idea is to leverage direnv to bring project-specific tools into the PATH.
But where do these tools come from? While some people use package managers like nixpkgs, this can lead to duplication and inconsistencies in a Bazel-native world. For example, it’s common to pull in buildifier via bzlmod, but you might also want to expose it in the PATH. This is where bazel_env comes in: a hook for direnv that fetches tools using Bazel.
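I have not set up bazel_env myself yet, so take the following as a sketch of the shape of the integration rather than working configuration; the load label and attribute names are from memory and may well be off.

```starlark
# BUILD.bazel (sketch). The load label and attribute names here are
# approximations; check the bazel_env.bzl project for the real API.
load("@bazel_env.bzl", "bazel_env")

bazel_env(
    name = "bazel_env",
    tools = {
        # Tools already pulled in via bzlmod become available on the
        # PATH once direnv activates the generated environment.
        "buildifier": "@buildifier_prebuilt//:buildifier",
    },
)
```

Roughly speaking, the direnv side is then a small .envrc that runs this target and puts the generated bin directory on the PATH when you cd into the repository.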
The talk explained how to use bazel_env with dev containers to install bazelisk, direnv, IDE extensions, and even starpls (the LSP for Starlark). It was also mentioned that for C++, bazel-compile-commands-extractor with clangd works reasonably well for VSCode but struggles with large repositories. For those, configure-vscode-for-bazel is recommended for a better experience.
The speaker also prepared a sample repository to demonstrate Bazel integration in the IDE for various languages, which you can find at hofbi/bazel-ide.
Skycache
A few months ago, we hosted a Buildbarn mini-conference at Snowflake where, in my opinion, the most exciting talk was Ed Schouten’s presentation on Bonanza. Shortly after, I published an article imagining the next generation of Bazel builds, because Bazel’s fat client model is problematic in many scenarios.
At BazelCon, we now heard Google’s approach to solving slow cold builds and Bazel client scalability in a talk on Skycache by Shahan Yang.
The core idea of Skycache is to serialize and remotely cache Skyframe, Bazel’s internal tracking system for build state (also known as the “build graph”). In his talk, Shahan outlined three major considerations for making this solution viable:
Top-down pruning: When you get a cache hit for a node in the graph, you don’t have to worry about anything below that node anymore. You can throw away everything underneath to keep memory usage constrained.
Invalidation computation: To determine what needs to be re-fetched from the cache, Skycache assumes “the same baseline” and then looks for file changes between the local system and the cache to find “what’s missing”. I know, this sounds fuzzy; refer back to the talk for the specifics.
Efficiency: For some nodes, it’s cheaper to recompute them than to fetch them from the cache, and this was true for many nodes before applying two optimizations. One was in the nested sets data structure, because the original approach to serializing them caused a 10x space blowup. The other was around serializing individual node values, because most of the time, those values share internals across nodes.
Internal dogfooding of Skycache showed that some builds dropped from 46 to 13 seconds, with similar reductions observed for analyzed targets, loaded packages, and more.
On the server side, this solution is RAM-intensive (similar to Bazel’s in-memory representations) and is complicated by the fact that users want to build at older versions and with a high version cadence. To be effective, Skycache needs to maintain “thousands of base images”.
A specific insight toward the end of the talk was that, for Google as a whole, 2.5% of targets account for 90% of all targets built. This suggests a potential optimization where only those targets are cached, but this has not yet been implemented.
There is no open-source implementation of Skycache, but the talk provided hints about which classes would need to be implemented to make it work. It seems that it shouldn’t be too difficult: the serialization code is already in place, so all that’s required is integration with a key-value store and Git.
While this talk was fascinating, I can’t help but feel that Google’s solution is a bit strange. They are opting to maintain a fat Bazel client instead of moving it entirely to the cloud, as Bonanza is attempting to do, and this feels weird to me knowing how the rest of their infrastructure works (or used to work a few years back).
Dynamic actions and Buck 2
Yes, this was BazelCon, but given Buck’s spiritual heritage, it was no surprise to see some Buck 2 content. Andreas Herrmann from Tweag took the stage to compare Bazel and Buck 2’s approaches to the efficient compilation of Haskell, highlighting the key role of dynamic actions in Buck 2.
The core of the issue lies in how Haskell modules and libraries are compiled and exposed in the Haskell rules. The summary is as follows:
Modules are individual .hs source files. These act as the compilation unit.
Libraries are collections of modules, and are what's often modeled in the build via haskell_library rules. Therefore, library targets tend to group various modules.
Compiling an .hs file produces an .o object file but also a .hi interface file. Think of the latter as a precompiled header or an interface JAR.
To compile a module, we need the .hi files of its dependencies, not their .o files. This is the key difference between compiling a cc_library vs. a haskell_library, because in the C/C++ case, all individual sources can be compiled in any order, but in the Haskell case, they cannot.
With this in mind, the central question is: how can we parallelize the compilation of modules within a library when they must be compiled in dependency order?
In Bazel, the solution is to model the internal library modules as separate haskell_module rules, each with a static representation of its cross-module dependencies. However, this approach can be incredibly noisy. While Gazelle can help mitigate the issue, it is still not an ideal user experience.
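To make the noise concrete, here is a rough sketch of what the per-module modeling looks like. The rule and attribute names are approximations of the rules_haskell API rather than copied from it, and load statements are omitted.

```starlark
# BUILD file sketch (approximate rule/attribute names; loads omitted).
# Every module becomes its own target, and every cross-module import
# has to be spelled out statically (or generated by Gazelle).
haskell_module(
    name = "Data.Parser",
    src = "src/Data/Parser.hs",
)

haskell_module(
    name = "Data.Printer",
    src = "src/Data/Printer.hs",
    deps = [":Data.Parser"],  # module-level dependency, stated statically
)

haskell_library(
    name = "data",
    modules = [
        ":Data.Parser",
        ":Data.Printer",
    ],
)
```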
Buck 2, on the other hand, provides dynamic dependencies, which make it possible to infer the module-level dependency graph at build time. The idea is to have an action that runs ghc -m to emit the cross-module dependency “mini-graph” for a set of modules, and then use a dynamic action to generate module-level compilation actions with the correct dependencies.
Task execution via Starlark
One of my original critiques of Bazel in 2015 was that while Bazel is excellent at building, it is not well-suited for other workflows. The specific example I gave was that developers want to install the software they have just built (the equivalent of make install), which is not easy to model in Bazel.
Well, fear no more. Aspect Build is developing a solution to this problem with Starlark-defined tasks and a custom CLI to run them. I found this to be very exciting, and it was a “hot topic” at the hackathon that followed the conference.
The premise of the talk was that, even with Bazel, developers still often rely on auxiliary scripts to install tools, Makefiles to drive workflows like setting up test servers or linting code, and YAML files to define complex CI tasks. While all of this should ideally be expressed in Bazel, there is currently no good way to do so.
In essence, Bazel is missing a “task runner”, and this is where Aspect’s newly-announced Extension Language (AXL) comes in. It’s a Starlark dialect for running tasks, which requires the Aspect CLI to execute. The CLI is a companion tool to Bazel that once “replaced” Bazel but no longer does.
With the new AXL language, you can define tasks in a way that is very similar to defining rules: you create a Starlark function, receive a context, and can then perform “stuff”. The key difference between tasks and rules is that tasks can trivially execute a build with ctx.bazel.build. Even more exciting is the ability to iterate over the BEP events that the build emits and interact with them!
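AXL was only just announced and I have not written any of it myself, so the snippet below is a sketch of the shape described in the talk rather than verified syntax; apart from ctx.bazel.build, which was shown on stage, the names are my guesses.

```starlark
# tasks.axl (sketch). Only the general shape is from the talk; field
# names other than ctx.bazel.build are guesses for illustration.
def _deploy(ctx):
    # Run a build from within the task...
    result = ctx.bazel.build(["//server:image"])

    # ...and react to the BEP events that the build emitted.
    for event in result.events:
        if event.kind == "named_set_of_files":
            ctx.print("built: {}".format(event))

deploy = task(
    implementation = _deploy,
)
```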
The talk also demonstrated the use of WASM binaries for things like buildozer to write platform-agnostic AXL scripts that help with migrations and the like. But the sky is the limit here, and the new aspect-extensions GitHub organization is meant to collect all user-contributed tasks.
BUILD Foundation
The desire to create a Bazel foundation to protect the project and ecosystem, should Google ever “pull the plug”, was announced a year ago, but not much has seemed to happen since. In reality, a lot has been going on behind the scenes, but nothing has yet materialized for the average user.
As part of the unconference, we voted to have a BoF on the foundation to discuss its future.
The main question we tried to answer during the session was, “What could the foundation do?” Many ideas were brainstormed, including funding a technical writer, improving the quality of pull requests, maintaining important rulesets, and tackling tricky IDE integrations. However, the most popular idea was for the foundation to act as an intermediary between the community and Google, helping to prioritize the projects that the community needs most.
I think there is an AI transcript of the session somewhere but there is no recording. You’ll have to stay tuned for the news, or you can get involved via Slack. Reach out to Alex Eagle or Helen Altshuler.
Flagsets
When you have a small repository with a single project, you can easily record project settings (such as compilation targets and debug flags) in the top-level .bazelrc file. But what do you do when you start combining multiple projects into one repository? The build settings for a backend service might be different from those required for a frontend application.
PROJECT.scl files are here to help, and Susan Steinman and Greg Estren from Google were on hand to explain them.
The key problem being addressed is that while everyone intuitively understands what a “project” is, Bazel lacks a first-class representation of this concept. By introducing such a concept, the goal is to make bazel build //foo work consistently everywhere, without the need to specify any flags. This is the opposite of the current situation, where it is common for developers to create auxiliary scripts to run Bazel with different flags for different targets.
As for the format of PROJECT.scl, the presenters reminded us that .bazelrc is an ad-hoc language and one of the few places where Bazel does not use Starlark, despite the community’s desire for consistency. As a result, these new project files are written in the language we have all come to appreciate.
More specifically, PROJECT.scl files can appear multiple times in the directory tree, just like BUILD files. The first one found when walking up the tree from a given build target is the one that is used. The file contains a project definition, which in turn contains buildable units. These units can enforce different policies for flags, such as setting default flag values for a target or preventing users from modifying certain flags. Finally, it is also possible to define multiple configurations for a unit (e.g., release vs. development) and to switch between them using --scl_config=NAME.
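I have not seen the final schema, so the following is purely a hypothetical illustration of the concept as I understood it from the talk; the keys and structure are my guesses, not the actual format.

```starlark
# PROJECT.scl (hypothetical illustration; the real schema may differ).
# Targets under this directory would pick up these settings, and
# developers could switch configurations with --scl_config=release.
project = {
    "configs": {
        "debug": ["--compilation_mode=dbg", "--strip=never"],
        "release": ["--compilation_mode=opt"],
    },
    "default_config": "debug",
    # Guessed knob: how strictly user-supplied flag overrides are handled.
    "enforcement_policy": "warn",
}
```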
Target-aware workflows with bazel-diff
One mistake that everyone makes when moving to a monorepo is retaining operations that scale with the size of the monorepo instead of the size of the change. In particular, it is extremely common to see CI workflows that run bazel test //..., either executing all tests from scratch or hoping that remote caching will prevent the re-running of unmodified tests. This is a bad practice. The overhead is significant, and the end-user experience is often terrible, especially when flaky tests are present.
bazel-diff is a tool that helps determine the targets affected by a given code change, allowing Bazel to build and test only those targets. Maxwell Elliott and Connor Wybranowsky were on hand to share the impact that developing this tool at Tinder has had on the company’s developer workflows.
The initial results of deploying bazel-diff were a 40% reduction in CI times at the 90th percentile, and up to a 76% reduction in the worst case. These kinds of improvements were transformative for users. In particular, because CI flows became much faster and more accurate, developers began to take ownership of test breakages and flakiness.
Unfortunately, as is often the case with “transformative performance improvements” (remember SSDs or the M1 chip?), the codebase continued to grow and eventually consumed all the gains from bazel-diff.
To improve on the original deployment, the new approach is to integrate bazel-diff more deeply with CI. The idea is to dynamically generate pipelines based on the changes in a pull request and select which ones to run at review time. For example, if any of the modified files have automated formatters, only the formatter pipelines will be triggered.
To recap, the presenters mentioned that the end-to-end adoption of bazel-diff has helped them save up to 93% of their time in CI. While the extra gains beyond the initial 76% did not lead to the same kinds of cultural changes that were originally observed, developers always appreciate faster workflows.
Supply chain security
If you have attended previous BazelCons, you will know that supply chain security is a recurring topic. This year was no different, with Mark Zeren from Broadcom and Tony Aiuto from Datadog presenting the latest news in this area.
The reason this topic is relevant to the conference is that Bazel is a key tool for producing reliable SBOMs, thanks to its hermeticity, sandboxing features, and fine-grained build graph entries. However, it’s not quite there yet.
From the beginning, Bazel included rules_license as a way to define per-package licensing details. However, this ruleset was “thrown over the wall” by Google when Bazel was first open-sourced and has not been fit for purpose.
Today, there is a new ruleset called supply-chain, with only one person from Google on the eight-person team. This new ruleset focuses on two things: metadata rules that code authors can apply to their BUILD files, and tools to generate provenance information and produce SBOMs. These two components are separate because the metadata rules are designed to be stable over time, while the tools are expected to change frequently.
What is missing from the new ruleset is licensing: the ability to generate copyright notices, validate linkage, and so on.
Faster container builds with rules_img
As I mentioned earlier, your intuition about what works well for local actions may not apply to remote actions, and container image building is a prime example of this.
In this talk, Malte Poll from Tweag took the stage to introduce rules_img, a new ruleset that replaces rules_oci and rules_docker. It is designed to minimize large blob transfers, resulting in significantly more efficient container builds.
I do not have written notes on this talk because I was too focused on absorbing the many, many details it covered, so I strongly encourage you to watch the recording.
My talk on Java test slowness
To conclude this long recap, I will leave you with my own lightning talk on how our Java integration tests at Snowflake became significantly slower after we migrated from Maven to Bazel.
It’s only eight minutes long, but if you want the summary: Maven and Bazel compile Java code differently. Maven writes class files to a few directories on disk, while Bazel creates intermediate JAR files for every Java library. With Bazel’s more-detailed build graph, this causes an explosion in the CLASSPATH size and means that class files must be read from compressed JAR (ZIP) files instead of from disk.
I spent some time analyzing the problem and ruled out obvious factors like sandboxing and ZIP compression. I concluded that reading from JAR files is indeed slower than reading individual files from disk. (Why? I’m not sure, but I suspect there is an optimization that could be made in the class loader to fix this.)
To mitigate the problem, I created a new rule that uses the singlejar tool to merge all intermediate JARs into one. But this was easier said than done. The resulting combined JAR was huge and could not be reused across tests, so I had to develop a complex dependency-pruning rule to generate a combined JAR that could be reused across all tests without introducing class duplicates—all while remaining remote-execution-friendly.
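The actual rule we built is more involved (it performs the dependency pruning described above), but the core idea is roughly the following; treat this as an approximation rather than the code we run, and note that the load path for JavaInfo and the singlejar label can vary across Bazel and rules_java versions.

```starlark
# merged_test_jar.bzl (sketch). Collects the transitive runtime jars of
# the test's dependencies and merges them with singlejar, so the test
# reads one large jar instead of thousands of small ones.
load("@rules_java//java/common:java_info.bzl", "JavaInfo")  # path may vary

def _merged_jar_impl(ctx):
    jars = depset(transitive = [
        dep[JavaInfo].transitive_runtime_jars
        for dep in ctx.attr.deps
    ])
    out = ctx.actions.declare_file(ctx.label.name + "_merged.jar")

    args = ctx.actions.args()
    args.add("--output", out)
    args.add_all("--sources", jars)

    ctx.actions.run(
        executable = ctx.executable._singlejar,
        arguments = [args],
        inputs = jars,
        outputs = [out],
        mnemonic = "MergeTestJar",
    )
    return [DefaultInfo(files = depset([out]))]

merged_jar = rule(
    implementation = _merged_jar_impl,
    attrs = {
        "deps": attr.label_list(providers = [JavaInfo]),
        "_singlejar": attr.label(
            default = "@bazel_tools//tools/jdk:singlejar",  # label may vary
            executable = True,
            cfg = "exec",
        ),
    },
)
```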
With this new rule in place, we saw test runtimes drop by about 10 seconds per test, which brought them back to pre-Bazel levels. Combined with the improvements that Bazel brings to the build itself, this yielded an end-to-end reduction in test times.
And that’s a wrap! This has been more of a detailed summary than a brief recap, but my goal was to clean up and share all the notes I took during the conference. Apologies for the many talks I could not cover in this recap. Once again, head to the BazelCon 2025 YouTube playlist for all recordings.
If you are involved with Bazel at all or have any interest in build systems, I strongly encourage you to plan to attend next year. You’ll learn a lot from the talks of course, but what’s more, you’ll get to meet key people from tens of companies—people that hold the keys to how modern build tools and scalable development processes are being developed worldwide.


