Why Ember CLI uses Broccoli

Dive into Ember CLI's history with Broccoli.js, and learn about the thorny build tool problems it solves.

Transcript

To start off — what is Broccoli in the first place, and why is it used by Ember CLI at all?

The best way to answer this question is to go back in time before Ember CLI existed, and look at Ember's previous project management tool, which was called Ember App Kit. Ember App Kit provided some conventions and tooling to help devs write Ember applications. And just like Ember CLI, it needed to do things like compile JavaScript, Sass and Handlebars files, manage dependencies, minify code for production, run tests and more.

Now, one of the main libraries that Ember App Kit used under the hood was Grunt.js, which at the time was very popular. Grunt falls under a class of tools known as "task runners", which are libraries designed to provide nice high-level APIs for ordering and executing various tasks. And a "task" is just a general concept for any small program that you might run while doing your everyday application development.

Some of the tasks we run today in Ember CLI are things like ember build, ember serve, ember deploy, ember test, and ember generate. Ember App Kit had its own tasks similar to these, and it used Grunt to help execute them.

Now, as applications started to grow in size and complexity, one task in particular started to get really slow: the build task. This task is responsible for converting a project's input files into the final output files that get shipped to the browser — so doing things like

taking ES6 modules and compiling them to ES5
taking Sass files and compiling them to CSS
then taking those files and minifying and gzipping them

and so on. Building a project is the most performance-sensitive task, since it happens every time you change one of your project's source files. When you make a change to a source file, you need to wait until the project has rebuilt before you'll see that change in the browser. So if each build takes 10 seconds, and you're making dozens of changes to your codebase every day, this can put a real strain on your development workflow.

Besides being slow, builds can also get corrupted, for example if files change in the middle of a build that's currently taking place.

As these and other problems continued to escalate, it became clear that Grunt — while great as a high-level task runner — was insufficient to handle the increasingly complicated task of building client-side applications. Building was a hard problem, and it deserved a dedicated toolchain that was focused on keeping rebuilds fast, durable and consistent.

And here's where Broccoli came in. Broccoli was designed specifically to deal with the problem of building applications. It made a few key architectural decisions that would enable developers to easily express their build programs, while keeping rebuilds fast. Let's unpack the most important aspects of this architecture, starting with the one that's responsible for giving Broccoli its name: trees.

So, what is a tree?

To explain, let's first imagine that we were writing a build tool of our own. In order for our users to be able to use our tool to write their own build programs, we'd need to expose some functionality that would let them compile their project's files. Now, one way we might do this would be to expose a function that took in a file and returned a transformed file:

// One file in, one file out. `transform` stands in for the user's compiler.
function compile(file) {
  return transform(file);
}

This would let users write programs to, say, compile their ES6 source files into ES5 files using the Babel compiler. This seems like a good approach — but what about when our users also want to generate source maps for each input file? This function signature no longer works; our users need to be able to pass in a single ES6 file, and return two files from the transformation. We could require them to use two different functions to do this — one that returns the file and another that returns the source map — but that's not really a natural expression of what it is they're trying to do.

So what if instead we change our function signature to take in a directory and return a directory?

// One directory in, one directory out.
function compile(directory) {
  return transform(directory);
}

We know that directories can contain many files (as well as other directories), so now our use case of compiling an ES6 module into an ES5 module plus a source map is very easily expressible.
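To make that concrete, here's a minimal sketch of a directory-in, directory-out transform that emits two files for every input file. Everything in it is an assumption made for illustration: fakeCompile is a hypothetical stand-in for a real compiler like Babel, and the directory names and file contents are made up.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for a real compiler such as Babel: it returns the
// "compiled" code plus a placeholder source map for one source file.
function fakeCompile(fileName, source) {
  return {
    code: source.replace(/\bconst\b/g, "var"),
    map: JSON.stringify({ version: 3, file: fileName, sources: [fileName] }),
  };
}

// Directory in, directory out: every .js file in inputDir produces two files
// (the compiled file and its source map) in a fresh output directory.
function compileDirectory(inputDir) {
  const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "compiled-"));
  for (const name of fs.readdirSync(inputDir)) {
    if (!name.endsWith(".js")) continue;
    const source = fs.readFileSync(path.join(inputDir, name), "utf8");
    const { code, map } = fakeCompile(name, source);
    fs.writeFileSync(path.join(outputDir, name), code);
    fs.writeFileSync(path.join(outputDir, name + ".map"), map);
  }
  return outputDir;
}

// Usage: one source file goes in, two files come out.
const inputDir = fs.mkdtempSync(path.join(os.tmpdir(), "src-"));
fs.writeFileSync(path.join(inputDir, "app.js"), "const answer = 42;\n");
const outputDir = compileDirectory(inputDir);
console.log(fs.readdirSync(outputDir).sort()); // [ 'app.js', 'app.js.map' ]
```

Notice that the caller never needs a second function for the source map; it simply shows up as another file in the returned directory.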

This API also accounts for transformations that take multiple files as their input and produce a single output file: for example, taking many Sass files, running them through a Sass compiler (which follows all the @import statements), and producing a single CSS output file.
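Here's a rough sketch of that many-files-to-one-file shape. It's not a real Sass compiler: bundleStyles is a hypothetical helper that just concatenates stylesheets in name order and doesn't resolve any @import statements, and the file names are invented for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for a Sass compiler: it only concatenates the
// stylesheets in name order and does not resolve @import statements.
function bundleStyles(inputDir, outputFileName) {
  const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "styles-out-"));
  const bundled = fs
    .readdirSync(inputDir)
    .filter((name) => name.endsWith(".scss"))
    .sort()
    .map((name) => fs.readFileSync(path.join(inputDir, name), "utf8"))
    .join("\n");
  fs.writeFileSync(path.join(outputDir, outputFileName), bundled);
  return outputDir;
}

// Usage: two input files in one directory, one output file in another.
const stylesDir = fs.mkdtempSync(path.join(os.tmpdir(), "styles-in-"));
fs.writeFileSync(path.join(stylesDir, "a-base.scss"), "body { margin: 0; }");
fs.writeFileSync(path.join(stylesDir, "b-nav.scss"), "nav { display: flex; }");

const cssDir = bundleStyles(stylesDir, "app.css");
console.log(fs.readdirSync(cssDir)); // [ 'app.css' ]
```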

So, using directories makes all these scenarios much easier to express.

Now, there are even cases where users would want to take two separate directories, in two entirely different locations on disk, and transform both of them into a single output directory. As an example, think about merging an images folder and a fonts folder into a single public folder.

For this reason, we want to update our function signature again. Let's make it take in an array of directories, and return a single directory:

// An array of directories in, a single directory out.
function compile(directories) { // e.g. [dir1, dir2]
  return transform(...directories);
}

This API accommodates all the situations we've mentioned so far, and it is essentially the API that Broccoli itself ended up exposing as a way to author build transformations.
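As a sketch of that final signature, here's the images-plus-fonts example written as a function that takes an array of directories and returns a single directory. The names mergeDirectories and copyInto are hypothetical, and this is an illustration of the shape of the API rather than Broccoli's own implementation. In this sketch, if two inputs contain a file with the same name, the later directory wins.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Recursively copy the contents of srcDir into destDir.
function copyInto(srcDir, destDir) {
  for (const entry of fs.readdirSync(srcDir, { withFileTypes: true })) {
    const from = path.join(srcDir, entry.name);
    const to = path.join(destDir, entry.name);
    if (entry.isDirectory()) {
      fs.mkdirSync(to, { recursive: true });
      copyInto(from, to);
    } else {
      fs.copyFileSync(from, to); // overwrites a file of the same name
    }
  }
}

// Array of directories in, single directory out.
function mergeDirectories(inputDirs) {
  const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "public-"));
  for (const dir of inputDirs) copyInto(dir, outputDir);
  return outputDir;
}

// Usage: two directories in different locations become one public directory.
const imagesDir = fs.mkdtempSync(path.join(os.tmpdir(), "images-"));
const fontsDir = fs.mkdtempSync(path.join(os.tmpdir(), "fonts-"));
fs.writeFileSync(path.join(imagesDir, "logo.png"), "fake image bytes");
fs.writeFileSync(path.join(fontsDir, "body.woff"), "fake font bytes");

const publicDir = mergeDirectories([imagesDir, fontsDir]);
console.log(fs.readdirSync(publicDir).sort()); // [ 'body.woff', 'logo.png' ]
```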

Now, so far we've been talking about plain old files and directories, which are just standard concepts from the filesystem that we're all familiar with. So what are these trees that we keep hearing about?

Well, let's suppose we were done designing our build tool, and for our main interface we landed on the previous function that takes in an array of directories and outputs a single directory. Now, JavaScript applications have become quite complex, and our users will be running their project files through many transformations: compilation, compression and so on. An application of any reasonable size won't be able to just read all of its source files into JavaScript memory, perform all these transformations and then spit out the result; the memory footprint is way too big. What our users will inevitably end up doing is writing out the intermediate results of their build transforms to temporary files on disk. But now, they'll need to manage the locations of those temporary files as they move their app through the rest of the build. Dealing with the filesystem adds a whole new layer of complexity to the problem, and managing temporary files can be quite tricky: our users now have to worry about issues like collisions, deadlocks, and inconsistencies across different operating systems.

So what if instead we offload that burden from our users and manage those temporary files ourselves? Users would still write functions that transform multiple directories and return a single directory, but they wouldn't be concerned with the actual root path on disk. By reading from and writing to managed directories, they wouldn't have to worry about file cleanup, deadlocks or any other thorny issues associated with async file I/O. But because they're still dealing with real directories under the hood, our users wouldn't have to learn an entirely new API; they'd still be able to use any existing tool or library that works with the filesystem and normal file handles.

And this is precisely what a Broccoli tree is. It's a small abstraction around filesystem directories that allows Broccoli to manage all the intermediate artifacts of your build program. The fact that a tree gives you direct access to the underlying filesystem means that transforms can be anything from Node modules to C++ compilers: basically any program that runs on the command line.
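To give a feel for what "managed" means here, this is a small sketch of a helper that owns all the intermediate directories of a build. ManagedDirectories is a hypothetical class invented for this example; it is not Broccoli's real API, just one way the idea could be expressed.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical manager, not Broccoli's real API: it owns every intermediate
// directory so that transforms never pick or clean up paths themselves.
class ManagedDirectories {
  constructor() {
    this.root = fs.mkdtempSync(path.join(os.tmpdir(), "build-"));
    this.count = 0;
  }

  // Hand out a fresh, collision-free directory for one transform's output.
  allocate(label) {
    const dir = path.join(this.root, `${label}-${this.count++}`);
    fs.mkdirSync(dir);
    return dir;
  }

  // Remove every intermediate artifact in one go.
  cleanup() {
    fs.rmSync(this.root, { recursive: true, force: true });
  }
}

// Usage: the transform writes to a real directory without choosing its path.
const dirs = new ManagedDirectories();
const compiledDir = dirs.allocate("compiled");
fs.writeFileSync(path.join(compiledDir, "app.js"), "var answer = 42;");

// Any ordinary filesystem tool could read compiledDir at this point.
const existedDuringBuild = fs.existsSync(path.join(compiledDir, "app.js"));
dirs.cleanup();
const existsAfterCleanup = fs.existsSync(compiledDir);
console.log(existedDuringBuild, existsAfterCleanup); // true false
```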

And trees are the only primitive that Broccoli is aware of. Because of this, trees give us a great composition story: every transform operates on trees and returns a tree, so these transforms compose with each other out of the box.
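Here's a minimal sketch of that composition story, using plain directories in place of trees. The merge and concat functions are hypothetical transforms written for this example, and they assume flat directories with no subfolders.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const freshDir = () => fs.mkdtempSync(path.join(os.tmpdir(), "tree-"));

// Transform 1: gather every file from all input directories into one directory.
function merge(inputDirs) {
  const outputDir = freshDir();
  for (const dir of inputDirs) {
    for (const name of fs.readdirSync(dir)) {
      fs.copyFileSync(path.join(dir, name), path.join(outputDir, name));
    }
  }
  return outputDir;
}

// Transform 2: join every file from the input directories into one bundle.
function concat(inputDirs, bundleName) {
  const outputDir = freshDir();
  const parts = [];
  for (const dir of inputDirs) {
    for (const name of fs.readdirSync(dir).sort()) {
      parts.push(fs.readFileSync(path.join(dir, name), "utf8"));
    }
  }
  fs.writeFileSync(path.join(outputDir, bundleName), parts.join("\n"));
  return outputDir;
}

const appDir = freshDir();
fs.writeFileSync(path.join(appDir, "a.js"), "var a = 1;");
const vendorDir = freshDir();
fs.writeFileSync(path.join(vendorDir, "b.js"), "var b = 2;");

// The output of one transform is a valid input for the next.
const bundleDir = concat([merge([appDir, vendorDir])], "bundle.js");
console.log(fs.readFileSync(path.join(bundleDir, "bundle.js"), "utf8"));
```

Because both functions share the same "directories in, directory out" shape, neither one needs to know anything about the other.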

So, now we know what a tree is, and also where Broccoli gets its name from. And we also know that Broccoli lets you write build programs by manipulating these trees. Now let's cover one last point about Broccoli's architecture, which has to do with how it ensures that your build program stays fast during rebuilds of your application.

In order to keep rebuilds fast, Broccoli relies heavily on caching. But you might be surprised to learn that Broccoli core itself doesn't actually cache any build artifacts. Instead, it delegates all of its caching to the plugin layer.

Broccoli plugins are essentially the transforms we described earlier. We'll cover their API later in this series, but for now it's enough to know that a Broccoli plugin is the main interface exposed by Broccoli: a plugin reads from an array of trees, and writes a single tree as its output.

But plugins also have the ability to write to a cache. How much or how little a plugin caches is up to that plugin, and different plugins that leverage different compilers will use different caching strategies. In this way, Broccoli's architecture stays agnostic about how each plugin caches its transformations, and instead focuses on staying out of the way so that developers can write fast plugins.

This means that Broccoli does not do partial rebuilds of your application. So when you change an ES6 file, Broccoli is going to rerun the entire Babel plugin on your whole tree of files. If it were to do a partial rebuild, Broccoli itself would need to understand the entire dependency graph of your ES6 code, and that's just for the Babel transformation. So, instead of pushing knowledge of how to build dependency graphs from ES6 code, from Sass files, from Handlebars templates and more into its core library, Broccoli simply sidesteps this problem and lets plugins and compilers handle it in a way that makes the most sense for each transformation.

As a result, Broccoli will actually rebuild your entire application any time an input file is changed. But most plugins will know that they don't have to rebuild all their trees, and instead just return their cached output — which is why Ember CLI rebuilds remain fast. This means that, practically speaking, if you change a CSS file in your Ember CLI application, you won't be recompiling all your Handlebars templates, or re-running all your JavaScript code through Babel, because those plugins will just return their cached outputs.
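One way to picture this is the sketch below, where every rebuild calls every plugin, but each plugin decides for itself whether it has real work to do. CachingPlugin and fingerprint are hypothetical names, and hashing the input directory is just one possible caching strategy chosen for the example; it is not how Broccoli's actual plugin API or any particular plugin works.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Fingerprint a flat directory from its file names and contents.
function fingerprint(dir) {
  const hash = crypto.createHash("sha256");
  for (const name of fs.readdirSync(dir).sort()) {
    hash.update(name);
    hash.update(fs.readFileSync(path.join(dir, name)));
  }
  return hash.digest("hex");
}

// Hypothetical plugin, not Broccoli's real plugin API: it redoes its work only
// when the fingerprint of its input directory changes.
class CachingPlugin {
  constructor(transformFile) {
    this.transformFile = transformFile;
    this.lastFingerprint = null;
    this.lastOutputDir = null;
    this.runs = 0; // how many times real work was done
  }

  build(inputDir) {
    const current = fingerprint(inputDir);
    if (current === this.lastFingerprint) return this.lastOutputDir; // cache hit
    const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), "plugin-"));
    for (const name of fs.readdirSync(inputDir)) {
      const source = fs.readFileSync(path.join(inputDir, name), "utf8");
      fs.writeFileSync(path.join(outputDir, name), this.transformFile(source));
    }
    this.runs += 1;
    this.lastFingerprint = current;
    this.lastOutputDir = outputDir;
    return outputDir;
  }
}

const templatesDir = fs.mkdtempSync(path.join(os.tmpdir(), "templates-"));
const stylesDir = fs.mkdtempSync(path.join(os.tmpdir(), "styles-"));
fs.writeFileSync(path.join(templatesDir, "index.hbs"), "<h1>{{title}}</h1>");
fs.writeFileSync(path.join(stylesDir, "app.css"), "body { margin: 0; }");

const templates = new CachingPlugin((source) => source.trim());
const styles = new CachingPlugin((source) => source.replace(/\s+/g, " "));

// Every rebuild calls every plugin; unchanged plugins return cached output.
function rebuild() {
  return [templates.build(templatesDir), styles.build(stylesDir)];
}

rebuild();
fs.writeFileSync(path.join(stylesDir, "app.css"), "body { margin: 1px; }");
rebuild();
console.log(templates.runs, styles.runs); // 1 2
```

After the CSS change, the whole build program runs again, but only the styles plugin does any real work.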

So, that's an overview of how Broccoli works. Let's recap with a quick summary.

Broccoli gives us a low-level composable DSL for writing out a build definition. Broccoli then takes that definition, turns it into a program, and reruns that program each time any one of your project's files changes. Broccoli's architecture and plugin caching make these rebuilds fast, consistent and durable.

Broccoli's core primitive is a tree, which you can think of as a managed directory. A tree's root location on disk is obscured from you, but you can work with it just as if it were a directory. By managing your directory for you, Broccoli takes care of many thorny filesystem issues like creating and cleaning up temp directories and avoiding deadlocks. In return, you get to write build programs using the full power of the filesystem as your core API.

Finally, build definitions are written using a series of plugins. A plugin is a class that exposes a small interface that reads and writes trees. Plugins compose with each other, meaning we get to write our build programs in the same way that we write normal programs: using function composition, named variables and so on. In this way, you can really think of Broccoli as a programming environment for build pipelines.

In the next video, we'll dive into some code and take a first look at Broccoli's API.
