Check in your node_modules folder

Sep 27, 2019

It is considered a best practice to add your node_modules folder to your .gitignore file when you are building your JavaScript application. That way all your dependencies are not included in the git repository. You are only checking in the package.json and the package-lock.json. Your colleagues and your CI simply run npm ci or npm install to download the actual code from the npm registry. Afterwards they can build the application.

So why even think about adding all those files to your repo? Both approaches have their pros and cons, and in the end it is up to you and your team to decide how you want to deal with them.

Keep the dependencies outside your git repository

This is the common workflow and widely accepted. Keeping the dependencies outside your repository has several advantages:

  1. The git repository stays small and clean

    Our node_modules folder is 458MB large and increases our repository size significantly. It also makes all git commands slightly slower because git has to track more items. An initial git clone is slower since we have to download more files.

    $ du -sh node_modules/
    458M    node_modules/

  2. The registry is our single source of truth

    When you add your dependencies to the repo you create a copy. The package now lives both in our repo and in the registry. We no longer get updates automatically. Even if one of our dependencies publishes a new patch version, e.g. 3.6.2 replacing 3.6.1, we will not pull this version until we update manually.

  3. Using native modules is a piece of cake

    Native modules like node-sass have to be compiled on your machine. npm and Node.js will take care of the heavy lifting when you run npm install node-sass. The workflow will be different and a little more complicated when you compile native modules manually. However, as we will see later on, it is not a dealbreaker.

  4. Code review for package updates is easy

    When you update one of your dependencies you simply increase the version inside your package.json and package-lock.json. Code review for such a pull request is super easy. We usually create a single pull request where we update all of our dependencies in one go.

Now that we have learned why it might be a good idea to keep your dependencies out of version control let us take a look at the pro arguments for adding them to our repository.

Add the dependencies to your git repository

This approach is a bit more controversial. It has some downsides but in our opinion the advantages outweigh the disadvantages by far. Let us take a look at some arguments for adding the dependencies to your git repository.

  1. Building offline

    That might sound crazy but we can build even if the internet connection is broken. It happens to all of us especially in a corporate environment where firewall settings change, new rules are introduced, network infrastructure must follow strict security guidelines, and so on. You can even build your project while working on a train or on an airplane.

  2. Reproducible builds

    Since we have everything we need inside our repository we have 100% reproducible builds. That makes it easy to work across multiple machines, regions and operating systems. We never have the excuse "... but it works on my machine ...". We can go back in time to some earlier release and have the right dependencies for building the exact same bundle from a previous release. When we are done and want to continue on the latest commit from the master branch we do not have to reinstall our dependencies. It cannot happen that we have an old version of the node_modules folder but new application code.

  3. Faster Continuous Integration

    The default install command on Travis CI for JavaScript projects is either npm install or npm ci, depending on whether Travis CI finds a package-lock.json. In our case installing dependencies took roughly 20 seconds. We can completely skip this step because we have all dependencies already within our repo. At some CI providers you pay for the total amount of time you have used their virtual machines. So by reducing our build times we can save some real money. Imagine pushing ten times a day and each run takes 20 seconds less. That is 200 seconds per day, and with 20 working days in a month we save 4000 seconds, or roughly 67 minutes, per month.

  4. Less reliance on external services

    We usually build our binaries and bundles inside our Continuous Integration. Those artifacts are copied to some permanent storage or are attached to GitHub releases. Now imagine you have an internal project and you would like to release the next version. You push all changes, tag your release and wait for CI to build your bundle. However, your CI never finishes and eventually times out. It cannot reach the registry and does not have all dependencies to build your project. That is definitely an issue. No website has an uptime of 100%. We do not know what the future holds. We do not even know if npm is still around in 10 years.

  5. Simpler workflow

    When new developers join the team they usually run the following commands to get started.

    $ git clone git@github.com:company/abc.git
    $ npm ci
    $ npm run dev

    With the new setup we can remove the second step npm ci. We do not have to download any dependencies. They are already on disk after we ran git clone .... It is only one line and npm ci is so much better than the old npm install but still, it is easier to get started.
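The savings estimated under "Faster Continuous Integration" can be reproduced with a small calculation. This is only a back-of-the-envelope sketch. The function name monthlySavings is made up for illustration, and the inputs are the figures quoted in the text.

```javascript
// Back-of-the-envelope estimate; all inputs are the figures quoted in the text.
function monthlySavings({ secondsPerRun, runsPerDay, workingDaysPerMonth }) {
  const seconds = secondsPerRun * runsPerDay * workingDaysPerMonth
  return { seconds, minutes: seconds / 60 }
}

const saved = monthlySavings({
  secondsPerRun: 20,
  runsPerDay: 10,
  workingDaysPerMonth: 20
})

// prints "4000 s, about 67 min per month"
console.log(`${saved.seconds} s, about ${Math.round(saved.minutes)} min per month`)
```

Plug in your own install time and push frequency to see whether the effect matters for your team.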

Even the engineers at Facebook check in all dependencies and, on top of that, binaries like Node.js and npm/yarn.

What also causes us to stay awake all night is that both GitLab and GitHub now have their own registries. You could end up in a situation where dependency A comes from npm, B from GitLab and C from GitHub. To be able to install all three packages you have to change the registry every single time.

$ npm config set registry https://registry.company.com

We hope we have shown you some real benefits of adding node_modules to your git repo. Now that we want to do it, the next question is how we are actually going to do it. The following recipe worked for us and we think it will be very similar for you.

Recipe for adding node_modules

First of all remove node_modules from your .gitignore file. This is huge and you will suddenly see a lot of untracked files on the command line and within your editor of choice. Your editor might even complain about too many files it has to track. Don't worry about it and ignore the warning. In our case we had to modify our .gitignore a little more. Before it looked like this.

# .gitignore
dist
node_modules
coverage
.vscode
.cache

We removed node_modules and modified the dist line. In our project we have a top level dist folder and we did not want to include it in our repository. Without thinking more about it we simply added dist to our .gitignore. Unfortunately by simply writing dist you tell git to ignore every single dist folder inside your project. Git even ignores the dist folders deep inside the node_modules directory. However you need them to make your build work. By adding a slash and changing the line to /dist you are telling git to only ignore our top level folder. The same applies to bin if that is part of your .gitignore file.

# .gitignore
/dist
coverage
.vscode
.cache
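The difference between dist and /dist can be illustrated with a deliberately simplified model. This is not git's real matcher, which also handles globs, negation and trailing slashes. The function isIgnored and the package name some-package are made up for this sketch.

```javascript
// Simplified model of the two rules discussed above; git's real matcher
// also handles globs, negation and trailing slashes, which are ignored here.
function isIgnored(pattern, filePath) {
  const segments = filePath.split('/')
  if (pattern.startsWith('/')) {
    // anchored: only an entry directly below the repository root matches
    return segments[0] === pattern.slice(1)
  }
  // unanchored: an entry with that name matches at any depth
  return segments.includes(pattern)
}

// 'some-package' is a made-up dependency name
const nested = 'node_modules/some-package/dist/index.js'

console.log(isIgnored('dist', nested))         // true
console.log(isIgnored('/dist', nested))        // false
console.log(isIgnored('/dist', 'dist/app.js')) // true
```

To check what git itself decides for a given path, git check-ignore -v <path> reports the rule that matched.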

If your developers work on different operating systems, the binaries in node_modules/.bin/ will not work everywhere. Here npm creates symbolic links to the corresponding executables deep inside the node_modules folder. What ends up in this folder differs between Windows, Linux and Mac.

$ ls -la node_modules/.bin/
acorn -> ../acorn/bin/acorn
ansi-html -> ../ansi-html/bin/ansi-html
atob -> ../atob/bin/atob.js
autoprefixer -> ../autoprefixer/bin/autoprefixer
prettier -> ../prettier/bin-prettier.js

So instead of using the short notation within your npm scripts you have to specify the whole path for every single executable. You could even go as far as telling npm not to create the symlinks at all by using the --no-bin-links flag.

$ npm i prettier --no-bin-links

Take a look at our package.json which has a script to run Prettier.

"scripts": {
  "prettier": "prettier --write \"./**/*.{js,jsx,json,md,ts,tsx}\""
},

You cannot use this script across all platforms when you have checked in the node_modules folder. You have to use the full path and run it through node, because Windows does not honor the shebang line of the script.

"scripts": {
  "prettier": "node node_modules/prettier/bin-prettier.js --write \"./**/*.{js,jsx,json,md,ts,tsx}\""
},

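If you do not want to hard-code the file name of every executable, the path can be looked up from the "bin" field of the package's own package.json. The helper below is a minimal sketch. resolveBin is a hypothetical name and not part of npm or Node.js.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: find the script a package exposes through its "bin" field.
function resolveBin(packageDir, binName) {
  const manifest = path.join(packageDir, 'package.json')
  const pkg = JSON.parse(fs.readFileSync(manifest, 'utf8'))

  // "bin" is either a single string (the command is named after the package)
  // or an object mapping command names to files
  const relative = typeof pkg.bin === 'string' ? pkg.bin : (pkg.bin || {})[binName]
  if (!relative) {
    throw new Error(`no bin entry "${binName}" found in ${manifest}`)
  }
  return path.join(packageDir, relative)
}

// Example, assuming Prettier is checked in below node_modules:
// resolveBin('node_modules/prettier', 'prettier')
```

The returned path can then be passed to node, for example through child_process, instead of relying on the links in node_modules/.bin/.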
Now all of our dependencies should more or less work. What does not work are native modules like node-sass. They are written in C/C++ and are not platform independent. It took us three attempts to get things right.

First attempt

Our first attempt was to download the node-sass binaries from their releases page. We got them for all of our platforms (darwin, win32, linux) and checked them into version control. We created a dependencies folder for them.

$ tree dependencies
dependencies
├── darwin-x64-72_binding.node
├── linux-x64-72_binding.node
└── win32-x64-72_binding.node

0 directories, 3 files

We removed the direct node-sass dependency by running npm uninstall node-sass. This removes the node_modules/node-sass folder and all entries in your package.json and package-lock.json. We would then compile from source on our developer machines and during Continuous Integration using the --sass-binary-path argument.

$ npm install node-sass --sass-binary-path="dependencies/linux-x64-72_binding.node"

The first CI run failed fast.

Error: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/travis/build/HBM/dse/node_modules/node-sass/vendor/linux-x64-72/binding.node)

Our toolchain was too old and we could not build the native node-sass module. So we told Travis CI to use a newer Ubuntu which brings along a better gcc version.

# .travis.yml
dist: bionic

We used a tiny Node.js script to automatically detect the current platform and compile node-sass using the appropriate bindings.

// scripts/install.js
const execSync = require('child_process').execSync

const command = `npm install node-sass --sass-binary-path="dependencies/${process.platform}-x64-72_binding.node"`

const options = { stdio: 'inherit' }
execSync(command, options)
execSync('npm rebuild node-sass', options)
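The script above hard-codes x64 and 72. Judging by the file names in our dependencies folder, the pattern is platform, architecture and the Node.js module ABI version, which Node.js exposes as process.versions.modules (72 under Node.js 12). As a sketch, and assuming that naming pattern holds, the file name could be derived like this. bindingFileName is a made-up helper name.

```javascript
// Derive the binding file name from the running Node.js process.
// Assumption: files are named <platform>-<arch>-<abi>_binding.node,
// as in the dependencies folder shown above.
function bindingFileName(
  platform = process.platform,
  arch = process.arch,
  abi = process.versions.modules
) {
  return `${platform}-${arch}-${abi}_binding.node`
}

// The result depends on the machine and Node.js version running the script.
console.log(bindingFileName())

// Explicit values reproduce one of the checked-in file names:
console.log(bindingFileName('linux', 'x64', '72')) // linux-x64-72_binding.node
```

Keep in mind that a derived name only helps if the matching binary has actually been downloaded into the dependencies folder.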

Since we have all dependencies checked into version control and build node-sass from source, Travis CI should not install anything else during CI. We had to override the default install command.

# .travis.yml
install: node scripts/install.js

That all worked great and we thought we were done. Unfortunately we forgot that node-sass has its own dependencies.

// node_modules/node-sass/package.json
"dependencies": {
  "async-foreach": "^0.1.3",
  "chalk": "^1.1.1",
  "cross-spawn": "^3.0.0",
  "gaze": "^1.0.0",
  "get-stdin": "^4.0.1",
  "glob": "^7.0.3",
  "in-publish": "^2.0.0",
  "lodash": "^4.17.11",
  "meow": "^3.7.0",
  "mkdirp": "^0.5.1",
  "nan": "^2.13.2",
  "node-gyp": "^3.8.0",
  "npmlog": "^4.0.0",
  "request": "^2.88.0",
  "sass-graph": "^2.2.4",
  "stdout-stream": "^1.4.0",
  "true-case-path": "^1.0.2"
},

They all show up when checking the current git status. The additional files were created during the compilation.

$ git status -s
?? node_modules/.bin/in-install
?? node_modules/.bin/in-publish
?? node_modules/.bin/node-gyp
?? node_modules/.bin/node-sass
?? node_modules/.bin/nopt
?? node_modules/.bin/not-in-install
?? node_modules/.bin/not-in-publish
?? node_modules/.bin/sassgraph
?? node_modules/.bin/strip-indent
?? node_modules/amdefine/
?? node_modules/are-we-there-yet/
?? node_modules/async-foreach/
?? node_modules/block-stream/
?? node_modules/camelcase-keys/
?? node_modules/console-control-strings/
?? node_modules/delegates/
?? node_modules/fstream/
?? node_modules/gauge/
?? node_modules/gaze/
?? node_modules/globule/
?? node_modules/has-unicode/
?? node_modules/in-publish/
?? node_modules/is-finite/
?? node_modules/js-base64/
?? node_modules/meow/
?? node_modules/node-gyp/
?? node_modules/node-sass/
?? node_modules/nopt/
?? node_modules/npmlog/
?? node_modules/os-homedir/
?? node_modules/osenv/
?? node_modules/redent/
?? node_modules/repeating/
?? node_modules/sass-graph/
?? node_modules/scss-tokenizer/
?? node_modules/stdout-stream/
?? node_modules/strip-indent/
?? node_modules/tar/
?? node_modules/trim-newlines/
?? node_modules/true-case-path/
?? node_modules/wide-align/

So our first attempt had two major disadvantages. First of all Continuous Integration took even longer than before. We were building node-sass from source every single time we checked something in. That is totally not what we intended. Secondly we were left with a lot of untracked files. As a developer you had to be very careful to select the right files when committing a new feature. You were not able to simply add everything to the stage with the git add . command. So we had to look for another solution.

Second attempt

During our failed first attempt we learned that node-sass is looking for the binaries in the node_modules/node-sass/vendor folder. So we actually installed node-sass using npm install node-sass to get all of its dependencies and additional files. We manually removed the node_modules/node-sass/vendor folder because it only contained the single binary for the platform we ran the npm install command on. We also told git to exclude the vendor folder from our repository.

# .gitignore
/node_modules/node-sass/vendor

We modified our helper script to create this folder and to copy the binaries from our dependencies folder to it.

// scripts/install.js
const fs = require('fs')
const path = require('path')

// create vendor directory if it does not exist
const vendor = 'node_modules/node-sass/vendor'
if (!fs.existsSync(vendor)) {
  fs.mkdirSync(vendor)
}

const platforms = ['darwin', 'linux', 'win32']

platforms.forEach(platform => {
  // create directories if they do not exist
  const directory = `node_modules/node-sass/vendor/${platform}-x64-72`
  if (!fs.existsSync(directory)) {
    fs.mkdirSync(directory)
  }

  // copy files into their vendor directories
  fs.copyFileSync(
    `dependencies/${platform}-x64-72_binding.node`,
    path.join(directory, 'binding.node')
  )
})
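On Node.js 10.12 or newer the existence checks can be dropped, because fs.mkdirSync accepts a recursive option that creates missing parent folders and does not fail if the folder already exists. The following is a sketch of the same script written that way. The function copyBindings and its parameters are our own naming for illustration.

```javascript
const fs = require('fs')
const path = require('path')

// Copy prebuilt bindings into the vendor folder of node-sass.
// sourceDir and vendorDir are parameters only to keep the sketch reusable.
function copyBindings(sourceDir, vendorDir, platforms, suffix = 'x64-72') {
  for (const platform of platforms) {
    const target = path.join(vendorDir, `${platform}-${suffix}`)

    // recursive: true creates missing parents and tolerates existing folders
    fs.mkdirSync(target, { recursive: true })

    fs.copyFileSync(
      path.join(sourceDir, `${platform}-${suffix}_binding.node`),
      path.join(target, 'binding.node')
    )
  }
}

// Usage with the folders from this article:
// copyBindings('dependencies', 'node_modules/node-sass/vendor', ['darwin', 'linux', 'win32'])
```

Running it twice is harmless: the folders are reused and the binaries are simply overwritten.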

That solved our problem from the first attempt and we were not left with a bunch of untracked files. However it was not ideal yet. We still had to run the script in order to copy the binaries into our node_modules folder. It was this single step we initially thought we could get rid of.

Third attempt and final solution

The solution is pretty simple. We manually copied the binaries for all platforms to the node_modules/node-sass/vendor/ folder and checked them all in. We finally had what we were looking for and we could even get rid of the additional install step. The only thing left is to tell Travis CI to skip the install step.

# .travis.yml
install: echo we are already good to go. nothing needs to be done.
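With the install step gone, a binary that was forgotten during check-in only shows up once the build itself fails. A small sanity check can catch that earlier. The sketch below is hypothetical and not part of our setup. missingBindings is a made-up name, and the folder layout is the one described in this article.

```javascript
const fs = require('fs')
const path = require('path')

// Return the expected binding files that are not present in the vendor folder.
function missingBindings(vendorDir, platforms, suffix = 'x64-72') {
  return platforms
    .map(platform => path.join(vendorDir, `${platform}-${suffix}`, 'binding.node'))
    .filter(file => !fs.existsSync(file))
}

// Usage with the folders from this article:
// const missing = missingBindings('node_modules/node-sass/vendor', ['darwin', 'linux', 'win32'])
// if (missing.length > 0) {
//   console.error(`missing node-sass bindings:\n${missing.join('\n')}`)
//   process.exit(1)
// }
```

Whether such a check is worth an extra script depends on how often the binaries change in your project.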

Conclusion

In the end the solution was quite easy. We needed some time and multiple attempts to get there. We learned more and more along the way and finally found a solution we like. Adding the node_modules folder to our repository is also new for us. It is an experiment and we will learn from it. It may affect us during our daily business but so far we are quite happy with the final result.

This is not just about JavaScript and npm. Every programming language that has a central registry for dependencies is affected by the issues mentioned above. We have seen the same problems with Python and PyPI, Ruby and RubyGems, Java and Maven, Rust and Cargo, and so on. Having everything in one central place lets us finally sleep at night.