November 29, 2019

How I Deploy This Thing

When I initially set out to build this website, I had planned to spend most of my time building the actual application. In the end, I spent a majority of my time figuring out how to create a sustainable deployment pipeline. After some digging, I developed a deployment strategy that worked well for my use case—allowing me to get my hands dirty.

From a high level, the deployment strategy I chose pushes to a remote bare repository that runs a deployment script when it receives incoming changes. Outlined here are the components and techniques I used to stitch it all together.

Droplet

Choosing a VPS provider was straight-forward. I wanted to keep my budget to a minimum. The web app itself would be no larger than 500MB, so it was really just a matter of choosing a reasonable option. I decided to go with DigitalOcean because I really liked their branding, experience, and documentation. Even though there were slightly cheaper options, brand trust and good docs matter more to me.

I deployed my web app on a Droplet. Every part of the setup was well documented on DigitalOcean’s website:

The key server setup steps were to configure a web root and NGINX:

Web Root

My web server points to this directory when serving my personal website. At the end of the deployment pipeline, this will be where my production build resides.

$ mkdir /var/www/shawncruz.com

NGINX Configuration

This step was copy-paste from this article. For brevity, this is a sample of my configuration file:

server {
   root /var/www/shawncruz.com/build;
   server_name shawncruz.com www.shawncruz.com;
   index index.html index.htm;
   location / {
   }
}

The root points to the build folder that npm creates.

Bare Repository

I created a bare repository on my Droplet as a means of receiving and reacting to changes.

A bare repository acts as a centralized repository, separate from the main branch, where developers can collaboratively push changes. This is distinct from a typical remote branch, because by default a bare repository doesn’t have a working tree, and therefore cannot function as a place where changes can be checked in and committed. It is also important to not pull changes from within a bare repository, because a bare repository conventionally indicates that it should only receive changes.

It’s worth noting that a non-bare repository can also receive commits, similar to a bare repository, but it generally also has a workspace. Repositories with workspaces that also receive remote commits can block incoming changes or cause unwanted merge conflicts if the workspace is dirty or has diverged from the last time changes were pushed.

In general though, I see no reason to pull changes from a bare repository. This would require logging into a machine, pulling the correct changes, then rebuilding the application. I’ll explain a much simpler method that I use to automate these steps.

Create a Bare Repository

I created my bare repository as such:

# I created a directory where the bare repository would be located, then navigated to that directory
$ mkdir shawncruz.com-bare && cd "$_"
# Initialized a new bare repository
$ git init --bare

The bare repository only acts as a means of tracking the git commit history. My changes are actually checked out in a separate folder, from where they’re served. The benefit of doing this is cleanliness, security, and a matter of separating any git-related data from the actual project files my application is built from.

Post Receive Hook

In order for my bare repository to receive, rebuild, and deploy those changes to my web root I use a post-receive hook. A post-receive hook is a git hook that triggers downstream processes after incoming changes have been received by a repository.

Creating a post-receive Hook

Since the post-receive hook is not part of the standard git hooks, I had to create one within my bare repository.

# From within the bare repo
$ cd hooks && touch post-receive
# Makes it executable
$ chmod +x post-receive

Inside the post-receive hook file, I added the following:

#!/bin/bash

GIT_DIR=~/shawncruz.com-bare
WORKING_TREE=/var/www/shawncruz.com

# MIT © Sindre Sorhus - sindresorhus.com. GIST: https://gist.github.com/sindresorhus/7996717
changed_files="$(git diff-tree -r --name-only --no-commit-id HEAD^ HEAD)"
check_run() {
    echo "$changed_files" | grep -x -q "$1" && eval "$2"
}

# Inspired by http://mattfairbrass.com/2015/08/25/push-to-deploy-to-production-with-git/
while read oldrev newrev ref
do
    if [[ $ref =~ refs/heads/master ]];
    then
        echo "Master ref received. Deploying master branch to production…"
        git --work-tree=WORKING_TREE --git-dir=GIT_DIR checkout -f
        cd WORKING_TREE;
        echo $(eval check_run package.json "npm install";)
        echo $(eval npm run build;)
    else
        echo "Ref $ref successfully received. Doing nothing: only the master branch may be deployed on this server."
    fi
done

This script will run after changes have successfully been received by the repository. While the post-receive hook doesn’t take any arguments, it receives info from standard input and can be read as such:

while read oldrev newrev ref
do
# ...
done

Breaking down each item read from standard input:

oldrev: References the previous commit’s SHA-1
newrev: References the incoming commit’s SHA-1
ref : Name of the ref that’s being updated.

Since the post-receive hook runs after all refs have been updated, it’s important to read them in a while loop so that all of them are processed. For the purposes of my use case, I focused solely on a single ref so that I can build from a single commit history. To accomplish this, I wrapped the logic in an if-statement to ignore unwanted refs:

# Checks if the ref that was read from standard input is the master branch, which is where our upstream changes will be pushed.
if [[ $ref =~ refs/heads/master ]];

Next, it echoes some feedback to the client so that I know things are working as expected:

echo "Master ref received. Deploying master branch to production…"

Then, it checks out the new changes from my bare repository into my WORKING_TREE located in my web root:

# This command checks out the changes from my bare repository (GIT_DIR) into my web root (WORKING_TREE), ovewriting any pre-existing files:
$ git --work-tree=WORKING_TREE --git-dir=GIT_DIR checkout -f

Next, it rebuilds the app after navigating to it’s location in the filesystem:

cd WORKING_TREE;
echo $(eval check_run package.json "npm install";)
echo $(eval npm run build;)

It runs the check_run function first to check if package.json has been updated. This check prevents re-installing versions of dependencies that have already been installed. In reality this naively checks if the last commit contained changes to package.json—not actually checking if any dependencies were modified.

Examining the check_run function a bit closer:

changed_files="$(git diff-tree -r --name-only --no-commit-id HEAD^ HEAD)"

Each argument is important:

-r: Examines subtrees in the top-level repo for changes
--name-only: Shows name of files that were changed between the two commits
--no-commit-id: Suppresses commit ID output
HEAD: Refers to the commit that’s currently checked out (see: cat .git/HEAD). In the context of this script, this value will always be refs/heads/master.
HEAD^: Refers to the commit immediately before HEAD

Putting it all together, changed_files will output the names of the files that were added, edited, or deleted in the last commit.

In the larger scheme, changed_files is used in check_run:

check_run() {
    echo "$changed_files" | grep -x -q "$1" && eval "$2"
}

check_run pipes the output from changed_files into the next portion of the script:

grep -x -q "$1" && eval "$2"

The left-hand side of double ampersands will grep the list of changed files for the first argument passed to the function (e.g. package.json).

grep -x -q "$1"

-x: makes sure that the first argument is an exact match so that it doesn’t match on a substring of the file it is searching for (e.g. grepping for test.txt will not match on another-test.txt; only test.txt)
-q: suppresses output from any matches

grep will return a zero exit code if a match is found, otherwise it will return a non-zero exit code. Since a zero exit code is mapped to true, the double ampersand in between the two statements acts as a guard from evaluating the second portion of the script if the string is not found. If a match is found we evaluate the right-hand side of the double ampersand:

eval "$2"

This executes the second argument that gets sent to this function, which is presumably some type of function call (e.g. npm install). Examining how this function is utilized in the post-receive hook:

# Check if any changes have been made to `package.json`, and if so run `npm install`
echo $(eval check_run package.json "npm install";)
# Runs the `build` script specified in my `package.json`
echo $(eval npm run build;)

Each command’s output is echoed to the client so that progress, success, or failure can be tracked. The last part of the script is the else block which catches and relays any updated refs that we are not interested in and echoes that to the client.

else
        echo "Ref $ref successfully received. Doing nothing: only the master branch may be deployed on this server."

That sums up the post-receive hook script.

Local Machine

I need to be able to push changes from my local machine to the server. This required minimal setup.

Add Remote to Bare Repository

On my local machine, I added a new remote pointing to the bare repository I created. I named mine production to distinguish it from the standard remote origin.

# This remote can be called anything other than "origin"
$ git remote add production <server-url>:shawncruz.com-bare
# Checked to make sure that the new remote was present
$ git remote -v
# This indicates that I can fetch changes from and push changes to that repository.
production <server-url>:shawncruz.com-bare (fetch)
production <server-url>:shawncruz.com-bare (push)

Deploying

In order to verify that this deployment pipeline works from my local machine, I can push a change to the remote I created. The messages echoed back indicat the step by step procedure that occurred while running the post-recieve hook. The feedback will indicate whether the deployment was successful or not. On a successful run, the output looks like:

$ git commit -am "Testing deployment pipeline w/ a small UI change"
$ git push production
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (2/2), 200 bytes | 200.00 KiB/s, done.
Total 3 (delta 2)
remote: Master ref received. Deploying master branch to production…
remote: # Depending on whether or not my package.json file was updated, the next section will either be blank or filled with npm install messages
remote: # Messages from `build` script
To <server-url>:shawncruz.com-bare
   <prev-SHA-1>..<curr-SHA-1>  master -> master

Upon success, reloading shawncruz.com will include my most recent changes.

Improvements

Currently, I have to manually push changes to two separate remotes: origin and production. The problem with this setup is that both remotes can diverge. A potential solution is to utilize a git hook that runs git push production post-push. This will couple changes being deployed to both remotes.

Table of Contents