Automating Renewal of Let's Encrypt Certificates with GitLab CI

How to set up automatic renewal of your free TLS certificate from Let’s Encrypt for your GitLab Pages-hosted site.

As I had promised in my earlier post on manually setting up HTTPS for GitLab Pages, I have since automated the renewal of my blog’s TLS certificate. This post details my new automated setup.

You are going to learn how to set up a scheduled GitLab CI build that automatically generates a new Let’s Encrypt certificate at regular intervals. In particular, it invokes certbot with two custom scripts, which automatically pass the HTTP-01 authentication challenge by updating the Jekyll site with the necessary challenge token. Finally, it updates the GitLab Pages configuration with the newly created certificate.

The setup I describe here is based mainly on Bas Harenslak’s setup (and is therefore quite similar to it). As a notable difference, Harenslak uses Hugo as a static blogging engine, whereas I am using Jekyll. This only affects small parts of the whole process, and I will point out where it does.

While researching how to automate the renewal of my Let’s Encrypt certificate in conjunction with GitLab Pages, I also came across these articles by Andy Mayhew on Autonomic Guru and by Michael Ummels on Lost and Found. Mayhew’s article provides ready-made scripts that can be used with certbot to automatically go through the HTTP-01 challenge, but it does not include steps to schedule the certificate renewal with GitLab CI. Ummels’ article does mention scheduled GitLab CI pipelines, but uses the DNS-01 challenge type in conjunction with a custom Docker image. As Ummels notes, DNS-01 challenges only really work if your DNS provider lets you modify your DNS entries via an API — which mine doesn’t.

What You Need Before Starting

For this post, I’m going to assume you are already sufficiently familiar with Let’s Encrypt and certbot, as well as GitLab Pages and GitLab CI. If you are not, I recommend reading up first — my aforementioned previous blog post could be a good start for this.

Note that you do not need to have implemented the previous post’s instructions; this post contains everything you need to do.

Furthermore, you should already have a working GitLab Pages site with a custom domain up and running. Ideally, you already have a working certificate from Let’s Encrypt in place as well. If you don’t use Jekyll as your blogging engine (or your Jekyll config significantly differs from mine), you’re going to have to adjust some things to fit your needs.

What We Are Doing

Let me elaborate a bit more on what our goal is here. Step-by-step instructions to get there follow in the next section.

The entire process is kicked off by a scheduled build pipeline in GitLab CI. Our static GitLab Pages site is already being published by a GitLab CI build, which is triggered every time content is pushed to the Git repo. So we need to configure a separate job of the build, which we attach exclusively to the scheduled pipeline.

As with all GitLab CI builds, this all runs in a Docker container, so there is some preparation needed to install some dependencies and set up a few things.

We use certbot to get our Let’s Encrypt certificate issued, which is the same tool that we previously used when doing this manually. In contrast, however, this time we are going to provide it with two hook scripts.

certbot starts the issuing process and generates a challenge, which we need to pass in order to prove we are actually in control of the concerned domain. It then invokes the first hook (authentication), passing it the challenge token and the challenge validation string.

Our authentication hook script writes the token and validation string to a file from which Jekyll will create the challenge page. The script commits this file and pushes it to the repository, thereby triggering a rebuild of the Jekyll site. Then it waits until the challenge page is available at the expected URL, before yielding control back to certbot.

With the challenge preparation done, certbot fetches the challenge page and lets us pass the challenge. It invokes the second hook script, which allows us to clean up the challenge page by removing the file we previously created and pushing to the repo again. This triggers another rebuild, of course, but this time we don’t need to wait for that.

certbot writes out our newly generated certificate to its config folder in the Docker container, so we need to do something with that before the build finishes. Using the GitLab API, we update the HTTPS certificate of our GitLab Pages site by uploading the new private key and certificate files.

And that’s it! When this build is through, the certificate should be successfully updated. If you want to check this in your browser, be sure to do so in a clean state [1], since the browser is likely to cache the previous certificate for a while.

Let’s Take It From the Top

Alright, now that we know what’s supposed to happen when the renewal process runs, let’s actually set it up.

Create Shell Scripts

First of all, let’s create the three shell scripts that make up the process. That’s the two hook scripts for certbot, plus the main script that is run in the CI build.

I’m putting all of these in the scripts/ folder in the site repo. Of course, you can put them wherever it suits your needs (and rename them), just remember to adjust all references accordingly.

Note: If you haven’t already done so, I recommend excluding these script files (or better yet, the entire scripts/ folder) from the Jekyll build by adding them to the exclude list in your _config.yml.

scripts/letsencrypt_generate.sh:

#!/usr/bin/env bash

# Exit if any subcommand fails
set -e

certbot certonly -n --agree-tos --preferred-challenges=http \
    --manual --manual-public-ip-logging-ok \
    --manual-auth-hook ${CI_PROJECT_DIR}/scripts/letsencrypt_authenticator.sh \
    --manual-cleanup-hook ${CI_PROJECT_DIR}/scripts/letsencrypt_cleanup.sh \
    -m ${GITLAB_USER_EMAIL} --work-dir=work-dir/ --logs-dir=logs/ --config-dir=config/ \
    -d ${DOMAIN} \
    --staging

curl --request PUT --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
    --form "certificate=@config/live/${DOMAIN}/fullchain.pem" \
    --form "key=@config/live/${DOMAIN}/privkey.pem" \
    https://gitlab.com/api/v4/projects/${CI_PROJECT_ID}/pages/domains/${DOMAIN}

This script only does two things: call certbot and then upload the new certificate to GitLab using the API. Here (and in the other two scripts) we use set -e to ensure the script exits immediately if any of the commands in it fails.

Note that this script is intended to be run from a working folder created specifically for this purpose, hence the relative references to certbot folders.

You can find more info on the CLI switches of the two commands in their respective documentation: the certbot CLI switches and the curl man page.

As in the previous post, the --staging switch is present again in the certbot invocation, which you should use while testing your setup. Almost the entire process will work with a staging certificate; only the final upload to the GitLab API will fail. Once you’re satisfied that all pieces are in place, you can remove the --staging switch.
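One thing to be aware of once --staging is gone: as written, curl exits with status 0 even when the GitLab API answers with an HTTP error, so set -e does not catch a rejected upload. The following is a minimal sketch of how the upload could be made to fail the job, assuming curl's --fail flag suits your setup. The function names pages_domain_url and upload_certificate are hypothetical and not part of the scripts in this post.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the Pages domain endpoint used for the upload.
pages_domain_url() {
  local project_id="$1" domain="$2"
  echo "https://gitlab.com/api/v4/projects/${project_id}/pages/domains/${domain}"
}

# Hypothetical helper: the same PUT request as in letsencrypt_generate.sh,
# plus --fail, which makes curl exit non-zero on HTTP errors (status >= 400).
upload_certificate() {
  local project_id="$1" domain="$2" token="$3"
  curl --fail --request PUT --header "PRIVATE-TOKEN: ${token}" \
    --form "certificate=@config/live/${domain}/fullchain.pem" \
    --form "key=@config/live/${domain}/privkey.pem" \
    "$(pages_domain_url "$project_id" "$domain")"
}

# Usage (inside the CI job, after certbot has finished):
#   upload_certificate "${CI_PROJECT_ID}" "${DOMAIN}" "${GITLAB_TOKEN}"
```

Keep in mind that --fail also suppresses the response body, so the JSON error message from GitLab would no longer show up in the build log. While you are still testing with --staging, the plain curl call from the script above is therefore the more informative choice.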

The variables CI_PROJECT_DIR, GITLAB_USER_EMAIL, and CI_PROJECT_ID are predefined variables of GitLab CI, so you don’t need to do anything about these. The other two variables, GITLAB_TOKEN and DOMAIN, need to be defined by us; we’ll get to that later.
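Since the script silently depends on these five variables, it can help to check them before calling certbot. Here is a minimal sketch of such a guard. The function name require_vars is hypothetical, and it relies on bash's indirect expansion, so it assumes the script runs under bash (as the shebang above requests).

```shell
#!/usr/bin/env bash

# Hypothetical helper: report every required variable that is unset or empty.
# Returns 0 if all are present, 1 otherwise.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} looks up the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "Missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (at the top of letsencrypt_generate.sh):
#   require_vars CI_PROJECT_DIR GITLAB_USER_EMAIL CI_PROJECT_ID GITLAB_TOKEN DOMAIN || exit 1
```

This way a forgotten GITLAB_TOKEN shows up as a clear message at the start of the job, rather than as a failed push or upload several minutes in.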

scripts/letsencrypt_authenticator.sh:

#!/usr/bin/env bash

# Exit if any subcommand fails
set -e
echo "LE authenticator script started"

cd ${CI_PROJECT_DIR}

# write out the challenge file content
cat > ${CHALLENGE_FILE_NAME} <<EOM
---
layout: null
sitemap: false
permalink: /.well-known/acme-challenge/${CERTBOT_TOKEN}/index.html
---

${CERTBOT_VALIDATION}
EOM

# add it to git and push
git config user.name "GitLab CI runner"
git config user.email ""
git add ${CHALLENGE_FILE_NAME}
git commit -m "[GitLab CI runner] Add certbot challenge file for certificate renewal"
git push ${PUSH_URL} HEAD:${CI_COMMIT_REF_NAME}

interval_sec=15
max_tries=80 # 20 minutes
n_tries=0
while [ $n_tries -le $max_tries ]
do
  status_code=$(curl -L --write-out "%{http_code}\n" --silent --output /dev/null https://${DOMAIN}/.well-known/acme-challenge/${CERTBOT_TOKEN})
  if [[ $status_code -eq 200 ]]; then
    echo "LE authenticator script finished successfully"
    exit 0
  fi

  n_tries=$((n_tries+1))
  sleep $interval_sec
done

echo "LE authenticator script aborting due to timeout"
exit 1

This script writes out the challenge file, commits and pushes it, then waits (with timeout) for the challenge page to become available.

Since this script is invoked by certbot, which is run in another working folder (by our previous script), we need to cd back into the repo checkout before adding the file there.

As I explained in my earlier blog post, we are putting the challenge file in an arbitrary place (Jekyll just needs to see it as content), and use the permalink directive in the frontmatter to determine its final URL. Since the template is a multi-line string, we specify it using heredoc. The two variables we use to fill out the content (CERTBOT_TOKEN and CERTBOT_VALIDATION) are automatically provided by certbot for us.

This point — putting the right file in the right place — is what you need to adjust if you’re not using Jekyll (plus the deletion of the file, see the cleanup script below), or if you are using a significantly different Jekyll config. All that really needs to happen is for the challenge page to go up at the expected URL, so this should be fairly straightforward for all static (and statically generated) sites.
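If you want to see what the heredoc produces without running certbot, you can render the challenge file locally with made-up values. The sketch below wraps the same template in a function. The name write_challenge_file and the sample token and validation strings are placeholders of mine, not real certbot output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: write the Jekyll challenge page for a given token
# and validation string, using the same template as the authenticator hook.
write_challenge_file() {
  local file="$1" token="$2" validation="$3"
  cat > "$file" <<EOM
---
layout: null
sitemap: false
permalink: /.well-known/acme-challenge/${token}/index.html
---

${validation}
EOM
}

# Usage with placeholder values:
#   write_challenge_file lets-encrypt-challenge.html "sample-token" "sample-token.sample-thumbprint"
#   cat lets-encrypt-challenge.html
```

Running a local Jekyll build afterwards is a cheap way to confirm that the page ends up under /.well-known/acme-challenge/ before involving certbot at all. Remember to delete the file again before committing.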

Before committing the file to the Git repo, we locally set a committer identity. This isn’t strictly necessary; Git will complain during committing if you don’t, but AFAIK it still goes through with the commit. I just prefer having a consistent committer name there (and silencing Git’s warning).

Then we use curl to fetch the URL where the challenge page is supposed to appear. We repeat this every 15 seconds until we see an HTTP status of 200, then exit the script with status code 0 (success).

If the challenge page isn’t reachable within 80 retries, we assume something has gone wrong and exit with a non-zero status code, indicating to certbot to abort the procedure.
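The polling logic can also be pulled out into a small reusable function, which makes it easier to test without waiting on a real deployment. This is only a sketch of the same wait-with-timeout idea. The names retry_until and page_is_up are hypothetical, and unlike the loop above it stops after exactly max_tries attempts.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command repeatedly until it succeeds.
# Arguments: max_tries, interval in seconds, then the command and its arguments.
# Returns 0 as soon as the command succeeds, 1 after max_tries failed attempts.
retry_until() {
  local max_tries="$1" interval_sec="$2"
  shift 2
  local n_tries=0
  while [ "$n_tries" -lt "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    n_tries=$((n_tries + 1))
    sleep "$interval_sec"
  done
  return 1
}

# Hypothetical probe: succeeds only if the URL answers with HTTP 200
# (following redirects, like the curl call in the authenticator hook).
page_is_up() {
  local status_code
  status_code=$(curl -L --silent --output /dev/null --write-out "%{http_code}" "$1")
  [ "$status_code" = "200" ]
}

# Usage (mirrors the 15-second interval and 80 tries from the hook):
#   retry_until 80 15 page_is_up "https://${DOMAIN}/.well-known/acme-challenge/${CERTBOT_TOKEN}" || exit 1
```

Because the condition is a separate command, you can try out the loop with something harmless like true or false before pointing it at your site.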

As before, we use some pre-defined environment variables, namely CI_PROJECT_DIR and CI_COMMIT_REF_NAME, and some custom variables, CHALLENGE_FILE_NAME, PUSH_URL, and DOMAIN.

scripts/letsencrypt_cleanup.sh:

#!/usr/bin/env bash

# Exit if any subcommand fails
set -e
echo "LE cleanup script started"

cd ${CI_PROJECT_DIR}

if [ -f ${CHALLENGE_FILE_NAME} ]; then
    git rm ${CHALLENGE_FILE_NAME}
    git commit -m "[GitLab CI runner] Remove certbot challenge file"
    git push ${PUSH_URL} HEAD:${CI_COMMIT_REF_NAME}
fi

echo "LE cleanup script done"

This cleanup script is comparatively simple. It just removes the challenge file if it exists, and commits and pushes this change to the Git repo.

The conditional is in there because certbot runs the cleanup hook even when the authenticator hook fails. If our auth script fails before creating the challenge file, then there is nothing for us to clean up.

There are no new environment variables used in this script.

Update GitLab CI Config

Until now, you should have had a build job in your .gitlab-ci.yml that looks vaguely like this:

pages:
  stage: deploy
  script:
  - bundle exec jekyll build -d public
  artifacts:
    paths:
    - public
  only:
    - master

First, let’s set this (and all other existing build jobs) not to run during scheduled pipelines using the except key. Modify your existing job(s) like so:

pages:
  stage: deploy
  script:
  - bundle exec jekyll build -d public
  artifacts:
    paths:
    - public
  only:
    - master
  except:       # <- add this
    - schedules # <-

If you want to have multiple different scheduled builds running different jobs, please do consult the GitLab CI docs about more advanced variations of only and except.

Next, let’s create a new build job for our certificate renewal pipeline. Add the following to your .gitlab-ci.yml:

update_cert:
  stage: update_cert
  image: python:3-alpine3.8
  variables:
    CHALLENGE_FILE_NAME: lets-encrypt-challenge.html
    DOMAIN: <your domain>  # in my case, blog.lehnerpat.com
    PUSH_URL: https://token:${GITLAB_TOKEN}@gitlab.com/<username>/<repo>.git
  before_script:
    - apk add --no-cache curl certbot git bash
    - chmod +x $CI_PROJECT_DIR/scripts/letsencrypt_authenticator.sh $CI_PROJECT_DIR/scripts/letsencrypt_cleanup.sh $CI_PROJECT_DIR/scripts/letsencrypt_generate.sh
    - cd ~
    - mkdir certbot
    - cd certbot
    - mkdir work-dir
    - mkdir logs
    - mkdir config
  script:
    - $CI_PROJECT_DIR/scripts/letsencrypt_generate.sh
  only:
    - schedules

First of all, we specify that this job should run in the update_cert stage. Since that is not one of the default stages for GitLab CI, remember to add it to your top-level stages definition in .gitlab-ci.yml.

Since Python is a prerequisite of certbot, we’re using an Alpine + Python3 variant of the official Python Docker image. Due to Alpine’s barebones nature, we install some dependencies as the first step of the before_script phase.

We define most of our custom variables here (remember to fill in the placeholders <your domain>, <username>, and <repo>). The only one missing now, GITLAB_TOKEN, will follow shortly.

Besides installing software dependencies, we also set the executable bit on our three scripts, and create the working folder I mentioned earlier. You can set the executable bit for the scripts before committing them to the repo and skip this step, if you want. (Though that’s annoying to do on Windows, so the way I demonstrate here is an alternative.)
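If you would rather commit the executable bit than run chmod in CI, Git can record it in its index even on file systems that have no Unix permissions. A minimal sketch follows; the function name mark_executable_in_git is hypothetical, while git update-index --chmod=+x is a regular Git command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: mark already-tracked (or staged) files as executable
# in the Git index, independent of the local file system's permissions.
mark_executable_in_git() {
  git update-index --chmod=+x "$@"
}

# Usage (from the repo root, after the scripts have been added with git add):
#   mark_executable_in_git scripts/letsencrypt_generate.sh \
#       scripts/letsencrypt_authenticator.sh scripts/letsencrypt_cleanup.sh
#   git commit -m "Make Let's Encrypt scripts executable"
```

With the mode stored in the repo, the chmod line in before_script becomes redundant, though leaving it in does no harm.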

Finally, we invoke our main script with the script key, and mark this job to run only in scheduled pipelines with the only key.

Create a Personal Access Token

To let our scripts push to the Git repo and modify the GitLab Pages certificates via the API, we need to grant them the appropriate access. From what I have found out so far, the only way to do this right now is using a Personal Access Token, so let’s create one.

Go to your GitLab account settings, and then to the Access Tokens section.

Screenshot of GitLab Personal Access Token settings page

Give the token a name, and optionally set an expiry date. Select the “api” scope, then click the confirmation button.

You are presented with your new token, along with a hint to save it right away. You should indeed do this, since you won’t be able to access it again.

Screenshot of success message after creating a Personal Access Token

You won’t need to use this token repeatedly, so it’s fine to just copy it to a temporary location. Even if you do lose it, you can just delete the token on this page and generate a new one.

Create a Scheduled Pipeline in GitLab CI

The final piece of this puzzle is setting up the actual scheduled build plan. For this, go to your Pages repo on GitLab, and navigate to the CI / CD -> Schedules section, and create a new schedule.

Screenshot of form for creating a new GitLab CI scheduled pipeline

Give the schedule a name and set its interval pattern. “Every month” should work fine, though that is more frequent than really necessary. Alternatively, you can set a custom interval using cron syntax.

If master is not the branch you want to run this on, adjust the “Target Branch” accordingly. Remember that this process creates two commits (one to create the challenge file, and one to delete it) and deploys them, so this should be the branch that you usually deploy from, too.

Finally, define the GITLAB_TOKEN variable, and use the Personal Access Token you generated earlier as the value. Then save the pipeline.

Testing and Final Touches

If you have not done so by now, push your changes (scripts, updated CI config, etc.) to GitLab.

Now that everything is in place, you can do the first test by manually running the scheduled pipeline.

Screenshot of GitLab CI scheduled pipelines list

The build is going to take a few minutes, mainly because it triggers a Jekyll rebuild and waits for that to complete. It should complete successfully, but the log should contain the following output at the end: {"message":{"certificate":["misses intermediates"]}}. That’s okay for now: GitLab rejects the certificate because we generated it from the Let’s Encrypt staging authority, and it therefore does not contain a publicly valid authority chain.

If the build fails or you see anything else that’s odd in the build output, investigate now and try to work out any kinks. You can run certificate requests against the staging authority quite frequently, whereas the production authority has rather strict rate limits.

Once you’re satisfied that the setup looks solid, edit scripts/letsencrypt_generate.sh and remove the --staging switch from the certbot invocation. Commit and push this to GitLab, then run the pipeline manually again.

When this build is done, the output log should no longer contain the aforementioned error message about missing intermediates. Instead, there should be a large block of output containing some info about the new certificate. You can now confirm that the new certificate is in use. If you do this in a browser, remember that most browsers cache certificates for a while. So it’s probably easiest to use a different browser with which you haven’t recently opened your pages site.
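As an alternative to the browser, you can look at the certificate the server actually presents from the command line, which sidesteps any caching. This is a rough sketch that assumes openssl is installed locally; the function name show_served_cert is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print issuer and validity dates of the certificate
# served for a host on port 443. -servername sends the host name via SNI,
# so the right certificate is returned for the custom domain.
show_served_cert() {
  local host="$1"
  echo | openssl s_client -connect "${host}:443" -servername "${host}" 2>/dev/null \
    | openssl x509 -noout -issuer -dates
}

# Usage (replace with your own domain):
#   show_served_cert blog.lehnerpat.com
```

After a successful renewal, the issuer line should name Let’s Encrypt and the notBefore date should be close to the time of the pipeline run.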

Caveats

While this setup works quite neatly so far, there are two things I would like to point out. They are likely not big problems, but more like… suboptimal, especially with regard to security.

Personal Access Tokens cannot be scoped to a single repository: As you may have seen when you created the Personal Access Token, the “api” scope is the only one that grants any kind of write-access to the repo. However, it is actually scoped to your entire account, i.e. all repositories and all user data and activity. This is not at all necessary for the task we’re trying to accomplish; a token with write-access to a single repo would suffice entirely.

Unfortunately, I have not been able to find another way to grant the build job the write-access that it needs, so this will need to do for now. Just remember that if you ever feel like your Personal Access Token has been compromised, revoke it immediately.

certbot config folder is not persisted: You may have noticed that we are not using certbot’s renewal functionality, but instead requesting new certificates each time. This seems somewhat inelegant. As far as I know, there is no good way to keep this data between runs of the scheduled pipeline.

Since the entire build is running in a Docker container, using Docker volumes seems to be one possibility. Unfortunately, GitLab CI doesn’t support specifying Docker volumes yet. (There are, however, some open issues about this feature: gitlab-runner#3207, gitlab-ce#21892, gitlab-runner#1525.)

This lack of previous config data is not a problem right now, because Let’s Encrypt does not need your previously generated “account information” to issue a certificate for the same domain. If this ever changes on Let’s Encrypt’s side, the approach presented here may no longer work. But there’s no use in guessing, and we’ll have to tackle that when it comes up.


Footnotes

  1. The surest way to get a clean state is to use a clean separate profile in your browser (or a different browser entirely) in which you have not previously visited your page. Cleaning the cache in your browser, or using an incognito/private-browsing window might also work, but your mileage may vary.

Content © Copyright 2021. Patrick Lehner. All Rights Reserved.