This blog post serves as a detailed guide on how to build custom Docker images from scratch using a build tool called Kubler to reduce attack surface and increase security.
TL;DR
Why not just use an image from Docker Hub?
If an image already exists and you're happy with it, that's great! But sometimes you need to build an image yourself, and that is what this guide is about.
Firstly, sometimes you won't find a ready-made container for what you want, so you will have to create your own.
Secondly, a lot of images on Docker Hub are opaque binary blobs: they don't link to the source code showing how the image was built, so you have no option to build it yourself. Perhaps you want to make some small changes, or upgrade the packages or libraries. Without the source, that is much harder to do.
Also, most premade images are pretty bloated.
Being closed source and/or bloated makes the images harder to audit as well.
Image Rot
A lot of pre-made images are not updated very often and become out of date. Upgrading packages and libraries and rebuilding an image without its source is hard. Having a clean and reproducible way to build images - including package and library versions - helps prevent image rot, because new images can easily be built, and the source can be maintained in source control.
For example, the most popular nmap image on Docker Hub is uzyexe/nmap, but it is quite stale. The version at the time of writing is 7.12, whose image was created back in July 2016.
Image Bloat
Originally, most Docker images were based off Ubuntu or Debian, meaning that every container had a little Ubuntu or Debian rootfs inside it, consuming around 125 to 188MiB. That is quite big considering the container would then just launch a single process. Many official images have since switched to Alpine Linux, which is much smaller - around 1.7MiB now. You can also base your images off Busybox, or off nothing at all (scratch).
It is still common to see images on Docker Hub etc. based off Ubuntu and Debian, because it's easy. Alpine Linux is based off musl rather than glibc, and some projects won't compile out-of-the-box against musl due to glibc-isms. They can either be patched, or a glibc can be installed into Alpine. Typically, compilers and build tools are installed, the source downloaded, the tool built, and then - if the creator was diligent - the compiler, build tools, package caches, and other build-time dependencies are removed to save space. This is an easy but kludgy way: it's easy to forget to remove something. It's better if it was never there to begin with.
An example is the most popular Metasploit image on Docker Hub, strm/metasploit: it is based off Debian Jessie, is full stack with PostgreSQL, nmap and a bunch of other tools, and weighs in at about 815MiB.
Image Security
Most images you find on Docker Hub have not been hardened; typically, they run the application as root. This is despite Docker's documentation stating that root in a container is equivalent to root on the host.
So, if you can find the application you want on Docker Hub, and you can find its source, you can check the Dockerfile out to see if they run as a user or root. This is assuming the binary image was really built using that source Dockerfile, which is hard to verify.
The official Nginx image is one example of an image that runs as root. The Dockerfile isn’t linked on Docker Hub, but it can be found. There’s no USER instruction, so by default it is running the web server as root. This is presumably so that it can listen on port 80 inside the container. However, typically ports are mapped, so this isn’t really necessary.
Quickly Inspecting an Image
To quickly inspect an image, have a look at its config, in particular its User, Entrypoint and Cmd settings. You can use this command: docker inspect nginx | jq '{"User": .[].Config.User, "Entrypoint": .[].Config.Entrypoint, "Cmd": .[].Config.Cmd }'
If the User is "", then the default user, root, is used.
Alternatively, you can just spin it up and have a look, but then you are executing arbitrary code from the Internet as root. Try docker run --rm -it some/image /bin/sh and see if you are root. If the image defines an entrypoint you may need to override it with docker run --rm -it --entrypoint /bin/sh another/image (assuming /bin/sh exists inside the image and is safe to run).
General Goals when Building an Image
I like my software to be secure and performant, so I have the following goals for my Docker images:
Minimal images
Separation of Build-time and Run-time dependencies
Consistent Builds
Why minimal? Because it reduces the attack surface - you can't attack what's not there. Similarly, during an attack you can't leverage what's not there, and there is less to audit. Further, you're optimising for scale: images (and therefore containers) take up less space and boot (and copy) faster, and because they consume fewer resources you can run more per host, so they are more efficient.
Kubler
Kubler is an open-source tool for building minimal Docker images. Kubler is basically a build tool that gives you the infamous nested Docker builds: it builds each image inside a Docker build container, for consistent builds. Under the hood, Kubler's build container is based off a Gentoo image and its Portage package system. This allows a lot of control over the packages installed, and handles package dependencies for you.
Don’t let the fact that Gentoo is used under the hood scare you, it has a reputation for being hard to use, but with Kubler doing all the heavy lifting you’ll find it very easy to use. However, you can still take advantage of the power of Gentoo’s Portage system.
To install Kubler, clone its repository from GitHub: -
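The upstream repository lives under edannenberg/kubler: -

```shell
git clone https://github.com/edannenberg/kubler.git
```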
First, to make things easier, let's add the kubler command to our PATH.
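A minimal sketch, assuming the repository was cloned to ~/co/git/kubler - adjust the path to wherever you cloned it (older Kubler releases shipped the entry point as kubler.sh in the repository root rather than under bin/):

```shell
# add the kubler executable to the PATH for this shell session
export PATH="$PATH:$HOME/co/git/kubler/bin"
```

Add the same line to your shell profile to make it permanent.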
Now we can look at Kubler's help: -
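With kubler on the PATH, the top-level help is one command away: -

```shell
kubler --help
```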
Kubler Namespaces
Kubler has the concept of namespaces, which separate one group of images from another. This is similar to the Docker concept of image repositories. By default, namespaces are subdirectories of the dock directory in Kubler's directory. On a fresh install two default namespaces already exist, dummy and kubler. The kubler namespace already has a bunch of images defined.
Initially, the directory structure of Kubler looks like this: -
Create a Kubler Namespace for all your Images
Creating a namespace is easy. Let's create a new namespace called mynamespace, and answer the questions in the wizard.
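The wizard is started with kubler's new sub-command: -

```shell
kubler new namespace mynamespace
```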
Now there is a new mynamespace subdirectory in the dock directory.
Building an nmap image
Docker images are built from Kubler image definitions. An image definition is the equivalent of a Dockerfile, but instead of being a single file it's a directory with several files controlling the build process:
Dockerfile.template used as a template to generate a Dockerfile. Allows for variable substitution.
README.md used to document your image, customise to your liking.
build.conf defines the parent image for inheritance and some other variables for Kubler.
build.sh contains some variables and hook (callback) functions used during the Kubler build process.
Create a new image definition
Use kubler to create a new image definition for us; we will base it off kubler/glibc.
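Again this uses the new sub-command, this time for an image; when the wizard asks for the parent image, answer kubler/glibc: -

```shell
kubler new image mynamespace/nmap
```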
This is what we have now: -
Now edit the generated build.sh file and set the _packages variable to net-analyzer/nmap.
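The relevant line in build.sh then looks like this: -

```shell
# packages to emerge into the image's rootfs
_packages="net-analyzer/nmap"
```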
localhost:/var/home/user/co/git/kubler$ vi dock/mynamespace/images/nmap/build.sh
Let's make the image behave like running a command, rather than giving a shell. To do this, update Dockerfile.template and change the ENTRYPOINT. We also want a place to persist the nmap logs - but we don't need to persist the whole container, just the logs - so let's define a named volume for the logs whilst we are at it.
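A sketch of the relevant Dockerfile.template lines - the template variables are the ones generated by kubler new, and the /nmap-logs volume path is just an illustrative choice: -

```dockerfile
FROM ${IMAGE_PARENT}
LABEL maintainer="${MAINTAINER}"
# named volume for persisting scan logs
VOLUME ["/nmap-logs"]
# run the container like a command: arguments to docker run are passed to nmap
ENTRYPOINT ["/usr/bin/nmap"]
```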
Updating Portage
As Kubler uses Gentoo/Portage, which is in constant development, you will get HTTP 404 errors like this when a build tries to fetch old files that no longer exist: -
To update to the latest files run kubler update.
Build the image
Build help
The kubler tool has several sub-commands, such as build; use the --help argument to get help for each, like this: -
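For example, help for the build sub-command: -

```shell
kubler build --help
```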
Building nmap
Because we based the image off glibc, and are building a dynamically linked executable, we need to copy the GCC libraries. Uncomment the copy_gcc_libs line in our build.sh file like so: -
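In the generated build.sh the call lives in the finish_rootfs_build hook; uncommented, it looks something like this: -

```shell
finish_rootfs_build() {
    # copy libgcc/libstdc++ from the build container so dynamically
    # linked binaries can find them at run time
    copy_gcc_libs
}
```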
If you forget to do this, you will see errors like the following: -
If you need to rebuild, try kubler build mynamespace/nmap -nF to rebuild just the mynamespace/nmap image.
Building: -
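The full build of the image and all of its parents is kicked off with: -

```shell
kubler build mynamespace/nmap
```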
Now that the build is complete we get a PACKAGES.md that shows which packages and libraries, and which versions of them, the image contains. An example is shown below: -
mynamespace/nmap:20170925
Built: Wed Oct 18 00:14:10 GMT 2017
Image Size: 44.6MB
The above can be used to combat image rot: it is easy to see which versions of libraries and packages an image contains, and this can be integrated into automated scanning for vulnerable packages and libraries.
Building static nmap binaries
Now let's say that for some reason you want statically compiled nmap, ncat, and nping binaries. Say you want to run them on some target Linux system that doesn't have Docker Engine, and you don't want to deal with incompatible libraries or copy over the whole dependency graph of shared libraries. You want a statically built nmap that will run on just about any Linux.
A good guide can be found in ZeroSec's blog post How to Statically Compile NMAP. However, the result is still dynamically linked against glibc, so it's not as portable as it could be: it won't run out-of-the-box on Linux systems that use a different glibc.
Building static nmap binaries with Kubler
You can leverage Kubler to build a static nmap. Kubler builds a Linux rootfs file system as part of building each Docker image. The resulting rootfs.tar can be copied to a target system and untarred to extract the nmap binary, its data files, and NSE scripts.
But first, a trap for young players: the underlying Gentoo/Portage doesn't have a static USE flag for nmap. So we'll have to whip out some Gentoo kung fu and make it do what we want.
We need to modify the nmap ebuild so that it will build a static binary rather than a dynamically linked one. The diff we need is: -
How do we manage the custom ebuild? There are several ways to achieve this, but I think the best way is to make a Gentoo ebuild repository, aka an "overlay". This is how to create one from scratch, based off the guide in the Handbook: -
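A minimal sketch of those Handbook steps - the overlay name myoverlay and all paths are illustrative, and on the build container the bundle would be written to the mounted /config directory rather than the parent directory: -

```shell
# skeleton of a Gentoo ebuild repository ("overlay")
mkdir -p myoverlay/metadata myoverlay/profiles myoverlay/net-analyzer/nmap
echo 'myoverlay' > myoverlay/profiles/repo_name
printf 'masters = gentoo\nthin-manifests = true\n' > myoverlay/metadata/layout.conf
# drop the modified nmap ebuild into net-analyzer/nmap/, then
# commit the overlay and pack it into a git bundle for transport
cd myoverlay
git init -q
git add -A
git -c user.name=overlay -c user.email=overlay@example.com commit -qm 'initial overlay'
git bundle create ../myoverlay.bundle HEAD
cd ..
```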
We have now created an overlay and saved it as a git bundle in the mounted /config directory, which is the dock/mynamespace/images/nmap directory on the host. You can now clone it somewhere, e.g.: -
Ideally you’d host your Gentoo Overlay git repository somewhere, like on GitHub. Or, you can use my ready made one at Berney’s Overlay.
Create a new image for the static nmap build; this time we are going to use musl instead of glibc.
Edit dock/mynamespace/images/nmap-musl-static/build.sh so that it looks like this: -
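A sketch of what that build.sh might contain - the hook and helper names (configure_rootfs_build, add_overlay, update_use) are Kubler's build-engine helpers, and the overlay URL is a placeholder for wherever you host yours: -

```shell
_packages="net-analyzer/nmap"

configure_rootfs_build() {
    # pull in the custom overlay that adds a static USE flag to nmap
    add_overlay myoverlay https://github.com/you/myoverlay.git
    # enable the new static USE flag
    update_use 'net-analyzer/nmap' '+static'
}
```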
Now build the mynamespace/nmap-musl-static image with kubler build mynamespace/nmap-musl-static. You can rebuild just the mynamespace/nmap-musl-static image, without rebuilding all the dependencies, by using the -nF arguments to kubler build, as shown below: -
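For the rebuild-only case: -

```shell
kubler build mynamespace/nmap-musl-static -nF
```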
Note that when rebuilding, packages that have been emerged previously will now be installed as binary packages, which speeds up rebuilds. New packages, or packages with different USE flags, will be built from source, but once built the binary packages are available for quick installation. This brings the flexibility of source compilation with the efficiency of binary packages. Another build speed-up comes from the inheritance of Kubler images: the predecessor images only need to be built once, then leaf images can be built, and rebuilt, without rebuilding their predecessors.
Run the new mynamespace/nmap-musl-static image to test that it works; we can check the version and compilation options with nmap’s --version argument: -
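Because of the ENTRYPOINT, everything after the image name is passed straight to nmap: -

```shell
docker run --rm mynamespace/nmap-musl-static --version
```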
To extract the files, just untar the rootfs.tar file. e.g.; -
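The rootfs.tar sits in the image definition directory after a build; the paths here are illustrative: -

```shell
tar -xf dock/mynamespace/images/nmap-musl-static/rootfs.tar usr/bin/nmap usr/share/nmap
```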
And the binaries are static: -
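A quick check with file - a static binary reports "statically linked" rather than naming a dynamic linker: -

```shell
file usr/bin/nmap
```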
Accepting Gentoo Unstable Keywords
Gentoo keeps nmap-7.40 as the latest stable version on AMD64, but the latest upstream stable release is 7.60. If you want the latest stable nmap, you need to accept the AMD64 testing keyword. In the build.sh file add update_keywords '=net-analyzer/nmap-7.60' '+~amd64' after the add_overlay line and before the emerge line.
If you want the bleeding-edge (git) version of nmap, add update_keywords '=net-analyzer/nmap-9999' '+**' instead.
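Putting the keyword tweaks in context, the hook might look like this - the overlay URL is a placeholder, and you would keep only one of the two update_keywords lines: -

```shell
configure_rootfs_build() {
    add_overlay myoverlay https://github.com/you/myoverlay.git
    # accept the ~amd64 testing keyword for the latest stable upstream nmap
    update_keywords '=net-analyzer/nmap-7.60' '+~amd64'
    # or go bleeding edge with the live git ebuild:
    #update_keywords '=net-analyzer/nmap-9999' '+**'
}
```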
Feature/Size Comparisons
It can be hard to make a fair comparison between pre-built nmap images and Kubler built ones due to the version drift and feature set. The following table shows variants of Kubler built nmap images and some alternatives.
Binary size is the size of the nmap binary and (if applicable) all dependent shared objects.
Data size is the size of /usr/share/nmap, which includes OS Fingerprints, MAC Address and service lookup tables, and NSE scripts, and so on.
All sizes are MiB.
Conclusion
Kubler is a flexible and powerful tool that provides a clean build environment and enables repeatable builds of Docker images. In the in-depth example above, we used Kubler to create an image that is up-to-date, fully featured, and statically compiled. It is easy to select the desired feature set, enabling or disabling features as required - for example, to build tools for resource-constrained environments such as embedded devices.
Using Kubler we were able to build the smallest static nmap binary, which, compared to the alternatives, is the latest stable version and built with a hardened tool-chain. We can also produce a Docker image of the latest stable version with the same feature set as uzyexe/nmap, yet lighter, despite being a newer version built with a hardened tool-chain.