CMS Data Analysis School Pre-Exercises - Seventh Set
Overview
Teaching: 0 min
Exercises: 60 min
Questions
What is an image? How about a container?
What is Docker/Singularity?
Why is containerization useful?
Ummmm…how is this different from a virtual machine?
Objectives
Gain a basic understanding of how to run and manage a container.
Understand the absolute basic commands for Docker.
Know how to start a Singularity container.
Introduction
Warning
As a prerequisite for this exercise, please make sure that you have correctly followed the setup instructions for installing Docker and obtaining a DockerHub account.
Objective
Please post your answers to the questions in the Google form seventh set.
Limitation
This exercise seeks to introduce the student to the benefits of containerization and a handful of container services. We cannot cover all topics related to containerization in this short exercise. In particular, we do not seek to explain what is happening under the hood or how to develop your own images. There are other great tutorials covering a variety of containerization topics as they relate to LHC experiments:
- Docker/Singularity HATS@LPC
- Introduction to Docker
- Software containers for CMSSW
- Official Docker documentation and tutorial
There are undoubtedly also other, non-LHC oriented tutorials online.
Containers and Images
Containers are like lightweight virtual machines. They behave as if they were their own complete OS, but actually contain only the components necessary to operate. Rather than bundling a full operating system, containers share the host machine’s system kernel, which significantly reduces their size. In essence, they run a second OS natively on the host machine with just a thin additional layer, which means they can be faster than traditional virtual machines. These containers take up only as much memory as necessary, which allows many of them to run simultaneously, and they can be spun up quite rapidly.
Images are read-only templates that contain a set of instructions for creating a container. Different container orchestration programs have different formats for these images. Often a single image is made of several files (layers) which contain all of the dependencies and application code necessary to create and configure the container environment. In other words, Docker containers are the runtime instances of images — they are images with a state.
This allows us to package up an application with just the dependencies we need (OS and libraries) and then deploy that image as a single package. This allows us to:
- replicate our environment/workflow on other host machines
- run a program on a host OS other than the one for which it was designed (not 100% foolproof)
- sandbox our applications in a secure environment (still important to take proper safety measures)
Container Runtimes
For the purposes of this tutorial we will only be considering Docker and Singularity for container runtimes. That said, these are really powerful tools which are so much more than just container runtimes. We encourage you to take the time to explore the Docker and Singularity documentation.
Side Note
As a side note, Docker has very similar syntax to Git and Linux, so if you are familiar with the command line tools for them then most of Docker should seem somewhat natural (though you should still read the docs!).
Exercise 20 - Pulling Docker Images
Much like GitHub allows for web hosting and searching for code, image registries allow the same for Docker/Singularity images. Without going into too much detail, there are several public and private registries available. For Docker, however, the de facto default registry is Docker Hub. Singularity, on the other hand, does not have a de facto default registry.
To begin with we’re going to pull down the Docker image we’re going to be working in for this part of the tutorial (Note: If you already did the docker pull, this image will already be on your machine. In this case, Docker should notice it’s there and not attempt to re-pull it, unless the image has changed in the meantime.):
docker pull sl
# if you run into a permission error, use "sudo docker pull ..." as a quick fix
# to fix this for the future, see https://docs.docker.com/install/linux/linux-postinstall/
# if you have an M1 chip Mac, you may want to do "docker pull sl --platform amd64"
Using default tag: latest
latest: Pulling from library/sl
175b929ba158: Pull complete
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:latest
docker.io/library/sl:latest
The image names are composed of NAME[:TAG|@DIGEST], where the NAME is composed of REGISTRY-URL/NAMESPACE/IMAGE and is often referred to as a repository. Here are some things to know about specifying the image:
- Some repositories will include a USERNAME as part of the image name (e.g. fnallpc/fnallpc-docker), and others, usually Docker verified content, will include only a single name (e.g. sl).
- A registry path (REGISTRY-URL/NAMESPACE) is similar to a URL, but does not contain a protocol specifier (https://). Docker uses the https:// protocol to communicate with a registry, unless the registry is allowed to be accessed over an insecure connection. Registry credentials are managed by docker login. If no registry path is given, the docker daemon assumes you meant to pull from Docker Hub and automatically prepends docker.io/library to the image name.
- If no tag is provided, Docker Engine uses the :latest tag as a default.
- The SHA256 DIGEST is much like a Git hash, where it allows you to pull a specific version of an image.
- CERN GitLab’s repository path is gitlab-registry.cern.ch/<username>/<repository>/<image_name>[:<tag>|@<digest>].
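The naming rules above can be illustrated with a small shell sketch. The function parse_image_ref is a hypothetical helper written for this exercise, not part of the Docker CLI, and it only approximates the rules: it treats a colon after the last slash as the tag separator (an earlier colon would belong to a registry port) and falls back to the latest tag when none is given.

```shell
# parse_image_ref: hypothetical helper that splits NAME[:TAG|@DIGEST] into parts.
# It does not contact any registry; it only looks at the string.
parse_image_ref() {
    local ref="$1" name tag digest last
    tag=""
    digest=""
    case "$ref" in
        *@*)
            # Everything after the first "@" is the digest.
            digest="${ref#*@}"
            name="${ref%%@*}"
            ;;
        *)
            name="$ref"
            # Only the part after the last "/" can carry a tag.
            last="${name##*/}"
            case "$last" in
                *:*)
                    tag="${last##*:}"
                    name="${name%:*}"
                    ;;
                *)
                    # No tag given: Docker would use "latest".
                    tag="latest"
                    ;;
            esac
            ;;
    esac
    printf 'name=%s tag=%s digest=%s\n' "$name" "$tag" "$digest"
}

parse_image_ref sl
parse_image_ref python:3.7-slim
parse_image_ref fnallpc/fnallpc-docker:latest
```

The helper deliberately leaves out the docker.io/library prefix that Docker adds for Docker Hub images, so the printed name is exactly what was typed.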
Now, let’s list the images that we have available to us locally
docker images
If you have many images and want to get information on a particular one you can apply a filter, such as the repository name
docker images sl
REPOSITORY TAG IMAGE ID CREATED SIZE
sl latest 5237b847a4d0 2 weeks ago 186MB
or more explicitly
docker images --filter=reference="sl"
REPOSITORY TAG IMAGE ID CREATED SIZE
sl latest 5237b847a4d0 2 weeks ago 186MB
You can see here that there is the TAG field associated with the sl image.
Tags are a way of further specifying different versions of the same image.
As an example, let’s pull the 7 release tag of the sl image (again, if it was already pulled during setup, docker won’t attempt to re-pull it unless it’s changed since last pulled).
docker pull sl:7
docker images sl
7: Pulling from library/sl
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:7
docker.io/library/sl:7
REPOSITORY TAG IMAGE ID CREATED SIZE
sl 7 5237b847a4d0 2 weeks ago 186MB
sl latest 5237b847a4d0 2 weeks ago 186MB
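In the listing above, sl:7 and sl:latest share one IMAGE ID, which means the two tags point to the same image. As a rough sketch, the hypothetical helper group_tags_by_id below groups tags by image ID. It reads "IMAGE_ID REPOSITORY:TAG" pairs on standard input; the sample input is copied from the listing above, and the commented line shows how real output could be fed in, assuming Docker is installed.

```shell
# group_tags_by_id: hypothetical helper, not a Docker command.
# Reads lines of the form "IMAGE_ID REPOSITORY:TAG" and prints each ID once,
# followed by every tag that points to it.
group_tags_by_id() {
    awk '{
             if ($1 in tags) tags[$1] = tags[$1] " " $2
             else            tags[$1] = $2
         }
         END { for (id in tags) print id ": " tags[id] }' | sort
}

# Sample input taken from the "docker images sl" listing above.
printf '%s\n' \
    "5237b847a4d0 sl:7" \
    "5237b847a4d0 sl:latest" | group_tags_by_id

# With Docker available, the same helper could be fed real data:
# docker images --format '{{.ID}} {{.Repository}}:{{.Tag}}' | group_tags_by_id
```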
Question 20.1
Pull down the python:3.7-slim image and then list all of the python images along with the sl:7 image. What is the ‘Image ID’ of the python:3.7-slim image? Try to do this without looking at the solution.
Solution
docker pull python:3.7-slim
docker images --filter=reference="sl" --filter=reference="python"
3.7-slim: Pulling from library/python
7d63c13d9b9b: Pull complete
7c9d54bd144b: Pull complete
a7f085de2052: Pull complete
9027970cef28: Pull complete
97a32a5a9483: Pull complete
Digest: sha256:1189006488425ef977c9257935a38766ac6090159aa55b08b62287c44f848330
Status: Downloaded newer image for python:3.7-slim
docker.io/library/python:3.7-slim
REPOSITORY TAG IMAGE ID CREATED SIZE
python 3.7-slim 375e181c2688 13 days ago 120MB
sl 7 5237b847a4d0 2 weeks ago 186MB
sl latest 5237b847a4d0 2 weeks ago 186MB
Exercise 21 - Running Docker Images
To use a Docker image as a particular instance on a host machine you run it as a container. You can run in either a detached or foreground (interactive) mode.
Run the image we pulled as a container with an interactive bash terminal:
docker run -it sl:7 /bin/bash
The -i option here enables the interactive session, the -t option gives access to a terminal, and the /bin/bash command makes the container start up in a bash session.
You are now inside the container in an interactive bash session. Check the file directory
pwd
ls -alh
Output
/
total 56K
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 .
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 ..
-rwxr-xr-x 1 root root 0 Oct 25 04:43 .dockerenv
lrwxrwxrwx 1 root root 7 Oct 4 13:19 bin -> usr/bin
dr-xr-xr-x 2 root root 4.0K Apr 12 2018 boot
drwxr-xr-x 5 root root 360 Oct 25 04:43 dev
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 etc
drwxr-xr-x 2 root root 4.0K Oct 4 13:19 home
lrwxrwxrwx 1 root root 7 Oct 4 13:19 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Oct 4 13:19 lib64 -> usr/lib64
drwxr-xr-x 2 root root 4.0K Apr 12 2018 media
drwxr-xr-x 2 root root 4.0K Apr 12 2018 mnt
drwxr-xr-x 2 root root 4.0K Apr 12 2018 opt
dr-xr-xr-x 170 root root 0 Oct 25 04:43 proc
dr-xr-x--- 2 root root 4.0K Oct 4 13:19 root
drwxr-xr-x 11 root root 4.0K Oct 4 13:19 run
lrwxrwxrwx 1 root root 8 Oct 4 13:19 sbin -> usr/sbin
drwxr-xr-x 2 root root 4.0K Apr 12 2018 srv
dr-xr-xr-x 13 root root 0 Oct 25 04:43 sys
drwxrwxrwt 2 root root 4.0K Oct 4 13:19 tmp
drwxr-xr-x 13 root root 4.0K Oct 4 13:19 usr
drwxr-xr-x 18 root root 4.0K Oct 4 13:19 var
and check the hostname to see that you are not in your local host system
hostname
<generated hostname>
Question 21.1
Check the /etc/os-release file to see that you are actually inside a release of Scientific Linux. What is the Version ID of this SL image? Try to do this without looking at the solution.
Solution
cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"
REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
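This exercise only used the foreground (interactive) mode. For contrast, here is a minimal sketch of the detached mode, meant to be run from a terminal on the host. The helper run_detached and the container name my-detached are hypothetical, and "sleep 300" is just a stand-in for a long-running process. The DOCKER variable is an assumption added so the commands can be printed instead of executed (for example with DOCKER=echo).

```shell
# DOCKER can be overridden, e.g. DOCKER=echo, to print commands instead of running them.
DOCKER="${DOCKER:-docker}"

# run_detached: hypothetical helper that starts a named container in the
# background (-d) and keeps it alive for five minutes with "sleep 300".
run_detached() {
    local name="$1" image="$2"
    "$DOCKER" run -d --name "$name" "$image" sleep 300
}

# A typical session on the host might look like this:
#   run_detached my-detached sl:7                  # prints the new container ID
#   docker exec my-detached cat /etc/os-release    # run a command inside it
#   docker stop my-detached                        # stop it
#   docker rm my-detached                          # and remove it
```

Because the container runs in the background, your terminal is free immediately and you interact with the container through commands such as docker exec.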
Exercise 22 - Monitoring, Exiting, Restarting, and Stopping Containers
Monitoring Your Containers
Open up a new terminal tab on the host machine and list the containers that are currently running
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes <generated name>
Notice that the name of your container is some randomly generated name. To make the name more helpful, rename the running container
docker rename <CONTAINER ID> my-example
and then verify it has been renamed
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes my-example
Specifying a name
You can also start up a container with a specific name
docker run -it --name my-example sl:7 /bin/bash
Exiting a Container
As a test, go back into the terminal used for your container, and create a file in the container
touch test.txt
Then exit the container from its command line
exit
You are returned to your shell. If you list the containers you will notice that none are running
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
but you can see all containers that have been run and not removed with
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Exited (0) t seconds ago my-example
Restarting a Container
To restart your exited Docker container start it again and then attach it interactively to your shell
docker start <CONTAINER ID>
docker attach <CONTAINER ID>
exec command
The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.
In case you’d like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.
For example, the exec equivalent to attaching in our case would look like:
docker start <CONTAINER ID>
docker exec -it <CONTAINER ID> /bin/bash
You can start multiple shells inside the same container using exec.
Notice that your entry point is still / and then check that your test.txt still exists
ls -alh test.txt
-rw-r--r-- 1 root root 0 Oct 25 04:46 test.txt
Clean up a container
If you want a container to be cleaned up (that is, deleted) after you exit it, then run with the --rm option flag
docker run --rm -it <IMAGE> /bin/bash
Stopping a Container
Sometimes you will exit a container and it won’t stop. Other times your container may crash or enter a bad state, but still be running. In order to stop a container you will exit it (exit) and then enter:
docker stop <CONTAINER ID> # or <NAME>
Exercise 23 - Removing Containers and Images
You can clean up/remove a container with docker rm
docker rm <CONTAINER NAME>
Note: A container must be stopped in order for it to be removed.
Start an instance of the sl:latest container, exit it, and then remove it:
docker run sl:latest
docker ps -a
docker rm <CONTAINER NAME>
docker ps -a
Output
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n seconds ago Exited (0) t seconds ago <name>
<generated id>
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
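Removing containers one at a time gets tedious once several exited containers have piled up. The sketch below shows one possible way to remove all of them in one go. The helper remove_exited is hypothetical; it relies on the -q (IDs only) and --filter status=exited options of docker ps. The DOCKER variable is an assumption added so that a stand-in command can be substituted when trying the logic out.

```shell
# DOCKER can be overridden with a stand-in command for dry runs.
DOCKER="${DOCKER:-docker}"

# remove_exited: hypothetical helper that removes every container whose
# status is "exited". Running containers are left untouched.
remove_exited() {
    local ids
    # -a lists all containers, -q prints only their IDs.
    ids="$("$DOCKER" ps -a -q --filter status=exited)" || return 1
    if [ -z "$ids" ]; then
        echo "no exited containers to remove"
        return 0
    fi
    # $ids is intentionally unquoted so each ID becomes its own argument.
    "$DOCKER" rm $ids
}

# Usage on the host:
#   remove_exited
```

Docker also ships docker container prune, which removes all stopped containers after asking for confirmation.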
You can remove an image from your computer entirely with docker rmi
docker rmi <IMAGE ID>
Question 23.1
Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it. What was the image ID for the python:2.7-slim image? Try not to look at the solution.
Solution
docker pull python:2.7-slim
docker images python
docker rmi <IMAGE ID>
docker images python
2.7-slim: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim
REPOSITORY TAG IMAGE ID CREATED SIZE
python 2.7-slim eeb27ee6b893 14 hours ago 148MB
python 3.7-slim 375e181c2688 13 days ago 120MB
Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
REPOSITORY TAG IMAGE ID CREATED SIZE
python 3.7-slim 375e181c2688 13 days ago 120MB
Exercise 24 - File I/O with Containers
Copying Files To and From a Container
Copying files between the local host and Docker containers is possible. On your local host, find (or create) a file that you want to transfer to the container and then copy it in with docker cp:
touch io_example.txt
# If on Mac need to do: chmod a+w io_example.txt
echo "This was written on local host" > io_example.txt
docker cp io_example.txt <NAME>:<remote path>
Note: Remember to do docker ps if you don’t know the name of your container.
From the container check and modify the file in some way
pwd
ls
cat io_example.txt
echo "This was written inside Docker" >> io_example.txt
<remote path>
io_example.txt
This was written on local host
and then on the local host copy the file out of the container
docker cp <NAME>:<remote path>/io_example.txt .
and, if you like, verify that the file has been modified as intended
cat io_example.txt
This was written on local host
This was written inside Docker
Volume Mounting
What is more common and arguably more useful is to mount volumes to containers with the -v flag. This allows for direct access to the host file system inside of the container and for container processes to write directly to the host file system.
docker run -v <path on host>:<path in container> <image>
For example, to mount your current working directory on your local machine to the data directory in the example container
docker run --rm -it -v $PWD:/home/`whoami`/data -w /home/`whoami`/data sl:7
From inside the container you can ls to see the contents of your directory on your local machine
ls
and yet you are still inside the container
pwd
/home/<username>/data
You can also see that any files created in this path in the container persist upon exit
touch created_inside.txt
exit
ls *.txt
created_inside.txt
This I/O allows for Docker images to be used for specific tasks that may be difficult to do with the tools or software installed on the local host machine. For example, debugging problems that arise in cross-platform software, or even just having a specific version of software perform a task (e.g., using Python 2 when you don’t want it on your machine, or using a specific release of TeX Live when you aren’t ready to update your system release).
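As an illustration of using an image for a single task, the sketch below wraps the volume-mount pattern in a small function. The helper run_here is hypothetical, and /work is an arbitrary mount point chosen for this example. The -w option sets the working directory inside the container so the command starts in the mounted directory. The DOCKER variable is an assumption added so the command can be printed rather than run (DOCKER=echo).

```shell
# DOCKER can be overridden, e.g. DOCKER=echo, to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

# run_here: hypothetical helper that runs a command from a given image against
# the files in the current host directory, then removes the container (--rm).
run_here() {
    local image="$1"
    shift
    # Mount the current directory at /work (arbitrary choice) and start there.
    "$DOCKER" run --rm -v "$PWD":/work -w /work "$image" "$@"
}

# Example: run a one-line Python script with the python:3.7-slim image pulled earlier,
# without needing that Python version on the host:
#   run_here python:3.7-slim python -c 'print("hello from the container")'
```

Any files the command writes under /work appear in the current host directory after the container exits.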
Mounts in Cygwin
Special care needs to be taken when using Cygwin and trying to mount directories. Assuming you have Cygwin installed at C:\cygwin and you want to mount your current working directory:
echo $PWD
/home/<username>/<path_to_cwd>
You will then need to mount that folder using -v /c/cygwin/home/<username>/<path_to_cwd>:/home/docker/data
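The path translation described above can be sketched as a small function. The helper cygwin_mount_path is hypothetical and only does string manipulation: it lowercases the drive letter of the Cygwin install location, turns backslashes into forward slashes, and appends the Cygwin-style path. The user name alice in the example is made up.

```shell
# cygwin_mount_path: hypothetical helper, pure string manipulation.
#   $1: Windows install location of Cygwin, e.g. 'C:\cygwin'
#   $2: path as seen inside Cygwin, e.g. the value of $PWD
cygwin_mount_path() {
    local root="$1" path="$2" drive rest
    # Drive letter before the colon, lowercased ("C" becomes "c").
    drive="$(printf '%s' "${root%%:*}" | tr 'A-Z' 'a-z')"
    # Remainder after the colon, with backslashes converted to slashes.
    rest="$(printf '%s' "${root#*:}" | tr '\\' '/')"
    printf '/%s%s%s\n' "$drive" "$rest" "$path"
}

cygwin_mount_path 'C:\cygwin' '/home/alice/project'
```

The result could then be used on the left-hand side of the -v option, for example -v "$(cygwin_mount_path 'C:\cygwin' "$PWD")":/home/docker/data.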
Exercise 24 - Using Singularity on CMSLPC
So far we’ve only discussed using Docker images and using the Docker runtime. For a variety of reasons Docker is not ideal for use on HPCs like CMSLPC, but luckily Singularity is. Therefore, this next section will cover how to run Docker and Singularity images in a Singularity runtime environment.
Before we go into any detail, you should be aware of the central CMS documentation.
Running custom images with Singularity
As an example, we are going to run a container using the ubuntu:latest image. Begin by logging into cmslpc-sl7:
ssh -Y <username>@cmslpc-sl7.fnal.gov
Before running Singularity, you should set the cache directory (i.e. the directory to which the images are being pulled) to a place outside your $HOME/AFS space (here we use the ~/nobackup directory):
export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity shell -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` -B /cvmfs docker://ubuntu:latest
# try accessing cvmfs inside of the container
source /cvmfs/cms.cern.ch/cmsset_default.sh
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob d72e567cc804 done
Copying blob 0f3630e5ff08 done
Copying blob b6a83d81d1f4 done
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/27 23:48:29 info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/27 23:48:31 info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/27 23:48:31 info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO: Creating SIF file...
INFO: Convert SIF file to sandbox...
WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts
If you are asked for a docker username and password, just hit enter twice.
It’s not really a great practice to bind /eos/uscms into the container and you really shouldn’t need to use the EOS fuse mount anyway.
One particular difference from Docker is that the image name needs to be prefixed with docker:// to tell Singularity that this is a Docker image. Singularity has its own registry system, which doesn’t have a de facto default registry like Docker Hub.
As you can see from the output, Singularity first downloads the layers from the registry and then unpacks them into a format that can be read by Singularity, the Singularity Image Format (SIF). This is a somewhat technical detail, but is different from Docker. It then unpacks the SIF file into what it calls a sandbox, the uncompressed image files needed to make the container.
-B (bind strings)
The -B option allows the user to specify paths to bind to the Singularity container. This option is similar to ‘-v’ in docker. By default paths are mounted as rw (read/write), but can also be specified as ro (read-only).
You must bind any mounted file systems to which you would like access (e.g. nobackup).
If you would like Singularity to run your .bashrc file on startup, you must bind mount your home directory.
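As a sketch of the read-only option mentioned above, a bind specification can take the form source:destination:options. The helper shell_with_binds below is hypothetical; it binds the resolved home directory read/write (the default) and /cvmfs read-only. The SINGULARITY variable is an assumption added so the command line can be printed instead of executed (SINGULARITY=echo).

```shell
# SINGULARITY can be overridden, e.g. SINGULARITY=echo, to print the command instead.
SINGULARITY="${SINGULARITY:-singularity}"

# shell_with_binds: hypothetical helper that opens an interactive shell in the
# given image with the home directory bound read/write and /cvmfs read-only.
shell_with_binds() {
    local image="$1"
    "$SINGULARITY" shell \
        -B "$(readlink -f "$HOME")" \
        -B /cvmfs:/cvmfs:ro \
        "$image"
}

# Usage on cmslpc:
#   shell_with_binds docker://ubuntu:latest
```

Binding /cvmfs read-only changes nothing in practice for software distributed there, but it shows where the ro option goes.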
In the next example, we are executing a script with singularity using the same image.
export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
echo -e '#!/bin/bash\n\necho "Hello World!"\n' > hello_world.sh
singularity exec -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` docker://ubuntu:latest bash hello_world.sh
exec vs. shell
Singularity differentiates between providing you with an interactive shell (singularity shell) and executing scripts non-interactively (singularity exec).
Saving the Singularity Sandbox
You may have noticed that singularity caches both the Docker and SIF images so that they don’t need to be pulled/created on subsequent Singularity calls. That said, the sandbox needed to be created each time we started a container. If you will be using the same container multiple times, it may be useful to store the sandbox and use that to start the container.
Begin by building and storing the sandbox:
export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity build --sandbox ubuntu/ docker://ubuntu:latest
INFO: Starting build...
Getting image source signatures
Copying blob d72e567cc804 skipped: already exists
Copying blob 0f3630e5ff08 skipped: already exists
Copying blob b6a83d81d1f4 [--------------------------------------] 0.0b / 0.0b
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/28 00:14:16 info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/28 00:14:17 warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/28 00:14:17 warn xattr{/uscms_data/d2/aperloff/rootfs-7379bde5-0149-11eb-9685-001a4af11eb0/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/28 00:14:38 info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/28 00:14:38 info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO: Creating sandbox directory...
INFO: Build complete: ubuntu/
Once we have the sandbox we can use that when starting the container. Run the same command as before, but use the sandbox rather than the Docker image:
export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity exec -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` ubuntu/ bash hello_world.sh
WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts
Hello World!
You will notice that the startup time for the container is significantly reduced.
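One way to combine the two steps above is to build the sandbox only when it does not exist yet. The helper ensure_sandbox below is a hypothetical sketch of that idea; it assumes that an existing directory with the given name is a usable sandbox and does not check its contents. The SINGULARITY variable is an assumption added so the build command can be printed instead of run (SINGULARITY=echo).

```shell
# SINGULARITY can be overridden, e.g. SINGULARITY=echo, to print the command instead.
SINGULARITY="${SINGULARITY:-singularity}"

# ensure_sandbox: hypothetical helper that builds a sandbox directory from an
# image source unless the directory is already there.
ensure_sandbox() {
    local dir="$1" source="$2"
    if [ -d "$dir" ]; then
        echo "reusing existing sandbox: $dir"
    else
        "$SINGULARITY" build --sandbox "$dir" "$source"
    fi
}

# Usage on cmslpc, followed by the exec command shown above:
#   ensure_sandbox ubuntu/ docker://ubuntu:latest
```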
Question 24.1
What is the size of the singularity sandbox? Hint: Use the command du -hs <sandbox>.
Key Points
Docker images are super useful for encapsulating a desired environment.
Docker images can be run using the Docker or Singularity runtimes.