CMS Data Analysis School Pre-Exercises - Seventh Set

Overview

Teaching: 0 min
Exercises: 60 min

Questions

What is an image? How about a container?

What is Docker/Singularity?

Why is containerization useful?

Ummmm…how is this different from a virtual machine?

Objectives

Gain a basic understanding of how to run and manage a container.

Understand the absolute basic commands for Docker.

Know how to start a Singularity container.

Introduction

Warning

As a prerequisite for this exercise, please make sure that you have correctly followed the setup instructions for installing Docker and obtaining a DockerHub account.

Objective

Please post your answers to the questions in the Google form seventh set.

Limitation

This exercise seeks to introduce the student to the benefits of containerization and a handful of container services. We cannot cover all topics related to containerization in this short exercise. In particular, we do not seek to explain what is happening under the hood or how to develop your own images. There are other great tutorials covering a variety of containerization topics as they relate to LHC experiments:

Docker/Singularity HATS@LPC

Introduction to Docker

Software containers for CMSSW

Official Docker documentation and tutorial

There are undoubtedly also other, non-LHC oriented tutorials online.

Containers and Images

Containers are like lightweight virtual machines. They behave as if they were their own complete OS, but actually contain only the components necessary to operate. Rather than bundling a full OS, containers share the host machine’s system kernel, which significantly reduces their size. In essence, they run a second OS natively on the host machine with just a thin additional layer, which means they can be faster than traditional virtual machines. Containers take up only as much memory as necessary, which allows many of them to run simultaneously, and they can be spun up quite rapidly.

[Figure: DockerVM]

Images are read-only templates that contain a set of instructions for creating a container. Different container orchestration programs have different formats for these images. Often a single image is made of several files (layers) which contain all of the dependencies and application code necessary to create and configure the container environment. In other words, Docker containers are the runtime instances of images — they are images with a state.

[Figure: DockerImage]

This allows us to package up an application with just the dependencies we need (OS and libraries) and then deploy that image as a single package. This allows us to:

replicate our environment/workflow on other host machines
run a program on a host OS other than the one for which it was designed (not 100% foolproof)
sandbox our applications in a secure environment (still important to take proper safety measures)

Container Runtimes

For the purposes of this tutorial we will only be considering Docker and Singularity for container runtimes. That said, these are really powerful tools which are so much more than just container runtimes. We encourage you to take the time to explore the Docker and Singularity documentation.

Side Note

As a side note, Docker has very similar syntax to Git and Linux, so if you are familiar with the command line tools for them then most of Docker should seem somewhat natural (though you should still read the docs!).

Exercise 20 - Pulling Docker Images

Much like GitHub allows for web hosting and searching for code, the image registries allow the same for Docker/Singularity images. Without going into too much detail, there are several public and private registries available. For Docker, however, the de facto default registry is Docker Hub. Singularity, on the other hand, does not have a de facto default registry.

To begin with we’re going to pull down the Docker image we’re going to be working in for this part of the tutorial (Note: If you already did the docker pull, this image will already be on your machine. In this case, Docker should notice it’s there and not attempt to re-pull it, unless the image has changed in the meantime.):

docker pull sl

# if you run into a permission error, use "sudo docker run ..." as a quick fix
# to fix this for the future, see https://docs.docker.com/install/linux/linux-postinstall/
# if you have an M1 chip Mac, you may want to do "docker pull sl --platform amd64"

Using default tag: latest
latest: Pulling from library/sl
175b929ba158: Pull complete 
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:latest
docker.io/library/sl:latest

The image names are composed of NAME[:TAG|@DIGEST], where the NAME is composed of REGISTRY-URL/NAMESPACE/IMAGE and is often referred to as a repository. Here are some things to know about specifying the image:

Some repositories will include a USERNAME as part of the image name (e.g. fnallpc/fnallpc-docker), and others, usually Docker verified content, will include only a single name (e.g. sl).
A registry path (REGISTRY-URL/NAMESPACE) is similar to a URL, but does not contain a protocol specifier (https://). Docker uses the https:// protocol to communicate with a registry, unless the registry is allowed to be accessed over an insecure connection. Registry credentials are managed by docker login. If no registry path is given, the docker daemon assumes you meant to pull from Docker Hub and automatically prepends docker.io to the image name, plus the library namespace for single-name images (so sl becomes docker.io/library/sl).
If no tag is provided, Docker Engine uses the :latest tag as a default.
The SHA256 DIGEST is much like a Git hash, where it allows you to pull a specific version of an image.
CERN GitLab’s repository path is gitlab-registry.cern.ch/<username>/<repository>/<image_name>[:<tag>|@<digest>].
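The naming rules above can be sketched as a small shell function. This is only an illustration of the defaults described in the list, assuming a POSIX shell; normalize_image_ref is a hypothetical helper name and not part of Docker, and the registry check (a first component containing "." or ":" or equal to localhost) is a simplification of what Docker does.

```shell
# Hypothetical helper: expand a short image reference to its full form.
# The body runs in a subshell ( ... ) so its variables stay local.
normalize_image_ref() (
    ref=$1
    digest=""
    tag=""

    # Split off an optional @DIGEST suffix.
    case $ref in
        *@*) digest=${ref#*@}; ref=${ref%%@*} ;;
    esac

    # Split off an optional :TAG. Only look after the last "/" so that a
    # registry port (e.g. localhost:5000/image) is not mistaken for a tag.
    last=${ref##*/}
    case $last in
        *:*) tag=${last##*:}; ref=${ref%:*} ;;
    esac

    # Decide whether a registry was given explicitly.
    case $ref in
        */*)
            first=${ref%%/*}
            case $first in
                *.*|*:*|localhost) ;;        # registry already present
                *) ref="docker.io/$ref" ;;   # NAMESPACE/IMAGE on Docker Hub
            esac
            ;;
        *) ref="docker.io/library/$ref" ;;   # single-name image
    esac

    # With neither a tag nor a digest, the default tag is "latest".
    if [ -z "$tag" ] && [ -z "$digest" ]; then
        tag=latest
    fi

    out=$ref
    if [ -n "$tag" ]; then out="$out:$tag"; fi
    if [ -n "$digest" ]; then out="$out@$digest"; fi
    printf '%s\n' "$out"
)

normalize_image_ref sl                       # docker.io/library/sl:latest
normalize_image_ref sl:7                     # docker.io/library/sl:7
normalize_image_ref fnallpc/fnallpc-docker   # docker.io/fnallpc/fnallpc-docker:latest
```

The first result matches the last line of the docker pull sl output shown earlier.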

Now, let’s list the images that we have available to us locally

docker images

If you have many images and want to get information on a particular one you can apply a filter, such as the repository name

docker images sl

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  latest              5237b847a4d0        2 weeks ago         186MB

or more explicitly

docker images --filter=reference="sl"

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  latest              5237b847a4d0        2 weeks ago         186MB

You can see here that there is the TAG field associated with the sl image. Tags are a way of further specifying different versions of the same image. As an example, let’s pull the 7 release tag of the sl image (again, if it was already pulled during setup, docker won’t attempt to re-pull it unless it’s changed since last pulled).

docker pull sl:7
docker images sl

7: Pulling from library/sl
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:7
docker.io/library/sl:7

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  7                   5237b847a4d0        2 weeks ago         186MB
sl                  latest              5237b847a4d0        2 weeks ago         186MB
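Note that both rows share the same IMAGE ID: the two tags are different names for one image. As a rough sketch, the listing can be grouped by image ID with awk. The sample text below is copied from the output above, and group_tags_by_id is a hypothetical helper name.

```shell
# Sample listing copied from the `docker images sl` output above.
listing='REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
sl                  7                   5237b847a4d0        2 weeks ago         186MB
sl                  latest              5237b847a4d0        2 weeks ago         186MB'

# Hypothetical helper: read a `docker images` table on stdin and print each
# image ID followed by every REPOSITORY:TAG that points at it.
group_tags_by_id() {
    awk 'NR > 1 { tags[$3] = tags[$3] " " $1 ":" $2 }
         END    { for (id in tags) print id ":" tags[id] }'
}

printf '%s\n' "$listing" | group_tags_by_id
# prints: 5237b847a4d0: sl:7 sl:latest
```

With Docker available, the same function can read live output: docker images sl | group_tags_by_id.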

Question 20.1

Pull down the python:3.7-slim image and then list all of the python images along with the sl:7 image. What is the ‘Image ID’ of the python:3.7-slim image? Try to do this without looking at the solution.

Solution

docker pull python:3.7-slim
docker images --filter=reference="sl" --filter=reference="python"

3.7-slim: Pulling from library/python
7d63c13d9b9b: Pull complete 
7c9d54bd144b: Pull complete 
a7f085de2052: Pull complete 
9027970cef28: Pull complete 
97a32a5a9483: Pull complete 
Digest: sha256:1189006488425ef977c9257935a38766ac6090159aa55b08b62287c44f848330
Status: Downloaded newer image for python:3.7-slim
docker.io/library/python:3.7-slim

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              3.7-slim            375e181c2688        13 days ago         120MB
sl                  7                   5237b847a4d0        2 weeks ago         186MB
sl                  latest              5237b847a4d0        2 weeks ago         186MB

Exercise 21 - Running Docker Images

To use a Docker image as a particular instance on a host machine you run it as a container. You can run in either a detached or foreground (interactive) mode.

Run the image we pulled as a container with an interactive bash terminal:

docker run -it sl:7 /bin/bash

The -i option here enables the interactive session (it keeps standard input open), the -t option allocates a terminal (a pseudo-TTY), and the /bin/bash command makes the container start up in a bash session.

You are now inside the container in an interactive bash session. Check the file directory

pwd
ls -alh

Output

/
total 56K
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 .
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 ..
-rwxr-xr-x   1 root root    0 Oct 25 04:43 .dockerenv
lrwxrwxrwx   1 root root    7 Oct  4 13:19 bin -> usr/bin
dr-xr-xr-x   2 root root 4.0K Apr 12  2018 boot
drwxr-xr-x   5 root root  360 Oct 25 04:43 dev
drwxr-xr-x   1 root root 4.0K Oct 25 04:43 etc
drwxr-xr-x   2 root root 4.0K Oct  4 13:19 home
lrwxrwxrwx   1 root root    7 Oct  4 13:19 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Oct  4 13:19 lib64 -> usr/lib64
drwxr-xr-x   2 root root 4.0K Apr 12  2018 media
drwxr-xr-x   2 root root 4.0K Apr 12  2018 mnt
drwxr-xr-x   2 root root 4.0K Apr 12  2018 opt
dr-xr-xr-x 170 root root    0 Oct 25 04:43 proc
dr-xr-x---   2 root root 4.0K Oct  4 13:19 root
drwxr-xr-x  11 root root 4.0K Oct  4 13:19 run
lrwxrwxrwx   1 root root    8 Oct  4 13:19 sbin -> usr/sbin
drwxr-xr-x   2 root root 4.0K Apr 12  2018 srv
dr-xr-xr-x  13 root root    0 Oct 25 04:43 sys
drwxrwxrwt   2 root root 4.0K Oct  4 13:19 tmp
drwxr-xr-x  13 root root 4.0K Oct  4 13:19 usr
drwxr-xr-x  18 root root 4.0K Oct  4 13:19 var

and check the hostname to see that you are not in your local host system

hostname

<generated hostname>

Question 21.1

Check the /etc/os-release file to see that you are actually inside a release of Scientific Linux. What is the Version ID of this SL image? Try to do this without looking at the solution.

Solution

cat /etc/os-release

NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"

REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"

Exercise 22 - Monitoring, Exiting, Restarting, and Stopping Containers

Monitoring Your Containers

Open up a new terminal tab on the host machine and list the containers that are currently running

docker ps

CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            <generated name>

Notice that the name of your container is some randomly generated name. To make the name more helpful, rename the running container

docker rename <CONTAINER ID> my-example

and then verify it has been renamed

docker ps

CONTAINER ID        IMAGE         COMMAND             CREATED             STATUS              PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago       Up n minutes                            my-example

Specifying a name

You can also start up a container with a specific name
docker run -it --name my-example sl:7 /bin/bash

Exiting a Container

As a test, go back into the terminal used for your container, and create a file in the container

touch test.txt

In the container exit at the command line

exit

You are returned to your shell. If you list the containers you will notice that none are running

docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

but you can see all containers that have been run and not removed with

docker ps -a

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n minutes ago      Exited (0) t seconds ago                       my-example

Restarting a Container

To restart your exited Docker container, start it again and then attach it interactively to your shell

docker start <CONTAINER ID>
docker attach <CONTAINER ID>

exec command

The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.

In case you’d like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.

For example, the exec equivalent to attaching in our case would look like:
docker start <CONTAINER ID>
docker exec -it <CONTAINER ID> /bin/bash
You can start multiple shells inside the same container using exec.
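Because exec can run any command, it also works without an interactive session. The sketch below reads the OS version from a running container. It assumes a POSIX shell and a container that is already running; container_version_id is a hypothetical helper name.

```shell
# Hypothetical helper: print the VERSION_ID of a running container.
# $1 is the container name or ID, as shown by `docker ps`.
container_version_id() (
    # No -i or -t here: the command runs once and its output is piped to sed,
    # which keeps only the value of the VERSION_ID line and strips the quotes.
    docker exec "$1" cat /etc/os-release |
        sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
)
```

For the container from this exercise, the call would be container_version_id my-example.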

Notice that your entry point is still / and then check that your test.txt still exists

ls -alh test.txt

-rw-r--r--   1 root root    0 Oct 25 04:46 test.txt

Clean up a container

If you want a container to be cleaned up (that is, deleted) after you exit it, run it with the --rm option flag
docker run --rm -it <IMAGE> /bin/bash

Stopping a Container

Sometimes you will exit a container and it won’t stop. Other times your container may crash or enter a bad state, but still be running. To stop a container, exit it (exit) and then enter:

docker stop <CONTAINER ID> # or <NAME>

Exercise 23 - Removing Containers and Images

You can clean up/remove a container with docker rm

docker rm <CONTAINER NAME>

Note: A container must be stopped in order for it to be removed.

Start an instance of the sl:latest container, exit it, and then remove it:

docker run sl:latest
docker ps -a
docker rm <CONTAINER NAME>
docker ps -a

Output

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
<generated id>      <image:tag>   "/bin/bash"         n seconds ago      Exited (0) t seconds ago                       <name>

<generated id>

CONTAINER ID        IMAGE         COMMAND             CREATED            STATUS                     PORTS               NAMES
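Exited containers accumulate in docker ps -a until they are removed. A minimal sketch for removing all of them in one go, assuming a POSIX shell; remove_exited_containers is a hypothetical helper name, not a Docker command.

```shell
# Hypothetical helper: remove every container whose status is "exited".
remove_exited_containers() (
    # -a lists stopped containers too, -q prints only the container IDs.
    ids=$(docker ps -aq --filter status=exited)
    if [ -z "$ids" ]; then
        echo "No exited containers to remove."
        exit 0
    fi
    # $ids is left unquoted on purpose so each ID becomes its own argument.
    docker rm $ids
)
```

Calling remove_exited_containers prints the ID of each container that docker rm removed, in the same way as the single-container example above.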

You can remove an image from your computer entirely with docker rmi

docker rmi <IMAGE ID>

Question 23.1

Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it. What was the image ID for the python:2.7-slim image? Try not to look at the solution.

Solution

docker pull python:2.7-slim
docker images python
docker rmi <IMAGE ID>
docker images python

2.7-slim: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              2.7-slim            eeb27ee6b893        14 hours ago        148MB
python              3.7-slim            375e181c2688        13 days ago         120MB

Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
python              3.7-slim            375e181c2688        13 days ago        120MB

Exercise 24 - File I/O with Containers

Copying Files To and From a Container

Copying files between the local host and Docker containers is possible. On your local host, create (or find) a file that you want to transfer to the container and then copy it in

touch io_example.txt
# If on Mac need to do: chmod a+w io_example.txt
echo "This was written on local host" > io_example.txt
docker cp io_example.txt <NAME>:<remote path>

Note: Remember to do docker ps if you don’t know the name of your container.

From the container check and modify the file in some way

pwd
ls
cat io_example.txt
echo "This was written inside Docker" >> io_example.txt

<remote path>
io_example.txt
This was written on local host

and then on the local host copy the file out of the container

docker cp <NAME>:<remote path>/io_example.txt .

and, if you want, verify that the file has been modified as expected

cat io_example.txt

This was written on local host
This was written inside Docker

Volume Mounting

What is more common and arguably more useful is to mount volumes to containers with the -v flag. This allows for direct access to the host file system inside of the container and for container processes to write directly to the host file system.

docker run -v <path on host>:<path in container> <image>

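For example, to mount your current working directory on your local machine to the data directory in the example container, and to start the container in that directory with the -w (working directory) flag

docker run --rm -it -v "$PWD":/home/`whoami`/data -w /home/`whoami`/data sl:7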

From inside the container you can ls to see the contents of your directory on your local machine

ls

and yet you are still inside the container

pwd

/home/<username>/data

You can also see that any files created in this path in the container persist upon exit

touch created_inside.txt
exit
ls *.txt

created_inside.txt

This I/O allows Docker images to be used for specific tasks that may be difficult to do with the tools or software installed on the local host machine. Examples include debugging problems that arise in cross-platform software, or simply having a specific version of software perform a task (e.g., using Python 2 when you don’t want it on your machine, or using a specific release of TeX Live when you aren’t ready to update your system release).
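One detail to watch with -v: Docker interprets a host path that is not absolute as the name of a volume rather than a directory. The sketch below builds the argument for -v from a possibly relative directory and also shows the optional ro (read-only) mode. It assumes a POSIX shell; bind_mount_arg is a hypothetical helper name.

```shell
# Hypothetical helper: print "<absolute host path>:<container path>:<mode>".
bind_mount_arg() (
    host_path=$1
    container_path=$2
    mode=${3:-rw}    # "rw" (read/write, the default) or "ro" (read-only)

    # Resolve the host directory to an absolute path; fail if it is missing.
    abs_host=$(cd "$host_path" && pwd) || exit 1
    printf '%s:%s:%s\n' "$abs_host" "$container_path" "$mode"
)

# Show the argument for mounting the current directory read-only at /data.
bind_mount_arg . /data ro
```

A possible use, with the image from this exercise: docker run --rm -it -v "$(bind_mount_arg . /data ro)" sl:7 ls /data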

Mounts in Cygwin

Special care needs to be taken when using Cygwin and trying to mount directories. Assuming you have Cygwin installed at C:\cygwin and you want to mount your current working directory:
echo $PWD
/home/<username>/<path_to_cwd>
You will then need to mount that folder using -v /c/cygwin/home/<username>/<path_to_cwd>:/home/docker/data
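The translation described above can be written as a small function. This is a sketch that assumes, as in the text, that Cygwin is installed at C:\cygwin and that the path lies inside the Cygwin tree (for example under /home); cygwin_mount_source is a hypothetical helper name.

```shell
# Hypothetical helper: turn a path as seen inside Cygwin into the host path
# to put on the left-hand side of -v.
# $1: path inside Cygwin (e.g. the output of `echo $PWD`)
# $2: Cygwin install root in /c/... form (assumed default: /c/cygwin)
cygwin_mount_source() (
    cyg_path=$1
    cygwin_root=${2:-/c/cygwin}
    # Drop a trailing slash from the root so the result has no "//".
    printf '%s%s\n' "${cygwin_root%/}" "$cyg_path"
)

cygwin_mount_source /home/alice/analysis
# prints: /c/cygwin/home/alice/analysis
```

The user name alice and the directory analysis are placeholders. The result would then be used as -v "$(cygwin_mount_source "$PWD")":/home/docker/data.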

Exercise 24 - Using Singularity on CMSLPC

So far we’ve only discussed using Docker images and using the Docker runtime. For a variety of reasons Docker is not ideal for use on HPCs like CMSLPC, but luckily Singularity is. Therefore, this next section will cover how to run Docker and Singularity images in a Singularity runtime environment.

Before we go into any detail, you should be aware of the central CMS documentation.

Running custom images with Singularity

As an example, we are going to run a container using the ubuntu:latest image. Begin by logging into cmslpc-sl7:

ssh -Y <username>@cmslpc-sl7.fnal.gov

Before running Singularity, you should set the cache directory (i.e. the directory to which the images are being pulled) to a place outside your $HOME/AFS space (here we use the ~/nobackup directory):

export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity shell -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` -B /cvmfs docker://ubuntu:latest
# try accessing cvmfs inside of the container
source /cvmfs/cms.cern.ch/cmsset_default.sh

INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob d72e567cc804 done
Copying blob 0f3630e5ff08 done
Copying blob b6a83d81d1f4 done
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/27 23:48:29  info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/27 23:48:31  info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/27 23:48:31  info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO:    Creating SIF file...
INFO:    Convert SIF file to sandbox...
WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts

If you are asked for a docker username and password, just hit enter twice.

It’s not really a great practice to bind /eos/uscms into the container and you really shouldn’t need to use the EOS fuse mount anyway.

One particular difference from Docker is that the image name needs to be prepended by docker:// to tell Singularity that this is a Docker image. Singularity has its own registry system, which doesn’t have a de facto default registry like Docker Hub.

As you can see from the output, Singularity first downloads the layers from the registry, and then unpacks the layers into a format that can be read by Singularity, the Singularity Image Format (SIF). This is a somewhat technical detail, but is different from Docker. It then unpacks the SIF file into what it calls a sandbox, the uncompressed image files needed to make the container.

-B (bind strings)

The -B option allows the user to specify paths to bind to the Singularity container. This option is similar to ‘-v’ in Docker. By default paths are mounted as rw (read/write), but can also be specified as ro (read-only).

You must bind any mounted file systems to which you would like access (e.g. nobackup).

If you would like Singularity to run your .bashrc file on startup, you must bind mount your home directory.
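A bind string has the form src[:dest[:opts]], and several of them can be joined with commas in a single -B option. The sketch below collects a list of paths into one such string, resolving symbolic links with readlink -f as the commands above do and skipping paths that do not exist on the current node. It assumes a Linux system with GNU readlink; bind_list is a hypothetical helper name.

```shell
# Hypothetical helper: print a comma-separated bind string for the given paths.
bind_list() (
    spec=""
    for path in "$@"; do
        # Resolve symbolic links so the real location is bound.
        real=$(readlink -f "$path") || continue
        # Skip anything that is not present on this machine.
        [ -e "$real" ] || continue
        # Append, inserting a comma only between entries.
        spec="${spec:+$spec,}$real"
    done
    printf '%s\n' "$spec"
)
```

A possible use on cmslpc: singularity shell -B "$(bind_list "$HOME" "$HOME/nobackup" /cvmfs)" docker://ubuntu:latest. To bind a path read-only, the options field can be given explicitly, for example -B /cvmfs:/cvmfs:ro.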

In the next example, we are executing a script with singularity using the same image.

export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
echo -e '#!/bin/bash\n\necho "Hello World!"\n' > hello_world.sh
singularity exec -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` docker://ubuntu:latest bash hello_world.sh

exec vs. shell

Singularity differentiates between providing you with an interactive shell (singularity shell) and executing scripts non-interactively (singularity exec).

Saving the Singularity Sandbox

You may have noticed that singularity caches both the Docker and SIF images so that they don’t need to be pulled/created on subsequent Singularity calls. That said, the sandbox needed to be created each time we started a container. If you will be using the same container multiple times, it may be useful to store the sandbox and use that to start the container.

Begin by building and storing the sandbox:

export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity build --sandbox ubuntu/ docker://ubuntu:latest

INFO:    Starting build...
Getting image source signatures
Copying blob d72e567cc804 skipped: already exists
Copying blob 0f3630e5ff08 skipped: already exists
Copying blob b6a83d81d1f4 [--------------------------------------] 0.0b / 0.0b
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/28 00:14:16  info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/28 00:14:17  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/28 00:14:17  warn xattr{/uscms_data/d2/aperloff/rootfs-7379bde5-0149-11eb-9685-001a4af11eb0/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/28 00:14:38  info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/28 00:14:38  info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO:    Creating sandbox directory...
INFO:    Build complete: ubuntu/

Once we have the sandbox we can use that when starting the container. Run the same command as before, but use the sandbox rather than the Docker image:

export SINGULARITY_CACHEDIR="`readlink -f ~/nobackup/`/Singularity"
singularity exec -B `readlink $HOME` -B `readlink -f ${HOME}/nobackup/` ubuntu/ bash hello_world.sh

WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts
Hello World!

You will notice that the startup time for the container is significantly reduced.
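To avoid rebuilding a sandbox that is already on disk, the build step can be made conditional. A minimal sketch, assuming a POSIX shell; ensure_sandbox is a hypothetical helper name, and it only checks that the directory exists, not that the build inside it completed.

```shell
# Hypothetical helper: build the sandbox only if its directory is missing.
# $1: sandbox directory (e.g. ubuntu/)
# $2: image source      (e.g. docker://ubuntu:latest)
ensure_sandbox() (
    sandbox_dir=$1
    image_uri=$2
    if [ -d "$sandbox_dir" ]; then
        echo "Reusing existing sandbox: $sandbox_dir"
    else
        singularity build --sandbox "$sandbox_dir" "$image_uri"
    fi
)
```

With the names used in this exercise, the call would be ensure_sandbox ubuntu/ docker://ubuntu:latest, followed by the singularity exec command shown above.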

Question 24.1

What is the size of the singularity sandbox? Hint: Use the command du -hs <sandbox>.

Key Points

Docker images are super useful for encapsulating a desired environment.

Docker images can be run using the Docker or Singularity runtimes.
