Introducing containers

Overview

Teaching: 30 min
Exercises: 10 min
Questions
  • What are containers, and why might they be useful to me?

Objectives
  • Show how software depending on other software leads to configuration management problems.

  • Explain the notion of virtualisation in computing.

  • Explain the ways in which virtualisation may be useful.

  • Explain how containers streamline virtualisation.

Disclaimer

This looks like the Carpentries’ formatting of lessons because it is using their stylesheet (which is openly available for such purposes). However the similarity is visual-only: the lesson is not being developed by the Carpentries and has no association with the Carpentries.

The fundamental problem: software has dependencies that are difficult to manage

Consider Microsoft Excel: a typical workplace productivity tool. Most Excel users probably give little thought to the underlying software dependencies allowing them to open a given XLSX file on their computer. Indeed, in an ideal world none of us would need to think about fixing software dependencies, but we are far from that world. For example:

All of the above discussion is just about one piece of software: Microsoft Excel. Excel mostly just depends on the version of the operating system you’re running. However the situation gets far worse when building software in programming environments like R or Python. Those languages change over time, and depend on an enormous set of software libraries written by unrelated software development teams.

What if you wanted to distribute a software tool that automated interaction between R and Python. Both of these language environments have independent version and software dependency lineages. As the number of software components such as R and Python increases, this can rapidly lead to a combinatorial explosion in the number of possible configurations, only some of which will work as intended. This situation is sometimes informally termed “dependency hell”.

The situation is often mitigated in part by factors such as:

It’s not very practical to have every version of every piece of software installed on any computer that might need to resolve dependencies, as the mostly-redundant space used by the different versions will continue to mount.

Thankfully there are ways to get underneath (a lot of) this mess: containers to the rescue! Containers provide a way to package up software dependencies and access to resources such as files and communications networks in a uniform manner.

Background: virtualisation in computing

Some uses of computers and the software they run should be deterministic, particularly when building reproducible computational environments. The meaning of “deterministic” is: if you feed in the same input, to the same computer, then the same output will appear.

However a computer running deterministic software, let’s call it the “guest”, should itself be able to be simulated as a deterministic computational process running on a computer we will call the “host”. The guest computer can be said to have been virtualised: it is no longer a physical computer. Note that “virtual machine” is frequently referred to using the abbreviation “VM”.

We have avoided the software dependency issue by looking for the lowest common factor across all software systems, which is the computer itself, beneath even the operating system software.

Omitting details and avoiding complexities…

Note that this description omits many details and avoids discussing complexities that are not particularly relevant to this introduction session, for example:

  • Thinking with analogy to movies such as Inception, The Matrix, etc., can the guest computer figure out that it’s not actually a physical computer, and that it’s running as software inside a host physical computer? Yes, it probably can… but let’s not go there within this episode.
  • Can you run a host as a guest itself within another host? Sometimes… but let’s not go there either.

Motivation for virtualisation

What features does virtualisation offer?

Downsides of “full” virtualisation:

Containers are a type of lightweight virtualisation

Containers are similar to virtual machines but offer a more lightweight solution. Containers sacrifice the strong isolation that full virtualisation provides in order to vastly reduce the resource requirements on the virtualisation host.

/2020-02-03-durham-docker/VM%20vs%20Container

The term “container” can be usefully considered with reference to shipping containers. Before shipping containers were developed, packing and unpacking cargo ships was time consuming, and error prone, with high potential for different clients’ goods to become mixed up. Software containers standardise the packaging of a complete software system (the lightweight virtual machine): you can drop a container into a container host, and it should “just work”.

The same containers can run on:

We should certainly see people using the same containers on macOS and Windows today.

And what do you do?

Talk to your neighbour, office mate or rubber duck about your research. How does computing help you do your research? How do you think containers (or virtualisation) could help you do more or better research?

Docker is software that manages containers and the resources that containers need. While Docker is a leader in the container space, there are many similar technologies available… the concepts we learn today will allow us to use other container platforms even if their command syntax will be a little different.

Docker’s terminology

Key Points

  • Almost all software depends on other software components to function, but these components have independent evolutionary paths.

  • Projects involving many software components can rapidly run into a combinatoric explosion in the number of software version configurations available, yet only a subset of possible configurations actually works as desired.

  • Containers collect software components together and can help avoid software dependency problems.

  • Virtualisation is an old technology that container technology makes more practical.

  • Docker is just one software platform that can create containers and the resources they use.