width:1000px

udocker - be anywhere

Advanced technical details

https://github.com/indigo-dc/udocker

Mario David david@lip.pt Jorge Gomes jorge@lip.pt

width:150px width:1200px

Programing languages and OS

Implemented
- python, C, C++, go
Can run:
- CentOS 6, CentOS 7, RHEL 8 or RHEL 9 (compatible distros)
- Ubuntu >= 16.04
- Any distro that supports python 2.7 and >= 3.6

Components

Command line interface docker like
Pull of containers from Docker Hub
Local repository of images and containers
Execution of containers with modular engines

udocker: Execution engines - I

udocker supports several techniques to achieve the equivalent to a chroot without using privileges, to execute containers.

They are selected per container id via execution modes.

udocker: Execution engines - II

Mode	Base	Description
P1	PRoot	PTRACE accelerated (with SECCOMP filtering): DEFAULT
P2	PRoot	PTRACE non-accelerated (without SECCOMP filtering)
R1	runC/Crun	rootless unprivileged using user namespaces
R2	runC/Crun	rootless unprivileged using user namespaces + P1
R3	runC/Crun	rootless unprivileged using user namespaces + P2
F1	Fakechroot	with loader as argument and LD_LIBRARY_PATH
F2	Fakechroot	with modified loader, loader as argument and LD_LIBRARY_PATH
F3	Fakechroot	modified loader and ELF headers of binaries + libs changed
F4	Fakechroot	modified loader and ELF headers dynamically changed
S1	Singularity	where locally installed using chroot or user namespaces

udocker: PRoot engine (P1 and P2)

PRoot uses PTRACE to intercept system calls
Pathnames are modified before the call
- To expand container pathnames into host pathnames
- /bin/ls to /home/user/.udocker/containers/CONTAINER-NAME/ROOT/bin/ls
If pathnames are returned they are modified after the call
- To shrink host pathnames to container pathnames
- /home/user/.udocker/containers/CONTAINER-NAME/ROOT/bin/ls to /bin/ls
P1 and P2 are very generic modes adequate for most applications
- They also offer root emulation
As new system calls are added they must be also be added to PRoot
- Often compatibility also needs to be added for older kernels.

udocker: PRoot engine (P1)

The P1 mode uses PTRACE + SECCOMP filtering
- P1 is the udocker default mode
- System call interception is limited to the set of calls that manipulate pathnames
- We fixed PRoot for SECCOMP on recent kernels, most changes incorporated upstream
The impact of tracing depends on the system call frequency
- In most cases the performance is good
- Applications that are heavily threaded or pathname intensive might be impacted

udocker: PRoot engine (P2)

The P2 mode uses PTRACE without SECCOMP
- Therefore intercepts all system calls even the if they don’t make use pathnames
- P1 falls back to P2 on old Linux kernels without SECCOMP (e.g. CentOS 6)
The impact of tracing depends on the system call frequency
- Since all system calls are intercepted can be slow
- Applications that are heavily threaded or pathname intensive can be very impacted
- In such cases using Fn modes is recommended

udocker: runC/crun engine (R1) - I

runC and crun are tools to spawn containers according to the Open Containers Initiative (OCI) specification:
- They support unprivileged namespaces using the user namespace.
- User namespaces have several limitations but allow execution without privileges.
- Within the Rn modes you can only run in the container as a less privileged root.
- Access to the host devices is limited.

udocker: runC/crun engine (R1) - II

To support runC/crun in udocker:
- We added conversion of Docker metadata to the OCI spec format.
- udocker can produce an OCI spec and run the containers with runC/crun transparently.
- While runC is written in go, crun is written in C and is generally faster.
- Depending on the host system udocker selects crun or runC.
- crun provides support for the kernel cgroups version 2 which became required in some distributions.

udocker: runC/crun engine (R2 and R3)

The R2 and R3 execution modes are nested:
- These modes make use of P1 or P2 from inside the R engine.
- It is used to overcome some of the user namespace limitations.
- They are not generally necessary.
- All limitations of the P1 and P2 modes also apply to R2 and R3.
The Pn modes require a tmp directory that is writable.

udocker run  -v /tmp myContainerId

udocker: Fakechroot engine - I

Fakechroot is a library to provide chroot-like behaviour.
- It was conceived to support debootstrap in debian
For udocker
- It has been heavily modified to support Linux containers with udocker
- Supports both glibc and musl libc (ported by the udocker developers)
Uses the Linux loader LD_PRELOAD mechanism to;
- Intercept calls to the libc.so functions that manipulate pathnames.
- Translates the pathnames before and after the call similarly to PRoot.
- Does not work with statically compiled executables.

udocker: Fakechroot engine - II

In the original fakechroot the executables must match the host loader and libc.
- Shared libraries are loaded from the host not the container.
- Causing symbol mismatches and application crashes.
Why is this ?
- The path to the loader ld.so is inside the ELF header of all executables.
  - is an absolute path pointing to the host
  - readelf --program-headers /bin/ls | grep interpreter
  - since loading starts before execution we cannot intercept and translate
- Pathnames to library locations and ld.so.cache inside ld.so are absolute.
  - loaders are statically linked so we cannot intercept and translate
- Absolute paths also may exist in the ELF headers of executables and libraries.

udocker: Fakechroot engine - III

The shared library loader ld.so searches for libraries:
- If the pathname has a / they are directly loaded (PROBLEM).
- If the pathname does not contain / a search path or location can be obtained from:
  1. DT_RPATH dynamic section attribute of the ELF executable (PROBLEM).
  2. LD_LIBRARY_PATH environment variable (this can be easily set).
  3. DT_RUNPATH dynamic section attribute of the ELF executable (PROBLEM).
  4. cache file /etc/ld.so.cache (PROBLEM).
  5. default paths such as /lib64, /usr/lib64, /lib, /usr/lib (PROBLEM).

udocker: Fakechroot engine (F1) - I

The path to the loader ld.so is inside the ELF header of all executables;
- the loader is the executable that loads libraries and calls the actual executable,
- also acts as a library providing functions to dynamically load other libraries.
- the loader is provided and tightly coupled with the libc.
Is essential that executables in the container are run with the loader from the container
- as symbols and functions will not match causing crashes
- binaries, libc, other libs and ld.so must match

udocker: Fakechroot engine (F1) - II

The mode F1 enforces the use of the loader provided by the container:
- Passes it as 1st argument in exec and similar system calls shifting argv.
- Like this executables are always started by the loader of the container
- /pathname/ld-linux-x86-64.so /pathname/bin/ls
Enforcement of library locations:
- Is performed by filling in LD_LIBRARY_PATH with the container paths.
- Uses library paths extracted from the container ld.so.cache
- export LD_LIBRARY_PATH=/home/u/containers/ID/ROOT/lib64: ...
If the ELF headers of binaries contain absolute paths then host libraries may endup being loaded.

udocker: Fakechroot engine (F2) - I

The mode F2 modifies the loader binary within the container:
- A copy of the container loader is made.
- The loader binary is then edited by udocker.
- The loading from host locations /lib, /lib64 etc is disabled.
- The loading using the host ld.so.cache is disabled.
- LD_LIBRARY_PATH is renamed to LD_LIBRARY_REAL.

udocker: Fakechroot engine (F2) - II

Upon execution:
- Invocation is performed as in mode F1.
- The LD_LIBRARY_REAL is filled with library paths from the container and its ld.so.cache.
- Changes made by the user to LD_LIBRARY_PATH are intercepted
  - the pathnames are adjusted to container locations and inserted in LD_LIBRARY_REAL.
  - like this LD_LIBRARY_PATH remains untouched for the executables
  - but in practice is the LD_LIBRARY_REAL with the containers paths that is used

udocker: Fakechroot engine (F3 and F4) - I

The mode F3 modifies binaries both executables and libraries:
- The PatchELF tool was heavily modified to enable easier change of:
  - Loader location in ELF headers of executables.
  - Library path locations inside executables and libraries.
With F3 or F4 the ELF headers of container executables and libraries are edited with PatchELF:
- The loader location is changed to point to the container.
- The libraries location if absolute are changed to point to the container.
- The libraries search paths inside the binaries are changed to point to container locations.

udocker: Fakechroot engine (F3 and F4) - II

The loader no longer needs to be passed as first argument.
The libraries are always fetched from container locations.
The LD_LIBRARY_REAL continues to be used in F3 and F4.
The mode F4 adds dynamic editing of executables and libraries.
- This is useful if libraries or executables are added to a container or created as result of a compilation.

udocker: Fakechroot engine (F3 and F4) - III

Containers in modes F3 and F4 cannot be transparently moved across different systems:
- The absolute pathnames to the container locations will likely differ.
- In this case convert first to another mode before transfer.
- Or at arrival use: setup --execmode=Fn --force.

Thank you!

Questions ?

udocker@lip.pt

width:200px width:1200px

tutorials

Repository hosting several tutorials