
Debugging Docker Container Networking Issues with nsenter

😗Translated Content😗

This article was machine translated and has not been proofread by the author. The information it contains may be inaccurate. The author will do his best to come back (when he has time) and revise these articles. 🥰

For the Chinese version of this article, see here.

Docker is a good thing, as we all know. Its advantage is “isolation”: all dependencies, listening ports, and the junk files produced at runtime stay isolated inside the container, while application data, logs, and service ports are exposed by Docker in a uniform way, which keeps things simple and easy to maintain.

But its disadvantage is also “isolation”. In production we certainly want the cleanest possible environment, but the moment some bug surfaces and we need to work on the services inside a container (writing a bit of Python, fixing database contents, debugging service connectivity), the headache begins. Inside the container, luxuries like a full editor or interpreter are naturally out of the question, and some containers lack even the most basic command-line tools. What to do?

Generally speaking, we have two options. One is to expose the service’s port and work on the exposed port from outside the container. The other is to install the debugging tools and runtime we need inside the container, or bind-mount the necessary tools in and operate from within. Both approaches have their own pros and cons. So today, let’s try **a third option**: use the `nsenter` tool to let programs on the host “borrow” the running context of a Docker container [^doc-ns] (mainly its network and process namespaces), breaking the boundary between the inside and the outside of the container, so that developers can solve problems quickly with the tools already on the host.

At the same time, this article also shares a small script I wrote that gracefully solves the problem of **container intranet domain name resolution** through a special mechanism provided by the host’s glibc. I dare say that, for this particular problem, mine is definitely the second-best solution in the Eastern Hemisphere.

Existing problems

We have all been using Docker for years, so some readers will ask: isn’t it just a Docker container? Whether it runs a web service or anything else, can’t you simply map the port out with `-p`? What’s not to like?

Of course that works, but it only stays pleasant for a while, not for a lifetime. On the host below there are sixteen web services alone. Web services are actually the easy case: we can manage them uniformly with traefik and route to the backend services by domain name. Beyond that, different services carry different dependencies, and with docker-compose each service is built inside its own isolated internal network; the `postgres:5432` on service A’s intranet is of course a different database from the `postgres:5432` on service B’s. Exposing everything is bad for security, isolation, and manageability. What if the default ports get maliciously scanned? What if port numbers collide? And if you just assign arbitrary ports, who remembers the mappings for twenty-odd services? On a public-facing VPS the problem is even bigger. Then there are services such as ES and HDFS that rely on the host IP to function correctly: when the *master node* tells the client which node to connect to, it hands the client an intranet IP. In that case, what good is a mapped port anyway?

/2020/docker-service-management-w-nsenter/00.png

The above was about the first option mentioned in the preface (exposing ports); now let’s talk about the second (installing tools inside the container). Its problem is even more obvious: **you never know how weird the next Docker image you run into will be**. Not every container has a package manager such as apt-get or apk; some containers don’t even have bash, busybox, or libc. Even if you bind-mount a prepared directory of tools, whether they will run at all is another question. From a development point of view, the tool I use most every day is VSCode. If I just need to connect to a Linux server to write code, VSCode Remote is a fine choice; but when all the dependencies are packed into containers, even if the files inside the container can be exposed, there is no Language Server, and therefore no code completion, API documentation, or syntax analysis. VSCode is then reduced to a bare editor, which also hurts productivity a lot.

For example, the screenshot below shows the contents of the imageproxy image 1. As you can see, besides the program binary itself there are only the preset CA certificate list, a passwd file, and the time zone database. There is no libc, no package manager, not even busybox, and therefore no shell.

/2020/docker-service-management-w-nsenter/01.png

The secret is a single line in the Dockerfile: the special usage of `FROM scratch` [^scratch]. We won’t go into the principle here; we only need to know that while developers are feeling pleased with how tiny their image turned out, somewhere behind them a user is shedding bitter tears.

Personally I am very fond of Python, but in container environments, especially other people’s containers, Python is a cumbersome thing: install Python, install pip, then install the required libraries through pip; some native packages additionally depend on gcc; and every different base image has to be adapted all over again… Shouldn’t there be a once-and-for-all approach?

What is namespace

Namespace [^namespace] is a system resource isolation mechanism provided by the Linux kernel, and it is the cornerstone on which Docker implements its functionality. The namespace mechanism isolates eight kinds of system resources; the ones we care about most here are how the network (Network) namespace and the mount (Mount) namespace change across process contexts. Docker uses these two namespaces to isolate the network interfaces and the file system visible to processes in a container from those of the host. The network interfaces mentioned here include not only the virtual bridges created by Docker, such as `docker0` and the per-network `br-*` bridges, but also the local loopback interface, that is, `lo`.
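As a quick way to see these namespaces from the host, the following sketch uses `lsns` from util-linux (assuming it is installed; most distributions ship it):

# Each line is one namespace; processes belonging to a Docker container
# show up with their own namespace ids, separate from the host's.
sudo lsns --type net
sudo lsns --type mnt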

Nsenter example

Let’s demonstrate with the following example what happens to a process when its namespaces are switched with the nsenter [^nsenter] command. From the host’s point of view, every process’s namespace information is represented by the files under `/proc/[pid]/ns` in the proc filesystem, and each namespace appears there as a special file that can be held open as a file descriptor. After launching a command through nsenter, we can see that several namespaces of the resulting process have indeed changed; concretely, the ids in the square brackets differ, and these ids do not change across repeated runs of the command.
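A minimal sketch of this kind of check (the container name huginn_web_1 is borrowed from the example later in this article):

# Namespaces of the current shell on the host; the numbers in brackets are the ids.
ls -l /proc/$$/ns

# Spawn a process inside only the container's network namespace and inspect
# its namespace links: the net:[...] id differs from the host's, while the
# namespaces we did not enter keep the host's ids.
PID="$(docker inspect -f '{{.State.Pid}}' huginn_web_1)"
sudo nsenter --net -t "$PID" ls -l /proc/self/ns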

/2020/docker-service-management-w-nsenter/02.png

Use nsenter to access the container intranet

From the nsenter manpage [^nsenter] we know that entering a target process’s namespaces only requires knowing its PID. Note, though, that who may do so is restricted by the Linux capabilities mechanism [^cap] (on top of the usual filesystem permission checks): generally only root, or a user holding the appropriate capability, can access another user’s `/proc/[pid]/ns` directory, whose owner is the same as the owner of the process that the PID in the path refers to. As long as we run nsenter through `sudo`, none of this is a concern.

How do we get this PID? Running a command such as `ps` from inside the container is obviously useless: seen from inside the container, the PID is always 1. From outside the container, however, the `docker inspect` command gives us the host-side PID of the process that has PID 1 inside a given container.

docker inspect -f '{{.State.Pid}}' $CONTAINER_NAME

Stringing these commands together, we can verify the effect.

sudo nsenter --all -t "$(docker inspect -f '{{.State.Pid}}' "$CONTAINER_NAME")" $COMMAND

/2020/docker-service-management-w-nsenter/03.png

Disadvantages of the nsenter approach

Note that in the example above I ran `nsenter --all`, handing **all namespaces** of the container’s process to the command we are about to run, including the mount namespace. This means that when the process is loaded it will **load all the required files and libraries from inside the container**, so in the example above I was actually running the command shipped with the huginn_web_1 container, not the one on the host, which is at odds with the purpose of this article: we precisely did not want to touch the container to get this done.

Moreover, all process management tools, including `ps`, `htop`, and the like, depend on the contents of the `/proc` directory. So if you want to use that kind of tool, you can only resort to the other methods mentioned in the preface. If it were me, I would probably statically compile an htop or a busybox and `docker cp` it in.

Therefore, what nsenter is really useful for here is intranet penetration; managing processes or modifying files inside the container may still require combining it with other methods.
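As a small illustration of that intranet penetration, the sketch below again assumes the huginn_web_1 container: only the network namespace is entered, so the `ss` binary and its libraries come from the host, yet it sees the container’s sockets and interfaces.

# Host's `ss` listing the listening TCP sockets as seen from inside the
# container's network namespace.
sudo nsenter --net -t "$(docker inspect -f '{{.State.Pid}}' huginn_web_1)" ss -tlnp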

Container intranet penetration example

Let’s take a day-to-day task as an example and walk through, step by step, how to conveniently and quickly access a Docker container’s network and carry out service maintenance from outside the container. In this example a Huginn service is built from a compose file, with the network topology shown below. Inside the web container, every service on the intranet can be reached by container name plus port, for example `postgres:5432` or `elasticsearch:9200`.

/2020/docker-service-management-w-nsenter/04.png

In this example I wrote a Python script that pulls data from a PostgreSQL database and writes it into Elasticsearch. Two containers are attached to both the PG database network (backend) and the ES network (es); here we pick the web container as the target of nsenter. The Python library psycopg2 used to talk to PG is a C extension module whose installation depends on the gcc compiler, so I installed it into a pyenv virtual environment on the host.

The code itself just processes the data in PG and stores it into ES, so I won’t describe it in detail. The general idea is to use nsenter so that the Python process can connect to the database services on the container intranet. When configuring the server connection in the code, though, we still have to specify an IP address on the container intranet. Where does this address come from?

/2020/docker-service-management-w-nsenter/05.png
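One way to look such an address up from the host is a `docker inspect` query like the sketch below (the container name huginn_postgres_1 is an assumption based on this compose project):

# Print the IP address of the PostgreSQL container on each network it is attached to.
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' huginn_postgres_1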

As we can see, the address of the PG database is 172.27.0.2. We can invoke the Python inside pyenv with the following command to verify that the psycopg2 installed on the host can connect to the database on the intranet.

sudo nsenter --net -t "$(docker inspect -f '{{.State.Pid}}' huginn_web_1)" "$(pyenv prefix web)/bin/python"

By comparison, if the connection failed, psycopg2 would throw an exception as soon as `connect()` is executed; since it does not, this IP is usable.
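A non-interactive variant of the same check might look like the sketch below; the credentials and database name are placeholders, not the ones actually used here.

# Run a one-off connection attempt from the host's psycopg2, inside the
# container's network namespace; prints "connected" on success.
sudo nsenter --net -t "$(docker inspect -f '{{.State.Pid}}' huginn_web_1)" \
  "$(pyenv prefix web)/bin/python" -c \
  "import psycopg2; psycopg2.connect(host='172.27.0.2', port=5432, user='huginn', password='changeme', dbname='huginn').close(); print('connected')"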

/2020/docker-service-management-w-nsenter/06.png

But one question remains: how could I bring myself to hardcode an IP into the code? That is just too ugly. The subnet of a Docker network is reassigned whenever the network is recreated; after the service is rebuilt, neither the IP nor the subnet is guaranteed to be the same as before. We need a way to make the DNS resolution rules inside the container also apply to our program outside the container.

Tips for DNS Resolution in Container Intranet

In the previous section we looked at the contents of `/etc/resolv.conf` inside the container and found that the DNS server used to resolve names in the container is 127.0.0.11, a DNS service embedded in Docker; its implementation details can be found in a Stack Overflow answer [^ddns]. But we cannot use this DNS address directly, because even Docker itself has no better way to override the name resolution process than to bind-mount over `/etc/resolv.conf` inside the container’s mount namespace. To sum up, there are only the following ways to change the DNS of a single process.

  1. Create a new mount namespace for the process and overwrite that file with a custom resolv.conf (a rough sketch follows this list).
  2. Modify the system-wide resolv.conf.
  3. Hook fopen [^hook] or getaddrinfo/gethostbyname [^hostalias] in libc.
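For method 1, a rough sketch might look like the following, assuming Docker’s embedded DNS listens on 127.0.0.11 and `$PID` holds the container’s init PID obtained as above:

# Create a resolv.conf that points at Docker's embedded DNS.
echo "nameserver 127.0.0.11" | sudo tee /tmp/docker-resolv.conf > /dev/null

# Give one shell its own mount namespace, bind-mount the file over
# /etc/resolv.conf there (invisible to the rest of the system), then enter
# the container's network namespace and resolve a service name.
sudo unshare --mount bash -c "
  mount --bind /tmp/docker-resolv.conf /etc/resolv.conf
  exec nsenter --net -t $PID getent hosts postgres
"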

None of these is elegant either. That said, if method 3 could be implemented, it would probably be simpler to operate than the approach I describe below; I will experiment with it some more in the future.

For now, we can use the HOSTALIASES mechanism [^hostalias] [^hostname] [^ha-sof1] that comes with glibc: build a list of hostname aliases similar to a hosts file, pass its path to the target process through the `HOSTALIASES` environment variable, and thereby let the application on the host resolve the domain names and IPs inside the container.

As for this list of domain aliases, we can use the `dig` command together with `nsenter` to query the container’s DNS in batches and generate the file. I wrote the following small script for this purpose.

#!/bin/bash

CONTAINER="$1"
PID="$(docker inspect -f '{{.State.Pid}}' "$CONTAINER")"
PROVIDER="xip.io"
# PROVIDER="traefik.me"


if [ -z "$PID" ]; then
    >&2 echo "usage: $0 <container name>"
    exit 1
fi

# Read one domain name per line from stdin and emit "name ip.provider" pairs.
while IFS='' read -r line; do
    IP="$(sudo nsenter --net -t "$PID" dig +short "$line" @127.0.0.11 | head -n 1)"
    if [ -z "$IP" ]; then
        >&2 echo "wtf? I get nothing for $line"
        continue
    fi
    echo "$line $IP.$PROVIDER"
done

As for usage, just feed the domain names to be resolved to this script via stdin, specify the target container’s name as its argument, and save the output.

$ ./hostalias.sh huginn_web_1 > /tmp/huginn_web.hostalias <<EOF
postgres
elasticsearch
EOF
$ cat /tmp/huginn_web.hostalias
postgres 172.27.0.2.traefik.me
elasticsearch 172.26.0.3.traefik.me

One thing to note: the HOSTALIASES mechanism turns a **query for one domain name** into a **query for another domain name**, which is **not the same as a hosts file**. Therefore, in each line of the file, both fields before and after the space **must be domain names**. That is why we need a public wildcard DNS service that can turn an arbitrary IP into a resolvable domain name [^ha-sof2]. A couple of decent services I know of are xip.io and traefik.me. The effect of such a resolution service is evident from the example above, or you can visit their websites for instructions.
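You can check this wildcard resolution directly from the host; the sketch below uses traefik.me, and xip.io behaves the same way.

# Any name of the form <ip>.traefik.me resolves back to that IP.
dig +short 172.27.0.2.traefik.me
# expected output: 172.27.0.2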

With this HOSTALIASES file, we can write the Docker intranet domain names `postgres` or `elasticsearch` directly into the script and let glibc translate them to IPs through the mappings in this file. No need to care about IPs anymore.

As for configuring the parameters and running the final script, it all becomes plain and uneventful.
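Concretely, the final invocation might look something like the sketch below; `sync.py` is a hypothetical name for the Python script described earlier.

# The host's Python resolves "postgres" and "elasticsearch" through the
# HOSTALIASES file and reaches them via the container's network namespace.
sudo nsenter --net \
  -t "$(docker inspect -f '{{.State.Pid}}' huginn_web_1)" \
  env HOSTALIASES=/tmp/huginn_web.hostalias \
  "$(pyenv prefix web)/bin/python" sync.py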

/2020/docker-service-management-w-nsenter/07.png

/2020/docker-service-management-w-nsenter/08.png

Conclusion

As I mentioned earlier, this is the second-best solution in the Eastern Hemisphere? Right. Originally I wrote *the best*. But while writing this article, after summarizing and thinking it over, I now believe that injecting a hook (for example via LD_PRELOAD), or pointing resolv.conf at 127.0.0.11, combined with nsenter, should be the more convenient route. After all, HOSTALIASES has to be regenerated once for every Docker network; even with my script making that easy, it is still one extra line of commands. For lazy people, though, it is good enough, and these workarounds are already far more convenient than the ones we were used to before.

Cover image:

/2020/docker-service-management-w-nsenter/09.png

A slightly more normal cover image:

/2020/docker-service-management-w-nsenter/10.jpg

<!-- The original text was first published on the author's blog and personal official account: rabyte -->


  1. imageproxy/Dockerfile at main · willnorris/imageproxy ↩︎