A tale of zombie processes in a Docker container
While testing our product locally, my machine's CPU usage would spike above 95% after interacting with it for a while, sometimes leaving the machine unusable for long periods. Once I figured out which usage pattern led to these spikes, I went looking for the cause.
System description
While I won't give too much information about the product, I will say that the component in question is a worker that processes a task queue with 4 threads. It runs locally inside a Docker container and uses a very simplified version of the production code to reduce waiting times (among other reasons).
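For illustration only, the worker is started with something along these lines (the application name, pool and log level here are made up for this sketch, not the real configuration):
# Hypothetical Celery worker processing the task queue with 4 concurrent threads.
celery -A tasks worker --pool=threads --concurrency=4 --loglevel=info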
Symptoms
This problem was never reported in production and only became visible when running a specific task multiple times simultaneously. Once the tasks finished, the CPU usage would come down again. By repeating this cycle of triggering the tasks and waiting for them to finish, the CPU usage would climb higher with each iteration and take longer to recover, to the point where it would not recover at all. Restarting the container brought the system back to its initial state.
Debugging process
The first step was to check that it was actually this testing container that was consuming the CPU. For that I monitored the container with the command below:
while true; do docker stats --no-stream | grep testing_container | tee --append stats.txt; sleep 1; done
That let me watch the stats in the terminal while logging them to a file at the same time, which looked something like this:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
07085618e6f9 testing_container 0.08% 1.089GiB / 15.29GiB 7.12% 0B / 0B 537MB / 4.43MB 37
07085618e6f9 testing_container 0.40% 1.089GiB / 15.29GiB 7.12% 0B / 0B 537MB / 4.43MB 37
07085618e6f9 testing_container 174.94% 1.185GiB / 15.29GiB 7.75% 0B / 0B 541MB / 4.46MB 53
07085618e6f9 testing_container 459.66% 1.942GiB / 15.29GiB 12.70% 0B / 0B 541MB / 4.46MB 213
07085618e6f9 testing_container 458.73% 2.198GiB / 15.29GiB 14.37% 0B / 0B 541MB / 4.46MB 213
07085618e6f9 testing_container 542.71% 2.099GiB / 15.29GiB 13.73% 0B / 0B 541MB / 4.46MB 213
07085618e6f9 testing_container 538.65% 2.139GiB / 15.29GiB 13.99% 0B / 0B 541MB / 4.46MB 213
07085618e6f9 testing_container 495.53% 2.073GiB / 15.29GiB 13.55% 0B / 0B 541MB / 4.46MB 213
07085618e6f9 testing_container 489.07% 2.126GiB / 15.29GiB 13.90% 0B / 0B 541MB / 4.46MB 213
...
So, yes, as suspected, the Docker container was hogging the CPU.
Checking container processes
Once the testing container was found guilty, the next step was to discover what inside it was at fault. So, in a similar manner, I logged the processes inside the container while reproducing the error. This command had to run inside the container, and you will note the filter for the task queue processor:
while true; do top -b | grep celery | tee --append top.txt; sleep 1; done
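To run it I first opened a shell inside the container with the standard docker exec invocation (the container name is the one assumed above):
docker exec -it testing_container bash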
The top loop gave the following output (truncated):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2248 root 20 0 1267016 232920 109484 R 48.3 1.5 0:01.45 <faulty-process-name-a>
2251 root 20 0 1267016 234828 111320 R 47.7 1.5 0:01.43 <faulty-process-name-a>
2243 root 20 0 1267016 232912 109480 R 46.3 1.5 0:01.39 <faulty-process-name-a>
2253 root 20 0 1264968 233488 111400 R 46.0 1.5 0:01.38 <faulty-process-name-a>
2179 root 20 0 4360 3124 2872 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2181 root 20 0 4360 3220 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2183 root 20 0 4360 3216 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2209 root 20 0 4360 3292 3044 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2251 root 20 0 1751036 422712 165728 R 132.2 2.6 0:05.41 <faulty-process-name-a>
...
From this output it looked like there was an increasing number of processes with the faulty process name: the list grew each time the task queue was triggered.
These processes should terminate after the task finishes, but for some reason they stayed in an interruptible sleep state.
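Those states can be double-checked with a generic ps invocation along these lines (not the exact command I ran, and using the same placeholder process name as above):
# Show PID, parent PID, state, elapsed time and command name;
# in the STAT column S means interruptible sleep, R running and Z zombie.
ps -eo pid,ppid,stat,etime,comm | grep '<faulty-process-name-a>'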
The process in question runs an external software dependency that does some image rendering, a CPU-intensive task.
Talking to other teams that heavily use this dependency locally, none of them had seen this issue before. I also ran some tests locally, outside of the container, and to my surprise all processes terminated as expected.
Research findings
Here I took a small guess. Looking at the differences between my system and the Docker container, one of the things I could tell was that there is no systemd, or any other init system, running inside the container.
Searching online for zombie processes in Docker containers, you quickly arrive at tini. Tini runs as PID 1 inside the container and spawns your command as a child process; it acts as a process reaper and forwards signals to that child. Its author wrote a very nice post with more information about its internals.
According to that post, my Docker container had no way to reap zombie processes. Could this be related?
Solving the problem
I modified the docker command to run with tini. This can be done by passing the --init flag to docker run.
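For reference, the invocation looks roughly like this (the image and container names are placeholders):
# --init makes Docker run tini as PID 1 inside the container,
# so it can reap orphaned children and forward signals.
docker run --init --name testing_container testing_image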
But wait: there was not a single zombie process in the logs. Was this really solving our issue?
Going back to the logs
I went back to analyze the output of the top command further, this time focusing on the last entries:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
...
2243 root 20 0 1797372 402416 163856 R 130.3 2.5 0:16.44 <faulty-process-name-a>
2253 root 20 0 1790204 422508 165808 R 127.7 2.6 0:16.46 <faulty-process-name-a>
2248 root 20 0 1804540 438272 163596 R 126.3 2.7 0:16.59 <faulty-process-name-a>
2251 root 20 0 1784060 426068 165812 R 124.7 2.7 0:16.73 <faulty-process-name-a>
2179 root 20 0 4360 3124 2872 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2181 root 20 0 4360 3220 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2183 root 20 0 4360 3216 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2209 root 20 0 4360 3292 3044 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2243 root 20 0 1797372 406316 163856 R 122.9 2.5 0:20.14 <faulty-process-name-a>
2248 root 20 0 1804540 439880 163596 S 122.9 2.7 0:20.29 <faulty-process-name-a>
2253 root 20 0 1790204 424976 165808 R 122.6 2.7 0:20.15 <faulty-process-name-a>
2251 root 20 0 1784060 401904 165812 R 119.9 2.5 0:20.34 <faulty-process-name-a>
2179 root 20 0 4360 3124 2872 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2181 root 20 0 4360 3220 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2183 root 20 0 4360 3216 2968 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2209 root 20 0 4360 3292 3044 S 0.0 0.0 0:00.00 <faulty-process-name-b>
2243 root 20 0 1797372 398724 163856 R 131.0 2.5 0:24.07 <faulty-process-name-a>
Here I could discern a pattern. The number of processes was increasing by 4 each time I ran the task that caused the issue. That particular task asks the worker to render four different images at the same time, and since the worker has 4 threads, 4 processes were expected to be working simultaneously.
Looking closely, the accumulated running time of the processes increases by about 4 minutes each time, which was exactly the time between my runs.
How could tini have fixed our problem?
Well, let's look at the PIDs. There are only 8 distinct PIDs, repeating themselves over and over. So we did not really have a growing number of processes; I had misinterpreted the logs. My mistake was to think that the number of processes was increasing, when what was actually happening was that the command I used to store the logs kept appending new snapshots to the same file.
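A quick sanity check on the log file makes this obvious (again with the placeholder process name); for the excerpt above it prints 4 rendering PIDs, not a growing number:
# Count how many distinct PIDs ever appear for the rendering processes.
grep '<faulty-process-name-a>' top.txt | awk '{print $1}' | sort -u | wc -l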
Lessons learnt
- Image rendering is a very CPU-intensive process.
- Docker containers lack a process reaper by default. Adding one is rather simple, for instance with the --init flag shown above.