Kubernetes Signal Handling and Zombie Process Optimization

Containers and Signals

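SIGTERM: the termination signal. This is the standard signal for asking a process to terminate, and it is the default signal sent by the kill, killall, and pkill commands. Unlike SIGKILL, it can be blocked, handled, or ignored, so it is normally used to ask a program to exit cleanly on its own.
SIGKILL: the "sure kill" signal. A process cannot block, ignore, or catch it, so it always terminates the process.
SIGHUP: sent to the controlling process of a terminal when the terminal disconnects (hangs up). SIGHUP is also used with daemons (for example, init): many daemons reinitialize and reread their configuration files when they receive it.
SIGINT: when the user types the terminal interrupt character (usually Control-C), the terminal driver sends this signal to the foreground process group. Its default action is to terminate the process.
SIGQUIT: when the user types the quit character (usually Control-\), this signal is sent to the foreground process group. By default it terminates the process and produces a core dump for debugging. It is a good fit for a process that is stuck in an infinite loop or no longer responding.
SIGTSTP: the job-control stop signal. When the user types the suspend character (usually Control-Z), it is sent to the foreground process group to stop it.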
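The difference between a signal that can be caught (SIGTERM) and one that cannot (SIGKILL) can be sketched in a few lines of shell. The function names worker and run_worker_and_signal and the marker file are assumptions made for this illustration; they do not come from the original article.

```shell
# worker: a stand-in for an application with a graceful-shutdown path.
# marker_file is a hypothetical path used only to prove that the handler ran.
worker() {
  marker_file=$1
  # SIGTERM can be caught: record the cleanup, then exit with status 0.
  trap 'echo "cleanup done" >"$marker_file"; exit 0' TERM
  # Idle loop; the shell runs the trap once the current sleep returns.
  while :; do sleep 1; done
}

# run_worker_and_signal SIGNAL FILE: start a worker, send it SIGNAL,
# and print the exit status the worker ended with.
run_worker_and_signal() {
  worker "$2" &
  worker_pid=$!
  sleep 1                          # give the worker time to install its trap
  kill -s "$1" "$worker_pid"
  worker_status=0
  wait "$worker_pid" || worker_status=$?
  echo "$worker_status"
}
```

Calling run_worker_and_signal TERM /tmp/marker prints 0 and leaves "cleanup done" in the file, because the handler ran. Calling it with KILL prints 137 (128 + 9) and writes nothing, because the process never gets a chance to react.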
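When you run a Docker container, the main program runs as PID 1, and as soon as that program stops, the container stops with it.
Containers usually have no init system such as systemd or sysvinit. Without an init system to manage processes, nothing deals with process state when a program misbehaves, and signal handling is not managed for you.
Take docker stop as an example. The command sends SIGTERM to PID 1 in the container. If the program has not installed a handler for that signal, the signal is simply ignored, because the kernel does not apply a signal's default action to PID 1 of a PID namespace. docker stop waits 10 seconds without the container exiting, after which Docker Engine sends SIGKILL to PID 1 to force it to stop, and only then does the container actually stop. Because SIGKILL cannot be trapped, the program and its child processes get no chance to shut down cleanly. If the main program is writing a file when it is killed, for example, the file may be left corrupted. It is as brutal as pulling the power cord out of a server.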
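The TERM-then-KILL sequence described above can be imitated outside Docker with a small helper. This is only a sketch: stop_with_grace and is_running are hypothetical names, the 10-second default mirrors the docker stop timeout mentioned in the text, and the zombie check assumes a Linux /proc filesystem.

```shell
# is_running PID: true while PID exists and is not a zombie.
# Reading /proc/PID/status assumes Linux; elsewhere only kill -0 decides.
is_running() {
  kill -0 "$1" 2>/dev/null || return 1
  ! grep -q '^State:[[:space:]]*Z' "/proc/$1/status" 2>/dev/null
}

# stop_with_grace PID [GRACE_SECONDS]: send SIGTERM, wait up to GRACE_SECONDS,
# then fall back to SIGKILL. Prints how the process ended.
stop_with_grace() {
  target_pid=$1
  grace=${2:-10}
  waited=0
  kill -s TERM "$target_pid" 2>/dev/null || { echo "not running"; return 0; }
  while is_running "$target_pid"; do
    if [ "$waited" -ge "$grace" ]; then
      kill -s KILL "$target_pid" 2>/dev/null
      echo "killed after ${grace}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "terminated gracefully"
}
```

A process that reacts to SIGTERM yields "terminated gracefully" almost at once. A process that ignores SIGTERM uses up the whole grace period before it is killed, which is the 10-second delay seen with docker stop.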

ENTRYPOINT and CMD

CMD has three forms:

CMD ["executable","param1","param2"] (exec form, recommended)
CMD ["param1","param2"] (as parameters to ENTRYPOINT)
CMD command param1 param2 (shell form, run via /bin/sh -c by default)

ENTRYPOINT has two forms:

ENTRYPOINT ["executable", "param1", "param2"] (exec form, preferred)
ENTRYPOINT command param1 param2 (shell form)

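Signal problems usually arise because the container's entry point goes through a shell, for example an ENTRYPOINT or CMD such as /bin/sh -c my-app or /docker-entrypoint.sh. The business process inside the container may then never receive SIGTERM, for the following reasons:

The container's main process is the shell; the business process is started from the shell and becomes its child. Whichever of the two instructions your Dockerfile uses, the exec form is recommended over the shell form. With the shell form, the program is started as a subcommand of /bin/sh -c, and the shell does not pass any signals on to it. When the container is stopped with docker stop, a program started this way never sees the signal, so a graceful shutdown is out of the question.
A shell does not forward SIGTERM to its child processes. Outside a container, a non-interactive shell with no trap exits on SIGTERM and leaves its children running (an interactive bash ignores SIGTERM altogether); as PID 1 in a container, a shell with no trap ignores the signal. In every case, a business process started as a child of the shell never runs its shutdown logic.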

# Start a sleep process from the current shell
[root@VM-0-3-centos ~]# echo $$ && sleep 1000
28347

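# kill 28347: 28347 is the parent process (the interactive shell). The output below shows that the parent did not pass the termination signal on to its child, since sleep is still running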
[root@VM-0-3-centos ~]# kill 28347
[root@VM-0-3-centos ~]# ps -ef | grep slee
root      4922 28347  0 22:55 pts/1    00:00:00 sleep 1000
root      5331 29381  0 22:56 pts/2    00:00:00 grep --color=auto slee


# This time, instead of killing the parent, kill the sleep process itself
[root@VM-0-3-centos ~]# kill 4922

# In the other window, the shell reports "Terminated": sleep received SIGTERM directly and exited
[root@VM-0-3-centos ~]# echo $$ && sleep 1000
28347
Terminated

Once the Kubernetes graceful termination period (terminationGracePeriodSeconds, 30s by default) has expired, SIGKILL is sent to force-kill the shell and its child processes.
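The same behaviour can be reproduced non-interactively. The sketch below runs outside a container, so the shell is not PID 1 and SIGTERM's default action applies to it; pid_file is a hypothetical scratch file used only to learn the child's PID.

```shell
pid_file=$(mktemp)   # hypothetical scratch file for passing the child's PID out

# A non-interactive shell starts a child and waits for it,
# much like an entrypoint script without any trap would.
sh -c 'sleep 30 & echo "$!" >"$1"; wait' sh "$pid_file" &
shell_pid=$!
sleep 1

child_pid=$(cat "$pid_file")
kill -s TERM "$shell_pid"            # the signal goes to the shell only
shell_status=0
wait "$shell_pid" || shell_status=$?

if kill -0 "$child_pid" 2>/dev/null; then
  child_state="still running"
else
  child_state="gone"
fi
echo "shell exit status: $shell_status, child: $child_state"

kill -s TERM "$child_pid" 2>/dev/null || true   # clean up the orphaned child
rm -f "$pid_file"
```

The shell ends with status 143 (128 + 15) while the sleep child keeps running: the signal was delivered to the shell and went no further.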

Signals Not Handled

Case 1

# Run a simple sleep command
[root@VM-0-3-centos ~]# docker run -d --rm --name=test ubuntu:22.04 /bin/sh -c "sleep 10000"
ec604a00f360c6b52554455c0b95f638ed5d97ec2578ae03462c65a345bd795a

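# List the processes inside the container: sleep is a child process, and a child process does not receive the termination signal
[root@VM-0-3-centos ~]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:17 ?        00:00:00 /bin/sh -c sleep 10000   # parent process
root         7     1  0 10:17 ?        00:00:00 sleep 10000              # child process
root         8     0  0 10:18 ?        00:00:00 ps -ef

# Try to stop the container: it takes 10 seconds to end. /bin/sh installs no handler for SIGTERM, and as PID 1 it ignores signals it has no handler for, so nothing happens until Docker sends SIGKILL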
[root@VM-0-3-centos ~]# time docker stop test
test

real    0m10.141s
user    0m0.008s
sys     0m0.010s

Case 2

Shell form, which runs the command via /bin/sh -c by default

FROM ubuntu:22.04
RUN apt-get update && apt-get -y install redis-server && rm -rf /var/lib/apt/lists/*
EXPOSE 6379
# Shell form: redis-server is started as a child process of /bin/sh
CMD "/usr/bin/redis-server"


# Run the redis-shell container
[root@VM-0-3-centos ~]# docker run -d --name=redis-shell redis:shell

# View the redis logs
[root@VM-0-3-centos ~]# docker logs d5704fc16e2b27cf1fda076ca0eed80100df7f8fd4ef3d0e05487c1a6ee691fa
7:C 14 Aug 2022 10:23:21.682 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7:C 14 Aug 2022 10:23:21.683 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=7, just started
7:C 14 Aug 2022 10:23:21.683 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
7:M 14 Aug 2022 10:23:21.684 * Running mode=standalone, port=6379.
7:M 14 Aug 2022 10:23:21.684 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7:M 14 Aug 2022 10:23:21.684 # Server initialized
7:M 14 Aug 2022 10:23:21.684 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
7:M 14 Aug 2022 10:23:21.684 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
7:M 14 Aug 2022 10:23:21.684 * Ready to accept connections


# List the redis processes: redis-server was started as a child process
[root@VM-0-3-centos ~]# docker exec redis-shell ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 10:23 ?        00:00:00 /bin/sh -c "/usr/bin/redis-server"
root         7     1  0 10:23 ?        00:00:00 /usr/bin/redis-server *:6379
root        12     0  0 10:25 ?        00:00:00 ps -ef


# Stop the redis-shell container. Once it has stopped, the redis logs show that redis did not shut itself down; it was killed outright
[root@VM-0-3-centos ~]# docker stop redis-shell   # this takes about 10s
redis-shell
[root@VM-0-3-centos ~]# docker logs d5704fc16e2b
7:C 14 Aug 2022 10:23:21.682 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7:C 14 Aug 2022 10:23:21.683 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=7, just started
7:C 14 Aug 2022 10:23:21.683 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
7:M 14 Aug 2022 10:23:21.684 * Running mode=standalone, port=6379.
7:M 14 Aug 2022 10:23:21.684 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7:M 14 Aug 2022 10:23:21.684 # Server initialized
7:M 14 Aug 2022 10:23:21.684 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
7:M 14 Aug 2022 10:23:21.684 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
7:M 14 Aug 2022 10:23:21.684 * Ready to accept connections

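Case 3

The container is started in exec form, but the script it runs starts the actual program without exec (a script and a binary behave the same way here)

[root@VM-0-3-centos redis-server-exec]# cat sleep.sh
#!/bin/bash
sleep 10000 # runs as a child of the script's shell, equivalent to /bin/sh -c "sleep 10000"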

[root@VM-0-3-centos redis-server-exec]# cat dockerfile 
FROM ubuntu:22.04
COPY sleep.sh /
CMD ["/sleep.sh"]

# Start the container
[root@VM-0-3-centos redis-server-exec]# docker run -d --name=test redis:son
fe59095f93e2f687c67cb43e13c37011e6032882fea6d00d928eafd968aab737

# List the processes: sleep was again started as a child process, so it cannot react to the signal
[root@VM-0-3-centos redis-server-exec]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:50 ?        00:00:00 /bin/bash /sleep.sh
root         7     1  0 13:50 ?        00:00:00 sleep 10000
root         8     0  0 13:50 ?        00:00:00 ps -ef

Signals Handled Correctly

Case 1

Exec form (recommended)

FROM ubuntu:22.04
RUN apt-get update && apt-get -y install redis-server && rm -rf /var/lib/apt/lists/*
EXPOSE 6379
# Exec form: redis-server itself runs as the container's main process
CMD ["/usr/bin/redis-server"]


# Run the redis-exec container
[root@VM-0-3-centos ~]# docker run -d --name=redis-exec redis:exec
9bed75cfdd7d9efaf0775abb7f43d1a7c13931c1c959f65c28081e9b89f7d5e8

# View the redis logs
[root@VM-0-3-centos ~]# docker logs 9bed75cfdd7d9efaf0775abb7f43d1a7c13931c1c959f65c28081e9b89f7d5e8
1:C 14 Aug 2022 13:39:35.582 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 14 Aug 2022 13:39:35.582 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 14 Aug 2022 13:39:35.582 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
1:M 14 Aug 2022 13:39:35.583 * Running mode=standalone, port=6379.
1:M 14 Aug 2022 13:39:35.583 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 14 Aug 2022 13:39:35.583 # Server initialized
1:M 14 Aug 2022 13:39:35.583 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 14 Aug 2022 13:39:35.583 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 14 Aug 2022 13:39:35.584 * Ready to accept connections

# List the redis processes: redis-server itself runs as PID 1
[root@VM-0-3-centos ~]# docker exec redis-exec ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:39 ?        00:00:00 /usr/bin/redis-server *:6379
root        11     0  2 13:40 ?        00:00:00 ps -ef


# Stop the redis-exec container: it stops immediately. The redis logs show that redis shut itself down gracefully rather than being killed outright
[root@VM-0-3-centos ~]# docker stop redis-exec
redis-exec

[root@VM-0-3-centos ~]# docker logs redis-exec
1:C 14 Aug 2022 13:39:35.582 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 14 Aug 2022 13:39:35.582 # Redis version=6.0.16, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 14 Aug 2022 13:39:35.582 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
1:M 14 Aug 2022 13:39:35.583 * Running mode=standalone, port=6379.
1:M 14 Aug 2022 13:39:35.583 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 14 Aug 2022 13:39:35.583 # Server initialized
1:M 14 Aug 2022 13:39:35.583 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
1:M 14 Aug 2022 13:39:35.583 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
1:M 14 Aug 2022 13:39:35.584 * Ready to accept connections
1:signal-handler (1692020490) Received SIGTERM scheduling shutdown...
1:M 14 Aug 2022 13:41:30.888 # User requested shutdown...
1:M 14 Aug 2022 13:41:30.888 * Saving the final RDB snapshot before exiting.
1:M 14 Aug 2022 13:41:30.896 * DB saved on disk
1:M 14 Aug 2022 13:41:30.896 # Redis is now ready to exit, bye bye...        # redis exited on its own

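Case 2

The container is started in exec form, and the script it runs also uses exec to start the program (a script and a binary behave the same way here)
Running a binary from a script:

[root@VM-0-3-centos redis-server-exec]# cat sleep.sh
#!/bin/bash
exec sleep 10000   # replaces the shell instead of starting a child process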

[root@VM-0-3-centos redis-server-exec]# cat dockerfile 
FROM ubuntu:22.04
COPY sleep.sh /
CMD ["/sleep.sh"]

# Start the container
[root@VM-0-3-centos redis-server-exec]# docker run -d --name=test redis:son
d9c45f8d3b0f9aa7842b0e3b83a6a716310beff24b0e8c3990820e7d8b54f71a

# List the container's processes: sleep was not started as a child process (it is PID 1), so it receives signals
[root@VM-0-3-centos redis-server-exec]# docker exec test ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 13:53 ?        00:00:00 sleep 10000
root         7     0  0 13:53 ?        00:00:00 ps -ef
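What exec changes can be seen from the PIDs alone. The following sketch needs no Docker; the variable names are illustrative. Each command prints the outer shell's PID and then the inner shell's PID.

```shell
# Without exec: the inner shell is a child process and gets a new PID.
# The trailing ":" keeps the inner shell from being the last command,
# which some shells would otherwise turn into an implicit exec.
without_exec=$(sh -c 'echo "$$"; sh -c "echo \$\$"; :')

# With exec: the inner shell replaces the outer one and keeps its PID.
with_exec=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

echo "without exec:" $without_exec
echo "with exec:   " $with_exec
```

The first line shows two different PIDs, the second shows the same PID twice. In a container this is the difference between the program being a child of PID 1 and being PID 1 itself.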

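Case 3

A container normally runs a single process, which is also what Kubernetes recommends. Sometimes, though, several processes have to be started, for example during the transition from a traditional deployment to Kubernetes, when a "fat" container runs several business processes at once. These can only be started from a shell, and the exec approach above cannot deliver signals to them, because exec lets only one process replace the shell as the main process. The shell script then has to trap the signal and forward it itself.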

[root@VM-0-3-centos redis-server-exec]# cat entrypoint.sh
#!/bin/bash
sleep 100000 & pid1="$!"  # start a process and record its PID
echo "sleep1 started with pid $pid1"

sleep 100000 & pid2="$!"  # start a second process and record its PID
echo "sleep2 started with pid $pid2"

handle_sigterm() {
  echo "[INFO] Received SIGTERM"
  kill -TERM "$pid1" "$pid2" # forward SIGTERM to the business processes
  wait "$pid1" "$pid2" # wait until every business process has terminated
}
echo "[INFO] sleep1/sleep2 start ok"

trap handle_sigterm SIGTERM # catch SIGTERM and call handle_sigterm
wait # block while the children run; SIGTERM interrupts this wait, the handler runs, and then the script exits

[root@VM-0-3-centos redis-server-exec]# cat dockerfile 
FROM ubuntu:22.04
COPY entrypoint.sh  /
CMD ["/entrypoint.sh"]

# Start the container
[root@VM-0-3-centos redis-server-exec]# docker run -d --name test test:v1
5979b214264d87592c8b3515fb47714b2c51efd2af12f36a1fffa56bc60d5b88

# List the processes: entrypoint.sh has started two sleep child processes
[root@VM-0-3-centos redis-server-exec]# docker exec 5979b214264d ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:17 ?        00:00:00 /bin/bash /entrypoint.sh
root         7     1  0 14:17 ?        00:00:00 sleep 100000
root         8     1  0 14:17 ?        00:00:00 sleep 100000
root         9     0  0 14:18 ?        00:00:00 ps -ef

# The container stops immediately, which shows that the termination signal was handled properly
[root@VM-0-3-centos redis-server-exec]# docker stop test
test

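Zombie Processes

When a child process terminates, it first becomes a defunct process, also called a zombie, waiting to be reaped by its parent or by the system. The Linux kernel keeps a small set of information about each zombie (PID, termination status, resource usage) so that the parent can retrieve information about its child. If zombies are not reaped correctly, their process descriptors stay in the system and system resources leak slowly.
Most well-designed multi-process applications reap their zombie children correctly; the NGINX master process, for example, reaps worker processes that have terminated. If you need to implement this yourself, you can use the following methods:
- Call waitpid() to wait for the child to exit and clear its zombie entry.
- Handle SIGCHLD, which the parent receives when a child becomes defunct: either set SIGCHLD to be ignored, in which case the kernel reaps the children automatically, or install a handler with custom reaping logic.
If the parent has already exited, children that are still running become orphans. On Linux, the init process (PID 1) is the ancestor of all processes and maintains the process tree: as soon as a process is orphaned, init adopts it. When such a process later becomes a zombie, init reaps it and frees its PID.
On Linux, then, whatever a parentless process leaves behind is cleaned up by init. A Docker container, however, is not a complete operating system and does not start an init process. The first process in the container is an ordinary process that is typically not written to reap adopted children, so zombies are not reaped.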
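A zombie is easy to produce on purpose. The sketch below assumes a Linux /proc filesystem; pid_file is a hypothetical scratch file. The inner shell starts a short-lived child and then uses exec to turn itself into a plain sleep, a program that never calls wait() on the child it inherited.

```shell
pid_file=$(mktemp)   # hypothetical scratch file for passing the child's PID out

# The child (sleep 1) exits after one second. By then its parent has become
# "sleep 4", which never reaps children, so the child stays a zombie.
sh -c 'sleep 1 & echo "$!" >"$1"; exec sleep 4' sh "$pid_file" &
parent_pid=$!
sleep 2

child_pid=$(cat "$pid_file")
# The second field of the State: line is a one-letter code; Z means zombie.
child_state=$(awk '/^State:/ {print $2}' "/proc/$child_pid/status")
echo "child $child_pid state: $child_state"

kill "$parent_pid" 2>/dev/null || true
wait "$parent_pid" 2>/dev/null || true
rm -f "$pid_file"
```

While the parent is alive the child's state is Z. Once the parent exits, the zombie is handed to whatever process acts as init for that process tree, which is exactly the role a container's PID 1 has to fill.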

The following experiments show how different PID 1 processes differ in their ability to deal with zombie processes.

FROM ubuntu:22.04
RUN apt-get update && apt-get -y install redis-server && rm -rf /var/lib/apt/lists/*
EXPOSE 6379
# Exec form: redis-server itself runs as the container's main process
CMD ["/usr/bin/redis-server"]

# Start the container
[root@VM-0-3-centos redis-server-exec]# docker run -d --name=redis redis:v1
daca6972da659b99922df3fe227120b473bc22f4b28e6436e2337979c50ab706

# Start a bash process in the redis container and, from it, a child process "sleep 10000"
[root@VM-0-3-centos redis-server-exec]# docker exec -it redis bash
root@daca6972da65:/# sleep 10000

# List the processes: sleep is a child of the bash process
[root@VM-0-3-centos ~]# docker exec redis ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:38 ?        00:00:00 /usr/bin/redis-server *:6379
root        11     0  0 14:38 pts/0    00:00:00 bash
root        19    11  0 14:38 pts/0    00:00:00 sleep 10000
root        20     0  0 14:39 ?        00:00:00 ps -ef

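# Kill the bash process, then list the processes again: bash is gone. The sleep process (PID 19) has terminated and been adopted by PID 1 (redis-server), but it has not been reaped and remains a zombie, because redis-server was not written to act as an init process that reaps zombie children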
[root@VM-0-3-centos ~]# docker exec redis ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:38 ?        00:00:00 /usr/bin/redis-server *:6379
root        19    11  0 14:38 pts/0    00:00:00 [sleep] <defunct>
root        20     0  0 14:39 ?        00:00:00 ps -ef

FROM ubuntu:22.04
RUN apt-get update && apt-get -y install redis-server && rm -rf /var/lib/apt/lists/*
EXPOSE 6379
# Shell form: redis-server is started as a child process of /bin/sh
CMD "/usr/bin/redis-server"

# Start the container
[root@VM-0-3-centos redis-server-exec]# docker run -d --name=redis redis:v1
8325ff6ad59350249b3f0bdeb24e356b53a897906ae739a603fe594f518c0c38

# Start a bash process in the redis container and, from it, a child process "sleep 10000"
[root@VM-0-3-centos redis-server-exec]# docker exec -it redis bash
root@8325ff6ad593:/# sleep 10000

# List the processes: sleep is a child of the bash process
[root@VM-0-3-centos ~]# docker exec redis ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:44 ?        00:00:00 /bin/sh -c "/usr/bin/redis-server"
root         7     1  0 14:44 ?        00:00:00 /usr/bin/redis-server *:6379
root        12     0  0 14:45 pts/0    00:00:00 bash
root        20    12  0 14:45 pts/0    00:00:00 sleep 10000
root        21     0  0 14:45 ?        00:00:00 ps -ef

# Kill the bash process and list the processes again: both bash and "sleep 10000" have been killed and reaped
[root@VM-0-3-centos ~]# docker exec redis kill -9 12
[root@VM-0-3-centos ~]# docker exec redis ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:44 ?        00:00:00 /bin/sh -c "/usr/bin/redis-server"
root         7     1  0 14:44 ?        00:00:00 /usr/bin/redis-server *:6379
root        33     0  2 14:46 ?        00:00:00 ps -ef

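This is because sh and bash reap zombie processes automatically. In short, if a container runs multiple processes, PID 1 must be able to adopt orphaned processes and reap zombies. There are two options:

- Use a dedicated init process for process management, such as S6, Phusion's my_init, dumb-init, or tini.
- Run Bash/sh as PID 1; these shells provide process reaping by default.

A bash/sh parent process can reap child processes and so keeps zombies from building up in the container, but it cannot forward termination signals to its children. For that reason dumb-init or tini is generally used as the parent process, since they provide both process adoption and signal forwarding.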

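tini and dumb-init

tini is a minimal init system designed to spawn a single child and wait for it to exit. It normally runs inside a container, where it reaps zombie processes and forwards signals to the child.
Both dumb-init and tini can act as the init process: they start as the main process (PID 1) in the container and then run the shell that executes our script (the shell is their child), so the business processes started from the shell become their descendants as well. When the init process receives a signal, it passes the signal on: tini to its direct child by default (or to the child's whole process group with -g), dumb-init to the child's process group by default. This solves the problem of the shell not forwarding signals, and zombie processes are reaped as well.
If you run containers with Docker, the simplest option is to pass --init to docker run. Docker then injects tini into the container as /sbin/docker-init and runs the original program under it. Note: --init is supported from Docker 1.13 onward, which bundles tini.
An init system has the following characteristics:
- It is the first process in the system and is responsible for starting all other user processes.
- It runs as a daemon and is the ancestor of all other processes.
- Its main duties are starting daemons, reaping orphaned processes, and forwarding operating-system signals to child processes.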
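To make the duties listed above concrete, here is a minimal sketch of an init-like wrapper written in shell. It is not how tini or dumb-init are implemented; mini_init is a hypothetical name, and the sketch covers only a single child. Reaping of adopted orphans is left to the shell itself, which, as noted earlier, sh and bash do on their own.

```shell
# mini_init CMD [ARGS...]: run CMD as a child, forward SIGTERM/SIGINT to it,
# and return the status reported by the last wait.
mini_init() {
  child_pid=""
  # Forward the signal to the child once it has been started.
  trap '[ -n "$child_pid" ] && kill -s TERM "$child_pid" 2>/dev/null' TERM INT
  "$@" &
  child_pid=$!
  status=0
  wait "$child_pid" || status=$?
  # wait returns early (status above 128) when a trapped signal arrives,
  # so keep waiting for as long as the child still exists.
  while kill -0 "$child_pid" 2>/dev/null; do
    status=0
    wait "$child_pid" || status=$?
  done
  trap - TERM INT
  return "$status"
}
```

An entrypoint script could end with mini_init "$@" so that a SIGTERM sent to the script reaches the program. For anything beyond an experiment, a purpose-built init such as tini or dumb-init is the safer choice, since this sketch handles only one child and two signals.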

Tini

With Docker 1.13 or later, tini ships with Docker and is enabled with the --init flag.

# Start the container with an init process
[root@VM-0-3-centos ~]# docker run -d --init --name=test ubuntu:22.04 sleep 10000
203e03c6ac46f2806e648d4b3dce9efe3b70f9d239474136e6874f35885a642d

# List the container's processes
[root@VM-0-3-centos ~]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:37 ?        00:00:00 /sbin/docker-init -- sleep 10000
root         7     1  0 01:37 ?        00:00:00 sleep 10000
root         8     0  0 01:38 ?        00:00:00 ps -ef

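# Stop the container. Normally sleep, running as a child process, would not handle the signal and the stop would take 10s; here it stops immediately, which shows that docker-init forwarded the signal to its child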
[root@VM-0-3-centos ~]# docker stop test
test

Integrating tini in a Dockerfile

FROM ubuntu:22.04
ENV TINI_VERSION=v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
# -v/-vv/-vvv set the log verbosity (optional)
ENTRYPOINT ["/tini", "-vvv", "--"]

# The program to run. For this test, the command is passed in via docker run instead
# CMD ["/your/program", "-and", "-its", "arguments"]

# Start the container
[root@VM-0-3-centos ~]# docker run -d --name=test test:tini sleep 10000
ca4f0d14c341223d530bbac9a5beb0907e633e23b095eb2168ade9b6df3e6d70

# List the container's processes
[root@VM-0-3-centos ~]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:45 ?        00:00:00 /tini -vvv -- sleep 10000
root         7     1  0 01:45 ?        00:00:00 sleep 10000
root         8     0  0 01:46 ?        00:00:00 ps -ef

# Test zombie reaping: in another window, start sleep, then kill its parent bash process
[root@VM-0-3-centos ~]# docker exec -it test /bin/bash
root@ca4f0d14c341:/# sleep 10000

# List the processes at this point: the parent PID of sleep is 14
[root@VM-0-3-centos ~]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:45 ?        00:00:00 /tini -vvv -- sleep 10000
root         7     1  0 01:45 ?        00:00:00 sleep 10000
root        14     0  0 01:47 pts/0    00:00:00 /bin/bash
root        22    14  0 01:47 pts/0    00:00:00 sleep 10000
root        23     0  0 01:48 ?        00:00:00 ps -ef

# tini has reaped the child process, so no zombie process is left behind
[root@VM-0-3-centos ~]# docker exec test kill -9 14
[root@VM-0-3-centos ~]# docker exec test ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 01:45 ?        00:00:00 /tini -vvv -- sleep 10000
root         7     1  0 01:45 ?        00:00:00 sleep 10000
root        35     0  0 01:48 ?        00:00:00 ps -ef
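Instead of reading ps output by eye, the result of experiments like these can be checked by counting zombies directly. The helper below is a sketch with a hypothetical name, count_zombies, and it assumes a Linux /proc filesystem; run inside a container's shell, it sees only that container's processes.

```shell
# count_zombies: print the number of processes currently in state Z.
count_zombies() {
  zombies=0
  for status_file in /proc/[0-9]*/status; do
    # Processes can exit while we iterate, so unreadable files are skipped.
    if grep -q '^State:[[:space:]]*Z' "$status_file" 2>/dev/null; then
      zombies=$((zombies + 1))
    fi
  done
  echo "$zombies"
}
```

With an init process such as tini as PID 1, the count should fall back to 0 shortly after an orphaned process exits; with a PID 1 that never reaps, it keeps growing.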

References

https://blog.miniasp.com/post/2021/07/09/Use-dumb-init-in-Docker-Container
http://www.oschina.net/translate/docker-and-the-pid-1-zombie-reaping-problem
