Recently, I have been spending my weekends maintaining an open source project called air. It is a hot-reload tool for Go: it watches local files for changes and automatically rebuilds and restarts the program.
Along the way I ran into a particularly interesting problem: when I killed a process with kill -9 pid, its child process went away as well, but its grandchild process survived.
In short, our hot-reload component runs a command and watches for file changes. Once a file changes, it kills the previous process, recompiles the code, and runs the command again.
But a user reported the following problem: https://github.com/cosmtrek/air/issues/216#issuecomment-982348931
When the run command is dlv exec --accept-multiclient --log --headless --continue --listen :2345 --api-version 2 ./tmp/main, which runs the code and starts debugging, our component does not kill the process completely, and part of it keeps running. As a result, the port is still occupied the next time the command starts.
Running ps -efj | grep "tmp/main" clearly shows that this command actually starts three processes.
The parent, child, and grandchild relationship between them is also easy to see:
75277 is the parent process
75280 is the child process
75281 is the grandchild process
If you just use kill -9 pid to kill the parent process, its child process is killed as well, but the grandchild process survives.
Only process 75281 is left, and its parent process is now 1, which makes it an orphan process. It's really an orphan.
If this process keeps occupying the port, the command cannot start normally the next time.
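The orphan behavior can be reproduced without dlv. The following is a minimal sketch, not air's code: it assumes a POSIX sh and sleep are available, and the helper name killOnlyChild is made up for illustration. It uses only two levels (a shell and its background sleep) as a stand-in for the real process tree, kills just the shell, and then checks whether the sleep is still there.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// killOnlyChild (hypothetical helper) starts a shell that spawns a background
// "sleep", kills only the shell with SIGKILL, and reports whether the sleep
// process is still alive afterwards.
func killOnlyChild() (bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return false, err
	}
	defer r.Close()

	// The shell prints the pid of its background job, then waits for it.
	cmd := exec.Command("sh", "-c", "sleep 30 & echo $!; wait")
	cmd.Stdout = w
	if err := cmd.Start(); err != nil {
		w.Close()
		return false, err
	}
	w.Close() // our copy of the write end is no longer needed

	line, err := bufio.NewReader(r).ReadString('\n')
	if err != nil {
		return false, err
	}
	grandchild, err := strconv.Atoi(strings.TrimSpace(line))
	if err != nil {
		return false, err
	}

	// Equivalent of "kill -9 <pid>" aimed at the direct child only.
	if err := cmd.Process.Kill(); err != nil {
		return false, err
	}
	_ = cmd.Wait() // reap the shell; the "signal: killed" error is expected
	time.Sleep(200 * time.Millisecond)

	// Signal 0 sends nothing; it only checks that the process exists.
	survived := syscall.Kill(grandchild, 0) == nil

	// Clean up the orphan so the example leaves nothing behind.
	_ = syscall.Kill(grandchild, syscall.SIGKILL)
	return survived, nil
}

func main() {
	survived, err := killOnlyChild()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("grandchild survived:", survived)
}
```

SIGKILL is delivered only to the pid it is addressed to, so the sleep process is untouched and simply gets a new parent.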
After consulting various materials, I found a good solution: use the process group ID (pgid), so that all the processes we start belong to one process group and share its number.
You can see that the third column corresponds to the pgid. Although the pgids of the processes started by the command are different, we can set the process group from Go so that they share one process group number.
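To see what setting the process group from Go does, here is a small sketch under a few assumptions: a Unix-like system with a sleep binary, and a helper name (startSleep) that is purely illustrative. It starts the same command with and without Setpgid and prints the resulting pgid.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startSleep (hypothetical helper) starts "sleep 5" and returns the child's
// pid and pgid. When ownGroup is true, the child is placed in a new process
// group instead of inheriting ours.
func startSleep(ownGroup bool) (pid, pgid int, err error) {
	cmd := exec.Command("sleep", "5")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: ownGroup}
	if err = cmd.Start(); err != nil {
		return 0, 0, err
	}
	pid = cmd.Process.Pid
	pgid, err = syscall.Getpgid(pid)

	// Clean up: kill and reap the helper process.
	_ = cmd.Process.Kill()
	_ = cmd.Wait()
	return pid, pgid, err
}

func main() {
	fmt.Println("our own pgid:", syscall.Getpgrp())
	for _, own := range []bool{false, true} {
		pid, pgid, err := startSleep(own)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("Setpgid=%v pid=%d pgid=%d\n", own, pid, pgid)
	}
}
```

Without Setpgid the child reports the same pgid as the Go program. With Setpgid (and the Pgid field left at 0) the child becomes the leader of a new group, so its pgid equals its own pid, and anything it starts later inherits that group.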
At the same time, the pgid also has to be used when killing, so that the whole process group is killed. The kill man page says:
Negative PID values may be used to choose whole process groups; see the PGID column in ps command output.
That is, a negative pid is interpreted as a PGID and refers to the entire process group. The kill then reaches every process in that group.
The processes started by the above command do not get a process group of their own by default. For this bug, we can set Setpgid so that the started process gets its own process group, which its descendants then share. After that, use syscall.Kill(-pgid, 15) to send signal 15 (SIGTERM) to the whole process group.
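Put together, the Setpgid plus negative-pid idea looks roughly like the sketch below. This is not air's actual implementation: the helper names (startInGroup, killGroup, allExited), the defaultTimeout constant, and the sh/sleep stand-in for the real command are all assumptions for illustration. To check that nothing survived, it relies on a pipe: the read end only reaches EOF once every process that inherited the write end has exited.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// defaultTimeout is an arbitrary value chosen for this sketch.
const defaultTimeout = 2 * time.Second

// startInGroup runs a shell script in its own process group. The returned
// file is the read end of a pipe whose write end is the script's stdout.
func startInGroup(script string) (*exec.Cmd, *os.File, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return nil, nil, err
	}
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdout = w
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		r.Close()
		w.Close()
		return nil, nil, err
	}
	w.Close() // drop our copy; the child and its descendants keep theirs
	return cmd, r, nil
}

// killGroup sends SIGTERM to the whole process group. With Setpgid: true and
// Pgid left at 0, the group's pgid equals the pid of the process we started.
func killGroup(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
}

// allExited reports whether the pipe reaches EOF (or fails) within timeout,
// which means no process is holding the write end any more.
func allExited(r *os.File, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		buf := make([]byte, 64)
		for {
			if _, err := r.Read(buf); err != nil {
				close(done)
				return
			}
		}
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	// The shell is the child; the background sleep is the grandchild.
	cmd, r, err := startInGroup("sleep 30 & wait")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer r.Close()

	time.Sleep(200 * time.Millisecond) // give the shell time to spawn sleep
	if err := killGroup(cmd); err != nil {
		fmt.Println("error:", err)
		return
	}
	_ = cmd.Wait() // reap the shell; a "signal: terminated" error is expected
	fmt.Println("whole group exited:", allExited(r, defaultTimeout))
}
```

Replacing killGroup(cmd) with cmd.Process.Kill() in main should make the final check report false, because the background sleep keeps the pipe open, which is the original bug in miniature.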
I also added a unit test to make sure that the behavior of killing all child processes is not lost in later iterations: https://github.com/cosmtrek/air/commit/1c27effe33a180f3fbbcee8f2d9ea7122d89a50b#diff-6266cec6be43e607de84d431f656ea78fac62405058d84312d9c12f3f52c7462R146
Finally, I would like to share with you some good articles that I have been reading recently. I thought about sending them in a weekly newsletter, but because I read them sporadically, I decided to put them at the end of each blog post. I hope you find them useful!
The Pomodoro Technique for Intermittent High Efficiency
After reading an article in a newsletter, I discovered a crucial trick: if you set a Pomodoro timer for a task but finish early (say you're taking notes on a chapter of a book), you shouldn't immediately move on to the next task or end the Pomodoro timer early.
Useful commands for checking dependencies between two Golang modules
- go mod graph
- go mod why -m "module"
LastMod 2023-10-08 (05461cf)