Automating Kill Proc: Scripts to Monitor and Restart Failed Services

Maintaining reliable services often means detecting failures and restarting processes automatically. This guide shows practical, safe ways to monitor processes, kill stuck or misbehaving ones, and restart services using short scripts and tools on Linux and macOS (Windows notes at the end). Examples use common shell utilities so you can adapt them to your environment.

Goals

  • Detect unresponsive or resource-hogging processes.
  • Gracefully stop or forcefully kill when necessary.
  • Restart services and notify operators.
  • Run checks periodically or as a lightweight watchdog.

Principles and safety

  • Prefer graceful shutdowns (SIGTERM) before forceful kills (SIGKILL).
  • Confirm process identity to avoid killing the wrong PID (match by owner, command, and start time).
  • Add logging and rate limits to avoid restart loops.
  • Test scripts in staging before production.
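
The identity check from the list above can be sketched as follows. This is a minimal, Linux-only sketch: it assumes /proc is mounted and GNU stat is available, and pid_matches is an illustrative name rather than an existing tool. It compares owner and command name only; a start-time comparison is left out for brevity.

```bash
#!/usr/bin/env bash
# Sketch: confirm a PID still belongs to the expected user and command
# before signalling it. Assumption: Linux with /proc and GNU stat.

# pid_matches PID EXPECTED_UID EXPECTED_COMM
# Returns 0 only if the process exists, is owned by EXPECTED_UID, and its
# short command name equals EXPECTED_COMM. The kernel truncates the name
# in /proc/PID/comm to 15 characters, so pass the truncated form.
pid_matches() {
  local pid=$1 want_uid=$2 want_comm=$3 uid comm
  [ -r "/proc/$pid/comm" ] || return 1
  uid=$(stat -c %u "/proc/$pid" 2>/dev/null) || return 1
  comm=$(cat "/proc/$pid/comm" 2>/dev/null) || return 1
  [ "$uid" = "$want_uid" ] && [ "$comm" = "$want_comm" ]
}
```

A caller might run pid_matches "$pid" "$(id -u svcuser)" mydaemon before sending any signal, where svcuser stands in for whatever account the service runs under.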

Simple monitor + restart (systemd-friendly)

Use systemd for services where possible—systemd already handles restarts robustly. For small custom daemons not managed by systemd, use this script to detect failure and restart.

Script: monitor-restart.sh

bash

#!/usr/bin/env bash
SERVICE_CMD="/usr/local/bin/mydaemon"
SERVICE_NAME="mydaemon"
LOGFILE="/var/log/${SERVICE_NAME}-watch.log"
MAX_RESTARTS=5
RESTART_WINDOW=300 # seconds
PIDFILE="/var/run/${SERVICE_NAME}.pid"
RESTARTS_FILE="/var/run/${SERVICE_NAME}.restarts"

# Timestamped log line; this date format works with both GNU and BSD date
log(){ echo "$(date '+%Y-%m-%dT%H:%M:%S%z') $*" >>"$LOGFILE"; }

# Ensure only one watcher runs (flock ships with util-linux, so Linux only)
exec 9>"/var/lock/${SERVICE_NAME}.lock"
if ! flock -n 9; then
  log "Watcher already running, exiting."
  exit 1
fi

# Track restarts (simple file-based): one epoch timestamp per line
touch "$RESTARTS_FILE"
now=$(date +%s)
# prune entries older than RESTART_WINDOW
awk -v now="$now" -v w="$RESTART_WINDOW" '{ if (now - $1 <= w) print $0 }' "$RESTARTS_FILE" > "${RESTARTS_FILE}.tmp"
mv "${RESTARTS_FILE}.tmp" "$RESTARTS_FILE"

is_running(){
  if [ -f "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    if kill -0 "$pid" 2>/dev/null; then
      return 0
    fi
  fi
  return 1
}

start_service(){
  log "Starting $SERVICE_NAME"
  # 9>&- closes the lock descriptor in the daemon; otherwise the daemon
  # would inherit the lock and hold it for as long as it runs
  $SERVICE_CMD &> "/var/log/${SERVICE_NAME}.out" 9>&- &
  echo $! > "$PIDFILE"
  date +%s >> "$RESTARTS_FILE"
}

# Not called by the main path below; use it before start_service if you
# add a health check that can find a live but unresponsive process
stop_service(){
  if [ -f "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    if kill -TERM "$pid" 2>/dev/null; then
      log "Sent SIGTERM to $pid"
      # wait up to 10s
      for i in {1..10}; do
        if ! kill -0 "$pid" 2>/dev/null; then break; fi
        sleep 1
      done
      if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null && log "Sent SIGKILL to $pid"
      fi
    fi
    rm -f "$PIDFILE"
  fi
}

# Main
if is_running; then
  log "$SERVICE_NAME already running"
  exit 0
fi

# check restart rate
restarts=$(wc -l < "$RESTARTS_FILE")
if [ "$restarts" -ge "$MAX_RESTARTS" ]; then
  log "Too many restarts ($restarts in last $RESTART_WINDOW s). Not restarting."
  exit 2
fi

start_service

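Usage: run the script from cron every minute, or from a systemd timer. It logs each action and refuses to start the service again once MAX_RESTARTS starts have happened within RESTART_WINDOW seconds. The stop_service helper sends SIGTERM first and falls back to SIGKILL after 10 seconds.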
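
Where neither cron nor a systemd timer is available, the same check can be driven by a small loop. The following is a rough sketch under stated assumptions: run_watchdog and its arguments are illustrative names, and the script path in the commented example is a guess at where you installed monitor-restart.sh.

```bash
#!/usr/bin/env bash
# Sketch of a loop-based watchdog, as an alternative to cron.
# run_watchdog is a hypothetical helper, not part of monitor-restart.sh.

# run_watchdog INTERVAL_SECONDS MAX_RUNS CHECK_CMD [ARGS...]
# Runs CHECK_CMD every INTERVAL_SECONDS. MAX_RUNS=0 means loop forever;
# a positive value bounds the loop, which is handy for testing.
run_watchdog() {
  local interval=$1 max_runs=$2 runs=0
  shift 2
  while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
    # A failing check must not end the loop; report it and carry on.
    if ! "$@"; then
      echo "$(date '+%Y-%m-%dT%H:%M:%S') check failed: $*" >&2
    fi
    runs=$((runs + 1))
    sleep "$interval"
  done
}

# Example (commented out so sourcing this file has no side effects;
# the path is an assumption):
# run_watchdog 60 0 /usr/local/bin/monitor-restart.sh
```

A loop like this has to be kept alive by something itself, which is one reason cron or a timer is usually the simpler choice.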

Monitor by resource usage (CPU / memory) and kill if over threshold

The script below finds processes by name with pgrep, reads CPU usage and RSS for each one, and kills any process that exceeds either threshold.

Script: kill-on-high-resources.sh

bash

#!/usr/bin/env bash
TARGET_CMD="mydaemon"
CPU_LIMIT=80.0 # percent
MEM_LIMIT=$((1024*1024)) # KiB, e.g., 1 GiB = 1048576 KiB

for pid in $(pgrep -f "$TARGET_CMD"); do
  read -r pid cmd <<< "$(ps -p "$pid" -o pid= -o comm=)"
  # the process may have exited between pgrep and ps
  [ -n "$pid" ] || continue
  cpu=$(ps -p "$pid" -o %cpu= | awk '{print $1}')
  rss=$(awk '/VmRSS/ {print $2}' "/proc/$pid/status" 2>/dev/null)
  rss=${rss:-0} # kernel threads and vanished processes report no VmRSS
  # compare CPU (float) or memory (int)
  cpu_exceeded=$(awk -v c="$cpu" -v lim="$CPU_LIMIT" 'BEGIN{print (c+0 > lim+0) ? 1 : 0}')
  mem_exceeded=$([ "$rss" -gt "$MEM_LIMIT" ] && echo 1 || echo 0)
  if [ "$cpu_exceeded" -eq 1 ] || [ "$mem_exceeded" -eq 1 ]; then
    echo "$(date) Killing $pid ($cmd) cpu=$cpu rss=${rss}KiB"
    kill -TERM "$pid"
    sleep 5
    if kill -0 "$pid" 2>/dev/null; then
      kill -KILL "$pid"
    fi
    # optionally restart via systemd or custom command:
    # systemctl restart mydaemon.service
  fi
done

Notes:

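  • /proc parsing works on Linux only. On macOS, read RSS with ps -o rss= (reported in KiB); vmmap gives a detailed per-process breakdown.
  • pgrep -f matches against the full command line, so a loose pattern can hit unrelated processes. Restrict the match by owner (for example pgrep -u USER) to avoid killing system processes.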
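
One caveat with the CPU check: on Linux, ps reports %cpu as CPU time divided by the time the process has been running, so a long-lived process that has only just started spinning can stay under the limit for a while. Sampling over a short interval is one way around that. The sketch below is Linux-only and assumes /proc is mounted; cpu_ticks and cpu_percent are illustrative names, not existing utilities.

```bash
#!/usr/bin/env bash
# Sketch: measure a process's CPU use over a short interval (Linux /proc only).

# cpu_ticks PID -> prints utime+stime in clock ticks; fails if PID is gone.
cpu_ticks() {
  local stat rest
  stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
  # The command name (field 2) may contain spaces, so drop everything up to
  # the last ") " and count fields from the state field onward.
  rest=${stat##*) }
  set -- $rest
  # utime and stime are fields 14 and 15 of the full line, so 12 and 13 here.
  echo $(( ${12} + ${13} ))
}

# cpu_percent PID [INTERVAL_SECONDS] -> prints the percentage of one core
# used during the interval (can exceed 100 for multi-threaded processes).
cpu_percent() {
  local pid=$1 interval=${2:-1} hz t1 t2
  # Clock ticks per second; 100 is a fallback assumption if getconf is missing.
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
  t1=$(cpu_ticks "$pid") || return 1
  sleep "$interval"
  t2=$(cpu_ticks "$pid") || return 1
  awk -v d="$((t2 - t1))" -v hz="$hz" -v s="$interval" \
    'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

In kill-on-high-resources.sh, cpu=$(cpu_percent "$pid" 2) could stand in for the ps call, at the cost of a two-second pause per process.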

Use a supervisor (recommended)

Rather than reinventing, use established supervisors:

  • systemd (Linux): use Restart=on-failure and RestartSec=, with StartLimitBurst= and StartLimitIntervalSec= to rate-limit restarts.
  • runit, s6, supervisord, pm2 (Node.js): these provide robust restart logic, logging, and health checks.

Example systemd unit snippet:

Code

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=300

[Service]
ExecStart=/usr/local/bin/mydaemon
Restart=on-failure
RestartSec=5

Notifications

Add alerts to know why restarts happen:

  • Send email with mailx or msmtp.
  • Post to Slack/Teams via webhook using curl.
  • Emit metrics to Prometheus (pushgateway) or log to a central system.

Example curl notification:

bash

curl -X POST -H 'Content-type: application/json' -d "{\"text\":\"mydaemon restarted on $(hostname)\"}" https://hooks.slack.com/services/...
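
Hand-written JSON breaks as soon as the message contains a quote or backslash, for example when it includes a log line. The sketch below wraps the call in a function that escapes the text first. The function names are illustrative, the escaping covers only backslashes, double quotes, and newlines, and the webhook URL is whatever your chat tool issued.

```bash
#!/usr/bin/env bash
# Sketch: build a webhook payload safely and post it with curl.
# json_escape, slack_payload, and notify_webhook are hypothetical helpers.

# json_escape STRING -> prints STRING with \, ", and newlines escaped.
# Other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

# slack_payload MESSAGE -> prints a {"text": ...} JSON object.
slack_payload() {
  printf '{"text":"%s"}' "$(json_escape "$1")"
}

# notify_webhook URL MESSAGE -> posts the payload; fails on HTTP errors
# and gives up after 10 seconds so a slow endpoint cannot stall the watcher.
notify_webhook() {
  curl -fsS --max-time 10 -X POST \
    -H 'Content-type: application/json' \
    -d "$(slack_payload "$2")" "$1"
}
```

A watcher script could then call notify_webhook "$WEBHOOK_URL" "mydaemon restarted on $(hostname)" right after a restart, with WEBHOOK_URL set in its environment.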

Testing and rollout checklist

  • Test kill and restart in staging.
  • Verify PIDfile logic and permissions.
  • Run under correct user (avoid root where unnecessary).
  • Confirm logs rotate and don’t fill disk.
  • Add rate limits to avoid restart storms.

Windows notes (brief)

  • Use NSSM or Windows Services to supervise apps.
  • PowerShell Example: use Get-Process, Stop-Process -Force, Start-Process and Task Scheduler for periodic checks.

Final recommendation: Prefer systemd or a supervisor for production; use scripts only for small tools or where supervisors aren’t available. The examples above give safe patterns: graceful shutdown, identity checks, restart rate limiting, logging, and notification.
