How to troubleshoot a runaway (100% CPU) Deno process
Deno has been running a couple of websites rock solid for many months. But during the last week or so I've had the server (using `Deno.serve()`) go unresponsive twice. That is, connections time out and I can see 100% CPU usage for the Deno process.
No idea if it's a Deno bug or a bug in my code (like a blocking infinite loop or something). Thing is, I have no stack trace, console.log output, or error logs to work with. I only know the Deno process is stuck on something, using a lot of CPU and unable to serve requests (connections time out). The problem is I don't know where it is stuck.
I do push new versions out every couple of days or so, as well as update Deno to the latest version regularly. Both Deno and my own code have changed in this timeframe, so I can't really say whether this problem is related to some version of Deno or to my userland code updates.
Is there any way to perhaps stop/interrupt a running Deno process from the outside and get some info on what it was doing at that moment, to try and find the problem? Again, not sure if it's an internal Deno problem, a problem with my userland code, or a problem in some third-party library. No hints in stdout/stderr.
I have no idea how to reproduce the issue other than letting it run on the production server and waiting perhaps days or weeks for the issue to manifest. Since this only happens rarely, on the production server, I'd obviously like to run any diagnostics or tracing tools quickly and then kill Deno and restart the process as soon as possible.
I wasn't able to find the answer to your question in the official Deno documentation. However, based on the information you've provided, there are a few general suggestions I can offer, but please note that these are not based on the Deno documentation, and you should double-check the accuracy of this advice.
1. Profiling: If you suspect a performance issue, you might want to use profiling tools to monitor your Deno process. You can use the `--inspect` flag when starting your Deno process, which will allow you to connect Chrome DevTools and take a CPU profile to see where the CPU time is being spent.
2. Logging: Enhance logging in your application to include more verbose information about the application's state and behavior. This might help you identify patterns or specific requests that lead to the high CPU usage.
3. Graceful Shutdown: Implement a graceful shutdown mechanism in your server code that can be triggered externally. This could allow you to attempt a controlled shutdown and possibly log the current state of the application before it completely stops responding (a rough sketch combining this with the logging idea follows after this list).
4. Resource Limits: Use operating system tools to limit the resources available to the Deno process. This won't solve the underlying issue but might mitigate the impact on the rest of your system and make the problem manifest more quickly for easier diagnosis.
5. Monitoring: Set up monitoring for your application that can alert you when CPU usage spikes or when the application becomes unresponsive. This can help you respond more quickly to issues.
6. Version Control: Since you update Deno and your code regularly, use version control (if you're not already) to track changes. This can help you identify which changes might be correlated with the onset of the issue.
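For points 2 and 3, here is a minimal sketch of what that could look like with `Deno.serve()`. It is not taken from your code or from the Deno documentation; the handler, port, and log format are placeholders for illustration.

```ts
// Sketch: per-request logging plus an externally triggerable graceful shutdown.
// The handler, port and log format are placeholders, not the poster's real code.
const controller = new AbortController();

async function handler(req: Request): Promise<Response> {
  const started = performance.now();
  try {
    // ... real routing / application logic would go here ...
    return new Response("ok");
  } finally {
    // Log method, URL and duration so unusually slow requests stand out
    // right before a freeze.
    const ms = (performance.now() - started).toFixed(1);
    console.log(`${req.method} ${req.url} ${ms}ms`);
  }
}

const server = Deno.serve({ port: 8000, signal: controller.signal }, handler);

// `kill -TERM <pid>` triggers this: the server stops accepting connections
// and the process can log its state before exiting. If the event loop is
// stuck in a tight JS loop, this callback will never run.
Deno.addSignalListener("SIGTERM", () => {
  console.log("SIGTERM received, shutting down");
  controller.abort();
});

await server.finished;
console.log("server closed");
```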
Remember, these suggestions are not from the Deno documentation, and you should verify their applicability to your situation. For a more specific solution, you might want to reach out to the Deno community or look for similar issues reported by other users.
I'm not so well versed with OS signals etc., but I guess what I'm looking for is a way to send a kill-signal kind of thing from the terminal, which would stop Deno in whatever it's doing but also cause it to puke out a JavaScript or Rust stack trace. That would help diagnose where it's stuck: is it an infinite JS loop or what.
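For what it's worth, one in-process option (with the big caveat that it only helps if the event loop can still run callbacks, which it likely cannot during a tight 100% CPU loop) is to register a signal handler that dumps whatever state it can reach, and trigger it from the terminal with `kill -USR1 <pid>`. This is only a hedged sketch, not an official Deno diagnostic facility:

```ts
// Hypothetical "dump some state on demand" hook. It cannot capture the
// stack of whatever is actually stuck (that needs the inspector or a
// native debugger), but if it never fires at all, that itself suggests
// the event loop is blocked by synchronous JS code.
Deno.addSignalListener("SIGUSR1", () => {
  console.error("SIGUSR1 received at", new Date().toISOString());
  console.error("memory usage:", Deno.memoryUsage());
});
```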
Gonna bump this, as it seems this keeps happening roughly once or twice a day now on a live web server (Deno suddenly stops responding and uses 100% CPU). I set up some additional logging and monitoring to detect this and quickly recover, but I have little in the way of tools to debug and find the root cause.
Log timestamps suggest there weren't any complex or more involved requests being served at the time the process froze, which hints that this might be some Deno/V8 bug as opposed to a JavaScript problem in "userland". Going with that train of thought, it might be something introduced between 1.40.0 and 1.41.1, and persisting in 1.41.2, since everything had been running rock solid before. For sure it started happening only after Temporal came out (which makes me suspect the Temporal implementation a little... but it's a long shot).
Is there any way to get some usable information out of a runaway Deno process? Like, I wouldn't know what to do with a memory dump, but something like interrupting the process and getting a Rust backtrace might help to figure out what it is/was doing when it's having problems.
More people having the same problem: https://github.com/denoland/deno/issues/23033
For full disclosure: I'm one of the commenters in that issue, jtoppine (I noticed too that someone had similar problems and had created an issue). So I guess that makes a total of two people having what appears to be the same problem.
I haven't run into these freezes for a while now; everything has been running smoothly since. The cause remains a mystery, and I guess it's low priority now that it seems like a rare occurrence. I was able to implement quick detection & recovery in case it ever happens again, so it's not as much of a disastrous event anymore.
Still interested in potential ways to investigate if these issues happen in the future though!
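For reference, here is a rough sketch of what an external detection-and-recovery watchdog could look like, run as a separate process. The health URL, polling interval, and restart command are made-up placeholders (it assumes the server exposes a `/health` endpoint and runs as a systemd unit), not my actual setup:

```ts
// Hypothetical watchdog: poll a health endpoint; if it times out or fails,
// restart the service. URL, timeout and restart command are placeholders.
const HEALTH_URL = "http://127.0.0.1:8000/health";

async function isHealthy(): Promise<boolean> {
  try {
    const res = await fetch(HEALTH_URL, { signal: AbortSignal.timeout(5_000) });
    await res.body?.cancel(); // release the connection
    return res.ok;
  } catch {
    return false; // timeout or connection error counts as unhealthy
  }
}

while (true) {
  if (!(await isHealthy())) {
    console.error(new Date().toISOString(), "health check failed, restarting service");
    // Assumes the server runs as a systemd unit called `myapp.service`.
    await new Deno.Command("systemctl", { args: ["restart", "myapp"] }).output();
  }
  await new Promise((resolve) => setTimeout(resolve, 30_000));
}
```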
In the GitHub issue it was suggested to run Deno with the remote debugger enabled. A good idea, but it doesn't feel practically feasible (running a headless cloud VPS production server with the debug hook enabled, and keeping an SSH tunnel open / remote debugger connected for days in hopes that the issue reproduces; that is, if the debugger and remote connection even work when the process is in a 100% CPU, unresponsive state). Like, maybe it could work, but I kinda gave up on setting it up without even trying :)
Yes. It would be nice if we could just connect the debugger via the CLI at any time and stop the execution of the program. Node has a built-in CLI-based debugger:
https://nodejs.org/api/debugger.html#debugger