Deno Memory Usage Increases, Does Not Decrease.
Good morning. I am here because we have several applications that we run with Deno, and they all suffer from the same issue. One of them is a push service that uses Oak for routing. It is a simple API that communicates with Redis to authenticate token requests and to store a list of channels in a Redis key so that we know which channels have subscriptions. There are only a few endpoints: subscribe, unsubscribe, and one that pushes a notification into a Redis queue for our consumer to process.
We have found that when we run this application on our cloud provider, the Docker containers consume a lot of memory and eventually hit our scaling limit, which causes autoscaling to add another container to the environment and take the old one down. The problem is that this happens constantly. Memory usage keeps increasing but never decreases. I would expect memory to grow and then, at some point, be freed so that usage returns to the baseline; that is not happening.

I am wondering whether there are any Deno tools that could help us determine what is using and holding on to that memory. I looked at denosoar, but it seems aimed at small applications, or at least applications that are not run under Docker and do not need specific payloads sent to their endpoints. We also tried Deno.memoryUsage and found an RSS of 477 MB, a heap total of 13.5 MB, 11.9 MB of heap used, and 3.6 MB external. What could be using so much under RSS while not showing up in any of the other metrics?

I have attached a screenshot. The cliffs are where memory usage triggers autoscaling; as you can see, memory does not return to the baseline until scaling kicks in, the new container is put into rotation, and the old container is taken down. Any suggestions or advice would be appreciated.
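For reference, this is roughly how we sample those numbers (a minimal sketch; the 30-second interval and plain console logging are just illustrative):

```ts
// Periodically log Deno.memoryUsage() so RSS can be compared
// against the V8 heap figures over time.
function logMemory() {
  const { rss, heapTotal, heapUsed, external } = Deno.memoryUsage();
  const mb = (n: number) => (n / 1024 / 1024).toFixed(1);
  console.log(
    `rss=${mb(rss)}MB heapTotal=${mb(heapTotal)}MB ` +
      `heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`,
  );
}

setInterval(logMemory, 30_000);
```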
22 Replies
If memory keeps going up and never comes down, it sounds like you may have a memory leak in your code.
Yeah, I was thinking the same thing, which is why I came to ask whether there are any tools for Deno that might shed some light on where it is coming from. We are using Oak, so I assumed that once the controllers finished the GC would clean everything up, but looking through the code I do not see any obvious place where a leak would be introduced. The controllers are quite simple: the subscribe endpoint reaches out to an API to get some data, determines whether the user can subscribe to a channel, and then adds the channel to Redis. The unsubscribe endpoint removes a channel from Redis, and the authorize endpoint simply reaches out to Pusher to authorize the subscription.
It would be nice if there were a tool I am unaware of that could trace execution to a file for inspection, to help determine where this leak is coming from.
Anyone have any ideas on tools that can be used with Deno? I will assume no if no one replies.
Use Chrome DevTools and see if the JS heap keeps growing. V8 usually doesn't release memory back to the OS if there's no memory pressure, in case it needs it again in the future.
Yeah, I thought about this as well; however, if the container is crashing, that leads me to believe the memory is needed but has not been released. I am pretty sure the containers are crashing, since they are being taken down by autoscaling; if they were not, it would simply spin up another instance, wait until things leveled out, and take down the extra instance. What we see every time is that memory hits the peak, then a new container is brought up, and then the one that was high in memory usage is taken down.
I am not sure how to use Chrome DevTools here, since that would require debugging to be enabled, and this only seems to happen in production under heavy load. It is not happening in our sandbox or staging environments.
Do you have any more information on what your program is doing and which Deno version you're using?
Sure, we are using Deno 1.46.3 in production. The app is quite simple: it runs an API server that uses Oak for routing. On every request it checks Redis to authenticate the bearer token, and then it either adds a channel (or increments the value for a channel name) or, on another endpoint, removes a key or decrements its value. During the subscription process it authorizes with the Pusher JS library and returns that result from the server to the Pusher client library.
Clients can also send a Pusher message through our API, which just adds the message to a Redis queue; another container runs the queue consumer, processes these messages, and sends them out through Pusher for clients to receive. The consumer, being a separate service, does not seem to suffer from this memory issue at all.
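To give a rough idea of the shape of the subscribe handler, here is a simplified sketch (not our actual code; the route, key names, and token lookup are illustrative):

```ts
// Sketch of a subscribe endpoint: authenticate the bearer token
// against Redis, then track the channel's subscription count.
import { Application, Router } from "https://deno.land/x/oak@v16.1.0/mod.ts";
import Redis from "npm:ioredis@5.3.2";

const redis = new Redis({ host: "127.0.0.1", port: 6379 });
const router = new Router();

router.post("/subscribe", async (ctx) => {
  const token = ctx.request.headers.get("authorization")
    ?.replace("Bearer ", "");
  // Token lookup in Redis is illustrative; the real key layout differs.
  const userId = token ? await redis.get(`auth:${token}`) : null;
  if (!userId) {
    ctx.response.status = 401;
    return;
  }

  const { channel } = await ctx.request.body.json();
  // Increment the subscription counter for this channel.
  await redis.incr(`channel:${channel}:subs`);
  ctx.response.body = { ok: true };
});

const app = new Application();
app.use(router.routes());
app.use(router.allowedMethods());
await app.listen({ port: 8000 });
```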
We are using Oak 16.1.0, Zod 3.23.8 and ioredis 5.3.2
I have not found any memory issues reported in these versions; they are also updated versions, as we were on older ones and thought that might be the source of the issue.
Pusher: 5.2.0, Winston 3.8.2 and deno/std@0.224.0
We do have 5 containers running, but we are sending between 200 and 300 million messages per day, so it is a high load.
The issue I see is that it is not all containers recycling; it is one container at a time, but it seems to happen several times a day.
I still think using Chrome DevTools is your best bet here; you should still be able to connect the inspector there and take a few heap snapshots.
Yeah, I am just a bit concerned about enabling the debugger in production, but I will speak to my boss about it. I am also going to see if I can write a script in a lower environment to simulate a ton of traffic to the service, run the debugger there, and see if I can replicate the issue.
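Something like this is what I have in mind for the traffic simulation (a rough sketch; the URL, token, payload, and concurrency numbers are placeholders):

```ts
// Hammer the subscribe endpoint with concurrent requests so memory
// behaviour can be watched in the inspector on a lower environment.
const TARGET = "http://localhost:8000/subscribe"; // placeholder URL
const TOKEN = "test-token";                        // placeholder token
const CONCURRENCY = 50;
const REQUESTS_PER_WORKER = 10_000;

async function worker(id: number) {
  for (let i = 0; i < REQUESTS_PER_WORKER; i++) {
    const res = await fetch(TARGET, {
      method: "POST",
      headers: {
        "authorization": `Bearer ${TOKEN}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({ channel: `load-test-${id}` }),
    });
    // Drain the body so connections can be reused.
    await res.body?.cancel();
  }
}

await Promise.all(
  Array.from({ length: CONCURRENCY }, (_, id) => worker(id)),
);
console.log("load test finished");
```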
How should we do this exactly? Should we run the program with --inspect-wait and keep the inspector connected all the time, or is it possible to connect it to a running process at any moment?
Run with --inspect and then connect at any moment you see fit.
Then you can go to the "Memory" tab and take some heap profiles. You'll have to experiment to see which option is best for you; one is cheaper but takes longer, the other is faster but might cause some halts.
Best to take a few snapshots, save them, and then compare them offline.
@Joseph Crawford Did you ever pin this down? I'm testing out ioredis on Deno and am wondering if that could be the culprit, since it's not designed for Deno.
@frigjord Unfortunately I have not yet. My boss has time-boxed that task, as we just threw more instances at the service in AWS, and for the time being that has slowed the issue but not solved it. I would love to get back to this, and maybe once we are through this crunch I will be able to. If you find anything related to ioredis, I would be very interested in your findings.
Thanks for the quick reply! I'll set up a test environment, bombard ioredis with messages, and see if I can isolate the issue to this driver. We're rolling out into production ourselves, and denodrivers/redis keeps crashing; ioredis seems to be the only alternative.
I'll post here with any findings
Thank you. Yeah, we are hitting our Deno endpoints with around 400M requests a day, and every request is authenticated through Redis queries.
Then it sounds like ioredis is more than production ready 😄
I'll run it through its paces anyway.
Which features of it do you use?
We are simply using it to query for auth purposes, to store keys that are incremented/decremented or removed, and to query keys to see if they exist.
Alright, so no pub/sub or expires?
But ioredis is the only communication library we are currently using, and it may be where our memory issues are coming from; I have not been able to diagnose it yet.
We have keys with TTLs, but we are not calling expire through ioredis. No pub/sub at the moment, though that may become a reality in the future.
The issue we had with denodrivers/redis is that it has no connection pooling or pipelining. I'm assuming that ioredis has connection pooling, but that's also where I suspect memory issues.
I am not sure about pooling, but it does support pipelining; we just are not using those features. I looked into pipelining in the past, which is how I know it is supported.
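For reference, ioredis pipelining looks roughly like this (a minimal sketch with made-up key names; again, we are not actually using it):

```ts
// Batch several commands into one round trip with an ioredis pipeline.
import Redis from "npm:ioredis@5.3.2";

const redis = new Redis({ host: "127.0.0.1", port: 6379 });

// exec() resolves to an array of [error, result] pairs, one per command.
const results = await redis
  .pipeline()
  .incr("channel:news:subs")
  .exists("auth:some-token")
  .expire("channel:news:subs", 3600)
  .exec();

console.log(results);
await redis.quit();
```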
If you're doing reads/writes in parallel towards the same Redis instance in Deno, you'd need pooling; otherwise only a single command can be run at a time.
Redis errors out if you overlap commands from a single connection.
We're using KeyDB to enable multi-threading for the Redis server, but this single-command thing is a protocol limitation.
I've done some extreme testing with 20,000 keys taking 2 GB of memory, reading at 4 Gbit/s over 100+ iterations. There is memory buildup, but the garbage collector cleans it up after it reaches roughly 200 to 250 MB.
So this isn't the cause of your issues, which is great to know!
I'd recommend checking what you do with the value after it's been read; you can release it by dropping the reference (for example, setting the variable to null, or using delete obj.prop when it's held as an object property) so the GC can reclaim it.
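For example, roughly (assuming the large value is held on a long-lived object; the names are made up):

```ts
// Long-lived holder that caches a large payload read from Redis.
const cache: { payload: string | null } = { payload: null };

cache.payload = "...large value read from Redis...";

// ... use cache.payload ...

// Drop the reference so the GC can reclaim the large string;
// `delete cache.payload` would also work since it's an object property.
cache.payload = null;
```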
Thanks for letting me know your findings.