Skip to content

Remove Puma plugin#303

Closed
rosa wants to merge 1 commit intomainfrom
remove-puma-plugin
Closed

Remove Puma plugin#303
rosa wants to merge 1 commit intomainfrom
remove-puma-plugin

Conversation

@rosa
Copy link
Member

@rosa rosa commented Aug 24, 2024

After discussing with @dhh, we don't want to be tied to a particular server here, and this won't be the recommended way to run Solid Queue with Rails, so just remove the provided plugin. It can still be used but it just won't be provided directly by the gem.

@uurcank
Copy link

uurcank commented Aug 28, 2024

@rosa @dhh I cannot get the reasoning here. Puma plugin is a good idea to for a more robust process. If there is a memory leak Puma would then kill it and keep the app responsive (or not?). This was happening in delayed_jobs a lot, so I can see it happening for solid_queue as well. Background jobs can be a bottleneck too for the app.

Is there any other way to keep a supervisor running when a problem occurs? I think the plugin should still be part of the gem and maintained by rails.

@dhh
Copy link
Member

dhh commented Aug 28, 2024

I don't understand what you mean. If there's a memory leak in your job handling, you want that process to be isolated, so it can be restarted independently of the web server.

@uurcank
Copy link

uurcank commented Aug 28, 2024

Yes, but there is not a defined process to restart it since solid_queue is based on database like delayed job, is there? With delayed_job i have had issues with process not restarting in the past.

a job can fail silently, sometimes it does not crash but become unresponsive and stale. In such cases, the process manager may not detect that the worker is no longer effectively processing jobs and restart it. Also jobs can pile up in such cases.

This puma plugin seems like addressing this problem. It checks for time-outs and memory leaks and kills it when necessary to keep the app responsive. (At least that's what my understand was so far, it has a better chance of detecting a failure)

Therefore i am not sure isolation provides more resilience here. I feel like there is a benefit of being dependent on the web server (puma specifically), which might be part of a bigger background job problem, rather than whether rails favoring a specific server or not.

@rosa
Copy link
Member Author

rosa commented Aug 28, 2024

Yes, but there is not a defined process to restart it since solid_queue is based on database like delayed job, is there? With delayed_job

I'm not sure using a database as storage plays a role on this 🤔 At least I can't see why that'd be the case.

This puma plugin seems like addressing this problem. It checks for time-outs and memory leaks and kills it when necessary to keep the app responsive.

It doesn't do that. It just checks if the process is still running, and if it's not, it stops itself as well. Conversely, if Puma dies, Solid Queue is stopped as well. It can't know if a worker in particular is been processing a job for a long time without progressing (for example, if the job has an infinite loop). Same about memory usage.

It'd be nice to have memory usage and time consumed by the same job monitored by the supervisor, certainly something for the future, but this is all independent of the Puma plugin.

@uurcank
Copy link

uurcank commented Aug 28, 2024

I'm not sure using a database as storage plays a role on this 🤔 At least I can't see why that'd be the case.

Since background jobs are not processed on the memory like Sidekiq would do, on the database jobs can pile up and crash the app completely. There is vulnerability here. There is strong chance that process manager does not restart solid_queue.

It doesn't do that. It just checks if the process is still running, and if it's not, it stops itself as well. Conversely, if Puma dies, Solid Queue is stopped as well. It can't know if a worker in particular is been processing a job for a long time without progressing (for example, if the job has an infinite loop). Same about memory usage.

Yes, plugin itself does not do that but Puma does, doesn't it? When the thread running the background job has a problem, it could kill that thread. If solid_queue isolated, it would never see it.

It'd be nice to have memory usage and time consumed by the same job monitored by the supervisor, certainly something for the future, but this is all independent of the Puma plugin.

I might be wrong but seems like Puma already does that for the solid_queue with this plugin. Therefore suggesting to keep this plugin for now as part of the gem.

@dhh
Copy link
Member

dhh commented Aug 28, 2024

I don't think we have a shared understanding here. There's no path where database jobs "piling up" will crash the app. You'll just have a backlog. Anyway, we're going to proceed with a single way of operating SQ. Appreciate your input, but going to move on from this discussion at this time. We have a lot of work to do to get 1.0 out the door, and we're going to focus on that.

@uurcank
Copy link

uurcank commented Aug 28, 2024

Crash in the sense that, say, an email application is no longer sending emails due to piling up. And due to huge backlog, the app cannot come back anytime soon (which is a vulnerability). At that point, it does not matter if the pages are running when the core component of the app which are dependent on background jobs is no longer running.

There is currently no process that detects that. I had the assumption that puma plugin was initially for that and this can be monitored from there (possibly in the async mode, now checking the code this is not the case). I still think this can be potentially monitored from a web server, and worst case the plugin should stop the app completely in case of a stale or slow running job so infinite backlog of jobs would not be created.

@uurcank
Copy link

uurcank commented Aug 29, 2024

Sorry one last input here. Here is an example so hopefully it will make more sense. The plugin could be useful for something like this and we can monitor each job within a thread, restart when needed:

def start(launcher)
   ...

   launcher.events.on_booted do
     monitor_threads
   end
 end
 
  def monitor_threads
   Thread.new do
     loop do
       Puma.stats['worker_status'].each do |worker|
         worker.backlog_size > MAX_BACKLOG_THRESHOLD && handle_stale_threads(worker)
       end
     end
   end
 end
 
 
def handle_stale_threads(worker)
 job_id = worker.job_id
   if worker.running? == false
     stop_solid_queue_job(worker)
      mark_job_as_failed(job_id)
   end
 end
 
 ...

After discussing with DHH, we don't want to be tied to a particular
server and this won't be the recommended way to run Solid Queue with Rails,
so just remove the provided plugin. It can still be used but it just
won't be provided directly by the gem.
@rosa rosa force-pushed the remove-puma-plugin branch from 67fde68 to c125821 Compare September 2, 2024 12:27
@dhh
Copy link
Member

dhh commented Sep 5, 2024

After reviewing the setup flow further, we're going to keep this puma plugin to ensure that out-of-the-box deployment on a single server doesn't require any form of setup or complication. For scale and performance, it's obviously going to be a bit compromised, but we have a very clear path to move to a dedicated job server in the new stock config/deploy.yml setup and we'll control this plugin activation via an ENV switch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants