JobRunr Pro

Database Fault Tolerance

Keep your jobs running - even in a volatile infrastructure landscape.


By default, JobRunr stops completely if the SQL/NoSQL database goes down, and there is a reason for this: JobRunr uses your database for a lot of things:

Master node election for the BackgroundJobServer
Fetching the details of a Job and updating its state when it’s done
Monitoring for zombie jobs (jobs that were being processed on a BackgroundJobServer node that crashed)
Optimistic locking so that a job is executed only once
…

The moment JobRunr loses its connection to the database (or the database goes down), many threads will try to read from and write to the database, and all of these attempts will of course fail. This results in a huge amount of logging, and if JobRunr tried to continue job processing, it would fill the disks fast because each attempt to process a job fails. I’ve seen it happen, and it happens faster than you think.

That’s why I decided that if there are too many exceptions caused by the StorageProvider, JobRunr stops all background job processing. This can of course be monitored via the dashboard and health endpoints.
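
To make the idea concrete, here is a minimal sketch of such a safeguard. It is an illustration only, not JobRunr's actual implementation: the class name `StorageFailureGuard`, the threshold, and the choice to count consecutive failures are all assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; JobRunr's real logic may count and reset failures differently.
class StorageFailureGuard {
    private final int maxConsecutiveFailures;              // assumed threshold
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean processing = true;

    StorageFailureGuard(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // A successful storage call resets the failure streak.
    void onStorageSuccess() {
        failures.set(0);
    }

    // Each storage exception extends the streak; at the threshold, processing stops for good.
    void onStorageFailure() {
        if (failures.incrementAndGet() >= maxConsecutiveFailures) {
            processing = false;
        }
    }

    boolean isProcessing() {
        return processing;
    }
}
```

In this sketch, worker threads would check `isProcessing()` before picking up a job, so a failing database produces a bounded number of error logs instead of an endless stream.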

JobRunr Pro has your back

JobRunr Pro is more resilient: it pauses all job processing as soon as the StorageProvider goes down. It then keeps monitoring the database and, once it comes back up, automatically resumes processing on all BackgroundJobServers.
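
Conceptually, pausing differs from stopping in that the server keeps probing the database and flips back on its own. The following sketch shows that behaviour under stated assumptions: `PausableJobProcessor` and the `storageIsReachable` probe are hypothetical names, and the real JobRunr Pro internals are not shown in this page.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of pause-and-resume; not JobRunr Pro's real code.
class PausableJobProcessor {
    private final BooleanSupplier storageIsReachable; // assumed health probe, e.g. a cheap query
    private boolean paused = false;

    PausableJobProcessor(BooleanSupplier storageIsReachable) {
        this.storageIsReachable = storageIsReachable;
    }

    // Meant to be called on a fixed schedule: pauses on an outage, resumes on recovery.
    void checkStorage() {
        paused = !storageIsReachable.getAsBoolean();
    }

    boolean isPaused() {
        return paused;
    }
}
```

Because the state is re-derived on every check, no manual intervention is needed once the probe succeeds again.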

Configuring a Grace Period

Available since JobRunr Pro 8.4.0.

By default, JobRunr Pro waits indefinitely for the database to recover. You may instead prefer the server to shut down after a certain period, which is useful in containerized environments where you want an orchestrator like Kubernetes to restart unhealthy pods.

You can configure a grace period after which the server shuts down if the database hasn’t recovered. Once shut down, health endpoints will report the server as down, allowing your orchestrator to take action.

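// Fluent API (Java)
import java.time.Duration;

JobRunrPro
    .configure()
    // ...
    .useBackgroundJobServer(usingStandardBackgroundJobServerConfiguration()
        // shut down if the StorageProvider stays unhealthy for 1 hour
        .andStorageProviderUnhealthyGracePeriod(Duration.ofHours(1))
    )
    // ...

# application.properties (framework integration)
jobrunr.background-job-server.storage-provider-unhealthy-grace-period=1h

# application.yml (framework integration)
jobrunr:
  background-job-server:
    storage-provider-unhealthy-grace-period: 1h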
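
To illustrate how a grace period changes the behaviour, here is a small sketch of the decision logic. It is a simplified model under assumptions, not JobRunr Pro's implementation: `GracePeriodTracker`, its `Status` values, and the `onHealthCheck` method are hypothetical names introduced for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the grace period; names and states are illustrative only.
class GracePeriodTracker {
    enum Status { RUNNING, PAUSED, SHUT_DOWN }

    private final Duration gracePeriod;
    private Instant unhealthySince;          // null while the storage is healthy
    private Status status = Status.RUNNING;

    GracePeriodTracker(Duration gracePeriod) {
        this.gracePeriod = gracePeriod;
    }

    // Records one health check result taken at the given time and returns the new status.
    Status onHealthCheck(boolean storageHealthy, Instant now) {
        if (status == Status.SHUT_DOWN) {
            return status;                   // shutdown is final; an orchestrator has to restart the server
        }
        if (storageHealthy) {
            unhealthySince = null;           // recovery resets the outage timer
            status = Status.RUNNING;
        } else {
            if (unhealthySince == null) {
                unhealthySince = now;        // first failed check marks the start of the outage
            }
            boolean expired = Duration.between(unhealthySince, now).compareTo(gracePeriod) >= 0;
            status = expired ? Status.SHUT_DOWN : Status.PAUSED;
        }
        return status;
    }
}
```

With a one-hour grace period, an outage that ends after 30 minutes only pauses processing, while an outage that lasts the full hour ends in `SHUT_DOWN`, the point at which the health endpoints would report the server as down.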
