Frequently Asked Questions

Some frequently asked questions about JobRunr...

BackgroundJobServer FAQ

Does JobRunr need open ports for distributing jobs?

No, JobRunr does not require an open port for distributing the workload - this is orchestrated via the StorageProvider.

How is the coordination between different nodes done?

Each BackgroundJobServer registers itself in the StorageProvider on startup. For an RDBMS, this is a plain old table called jobrunr_backgroundjobservers. The master is the server that has been running the longest (that is, the one that registered first).
Then, every 15 seconds, each BackgroundJobServer updates a lastHeartBeat timestamp. If a node crashes for some reason (this can also be the master node), its lastHeartBeat timestamp is no longer updated. All other servers participating in job processing see that this node is no longer active, and it is removed from the StorageProvider.
Next, if the removed node was the master, the master re-election process starts, which again is nothing more than picking the longest-running BackgroundJobServer.
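
The election rule described above can be sketched in plain Java. This is only an illustrative model, not JobRunr's actual implementation: the class and field names (MasterElectionSketch, ServerStatus, firstHeartbeat) and the timeout parameter are assumptions made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative model only; names and the timeout are assumptions, not JobRunr's API. */
final class MasterElectionSketch {

    /** One entry per server, loosely modelled on a row of the server table. */
    static final class ServerStatus {
        final String id;
        final Instant firstHeartbeat; // when the server registered itself
        final Instant lastHeartbeat;  // refreshed periodically while the server is alive

        ServerStatus(String id, Instant firstHeartbeat, Instant lastHeartbeat) {
            this.id = id;
            this.firstHeartbeat = firstHeartbeat;
            this.lastHeartbeat = lastHeartbeat;
        }
    }

    /** Returns the longest-running server whose heartbeat is still fresh, if any. */
    static Optional<ServerStatus> electMaster(List<ServerStatus> servers, Instant now, Duration timeout) {
        Instant cutoff = now.minus(timeout);
        return servers.stream()
                // ignore servers that stopped updating their heartbeat
                .filter(s -> !s.lastHeartbeat.isBefore(cutoff))
                // the earliest registration wins
                .min(Comparator.comparing((ServerStatus s) -> s.firstHeartbeat));
    }
}
```

Because the rule only depends on data that every node can read from the shared store, each node arrives at the same answer without talking to the others directly.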

Pro tip: if you are running in a Kubernetes environment, it is best to always keep your first BackgroundJobServer running and scale the other pods up and down. This results in fewer master re-elections and thus fewer database queries.

What is the role of the master?

The master is a BackgroundJobServer that processes jobs like all other nodes, but it also performs some extra tasks:

  • it checks for recurring jobs and schedules them when they are about to run
  • it checks for scheduled jobs and enqueues them when they need to run
  • it checks for orphaned jobs and reschedules them
  • it does some zookeeping like deleting all the succeeded jobs

My recurring jobs are not running nor available in the dashboard?

To schedule your recurring jobs, make sure that the code scheduling them is executed when your application starts. See the examples in Recurring jobs.

JobRunr stops completely if my SQL / NoSQL database goes down

JobRunr uses your database for a lot of things:

  • Master node election for the BackgroundJobServer
  • Monitoring for zombie jobs (jobs that were being processed on a BackgroundJobServer node that crashed)
  • Optimistic locking so that a job is executed only once

The moment JobRunr loses its connection to the database (or the database goes down), a lot of threads will try to write updates to the database, but all of these writes will of course fail. This results in a huge amount of logging, and if JobRunr tried to continue job processing, it would quickly fill the disks because each attempt to process a job fails. That’s why I decided that if there are too many exceptions caused by the StorageProvider, JobRunr stops all background job processing. This can of course be monitored via the dashboard and health endpoints.
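
The idea of stopping after too many storage failures can be sketched as a small guard object. This is a simplified model under stated assumptions, not JobRunr's real code: the class name StorageFailureGuardSketch and the maxFailures threshold are hypothetical, and the actual limit JobRunr uses is not stated here.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative model only; the class name and threshold are assumptions. */
final class StorageFailureGuardSketch {
    private final int maxFailures; // hypothetical limit before processing stops
    private final AtomicInteger failures = new AtomicInteger();
    private volatile boolean running = true;

    StorageFailureGuardSketch(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Called by a worker thread whenever a write to the storage layer fails. */
    void onStorageFailure() {
        if (failures.incrementAndGet() > maxFailures) {
            // stop picking up new jobs instead of failing and logging in a loop
            running = false;
        }
    }

    /** Worker threads check this flag before picking up the next job. */
    boolean isRunning() {
        return running;
    }
}
```

Once the flag flips, nothing in this sketch turns it back on, which mirrors the behaviour described above where processing stays stopped after the threshold is reached.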

JobRunr Pro improves this by monitoring whether the StorageProvider comes back up and, if so, automatically restarting processing on all the different BackgroundJobServers.

Job FAQ

What if I don’t want to have 10 retries when a job fails?

You can configure the number of retries for all your jobs or per job.

  • To change the default for all jobs, register a RetryFilter with the number of retries you want using the withJobFilter method in the Fluent API or, in the case of the Spring configuration, pass the filter to the setJobFilters method of the BackgroundJobServer class.
  • To change the number of retries for a single job, use the @Job annotation:
@Job(name = "Doing some work", retries = 2)
public void doWork() {
    ...
}

I’m encountering a java.lang.IllegalThreadStateException

While developing, you may encounter the following error:

java.lang.IllegalThreadStateException: Job was too long in PROCESSING state without being updated.
at org.jobrunr.server.JobZooKeeper.lambda$checkForOrphanedJobs$2(JobZooKeeper.java:134)

This happens when you stop a running JVM instance in which a BackgroundJobServer was processing a job. While a job is being processed, it is regularly updated with a timestamp so that, in case of a node failure, the job can be retried automatically on a different server. The error message shown here is an example of such a case.
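
The timestamp check behind this message can be illustrated with a small model. It is a sketch under assumptions, not the code of JobZooKeeper: the names OrphanedJobSketch, JobRecord and maxSilence are made up for the example, and the real timeout is not given here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative model only; names and the maxSilence parameter are assumptions. */
final class OrphanedJobSketch {
    enum State { ENQUEUED, PROCESSING, SUCCEEDED }

    static final class JobRecord {
        final String id;
        final State state;
        final Instant updatedAt; // refreshed regularly while a server is processing the job

        JobRecord(String id, State state, Instant updatedAt) {
            this.id = id;
            this.state = state;
            this.updatedAt = updatedAt;
        }
    }

    /** Returns jobs that claim to be PROCESSING but have not been updated recently. */
    static List<JobRecord> findOrphans(List<JobRecord> jobs, Instant now, Duration maxSilence) {
        Instant cutoff = now.minus(maxSilence);
        return jobs.stream()
                .filter(j -> j.state == State.PROCESSING)  // only jobs a server was working on
                .filter(j -> j.updatedAt.isBefore(cutoff)) // whose timestamp went stale
                .collect(Collectors.toList());
    }
}
```

A job found this way belonged to a server that is no longer updating it, so it can be failed and retried elsewhere, which is what the exception above reports.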

I’m listening for jobs using Service Bus messages in a load-balanced environment and I want to schedule jobs only once.

If you are in an environment that uses JMS or any other service bus, and you listen on multiple nodes for the messages that create jobs, you will probably enqueue the same job on each node. This is because each listening node receives the JMS message and enqueues the same job.

This can easily be solved using the following technique:

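import java.nio.charset.StandardCharsets;
import java.util.UUID;
import javax.jms.JMSException; // or jakarta.jms.*, depending on your stack
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import org.jobrunr.scheduling.JobScheduler;

// MessageHandler is your own class that processes the message payload.
public class JobMessageListener implements MessageListener {
    private final JobScheduler jobScheduler;
    private final MessageHandler messageHandler;

    public JobMessageListener(JobScheduler jobScheduler, MessageHandler messageHandler) {
        this.jobScheduler = jobScheduler;
        this.messageHandler = messageHandler;
    }

    @Override
    public void onMessage(Message message) {
        try {
            // Read the payload up front so the job lambda only captures a plain String.
            String text = ((TextMessage) message).getText();
            // Job ids are UUIDs, so derive the same UUID on every node from the JMS correlation id
            // (assumes the correlation id is set and identical for every copy of the message).
            // Because the id is the same everywhere, the job is unique.
            UUID jobId = UUID.nameUUIDFromBytes(message.getJMSCorrelationID().getBytes(StandardCharsets.UTF_8));
            jobScheduler.enqueue(jobId, () -> messageHandler.handleServiceBusMessage(text));
        } catch (JMSException e) {
            throw new IllegalStateException("Could not read JMS message", e);
        }
    }
}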
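
Why a shared job id prevents duplicates can be shown with a small stand-alone model. This is a sketch under assumptions: the in-memory map stands in for the shared StorageProvider, the class IdempotentEnqueueSketch and its methods are hypothetical, and deriving a name-based UUID from the correlation id is just one possible way to get an id that is identical on every node.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative model only; the map stands in for the shared job storage. */
final class IdempotentEnqueueSketch {
    private final Map<UUID, String> jobsById = new ConcurrentHashMap<>();

    /** The same correlation id always yields the same UUID, on any node. */
    static UUID jobIdFor(String correlationId) {
        return UUID.nameUUIDFromBytes(correlationId.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns true if this call stored the job, false if a job with that id already existed. */
    boolean enqueueOnce(String correlationId, String payload) {
        return jobsById.putIfAbsent(jobIdFor(correlationId), payload) == null;
    }

    int size() {
        return jobsById.size();
    }
}
```

In this model the second node's attempt is simply a no-op. How your JobRunr version reacts to an id that already exists is worth checking against its documentation.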
