We have a Spring Batch job that processes around 10 million records. Doing this in a single thread would be too slow to meet our SLA.

To improve performance, we built a POC in which the master step creates partitions, each representing one unique prod ID. The number of unique prod IDs can range anywhere from 500 to 4500; in the POC we have 500. Each partition is given one prod ID, and the worker step processes it. End to end, all of this works fine.
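The partitioning scheme described above (one partition per unique prod ID) can be sketched roughly as follows. This is a plain-Java stand-in, not Spring Batch code: in a real job the same logic would sit in an implementation of Spring Batch's Partitioner interface returning a Map<String, ExecutionContext>, whereas here a Map<String, Object> plays the role of ExecutionContext so the snippet runs on its own. The class name, the "prodId" key, the "partition-" naming and the sample IDs are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a "one partition per prod ID" partitioner.
 * In Spring Batch this logic would live in a Partitioner implementation and
 * return Map<String, ExecutionContext>; here Map<String, Object> stands in for
 * ExecutionContext so the example runs without the library (hypothetical names).
 */
class ProdIdPartitioner {

    // Hypothetical key under which each worker step would look up its prod ID.
    static final String PROD_ID_KEY = "prodId";

    static Map<String, Map<String, Object>> partition(List<String> prodIds) {
        // LinkedHashMap keeps partitions in the same order as the input IDs.
        Map<String, Map<String, Object>> partitions = new LinkedHashMap<>();
        for (String prodId : prodIds) {
            Map<String, Object> context = new LinkedHashMap<>();
            context.put(PROD_ID_KEY, prodId);
            // One entry per unique prod ID, so 500 IDs give 500 partitions.
            partitions.put("partition-" + prodId, context);
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> partitions =
                partition(List.of("P100", "P200", "P300"));
        partitions.forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

With this design the number of partitions follows the number of distinct prod IDs rather than the gridSize hint that Spring Batch passes to Partitioner.partition(int gridSize), so 500 IDs yield 500 partitions, and therefore 500 StepExecutions for the master to set up.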

What we noticed is that the master step takes more than 5 minutes to hand the partition information off as step execution requests. In other words, there is a gap of more than 5 minutes between the master step generating the partitions and the worker step starting on the first partition.

What might be causing this slowness? What does the Spring Batch framework do during those 5 minutes?

Here are the 3 SELECTs that are executed many times during those 5 minutes:

SELECT JOB_EXECUTION_ID, START_TIME, END_TIME, STATUS, EXIT_CODE, EXIT_MESSAGE, CREATE_TIME, LAST_UPDATED, VERSION, JOB_CONFIGURATION_LOCATION from BATCH_JOB_EXECUTION where JOB_INSTANCE_ID = ? order by JOB_EXECUTION_ID desc;

SELECT JOB_EXECUTION_ID, KEY_NAME, TYPE_CD, STRING_VAL, DATE_VAL, LONG_VAL, DOUBLE_VAL, IDENTIFYING from BATCH_JOB_EXECUTION_PARAMS where JOB_EXECUTION_ID = ?;

SELECT STEP_EXECUTION_ID, STEP_NAME, START_TIME, END_TIME, STATUS, COMMIT_COUNT, READ_COUNT, FILTER_COUNT, WRITE_COUNT, EXIT_CODE, EXIT_MESSAGE, READ_SKIP_COUNT, WRITE_SKIP_COUNT, PROCESS_SKIP_COUNT, ROLLBACK_COUNT, LAST_UPDATED, VERSION from BATCH_STEP_EXECUTION where JOB_EXECUTION_ID = ? order by STEP_EXECUTION_ID;

Answer

Take a look at your job repository's configuration. Once the Partitioner has created an ExecutionContext for each slave step, the master creates a StepExecution for each one before sending it to a slave to be processed. The lag is probably due to inserting all of those StepExecutions into your job repository. As a follow-up, make sure you're using the latest version: an optimization was made there not long ago (inserting the step executions in a batch instead of one by one).
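To make the batching point concrete, here is a toy model that only counts simulated database round trips when the master persists one StepExecution per partition, first one at a time and then as a single batch. CountingRepository and its add/addAll methods are hypothetical and only loosely mirror the idea; they are not taken from Spring Batch, and a real JDBC driver may split a large batch into several round trips.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of two ways a master step could persist the StepExecutions it
 * creates for its partitions. CountingRepository is hypothetical (not Spring
 * Batch code); it only counts simulated database round trips.
 */
class StepExecutionPersistence {

    static class CountingRepository {
        int roundTrips = 0;
        final List<String> saved = new ArrayList<>();

        // Simulates sending one INSERT per call: one round trip each time.
        void add(String stepExecutionName) {
            roundTrips++;
            saved.add(stepExecutionName);
        }

        // Simulates sending all rows in one batch: counted as a single round trip.
        void addAll(List<String> stepExecutionNames) {
            roundTrips++;
            saved.addAll(stepExecutionNames);
        }
    }

    // Hypothetical names for the per-partition step executions.
    static List<String> stepExecutionNames(int partitions) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            names.add("workerStep:partition" + i);
        }
        return names;
    }

    static int oneByOne(int partitions) {
        CountingRepository repo = new CountingRepository();
        for (String name : stepExecutionNames(partitions)) {
            repo.add(name);
        }
        return repo.roundTrips;
    }

    static int batched(int partitions) {
        CountingRepository repo = new CountingRepository();
        repo.addAll(stepExecutionNames(partitions));
        return repo.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("one-by-one: " + oneByOne(500) + " round trips");
        System.out.println("batched:    " + batched(500) + " round trip(s)");
    }
}
```

For 500 partitions the one-by-one path makes 500 round trips and the batched path makes 1, so any fixed per-statement latency is paid once instead of 500 times.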

Comments

Thanks, Michael. We are using Spring Batch 3.0.3, and the logs show it executing a group of 3 SELECTs. Please see my original question, which I have updated with those SELECTs. As for inserts, there is only one, which saves the context info.
I also have another question about the job repository: what do you mean by "look at your job repository's configuration"?
What I noticed is that the 3 SELECTs above (from the original question) are executed for each partition, and together they take 1 second. So with 500 partitions, those 3 SELECTs are executed 500 times, and this accounts for most of the time.
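The figures in the last comment are enough for a rough estimate of the gap, assuming the per-partition cost stays constant: 3 SELECTs per partition at about 1 second per group gives 500 × 3 = 1,500 SELECTs and about 500 seconds (roughly 8.3 minutes) for 500 partitions, which fits the "more than 5 minutes" reported in the question, and about 4,500 seconds (75 minutes) at the upper end of 4,500 prod IDs. A small sketch of that arithmetic (the class and method names are hypothetical; the inputs are only the numbers quoted in this thread):

```java
import java.util.Locale;

/**
 * Back-of-the-envelope estimate of the start-up gap: every partition triggers
 * the same group of 3 SELECTs, and one group takes about 1 second (figures
 * from the thread, not measurements of any particular setup).
 */
class PartitionOverheadEstimate {

    static final int SELECTS_PER_PARTITION = 3;

    static long totalSelects(int partitions) {
        return (long) partitions * SELECTS_PER_PARTITION;
    }

    static double overheadSeconds(int partitions, double secondsPerPartition) {
        // Cost grows linearly with the number of partitions.
        return partitions * secondsPerPartition;
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {500, 4500}) {
            double seconds = overheadSeconds(partitions, 1.0);
            // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
            System.out.printf(Locale.ROOT, "%d partitions: %d SELECTs, ~%.0f s (~%.1f min)%n",
                    partitions, totalSelects(partitions), seconds, seconds / 60.0);
        }
    }
}
```

Because the overhead is linear in the partition count, moving from the POC's 500 prod IDs to 4,500 would be expected to stretch the gap by roughly a factor of nine unless the per-partition lookups get cheaper.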