I'm using a Partitioner to parallelize the import of *.csv files. The folder contains about 30k files.
Problem: job initialization takes about 1-2 hours before all files are set up. The bottleneck is in SimpleStepExecutionSplitter.split().
Question: is it normal for step initialization to take that long, or can I speed it up somehow?
@Bean
public Step partitionStep(Partitioner partitioner) {
    return stepBuilderFactory.get("partitionStep")
            .partitioner(step())
            .partitioner("partitioner", partitioner)
            .taskExecutor(taskExecutor())
            .build();
}

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); //run import always with 4 parallel files
    taskExecutor.setMaxPoolSize(4);
    taskExecutor.afterPropertiesSet();
    return taskExecutor;
}

@Bean
public Partitioner partitioner() throws IOException {
    MultiResourcePartitioner p = new MultiResourcePartitioner();
    p.setResources(new PathMatchingResourcePatternResolver().getResources("mypath/*.csv"));
    return p;
}
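
One workaround I'm considering, untested and based on my assumption that the time spent in split() grows with the number of partitions (the setup above creates one partition per file, so about 30k), is to group the files into a handful of partitions instead. The sketch below is plain Java with no Spring dependency; FileGrouping and group() are hypothetical names of my own, and the "partition" + index keys only mimic the naming I believe MultiResourcePartitioner uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring Batch: groups many file names
// into a small, fixed number of partitions.
final class FileGrouping {

    private FileGrouping() {
    }

    // Distributes the files round-robin over at most `groups` buckets.
    // Keys are "partition0", "partition1", ... in insertion order.
    static Map<String, List<String>> group(List<String> files, int groups) {
        if (groups < 1) {
            throw new IllegalArgumentException("groups must be >= 1");
        }
        // Never create more buckets than there are files.
        int count = Math.min(groups, files.size());
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            result.put("partition" + i, new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            result.get("partition" + (i % count)).add(files.get(i));
        }
        return result;
    }
}
```

With 4 groups, 30k files would become 4 partitions of 7,500 files each. A custom Partitioner could put each group into its own ExecutionContext, and the worker step could read the group with something like a MultiResourceItemReader. That wiring is not shown here, and I have not verified that it actually removes the bottleneck.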