Spring Batch Introduction Example
In this post, we feature a comprehensive introduction to Spring Batch. Many enterprise applications need bulk processing to perform their business operations. These operations typically include time-based events or complex business rules applied across very large data sets, and batch processing is used to handle such workloads efficiently. In this post, we will look at Spring Batch as a solution for these batch processing needs.
1. Spring Batch Introduction
Spring Batch is a lightweight, comprehensive batch framework which builds upon the POJO-based development approach. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job and resource management. Spring Batch is designed to work in conjunction with various commercial and open-source schedulers such as Quartz, Tivoli, Control-M, etc.
Spring Batch follows a layered architecture with three major components – Application, Batch Core and Batch Infrastructure. Application is the client code written by developers to achieve the intended functionality. The Batch Core contains the core runtime classes necessary to launch and control a batch job while the infrastructure contains common services needed for the Batch core and Application.
Let's start with a simple batch processing use case in the next section. Before that, we will look at the stack used to build the example. We will use Maven to manage the build and dependencies, with Java 8 as the programming language. All the dependencies required for the example are listed in Maven's pom.xml below.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.7.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.jcg</groupId>
<artifactId>springBatch</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>springBatch</name>
<description>Demo project for Spring Batch</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.hsqldb</groupId>
<artifactId>hsqldb</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
- This Maven configuration declares Spring Boot Starter Parent, version 2.1.7.RELEASE, as the parent POM. All the other Spring dependencies inherit their versions from the parent.
- The Java version is specified as 1.8 for the project.
- Spring Batch, the topic of our example, is added through spring-boot-starter-batch.
- Spring Batch requires job metadata, such as start and end times, to be saved in a persistent store. For this purpose, HSQLDB is specified as a dependency. It is an embedded in-memory database, so the stored information is discarded when the application exits. For an embedded database, Spring Boot automatically creates the tables Spring Batch needs to maintain job information.
2. Batch Example
A typical Spring Batch job involves a Reader, a Writer and, optionally, a Processor. A Processor is used when we need to apply business rules to the data that was read. Alternatively, a step can consist of a Tasklet, which we will delve into in section 3.
In this section, we will consume a movie JSON dataset and write it to a CSV file. We will first look at the Movie entity, which mirrors the structure of the JSON records.
Movie.java
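package com.jcg.springBatch.entity;
import java.util.List;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
// Ignore any extra JSON fields beyond the four mapped here;
// Jackson fails on unknown properties by default.
@JsonIgnoreProperties(ignoreUnknown = true)
public class Movie {
private String title;
private long year;
private List<String> cast;
private List<String> genres;
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public long getYear() {
return year;
}
public void setYear(long year) {
this.year = year;
}
public List<String> getCast() {
return cast;
}
public void setCast(List<String> cast) {
this.cast = cast;
}
public List<String> getGenres() {
return genres;
}
public void setGenres(List<String> genres) {
this.genres = genres;
}
}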
- The Movie class has four fields:
- Title – the movie name
- Year – the year in which the movie was released
- Cast – the actors in the movie
- Genres – the genres of the movie, such as Action, Comedy and Thriller
- The movies.json file is a public dataset obtained from GitHub (prust/wikipedia-movie-data).
We will create a Spring Boot application capable of running the Spring Batch job. Our job reads all the movies and outputs a CSV file containing each movie and its corresponding genres.
Application.java
package com.jcg.springBatch;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
}
- This is a typical Spring Boot application setup: the @SpringBootApplication annotation enables auto-configuration and component scanning.
- Spring Boot takes an opinionated view of the Spring platform and third-party libraries, so most Spring Boot applications need very little Spring configuration, which reduces development time.
In the sections below, we will see the various steps involved in configuring the batch job. We break the Java class BatchConfiguration into several snippets for easier understanding.
BatchConfiguration.java
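import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
// Builder factories registered by @EnableBatchProcessing
@Autowired
JobBuilderFactory jobBuilderFactory;
@Autowired
StepBuilderFactory stepBuilderFactory;
}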
- The class is annotated with @Configuration so that Spring processes it as a source of bean definitions. Previously these were XML files, but Spring Boot favors Java configuration.
- The @EnableBatchProcessing annotation enables Spring Batch and registers its base infrastructure beans, such as the JobRepository, JobLauncher, JobBuilderFactory and StepBuilderFactory.
- We inject two builders:
- JobBuilderFactory – used to build the movie job. In Spring Batch, a Job is the top-level abstraction and represents the business functionality to be achieved.
- StepBuilderFactory – used to build the steps involved in the job. A job can contain multiple steps, each fulfilling a particular task. Our simple job has only one step.
A step is where all the action happens. As mentioned at the start of this section, a chunk-oriented step combines an ItemReader, an optional ItemProcessor and an ItemWriter. Spring Batch provides out-of-the-box readers and writers for various file formats. Since our dataset is JSON, we will use the JsonItemReader below.
ItemReader
@Bean
public JsonItemReader<Movie> jsonItemReader() throws MalformedURLException {
return new JsonItemReaderBuilder<Movie>()
// Jackson maps each JSON object in the array to a Movie
.jsonObjectReader(new JacksonJsonObjectReader<>(Movie.class))
// Read the dataset directly from its URL
.resource(new UrlResource(
"https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json"))
.name("movieJsonItemReader")
.build();
}
- Spring Batch provides builders, so we supply the pieces of input required to build the entire reader.
- We load the JSON data from a URL by specifying a UrlResource as input.
- We specify Movie as the type to which each JSON object is mapped.
- Finally, we give the reader a name. The reader saves its state (such as the number of items read) in the ExecutionContext under keys derived from this name, so the builder requires a name unless state saving is disabled.
Once the reader reads the data, it is available to the further components in the step. Our step has a custom processor that processes the data coming from the reader.
ItemProcessor
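@Bean
public ItemProcessor<Movie, MovieGenre> movieListItemProcessor() {
// Convert each Movie into a MovieGenre holding the title and the genres as a String
return movie -> new MovieGenre(movie.getTitle(), movie.getGenres().toString());
}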
- The processor is written as an inline lambda.
- It takes in each movie and converts it to another entity, MovieGenre, which has two fields:
- Title – the movie name
- Genre – the genres as a single String (the List's toString() form, for example [Short, Documentary]) instead of a List
The MovieGenre class is listed below and is self-explanatory.
MovieGenre.java
package com.jcg.springBatch.entity;
public class MovieGenre {
private String title;
private String genre;
public MovieGenre(String title, String genre) {
this.title = title;
this.genre = genre;
}
public String getTitle() {
return title;
}
public String getGenre() {
return genre;
}
}
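Because the genre column holds the List's toString() value, a movie with several genres (for example [Short, Documentary]) puts extra commas into a comma-delimited line, so a CSV reader without quote handling sees three columns instead of two. The plain-Java sketch below illustrates this; the pipe separator and the helper names (asPipeSeparated, naiveColumnCount) are assumptions for illustration, not part of the original project.

```java
import java.util.Arrays;
import java.util.List;

final class GenreFormatting {

    // What the article's processor stores: List.toString() wraps the values in
    // brackets and separates them with ", ", so the result contains commas.
    static String asListString(List<String> genres) {
        return genres.toString();
    }

    // Hypothetical alternative: join with "|", which is not the CSV delimiter.
    static String asPipeSeparated(List<String> genres) {
        return String.join("|", genres);
    }

    // Counts comma-separated columns the way a naive CSV reader would (no quote handling).
    static int naiveColumnCount(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        List<String> genres = Arrays.asList("Short", "Documentary");
        String title = "Capture of Boer Battery by British";
        System.out.println(naiveColumnCount(title + "," + asListString(genres)));
        System.out.println(naiveColumnCount(title + "," + asPipeSeparated(genres)));
    }
}
```

In the processor, the equivalent change would be new MovieGenre(movie.getTitle(), String.join("|", movie.getGenres())). This is a suggested variation; the sample output in this article was produced with toString().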
Now we come to the final component in the step – ItemWriter.
ItemWriter
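@Bean
public FlatFileItemWriter<MovieGenre> movieGenreWriter() {
return new FlatFileItemWriterBuilder<MovieGenre>()
.name("movieGenreWriter")
// Output file, relative to the working directory
.resource(new FileSystemResource("out/movies.csv"))
.delimited()
.delimiter(",")
// Properties read from each MovieGenre, in column order
.names(new String[]{"title", "genre"})
.build();
}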
- We use FlatFileItemWriter to write the output to the CSV file specified as the resource.
- We specify the delimiter used to separate fields within a line; it can be a space or any other character. Since the output is CSV, a comma is specified as the delimiter.
- The names argument lists the MovieGenre properties written to each line, in column order.
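Conceptually, the delimited writer reads each property listed in names from the item through its getter and joins the values with the delimiter. The plain-Java sketch below imitates that behavior for illustration only; it is not Spring Batch's implementation (the builder wires a BeanWrapperFieldExtractor and a DelimitedLineAggregator), and the Row class is a hypothetical stand-in for MovieGenre.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

final class DelimitedLineSketch {

    // Hypothetical stand-in for MovieGenre: only its getters matter here.
    static final class Row {
        private final String title;
        private final String genre;

        Row(String title, String genre) {
            this.title = title;
            this.genre = genre;
        }

        public String getTitle() { return title; }
        public String getGenre() { return genre; }
    }

    // Reads each named property through its JavaBean getter, in the order given,
    // and joins the values with the delimiter. No quoting or escaping is applied.
    static String aggregate(Object item, String[] names, String delimiter) {
        StringJoiner line = new StringJoiner(delimiter);
        for (String name : names) {
            String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method method = item.getClass().getMethod(getter);
                line.add(String.valueOf(method.invoke(item)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("No readable property: " + name, e);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Row row = new Row("Caught", "[]");
        System.out.println(aggregate(row, new String[]{"title", "genre"}, ","));
        System.out.println(aggregate(row, new String[]{"genre", "title"}, ","));
    }
}
```

Changing the order of names changes the column order, which is why the array is written in the same order as the desired CSV columns.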
All of these components are bean definitions in the configuration class. A Step definition glues them together.
MovieStep
@Bean
public Step movieStep() throws MalformedURLException {
return stepBuilderFactory
.get("movieStep")
.<Movie, MovieGenre>chunk(10)
.reader(jsonItemReader())
.processor(movieListItemProcessor())
.writer(movieGenreWriter())
.build();
}
- Spring Batch processes records (items) in chunks. A chunk size of 10 means the reader is called once per item until 10 items have been collected.
- The input (reader) and output (writer) types are specified explicitly in the step as <Movie, MovieGenre>.
- The collected items are passed to the processor one by one, and the processed output is sent to the writer as a single chunk of up to 10 items, committed in one transaction.
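The chunk-oriented loop described above can be sketched in plain Java. This is a simplified model for illustration, not Spring Batch's implementation: it ignores transactions, retries and skips, and the run method and its parameters are hypothetical. An Iterator plays the role of the ItemReader, a Function the ItemProcessor and a Consumer the ItemWriter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

final class ChunkLoopSketch {

    // Reads items one at a time until chunkSize items are collected (or the input
    // runs out), passes each through the processor, then hands the processed chunk
    // to the writer in a single call. Returns the number of chunks written.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        while (reader.hasNext()) {
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            List<O> processed = new ArrayList<>();
            for (I item : items) {
                O result = processor.apply(item);
                // Like Spring Batch, treat a null result as "filter this item out"
                if (result != null) {
                    processed.add(result);
                }
            }
            writer.accept(processed); // Spring Batch commits one transaction per chunk
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            input.add(i);
        }
        int chunks = run(input.iterator(), n -> "movie-" + n,
                chunk -> System.out.println("writing " + chunk.size() + " items"), 10);
        System.out.println(chunks + " chunks written");
    }
}
```

With 25 items and a chunk size of 10, the writer is called three times, with 10, 10 and 5 items.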
The final component is the MovieJob which is explained below
MovieJob
@Bean
public Job movieJob(Step movieStep) {
return jobBuilderFactory.get("movieJob")
.incrementer(new RunIdIncrementer())
.flow(movieStep)
.end()
.build();
}
- A Spring Batch job can run multiple times. To differentiate each run, Spring Batch provides a RunIdIncrementer, which increments the run.id job parameter every time the job is launched.
- flow(movieStep) starts the job's flow with movieStep, and end() closes it. Other steps and conditional transitions can be added to the flow as well.
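RunIdIncrementer works with Spring Batch's JobParameters; the sketch below models only its increment rule with a plain Map, so it illustrates the idea rather than reproducing the library's code. The class name is hypothetical. Because each launch gets a new run.id, Spring Batch treats it as a new JobInstance, which is what allows the same job to be launched again after a previous run has completed.

```java
import java.util.HashMap;
import java.util.Map;

final class RunIdSketch {

    static final String KEY = "run.id";

    // Mirrors the idea behind RunIdIncrementer: read the previous run.id
    // (0 when absent), add one, and return new parameters with that value.
    // Other parameters are copied unchanged; the input map is not modified.
    static Map<String, Long> next(Map<String, Long> previous) {
        Map<String, Long> params = new HashMap<>();
        if (previous != null) {
            params.putAll(previous);
        }
        long id = params.getOrDefault(KEY, 0L) + 1;
        params.put(KEY, id);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Long> first = next(null);
        Map<String, Long> second = next(first);
        System.out.println(first + " " + second);
    }
}
```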
Now, to execute the job, run the Application class. A CSV file similar to the one below is generated at out/movies.csv.
movies.csv
After Dark in Central Park,[]
Boarding School Girls' Pajama Parade,[]
Buffalo Bill's Wild West Parad,[]
Caught,[]
Clowns Spinning Hats,[]
Capture of Boer Battery by British,[Short, Documentary]
The Enchanted Drawing,[]
Feeding Sea Lions,[]
....
However, the file has no header describing its columns. To add column headings, FlatFileItemWriterBuilder accepts a header callback, for example .headerCallback(writer -> writer.write("Movie Title,Movie Genres")). The header is written before any of the records.
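The plain-Java sketch below shows the ordering a header callback gives you: the header is written once when output starts, followed by one line per record. HeaderCallback is a hypothetical local interface shaped like Spring Batch's FlatFileHeaderCallback (a single writeHeader(Writer) method), and the sketch uses "\n" as the line separator for simplicity, whereas FlatFileItemWriter uses the platform line separator by default.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

final class HeaderCallbackSketch {

    // Hypothetical interface shaped like FlatFileHeaderCallback.
    interface HeaderCallback {
        void writeHeader(Writer writer) throws IOException;
    }

    // Writes the header once (if a callback is given), then one line per record.
    static String write(HeaderCallback header, List<String> lines) {
        StringWriter out = new StringWriter();
        try {
            if (header != null) {
                header.writeHeader(out);
                out.write("\n");
            }
            for (String line : lines) {
                out.write(line);
                out.write("\n");
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(write(w -> w.write("Movie Title,Movie Genres"),
                Arrays.asList("Caught,[]", "Feeding Sea Lions,[]")));
    }
}
```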
2.1 Listener
In the previous section, we saw the batch processing capability of Spring Batch. However, after the job completes, we do not get any statistics about the job or the step. Spring Batch provides listener interfaces for hooking into the lifecycle of the job. We will look at an example of a StepExecutionListener, which is executed before and after the step.
Listener
@Bean
public StepExecutionListener movieStepListener() {
return new StepExecutionListener() {
@Override
public void beforeStep(StepExecution stepExecution) {
// Remember the start time in the step's ExecutionContext
stepExecution.getExecutionContext().put("start",
System.currentTimeMillis());
System.out.println("Step name:" + stepExecution.getStepName()
+ " Started");
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
long elapsed = System.currentTimeMillis()
- stepExecution.getExecutionContext().getLong("start");
System.out.println("Step name:" + stepExecution.getStepName()
+ " Ended. Running time is "+ elapsed +" milliseconds.");
System.out.println("Read Count:" + stepExecution.getReadCount() +
" Write Count:" + stepExecution.getWriteCount());
// Return the step's current exit status unchanged
return stepExecution.getExitStatus();
}
};
}
- In the beforeStep method, we obtain the step name and log it to the console.
- We store the start time in the step's ExecutionContext, which is similar to a map with String keys and object values.
- In the afterStep method, we log the running time using the start time stored in the ExecutionContext.
- We also log the read and write record counts for the step, which was the original intention of adding the listener.
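The listener's timing logic can be tried outside Spring Batch. The sketch keeps the same before/after structure but uses a plain Map in place of ExecutionContext and, as a design choice that differs from the listener snippet, System.nanoTime(), which is monotonic and therefore better suited to measuring durations than wall-clock time. The class and key names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class StepTimingSketch {

    // A plain Map stands in for the step's ExecutionContext (simplification).
    private final Map<String, Object> context = new HashMap<>();

    void beforeStep() {
        // nanoTime is monotonic, so the difference is unaffected by clock adjustments
        context.put("startNanos", System.nanoTime());
    }

    long afterStepMillis() {
        Object start = context.get("startNanos");
        if (!(start instanceof Long)) {
            throw new IllegalStateException("beforeStep was not called");
        }
        long elapsedNanos = System.nanoTime() - (Long) start;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        StepTimingSketch timing = new StepTimingSketch();
        timing.beforeStep();
        Thread.sleep(50);
        System.out.println("Running time is " + timing.afterStepMillis() + " milliseconds.");
    }
}
```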
We have defined the listener but have not yet associated it with the step. The following snippet registers the listener on movieStep.
Listener to Step
@Bean
public Step movieStep() throws MalformedURLException {
return stepBuilderFactory
.get("movieStep")
// Attach the step listener defined above
.listener(movieStepListener())
.<Movie, MovieGenre>chunk(10)
.reader(jsonItemReader())
.processor(movieListItemProcessor())
.writer(movieGenreWriter())
.build();
}
This is just one listener; Spring Batch provides others as well. For example, a JobExecutionListener executes before and after the job and has access to the job's own ExecutionContext for storing job-related information. Running the job produces the following output.
Logs
2019-08-31 15:11:06.163 INFO 24381 --- [ main] o.s.b.a.b.JobLauncherCommandLineRunner : Running default command line with: []
2019-08-31 15:11:06.214 INFO 24381 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=movieJob]] launched with the following parameters: [{run.id=1}]
2019-08-31 15:11:06.226 INFO 24381 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [movieStep]
Step name:movieStep Started
Step name:movieStep Ended. Running time is 3340 milliseconds.
Read Count:28795 Write Count:28795
2019-08-31 15:11:09.572 INFO 24381 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=movieJob]] completed with the following parameters: [{run.id=1}] and the following status: [COMPLETED]
2019-08-31 15:11:09.575 INFO 24381 --- [ Thread-5] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2019-08-31 15:11:09.577 INFO 24381 --- [ Thread-5] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
3. Tasklet
In this section, we will see another form of Spring Batch step: the Tasklet step. It comes in handy when the work does not fit the reader, processor and writer pattern. It is still a single step managed by Spring Batch, so its execution is recorded in the job repository and runs inside a transaction like any other step.
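A tasklet returns a RepeatStatus: FINISHED ends the step, while CONTINUABLE asks the framework to invoke the tasklet again. The plain-Java sketch below models that contract; the local Status enum is a stand-in for RepeatStatus and the run method is hypothetical, not Spring Batch's TaskletStep code.

```java
import java.util.function.Supplier;

final class TaskletLoopSketch {

    // Local stand-in for Spring Batch's RepeatStatus.
    enum Status { CONTINUABLE, FINISHED }

    // Calls the tasklet body until it reports FINISHED and returns
    // how many times it was invoked.
    static int run(Supplier<Status> tasklet) {
        int invocations = 0;
        Status status;
        do {
            status = tasklet.get();
            invocations++;
        } while (status == Status.CONTINUABLE);
        return invocations;
    }

    public static void main(String[] args) {
        int[] remaining = {3};
        int runs = run(() -> --remaining[0] > 0 ? Status.CONTINUABLE : Status.FINISHED);
        System.out.println("tasklet ran " + runs + " times");
    }
}
```

The listStep below returns FINISHED on its first call, so its body runs exactly once.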
ListStep
@Bean
public Step listStep() {
return stepBuilderFactory.get("listStep")
.tasklet((stepContribution, chunkContext) -> {
Resource directory = new FileSystemResource("out");
System.out.println(directory.getFile()
+ " directory is available");
// listFiles() returns null if the path is not a readable directory
File[] files = directory.getFile().listFiles();
if (files != null) {
for (File file : files) {
System.out.println(file.getName()
+ " is available");
}
}
// FINISHED tells Spring Batch not to call this tasklet again
return RepeatStatus.FINISHED;
}).build();
}
- A simple TaskletStep named listStep is created.
- The tasklet lambda receives two parameters, StepContribution and ChunkContext:
- StepContribution collects the changes, such as read and write counts, that this execution contributes to the current StepExecution.
- ChunkContext provides context for the chunk being processed and, through getStepContext(), access to the step's execution context and job parameters.
- The current step looks at the output directory and lists all the files inside the directory.
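File.listFiles() returns null rather than an empty array when the path does not exist or is not a directory, so looping over its result directly can throw a NullPointerException if the out directory was never created. The plain-Java helper below handles that case; the class name is hypothetical, and the sorting is added only to make the output deterministic.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DirectoryListingSketch {

    // Returns the sorted file names in dir, or an empty list when dir is
    // missing or not a directory (File.listFiles() returns null in that case).
    static List<String> list(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (File file : files) {
            names.add(file.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        for (String name : list(new File("out"))) {
            System.out.println(name + " is available");
        }
    }
}
```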
Job Definition
@Bean
public Job movieJob(Step movieStep, Step listStep) {
return jobBuilderFactory.get("movieJob")
.incrementer(new RunIdIncrementer())
.flow(movieStep)
.next(listStep)
.end()
.build();
}
In the snippet above, we wire listStep into movieJob after movieStep to chain the steps. listStep runs once movieStep completes successfully and verifies that the output CSV file was created in the out directory.
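With flow(...).next(...), the next step runs only when the current one completes successfully; other outcomes end the flow as a failure. The plain-Java sketch below models that rule with a list of hypothetical named steps; Status, NamedStep and run are illustrative assumptions, not Spring Batch types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class SequentialFlowSketch {

    enum Status { COMPLETED, FAILED }

    // Hypothetical named step: a label plus a body returning its outcome.
    static final class NamedStep {
        final String name;
        final Supplier<Status> body;

        NamedStep(String name, Supplier<Status> body) {
            this.name = name;
            this.body = body;
        }
    }

    // Runs the steps in order and moves on only after COMPLETED.
    // Returns the names of the steps that actually ran.
    static List<String> run(List<NamedStep> steps) {
        List<String> executed = new ArrayList<>();
        for (NamedStep step : steps) {
            executed.add(step.name);
            if (step.body.get() != Status.COMPLETED) {
                break; // later steps are skipped
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        List<NamedStep> job = Arrays.asList(
                new NamedStep("movieStep", () -> Status.COMPLETED),
                new NamedStep("listStep", () -> Status.COMPLETED));
        System.out.println(run(job));
    }
}
```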
Logs
...
2019-08-31 15:12:07.472 INFO 24390 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [listStep]
out directory is available
movies.csv is available
2019-08-31 15:12:07.473 INFO 24390 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=movieJob]] completed with the following parameters: [{run.id=1}] and the following status: [COMPLETED]
2019-08-31 15:12:07.476 INFO 24390 --- [ Thread-5] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2019-08-31 15:12:07.478 INFO 24390 --- [ Thread-5] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
4. Download the Source Code
You can download the full source code of this example here: Spring Batch Introduction Example


