Partial commit and job restart with Spring Batch

If a batch fails after partial commit, it must be possible to start again with the processing of the file by skipping the lines already committed

A classic batch job processes a file: records are read, the data of each record is processed, and the result is persisted to a database (reader, processor and writer).

If the file is large and contains thousands of records, partial commits are needed during processing. For example, we can decide to commit to the database every 1000 records.

With Spring Batch it is very easy to get partial commits: it's a simple parameter that is passed to the StepBuilder. If more complex partial commit policies are needed, it is possible to implement custom completion policies (which we will not cover here because they are not the subject of this article).

Finally, if a batch fails after a partial commit, it must be possible to restart the processing of the file, skipping the lines already committed. Spring Batch supports this too, but it is not a simple configuration parameter: we must add a few lines of code. Here too it is possible to implement custom skip and retry policies (which, again, we will not cover because they are not the subject of this article).

The environment is as follows:

  • Java 7
  • Spring Boot 1.1.8
  • Spring Batch 3.0

There are two concepts: the job instance and the job execution. A job instance is carried out through n executions (typically one, or more if there have been failures). Furthermore, only a failed job can be restarted.

To implement the restart we need the jobRegistry, jobOperator, jobExplorer and jobLauncher. Here is the complete code of the batch configuration.

Basically these are the steps:

  • register the job in the jobRegistry
  • get job instances through the jobOperator
  • given the last instance, get executions through the jobOperator
  • through the jobExplorer check if the last execution has failed
  • in case the last execution has failed, the job must be restarted via the jobOperator
  • in case the last execution was successful, launch a new job instance via the jobLauncher

Spring Batch takes care of managing the restart starting from the first uncommitted record.
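To make the idea of "restart from the first uncommitted record" concrete, here is a minimal plain-Java model of it. It is only a sketch and does not use the Spring Batch API: the class name, the in-memory `committed` list (a stand-in for the database) and the `failOn` parameter (a simulated failure) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of partial commits and restart; it does not use the Spring Batch API. */
class ChunkRestartSketch {

    /** Stand-in for the database: only committed records survive a failure. */
    final List<String> committed = new ArrayList<String>();

    /**
     * Processes the lines in chunks of commitInterval records.
     * Assumes one committed record per input line, so the restart point
     * is simply the number of records already committed.
     */
    void run(List<String> lines, int commitInterval, String failOn) {
        int start = committed.size();                 // first uncommitted record
        List<String> chunk = new ArrayList<String>();
        for (int i = start; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.equals(failOn)) {
                // Simulated failure: the current chunk is lost, earlier chunks stay committed.
                throw new IllegalStateException("failed at record " + i);
            }
            chunk.add(line.toUpperCase());            // the "processor" step
            if (chunk.size() == commitInterval) {
                committed.addAll(chunk);              // the partial commit
                chunk.clear();
            }
        }
        committed.addAll(chunk);                      // commit the last, possibly shorter, chunk
    }

    public static void main(String[] args) {
        ChunkRestartSketch job = new ChunkRestartSketch();
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        try {
            job.run(lines, 2, "d");                   // first execution fails on "d"
        } catch (IllegalStateException e) {
            System.out.println("first run failed, committed so far: " + job.committed);
        }
        job.run(lines, 2, null);                      // restart: skips the committed records
        System.out.println("after restart: " + job.committed);
    }
}
```

In this model the first run commits only the first chunk (two records), and the second run picks up from the third record. In the real batch this bookkeeping is done by Spring Batch itself.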

Thanks Spring Batch 🙂

Performance tuning for File Reading with Java 8 and Parallel Streams

Is it always convenient to use parallel streams?

With the Stream API, Java 8 introduced a very simple way to use all the resources made available by the hardware, in particular the cores of multicore architectures, and to do so in a declarative rather than imperative programming style.

For example, suppose you have to write a batch that looks for a string in all the files that have a certain prefix in a certain directory. The batch must log the files in which the string is found.

It is interesting to compare the two approaches: pre-Java 8, and Java 8 with streams.

I used my dev computer, a Samsung laptop with these features:

  • Intel Core i7-4500U (2 Cores, 4 Threads) 1.8 GHz
  • 8 GB RAM
  • 256 GB SSD
  • Windows 8.1 Pro 64 Bit
  • Java 1.8.0_101

With 4 hardware threads, I expected the parallel stream solution to be up to 4 times faster than the classic pre-Java 8 version.

The code in pre-Java 8 style is as follows: StringMatchingOld

The code that uses the streams is as follows: StringMatching

The code that uses the parallel streams is as follows: ParallelStringMatching
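The three classes named above are not reproduced here. To give a rough idea of the two stream-based variants, here is a minimal sketch of the search, assuming a hypothetical `search` method with a flag that switches between a sequential and a parallel stream over the candidate files. It is not the original StringMatching or ParallelStringMatching code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch only: names and structure are assumptions, not the code measured in the article. */
class StringMatchingSketch {

    /** Returns the sorted names of the files in dir that start with prefix and contain needle. */
    static List<String> search(Path dir, String prefix, String needle, boolean parallel)
            throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith(prefix))
                    .collect(Collectors.toList());
        }
        // The only difference between the two variants is the kind of stream.
        Stream<Path> stream = parallel ? candidates.parallelStream() : candidates.stream();
        return stream
                .filter(p -> contains(p, needle))
                .map(p -> p.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
    }

    /** Reads the file lazily, line by line, and stops at the first match. */
    static boolean contains(Path file, String needle) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.anyMatch(line -> line.contains(needle));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here the parallelism is per file, so with only a handful of files there is not much work to split across threads, and the work itself is mostly disk reads. That may be part of the explanation for the results below, but it is only a guess.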

These are the results (4 files read):

[Figure: File Reading Performance]

What is going on? Strangely, there is no benefit in using streams (nor, I think, from the point of view of the code: I don’t find it more readable, but that is another story… I’m getting older…). Furthermore, we see that using parallel streams only makes things worse.

Conclusions

It seems that file reading gains nothing from the use of streams. However, the fact remains that declarative programming makes it possible to abstract from the current hardware, so maybe on a server with better hardware than my development PC, parallel streams perform better.

Or maybe I’m doing something wrong… I look forward to finding out.

Data Masking with JPA and Spring Security

The protection of sensitive data is an increasingly popular topic in IT applications

In our case, a customer asked us to add a data masking solution to an existing web application. The solution had to be dynamic and based on security profiles.

The application is developed in Java, with Spring MVC for the management of the Model View Controller, JPA for data access and Spring Security for the management of security profiles.

There are two approaches in the literature: SDM (Static Data Masking) and DDM (Dynamic Data Masking).

SDM

SDM involves cloning the current database while masking the sensitive data. Dedicated inquiry applications that must show masked data then read from the cloned database.

Advantages:

  • good data access performance at runtime

Disadvantages:

  • the data read may be stale (the clone is updated via batch and, depending on the mode, the update can take from minutes to hours)
  • not ideal for a role-based / field-based security scenario

DDM

DDM masks data at runtime, when it is read.

Advantages:

  • live data is read
  • ideal for a role-based / field-based security scenario

Disadvantages:

  • performance overhead on reads and writes
  • unmask algorithms may be needed to avoid data corruption (to prevent the masked data from being persisted to the DB)

Given the customer’s requests, the DDM technique is the one that best suits a dynamic scenario based on security profiles.

At this point another choice had to be made because for DDM there are two approaches:

JPA Rewriting

The literature calls this SQL Rewriting; in our specific case it is JPA Rewriting, JPA being our data access layer. The data is masked in a method of a JPA Entity Listener annotated with PostLoad or PostUpdate, that is, in the persistence layer.

Advantages:

  • precise masking of the data as it is loaded from the DB
  • easy data-masking mapping

Disadvantages:

  • masking depends on the data type (for example a string can be masked with ‘***’ or ‘###’, a number with ‘000’ or ‘999’, a date with ‘99/99/9999’, etc.)
  • the Look & Feel of masked data is hard to control when rendering the view (each view would have to declare the masking, which brings us back to View Rewriting below)
  • unmask algorithms are needed that use the user session to store the unmasked data. JPA shares the objects loaded from the DB, so an object loaded by an inquiry function may later be used by an update function. In that case the masked data would be persisted to the DB, which means data corruption
  • making the masking depend on the function is complex (the user session is needed for the function-masking mapping)
  • complex use of the user session (see above for unmask and function-masking mapping)

View Rewriting

The data is masked in the presentation layer, typically in JSP pages.

Advantages:

  • homogeneous masking (does not depend on the type of data, everything can be masked for example with ‘***’)
  • no unmask phase is required
  • easy rendering for the Look & Feel (each view declares whether or not it wants masking)
  • easy to make masking depend on the function (each function declares whether or not it wants masking)

Disadvantages:

  • masking is not centralized (every view must mask; the tags reused by the views simplify this, but not completely)
  • difficult data-masking mapping (each view must declare the data)

We chose View Rewriting because the estimated effort (the analysis is omitted here as not relevant) was more or less the same for the two approaches, while the risks of data corruption and of out-of-memory errors caused by the user session are absent. Moreover, the View Rewriting solution is much more customizable as far as the Look & Feel is concerned.

To implement the solution we need the following:

  • a generic editor to enable or disable masking for a field
  • a masking class that performs data masking based on security profiles
  • changes to all existing views so that they use the masking class above

Let’s look at each in detail.

Role-based security mapping

We use a role-based security mapping based on Spring Security (already present in the application). For each piece of data that you want to mask, a role is created with the following name:

ROLE_MASK_DOMAIN-NAME_FIELD-NAME

For example, to mask the tax code field of the people table, since the field is mapped via JPA to Person.taxCode, the role will be

ROLE_MASK_PERSON_TAXCODE

The mapping is edited dynamically with a dedicated GUI function. We used the existing Domain Editor function, a generic domain editor that allows all the fields mapped to the database to be modified, for all domain classes.
We have added a new editing form for managing the data-masking mapping.
The form contains all the fields of the chosen domain class. For each field you can choose (with a dedicated checkbox) whether or not to enable masking. When saving, the function performs the following step:

  • look in the Authorities table to see whether the role ROLE_MASK_DOMAIN-NAME_FIELD-NAME exists. If it does not exist, it is created (conversely, it is removed if masking for the field must be disabled)

The mapping of roles to profiles (Spring Security groups) is handled by the existing Spring Security functions, implemented in the appropriate view of the application.

Masking class

We create a class that receives as input the data to be masked and its name (for example, Person.taxCode).
The class checks (with the methods that Spring Security provides) whether the current user’s profile is associated with the role for that field (ROLE_MASK_PERSON_TAXCODE / Person.taxCode). If it is, the class masks the data and returns it to the view.
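As a rough illustration, a masking class along these lines could look like the following sketch. The class and method names are assumptions, and the set of role names passed in is a stand-in for the authorities of the current user, which in the real application would come from Spring Security (for example from the current Authentication's getAuthorities()).

```java
import java.util.Locale;
import java.util.Set;

/** Sketch of a masking class; names are hypothetical, not the application's code. */
class FieldMaskerSketch {

    /** Homogeneous mask, independent of the data type. */
    static final String MASK = "***";

    /** Builds the role name for a field, e.g. Person.taxCode gives ROLE_MASK_PERSON_TAXCODE. */
    static String roleFor(String fieldName) {
        return "ROLE_MASK_" + fieldName.replace('.', '_').toUpperCase(Locale.ROOT);
    }

    /**
     * Returns the mask if the user's authorities contain the role for the field,
     * otherwise returns the value unchanged.
     */
    static String mask(String value, String fieldName, Set<String> userAuthorities) {
        return userAuthorities.contains(roleFor(fieldName)) ? MASK : value;
    }
}
```

A tag would then call something like `FieldMaskerSketch.mask(value, "Person.taxCode", authorities)` before rendering the value.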

Change Views

The functions that need to mask data are typically the inquiry ones. In our case it helps that we had adopted tags in the presentation layer, so all the detail pages and lists use a display.tagx tag and a table.tagx tag. We need to change these two tags to make them use the masking class.
The longest part of the work is modifying all the JSPs that use the two tags, which must declare the name of the field they are displaying.

Finally, we have also modified the search filters so that a filter on a field that must be masked is disabled.
For example, if the filter offers a search by tax code, the filter must use the masking class to know at runtime whether the profile requires this data to be masked.
If so, the filter is disabled.

Conclusions

View Rewriting with role based security is the best solution for the following reasons:

  • effort slightly greater than the JPA Rewriting solution, but more or less similar
  • use of Spring Security to map the data to be masked to the profile
  • greater customization in terms of Look & Feel
  • no risk of data corruption
  • no risk of out-of-memory errors caused by the user session

IBAN, iban4j and CIN calculation

The International Bank Account Number (IBAN) is used to uniquely identify bank details internationally.

The code is structured as follows:

  • 2 capital letters representing the country (IT for Italy)
  • 2 control digits
  • the national BBAN code
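To show how the two control digits relate to the rest of the code, here is a minimal plain-Java sketch of the standard mod-97 check defined for IBANs (ISO 13616): the first four characters are moved to the end, letters are replaced by numbers (A=10 … Z=35), and the resulting number must give remainder 1 when divided by 97. The class and method names are assumptions made for this example.

```java
import java.math.BigInteger;

/** Sketch of the ISO 13616 mod-97 check; class and method names are hypothetical. */
class IbanMod97Sketch {

    static boolean hasValidCheckDigits(String iban) {
        String s = iban.replace(" ", "").toUpperCase();
        if (s.length() < 5) {
            return false;
        }
        // Move country code and control digits to the end.
        String rearranged = s.substring(4) + s.substring(0, 4);
        StringBuilder digits = new StringBuilder();
        for (char c : rearranged.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digits.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                digits.append(c - 'A' + 10);   // A=10, B=11, ... Z=35
            } else {
                return false;                  // unexpected character
            }
        }
        return new BigInteger(digits.toString())
                .mod(BigInteger.valueOf(97))
                .intValue() == 1;
    }
}
```

For example, the commonly quoted sample IBAN IT60X0542811101000000123456 passes this check, while the same code with control digits 61 does not.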

For Italy, the BBAN code (Basic Bank Account Number) is composed of:

  • CIN (1 uppercase letter)
  • ABI (5 digits)
  • CAB (5 digits)
  • Account number (12 alphanumeric characters, left-padded with zeros if it has fewer than 12 characters)

The CIN (Control Internal Number) code consists of a single letter and is used as a control character. It’s calculated based on the ABI and CAB codes and account number.

Both the two control digits and the CIN can be calculated to verify that the IBAN entered in a form by a user is valid and compliant. To do this in Java there is the iban4j library.

The problem with this library is that it calculates the two control digits but not the CIN. On the net I didn’t find any Java library that calculates the CIN. I only found an example written in Visual Basic, which I ported to Java. The class is called CINUtil and it can be downloaded from PasteBin.
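For readers who cannot get hold of that class, the following is a minimal sketch of the CIN calculation as it is commonly described for the Italian BBAN. It is not the CINUtil class mentioned above, and its name and signature are assumptions. Each of the 22 characters (ABI + CAB + account number) is turned into a number (digits 0 to 9, letters A=0 to Z=25); characters in odd positions are translated through a fixed table, those in even positions count as they are; the sum modulo 26 gives the letter.

```java
/** Sketch of the Italian CIN calculation; not the CINUtil class from the article. */
class CinSketch {

    // Translation table for characters in odd positions (1st, 3rd, ...).
    private static final int[] ODD = {
        1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
        20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23
    };

    static char computeCin(String abi, String cab, String account) {
        StringBuilder padded = new StringBuilder(account.toUpperCase());
        while (padded.length() < 12) {
            padded.insert(0, '0');                 // left-pad the account number with zeros
        }
        String bban = abi + cab + padded;
        if (bban.length() != 22) {
            throw new IllegalArgumentException("expected 5 + 5 + 12 characters");
        }
        int sum = 0;
        for (int i = 0; i < bban.length(); i++) {
            char c = bban.charAt(i);
            int value;
            if (c >= '0' && c <= '9') {
                value = c - '0';
            } else if (c >= 'A' && c <= 'Z') {
                value = c - 'A';
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
            // Index 0 is the 1st character, so even indexes are odd positions.
            sum += (i % 2 == 0) ? ODD[value] : value;
        }
        return (char) ('A' + sum % 26);
    }
}
```

With ABI 05428, CAB 11101 and account number 000000123456 the sum is 75, and 75 mod 26 = 23, which is the letter X, as in the sample IBAN IT60X0542811101000000123456.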

A method for checking the IBAN entered in a form by a user could be the following:


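import org.iban4j.CountryCode;

public static boolean checkIban(String ibanCode) {
    // Italian IBAN layout: country (2) + control digits (2) + CIN (1) + ABI (5) + CAB (5) + account (12)
    String countryCode = ibanCode.substring(0, 2);
    String abi = ibanCode.substring(5, 10);
    String cab = ibanCode.substring(10, 15);
    String conto = ibanCode.substring(15);

    // Rebuild the IBAN from its parts, recomputing the CIN and the control digits
    org.iban4j.Iban ibanToCheck = new org.iban4j.Iban.Builder()
            .countryCode(CountryCode.valueOf(countryCode))
            .bankCode(abi)
            .nationalCheckDigit(CINUtil.computeCin(abi, cab, conto))
            .branchCode(cab)
            .accountNumber(conto)
            .build(true);

    // The input is valid only if it matches the rebuilt IBAN
    return ibanCode.equals(ibanToCheck.toString());
}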

Maven Resources Plugin and Binary Fonts

Copy WAR resources without corrupting binary files

The Maven Resources Plugin copies project resources into the output dir during the creation of the WAR. During this phase the plugin performs the filtering operation, where the variables in the resource files are replaced with the specified values. But this filtering operation must not be applied to binary files, otherwise they can be corrupted. Fonts are a typical example of binary files that can be corrupted by the filtering phase of the Maven Resources Plugin.

To find out whether the deployed WAR contains corrupted fonts, use the developer tools that each browser provides and check for errors in the font decoding phase. Messages such as “Failed to decode downloaded font”, or “OTS parsing error: Failed to convert WOFF 2 font to SFNT”, or “incorrect file size in WOFF header”, or “GDEF: invalid table offset” are all signs that the binary fonts are corrupt.

To avoid these problems, just add the exclusions in the configuration section of the plugin in the pom.xml file. For example:

       
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-resources-plugin</artifactId>
  <version>2.7</version>
  <configuration>
    <delimiters>
      <delimiter>§</delimiter>
    </delimiters>
    <useDefaultDelimiters>false</useDefaultDelimiters>
    <nonFilteredFileExtensions>
      <nonFilteredFileExtension>ico</nonFilteredFileExtension>
      <nonFilteredFileExtension>jpg</nonFilteredFileExtension>
      <nonFilteredFileExtension>png</nonFilteredFileExtension>
      <nonFilteredFileExtension>eot</nonFilteredFileExtension>
      <nonFilteredFileExtension>svg</nonFilteredFileExtension>
      <nonFilteredFileExtension>woff</nonFilteredFileExtension>
      <nonFilteredFileExtension>woff2</nonFilteredFileExtension>
      <nonFilteredFileExtension>ttf</nonFilteredFileExtension>
    </nonFilteredFileExtensions>
  </configuration>
</plugin>

In this case, all files with the extensions declared in nonFilteredFileExtensions are simply copied to the output dir without being filtered.

Spring Boot, Spring Batch and exit codes

When creating batches to be invoked by a scheduler, it is very important to correctly manage the JVM exit codes.

By convention, the JVM ends with an exit code equal to zero if there were no problems, otherwise with an exit code greater than zero.
In this way, if the batch does not terminate correctly, the scheduler can interpret the exit code and, for example, inform the application manager via email, adopt strategies to relaunch or recover the batch, or terminate a job box.

If you use Spring Boot to start a Spring Batch-based batch, the JVM always ends with an exit code of zero, even in the case of runtime exceptions. To manage the JVM exit codes correctly, we need to use an ExitCodeGenerator.

The application stack is composed of:

  • Spring Core 4.0.7
  • Spring Boot 1.1.8
  • Spring Batch 3.0.1

In the class that configures the batch, we need to add the following methods:

@Bean
public JobExecutionExitCodeGenerator jobExecutionExitCodeGenerator() {
    return new JobExecutionExitCodeGenerator();
}

protected JobExecution addToJobExecutionExitCodeGenerator(JobExecution jobExecution) {
    // Hand the finished execution to the generator so that it can derive the exit code
    JobExecutionExitCodeGenerator jobExecutionExitCodeGenerator = jobExecutionExitCodeGenerator();
    jobExecutionExitCodeGenerator.onApplicationEvent(new JobExecutionEvent(jobExecution));
    return jobExecution;
}

As the ExitCodeGenerator we can use Spring Boot's default implementation, JobExecutionExitCodeGenerator. In the addToJobExecutionExitCodeGenerator method we pass the jobExecution to the exit code generator by explicitly creating the JobExecutionEvent. When we launch the job, we must make sure the addToJobExecutionExitCodeGenerator method is called:

addToJobExecutionExitCodeGenerator(jobLauncher.run(job(), jobParameters(jobParametersMap)));

In this way, when we end the batch in the Application class, the exit code will be the one actually returned from the batch:

int exitCode = SpringApplication.exit(SpringApplication.run(batchConfiguration, args));
System.exit(exitCode);

Spring MVC, Spring Boot and Resources Caching

Modern web applications and browser resource caching

Modern web applications use a large number of static resources such as js files, css, fonts, images, etc.
Even though internet connections keep getting faster, it’s still worth asking whether resource caching is useful, that is, allowing the browser to store static resources in its cache. In general it’s best to find the right compromise, since in practice resources can change, for example the application's own js files (those of the application stack change very rarely).

Our web applications are based on Spring Boot and Spring MVC, which by default do not allow the caching of resources.
To enable resource caching, you can change the application config files. For example, if we want the cache to be valid for one week, we have to add the following line in the application.properties file:

spring.resources.cache-period=604800
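The value is expressed in seconds. As a quick check of where 604800 comes from, here is a tiny sketch (the class name is only for illustration) that computes one week in seconds with java.time:

```java
import java.time.Duration;

/** Tiny sketch: derives the cache period used above from "one week". */
class CachePeriodSketch {

    static long oneWeekInSeconds() {
        return Duration.ofDays(7).getSeconds();   // 7 * 24 * 60 * 60
    }

    public static void main(String[] args) {
        System.out.println("spring.resources.cache-period=" + oneWeekInSeconds());
    }
}
```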

If a specific resource handler is used, it must be configured for caching too. For example, for resources loaded with the webjars protocol, we had to configure a resource handler because of compatibility issues with WebSphere. In this case too, just configure the cache period like this:

@Override
public void addResourceHandlers(ResourceHandlerRegistry registry) {
    super.addResourceHandlers(registry);
    registry.addResourceHandler("/webjars/**")
            .addResourceLocations("classpath:/META-INF/resources/webjars/")
            .setCachePeriod(604800)
            .resourceChain(true)
            .addResolver(new WebJarsResourceResolver(new WasWebJarAssetLocator()));
}

To force the cache to refresh before the validity period expires (for example, if you release a new version of the application that requires substantial changes in the js files), two techniques can be used. On the client side, you can force the cache to be refreshed with the classic Shift + F5 from the browser. On the server side, you can use the file versioning technique, adding the version to the file name.
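As a minimal illustration of the file versioning idea, the sketch below builds a versioned file name by inserting the version before the extension. The class, the method and the naming scheme (name-version.ext) are assumptions for this example; any scheme works as long as the name changes when the content changes.

```java
/** Sketch of file versioning by name; the naming scheme is an assumption. */
class VersionedNameSketch {

    /** Inserts the version before the extension, e.g. app.js becomes app-1.2.0.js. */
    static String versioned(String fileName, String version) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            return fileName + "-" + version;      // no extension to preserve
        }
        return fileName.substring(0, dot) + "-" + version + fileName.substring(dot);
    }

    public static void main(String[] args) {
        System.out.println(versioned("app.js", "1.2.0"));
    }
}
```

Since the browser caches by URL, a new file name is treated as a new resource and is downloaded again, regardless of the cache period.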