Transfer Large Files Using a REST API

or how to avoid Out of Memory

Simone Maletta
The Startup

--

INTRO

This week my team and I faced an issue I had first read about back in college and had completely forgotten until this Wednesday in October: transporting a very large file over HTTP.

Requirement and Design

Our customer replaced its CRM with a cloud-based one and engaged us to integrate it with the rest of the software map.

One integration flow posts documents to the CRM from a local storage and associates them with stored customer accounts; file size has no upper bound, and we assumed 1 GB as an average value.

All CRM integrations are REST based: no shared folders, no staging database, only OAuth 1.0-secured REST APIs are allowed.

I sketched for you a simplified architectural model in the picture below.

IMG 1 — Solution Architecture

Our application, like the other parts of the application map, runs in an on-premise environment, while the CRM is hosted on a cloud tenant.

The exposed API accepts a multipart body with two parts: one carrying a JSON document with metadata such as the filename and the customer account id, the other carrying the binary content of the file.
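For reference, such a multipart request looks roughly like the sketch below on the wire. The URL path, boundary string, header values and JSON fields are illustrative, not the CRM's actual contract:

```
POST /crm/documents HTTP/1.1
Content-Type: multipart/form-data; boundary=XYZ

--XYZ
Content-Disposition: form-data; name="customer_name"
Content-Type: application/json

{"filename": "contract.pdf", "account_id": "42"}
--XYZ
Content-Disposition: form-data; name="customer_file"; filename="contract.pdf"
Content-Type: application/octet-stream

...binary content of the file...
--XYZ--
```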

Standard Solution

The application is made of two parts: a file poller, which spawns a thread every time a new file appears in the staging folder, and a component which relates the file to the client account and sends it to the CRM.

If you’re interested in building a file poller, have a look at the Apache Camel Polling Consumer: it’s a great way to do it easily.
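Just to give an idea of what such a poller does, here is a minimal sketch (class and method names are mine, not from our solution): each scan of the staging folder fires a callback once per newly seen file. A real implementation like Camel’s also handles file locks, retries and completion markers.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

public class FilePoller {
    private final Path stagingDir;
    private final Set<Path> seen = new HashSet<>();
    private final Consumer<Path> onNewFile;

    public FilePoller(Path stagingDir, Consumer<Path> onNewFile) {
        this.stagingDir = stagingDir;
        this.onNewFile = onNewFile;
    }

    // One polling pass: list the folder and fire the callback for files
    // we have not seen in a previous pass.
    public void scan() throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(stagingDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f) && seen.add(f)) {
                    onNewFile.accept(f); // e.g. hand off to an upload thread
                }
            }
        }
    }
}
```

In our case the callback is where the file gets related to the customer account and sent to the CRM.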

Here I want to talk about the way we send files to the CRM.

Let’s start coding; here is an extract of our pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-core</artifactId>
</dependency>

Here is our standard solution code:

RestTemplate remoteService = new RestTemplate();

// An HTTP request has two parts: headers and body.
// Here are the headers:
HttpHeaders headers = new HttpHeaders();

// Here is the body:
MultiValueMap<String, Object> bodyMap = new LinkedMultiValueMap<>();
bodyMap.add("customer_file", new FileSystemResource(fileName));
bodyMap.add("customer_name", customerJSON);

HttpEntity<MultiValueMap<String, Object>> request = new HttpEntity<>(bodyMap, headers);
ResponseEntity<String> restResponse = remoteService.exchange(remoteServiceURL, HttpMethod.POST, request, String.class);

The customerJSON variable is a javax.json.JsonObject; this way the multipart request chooses the right content type for that part autonomously, and the same behaviour is expected when using an org.springframework.core.io.FileSystemResource instance.

We ran two tests:

  1. send a small file, to look for malformed requests
  2. send a huge file, to prove our application’s robustness

Test 1 surfaced nothing relevant to this paper: a missing header value, a badly formatted URL and so on.
Test 2 kept us waiting for a couple of minutes, and then every Java developer’s nightmare appeared:

java.lang.OutOfMemoryError: Java heap space

The issue arose not only because we ran the code in a development environment, but mainly because the app tried to load the entire file content into RAM, making it more memory hungry than J. Wellington Wimpy.

This was clear from the application’s memory footprint, sic et simpliciter.

To recap, this wasn’t a great solution, from an architectural point of view either, because:

  • we cannot assume a maximum size for incoming files
  • we cannot work sequentially on files

We needed to improve it.

The chunked solution

What we needed was to teach our code not to load the entire file content into memory, but to use a feature HTTP/1.1 has supported since my years in college: chunked transfer encoding.

This feature tells the server that the incoming request body arrives as a sequence of chunks, and that it has to receive all of them, up to the terminating zero-length chunk, before the request is complete.

The advantage on the client side is that you only load into memory the slice you’re transferring at the moment.

If you would like to know more about how HTTP implements chunked transfer encoding, follow these links: WIKI, W3C.
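To make the wire format concrete, here is a small illustrative helper that frames a byte array the way a chunked body is encoded: each chunk is prefixed by its size in hex and the body ends with a zero-length chunk. This is for understanding only; real clients do this framing for you, so never hand-roll it in production.

```java
import java.nio.charset.StandardCharsets;

public class ChunkedFraming {
    // Frames `body` as an HTTP/1.1 chunked transfer encoding payload:
    // <hex size>\r\n<chunk bytes>\r\n ... 0\r\n\r\n
    public static String frame(byte[] body, int chunkSize) {
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < body.length; off += chunkSize) {
            int len = Math.min(chunkSize, body.length - off);
            out.append(Integer.toHexString(len)).append("\r\n")
               .append(new String(body, off, len, StandardCharsets.ISO_8859_1))
               .append("\r\n");
        }
        out.append("0\r\n\r\n"); // zero-length chunk terminates the body
        return out.toString();
    }
}
```

For example, a 10-byte body sent in 5-byte chunks becomes two framed chunks followed by the terminator, which is why the client never needs more than one chunk in memory at a time.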

We improved our class by configuring the RestTemplate properly:

RestTemplate remoteService = new RestTemplate();
SimpleClientHttpRequestFactory requestFactory = new SimpleClientHttpRequestFactory();
requestFactory.setBufferRequestBody(false);
remoteService.setRequestFactory(requestFactory);
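If you prefer to see the same idea without Spring, here is a plain-JDK sketch (my own, not our production code) using HttpURLConnection: setChunkedStreamingMode makes it send the body with Transfer-Encoding: chunked, so only a small buffer lives in memory regardless of the file size.

```java
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingUpload {
    // POSTs the file with chunked transfer encoding and returns the HTTP status.
    public static int post(URL url, Path file, int chunkSize) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setChunkedStreamingMode(chunkSize); // do NOT buffer the whole body
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        try (OutputStream out = conn.getOutputStream();
             InputStream in = Files.newInputStream(file)) {
            in.transferTo(out); // copies with a small internal buffer
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

Without the setChunkedStreamingMode call, HttpURLConnection buffers the whole body to compute Content-Length, which is exactly the behaviour that caused our OutOfMemoryError.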

We repeated Test 2 and this time it went well.

We observed a memory footprint of less than 300 MB for the full transfer of a 1.5 GB file. A success!

Conclusions and Regards

In this paper I described the solution we found to transfer a large file; you can find many others using different libraries.

I would like to add that this feature comes with HTTP/1.1, and that HTTP/2 no longer supports chunked transfer encoding; there you have to look for some sort of streaming API.

Here we chose the well-known RestTemplate class instead of the newer WebClient: I can’t tell you whether this approach can be adapted to it.

At the very end, I wish to thank Luca and Davide for the time spent working on the full solution which inspired this paper and, of course, for all the laughs we have every day.

--


Born in the early ’80s, I fell in love with technology at the age of five! Today I work as a Solution Architect, Project Manager and Trainer in consulting.