Archive for June, 2012

Enabling large file transfers through SOAP with Spring WS, MTOM/XOP and JAXB2

June 13th, 2012

Some people might say that using a SOAP web service to transfer large binary files is the wrong way to do it. I agree. If you can handle large file transfers some other way, do it. But using SOAP to send the files has some advantages, so I will try to describe how I got it working using Spring Web Services on the server and Apache Axis 2 or Apache CXF on the client.

I am going to assume that you already know how to set up web services with Spring.

The technology stack that I used is:

  • Spring WS (2.0.4)
  • SAAJ
  • JAXB2
  • Axis 2 or Apache CXF (for the client)

What is MTOM/XOP?

MTOM stands for Message Transmission Optimization Mechanism. It is a method of sending binary data along with an XML SOAP request.

When sending smaller files it is much simpler to base64 encode the file and include the encoded string as the text of an XML node. The receiving server then decodes the string back into the binary data and uses it as it sees fit.
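Here is a minimal JDK-only Java sketch of the inline approach. It needs Java 8 or later for java.util.Base64. The element names fileRequest and fileData are hypothetical placeholders, and a real SOAP stack would build the envelope for you. The point it shows is that the entire file ends up in memory, once as a byte array and again as a larger encoded String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration of inline base64; element names are placeholders.
class InlineBase64Example {

    // Sender side: base64-encode the bytes and store them as the text of <fileData>.
    // The whole file, plus its larger encoded copy, is held in memory here.
    static Document buildRequest(byte[] fileBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("fileRequest");
            Element data = doc.createElement("fileData");
            data.setTextContent(Base64.getEncoder().encodeToString(fileBytes));
            root.appendChild(data);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: read the element text back and decode it to the original bytes.
    static byte[] extractFile(Document doc) {
        String encoded = doc.getElementsByTagName("fileData").item(0).getTextContent();
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] original = "hello, attachment".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = extractFile(buildRequest(original));
        System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
    }
}
```

For a small attachment this is perfectly fine. The problems only show up as the file grows.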

The problem with this approach when sending a large amount of binary data is threefold:

  1. base64 encoding data takes time.
  2. base64 encoding data increases its size by about 33% (every 3 bytes become 4 characters).
  3. Most SOAP implementations will try to load the entire SOAP message into memory causing out of memory errors.
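The 33% figure in point 2 comes from base64 mapping every 3 input bytes to 4 output characters. The small helper below uses a hypothetical class name. It computes the padded encoded length, 4 * ceil(n / 3), and the resulting overhead. It ignores the line breaks that MIME-style base64 inserts, so the real figure on the wire can be slightly higher.

```java
// Hypothetical helper that quantifies base64 overhead.
class Base64Overhead {

    // Padded base64 (no line breaks) turns every 3 input bytes into 4 output characters,
    // rounding up to a whole 4-character group: 4 * ceil(n / 3).
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Extra bytes on the wire relative to the raw size, as a percentage.
    static double overheadPercent(long rawBytes) {
        if (rawBytes == 0) {
            return 0.0; // nothing to encode, so no overhead
        }
        return 100.0 * (encodedLength(rawBytes) - rawBytes) / rawBytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        System.out.println(encodedLength(oneGiB));                   // 1431655768
        System.out.printf("%.2f%%%n", overheadPercent(oneGiB));      // 33.33%
    }
}
```

For tiny inputs, padding pushes the percentage higher. For large files it settles at one third.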

Concerning point 3: MTOM does not actually handle this itself (MTOM is a method, not an implementation), but it allows implementations to cache the binary data on a hard disk instead of loading it into memory. I will talk about this later.
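Here is a hypothetical Java sketch of what "caching the binary data on a hard disk" can look like inside an implementation. It is not taken from SAAJ, Axis 2 or CXF. Small attachments stay in memory. Anything larger than a threshold is spilled to a temporary file, and the rest of the stream is copied straight to disk. Axis 2's FILE_SIZE_THRESHOLD option, used in the client section, is a similar knob, though its internals may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of threshold-based attachment caching.
class ThresholdCache {

    // Exactly one of the two fields is non-null.
    static final class Cached {
        final byte[] inMemory; // set when the attachment stayed at or below the threshold
        final Path onDisk;     // set when the attachment was spilled to a temporary file

        Cached(byte[] inMemory, Path onDisk) {
            this.inMemory = inMemory;
            this.onDisk = onDisk;
        }
    }

    static Cached cache(InputStream in, int thresholdBytes, Path tempDir) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            memory.write(buffer, 0, read);
            if (memory.size() > thresholdBytes) {
                // Too big for memory: flush what we have to disk and stream the rest there.
                Path file = Files.createTempFile(tempDir, "attachment", ".bin");
                try (OutputStream out = Files.newOutputStream(file)) {
                    memory.writeTo(out);
                    memory = null; // drop the in-memory copy so it can be garbage collected
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                }
                return new Cached(null, file);
            }
        }
        return new Cached(memory.toByteArray(), null);
    }
}
```

With this design the heap holds at most about the threshold plus one buffer, no matter how large the attachment is.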

When a SOAP message has an MTOM attachment, the binary data of the attachment is stored “below” the actual XML under a MIME header. This allows the SOAP message to be free of the burden of having to carry a very large string of base64 encoded data. But how do we get the binary data if it isn’t located in the XML itself? This is where XOP comes in.

XOP is a method of referencing the binary data in your message from inside the SOAP XML envelope. When the server or client receives a SOAP message it needs to know where in the message the binary data is. By looking at the XOP node we can find the binary data from the Content ID. It looks something like this:

<xop:Include href="cid:0123456789" xmlns:xop="http://www.w3.org/2004/08/xop/include"/>

The CID tells the SOAP message unmarshaller/receiver where the binary data is inside the entire payload, so it can be referenced later. On the sending side, it also means the XML can be built and handled without the binary data inline.
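The step from an xop:Include reference back to a MIME part can be sketched with the JDK's DOM parser. This sketch assumes the cid: URL scheme from RFC 2392. Under that scheme the href carries the Content-ID URL-encoded, without the angle brackets that appear in the part's Content-ID header. The class name is hypothetical, and a real unmarshaller does this internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: turns an xop:Include href into the matching Content-ID header value.
class XopReferenceResolver {

    static final String XOP_NS = "http://www.w3.org/2004/08/xop/include";

    static String contentIdHeaderFor(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness makes us match the XOP namespace, whatever prefix is used.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList includes = doc.getElementsByTagNameNS(XOP_NS, "Include");
            if (includes.getLength() == 0) {
                throw new IllegalArgumentException("no xop:Include element found");
            }
            String href = ((Element) includes.item(0)).getAttribute("href");
            if (!href.startsWith("cid:")) {
                throw new IllegalArgumentException("not a cid: reference: " + href);
            }
            // Percent-decode the id; '+' is protected because URLDecoder would turn it into a space.
            String id = URLDecoder.decode(href.substring("cid:".length()).replace("+", "%2B"), "UTF-8");
            return "<" + id + ">";
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not read XOP reference", e);
        }
    }
}
```

For the xop:Include example above, this returns "<0123456789>", which is the Content-ID header value to look for among the MIME parts.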

Streaming binary data

Enabling MTOM/XOP sometimes isn't enough to send large attachments. Certain implementations of SOAP message factories and marshallers still try to load the entire message into memory when it is sent or received. Once the data you need to send exceeds a few hundred megabytes, this becomes a problem. MTOM does remove the roughly 33% base64 overhead, but it doesn't stop the client/server from trying to load the data into memory.

So how do we send a 10 GB file with a SOAP request? The answer is to use technologies that support streaming/caching the data without loading it into memory first. A few frameworks offer this; a very popular option is Apache CXF. In my case I needed to use Spring Web Services inside Tomcat.
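The core idea behind streaming is that data moves through a fixed-size buffer instead of being collected into one large array. Here is a minimal JDK-only sketch, with a hypothetical class name, that writes an incoming stream to disk this way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating constant-memory streaming.
class StreamingCopy {

    // Copies the stream to disk through a fixed 64 KiB buffer, so heap usage is the same
    // whether the source is 10 KB or 10 GB. Reading everything into one byte[] first
    // would instead need an array as large as the whole attachment.
    static long copyToFile(InputStream in, Path target) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

The JDK's Files.copy(InputStream, Path, CopyOption...) does essentially the same thing. The explicit loop is only there to show where the memory goes.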

Using Spring and JAXB2 to stream and cache large MTOM requests

I will begin with the server configuration.

So the ultimate goal here is to use SOAP to send an XML document (that contains XML nodes and information) and include with it VERY large binary attachments without having the client or server run out of memory during or after the transfer.

There is A LOT of misinformation about this on the internet. Some people say that Spring does not support streaming attachments, some people say you need to use the AxiomSoapMessageFactory instead of the default SaajSoapMessageFactory. Some people swear by Apache CXF. In this case we are going to use basic Spring Web Services and JAXB2.

Proper XSD usage

I am going to assume you already have one or more XSD files that are being auto-generated into a WSDL. This is basic Spring Web Services stuff. If you haven't gotten this far yet, you should turn around and do more reading about Spring before continuing here.

When using JAXB to marshal/unmarshal XML documents we need to specify datatypes. For instance, for strings you might use the datatype xs:string. In our case we need a datatype that maps to a Java object that can read/write files without loading them completely into memory. For this we use a DataHandler object. Here is an example XSD that will auto-generate Java source files using a DataHandler instead of a byte array to hold the binary data:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:tns="http://example.com/ws/file" 
	xmlns:xmime="http://www.w3.org/2005/05/xmlmime"
	targetNamespace="http://example.com/ws/file">
 
	...
 
	<xs:complexType name="FileData">
		<xs:all>
			<xs:element name="fileData" type="xs:base64Binary"
				xmime:expectedContentTypes="*/*" />
		</xs:all>
	</xs:complexType>
 
	...
</xs:schema>

As you can see, we declare the xmlmime namespace and use the xmime:expectedContentTypes="*/*" attribute on the node that will contain the binary data (or rather, the XOP reference to our MIME binary data). With the above XSD, JAXB generates a Java object representation of our document that uses a DataHandler instead of a byte array.

Spring configuration

Spring needs to know that we are using MTOM. Specifically, the Jaxb2Marshaller (which handles both marshalling and unmarshalling here) needs to know. If we do not enable MTOM on the marshaller, we will get the dreaded out of memory error. Here is an example Spring configuration that enables MTOM on the Jaxb2Marshaller:

<bean id="messageReceiver"
	class="org.springframework.ws.soap.server.SoapMessageDispatcher">
	<property name="endpointAdapters">
		<list>
			<ref bean="defaultMethodEndpointAdapter" />
		</list>
	</property>
</bean>
 
<bean id="marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
	<property name="classesToBeBound">
		<list>
			<!-- When you generate your JAXB classes from your XSD. Place them here. -->
			<value>com.example.ws.MyGeneratedJavaClass</value>
		</list>
	</property>
 
	<!-- This is the important part! -->
	<property name="mtomEnabled" value="true" />
</bean>
 
<bean id="marshallingPayloadMethodProcessor"
	class="org.springframework.ws.server.endpoint.adapter.method.MarshallingPayloadMethodProcessor">
	<constructor-arg ref="marshaller" />
	<constructor-arg ref="marshaller" />
</bean>
 
<bean id="defaultMethodEndpointAdapter"
	class="org.springframework.ws.server.endpoint.adapter.DefaultMethodEndpointAdapter">
	<property name="methodArgumentResolvers">
		<list>
			<!-- Be careful here! You might need to add more processors if you do more than webservices! -->
			<ref bean="marshallingPayloadMethodProcessor" />
		</list>
	</property>
	<property name="methodReturnValueHandlers">
		<list>
			<ref bean="marshallingPayloadMethodProcessor" />
		</list>
	</property>
</bean>

If you have a better way of doing this please let me know. This is not a perfect configuration but does work for a server that will only be doing web services.

SAAJ MimePull System Property

In order for SAAJ to be able to stream attachments and save them to the hard drive we need to enable MimePull. To do so set a JVM system property like so:

-Dsaaj.use.mimepull=true

There is a JIRA entry describing this option.
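If you want the server to complain loudly when the flag is missing, a small startup check is one option. This is a hypothetical sketch, and the class and method names are illustrative only. It only reads the property and does not set it. The exact point at which SAAJ reads saaj.use.mimepull is not covered here, so passing it on the command line as shown above remains the safer choice.

```java
import java.util.logging.Logger;

// Hypothetical startup check for the SAAJ MimePull flag.
class MimePullCheck {

    private static final Logger LOG = Logger.getLogger(MimePullCheck.class.getName());

    // Boolean.getBoolean returns true only if the system property exists and equals "true"
    // (case-insensitive), which matches the -Dsaaj.use.mimepull=true flag.
    static boolean mimePullEnabled() {
        return Boolean.getBoolean("saaj.use.mimepull");
    }

    // Call once during application startup.
    static void warnIfDisabled() {
        if (!mimePullEnabled()) {
            LOG.warning("saaj.use.mimepull is not set to true; SAAJ may buffer whole attachments in memory");
        }
    }
}
```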

Temporary folder to store files being received

Since we are not loading the contents of the file into memory, we need to store it somewhere while the transmission is taking place. The server, or more specifically SAAJ, will use the JVM's currently configured temporary directory to store the files. The default location depends on the OS. Here is how you configure the temporary file location for the JVM:

-Djava.io.tmpdir=/path/to/tmpdir

Just add it to your JVM system properties. Do not forget to manually clear this temporary folder periodically. The files are not automatically removed!
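Because the temporary files are never removed automatically, some kind of periodic cleanup is needed. Here is a hypothetical JDK-only sketch that deletes files older than a given age from one directory. It assumes the directory is dedicated to these attachments, for example by pointing -Djava.io.tmpdir at its own folder. Running it against a shared system temp directory could delete other programs' files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical cleanup job for a dedicated attachment temp folder.
class TempDirCleaner {

    // Deletes regular files in dir (not subdirectories) whose last-modified time is older
    // than maxAge, and returns how many were removed. Using an age cutoff helps avoid
    // deleting an attachment that is still being written.
    static int deleteOlderThan(Path dir, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

You might run it from a ScheduledExecutorService, for example once an hour with deleteOlderThan(dir, Duration.ofHours(24)). The maximum age should comfortably exceed your longest transfer.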

The server should now be complete.

Configuring the client (Axis 2)

For the client portion I am using Axis 2 and/or Apache CXF. I am not going to cover generating the client stub classes in this post.

Since the client also needs to be able to receive files without running out of memory, we need to enable file caching and specify a folder to temporarily store the files as they are downloaded (this folder also requires manual cleaning!). Here we enable MTOM, set the caching threshold (the size above which an attachment is cached to disk rather than kept in memory), set the folder where we want to store the temporary files, and set a timeout large enough for the size of the file we are uploading:

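// Axis 2 configuration
import java.net.URL;

import javax.activation.DataHandler;

import org.apache.axis2.Constants;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;

// Upload timeout in milliseconds; size it to your largest expected file (1 hour here).
final long TIMEOUT = 60L * 60L * 1000L;

// First we set our MTOM settings and options.
Options mtomOptions = new Options();
mtomOptions.setProperty( Constants.Configuration.ENABLE_MTOM, Constants.VALUE_TRUE );
// Folder for cached attachments (must be cleaned manually).
mtomOptions.setProperty( Constants.Configuration.ATTACHMENT_TEMP_DIR, "c:/temp/axisclient/" );
// Cache attachments to disk instead of holding them in memory...
mtomOptions.setProperty( Constants.Configuration.CACHE_ATTACHMENTS, Constants.VALUE_TRUE );
// ...once they are larger than this many bytes.
mtomOptions.setProperty( Constants.Configuration.FILE_SIZE_THRESHOLD, "1024" );
mtomOptions.setTimeOutInMilliSeconds( TIMEOUT );
 
MyServiceStub service = new MyServiceStub();
 
// Set the options on the service stub.
service._getServiceClient().setOptions( mtomOptions );
 
// Set the endpoint URL.
EndpointReference wRef = new EndpointReference();
wRef.setAddress( "http://localhost:8080/ws/" );
service._getServiceClient().setTargetEPR( wRef );
 
// At this point you would set the data into your stub object.
MyRequest wRequest = new MyRequest();
 
DataHandler wHdlr1 = new DataHandler( new URL(
				"file:///c:/verylargefile.zip" ) );
 
// Stick wHdlr1 into your generated request class.
wRequest.setFileData( wHdlr1 );
 
// Send the request. The operation method name ("my") depends on your WSDL.
MyResponse wResponse = service.my( wRequest );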
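The TIMEOUT value in the Axis 2 example has to cover the whole upload. One rough way to choose it is the expected transfer time multiplied by a safety factor. The helper below is hypothetical. The 10 MiB/s throughput used in main is an assumed figure that you should replace with a measurement from your own network.

```java
// Hypothetical helper for sizing a client timeout from file size and throughput.
class UploadTimeout {

    // Expected transfer time at the given throughput, multiplied by a safety factor,
    // rounded up to whole milliseconds.
    static long timeoutMillis(long fileBytes, long bytesPerSecond, double safetyFactor) {
        if (bytesPerSecond <= 0 || safetyFactor < 1.0) {
            throw new IllegalArgumentException("throughput must be positive and safetyFactor >= 1");
        }
        double seconds = (double) fileBytes / bytesPerSecond;
        return (long) Math.ceil(seconds * 1000 * safetyFactor);
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;
        long tenMiBPerSecond = 10L * 1024 * 1024; // assumed throughput
        // 1024 s of transfer, doubled for headroom.
        System.out.println(timeoutMillis(tenGiB, tenMiBPerSecond, 2.0)); // 2048000
    }
}
```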

A word of warning: at the time of writing this post I fell victim to a bug in Axis 2 that prevented it from downloading and caching MTOM files correctly. To work around it (and I hate doing this), I had to comment out one line in the auto-generated client stub.

You can find it inside your generated service stub class.

if (_messageContext.getTransportOut() != null) {
    // Comment out this line.
    //_messageContext.getTransportOut().getSender().cleanup(_messageContext);
}

Configuring the client (Apache CXF)

Apache CXF is slightly more complicated to set up when generating the classes and stubs, but in my testing it worked out of the box without any bugs. Here is an Apache CXF example for uploading a large attachment to a service:

import java.net.URL;

import javax.activation.DataHandler;
import javax.xml.ws.BindingProvider;
import javax.xml.ws.soap.SOAPBinding;

import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.transport.http.HTTPConduit;
import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

// MyService, My, MyRequest and MyResponse are the classes generated from your WSDL.
MyService wService = new MyService( new URL( "http://localhost:8080/my.wsdl" ) );
 
// Get the port.
My wMyClient = wService.getMySoap11();
 
// Disable the connection and receive timeouts (0 means no timeout).
// If we are sending really large files a timeout would be bad.
Client cl = ClientProxy.getClient( wMyClient );
HTTPConduit http = (HTTPConduit) cl.getConduit();
HTTPClientPolicy httpClientPolicy = new HTTPClientPolicy();
httpClientPolicy.setConnectionTimeout( 0 );
httpClientPolicy.setReceiveTimeout( 0 );
http.setClient( httpClientPolicy );
 
// Enable MTOM.
BindingProvider bp = (BindingProvider) wMyClient;
SOAPBinding binding = (SOAPBinding) bp.getBinding();
binding.setMTOMEnabled( true );
 
// At this point you would set the data into your stub object.
MyRequest wRequest = new MyRequest();
 
DataHandler wHdlr1 = new DataHandler( new URL(
				"file:///c:/verylargefile.zip" ) );
 
// Stick wHdlr1 into your generated request class.
wRequest.setFileData( wHdlr1 );
 
// Send the request.
MyResponse wResponse = wMyClient.my( wRequest );

At this point you should be able to send and receive very large files. I have tested (on a local server) a file that was 11 gigabytes without any memory issues.

Another small note: sending and receiving files on the client side behave differently. Usually an Axis or CXF client can receive an MTOM file without many modifications. The examples above are aimed at sending large files rather than receiving them, so they might take some tuning to work for you.

Categories: Java, Programming, Spring, Tomcat