
10.07.2013

Java-R-Integration with JRI for On-Demand Predictions

This article gives a short overview of how to call R from within a Java application using JRI. In particular, it shows how to use this technology for on-demand predictions based on R models.

Christian, Lean Java Expert

Note: Trivial aspects such as constant definitions or exception handling are omitted in the provided code snippets.

What is JRI?

JRI is a Java/R interface that exposes R functionality through a Java API. Its central class, org.rosuda.JRI.Rengine, is documented in the project's JavaDoc. The project homepage describes how to set up JRI in various environments.

Typical Use Cases for On-Demand Predictions via JRI

Classification or numeric prediction models embedded in R scripts can originate from legacy implementations or from a conscious decision to use R for a certain use case. Typically, a static set of data is used to train and validate a model, which is then applied to another static set of unclassified data. However, this approach aims at deriving general insights from data sets rather than at predicting individual instances. In particular, it is insufficient for systems with real-time user interaction, for example a custom welcome screen that depends on the estimated value of a user who has just registered.

Hello R World from Java

After installing R and the JRI package, any Java application can instantiate org.rosuda.JRI.Rengine once the corresponding JARs have been added to its build path. The following simple example demonstrates how to use the R interface.

github:ac2d40c9b4a3391670ca

Calling Rengine.eval(String) corresponds to typing a command into the R console and hence provides access to any required functionality. Note that even in this trivial example, two separate calls share a common context, which is maintained throughout the lifecycle of an Rengine instance. Objects of type org.rosuda.JRI.REXP encapsulate any output that R returns to the caller. Depending on the evaluated command, methods other than REXP.asString() may be more suitable for extracting the result (see the JavaDoc).

Running .R Scripts from Java

Even though it would be possible to implement large R scripts in Java by passing each statement to Rengine.eval(String), this is much less convenient than writing or even re-using traditional .R scripts. So let's have a look at how we can achieve the same result with a slightly different solution.

Project structure:

github:4a98e3fcf2989c50203c

helloWorld.R:

github:4e2cdacb68b0630ccbd4

HelloRWorld2.java:

github:5298a762efca74a26f92

Any .R script can be executed like this and all variables it adds to the context will be accessible via JRI. However, this code does not work if the Java application is packaged as a JAR or WAR archive because the .R script will not have a valid absolute path. In this case, copying the script to a regular folder (e.g. java.io.tmpdir) at runtime and passing the temporary file to R is a feasible workaround.
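The workaround for packaged applications can be sketched with plain JDK classes. The following is a minimal sketch, not the article's original code: the class name RScriptExtractor and its method names are hypothetical. It copies a script from any InputStream (for example one obtained via getResourceAsStream) to a temporary file and builds the R source() command that would then be passed to Rengine.eval(String).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: makes a script packaged in a JAR/WAR visible to R as a regular file. */
final class RScriptExtractor {

    /** Copies the script content to a temporary file (created in java.io.tmpdir by default). */
    static Path copyToTempFile(InputStream script, String namePrefix) {
        try {
            Path target = Files.createTempFile(namePrefix, ".R");
            target.toFile().deleteOnExit();
            // The temp file already exists (empty), so it has to be replaced.
            Files.copy(script, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an R source() call for the given file. */
    static String sourceCommand(Path script) {
        // Forward slashes keep Windows paths valid inside an R string literal.
        String path = script.toAbsolutePath().toString().replace('\\', '/');
        return "source(\"" + path.replace("\"", "\\\"") + "\")";
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a classpath resource; the script content is made up for illustration.
        byte[] content = "greeting <- 'Hello R World'\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(content)) {
            Path script = copyToTempFile(in, "helloWorld");
            // This string is what would be handed to Rengine.eval(String).
            System.out.println(sourceCommand(script));
        }
    }
}
```

Because the helper only deals with streams and paths, it can be exercised without an R installation. The actual evaluation still requires a running Rengine.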

Training R Models with Java Application Data

Now that we know how to execute .R scripts using JRI we are able to integrate prediction models based on R into a Java application. The only remaining question is: how can R access the required data for training the model? The easiest way is to use the following file-based approach. We will build a linear regression model that predicts y from x1 and x2.

1. Extract suitable data from the Java persistence layer and store it in a temporary .csv file.

github:decba0c78a03e725c4b6

2. Pass the location of this file to R using JRI.

github:28f226821cfc9ed6c20d

3. Execute a .R script to build the model. The script must match the layout of the extracted .csv file, for example its column names and separator.

github:da6d512b942902dabfea

After executing this script, the resulting model will be available for any future calls until the entire application or the Rengine instance is re-initialized.
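Steps 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's original code: the class TrainingDataExport, the comma separator, the header names x1, x2 and y (taken from the regression example above), and the R variable name trainingDataPath are illustrative choices. The rows are hard-coded, made-up values; a real application would load them from its persistence layer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for exporting training data and telling R where to find it. */
final class TrainingDataExport {

    /** Renders rows of (x1, x2, y) as CSV text with a header line. */
    static String toCsv(List<double[]> rows) {
        StringBuilder csv = new StringBuilder("x1,x2,y\n");
        for (double[] row : rows) {
            // StringBuilder.append(double) is locale-independent and always uses a decimal point.
            csv.append(row[0]).append(',').append(row[1]).append(',').append(row[2]).append('\n');
        }
        return csv.toString();
    }

    /** Step 1: stores the rows in a temporary .csv file. */
    static Path writeTempCsv(List<double[]> rows) {
        try {
            Path file = Files.createTempFile("training", ".csv");
            file.toFile().deleteOnExit();
            Files.write(file, toCsv(rows).getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: builds an R assignment storing the file location in the assumed variable trainingDataPath. */
    static String assignPathCommand(Path csv) {
        String path = csv.toAbsolutePath().toString().replace('\\', '/');
        return "trainingDataPath <- \"" + path + "\"";
    }

    public static void main(String[] args) {
        // Illustrative values only.
        List<double[]> rows = Arrays.asList(
                new double[] {1.0, 2.0, 5.1},
                new double[] {2.0, 1.0, 4.2},
                new double[] {3.0, 4.0, 11.3});
        Path csv = writeTempCsv(rows);
        // This string would be passed to Rengine.eval(String) before running the training script.
        System.out.println(assignPathCommand(csv));
    }
}
```

Under these assumptions, a training script could read the data with read.csv(trainingDataPath) and fit the regression with lm(y ~ x1 + x2, ...). The names on both sides only need to agree with each other.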

On-Demand Predictions using the R Model

With the knowledge we already have, predicting a new instance (x1,x2) with unknown y is now pretty straightforward:

github:81594e52a9ded695dbdc
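As a minimal sketch of how such a call could be assembled, the following helper builds the R expression for a single instance. It assumes that the training script stored the fitted model in an R variable (named model here, a hypothetical name) and that the predictors are called x1 and x2 as above. The class PredictionCommand is likewise illustrative and not part of JRI. The resulting string would be evaluated with Rengine.eval(String), and the numeric result read from the returned REXP (see the JavaDoc for the appropriate accessor).

```java
/** Hypothetical helper that assembles an R predict() call for one new instance. */
final class PredictionCommand {

    /** Builds predict(model, data.frame(x1 = ..., x2 = ...)) for the given predictor values. */
    static String forInstance(String modelVariable, double x1, double x2) {
        // Only accept a plain R identifier so that no arbitrary R code can be injected.
        if (!modelVariable.matches("[A-Za-z][A-Za-z0-9._]*")) {
            throw new IllegalArgumentException("not a plain R variable name: " + modelVariable);
        }
        if (!Double.isFinite(x1) || !Double.isFinite(x2)) {
            throw new IllegalArgumentException("predictors must be finite numbers");
        }
        // Double.toString is locale-independent and always uses a decimal point.
        return "predict(" + modelVariable + ", data.frame(x1 = " + Double.toString(x1)
                + ", x2 = " + Double.toString(x2) + "))";
    }

    public static void main(String[] args) {
        // prints: predict(model, data.frame(x1 = 1.5, x2 = 2.0))
        System.out.println(forInstance("model", 1.5, 2.0));
    }
}
```

Since the model stays in the R context after training, only this short expression has to be evaluated per request, which is what makes the approach suitable for on-demand use.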

If you have any feedback, please write to Christian.Kroemer@comsysto.com!
