To run R in Gramex, install rpy2 first:
conda install -c r rpy2
This installs a new copy of R in the Anaconda environment. It ignores the system R.
Caution: You'll have two copies of R on your system: the Anaconda R and the system R. Running R from the command line runs whichever comes first in your PATH. Installing a package in one does not install it in the other.
Here is an example of a prime number calculation in R. See the Python source and the R script.
Call gramex.ml.r('R expression') to run an R command and return its result.
import gramex.ml
total = gramex.ml.r('sum(c(1,2,3))') # Add up numbers and return the result
Multi-line commands are allowed. The last line is returned.
total = gramex.ml.r('''
x <- rnorm(10) # Generate 10 random numbers
sum(x) # Add them up and return the value
''')
But avoid multi-line commands where you can. Run .R scripts instead. This lets you reuse the scripts elsewhere, unit-test them and lint them. Your editor also gives you syntax highlighting.
Gramex runs a single R session. All variables are remembered across calls:
gramex.ml.r('x <- rnorm(10)') # R variable "x" has 10 random numbers
total = gramex.ml.r('sum(x)') # "x" defined earlier can be used
gramex.ml.r('rm(x)') # Now "x" is deleted. Memory is released
Call gramex.ml.r(path='script.R') to source script.R.
sieve.R defines a prime number function, sieve(n). To load it, use:
gramex.ml.r(path='sieve.R') # Loads relative to the Python file
gramex.ml.r('sieve(10)') # Returns [2, 3, 5, 7] -- primes up to 10
Gramex loads sieve.R relative to the Python file that calls it. To play it safe, specify an absolute path.
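sieve.R itself is not reproduced here. Assuming it implements the classic Sieve of Eratosthenes, this pure-Python sketch computes the same result, which can be handy as a reference when unit-testing the R output. The Python function name mirrors sieve(n) only for illustration.

```python
from typing import List


def sieve(n: int) -> List[int]:
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Every multiple of p from p*p upwards is composite
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, prime in enumerate(is_prime) if prime]


print(sieve(10))  # [2, 3, 5, 7]
```

In a test, a comparison such as list(gramex.ml.r('sieve(10)')) == sieve(10) is one possible check, assuming the R result converts to a flat numeric array as in the examples on this page.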
Scripts can source other scripts. For example:
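source('sieve.R', chdir=TRUE)  # Always use chdir=TRUE with source() so paths inside sieve.R resolve from its folder
sieve(n)                        # n must already be defined, e.g. passed as a keyword argument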
All keyword arguments passed to gramex.ml.r() are available as global variables to the script. For example:
>>> gramex.ml.r('rnorm(n, mean, sd)', n=3, mean=100, sd=20)
array([125.80012342, 104.30249101, 97.31857082])
In the expression above, rnorm(n, mean, sd) uses the R variables n, mean and sd, which Gramex sets from the keyword arguments.
Pandas Series are automatically converted into R vectors. R vectors are returned as NumPy arrays.
>>> gramex.ml.r(
... 'pnorm(x, log.p=log)', # Use variables x and log
... x=pd.Series([0.2, 0.5, 1.0]), # Pandas series converted into a vector
... log=False, # Boolean values converted to R booleans
... )
array([0.57925971, 0.69146246, 0.84134475])
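R's pnorm(x) is the standard normal cumulative distribution function, Φ(x) = (1 + erf(x / √2)) / 2. As a cross-check of the numbers above, this pure-Python sketch computes the same values with math.erf. It mirrors only pnorm's defaults (mean 0, sd 1) and its log.p flag; the Python function name is illustrative, not part of Gramex.

```python
import math


def pnorm(x: float, log_p: bool = False) -> float:
    """Standard normal CDF, like R's pnorm(x) with mean=0 and sd=1."""
    p = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    # log_p mirrors R's log.p argument: return log(p) instead of p
    return math.log(p) if log_p else p


print([round(pnorm(x), 8) for x in (0.2, 0.5, 1.0)])
```

Unlike R, this takes log(p) after computing p, so it loses precision far in the lower tail; it is only meant for small sanity checks.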
R data frames are automatically converted into Pandas DataFrames, and vice versa.
>>> gramex.ml.r('data(cars)') # Load the cars dataset in R
>>> cars = gramex.ml.r('cars') # Returns the dataset as a DataFrame
>>> cars.head() # To prove that, print it
   speed  dist
1    4.0   2.0
2    4.0  10.0
3    7.0   4.0
4    7.0  22.0
5    8.0  16.0
>>> type(cars) # Check the type
<class 'pandas.core.frame.DataFrame'>
It's OK to pass small data this way. Avoid converting large data, though. Instead, pass the path to the data. For example:
gramex.ml.r('data <- read.csv(csv_file)', csv_file='../formhandler/flags.csv')
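When the data starts out in Python, one way to follow this advice is to write it to a file first and pass only the path. This is a minimal sketch using the standard library; write_csv is a hypothetical helper, not part of Gramex.

```python
import csv
import os
import tempfile
from typing import Any, Iterable, Optional, Sequence


def write_csv(rows: Iterable[Sequence[Any]], header: Sequence[str],
              directory: Optional[str] = None) -> str:
    """Hypothetical helper: write rows to a new CSV file and return its absolute path."""
    fd, path = tempfile.mkstemp(suffix='.csv', dir=directory)
    # Wrap the OS-level file descriptor returned by mkstemp in a text-mode file
    with os.fdopen(fd, 'w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return os.path.abspath(path)


path = write_csv([(1, 'a'), (2, 'b')], ['id', 'name'])
print(path)
```

The path can then go to R as a keyword argument, e.g. gramex.ml.r('data <- read.csv(csv_file)', csv_file=path), and the file can be removed with os.remove(path) once R has read it. With a Pandas DataFrame, df.to_csv(path, index=False) does the writing step directly.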
To get the location of the R script from within the R script, use the rprojroot package:
library(rprojroot)
# Load flags.csv from the same directory as the R script
path <- file.path(dirname(thisfile()), 'flags.csv')
flags <- read.csv(path)
Install packages in your R script as you normally would. For example:
packages <- c('randomForest', 'e1071', 'rpart', 'xgboost')
new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if (length(new.packages)) install.packages(new.packages)
# Rest of your script can use the packages above.
library(randomForest)
# ... etc
This installs packages from Microsoft CRAN instead of prompting the user for a repository.
Remember: not all packages install on both Windows and Linux. Choose packages with care.
To render plots, save them to a temporary file. This script, plot.R, saves a plot to a temporary file and returns the file's path.
library(grDevices) # This library saves to files
temp <- tempfile(fileext='.png') # Get a temporary PNG file name
png(file=temp, width=512, height=512) # Save graphics to temp file
library(ggplot2)                                            # Use ggplot2 for graphics
# Assumes `norm` is a data frame with x and y columns, e.g. passed as a keyword argument
plot(ggplot(norm) + aes(x = x, y = y) + geom_density2d())   # Draw the plot
dev.off() # Stop saving to file
temp # Return the temp file path
This code renders the plot:
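import gramex.cache
import gramex.ml

def plot(handler):
    path = gramex.ml.r(path='plot.R')          # R returns the temp file path as a 1-element vector
    return gramex.cache.open(path[0], 'bin')   # Return the PNG file's contents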
Note: This requires ggplot2. Install it with conda install -c r r-ggplot2
Run computations asynchronously if they take time. This frees up Gramex to handle other requests.
To do this, you must:
- Decorate the function with @tornado.gen.coroutine
- Create a process pool: pool = concurrent.futures.ProcessPoolExecutor()
- Use yield pool.submit(gramex.ml.r, **kwargs) instead of gramex.ml.r(**kwargs)
For example, here is the asynchronous version of the plotting code above:
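import concurrent.futures
import gramex.cache
import gramex.ml
import tornado.gen
pool = concurrent.futures.ProcessPoolExecutor()   # Create the pool once, at module level

@tornado.gen.coroutine
def plot_async(handler):
    # Run plot.R in a separate process; yield waits without blocking Gramex
    path = yield pool.submit(gramex.ml.r, path='path/to/plot.R')
    raise tornado.gen.Return(gramex.cache.open(path[0], 'bin'))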
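The same executor idea can also be written with the standard library's asyncio. This is a framework-neutral sketch, not Gramex's API: slow_sum is a hypothetical stand-in for gramex.ml.r, and run_in_pool is an illustrative helper. Any concurrent.futures Executor works here; with a ProcessPoolExecutor, as above, the function must be defined at module level so it can be pickled.

```python
import asyncio
import concurrent.futures
import functools
from typing import Any, Callable


def slow_sum(n: int) -> int:
    """Hypothetical stand-in for a slow call such as gramex.ml.r(...)."""
    return sum(range(n))


async def run_in_pool(pool: concurrent.futures.Executor,
                      fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run fn(*args, **kwargs) in the executor without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor() only forwards positional arguments, so bind keywords with partial()
    return await loop.run_in_executor(pool, functools.partial(fn, *args, **kwargs))
```

In Gramex terms, fn would be gramex.ml.r and the call would look like run_in_pool(pool, gramex.ml.r, path='plot.R'). Whether a Gramex FunctionHandler accepts an async def function in place of @tornado.gen.coroutine is not covered on this page, so check the FunctionHandler documentation before switching.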
RMarkdown support is deprecated since Gramex 1.81 and removed in Gramex 1.83.
Earlier versions of Gramex render RMarkdown files as HTML using the FileHandler rmarkdown transform.
Note: This requires RMarkdown. Install it with conda install -c r r-rmarkdown
The transform also saves the HTML file in the directory where the .Rmd files are located.
Use the configuration below to render all *.Rmd files as HTML:
r/rmarkdown:
  pattern: /$YAMLURL/(.*Rmd)
  handler: FileHandler
  kwargs:
    path: $YAMLPATH             # Directory where the .Rmd files are located
    transform:
      "*.Rmd":                  # Any file matching .Rmd
        function: rmarkdown(content, handler)
        headers:
          Content-Type: text/html
          Cache-Control: max-age=3600
To learn more about RMarkdown, see RStudio's Get started with RMarkdown guide.