Comparison with R / R libraries

pandas aims to provide much of the data manipulation and analysis functionality that people use R for. This page takes a more detailed look at the R language and its many third-party libraries as they relate to pandas. In comparisons with R and CRAN libraries, we care about the following things:

Functionality / flexibility: what can and cannot be done with each tool

Performance: how fast operations are. Hard numbers and benchmarks are preferable

Ease of use: whether one tool is easier or harder to use (you may have to be the judge of this, given side-by-side code comparisons)

This page is also here to offer a bit of a translation guide for users of these R packages.

Base R

`subset`

New in version 0.13.

The query() method is similar to the base R subset function. In R you might want to get the rows of a data.frame where one column's values are less than or equal to another column's values:

df <- data.frame(a=rnorm(10), b=rnorm(10))
subset(df, a <= b)
df[df$a <= df$b,]  # note the comma

In pandas, there are a few ways to perform subsetting. You can use query(), pass a boolean expression directly inside the indexing brackets, or use standard boolean indexing with loc:

In [1]: from pandas import DataFrame

In [2]: from numpy.random import randn

In [3]: df = DataFrame({'a': randn(10), 'b': randn(10)})

In [4]: df.query('a <= b')

          a         b
2 -1.950301  0.173875
3 -1.478332 -0.798063
5 -0.806934  0.141070
8  0.084343  0.879800
9 -0.590813  0.465165

In [5]: df[df.a <= df.b]

          a         b
2 -1.950301  0.173875
3 -1.478332 -0.798063
5 -0.806934  0.141070
8  0.084343  0.879800
9 -0.590813  0.465165

In [6]: df.loc[df.a <= df.b]

          a         b
2 -1.950301  0.173875
3 -1.478332 -0.798063
5 -0.806934  0.141070
8  0.084343  0.879800
9 -0.590813  0.465165

For more details and examples see the query documentation.
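As a further illustration, the sketch below combines two conditions in one query() string and refers to a Python variable with the `@` prefix. The seeded generator and the `threshold` variable are assumptions made for this example and do not come from the session above; the imports use the `pd`/`np` aliases rather than the names imported earlier.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(10), "b": rng.standard_normal(10)})

threshold = 0.0  # hypothetical cutoff, used only for illustration

# query(): column names are used bare, Python variables are prefixed with @.
by_query = df.query("a <= b and b > @threshold")

# The same selection written with standard boolean indexing.
by_mask = df[(df.a <= df.b) & (df.b > threshold)]

# Both approaches select the same rows.
assert by_query.equals(by_mask)
```

The query() form is closest in spirit to R's `subset(df, a <= b & b > threshold)`, since column names appear without the `df.` prefix.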

`with`

New in version 0.13.

An expression using a data.frame called df in R with the columns a and b would be evaluated using with like so:

df <- data.frame(a=rnorm(10), b=rnorm(10))
with(df, a + b)
df$a + df$b  # same as the previous expression

In pandas the equivalent expression, using the eval() method, would be:

In [7]: df = DataFrame({'a': randn(10), 'b': randn(10)})

In [8]: df.eval('a + b')

0   -0.316408
1    2.764941
2    2.079059
3   -0.149641
4    1.708174
5   -0.695574
6   -0.513258
7    0.543637
8    1.373293
9    0.466815
dtype: float64

In [9]: df.a + df.b  # same as the previous expression

0   -0.316408
1    2.764941
2    2.079059
3   -0.149641
4    1.708174
5   -0.695574
6   -0.513258
7    0.543637
8    1.373293
9    0.466815
dtype: float64

In certain cases eval() will be much faster than evaluation in pure Python. For more details and examples see the eval documentation.
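The following minimal sketch checks that the two spellings agree and shows the assignment form of eval(), which is roughly analogous to R's `within`. It assumes a recent pandas release in which eval() returns a new DataFrame for assignment expressions instead of modifying the original; the column name `c` and the seeded generator are choices made for this example. It does not measure speed.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(10), "b": rng.standard_normal(10)})

# Expression evaluated against the frame's columns, as with(df, a + b) in R.
via_eval = df.eval("a + b")
via_python = df.a + df.b

# Both spellings produce the same Series.
pd.testing.assert_series_equal(via_eval, via_python)

# Assignment form: in recent pandas this returns a new DataFrame with the
# extra column "c" and leaves df unchanged (inplace defaults to False).
with_sum = df.eval("c = a + b")
```

Whether eval() is actually faster depends on the size of the frame and on the evaluation engine available, so it is worth timing on your own data before relying on it.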
