dplyr 0.5 - new functions: part - I
dplyr version 0.5 introduced several new functions:
- coalesce()
- case_when()
- if_else()
- na_if()
- near()
- recode()
- union_all()
- summarise_all() and mutate_all()
- summarise_at() and mutate_at()
- summarise_if() and mutate_if()
- select_if()
Let’s take a look at the first five.
coalesce()
library(dplyr)
The coalesce() function takes two or more vectors as arguments and finds the first non-missing value at each position. It serves a similar purpose to the SQL COALESCE function.
A simple example illustrates what the function does:
y <- c(NA, 2, NA, NA, 5)
z <- c(NA, NA, 3, 4, NA)
w <- c(10, 20, 30, NA, NA)
coalesce(y, z, w)
## [1] 10 2 3 4 5
All vectors must be of the same type. Mixing different types results in an error:
z <- c(NA, NA, "3", "4", NA)
coalesce(y, z)
## Error: Vector 1 has type 'character' not 'double'
The function also recycles length-one arguments to the length of the first vector, which makes it easy to supply a default value. For instance:
coalesce(y, 0)
## [1] 0 2 0 0 5
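The two behaviours can be combined: fall back to a second vector first, then to a constant. The following is a minimal sketch; the vector names primary and backup are made up for illustration.

```r
library(dplyr)

# Hypothetical vectors: `primary` is preferred, `backup` fills its gaps
primary <- c(NA, 7, NA)
backup  <- c(5, NA, NA)

# Take the first non-missing value at each position;
# the length-one 0 is recycled and acts as the final default
filled <- coalesce(primary, backup, 0)
filled
## [1] 5 7 0
```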
if_else()
The if_else() function is very similar to the ifelse() function from base R, but there are a few differences. The first is that it takes an optional fourth argument, missing, which supplies the value to use where the condition is NA.
x <- c(1:5, NA)
ifelse(x < 3, "a", "b")
## [1] "a" "a" "b" "b" "b" NA
if_else(x < 3, "a", "b", "c")
## [1] "a" "a" "b" "b" "b" "c"
The new function is also stricter about types: true and false must have the same type, which determines the type of the result. Unlike ifelse(), which silently coerces its inputs, if_else() raises an error when the types are incompatible:
ifelse(x < 3, x, "b")
## [1] "1" "2" "b" "b" "b" NA
if_else(x < 3, x, "b")
## Error: `false` has type 'character' not 'integer'
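One way around this error, sketched here as an illustration rather than as the only option, is to convert one branch explicitly so that both branches share a type. The variable name labels is made up for the example.

```r
library(dplyr)

x <- c(1:5, NA)

# Convert the integer branch to character so both branches have the same type
labels <- if_else(x < 3, as.character(x), "b")
labels
## [1] "1" "2" "b" "b" "b" NA
```

The conversion is visible in the code, whereas ifelse() performs it silently.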
Finally, if_else() is faster than its base R counterpart. The following benchmark illustrates this:
library(microbenchmark)
set.seed(7867)
x <- runif(10000)
microbenchmark(
  ifelse(x < 0.5, "a", "b"),
  if_else(x < 0.5, "a", "b")
)
## Unit: microseconds
##                        expr      min       lq      mean   median       uq       max neval
##   ifelse(x < 0.5, "a", "b") 1990.063 2041.839 2727.0666 2108.643 2945.422 33813.663   100
##  if_else(x < 0.5, "a", "b")  457.695  478.701  668.0394  491.841  563.120  1609.154   100
case_when()
The case_when() function is a general vectorised form of if and else if statements.
It takes a sequence of two-sided formulas. The left-hand side determines which values match, and the right-hand side gives the replacement value. It is a convenient replacement for chains of nested if-else statements.
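On a plain vector, the formula syntax looks like this. The example is a small sketch; the cut-off values and labels are arbitrary.

```r
library(dplyr)

x <- c(2, 7, 15)

# Each formula reads: condition ~ value to use when the condition holds
size <- case_when(
  x < 5  ~ "small",
  x < 10 ~ "medium",
  TRUE   ~ "large"   # catch-all for everything not matched above
)
size
## [1] "small"  "medium" "large"
```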
Unfortunately, the function does not yet work inside mutate(), but according to Hadley Wickham this will be solved in a future version of dplyr.
In the meantime, it can be used inside mutate() with a simple workaround:
df <- data_frame(x = 1:10, y = 10:1)
df %>% mutate(z = case_when(.$x > 5 & .$y < 5 ~ 1,
                            .$x < 4 & .$y > 7 ~ 2,
                            TRUE ~ 0))
## # A tibble: 10 × 3
## x y z
## <int> <int> <dbl>
## 1 1 10 2
## 2 2 9 2
## 3 3 8 2
## 4 4 7 0
## 5 5 6 0
## 6 6 5 0
## 7 7 4 1
## 8 8 3 1
## 9 9 2 1
## 10 10 1 1
The arguments are evaluated in the order in which they are given, and the first matching condition wins. So in the previous example, z is equal to zero only if all the preceding conditions are false.
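The effect of the ordering is easiest to see when two conditions overlap. In this sketch (with made-up thresholds), the same formulas give different results depending on which comes first:

```r
library(dplyr)

x <- c(1, 6, 12)

# General condition first: 12 already matches x > 5, so "huge" is never reached
general_first <- case_when(
  x > 5  ~ "big",
  x > 10 ~ "huge",
  TRUE   ~ "small"
)
general_first
## [1] "small" "big"   "big"

# Specific condition first: 12 is now labelled "huge"
specific_first <- case_when(
  x > 10 ~ "huge",
  x > 5  ~ "big",
  TRUE   ~ "small"
)
specific_first
## [1] "small" "big"   "huge"
```

When conditions overlap, placing the most specific one first is usually what is intended.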
na_if()
na_if() replaces a given value with NA, which is useful when missing values are encoded with a placeholder. Take the following data frame:
df <- data_frame(x = c(-99, rnorm(3), -99, -99))
df
## # A tibble: 6 × 1
## x
## <dbl>
## 1 -99.0000000
## 2 1.1762076
## 3 1.1065201
## 4 -0.5262338
## 5 -99.0000000
## 6 -99.0000000
If we want to replace all the values of -99 with NA, we can do:
df %>%
  mutate(x = na_if(x, -99))
## # A tibble: 6 × 1
## x
## <dbl>
## 1 NA
## 2 1.1762076
## 3 1.1065201
## 4 -0.5262338
## 5 NA
## 6 NA
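The function is not limited to numbers. As a small sketch, assuming empty strings stand for missing answers in a made-up character vector:

```r
library(dplyr)

# Hypothetical survey answers where "" means "no answer"
answers <- c("a", "", "b")

# Turn every empty string into NA
cleaned <- na_if(answers, "")
cleaned
## [1] "a" NA  "b"
```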
near()
near() allows the comparison of two floating point vectors within a given tolerance.
Let’s take the following vector:
x <- c(sqrt(2) ^ 2, sqrt(3) ^ 2)
x[1] == 2
## [1] FALSE
x[2] == 3
## [1] FALSE
We can compare the x vector with the vector c(2, 3) using near() (see the note at the end):
near(x, c(2,3))
## [1] TRUE TRUE
The function takes a third argument, tol, that defines the tolerance (accuracy) of the comparison. The default value is 1.490116e-08, that is, .Machine$double.eps^0.5.
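A looser tolerance can be passed explicitly. The values below are arbitrary and only meant to show how tol changes the result:

```r
library(dplyr)

a <- c(1, 1.001, 1.1)

# Default tolerance: only the exact match counts as near 1
strict <- near(a, 1)
strict
## [1]  TRUE FALSE FALSE

# Looser tolerance: differences smaller than 0.01 are accepted
loose <- near(a, 1, tol = 0.01)
loose
## [1]  TRUE  TRUE FALSE
```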
Note: Confused about the results? If so, take a look at "What Every Computer Scientist Should Know About Floating-Point Arithmetic".