
dplyr 0.5 - new functions: part - II

29 April 2017
In the previous post, I described the first five of the functions introduced by dplyr 0.5, which are listed below:

coalesce()
case_when()
if_else()
na_if()
near()
recode()
union_all()
summarise_all() and mutate_all()
summarise_at() and mutate_at()
summarise_if() and mutate_if()
select_if()

In this post, I’ll describe the others. Meanwhile, the next version of dplyr is just around the corner and will also bring new features.

recode()

The recode() function, as the name states, allows you to recode the values of a vector. There is also a similar function for factors: recode_factor(). Let’s take the following data_frame:

library(dplyr)
d_f <- data_frame(x = c(1:5, NA), y = letters[1:6])
d_f
## # A tibble: 6 × 2
##       x     y
##   <int> <chr>
## 1     1     a
## 2     2     b
## 3     3     c
## 4     4     d
## 5     5     e
## 6    NA     f

We can use recode to change numeric or alphanumeric values, but the replacements must all be of the same type:
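As a minimal sketch of the rule just stated, the snippet below recodes the two columns of d_f, kept here as plain vectors so the example does not depend on how data frames are printed. For a numeric vector, the backtick-quoted names are the old values; the replacements are written as integers (20L, 40L) so that they have the same type as x. The particular values chosen are only illustrative, and a recent dplyr installation is assumed.

```r
library(dplyr)

# Same values as the columns of d_f, kept as plain vectors
x <- c(1:5, NA)        # integer vector
y <- letters[1:6]      # character vector

# Numeric recode: names are the old values; replacements are integers,
# the same type as x, so unmatched values and NA are kept as they are
x_new <- recode(x, `2` = 20L, `4` = 40L)
x_new

# Character recode: names are the old values, replacements are strings
y_new <- recode(y, a = "apple", b = "banana")
y_new
```

Values without a replacement are left untouched here because the replacements share the type of the input; supplying replacements of another type is where the same-type rule starts to bite.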

dplyr 0.5 - new functions: part - I

11 March 2017
dplyr version 0.5 introduced several new functions:

coalesce()
case_when()
if_else()
na_if()
near()
recode()
union_all()
summarise_all() and mutate_all()
summarise_at() and mutate_at()
summarise_if() and mutate_if()
select_if()

Let’s take a look at the first five.

coalesce()

The coalesce() function takes two or more vectors as arguments and finds the first non-missing value at each position. It serves a similar purpose to the SQL COALESCE function. A simple example illustrates what the function does:

library(dplyr)
y <- c(NA, 2, NA, NA, 5)
z <- c(NA, NA, 3, 4, NA)
w <- c(10, 20, 30, NA, NA)
coalesce(y, z, w)
## [1] 10  2  3  4  5

All vectors must be of the same type; if you try to mix different types, it will result in an error:
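A common use of coalesce() is filling missing values with a default: a length-one vector is recycled to the length of the other argument. The sketch below shows that pattern with the y vector from the example. For the type rule, it only checks that mixing a numeric and a character vector raises an error, using tryCatch(), because the exact error message differs between dplyr versions; the replacement value "missing" is an arbitrary choice.

```r
library(dplyr)

y <- c(NA, 2, NA, NA, 5)

# A single value is recycled, so every NA in y becomes 0
y_filled <- coalesce(y, 0)
y_filled

# Mixing a numeric and a character vector fails; we only record that an
# error was raised, since the message text depends on the dplyr version
mixed <- tryCatch(coalesce(y, "missing"), error = function(e) "error")
mixed
```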

R's valentine

14 February 2017
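There are several ways of plotting a heart-shaped function. The following is a simple one using ggplot2:

library(ggplot2)

heart <- function(x) {
  # sqrt(cos(x)) produces NaN (with a warning) wherever cos(x) < 0
  h <- suppressWarnings(sqrt(cos(x))*cos(200*x) + sqrt(abs(x)) - 0.7*(4 - x^2)^0.01)
  # replace those NaN values with 0 so every point can be drawn
  h[which(is.nan(h))] <- 0
  return(h)
}

# evaluate heart() at 15000 points between -2 and 2 and draw them as points
ggplot(aes(x), data = data.frame(x = c(-2, 2))) +
  stat_function(fun = heart, color = "red3", geom = "point", n = 15000, alpha = 0.3)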
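Why the NaN handling matters: sqrt(cos(x)) is only real where cos(x) >= 0, that is for |x| <= pi/2 (about 1.571), so near both ends of the plotted range [-2, 2] the whole expression is NaN and heart() returns 0 there. The short check below repeats the function from the post to illustrate this; the sample points are arbitrary choices.

```r
# heart() as defined in the post
heart <- function(x) {
  h <- suppressWarnings(sqrt(cos(x))*cos(200*x) + sqrt(abs(x)) - 0.7*(4 - x^2)^0.01)
  h[which(is.nan(h))] <- 0
  return(h)
}

# cos(x) < 0 for pi/2 < |x| <= 2, so sqrt(cos(x)) is NaN there
cos(c(-1.9, 1.9)) < 0

# ...and heart() replaces those NaN values with 0
heart(c(-1.9, 1.9))

# Inside [-pi/2, pi/2] the values are ordinary finite numbers
xs <- seq(-1.5, 1.5, by = 0.5)
all(is.finite(heart(xs)))
```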

R's tidyverse

20 January 2017
Hadley Wickham’s universe of packages, along with pipes (%>%) from the magrittr package, has transformed the way I use R. They create a new dialect for R and provide a large set of tools for data manipulation and visualisation. They were formerly and informally known as the Hadleyverse, but the author prefers the term tidyverse. This set of packages can now be installed and loaded using a wrapper package called tidyverse. One way to learn more about the tidyverse is to watch one, or all, of the several talks given by Hadley:

Hadley Wickham: Managing many models with R - YouTube
Making Data Analysis Easier - YouTube
Pipelines for Data Analysis - YouTube
Stanford Seminar - Expressing yourself in R - YouTube
Hadley Wickham’s “dplyr” tutorial at useR 2014 (1/2) - YouTube
Hadley Wickham’s “dplyr” tutorial at useR 2014 (2/2) - YouTube

Or, even better, read his latest book: R for Data Science.
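For readers new to the pipe, a minimal sketch: x %>% f() is evaluated as f(x), so a chain of pipes reads left to right instead of inside out. The example loads magrittr directly so it does not need the whole tidyverse; the install.packages() and library(tidyverse) lines are left commented out because they require an internet connection or an existing installation. The vector values are arbitrary.

```r
# One-off installation and loading of the wrapper package (commented out):
# install.packages("tidyverse")
# library(tidyverse)

library(magrittr)  # provides the %>% pipe

values <- c(1, 4, 9)

# Nested call, read inside out
nested <- sum(sqrt(values))

# Same computation with pipes, read left to right
piped <- values %>% sqrt() %>% sum()

piped
```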

2015 Summer School in Data Analysis

14 June 2015
This summer Cristina Amado, João Cerejeira, Luís Aguiar-Conraria, Miguel Portela, Priscila Ferreira and I will be teaching at the UMinho-Exec Summer School in Data Analysis. The event will run from Monday, August 31, until Friday, September 11, at the School of Economics and Management, Braga, Portugal. The event is composed of a selected set of intensive courses, designed to enhance methodological skills in data analysis on Regression and Causality, Panel Data, Survival Analysis, Wavelet Analysis, Downside Risk Measures, Financial Risk Management and Financial Market Volatility. Introductory courses to the statistical packages Stata and R are also available. You can find out more about the instructors of the courses: