Sat 03 June 2017 | tags: R, -- (permalink)

Yesterday a colleague told me that he was surprised by the speed differences between R, Julia and Stata when estimating a simple linear model on a large number of observations (10 million). Julia and Stata produced very similar results, although Stata was faster, and both were substantially faster than R. He is an expert and long-time Stata user and was sceptical about the performance difference. I was also surprised. My experience is that R is usually fast for an interpreted language, and most computationally intensive procedures are implemented in a compiled language like C, Fortran ...
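A rough sketch of the kind of comparison described above (the simulated data and coefficients are my own illustration, not the colleague's actual code) is simply to time lm() on 10 million observations:

```r
# Simulate a simple linear model with 10 million observations
set.seed(1)
n <- 1e7
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Time the estimation; system.time() reports elapsed seconds
system.time(fit <- lm(y ~ x))

coef(fit)  # estimates should be close to the true values (2, 3)
```

The equivalent one-liners in Julia and Stata would fit the same model; the interesting part is comparing the elapsed times across the three systems.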

Fri 26 May 2017 | tags: R, -- (permalink)

Recently Mike Croucher blogged about Microsoft Azure’s free Jupyter notebooks. He showed that the computational power provided by Microsoft Azure Notebooks is quite considerable. This is a free cloud service, with Jupyter notebooks powering the interface. Within those notebooks we can use Python (2.7 and 3.5.1), F# and R (3.3), and we have the ability to install packages if needed.

Although there is a 4 GB memory limit, the notebook has access to fast processors, eight in fact. I was curious to see if the service allowed parallel computing, and to my surprise ...
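A quick, generic way to check whether such a service exposes multiple cores to R (a sketch using the base parallel package, not the code from the post) is:

```r
library(parallel)

# How many cores does this R session see?
n_cores <- detectCores()

# A trivial parallel computation spread over the available cores
cl <- makeCluster(n_cores)
res <- parSapply(cl, 1:8, function(i) i^2)
stopCluster(cl)

res  # squares of 1..8, computed on the worker processes
```

If detectCores() reports eight cores and the cluster starts cleanly, parallel backends such as parallel or foreach/doParallel should work in the notebook.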

Sat 29 April 2017 | tags: R, -- (permalink)

In the previous post I described the first five of the functions introduced by dplyr 0.5.

In this post, I’ll describe the others. Meanwhile, the next version of dplyr is just around the corner, and will also bring new features.


The recode() function, as the name suggests, allows the recoding of a vector of values. There is also a similar function for factors: recode_factor().

Let’s take the following data_frame:

d_f <- data_frame(x = c(1 ...
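The original example is truncated above, so here is a minimal sketch of recode() along the same lines (the values and labels are my own illustration):

```r
library(dplyr)

x <- c(1, 2, 3, 2, 1)

# Map each value to a label; values not listed would become NA
# (the old values are supplied as backticked names)
labels <- recode(x, `1` = "low", `2` = "mid", `3` = "high")

labels  # -> "low" "mid" "high" "mid" "low"
```

recode_factor() works the same way but returns a factor whose levels follow the order of the replacements.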

Sat 11 March 2017 | tags: R, -- (permalink)

dplyr version 0.5 introduced several new functions:

Let’s take a look at the first five.



The coalesce() function takes two or more vectors as arguments and finds the first non-missing value at each position. It serves the same purpose as the SQL COALESCE function.

It is easy to illustrate what the function does with a simple example:

y <- c(NA, 2, NA, NA, 5)
z <- c(NA, NA, 3, 4, NA)
w <- c(10, 20, 30, NA, NA)
coalesce ...
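The call itself is truncated above; with those three vectors, a full invocation would presumably look like the following sketch, taking the first non-missing value at each position:

```r
library(dplyr)

y <- c(NA, 2, NA, NA, 5)
z <- c(NA, NA, 3, 4, NA)
w <- c(10, 20, 30, NA, NA)

# Position by position: y is checked first, then z, then w
result <- coalesce(y, z, w)

result  # -> 10 2 3 4 5
```

A position stays NA only if every vector is missing there, which is exactly how SQL's COALESCE behaves across columns.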

Tue 14 February 2017 | tags: R, -- (permalink)

There are several ways of plotting a heart-shaped function. The following is a simple one using ggplot2:


library(ggplot2)

heart <- function(x) {
  h <- suppressWarnings(sqrt(cos(x)) * cos(200 * x) + sqrt(abs(x))
                        - 0.7 * (4 - x^2)^0.01)
  h[which(is.nan(h))] <- 0
  h
}

ggplot(data = data.frame(x = c(-2, 2)), aes(x)) +
  stat_function(fun = heart, color = "red3",
                geom = "point", n = 15000, alpha = 0.3)

[Figure: the heart function plotted as points]

Fri 20 January 2017 | tags: R, -- (permalink)

Hadley Wickham’s universe of packages, along with pipes (%>%) from the magrittr package, has transformed the way I use R. Together they create a new dialect of R and provide a large set of tools for data manipulation and visualisation.

These packages were formerly, and informally, known as the Hadleyverse, but the author prefers the term tidyverse. The whole set can now be installed and loaded using a wrapper package, also called tidyverse.
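Installing and attaching the whole set is then a one-liner each (standard CRAN usage; the install only needs to run once):

```r
# Install the wrapper package from CRAN (run once)
install.packages("tidyverse")

# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, ...)
library(tidyverse)
```

Loading tidyverse attaches the core packages in one step instead of a separate library() call for each.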

One way to learn more about tidyverse is to watch one, or all, of the several talks given by Hadley:

Sun 14 June 2015 | tags: r, -- (permalink)

This summer Cristina Amado, João Cerejeira, Luís Aguiar-Conraria, Miguel Portela, Priscila Ferreira and I will be teaching at the UMinho-Exec Summer School in Data Analysis. The event will run from Monday, August 31, until Friday, September 11, at the School of Economics and Management, Braga, Portugal. It is composed of a selected set of intensive courses designed to enhance methodological skills in data analysis, covering Regression and Causality, Panel Data, Survival Analysis, Wavelet Analysis, Downside Risk Measures, Financial Risk Management and Financial Market Volatility. Introductory courses to the statistical packages Stata and R are also available.

You can find more ...