Data Science with R (and RStudio)

This blog has been silent for a while, and the Covid-19 pandemic has forced me to ditch my R to-do list for 2021. I did, however, manage to assemble a few R-related things in the past couple of years. This note documents the main one, a Data Science with R (and RStudio) course aimed at social scientists.

Historical side note

Around two years ago, I was offered the chance to teach R again at Sciences Po, in Paris, in a spirit close to the Stata-based course that I have been teaching there for over ten years.

I first taught R to social scientists in 2013, but had not repeated the experience since then, except through various short and often focused workshops. I almost got to teach such a course in 2017, just as RStudio Desktop was turning 1.0, but that course failed to materialize.

Many things have changed since 2013, and there is now much higher demand to teach R (and RStudio) to social science audiences. R and RStudio have improved a lot, and the tidyverse, which recently turned 2.0 while still changing a lot, has become a core component of most courses, including mine.

Teaching material

My own attempt to teach R, RStudio and the tidyverse in 2023 has been online for a few months, in the form of a GitHub repository with a few wiki pages, including a long list of readings, videos and Web links, and another list of other R courses.

I have also uploaded a tentative syllabus for the course.

The course has only run once so far, and there are many issues with it that I will try to fix in the coming months. The repository is also missing some essential course items (the slides, and the solutions to the exercises), which I am nonetheless happy to share privately by email.

A cool aspect of the course is that another instructor, Kim Antunez, will be teaching her own fork of it in the next few weeks. Kim has invested a lot in turning the course into a full-fledged Quarto website, which I will share in a follow-up post once she is done building it.

My own way of teaching the course is more old-school, as I rely on weekly emails and a shared Google Drive folder. I will, however, put some effort into improving the slides and giving the course a Web page, in order to make it more fully and easily accessible online.

Going forward

I feel that I already have enough material to assemble a more advanced R course for social scientists, but first need to streamline this introductory course a bit more, in order to make the reading list, especially, a bit more focused and manageable.

I also feel that there will soon be more changes to the tidyverse that I will have to take into account. I still, for instance, use the %>% pipe to chain operations, whereas the current trend is to use the native |> pipe, introduced in R version 4.1.0, whenever possible.
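To make the difference between the two pipes concrete, here is a minimal sketch, not taken from the course material. It assumes R 4.1.0 or later and an installed magrittr package (which dplyr and the tidyverse re-export), and it uses the built-in mtcars dataset purely for illustration.

```r
# Assumption: magrittr is installed; it provides the %>% pipe.
library(magrittr)

# magrittr pipe: the left-hand side becomes the first argument
# of the function call on the right-hand side.
classic <- mtcars %>%
  subset(cyl == 4) %>%
  nrow()

# Native pipe (R >= 4.1.0): same chain, no package required.
# Unlike %>%, the right-hand side must be written as a call,
# with parentheses: nrow(), not nrow.
native <- mtcars |>
  subset(cyl == 4) |>
  nrow()

# Both chains count the four-cylinder cars in mtcars.
identical(classic, native)
```

For simple chains like this one, the two pipes are interchangeable. They differ mostly in their placeholders: magrittr uses `.`, whereas the native pipe uses `_`, which only became available in R 4.2.0 and has to be passed to a named argument.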

This note is tangentially related to my previous notes on teaching with RStudio, on R as a data science language and on other technologies for data science.

  • First published on August 23rd, 2023