A personal R to-do list for 2021

This note lists the main things that I will be doing with R next year.

I took something of a break from R over the past 18 months. I plan to change that this coming year, and have compiled the following list of things that I want to explore or come back to.

R Markdown

While I have fully transitioned towards “tidy data” and its wonderful packages, I am still not the type of R user who works in R Markdown documents (notebooks), as David Robinson so brilliantly illustrates in his videos.

I hope to get there next year, because it makes sense from a reproducibility perspective.

Panel models

Most of my quantitative methods courses involve either basic frequentist regression models, or models used on panel data by political scientists, who have imported a lot of their modelling standards from econometrics and political economy. The models and replication material that come with published articles are mostly coded in Stata.

One thing that I need to do this coming year is to check how easy (or difficult) it is, in 2021, to replicate those Stata-coded panel data models in R. This will involve looking mostly at cross-section time-series (CSTS) data, usually measured at the country level, and fitting some of the regression models available in Stata.

Web scraping

I have been wanting to upgrade my Web scraping skills for some time, and while my initial plan was to learn enough Python to switch to that language (and its excellent libraries) for Web scraping, I keep coming back to R because I lack proper learning time and have too few professional incentives to code in Python.

The specific thing that I will be coming back to is headless browsing.

Network models

Network models, and exponential random graph models (ERGMs) in particular, have improved a lot in the past few years. I have followed the literature at a distance, and need to dive into it again, especially for the part that focuses either on generalizing ERGMs beyond binary responses, or on taking time (temporal dependence) into account.

Bayesian models

I received my copy of Gelman, Hill and Vehtari's Regression and Other Stories this summer, and do not want that book to end up with Harrell's Regression Modeling Strategies and McElreath's Statistical Rethinking on my list of books that I want to read but might never get to. (The list is much longer than that, and also has everything by Hastie and Tibshirani on it.)

Let's call this to-do item my annual attempt at transitioning further towards Bayesianism, which is made easy in R thanks to the rstanarm package, to Bürkner's brms package, to Harrell's rmsb package, and to McElreath's rethinking package.

Machine learning

The tidymodels framework and Julia Silge's videos offer a nice invitation to dive deeper into those things that many of us explored when "machine learning" was the keyword that any aspiring methodologist (or data scientist, or whatever else) had to know something about.

Going back to learning about machine learning is something that I look forward to, and which I plan to do while looking again at Cosma Shalizi's course on data mining from 2019.

The to-do list above will have to find its place alongside coding in Stata for one of my oldest courses, plus reading about many other things that have little to do with R or statistics. Let's see if that will happen.

Update (December 14, 2020): corrected the authors of the brms package, with thanks to Dieter Menne, who spotted and reported the error.

  • First published on December 13th, 2020