R / Notes
This page contains a collection of short notes on using the R statistical software.
Subscribe to updates via its Atom feed, or read other R blogs at R-Bloggers.
This year's Ihaka Lecture is about making R work in government. It was delivered by Peter Ellis, the Director of the Statistics for Development Division at the Pacific Community (SPC).
A security issue was found in how the R language serializes objects, and has since been patched.
This blog has been silent for a while, and the Covid-19 pandemic has forced me to ditch my R to-do list for 2021. I did, however, manage to assemble a few R-related things in the past couple of years. This note documents the main one, a Data Science with R (and RStudio) course aimed at social scientists.
This note lists the main things that I will be doing with R next year.
This note documents how the sample() function has changed since R 3.6.0, and how to reproduce its previous behaviour.
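As a rough illustration of the change (a minimal sketch, not the code from the note itself): R 3.6.0 switched the default sampling algorithm from "Rounding" to "Rejection", and the older behaviour can be requested through the `sample.kind` argument of `RNGkind()`. The seed and the sample size below are arbitrary choices.

```r
# Default behaviour in R >= 3.6.0: the "Rejection" sampler
set.seed(42)
new_draw <- sample(10, 3)

# Switch back to the pre-3.6.0 "Rounding" sampler.
# R emits a warning here because that sampler is slightly non-uniform.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(42)
old_draw <- sample(10, 3)

# Restore the current default so later code is unaffected
RNGkind(sample.kind = "Rejection")

print(new_draw)
print(old_draw)
```

With the same seed, the two draws will generally differ, which is why scripts written before R 3.6.0 may need the "Rounding" setting to reproduce their earlier results.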
As a complement to the previous note, I have collected every single entity that sponsored a national R conference in France over the last decade.
This note celebrates useR! 2019 in Toulouse by listing a few links about R conferences in France and some resources for French R users.
This note takes a look at some of the new features in Stata 16, which was released this month, and compares those to their R equivalents.
This note lists a few places where one can ask R-related questions and get an answer, usually in no more than a few hours.
The only point of this note is to invite you to fill in the R language survey launched by the R Consortium earlier this month.
This note lists a few of the organizations that are pushing the R language forward, as of early 2017. R is a happy language right now.
The R language is a ‘DSL’ – a domain-specific language. The domain that it deals with, however, is not well-defined. In this note, I call R a “data science language” and link to a few resources that make the point better than I could.
As a complement to my note on R as a data science language, this note lists ten other technologies that you might want to learn to use, or at least monitor, if you are interested in learning data science.
This note briefly introduces the tidykml package, which turns basic KML geometries into tidy data frames that can be visualized with ggplot2.
Per request from a couple of students in a course on open data that I contribute to, here's a short guide to the "why" and "how" questions about (Web) scraping, with links to examples to illustrate the usefulness of the technique.
Note to self – Remember to serialize R objects as RDS files when it makes sense.
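A minimal sketch of what that looks like in practice, assuming an arbitrary object (here a linear model fitted on the built-in `mtcars` data) and a temporary file path:

```r
# Any single R object can be serialized; a fitted model is used as an example
model <- lm(mpg ~ wt, data = mtcars)

# Write the object to an RDS file (a temporary path is assumed here)
path <- tempfile(fileext = ".rds")
saveRDS(model, path)

# Read it back under any name: RDS files store the object, not its name
restored <- readRDS(path)

# The restored model carries the same coefficients as the original
print(all.equal(coef(model), coef(restored)))
```

Unlike `save()` and `load()`, which restore objects under their original names, `readRDS()` returns the object so that it can be assigned to whatever name suits the current script.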
This note explains how to compile Hadley Wickham's ggplot2 book on Mac OS.
My collection of R notes is now slightly over one year old. This note reflects on how useful the exercise of blogging about R has been so far, and answers some of the questions that I have received about it.
This note is a follow-up to the previous one. It shows how to use student-submitted keywords to find clusters of shared interests between the students.
This note is addressed to the GLM Fall 2016 students who are currently taking my Statistical Reasoning and Quantitative Methods course at Sciences Po in Paris.
Inspired by the awesome R list that I mentioned a few months ago, I have started the awesome-network-analysis list, which features a large section on R packages.
This note is a ~~shameless plug~~ demo of the ggnetwork package, which provides several geoms to plot network objects with ggplot2, and which just got published on CRAN. See the package vignette for a more detailed guide to its functionalities.
This note lists a few of the mistakes that one can make before submitting a package to CRAN. The list is based on my own mistakes when submitting the ggnetwork package to CRAN for the first time (see this other note for comments about the package itself).
Over at his blog “One Tip per Day”, Xianjun Dong has produced an excellent list of “15 Practical Tips For a Bioinformatician”. This note is my own version of these tips, aimed at social scientists who need to write sustainable (i.e. reproducible) code for either individual or collective research projects.
This note discusses the results of this project, which collects legislative data from several European parliaments (plus Israel). The project is coded in R, which has had consequences for its development.
This note explains how to use an application launcher along with text expansion and shell commands to accomplish a few specific tasks that can be useful to R users.
This note documents the small but growing microverse of R packages on CRAN to produce various forms of exponential random graph models (ERGMs), which are a kind of modelling strategy akin to logistic regression for dyadic data.
This note shows a quick way to draw convex hulls, using dplyr and ggplot2.
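One way this could look, as a sketch under assumptions rather than the note's own code: the built-in `iris` data and its `Species` grouping are stand-ins for real data, and base R's `chull()` supplies the hull vertices for each group.

```r
library(dplyr)
library(ggplot2)

# For each group, keep only the rows that sit on the convex hull.
# chull() returns the row indices of the hull vertices in clockwise order,
# so the sliced rows can be passed to geom_polygon() as they are.
hulls <- iris %>%
  group_by(Species) %>%
  slice(chull(Sepal.Length, Sepal.Width))

# Draw the hulls as translucent polygons underneath the points
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width,
                      colour = Species, fill = Species)) +
  geom_polygon(data = hulls, alpha = 0.2) +
  geom_point()

print(p)
```

Because `slice()` is evaluated within each group, the indices returned by `chull()` refer to rows of that group only, which is what keeps the hulls separate.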
This note shows how to use the httr package to scrape the results of a search form.
This note shows how to use the stringr package to clean a list of full names that need to be turned into unique identifiers, i.e. something that can be assigned as row names to a data frame.
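The general idea can be sketched as follows. The names are made up for illustration, and the cleaning rules (lowercase, underscores, numeric suffixes for duplicates) are assumptions rather than the rules used in the note.

```r
library(stringr)

# Hypothetical input: inconsistent spacing, case and punctuation
full_names <- c("  Anne-Marie  Martin ", "Jean Dupont",
                "JEAN DUPONT", "O'Neil, Sean")

ids <- str_squish(full_names)                # trim and collapse whitespace
ids <- str_to_lower(ids)                     # normalize case
ids <- str_replace_all(ids, "[^a-z]+", "_")  # anything but a-z becomes "_"
ids <- str_replace_all(ids, "^_|_$", "")     # drop leading/trailing "_"

# Base R's make.unique() adds a suffix to repeated identifiers
ids <- make.unique(ids, sep = "_")

# The identifiers are now unique, so they are valid row names
d <- data.frame(name = str_squish(full_names), row.names = ids)
print(d)
```

Names with accented characters would need an extra transliteration step before the `[^a-z]+` replacement, which this sketch leaves out.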
This note describes how I compile(d) my R-related bookmarks on Pinboard.
The RStudio IDE is a central component of the R software ecology that makes it easier to code in R, or to use R with other tools. This note discusses its use in a teaching environment.
Karl Broman has written a nice blog post to recommend writing unit tests for R packages. Here are a few more pointers on how to write these tests.
This note translates the code from an interesting blog post (in French) from Python to R. The code includes a function to compute closeness vitality with the igraph package.
This note describes a few ways to handle network objects (which might be objects of class igraph or network, or data frames representing edge lists) through graphical methods that rely on ggplot2.
The disparity filter algorithm by Serrano et al. is a network reduction technique to identify the ‘backbone’ of a weighted network. This note explains how to implement the algorithm in full, based on existing implementations and on Serrano et al.'s paper.
This note explains how to implement two edge weighting schemes that are relevant to co-authorship networks, based on my work on legislative cosponsorship networks.
Here are a few things that I have learnt while working with R network objects, using the igraph and network + sna packages (the last two packages go well together).
Working with statistical model results in R often means that the user has to learn about the class of the model to further manipulate it. A few packages can help with that.
Qin Wenfeng maintains a “curated list of awesome R frameworks, packages and software”, which also includes links to websites, books and other resources to learn R.