Introduction to Data Analysis

François Briatte and Ivaylo Petev
Sciences Po, Euro-American Campus
Spring 2013

This course is an introduction to analyzing data with the R software. It features some mathematics and statistics as well as some statistical computing and data visualization. You will need a laptop with an Internet connection to follow the class.

To get started, download the entire course; it is not a large download. To see what the course material contains before downloading it, browse it on GitHub first.

Part 1: Introduction to Statistical Computing. The course starts with one month dedicated to setting up R and learning its basic functionality. All course logistics will be discussed during these weeks.

1. Setup

Readings: Kabacoff, ch. 1, and Teetor, ch. 1 and 3.


2. Objects

Readings: Teetor, ch. 2 and 5.


3. Functions

Readings: Kabacoff, ch. 5, and Teetor, ch. 8.


4. Data

Readings: Kabacoff, ch. 4, and Teetor, ch. 4 and 6-7.


Part 2: Introduction to Statistical Analysis. The course continues by showcasing some statistical techniques, from finding clusters of related data to modelling relationships between several variables.

5. Clusters

Readings: Kabacoff, ch. 14, and Teetor, ch. 13.4, 13.6 and 13.9.


6. Distributions

Readings: Kabacoff, ch. 6, Teetor, ch. 10, and Urdan, ch. 4-6. See also Urdan, ch. 1-3, if you have forgotten everything about statistics.


7. Differences

Readings: Kabacoff, ch. 7, Teetor, ch. 9, and Urdan, ch. 7, 9 and 14.


8. Models

Readings: Kabacoff, ch. 8 and 11, Teetor, ch. 11, and Urdan, ch. 8, 10 and 13. Skim the ANOVA material and focus on OLS (simple and multiple linear regression).


Part 3: Introduction to Data Visualization. The course ends by focusing on the graphic dimension of quantitative data. We will also try to invite guest speakers to talk about how they use data in their work.

9. Visualization in time: Time series

Reading: Teetor, ch. 14. Focus on detrending, and read the ARIMA sections only if you plan to earn millions as a financial analyst.


10. Visualization in space: Maps

Special guest: Joël Gombin on reproducible science.


11. Networks

Special guest: Alexandre Léchenet on data-driven journalism.


12. Data-driven advances

Special guest: Samuel Goëta will speak in the Distinguished Lecture Series on open data and open government.


We're done!

Thanks to Baptiste Coulmont, Joël Gombin and Timothée Poisot for very valuable advice and comments, to GitHub for hosting and to users at StackExchange for coding assistance.

Special thanks to the Sciences Po Reims staff and students for their unfailing support.

Inspired by Christopher Adolph, Dave Armstrong, Christopher Gandrud, Andrew Gelman, Rebecca Nugent, Gaston Sanchez, Cosma Shalizi, David Sparks and Hadley Wickham.

This course has its own GitHub repository; fork at will. This HTML version was compiled from source on Thursday, January 9, 2014.