Setting up a reproducible data analysis project in R - featuring GitHub, {renv}, {targets} and more

Abstract

This talk is a follow-up to last year’s presentation on the relevance of software engineering best practices in statistics and data science. Here, I will go through how I set up a new data analysis project in R, following some of these best practices to ensure reproducibility and reusability. In particular, I will demonstrate how I use GitHub to version-control the code, and how I organise a typical analysis directory. I will also showcase some very useful R packages, such as {renv} to document the computational environment and {targets} to turn the scripts into a reproducible analytical pipeline.

Recording and slides

The slides and a corresponding blog post are available.

Event information

See the event information on the SSA website.