FAIRbioinfo for bioinformaticians

Introduction to the tools of reproducibility in bioinformatics

C. Hernandez1, T. Denecker1, J. Sellier2, G. Le Corguillé2, C. Toffano-Nioche1
1 Institute for Integrative Biology of the Cell (I2BC) UMR 9198, Université Paris-Sud, CNRS, CEA, 91190 Gif-sur-Yvette, France
2 IFB Core Cluster taskforce
Sept. 2020

Literate programming

Introduction

What is literate programming?

Let us change our traditional attitude to the construction of programs:
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

— Donald E. Knuth, Literate Programming, 1984

Definition

“Literate programming is a programming paradigm introduced by Donald Knuth in which a computer program is given an explanation of its logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which compilable source code can be generated.”
Donald Knuth, 1984.

Wikipedia
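In Knuth's original WEB system, the step that extracts compilable source code from the literate document is called tangling (its counterpart, weaving, produces the formatted documentation). Below is a minimal sketch of tangling in Python, assuming the noweb chunk syntax that Sweave also uses: a line `<<name>>=` opens a code chunk and a line containing only `@` returns to prose. The function `tangle` and the example document are hypothetical; unlike real noweb, this sketch simply concatenates the chunks in order and does not expand references between named chunks.

```python
import re

# Noweb/Sweave-style chunk markers (assumed syntax for this sketch):
#   <<optional name>>=   opens a code chunk
#   @                    closes it and returns to documentation
CHUNK_START = re.compile(r"^<<.*>>=\s*$")


def tangle(literate_source: str) -> str:
    """Return the code lines of a literate document, in order, without the prose."""
    code_lines = []
    in_code = False
    for line in literate_source.splitlines():
        if CHUNK_START.match(line):
            in_code = True            # entering a code chunk
        elif line.strip() == "@":
            in_code = False           # back to documentation
        elif in_code:
            code_lines.append(line)   # keep code, drop prose
    return "\n".join(code_lines) + "\n"


document = """\
We first define a function computing the GC content of a DNA sequence.
<<gc-function>>=
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)
@
We then apply it to a toy sequence.
<<example>>=
print(gc_content("ATGCGC"))
@
"""

source = tangle(document)
print(source)                  # pure Python, no prose left
namespace = {}
exec(source, namespace)        # the tangled program runs as-is
```

Executing the tangled source prints 0.6666666666666666, the GC fraction of the toy sequence (4 of its 6 bases are G or C).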

What does it look like?

An interactive programming interface for combining natural language and computer code.

In one file:

Why use literate programming frameworks?

Use cases:

Example of an article written entirely in a notebook

Markup and markdown

Definition:

A markup language uses tags to define elements within a document.

Three different types and their uses:

Markdown language

Markdown is a lightweight markup language.
Designed to be:

You have probably already seen it on GitHub (README files); Wikipedia uses a similar lightweight markup (wikitext)…

Github guide
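To see how a lightweight syntax maps onto the tags of a markup language, here is a toy converter for a tiny Markdown subset (headings, bold, italics, inline code), written in Python with the standard library only. It is an illustration, not a Markdown implementation: the function name `md_to_html` is hypothetical, each non-blank line becomes its own paragraph, and nested or edge-case syntax is ignored. Real tools such as pandoc handle far more of the language.

```python
import html
import re


def md_to_html(text: str) -> str:
    """Convert a tiny Markdown subset to HTML (illustrative only)."""
    html_lines = []
    for line in text.splitlines():
        if not line.strip():
            continue                                   # blank lines just separate blocks
        line = html.escape(line)                       # protect <, >, & in the text
        # Inline markup; ** must be handled before * so bold is not read as italics.
        line = re.sub(r"`([^`]+)`", r"<code>\1</code>", line)
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        heading = re.match(r"(#{1,6}) (.*)", line)     # '# Title' ... '###### Title'
        if heading:
            level = len(heading.group(1))
            html_lines.append(f"<h{level}>{heading.group(2)}</h{level}>")
        else:
            html_lines.append(f"<p>{line}</p>")        # simplification: one paragraph per line
    return "\n".join(html_lines)


markdown_text = "# Results\nThe *E. coli* sample has **high** GC content.\nRun `fastqc` first."
print(md_to_html(markdown_text))
```

The call prints `<h1>Results</h1>` followed by two `<p>` elements carrying the `<em>`, `<strong>` and `<code>` tags: the few characters typed in Markdown stand for the heavier HTML tags.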

But how is this useful for literate programming?

When you want to weave together code (to be interpreted) and formatting information, a lightweight language is precisely what you need for the formatting part.

The challengers

To be honest, two main frameworks are currently used in bioinformatics:
RMarkdown and Jupyter

RMarkdown

In the beginning, there was nothing.

Then came Sweave.
Leisch, Friedrich (2002). “Sweave, Part I: Mixing R and LaTeX: A short introduction to the Sweave file format and corresponding R functions”
And people saw that the path would be long…

knitr (2011)

“The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package”

https://yihui.org/knitr/

“When you run render, R Markdown feeds the .Rmd file to knitr, which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by pandoc which is responsible for creating the finished format.”
https://rmarkdown.rstudio.com
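The knitr step of this pipeline (execute the chunks, write a new Markdown file containing both code and output) can be mimicked in a few lines. Below is a minimal sketch in Python rather than R, assuming Markdown code fences tagged `python`; the names `knit` and `FENCE` are hypothetical, chunks share one namespace (as the chunks of an .Rmd share one R session), and printed output is prefixed with `## `, mirroring knitr's default `comment` chunk option. The final pandoc conversion is left out.

```python
import contextlib
import io

FENCE = "`" * 3  # three backticks, built here so the example can sit inside a fenced block


def knit(markdown_source: str) -> str:
    """Run the python chunks of a Markdown document and insert their printed output."""
    namespace = {}        # shared by all chunks, like one R session for a whole .Rmd
    out_lines = []
    chunk = None          # None while reading prose, list of code lines inside a chunk
    for line in markdown_source.splitlines():
        if chunk is None and line.strip() == FENCE + "python":
            chunk = []                                   # a code chunk starts
            out_lines.append(line)
        elif chunk is not None and line.strip() == FENCE:
            out_lines.append(line)                       # keep the code itself
            captured = io.StringIO()
            with contextlib.redirect_stdout(captured):
                exec("\n".join(chunk), namespace)        # execute the chunk
            output = captured.getvalue().rstrip("\n")
            if output:                                   # add output as a separate block
                out_lines.append(FENCE)
                out_lines.extend("## " + text for text in output.splitlines())
                out_lines.append(FENCE)
            chunk = None
        else:
            if chunk is not None:
                chunk.append(line)
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"


report = "\n".join([
    "# GC content",
    FENCE + "python",
    "seq = 'ATGCGC'",
    "gc = (seq.count('G') + seq.count('C')) / len(seq)",
    FENCE,
    "The rounded value is:",
    FENCE + "python",
    "print(round(gc, 2))",
    FENCE,
])
print(knit(report))
```

The first chunk prints nothing, so only the second one gains an output block, `## 0.67`, placed right after its code; the variable `gc` is visible there because both chunks run in the same namespace.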

Integrated into RStudio, an IDE for R.

R Notebooks

and more…

Jupyter

A bit of history…

https://jupyter.org/

What can it do?
Everything (except coffee)

But what is it exactly?
A web-based interactive computational environment.

Dashboard

Notebook editor
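Behind the notebook editor, an `.ipynb` file is a JSON document holding an ordered list of cells, each either Markdown prose or code (with its stored outputs), which is how prose and code end up in one file. A minimal sketch building such a file by hand with Python's standard `json` module, assuming the nbformat version 4 layout; the helpers `markdown_cell` and `code_cell` are hypothetical, and in practice the official `nbformat` package would normally be used instead.

```python
import json


def markdown_cell(text: str) -> dict:
    """A prose cell (nbformat v4 layout)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> dict:
    """A code cell that has not been executed yet (no outputs stored)."""
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": code}


notebook = {
    "cells": [
        markdown_cell("# GC content\nWe compute the GC content of a toy sequence."),
        code_cell("seq = 'ATGCGC'\n(seq.count('G') + seq.count('C')) / len(seq)"),
    ],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3",
                                "language": "python"}},
    "nbformat": 4,
    "nbformat_minor": 4,   # minor version 5 would additionally require an "id" per cell
}

ipynb_text = json.dumps(notebook, indent=1)
print(ipynb_text)
```

Saved under a name ending in `.ipynb`, this text should open in Jupyter as a two-cell notebook; once the code cell is run and the file saved, its result is stored in the cell's `outputs` list.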

Project Jupyter

Part of the Python ecosystem (like Snakemake, conda…)

Integrated with GitHub since 2015 (notebooks are rendered directly on GitHub)

nbviewer: a static renderer for Jupyter notebooks

https://nbviewer.jupyter.org/
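A static renderer does not run any code: it only displays the cell sources and the outputs already stored in the notebook file. A minimal sketch of that idea in Python, assuming an nbformat version 4 notebook already loaded as a dictionary; the function `render_static` is hypothetical, Markdown cells are merely escaped rather than converted, and only text outputs are handled.

```python
import html


def render_static(notebook: dict) -> str:
    """Render an nbformat v4 notebook to HTML from its stored outputs, without executing it."""
    parts = []
    for cell in notebook["cells"]:
        source = cell["source"]
        if isinstance(source, list):               # nbformat allows a string or a list of lines
            source = "".join(source)
        if cell["cell_type"] == "markdown":
            # A real renderer would convert Markdown to HTML; here it is only escaped.
            parts.append(f"<div class='markdown'><pre>{html.escape(source)}</pre></div>")
        elif cell["cell_type"] == "code":
            parts.append(f"<pre><code>{html.escape(source)}</code></pre>")
            for output in cell.get("outputs", []):
                if output.get("output_type") == "stream":
                    text = output["text"]
                elif output.get("output_type") in ("execute_result", "display_data"):
                    text = output["data"].get("text/plain", "")
                else:
                    continue                       # errors, images... skipped in this sketch
                if isinstance(text, list):
                    text = "".join(text)
                parts.append(f"<pre class='output'>{html.escape(text)}</pre>")
    return "\n".join(parts)


saved_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# GC content"},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "source": "print(round(4 / 6, 2))",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "0.67\n"}]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
print(render_static(saved_notebook))
```

The `print` call in the code cell is never executed here; the `0.67` in the page comes from the output saved in the file, which is why a static viewer needs no kernel.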

Jupyter + Docker = Binder

https://mybinder.org/
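nbviewer and Binder are both reached through URLs built from a GitHub repository. A small Python sketch, assuming the URL patterns publicly documented by nbviewer (`/github/<owner>/<repo>/blob/<ref>/<path>`) and Binder (`/v2/gh/<owner>/<repo>/<ref>`); the repository names are hypothetical, and both patterns should be checked against the services' current documentation. nbviewer only displays the stored notebook, whereas Binder builds a Docker image from the repository's environment files and launches a live Jupyter server in it.

```python
from urllib.parse import quote


def nbviewer_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Static view of a notebook hosted on GitHub (no kernel is started)."""
    return f"https://nbviewer.jupyter.org/github/{owner}/{repo}/blob/{ref}/{quote(path)}"


def binder_url(owner: str, repo: str, ref: str) -> str:
    """Interactive session: Binder builds an image from the repository and starts Jupyter."""
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


# Hypothetical repository, for illustration only.
print(nbviewer_url("my-lab", "my-analysis", "master", "notebooks/GC content.ipynb"))
print(binder_url("my-lab", "my-analysis", "master"))
```

`quote` percent-encodes the space in the file name while leaving the `/` separators intact.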

More to come: JupyterLab 1.0

Conclusion?

Who’s the best?

It depends…

Practical session

Analysis workflow

[Figure: analysis workflow (green = input, blue = tool)]

Savoir FAIRe