R in RStudio

Connect to RStudio

RStudio is a graphical interface designed for the R programming language. Let us connect to RStudio, then explore its content.

IFB web RStudio

Click → https://rstudio.cluster.france-bioinformatique.fr

rstudio_ifb

Enter your user name and password, then sign in!

This is an RStudio web server hosted by the French Institute of Bioinformatics (IFB). You cannot install anything there. Your work is stored there (in France) and must be declared to the IFB as a research project.

IFB Jupyter lab

Click → https://jupyterhub.cluster.france-bioinformatique.fr/hub/

Enter your user name and password, then sign in!

jupyter_ifb

In JupyterLab, you can do more than R, but we won’t cover it in this session. Just click on the RStudio button and enjoy. This is an RStudio web server hosted by the French Institute of Bioinformatics (IFB). You cannot install anything there. Your work is stored there (in France) and must be declared to the IFB as a research project.

Use your own RStudio server

Open your favorite terminal, and simply write:

rstudio

And that’s all. This is your local RStudio. No one can easily access it, so it will be more difficult to share your work. You also rely on your (small) computer rather than a big computing cluster. However, you can install anything you want.

RStudio

First sight

rstudio

RStudio displays 4 large panes. Their position may be changed based on your preference. Here are mine:

Pane names and positions

        left           right
upper   Script pane    Environment/History pane
lower   Console pane   Help/Files pane

Info: your four panes may be blank while two of mine are filled with text. We’ll come back to that later.

The console pane (lower left)

console_pane_image

This is a simple R console. Open your bash terminal, enter the following command: R, and you will get the same console.

Warning: Here, we are in an RStudio session powered by the IFB. Your local RStudio might differ: the version of R, the list of available packages, etc. On your local machine, the RStudio console will match the R available in your terminal.

Let’s try to enter the command print():

print("Hello World")
[1] "Hello World"

We just used a function called print. This function tries to print on screen everything provided between the parentheses ( and ). In this particular case, we gave it the character string "Hello World", and the function print successfully printed it on screen!

Now click on Session -> Save Workspace As and save the current work space. What happened in the R console pane? You saw it! A command has been automatically written. For me, it is:

save.image("./SingleCell.RData")

When you need help with R, whether on a function error, on a script result or anything alike, please save your work space and send it to your favorite R developer. It contains every object currently loaded in your session.

Info: There is syntax highlighting, good autocompletion and parameter suggestion. If I ever see anyone writing down a complete command without hitting the tabulation key, then I’ll have to steal their dessert. And I’m always hungry enough for desserts.

The environment/history pane (upper right)

envhit_pane

Environment

This pane has three tabs: Environment, History and Connections.

Environment lists every single variable, object or data loaded in R. This includes only what you typed yourself and does not include environment variables. Example: in your console pane, enter the following command:

zero <- 0;  # May also be written zero = 0

What happened in the Environment pane? You’re right: a variable is now available!

env_my_var

When a more complex object is declared in your work space, then some general information may be available. Example:

small_table <- data.frame("col_a"=c(1, 3), "col_b"=c(2, 4));

You can see the data frame. Click on it to have a preview of the data it contains, then click on the light-blue arrow to have a deeper insight into its content:

df_expanded_env

Now click on Session -> Clear Workspace and see your work disappear. This action cannot be undone. While it is useful to clear the work space from time to time in order to avoid name collisions, it is better to save your work space first.

History

goto_history

This tab is quite important: while you test and search in the console, your history keeps track of each command line you entered. This will definitely help you to build your scripts, to pass your command lines to your coworkers, and to recover from unfortunate errors.

Each history is related to a session. You may see many commands in your history. Some of them are not even listed in your console. RStudio writes every command there, even the ones that were hidden for the sake of your eyes (knitting commands, display commands, help commands, etc.).

Your history has a limit. This limit is set by an environment variable called R_HISTSIZE (standing for: R History Size). It may be checked with the function Sys.getenv() and set with the function Sys.setenv():

Sys.getenv("R_HISTSIZE")
Sys.setenv(R_HISTSIZE = new_number)
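For instance, here is a minimal sketch of that check-then-set sequence. The value "5000" is an arbitrary number chosen for illustration, and the change only lasts for the current R session.

```r
# Read the current value (an empty string means the variable is not set)
Sys.getenv("R_HISTSIZE")

# Set a new limit; "5000" is an arbitrary value chosen for this example
Sys.setenv(R_HISTSIZE = "5000")

# Environment variables are always stored as text
new_limit <- Sys.getenv("R_HISTSIZE")
print(new_limit)
```

Since the value comes back as text, use as.numeric() on it if you need to do maths with it.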

The help/plot/file pane

help_file_pane

Help

This is maybe the most important pane of your R Studio. THIS is the difference between R Studio and another code editor. Search for any function here and not on the internet. This pane shows you the available help for YOUR version of R, YOUR version of a given package.

Different versions of R, or of a package, may have different default parameters and different interfaces. When you copy commands found on the internet, make sure they are not harmful to your computer.

Never ever copy code from the internet right to your terminal. Why? Example: https://www.wizer-training.com/blog/copy-paste

File

Just like in any file explorer, we can move across directories, create folders and files, delete them, etc.

create_working_dir_rstudio

Or use the function dir.create():

dir.create("Intro_R")

You should change your working directory right now:

setwd

Or use setwd():

setwd("Intro_R")
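The two functions above can be combined in a slightly safer way. The sketch below is only an illustration: it works in a temporary folder (given by tempdir(), an assumption made for this example) so that nothing is left behind on your disk.

```r
# Work inside a temporary folder (used only for this illustration)
project_dir <- file.path(tempdir(), "Intro_R")

# Create the folder only if it does not exist yet
if (!dir.exists(project_dir)) {
  dir.create(project_dir)
}

# Move into it; setwd() gives back the previous working directory
previous_dir <- setwd(project_dir)
print(basename(getwd()))

# Go back to where we were before
setwd(previous_dir)
```

Use getwd() at any time to check where you are.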

You can send data from your computer to a distant RStudio (e.g. on the IFB):

upload

You can delete files:

download

or use the function file.remove():

file.remove("example.txt")
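A deletion cannot be undone, so it may be worth checking that the file exists first. Here is a minimal sketch, assuming a throw-away file created in a temporary folder just for this example:

```r
# Create a throw-away file in a temporary folder (illustration only)
example_file <- file.path(tempdir(), "example.txt")
writeLines("Hello World", con = example_file)

# The file is there
existed_before <- file.exists(example_file)

# Delete it only if it exists
if (file.exists(example_file)) {
  file.remove(example_file)
}

# The file is gone
exists_after <- file.exists(example_file)
print(c(existed_before, exists_after))
```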

The script pane (upper left)

script_pane

This is where you write your R scripts. This also accepts other languages (e.g. bash, python, …), but R Studio shines for its R integration.

Please, please! Write your commands in the Script pane, then execute them by hitting CTRL + Enter. This is very much like your lab notebook: the history pane only keeps a limited number of commands in memory, while the script keeps your commands in a file on your disk. You may share it, edit it, comment it, etc.

script_pane

TLDR – Too Long Didn’t Read

Graphic interface presentation :

  1. Write command lines in Script pane (upper left)
  2. Execute command lines by hitting CTRL + Enter from the script pane and see them in the console.
  3. Have a look at the environment and history in the upper right pane.
  4. Search for help in the lower right pane.

R – Basics

Variables and types

Numbers

Remember, a variable is the name given to a value stored in memory. For example, 3, the number three, exists in R. You can store it in a variable with the arrow operator <-:

three <- 3

With the code above, the number 3 is stored in a variable called “three”. You can do this in R with anything. Literally anything. Whole files, pipelines, images, anything.

Maths in R works the same as your regular calculator:

3 + three # Add
[1] 6
1 - 2 # Subtract
[1] -1
4 / 2 # Divide
[1] 2
3 * 4 # Multiply
[1] 12
7 %/% 2 # Floor division
[1] 3
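A few more operators are worth knowing. The short sketch below is an addition to the list above; the variable names are chosen for this example only.

```r
# Modulo: the remainder of the division of 7 by 2
remainder <- 7 %% 2
print(remainder)

# Power: 2 multiplied by itself 3 times
power <- 2 ^ 3
print(power)

# Parentheses change the order of operations, like on paper
without_parentheses <- 1 + 2 * 3
with_parentheses <- (1 + 2) * 3
print(c(without_parentheses, with_parentheses))
```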

Characters

Characters are delimited with quotes, either double " or single ':

four <- "4"
five <- '5'

# The example below is a very good example of
# how to never ever name a variable.
<- "happy"

Mathematics do not work with characters at all … Try the following:

"4" + 1
four + 1

You can try to turn characters into numbers with the function as.numeric:

as.numeric("4") + 1
[1] 5
as.numeric(four) + 1
[1] 5
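The conversion only works when the text actually looks like a number. As a small sketch of what happens otherwise (the word "four" is just an example): R gives back NA, the missing value, and emits a warning, which is silenced here to keep the output short.

```r
# "four" is text that does not look like a number
not_a_number <- suppressWarnings(as.numeric("four"))
print(not_a_number)

# is.na() tells whether a value is missing
conversion_failed <- is.na(not_a_number)
print(conversion_failed)
```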

A function is an R command that is followed by parentheses ( and ). Between these parentheses, we enter arguments. Use the help pane to get information about the arguments expected and/or understood by a given function.

As said previously, you can store any of the previously typed commands in a variable:

five <- as.numeric("4") + 1
two <- 1 + (0.5 * 2)
print(five)
[1] 5
print(two)
[1] 2

Please! Please! Give your variables names understandable by humans. I don’t want to see any of you calling their variables “a”, “b”, “my_var”, …

Tricky Question:

I have two numbers: mysterious_number_7 and suspicious_number_7. When I apply the function print to them, it returns 7. They are both numeric. However, they are not equal. Why?

# Show the value of the variable mysterious_number_7
print(mysterious_number_7)
[1] 7
# Show the value of the number suspicious_number_7
print(suspicious_number_7)
[1] 7
# Check that mysterious_number_7 is a number
is.numeric(mysterious_number_7)
[1] TRUE
# Check that suspicious_number_7 is a number
is.numeric(suspicious_number_7)
[1] TRUE
# Check that values of mysterious_number_7 and suspicious_number_7 are equal
mysterious_number_7 == suspicious_number_7
[1] FALSE
# Check that values of mysterious_number_7 and suspicious_number_7 are identical
identical(mysterious_number_7, suspicious_number_7)
[1] FALSE

We will talk about the difference between equality and identity later.

Answer

This is due to the number of digits displayed by R. You are very likely to have issues with that in the future, like all (bio)informaticians around the world.

mysterious_number_7 <- 7.0000001
suspicious_number_7 <- 7
print(mysterious_number_7)
[1] 7
print(suspicious_number_7)
[1] 7
mysterious_number_7 == suspicious_number_7
[1] FALSE
identical(mysterious_number_7, suspicious_number_7)
[1] FALSE

You can change the number of displayed digits with the function options(), for example options(digits = 10). R accepts at most 22 digits here; a larger value raises an error.
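When two numbers are supposed to be "almost" equal, you can also compare them with a tolerance. The sketch below uses the function all.equal() for that; the tolerance of 1e-6 is an arbitrary choice made for this example.

```r
mysterious_number_7 <- 7.0000001
suspicious_number_7 <- 7

# Show more digits for one call only, without touching the global options
print(mysterious_number_7, digits = 10)

# Strict equality fails ...
strictly_equal <- mysterious_number_7 == suspicious_number_7

# ... but both numbers are equal within a small tolerance
nearly_equal <- isTRUE(
  all.equal(mysterious_number_7, suspicious_number_7, tolerance = 1e-6)
)
print(c(strictly_equal, nearly_equal))
```

Wrapping all.equal() in isTRUE() is the usual idiom, since all.equal() returns a text description rather than FALSE when the values differ.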

Boolean

Aside from characters and numeric, there is another very important type in R (and computer science in general): booleans. There are two booleans: TRUE and FALSE.

3 > 4
[1] FALSE
10 < 2
[1] FALSE
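Booleans can be stored in variables and combined, like any other value. A minimal sketch, with variable names invented for this example:

```r
is_small <- 3 < 4    # TRUE
is_large <- 3 > 10   # FALSE

# AND: TRUE only if both sides are TRUE
both <- is_small & is_large

# OR: TRUE if at least one side is TRUE
either <- is_small | is_large

# NOT: turns TRUE into FALSE, and FALSE into TRUE
not_small <- !is_small

print(c(both, either, not_small))
```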

Data structures

Vector

You can make vectors and tables in R. Don’t panic, there will be no maths in this presentation.

In R, vectors are created with the function c:

one2three <- c("1", "2", "3", "4", "10", "20")
print(one2three)
[1] "1"  "2"  "3"  "4"  "10" "20"

One can select an element of the vector with square brackets [ and ]:

one2three[1]
[1] "1"

One can select multiple elements of a vector with the colon operator : which builds a range of positions:

one2three[2:4]
[1] "2" "3" "4"
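Square brackets accept more than a single position or a range. Here is a short sketch of a few other common selections, reusing the vector defined above:

```r
one2three <- c("1", "2", "3", "4", "10", "20")

# Number of elements in the vector
vector_size <- length(one2three)
print(vector_size)

# Select elements that are not next to each other
first_and_fifth <- one2three[c(1, 5)]
print(first_and_fifth)

# A negative position removes an element from the selection
all_but_first <- one2three[-1]
print(all_but_first)
```

Note that positions start at 1 in R, not at 0 as in many other languages.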

Question 1: Is there a difference between these two vectors ?

c_vector <- c("1", "2", "3")
n_vector <- c( 1,   2,   3 )
Answer

There is a difference indeed: c_vector contains characters, n_vector contains numeric.

print(c_vector)
[1] "1" "2" "3"
print(n_vector)
[1] 1 2 3
print(is.numeric(c_vector))
[1] FALSE
print(is.numeric(n_vector))
[1] TRUE
identical(c_vector, n_vector)
[1] FALSE

You can always use the function identical to test equality strictly: both the values and their types must match.

You may have learned about the operator == for equality. But this is not perfect, look at our example:

c_vector == n_vector
[1] TRUE TRUE TRUE

The operator == is not aware of types: it converts both sides to a common type before comparing them.

Another example, mixing numerics and booleans:

1 == TRUE
[1] TRUE
identical(1, TRUE)
[1] FALSE

In computer science, there is a reason why boolean and integers are mixed. We won’t cover this reason now. It’s out of our scope. Feel free to ask if you’re interested in history and maths.
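When == and identical disagree, the function class() shows the type of each value, which usually explains the difference. A minimal sketch:

```r
# Ask R for the type of each value
type_of_number <- class(1)
type_of_text <- class("1")
type_of_boolean <- class(TRUE)
print(c(type_of_number, type_of_text, type_of_boolean))

# Same value once converted, but not the same type
loosely_equal <- 1 == "1"
strictly_identical <- identical(1, "1")
print(c(loosely_equal, strictly_identical))
```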

Question 2: Can I include both text and numbers in a vector ?

mixed_vector <- c(1, "2", 3)
Answer

No. We cannot mix types in a vector. Either all its content is made of numbers or all its content is made of characters.

Here, all our values have been turned into characters:

print(mixed_vector)
[1] "1" "2" "3"
print(is.numeric(mixed_vector))
[1] FALSE
print(is.character(mixed_vector))
[1] TRUE
print(all(is.numeric((mixed_vector))))
[1] FALSE
print(all(is.character((mixed_vector))))
[1] TRUE

Above, the function all returns TRUE if all the values it receives are TRUE.
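If every value of such a vector looks like a number, the whole vector can be converted back in one call. A small sketch reusing mixed_vector:

```r
mixed_vector <- c(1, "2", 3)

# Convert the whole vector at once
numeric_vector <- as.numeric(mixed_vector)
print(is.numeric(numeric_vector))

# Maths work again
total <- sum(numeric_vector)
print(total)
```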

Data Frame

In R, tables are created with the function data.frame:

one2three4 <- data.frame(c(1, 3), c(2, 4))
print(one2three4)
  c.1..3. c.2..4.
1       1       2
2       3       4

You can rename columns and rows with the functions colnames and rownames, respectively.

colnames(one2three4) <- c("Col_1_3", "Col_2_4")
rownames(one2three4) <- c("Row_1_2", "Row_3_4")
print(one2three4)
        Col_1_3 Col_2_4
Row_1_2       1       2
Row_3_4       3       4

You can access a column and a row of the data frame using square brackets [ and ]. Use the following syntax: [row, column]. Use either the name of the row/column or its position.

# Select a row by its name
print(one2three4["Row_1_2", ])
        Col_1_3 Col_2_4
Row_1_2       1       2
# Select a row by its index
print(one2three4[1, ])
        Col_1_3 Col_2_4
Row_1_2       1       2
# Select a column by its name
print(one2three4[, "Col_1_3"])
[1] 1 3
# Select a column by its index
print(one2three4[, 1])
[1] 1 3
# Select a cell in the table
print(one2three4["Row_1_2", "Col_1_3"])
[1] 1
# Select the first two rows and the first column in the table
print(one2three4[1:2, 1]) 
[1] 1 3

If you like maths, you will remember the order [row, column]. If you’re not familiar with that, then you will do like 99% of all software engineers: you will write [column, row], and you will get an error. Trust me. 99%. Remember, an error is never a problem in informatics.
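Two other tools help when working with data frames: the function dim, which gives the number of rows and columns, and the $ operator, which selects a column by its name. The sketch below rebuilds the same small table in one call so that it can be run on its own.

```r
# Same table as above, with row and column names given at creation
one2three4 <- data.frame(
  Col_1_3 = c(1, 3),
  Col_2_4 = c(2, 4),
  row.names = c("Row_1_2", "Row_3_4")
)

# Number of rows, then number of columns
table_size <- dim(one2three4)
print(table_size)

# Select a column by its name with the $ operator
second_column <- one2three4$Col_2_4
print(second_column)

# Keep the table shape when selecting a single column
first_column_table <- one2three4[, "Col_1_3", drop = FALSE]
print(first_column_table)
```

Without drop = FALSE, selecting a single column gives back a plain vector, as seen in the examples above.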

Question 3: Can I mix characters and numbers in a data frame row ?

Answer

Yes, it is possible:

mixed_data_frame <- data.frame(
  "Character_Column" = c("a", "b", "c"),
  "Number_Column" = c(4, 5, 6)
)
print(mixed_data_frame)
  Character_Column Number_Column
1                a             4
2                b             5
3                c             6

The function str can be used to look at the type of each element in an object.

str(mixed_data_frame)
'data.frame':   3 obs. of  2 variables:
 $ Character_Column: chr  "a" "b" "c"
 $ Number_Column   : num  4 5 6
str(one2three4)
'data.frame':   2 obs. of  2 variables:
 $ Col_1_3: num  1 3
 $ Col_2_4: num  2 4

Question 4: Can I mix characters and numbers in a data frame column?

Answer

No:

mixed_data_frame <- data.frame(
  "Mixed_letters" = c(1, "b", "c"),
  "Mixed_numbers" = c(4, "5", 6)
)
print(mixed_data_frame)
  Mixed_letters Mixed_numbers
1             1             4
2             b             5
3             c             6
str(mixed_data_frame)
'data.frame':   3 obs. of  2 variables:
 $ Mixed_letters: chr  "1" "b" "c"
 $ Mixed_numbers: chr  "4" "5" "6"

Read a table as data frame

Exercise: Use the Help pane to find how to use the function read.csv.

Use the function read.csv to:

  1. open the file ./example_table.csv.
  2. this table has a header (TRUE).
  3. this table has row names in the column called “Gene_id”.

Leave all other parameters at their default values.

Save the opened table in a variable called example_table.

Solution
example_table <- read.csv(
  file="./example_table.csv", 
  header=TRUE, 
  row.names="Gene_id"
)
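If you do not have the file ./example_table.csv at hand, the same call can be tried on a tiny table. The sketch below writes a made-up CSV file in a temporary location first; gene names and values are invented for this illustration.

```r
# Write a tiny CSV file in a temporary location (made-up content)
demo_csv <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Gene_id,Sample1,Sample2",
    "GeneA,1.5,2.5",
    "GeneB,3.0,4.0"
  ),
  con = demo_csv
)

# Read it back: first line is a header, row names come from "Gene_id"
demo_table <- read.csv(
  file = demo_csv,
  header = TRUE,
  row.names = "Gene_id"
)
print(demo_table)

# Check the type of each column
str(demo_table)
```

The column Gene_id is no longer a regular column: its content became the row names of the table.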

Now let us explore this dataset.

We can click on environment pane:

see_in_the_env_pane

And if you click on it:

open_example_table

Be careful, large tables may freeze your session.

Alternatively, we can use the function head which prints the first lines of a table:

head(example_table)
        Sample1   Sample2   Sample3   Sample4
Caml   9.998194 10.004116  9.172489  9.139667
Scamp5 9.995917 10.818685 11.417558 14.907892
Dgki   9.993974 13.664396 16.132275 17.420057
Mas1   9.993956 11.370854 11.233629  9.912863
Apba1  9.992540 14.253438 14.001228 13.654701
Phkg2  9.980898  8.748654  8.714821  9.146529

The function summary describes the dataset per sample:

summary(example_table)
    Sample1          Sample2           Sample3           Sample4       
 Min.   : 9.944   Min.   :  6.838   Min.   :  5.551   Min.   :  5.844  
 1st Qu.: 9.953   1st Qu.:  9.000   1st Qu.: 10.120   1st Qu.:  9.779  
 Median : 9.971   Median : 10.954   Median : 11.326   Median : 11.905  
 Mean   :18.937   Mean   : 19.836   Mean   : 20.828   Mean   : 21.412  
 3rd Qu.: 9.994   3rd Qu.: 12.647   3rd Qu.: 12.650   3rd Qu.: 13.968  
 Max.   :99.784   Max.   :105.077   Max.   :112.188   Max.   :111.820  

Have a look at the summary of the dataset per gene, using the function t to transpose:

head(t(example_table))
             Caml    Scamp5      Dgki      Mas1    Apba1    Phkg2    Timm8b
Sample1  9.998194  9.995917  9.993974  9.993956  9.99254 9.980898  99.78373
Sample2 10.004116 10.818685 13.664396 11.370854 14.25344 8.748654 105.07739
Sample3  9.172489 11.417558 16.132275 11.233629 14.00123 8.714821 112.18819
Sample4  9.139667 14.907892 17.420057  9.912863 13.65470 9.146529 109.09544
            Capn7     Yrdc    Coq10a   Gm27000    Lrrc41    Acadsb    Pdzd11
Sample1  9.976005 9.971093  9.970835  9.965511  9.960667  9.959179  9.952750
Sample2 11.314599 8.905508  8.820582  7.414795  9.961954 11.261520  9.031553
Sample3 11.452421 7.367243 10.449131  7.709008 10.435298 12.336088 10.700876
Sample4 11.692871 9.375526 10.865062 13.126211  9.137375 12.703318 10.832218
          Smarca2   Gm26079     Ptpn5    Rexo2     Ifi27   Snhg20
Sample1  9.952224  99.51466  9.947524  9.94634  9.943989 9.943724
Sample2  9.272424 103.08963 11.090058 13.36391 12.407626 6.838499
Sample3 11.194709 109.85654 11.572261 11.47744 13.591186 5.551247
Sample4 12.117571 111.82050 10.255021 12.29288 14.906542 5.843670
summary(t(example_table))
      Caml            Scamp5            Dgki             Mas1       
 Min.   : 9.140   Min.   : 9.996   Min.   : 9.994   Min.   : 9.913  
 1st Qu.: 9.164   1st Qu.:10.613   1st Qu.:12.747   1st Qu.: 9.974  
 Median : 9.585   Median :11.118   Median :14.898   Median :10.614  
 Mean   : 9.579   Mean   :11.785   Mean   :14.303   Mean   :10.628  
 3rd Qu.:10.000   3rd Qu.:12.290   3rd Qu.:16.454   3rd Qu.:11.268  
 Max.   :10.004   Max.   :14.908   Max.   :17.420   Max.   :11.371  
     Apba1            Phkg2           Timm8b           Capn7       
 Min.   : 9.993   Min.   :8.715   Min.   : 99.78   Min.   : 9.976  
 1st Qu.:12.739   1st Qu.:8.740   1st Qu.:103.75   1st Qu.:10.980  
 Median :13.828   Median :8.948   Median :107.09   Median :11.384  
 Mean   :12.975   Mean   :9.148   Mean   :106.54   Mean   :11.109  
 3rd Qu.:14.064   3rd Qu.:9.355   3rd Qu.:109.87   3rd Qu.:11.513  
 Max.   :14.253   Max.   :9.981   Max.   :112.19   Max.   :11.693  
      Yrdc           Coq10a          Gm27000           Lrrc41      
 Min.   :7.367   Min.   : 8.821   Min.   : 7.415   Min.   : 9.137  
 1st Qu.:8.521   1st Qu.: 9.683   1st Qu.: 7.635   1st Qu.: 9.755  
 Median :9.141   Median :10.210   Median : 8.837   Median : 9.961  
 Mean   :8.905   Mean   :10.026   Mean   : 9.554   Mean   : 9.874  
 3rd Qu.:9.524   3rd Qu.:10.553   3rd Qu.:10.756   3rd Qu.:10.080  
 Max.   :9.971   Max.   :10.865   Max.   :13.126   Max.   :10.435  
     Acadsb           Pdzd11          Smarca2          Gm26079      
 Min.   : 9.959   Min.   : 9.032   Min.   : 9.272   Min.   : 99.51  
 1st Qu.:10.936   1st Qu.: 9.722   1st Qu.: 9.782   1st Qu.:102.20  
 Median :11.799   Median :10.327   Median :10.573   Median :106.47  
 Mean   :11.565   Mean   :10.129   Mean   :10.634   Mean   :106.07  
 3rd Qu.:12.428   3rd Qu.:10.734   3rd Qu.:11.425   3rd Qu.:110.35  
 Max.   :12.703   Max.   :10.832   Max.   :12.118   Max.   :111.82  
     Ptpn5            Rexo2            Ifi27            Snhg20     
 Min.   : 9.948   Min.   : 9.946   Min.   : 9.944   Min.   :5.551  
 1st Qu.:10.178   1st Qu.:11.095   1st Qu.:11.792   1st Qu.:5.771  
 Median :10.673   Median :11.885   Median :12.999   Median :6.341  
 Mean   :10.716   Mean   :11.770   Mean   :12.712   Mean   :7.044  
 3rd Qu.:11.211   3rd Qu.:12.561   3rd Qu.:13.920   3rd Qu.:7.615  
 Max.   :11.572   Max.   :13.364   Max.   :14.907   Max.   :9.944  
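When only the mean matters, the functions rowMeans and colMeans give it directly, per gene or per sample, without transposing. Here is a minimal sketch on a made-up table (names and values are invented for this example):

```r
# A tiny expression table: genes in rows, samples in columns
demo_table <- data.frame(
  Sample1 = c(1, 10),
  Sample2 = c(3, 20),
  row.names = c("GeneA", "GeneB")
)

# Mean per gene (one value per row)
mean_per_gene <- rowMeans(demo_table)
print(mean_per_gene)

# Mean per sample (one value per column)
mean_per_sample <- colMeans(demo_table)
print(mean_per_sample)

# Note: t() turns a data frame into a matrix
transposed_is_matrix <- is.matrix(t(demo_table))
print(transposed_is_matrix)
```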

TLDR – Too Long Didn’t Read

# Declare a variable, and store a value in it:
three <- 3

# Basic operators: + - / * work as intended:
six <- 3 + 3

# Quotes are used to delimit text:
seven <- "7"

# You cannot perform maths on text:
"7" + 8 # raises an error
seven + 8 # also raises an error
six + 8 # works fine

# You can change the type of your variable with:
as.numeric("4") # the character '4' becomes the number 4
as.character(10) # the number 10 becomes the character '10'

# You can compare values with:
six < seven
six + 1 >= seven
identical(example_table, mixed_data_frame)


# You can load and save a dataframe with:
read.table(file = ..., sep = ..., header = TRUE)
write.table(x = ..., file = ...)

# Create a table with:
my_table <- data.frame(...)

# Create a vector with:
my_vector <- c(...)

# You can see the first lines of a data frame with:
head(example_table)

# Search for help in the help pane or with:
help(head) # replace head with the name of the function you are looking for

R – Packages

What are modules and packages

Modules and packages are considered to be the same thing in this lesson. The difference is technical and does not relate to our session.

Most of the work you are likely to do with R will require one or several packages. A package is a collection of functions or pipelines shipped under a given name. Every single function you use in R comes from one package or another.

Read the very first line of the help pane:

help(head)

It reads: head {utils}. The function head comes from the package utils.

# Call the function "head", with the argument "example_table"
head(example_table, 1)
      Sample1  Sample2  Sample3  Sample4
Caml 9.998194 10.00412 9.172489 9.139667
# Call the function "head" ***from the package utils***, with the argument "example_table"
utils::head(example_table, 1)
      Sample1  Sample2  Sample3  Sample4
Caml 9.998194 10.00412 9.172489 9.139667

Warning: Sometimes, two packages may have a function with the same name. They are most certainly not doing the same thing. IMHO, it is a good habit to always call a function with its package name to remove the ambiguity. utils::help() is better than help() alone.
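To see why this matters, the sketch below deliberately creates a name collision by defining our own function called head. This home-made function is purely hypothetical and only exists for the demonstration.

```r
# Define our own function called "head" (a deliberately bad idea)
head <- function(x, ...) {
  "This is not the head you are looking for"
}

numbers <- 1:10

# Without a package name, R picks our home-made function
masked_result <- head(numbers)
print(masked_result)

# With the package name, there is no ambiguity
package_result <- utils::head(numbers, 3)
print(package_result)

# Clean up: remove our function so that head() behaves normally again
rm(head)
```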

Install a package

You may install a new package on your local computer. You shall not do it on a cluster. The IFB core cluster you are working on today is shared and highly valuable; no one can install anything besides the official maintainers.

The following lines are written for instruction purpose and should not be used on IFB core cluster.

Use install.packages() to install a package.

# Install a package with the following function
install.packages("tibble")

This will raise a prompt asking simple questions: where to download from (choose somewhere in France), and whether to update other packages or not.

Do not be afraid of the large amount of text printed in the console, and let R do the trick.

Alternatively, you can click Tools -> Install Packages in RStudio.

You can list installed packages with installed.packages(), and find packages that can be updated with old.packages(). These packages can be updated with update.packages().
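Before installing anything, you can check whether a package is already there. The sketch below shows two ways of doing so; it uses tibble as the example package, as above, and installs nothing.

```r
# requireNamespace() returns TRUE when the package can be loaded
if (requireNamespace("tibble", quietly = TRUE)) {
  message("tibble is available")
} else {
  message("tibble is missing; install it on your own machine only")
}

# installed.packages() returns a table with one row per package,
# named after the package. "utils" is shipped with R itself.
utils_is_installed <- "utils" %in% rownames(installed.packages())
print(utils_is_installed)
```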

While the function install.packages() searches for packages in the common R package repository (CRAN), many bioinformatics packages are available in other package repositories. Just like the App Store and Google Play do not offer the same applications on mobile, R has multiple sources for its packages. You need to know one of them, and one only: Bioconductor.

bioconductor

One can use Bioconductor with the function BiocManager::install():

# Install BiocManager, a package to use Bioconductor
install.packages("BiocManager")

# Install a package from Bioconductor
BiocManager::install("singleCellTK")

Use a package

You can load a package with the function library():

library(package="singleCellTK")

If there is no error message, then you can try:

help("importCellRanger", package = "singleCellTK")

And search for help about how to run your command.

Alternatively, there is a more complete help page, with the function vignette():

vignette(package="singleCellTK")

TLDR – Too Long Didn’t Read

# Install a package with the following function
install.packages("BiocManager")

# Load a package
library("BiocManager")

# Install a package from Bioconductor
BiocManager::install("singleCellTK")

# Get help
vignette(package="singleCellTK")

R – Single Cell

R for Single Cell does not differ from classic R work, except for the list of packages and functions used.

Load and save R objects

While working on your projects and learning this week, you will process datasets in R. The results of these analyses will be stored in variables. This means that when you close RStudio, some of this work might be lost.

We already saw the function save.image() to save a complete copy of your working environment.

However, you can save only the content of a given variable. This is useful when you want to save the result of a function (or a pipeline) but not the whole 5 hours of work you’ve been spending on how-to-make-that-pipeline-work-correctly.

The format is called: RDS for R Data Serialization. This is done with the function saveRDS():

saveRDS(object = example_table, file = "example_table.RDS")

You can also load an RDS file into a variable. This is useful when you receive an RDS file from a coworker, or when you’d like to resume your work from a saved point. This is done with the function readRDS():

example_table <- readRDS(file = "example_table.RDS")
head(example_table)
        Sample1   Sample2   Sample3   Sample4
Caml   9.998194 10.004116  9.172489  9.139667
Scamp5 9.995917 10.818685 11.417558 14.907892
Dgki   9.993974 13.664396 16.132275 17.420057
Mas1   9.993956 11.370854 11.233629  9.912863
Apba1  9.992540 14.253438 14.001228 13.654701
Phkg2  9.980898  8.748654  8.714821  9.146529
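To convince yourself that nothing is lost on the way, here is a minimal round trip on a made-up table. The table content and the temporary file are assumptions made for this example only.

```r
# A tiny made-up table
demo_table <- data.frame(
  Sample1 = c(1, 10),
  Sample2 = c(3, 20),
  row.names = c("GeneA", "GeneB")
)

# Save it in a temporary RDS file
rds_file <- tempfile(fileext = ".RDS")
saveRDS(object = demo_table, file = rds_file)

# Load it back under another name
restored_table <- readRDS(file = rds_file)

# Both tables are strictly the same
round_trip_ok <- identical(demo_table, restored_table)
print(round_trip_ok)
```

Unlike save.image(), readRDS() lets you choose the name of the variable that receives the data.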

Single Cell Tool Kit – Overview

SingleCellTK is the graphical interface for Single Cell analysis that we are going to use in this session.

The interface is quite easy to catch on:

SingleCellTK_overview

A black header holds every section of your analysis. We will get into these sections in details through the week.

SingleCellTK_data

Click on Data button to develop a drop-down menu.

If you’re on a trusted SingleCellTK server (yours, or IFB RStudio), then you may upload your data using CSV-count matrices:

SingleCellTK_import

Warning: If you hold human-related genomic datasets, you cannot upload these data anywhere. This is illegal, and doing such a thing may lead to 5 years in prison and up to a 300 000€ fine. Art. 226-16, Section 5, Code pénal.

Global options – default web browser

Some of you may get a web-browser error, either on your local computer or on the IFB cluster.

Example of the possible error:

> singleCellTK::singleCellTK()
Loading required package: shiny

Listening on http://127.0.0.1:7290
Error in utils::browseURL(appUrl) : 
  'browser' must be a non-empty character string

In human readable terms, this means “Please tell me where to find a web-browser”. R uses global options to store paths to external tools (including web browsers).

Look at the browser your R is looking for:

getOption("browser")
[1] ""

See ? It’s empty. Now fill this with the path to your favorite web browser of all time. You can find the path to your favorite web-browser with the following bash command line:

which firefox
/snap/bin/firefox

And then, pass the path returned by which to the R options:

options(browser = "/snap/bin/firefox")
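The same search can be done without leaving R, thanks to the function Sys.which(). The sketch below assumes that your favorite browser is firefox, as in the example above; if it is not installed, the option is simply left untouched.

```r
# Look for the firefox executable; an empty string means "not found"
browser_path <- Sys.which("firefox")

# Fill the option only when a browser was actually found
if (nzchar(browser_path)) {
  options(browser = unname(browser_path))
}

# Check the result
print(getOption("browser"))
```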

Error solved !

Global options – SingleCellTK Matrix error

If you use SingleCellTK in the upcoming months, please note that SingleCellTK suffers from an error introduced by a very recent update. In short: SingleCellTK may stop with an error at launch, depending on your R version.

In detail: the signature of a function in the package Matrix has been updated. This leads to an error in old versions of SingleCellTK.

It is good practice to keep package versions fixed within a work project. If you update a package (whether by need or by will), then you should restart your work from the beginning. This stands as long as you’re not 100% sure the update does not affect your results.

The change in Matrix leads to the following error message:

singlecelltk_error_matrix

Today, we assure you this error message is raised by a simple change in a function name. R has been told to stop when someone uses the “old” function name. We can tell R to keep going. Let’s add two parameters in R global options:

library(Matrix)
## 1. Override Matrix error status from calling with a deprecated scheme
## 2. Force using web browser rather than Rstudio (required for better display, zoom in.out, ...)
options(
  'Matrix.warnDeprecatedCoerce' = 0, 
  "shiny.launch.browser" = .rs.invokeShinyWindowExternal
)
## Launch scTK
singleCellTK::singleCellTK()

Problem solved !

Why R for EBAII SingleCell ?

No programming language is better than any other. Anyone saying the opposite is (over)-specialized in the language they are advertising. This week, we are going to use a package called “Single Cell Toolkit”. This package happens to be written in R. You are already learning to write both bash and R scripts, let’s not add another one.

In the field of bioinformatics, languages used by the community are quite limited. While learning bash cannot be escaped nowadays, it is not enough to perform a complete analysis with publication ready figures and results. You should be interested in another programming language: R and/or Python.

Please, note that this advice is valid today, but may change. Other programming languages are used, some have lost their place on the podium, and others are trying to supersede bash, R, and Python.

Anyway Python is the best programming language in the WORLD. Don’t listen to Bastien.