Learn R (Full Tutorial)


Introduction


What is R

R is a popular programming language used for statistical computing and graphical presentation.

Its most common use is to analyze and visualize data.


Why Use R?

  • It is a great resource for data analysis, data visualization, data science and machine learning
  • It provides many statistical techniques (such as statistical tests, classification, clustering and data reduction)
  • It is easy to draw graphs in R, such as pie charts, histograms, box plots and scatter plots
  • It works on different platforms (Windows, Mac, Linux)
  • It is open-source and free
  • It has large community support
  • It has many packages (libraries of functions) that can be used to solve different problems


Get Started


How to Install R

To install R, go to https://cloud.r-project.org/ and download the latest version of R for Windows, Mac or Linux.

When you have downloaded and installed R, you can run R on your computer.

When you run R on your computer, it opens a console where you can type commands.

If you type 5 + 5, and press enter, you will see that R outputs 10.


Learning R at W3Schools

When learning R at W3Schools.com, you can use our “Try it Yourself” tool, which shows both the code and the result in your browser. This will make it easier for you to test and understand every part as we move forward:

Example

5 + 5

Result:

[1] 10


Syntax


Syntax

To output text in R, use single or double quotes:

Example

"Hello World!"

To output numbers, just type the number (without quotes):

Example

5
10
25

To do simple calculations, add numbers together:

Example

5 + 5


Print Output


Print

Unlike many other programming languages, R can output a value without using a print function:

Example

"Hello World!"

However, R does have a print() function available if you want to use it. This might feel natural if you are familiar with other programming languages, such as Python, which often use a print() function to produce output.

Example

print("Hello World!")

There are also times when you must use the print() function to produce output, for example when working with for loops (which you will learn more about in a later chapter):

Example

for (x in 1:10) {
  print(x)
}

Conclusion: It is up to you whether you use the print() function to produce output. However, when your code is inside an R expression (e.g. inside curly braces {} as in the example above), use the print() function to output the result.



Comments


Comments can be used to explain R code and to make it more readable. They can also be used to prevent execution when testing alternative code.

Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

This example uses a comment before a line of code:

Example

# This is a comment
"Hello World!"

This example uses a comment at the end of a line of code:

Example

"Hello World!" # This is a comment

Comments do not have to be text that explains the code. They can also be used to prevent R from executing code:

Example

# "Good morning!"
"Good night!"

Multiline Comments

Unlike some other programming languages, such as Java, R has no syntax for multiline comments. However, we can insert a # at the start of each line to create multiline comments:

Example

# This is a comment
# written in
# more than just one line
"Hello World!"


Variables


Creating Variables in R

Variables are containers for storing data values.

R does not have a command for declaring a variable. A variable is created the moment you first assign a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value, just type the variable name:

Example

name <- "John"
age <- 40

name   # output "John"
age    # output 40


From the example above, name and age are variables, while "John" and 40 are values.

In other programming languages, it is common to use = as an assignment operator. In R, we can use both = and <- as assignment operators.

However, <- is preferred in most cases because the = operator can be forbidden in some contexts in R.
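One context where the two operators behave differently is inside a function call. The following is a minimal sketch, not a complete account of the rules; the names values, avg1, avg2 and copy are made up for illustration:

```r
values <- c(2, 4, 6)

# Here = matches the argument named x of mean(); it is not an assignment
avg1 <- mean(x = values)

# Here <- assigns values to copy, then passes the result on to mean()
avg2 <- mean(copy <- values)

avg1   # 4
avg2   # 4
copy   # 2 4 6
```

Both calls compute the same mean, but only the second one creates a new variable.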


Print / Output Variables

Compared to many other programming languages, you do not have to use a function to print/output variables in R. You can just type the name of the variable:

Example

name <- "John Doe"

name # auto-print the value of the name variable

However, R does have a print() function available if you want to use it. This might be useful if you are familiar with other programming languages, such as Python, which often use a print() function to output variables.

Example

name <- "John Doe"

print(name) # print the value of the name variable

There are also times when you must use the print() function to produce output, for example when working with for loops (which you will learn more about in a later chapter):

Example

for (x in 1:10) {
  print(x)
}

Conclusion: It is up to you whether you use the print() function to produce output. However, when your code is inside an R expression (for example inside curly braces {} as in the example above), use the print() function if you want to output the result.


Concatenate Elements

You can also concatenate, or join, two or more elements, by using the paste() function.

To combine text and a variable, pass both to paste(), separated by a comma (,):

Example

text <- "awesome"

paste("R is", text)

You can also use a comma to join one variable to another:

Example

text1 <- "R is"
text2 <- "awesome"

paste(text1, text2)

For numbers, the + character works as a mathematical operator:

Example

num1 <- 5
num2 <- 10

num1 + num2

If you try to combine a string (text) and a number, R will give you an error:

Example

num <- 5
text <- "Some text"

num + text

Result:

Error in num + text : non-numeric argument to binary operator
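If the goal is to join the number and the text rather than add them, paste() can be used instead of +. A small sketch, reusing the variable names from the example above (combined is a made-up name):

```r
num <- 5
text <- "Some text"

# paste() converts the number to text before joining the two values
combined <- paste(text, num)

combined   # "Some text 5"
```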

Multiple Variables

R allows you to assign the same value to multiple variables in one line:

Example

# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"

# Print variable values
var1
var2
var3


Variable Names

A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume). Rules for R variables are:

  • A variable name must start with a letter or a period (.) and can be a combination of letters, digits, periods (.)
    and underscores (_). If it starts with a period, the period cannot be followed by a digit.
  • A variable name cannot start with a number or an underscore (_)
  • Variable names are case-sensitive (age, Age and AGE are three different variables)
  • Reserved words cannot be used as variable names (TRUE, FALSE, NULL, if, …)

Example

# Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"

# Illegal variable names:
2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"

Remember that variable names are case-sensitive!
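One way to check a candidate name without triggering a syntax error is to compare it with the result of the base function make.names(), which rewrites invalid names into valid ones. The helper is_valid_name below is hypothetical and only a sketch of this idea:

```r
# Hypothetical helper: a name is treated as valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name("my_var")    # TRUE
is_valid_name(".myvar")    # TRUE
is_valid_name("2myvar")    # FALSE, starts with a digit
is_valid_name("my var")    # FALSE, contains a space
is_valid_name("TRUE")      # FALSE, reserved word
```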



Data Types


In programming, data type is an important concept.

Variables can store data of different types, and different types can do different things.

In R, variables do not need to be declared with any particular type, and can even change type after they have been set:

Example

my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (aka string)

R has a variety of data types and object classes. You will learn much more about these as you continue to get to know R.


Basic Data Types

Basic data types in R can be divided into the following types:

  • numeric – (10.5, 55, 787)
  • integer – (1L, 55L, 100L, where the letter “L” declares this as an integer)
  • complex – (9 + 3i, where “i” is the imaginary part)
  • character (a.k.a. string) – ("k", "R is exciting", "FALSE", "11.5")
  • logical (a.k.a. boolean) – (TRUE or FALSE)

We can use the class() function to check the data type of a variable:

Example

# numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical/boolean
x <- TRUE
class(x)



Numbers


There are three number types in R:

  • numeric
  • integer
  • complex

Variables of number types are created when you assign a value to them:

Example

x <- 10.5   # numeric
y <- 10L    # integer
z <- 1i     # complex

Numeric

The numeric data type is the most common type in R. It contains any number with or without a decimal, like 10.5, 55 or 787:

Example

x <- 10.5
y <- 55

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)


Integer

Integers are numeric data without decimals. Use them when you are certain that a variable will never need to hold decimals. To create an integer variable, put the letter L after the integer value:

Example

x <- 1000L
y <- 55L

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)


Complex

A complex number is written with an “i” marking the imaginary part:

Example

x <- 3+5i
y <- 5i

# Print values of x and y
x
y

# Print the class name of x and y
class(x)
class(y)


Type Conversion

You can convert from one type to another with the following functions:

  • as.numeric()
  • as.integer()
  • as.complex()

Example

x <- 1L # integer
y <- 2 # numeric

# convert from integer to numeric:
a <- as.numeric(x)

# convert from numeric to integer:
b <- as.integer(y)

# print values of x and y
x
y

# print the class name of a and b
class(a)
class(b)
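The conversion functions can change the value as well as the type. The sketch below illustrates a few cases; the variable names are made up for illustration:

```r
whole_up   <- as.integer(2.9)      # 2, the decimal part is dropped
whole_down <- as.integer(-2.9)     # -2, truncation is toward zero
from_text  <- as.numeric("11.5")   # 11.5, text that looks like a number is parsed
as_cplx    <- as.complex(2)        # 2+0i

class(whole_up)    # "integer"
class(from_text)   # "numeric"
class(as_cplx)     # "complex"
```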



Math


Simple Math

In R, you can use operators to perform common mathematical operations on numbers.

The + operator is used to add together two values:

Example

10 + 5

And the - operator is used for subtraction:

Example

10 - 5

Built-in Math Functions

R also has many built-in math functions that allow you to perform mathematical tasks on numbers.

For example, the min() and max() functions can be used to find the lowest or highest number in a set:

Example

max(5, 10, 15)

min(5, 10, 15)


sqrt()

The sqrt() function returns the square root of a number:

Example

sqrt(16)

abs()

The abs() function returns the absolute (positive) value of a number:

Example

abs(-4.7)

ceiling() and floor()

The ceiling() function rounds a number up to the nearest integer, and the floor() function rounds a number down to the nearest integer:

Example

ceiling(1.4)

floor(1.4)
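With negative numbers, "up" and "down" still refer to the number line, so ceiling() moves toward zero and floor() moves away from it. A short sketch with made-up variable names:

```r
up_positive   <- ceiling(1.4)    # 2
down_positive <- floor(1.4)      # 1
up_negative   <- ceiling(-1.4)   # -1
down_negative <- floor(-1.4)     # -2
```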



Strings


Strings are used for storing text.

A string is surrounded by either single quotation marks, or double quotation marks:

"hello" is the same as 'hello':

Example

"hello"
'hello'

Assign a String to a Variable

Assigning a string to a variable is done with the variable followed by the <- operator and the string:

Example

str <- "Hello"
str # print the value of str

Multiline Strings

You can assign a multiline string to a variable like this:

Example

str <- "Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."

str # print the value of str

However, note that when the string is auto-printed, R shows each line break as “\n”. This is an escape sequence, and the n character indicates a new line.

If you want the line breaks to appear at the same positions as in the code, use the cat() function:

Example

str <- "Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."

cat(str)


String Length

There are many useful string functions in R.

For example, to find the number of characters in a string, use the nchar() function:

Example

str <- "Hello World!"

nchar(str)


Check a String

Use the grepl() function to check if a character or a sequence of characters is present in a string:

Example

str <- "Hello World!"

grepl("H", str)
grepl("Hello", str)
grepl("X", str)
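By default, grepl() treats its first argument as a regular expression, so characters such as . have a special meaning. To search for the literal text, the fixed = TRUE argument can be used. A small sketch (the result names are made up for illustration):

```r
str <- "Hello World!"

# As a regular expression, "." matches any character, so this is TRUE
has_dot_regex <- grepl(".", str)

# With fixed = TRUE, "." means a literal dot, which str does not contain
has_dot_fixed <- grepl(".", str, fixed = TRUE)

has_dot_regex   # TRUE
has_dot_fixed   # FALSE
```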


Combine Two Strings

Use the paste() function to merge/concatenate two strings:

Example

str1 <- "Hello"
str2 <- "World"

paste(str1, str2)
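paste() puts a space between the pieces by default. The sep argument changes the separator, and paste0() joins the pieces with no separator at all. A brief sketch with made-up result names:

```r
str1 <- "Hello"
str2 <- "World"

with_space <- paste(str1, str2)              # "Hello World"
with_dash  <- paste(str1, str2, sep = "-")   # "Hello-World"
no_gap     <- paste0(str1, str2)             # "HelloWorld"
```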


Escape Characters

To insert characters that are illegal in a string, you must use an escape character.

An escape character is a backslash \ followed by the character you want to insert.

An example of an illegal character is a double quote inside a string that is surrounded by double quotes:

Example

str <- "We are the so-called "Vikings", from the north."

str

Result:

Error: unexpected symbol in "str <- "We are the so-called "Vikings"

To fix this problem, use the escape character \":

Example

The escape character allows you to use double quotes when you normally would not be allowed:

str <- "We are the so-called \"Vikings\", from the north."

str
cat(str)

Note that auto-printing the str variable will print the backslash in the output. You can use the cat() function to print it without backslash.

Other escape characters in R:

Code Result
\\ Backslash
\n New Line
\r Carriage Return
\t Tab
\b Backspace
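Each escape sequence in the table is stored as a single character, even though it is typed as two. The sketch below checks this with nchar(); the variable names are made up for illustration:

```r
tabbed <- "Name\tAge"

tab_length   <- nchar(tabbed)   # 8: four letters, one tab, three letters
slash_length <- nchar("\\")     # 1: a single backslash

cat(tabbed, "\n")               # cat() shows the tab as real spacing
```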


Booleans / Logical Values


Booleans (Logical Values)

In programming, you often need to know if an expression is true or false.

You can evaluate any expression in R, and get one of two answers, TRUE or FALSE.

When you compare two values, the expression is evaluated and R returns the logical answer:

Example

10 > 9    # TRUE because 10 is greater than 9
10 == 9   # FALSE because 10 is not equal to 9
10 < 9    # FALSE because 10 is greater than 9

You can also compare two variables:

Example

a <- 10
b <- 9

a > b

You can also run a condition in an if statement.

Example

a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}



Operators


Operators are used to perform operations on variables and values.

In the example below, we use the + operator to add together two values:

Example

10 + 5

R divides the operators into the following groups:

  • Arithmetic operators
  • Assignment operators
  • Comparison operators
  • Logical operators
  • Miscellaneous operators

R Arithmetic Operators

Arithmetic operators are used with numeric values to perform common mathematical operations:

Operator Name Example
+ Addition x + y
- Subtraction x - y
* Multiplication x * y
/ Division x / y
^ Exponent x ^ y
%% Modulus (Remainder from division) x %% y
%/% Integer Division x %/% y
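The operators in the table can be tried with two sample values. This is only an illustrative sketch; x and y are arbitrary:

```r
x <- 7
y <- 3

x + y     # 10
x - y     # 4
x * y     # 21
x / y     # 2.333333
x ^ y     # 343
x %% y    # 1, the remainder of 7 divided by 3
x %/% y   # 2, how many whole times 3 fits into 7
```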

R Assignment Operators

Assignment operators are used to assign values to variables:

Example

my_var <- 3

my_var <<- 3

3 -> my_var

3 ->> my_var

my_var # print my_var

Note: <<- is the global assignment operator. It is also possible to reverse the direction of the assignment operator: x <- 3 is equivalent to 3 -> x.


R Comparison Operators

Comparison operators are used to compare two values:

Operator Name Example
== Equal x == y
!= Not equal x != y
> Greater than x > y
< Less than x < y
>= Greater than or equal to x >= y
<= Less than or equal to x <= y

R Logical Operators

Logical operators are used to combine conditional statements:

Operator Description
& Element-wise logical AND. Returns TRUE for each position where both elements are TRUE
&& Logical AND. Compares two single values and returns TRUE if both are TRUE
| Element-wise logical OR. Returns TRUE for each position where at least one element is TRUE
|| Logical OR. Compares two single values and returns TRUE if at least one is TRUE
! Logical NOT. Returns FALSE if the statement is TRUE, and TRUE if it is FALSE
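The difference between the element-wise and the single-value operators is easiest to see on vectors. A minimal sketch with made-up names:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

elementwise_and <- a & b   # TRUE FALSE FALSE
elementwise_or  <- a | b   # TRUE TRUE TRUE
negated         <- !a      # FALSE TRUE FALSE

# && and || are meant for single values, such as one condition in an if statement
x <- 5
in_range <- x > 0 && x < 10   # TRUE
```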

R Miscellaneous Operators

Miscellaneous operators are used to manipulate data:

Operator Description Example
: Creates a series of numbers in a sequence x <- 1:10
%in% Find out if an element belongs to a vector x %in% y
%*% Matrix Multiplication x <- Matrix1 %*% Matrix2
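A short sketch of the three operators in the table, using small made-up values so the results are easy to check by hand:

```r
x <- 1:10                     # the numbers 1, 2, ..., 10

three_found  <- 3 %in% x      # TRUE
twelve_found <- 12 %in% x     # FALSE

m <- matrix(1:4, nrow = 2)    # 2 x 2 matrix, filled column by column
identity2 <- diag(2)          # 2 x 2 identity matrix
product <- m %*% identity2    # multiplying by the identity leaves m unchanged
```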


If … Else


Conditions and If Statements

R supports the usual logical conditions from mathematics:

Operator Name Example
== Equal x == y
!= Not equal x != y
> Greater than x > y
< Less than x < y
>= Greater than or equal to x >= y
<= Less than or equal to x <= y

These conditions can be used in several ways, most commonly in “if statements” and loops.


The if Statement

An “if statement” is written with the if keyword, and it is used to specify a block of code to be executed if a condition is TRUE:

Example

a <- 33
b <- 200

if (b > a) {
  print("b is greater than a")
}

In this example we use two variables, a and b, which are used as a part of the if statement to test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33, and so we print to screen that “b is greater than a”.

R uses curly brackets { } to define the scope in the code.


Else If

The else if keyword is R’s way of saying “if the previous conditions were not true, then try this condition”:

Example

a <- 33
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
}

In this example a is equal to b, so the first condition is not true, but the else if condition is true, so we print to screen that “a and b are equal”.

You can use as many else if statements as you want in R.


If Else

The else keyword catches anything which isn’t caught by the preceding conditions:

Example

a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
} else {
  print("a is greater than b")
}

In this example, a is greater than b, so the first condition is not true, also the else if condition is not true, so we go to the else condition and print to screen that “a is greater than b”.

You can also use else without else if:

Example

a <- 200
b <- 33

if (b > a) {
  print("b is greater than a")
} else {
  print("b is not greater than a")
}


Nested If Statements

You can also have if statements inside if statements. These are called nested if statements.

Example

x <- 41

if (x > 10) {
  print("Above ten")
  if (x > 20) {
    print("and also above 20!")
  } else {
    print("but not above 20.")
  }
} else {
  print("below 10.")
}


AND

The & symbol (and) is a logical operator, and is used to combine conditional statements:

Example

Test if a is greater than b, AND if c is greater than a:

a <- 200
b <- 33
c <- 500

if (a > b & c > a) {
  print("Both conditions are true")
}


OR

The | symbol (or) is a logical operator, and is used to combine conditional statements:

Example

Test if a is greater than b, or if c is greater than a:

a <- 200
b <- 33
c <- 500

if (a > b | a > c) {
  print("At least one of the conditions is true")
}



While Loop


Loops

Loops can execute a block of code as long as a specified condition is met.

Loops are handy because they save time, reduce errors, and they make code more readable.

R has two commonly used loop commands:

  • while loops
  • for loops

R While Loops

With the while loop we can execute a set of statements as long as a condition is TRUE:

Example

Print i as long as i is less than 6:

i <- 1
while (i < 6) {
print(i)
i <- i + 1
}

In the example above, the loop will continue to produce numbers ranging from 1 to 5. The loop will stop at 6 because 6 < 6 is FALSE.

The while loop requires the relevant variables to be ready. In this example we need to define an indexing variable, i, which we set to 1.

Note: remember to increment i, or else the loop will continue forever.
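The same pattern can be used to build up a result instead of printing each value. The following sketch adds the numbers 1 to 5; total is a made-up variable name:

```r
total <- 0
i <- 1

while (i <= 5) {
  total <- total + i   # add the current value of i to the running total
  i <- i + 1           # increment i so the loop eventually stops
}

total   # 15
```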


Break

With the break statement, we can stop the loop even if the while condition is TRUE:

Example

Exit the loop if i is equal to 4.

i <- 1
while (i < 6) {
print(i)
i <- i + 1
if (i == 4) {
break
}
}

The loop will stop at 3 because we have chosen to finish the loop by using the break statement when i is equal to 4 (i == 4).


Next

With the next statement, we can skip an iteration without terminating the loop:

Example

Skip the value of 3:

i <- 0
while (i < 6) {
i <- i + 1
if (i == 3) {
next
}
print(i)
}

When the loop passes the value 3, it will skip it and continue to loop.


Yahtzee!

If .. Else Combined with a While Loop

To demonstrate a practical example, let us say we play a game of Yahtzee!

Example

Print “Yahtzee!” if the dice number is 6:

dice <- 1
while (dice <= 6) {
  if (dice < 6) {
    print("No Yahtzee")
  } else {
    print("Yahtzee!")
  }
  dice <- dice + 1
}

If the loop passes the values ranging from 1 to 5, it prints “No Yahtzee”. Whenever it passes the value 6, it prints “Yahtzee!”.



For Loop


For Loops

A for loop is used for iterating over a sequence:

Example

for (x in 1:10) {
print(x)
}

This is less like the for keyword in other programming languages, and works more like an iterator method as found in other object-orientated programming languages.

With the for loop we can execute a set of statements once for each item in a vector, array, list, etc.

Example

Print every item in a list:

fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
print(x)
}

Example

Print each number on a dice:

dice <- c(1, 2, 3, 4, 5, 6)

for (x in dice) {
  print(x)
}

The for loop does not require an indexing variable to be set beforehand, as while loops do.
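When the position of each item is needed as well as its value, one common approach is to loop over the positions with the base function seq_along(). A minimal sketch; the labels vector is made up for illustration:

```r
fruits <- c("apple", "banana", "cherry")
labels <- character(length(fruits))   # empty result vector of the same length

for (i in seq_along(fruits)) {
  labels[i] <- paste(i, fruits[i])    # combine the position and the item
}

labels   # "1 apple" "2 banana" "3 cherry"
```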


Break

With the break statement, we can stop the loop before it has looped through all the items:

Example

Stop the loop at “cherry”:

fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
  if (x == "cherry") {
    break
  }
  print(x)
}

The loop will stop at “cherry” because we have chosen to finish the loop by using the break statement when x is equal to “cherry” (x == "cherry").


Next

With the next statement, we can skip an iteration without terminating the loop:

Example

Skip “banana”:

fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
  if (x == "banana") {
    next
  }
  print(x)
}

When the loop passes “banana”, it will skip it and continue to loop.


Yahtzee!

If .. Else Combined with a For Loop

To demonstrate a practical example, let us say we play a game of Yahtzee!

Example

Print “Yahtzee!” if the dice number is 6:

dice <- 1:6

for (x in dice) {
  if (x == 6) {
    print(paste("The dice number is", x, "Yahtzee!"))
  } else {
    print(paste("The dice number is", x, "Not Yahtzee"))
  }
}

If the loop reaches the values ranging from 1 to 5, it prints the number followed by “Not Yahtzee”. When it reaches the value 6, it prints the number followed by “Yahtzee!”.


Nested Loops

It is also possible to place a loop inside another loop. This is called a nested loop:

Example

Print the adjective of each fruit in a list:

adj <- list("red", "big", "tasty")

fruits <- list("apple", "banana", "cherry")
for (x in adj) {
  for (y in fruits) {
    print(paste(x, y))
  }
}



Functions


A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.


Creating a Function

To create a function, use the function() keyword:

Example

my_function <- function() { # create a function with the name my_function
  print("Hello World!")
}

Call a Function

To call a function, use the function name followed by parentheses, like my_function():

Example

my_function <- function() {
  print("Hello World!")
}

my_function() # call the function named my_function


Arguments

Information can be passed into functions as arguments.

Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with a comma.

The following example has a function with one argument (fname). When the function is called, we pass along a first name, which is used inside the function to print the full name:

Example

my_function <- function(fname) {
  paste(fname, "Griffin")
}

my_function("Peter")
my_function("Lois")
my_function("Stewie")

Parameters or Arguments?

The terms “parameter” and “argument” can be used for the same thing: information that is passed into a function.

From a function’s perspective:

A parameter is the variable listed inside the parentheses in the function definition.

An argument is the value that is sent to the function when it is called.


Number of Arguments

By default, a function must be called with the correct number of arguments. This means that if your function expects 2 arguments, you have to call it with 2 arguments, no more and no fewer:

Example

This function expects 2 arguments, and gets 2 arguments:

my_function <- function(fname, lname) {
paste(fname, lname)
}

my_function("Peter", "Griffin")

If you try to call the function with 1 or 3 arguments, you will get an error:

Example

This function expects 2 arguments, and gets 1 argument:

my_function <- function(fname, lname) {
paste(fname, lname)
}

my_function("Peter")
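To inspect the error without stopping a script, the call can be wrapped in the base function tryCatch(). This is only a sketch of one way to do it; message_text is a made-up variable name:

```r
my_function <- function(fname, lname) {
  paste(fname, lname)
}

# Capture the error message as text instead of halting execution
message_text <- tryCatch(
  my_function("Peter"),
  error = function(e) conditionMessage(e)
)

message_text   # the message names the missing argument, lname
```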


Default Parameter Value

The following example shows how to use a default parameter value.

If we call the function without an argument, it uses the default value:

Example

my_function <- function(country = "Norway") {
  paste("I am from", country)
}

my_function("Sweden")
my_function("India")
my_function() # will get the default value, which is Norway
my_function("USA")


Return Values

To let a function return a result, use the return() function:

Example

my_function <- function(x) {
return (5 * x)
}

print(my_function(3))
print(my_function(5))
print(my_function(9))

The output of the code above will be:

[1] 15
[1] 25
[1] 45
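A function also returns the value of its last evaluated expression when return() is left out, which is why earlier examples that end in paste() produce output. A minimal sketch with a made-up function name:

```r
square <- function(x) {
  x * x   # the last expression is returned, no return() needed
}

result <- square(4)
result   # 16
```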


Nested Functions

There are two ways to create a nested function:

  • Call a function within another function.
  • Write a function within a function.

Example

Call a function within another function:

Nested_function <- function(x, y) {
a <- x + y
return(a)
}

Nested_function(Nested_function(2,2), Nested_function(3,3))

Example Explained

The function tells x to add y.

The first input Nested_function(2,2) is “x” of the main function.

The second input Nested_function(3,3) is “y” of the main function.

The output is therefore (2+2) + (3+3) = 10.

Example

Write a function within a function:

Outer_func <- function(x) {
Inner_func <- function(y) {
a <- x + y
return(a)
}
return (Inner_func)
}
output <- Outer_func(3) # To call the Outer_func
output(5)

Example Explained

You cannot directly call the function because the Inner_func has been defined (nested) inside the Outer_func.

We need to call Outer_func first in order to call Inner_func as a second step.

We store the function returned by Outer_func(3) in a new variable called output. Inside that function, x is 3.

We then call output with the desired value of “y”, which in this case is 5.

The output is therefore 8 (3 + 5).
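Because each call to the outer function produces its own inner function, the same pattern can create several functions that remember different values. The sketch below uses made-up names (make_adder, add_three, add_ten) to illustrate this:

```r
make_adder <- function(x) {
  function(y) {
    x + y   # x is remembered from the call to make_adder
  }
}

add_three <- make_adder(3)
add_ten   <- make_adder(10)

add_three(5)   # 8
add_ten(5)     # 15
```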


Recursion

R also accepts function recursion, which means that a defined function can call itself.

Recursion is a common mathematical and programming concept. Its benefit is that you can loop through data to reach a result.

The developer should be very careful with recursion, as it is easy to write a function that never terminates, or one that uses excessive amounts of memory or processor power. However, when written correctly, recursion can be a very efficient and mathematically elegant approach to programming.

In this example, tri_recursion() is a function that we have defined to call itself (“recurse”). We use the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when k is no longer greater than 0 (i.e. when it is 0).

To a new developer it can take some time to work out exactly how this works. The best way to find out is by testing and modifying it.

Example

tri_recursion <- function(k) {
  if (k > 0) {
    result <- k + tri_recursion(k - 1)
    print(result)
  } else {
    result <- 0
    return(result)
  }
}
tri_recursion(6)
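Another small recursive function, shown here only as an illustration, computes a factorial. The name factorial_rec is made up to avoid clashing with R's built-in factorial():

```r
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)                 # base case: stops the recursion
  }
  n * factorial_rec(n - 1)    # recursive case: n times the factorial of n - 1
}

factorial_rec(5)   # 120
```

The base case plays the same role as the else branch in tri_recursion(): without it, the function would keep calling itself.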

Global Variables

Variables that are created outside of a function are known as global variables.

Global variables can be used by everyone, both inside of functions and outside.

Example

Create a variable outside of a function and use it inside the function:

txt <- "awesome"
my_function <- function() {
  paste("R is", txt)
}

my_function()

If you create a variable with the same name inside a function, this variable will be local, and can only be used inside the function. The global variable with the same name will remain as it was, global and with the original value.

Example

Create a variable inside of a function with the same name as the global variable:

txt <- "global variable"
my_function <- function() {
  txt <- "fantastic"
  paste("R is", txt)
}

my_function()

txt # print txt

If you try to print txt, it will return “global variable” because we are printing txt outside the function.


The Global Assignment Operator

Normally, when you create a variable inside a function, that variable is local, and can only be used inside that function.

To create a global variable inside a function, you can use the global assignment operator <<-.

Example

If you use the assignment operator <<-, the variable belongs to the global scope:

my_function <- function() {
  txt <<- "fantastic"
  paste("R is", txt)
}

my_function()

print(txt)

Also, use the global assignment operator if you want to change a global variable inside a function:

Example

To change the value of a global variable inside a function, refer to the variable by using the global assignment operator <<-:

txt <- "awesome"
my_function <- function() {
  txt <<- "fantastic"
  paste("R is", txt)
}

my_function()

paste("R is", txt)
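One detail worth knowing: <<- looks for an existing variable in the enclosing environments first, and only falls back to the global scope when it finds none. In a function defined inside another function, it therefore updates the outer function's variable. The sketch below uses made-up names (make_counter, counter, count) to illustrate this:

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count inside make_counter, not a global variable
    count
  }
}

counter <- make_counter()
first  <- counter()   # 1
second <- counter()   # 2
```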