You’re about to start an adventure to learn R and statistics. If this is your first time working with R, then you should begin on this page. If you’re comfortable with R basics and you’d like to start with the statistical content, please proceed onto the islands with this link.
Throughout this material we will use an interactive version of R provided by learnR. Although this way of using R comes with some limitations regarding functionality, it will give us the benefit of being able to run R code without switching from a web browser to a standalone application such as RStudio.
Below you can see an example of an interactive R window. In the white window (under the button with the text Start Over) is where you will write code to be executed. To run code you have written, click the green Run Code button and observe how information pops up under the window. If no errors were made, this is where the answer will be returned. Error messages are shown in a red box, also under the learnR window.
Now spend a few minutes using the window below as a calculator and run simple calculations by clicking Run Code.
R is an object oriented programming language, meaning when working in R we will create different types of objects and use operators and functions to manipulate these objects.
To create a new object in R we first pick a name for the object and tell R what to assign to it using assignment operators; either =
or a small arrow <-
made from the combination of <
and -
.
+
to add these objects together and see what happens when you run the code.Most of the time when we work in R, we will use functions; either pre-written functions or ones we write ourselves. Functions make it easy to use sets of code instructions repeatedly (without filling our scripts with the code underlying the function) and help us carry out multiple tasks in a single step without having to go through the details of how the steps are executed.
To use functions in R, we type the name of the function followed by parentheses (e.g. print( )
). Within the parentheses is where we will specify the function input and options, separated by commas ,
. One function you will use a lot is the combine function c( )
, which as the name suggests lets you concatenate different types of values.
c( )
, separating the different values with commas.sum( )
.R makes it easy to create user defined functions by using function( )
. Here is how it works:
my_function_name <- function( )
.function(input_data)
function(input_data){code to run}
return( )
to specify what you want to your function to output once it is done running the code.Use the following instructions to complete the function in the window below:
sqrt( )
function.input_data
for the sum( ), and then filling in the remaining empty parentheses with the appropriate object names.
my_combined_values
. Then try out your new function on this object, which will compute the square root of the object’s sum.It is also possible (and sometimes very useful) to put a function within another function. For example, we could combine multiple steps into one like this: sqrt(prod(c(1,2)))
, resulting in one line of code that both generates the values and directly calculates the square root of the product of those values.
The c( )
function will combine values and create a specific type of data structure called an atomic vector. A vector is a simple one-dimensional structure that can contain only one type of value. This means that storing multiple numeric values in a vector, as we have already done, works just fine. But there are other types of values that can be used in R, for example character values. These values are created by putting text or numbers within quotes such as "Giraffe"
. If we try to combine numeric and character values in the same vector, R will convert both values to the character type.
c("Giraffe", 123)
## [1] "Giraffe" "123"
This behavior is not always desirable, so it is a good idea to try to only combine values of the same type.
Another type of variable in R is the boolean (TRUE/FALSE) type. Boolean or logical vectors can only take two different values; TRUE or FALSE (case sensitive). You will see these types of values mostly when logical operators are used to test how different objects relate to each other. For example, the logical operator ==
can be used to test if two different objects are exactly the same and TRUE
will be returned only if the identity is fulfilled.
"Giraffe" == "Teacup"
## [1] FALSE
sqrt(100) == 10
## [1] TRUE
Vectors are one of multiple data structures in R. We will not cover all types of structures here, but perhaps the most common one encountered when analyzing data in R is the data frame. Data frames are two dimensional, which basically means that data frames let you store multiple vectors in one object. Oftentimes, each vector will be a new column in the data frame. One constraint though is that all vectors/columns need to be of the same length.
data.frame()
function to create a data frame with two columns called x and y. When creating data frames we specify a column by giving the column a name and assigning values or vectors to it, e.g. data.frame(x= c("Giraffe", "Teacup"))
.As you could see above, running a line of code with just the name of a data structure (in this case, the letter d) will print the full data frame in the console output (if no errors were made!). If you instead are interested in only one of the columns, this can be specified by using the $
operator followed by the column’s name, e.g d$x
. Try it out below!
This was a brief introduction into R, but now you know what you need to get started with the first module!
This project was created entirely in RStudio using R Markdown and published on GitHub pages.