Before the Adventure Begins:
Intro to R

Ready to begin?

You’re about to start an adventure to learn R and statistics. If this is your first time working with R, you should begin on this page. If you’re comfortable with R basics and you’d like to start with the statistical content, please proceed to the islands with this link.

Using interactive windows

Throughout this material we will use an interactive version of R provided by DataCamp. Although this way of using R comes with some limitations regarding functionality, it will give us the benefit of being able to run R code without switching from a web browser to a standalone application such as RStudio.

Below you can see an example of an interactive R window. On the left side of the window (under the tab titled script.R) is where you will write code to be executed. To run code you have written, click the yellow Run button and observe how the right side of the window gets filled with information. A copy of your code will show up in black, and if no errors were made the answer will be returned right below in brown text. Error messages are red and begin with Error:.

Now spend a few minutes using the Script window below as a calculator and run simple calculations by clicking Run.

2+3
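As a starting point for the calculator exercise, here is a small sketch of the arithmetic operators you might try. The numbers are arbitrary and chosen only for illustration.

```r
# Basic arithmetic, as you might type it into the script window
2 + 3        # addition
10 - 4       # subtraction
6 * 7        # multiplication
20 / 8       # division
2 ^ 5        # exponentiation
(2 + 3) * 4  # parentheses control the order of operations
```

Each line is evaluated separately, so running the script returns one result per calculation.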

Objects in R

R is often described as an object-oriented programming language: when working in R we create different types of objects and use operators and functions to manipulate them.

To create a new object in R, we first pick a name for the object and then tell R what to assign to it using an assignment operator: either = or a small arrow <- made by combining < and -.

  • Use the window below to assign numbers to objects and output the content by typing out the name of the object.
  • After creating two numerical objects, use the mathematical operator + to add these objects together and see what happens when you run the code.
my_object <- 
my_object

my_object2 <- 

my_object + my_object2
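The assignment steps described above could look something like the following sketch. The object names (apples, oranges, total_fruit) and their values are made up for illustration and are not part of the exercise.

```r
# Assign numbers to two objects (names and values are illustrative)
apples <- 4
oranges = 6   # = also assigns, although <- is the more common style

# Typing an object's name on its own line prints its content
apples

# Objects can be used in calculations just like the values they hold
total_fruit <- apples + oranges
total_fruit
```

Note that the result of apples + oranges is itself stored in a new object, which can be reused later in the script.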

Common functions

Most of the time when we work in R, we will use functions: either pre-written functions or ones we write ourselves. Functions let us reuse a set of code instructions without filling our scripts with the underlying code, and they help us carry out multiple tasks in a single step without having to go through the details of how each step is executed.

To use a function in R, we type the name of the function followed by parentheses (e.g. print( )). Inside the parentheses we specify the function's inputs and options, separated by commas. One function you will use a lot is the combine function c( ), which as the name suggests lets you combine several values into one object.

  • In the window below, create an object combining a set of numerical values using c( ), separating the different values with commas.
  • Then sum up the content of this object using sum( ).
my_combined_values <- c(,) 
sum(my_combined_values)
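A filled-in version of the pattern above might look like this sketch. The object name rainfall_mm and its values are invented for illustration; length( ) is an additional base R function shown only as a second example of calling a function on a vector.

```r
# Combine four numbers into one object (illustrative values)
rainfall_mm <- c(12, 0, 7.5, 3)

# Add up all the values in the object
sum(rainfall_mm)

# Count how many values the object holds
length(rainfall_mm)
```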

Writing your own function

R makes it easy to create user defined functions by using function( ). Here is how it works:

  • Give your function an object name and assign the function to it, e.g. my_function_name <- function( ).
  • Within the parentheses you specify inputs and options just like how pre-written functions work, e.g. function(input_data)
  • Next, put all the code you want your function to execute inside curly brackets like this: function(input_data){code to run}
  • Use return( ) to specify what you want your function to output once it is done running the code.

Use the following instructions to complete the function in the window below:

  • We’ve started writing a function for you that will sum up values and take the square root of the sum. To take the square root, we use the sqrt( ) function.
  • Complete this function by filling in input_data for the sum( ), and then filling in the remaining empty parentheses with the appropriate object names.
  • Now create an object containing a set of numerical values and call it my_combined_values. Then try out your new function on this object, which will compute the square root of the object’s sum.
# Use the instructions above to complete the function below
my_function_name <- function(input_data){
  s <- sum( )
  ss <- sqrt( )
  return( )
}

# Create a new object and try out your new function
my_combined_values <- c(,) 

my_function_name(my_combined_values)
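To show the same building blocks without giving away the exercise, here is a sketch of a different user-defined function. The function value_spread and the object heights_cm are hypothetical names chosen for this example; max( ) and min( ) are base R functions.

```r
# A user-defined function that returns the gap between the largest
# and smallest value of its input (illustrative example)
value_spread <- function(input_data) {
  largest <- max(input_data)
  smallest <- min(input_data)
  return(largest - smallest)
}

# Try the function on a small set of made-up values
heights_cm <- c(150, 172, 165, 181)
value_spread(heights_cm)
```

The objects largest and smallest exist only while the function runs; only the value passed to return( ) is handed back.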

Functions within functions

It is also possible (and sometimes very useful) to put a function within another function. For example, we could combine multiple steps into one like this: sqrt(prod(c(1,2))), resulting in one line of code that both generates the values and directly calculates the square root of the product of those values.
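To make the nesting concrete, the sketch below writes the same calculation twice: once step by step with intermediate objects, and once as a single nested call. The object names are illustrative.

```r
# Step by step, storing each intermediate result
values <- c(1, 2)
product <- prod(values)
step_by_step <- sqrt(product)

# The same calculation as one nested call; R evaluates the
# innermost function first and works outward
nested <- sqrt(prod(c(1, 2)))

# Both approaches give the same answer
identical(step_by_step, nested)
```

Nesting saves lines, but storing intermediate results can make longer calculations easier to read and check.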

Vectors

The c( ) function combines values and creates a specific type of data structure called an atomic vector. A vector is a simple one-dimensional structure that can contain only one type of value. This means that storing multiple numeric values in a vector, as we have already done, works just fine. But there are other types of values that can be used in R, for example character values. These values are created by putting text or numbers within quotes, such as "Giraffe". If we try to combine numeric and character values in the same vector, R will convert the numeric values to the character type so that every element has the same type.

c("Giraffe", 123)
## [1] "Giraffe" "123"

This behavior is not always desirable, so it is a good idea to try to only combine values of the same type.
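One way to see this conversion for yourself is to ask R for the type of a vector with class( ). The sketch below uses illustrative object names; as.numeric( ) is a base R function that converts a value back to a number when the text is a valid number.

```r
# A vector of numbers is stored as numeric
numbers_only <- c(1, 2, 3)
class(numbers_only)

# Mixing text and numbers turns every element into character
mixed <- c("Giraffe", 123)
class(mixed)

# A number stored as text must be converted before doing arithmetic
as.numeric("123") + 1
```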

Boolean values and logical operators

Another type of value in R is the boolean (TRUE/FALSE) type. Boolean, or logical, values are either TRUE or FALSE (the names are case sensitive). You will see these values mostly when operators are used to test how different objects relate to each other. For example, the operator == tests whether two values are equal, and TRUE is returned only if they are.

"Giraffe"=="Teacup"
## [1] FALSE
sqrt(100)==10
## [1] TRUE
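Besides ==, R has other operators that return TRUE or FALSE. The sketch below shows a few of them and how a comparison applied to a vector is carried out element by element. The object ages and its values are made up for illustration.

```r
# Not equal, less than
5 != 3
2 < 1

# A comparison on a vector returns one TRUE/FALSE per element
ages <- c(12, 35, 67)
is_adult <- ages >= 18
is_adult

# TRUE counts as 1 and FALSE as 0, so sum( ) counts the TRUE values
sum(is_adult)
```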

Data frames

Vectors are one of several data structures in R. We will not cover all of them here, but perhaps the most common one encountered when analyzing data in R is the data frame. Data frames are two-dimensional, which means that they let you store multiple vectors in one object, with each vector typically becoming a column. One constraint is that all columns must have the same number of rows: data.frame( ) returns an error when the vectors' lengths cannot be matched up (it only recycles a shorter vector when the longer length is an exact multiple of the shorter one).

  • In the window below, use the data.frame() function to create a data frame with two columns called x and y. When creating data frames we specify a column by giving the column a name and assigning values or vectors to it, e.g. data.frame(x= c("Giraffe", "Teacup")).
  • Also observe what happens when you try to combine an x and y vector of different lengths.
x_values <- c(,)
y_values <- c(,)

d <- data.frame(x = x_values, y = y_values)
d

As you could see above, running a line of code with just the name of a data structure (in this case, the letter d) will print the full data frame in the console output (if no errors were made!). If you are instead interested in only one of the columns, you can select it by using the $ operator followed by the column’s name, e.g. d$x. Try it out below!

d <- data.frame(x= c(10, 45), y= c(2, 5))
d$x
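Putting the pieces of this section together, a slightly larger data frame might be built and explored as in the sketch below. The data frame animals, its column names, and its values are invented for illustration. The last part uses tryCatch( ), a base R function not covered on this page, only to capture the error message that mismatched lengths produce instead of stopping the script.

```r
# Build a data frame from two vectors of equal length (made-up data)
animals <- data.frame(
  name = c("Giraffe", "Zebra", "Lion"),
  weight_kg = c(800, 350, 190)
)

# Print the whole data frame, then a single column
animals
animals$weight_kg

# A column is a vector, so vector functions work on it
mean(animals$weight_kg)

# Number of rows and columns
nrow(animals)
ncol(animals)

# Vectors of length 3 and 2 cannot be matched up, so data.frame( )
# fails; tryCatch( ) stores the error message as text instead
mismatch <- tryCatch(
  data.frame(x = c(1, 2, 3), y = c(1, 2)),
  error = function(e) conditionMessage(e)
)
mismatch
```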

You’re ready to go!

This was a brief introduction to R, but now you know what you need to get started with the first module!


This project was created entirely in RStudio using R Markdown, with features making use of DataCamp Light windows, and published on GitHub Pages.