How to assign and read external data in R?
Assignment Operator
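The most straightforward way to store a list of numbers is through an assignment using the c command. (c stands for “combine.”) A list is specified with the c command, and assignment is specified with the “<-” symbols. Here the list is stored in a variable called “chucha”. The assignment itself prints nothing; type the variable name and press enter to see its contents.
> chucha <- c(7,4,7,2)
> chucha
[1] 7 4 7 2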
To get a single number by its position, put the position in square brackets after the variable name, for example chucha[3]. Positions start at 1.
> chucha[2]
[1] 4
> chucha[1]
[1] 7
> chucha[0]
numeric(0)
> chucha[3]
[1] 7
> chucha[4]
[1] 2
Notice that the first entry is entry number 1, not 0. Asking for entry 0 returns an empty result, numeric(0), whose type shows how R is treating the data (here, as numbers). You can also store strings, written with either single or double quotes, and you can store real numbers.
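The indexing rules above can be tried in a short script. This is only a minimal sketch: the variable names “second”, “empty” and “words” are made up for illustration and do not come from the text above.

```r
# Build the same list of numbers used above.
chucha <- c(7, 4, 7, 2)

# R counts positions from 1, so the second entry is chucha[2].
second <- chucha[2]
print(second)

# Position 0 selects nothing: the result is empty, but it keeps
# the type of the original data (numeric here).
empty <- chucha[0]
print(length(empty))
print(class(empty))

# Strings can be written with single or double quotes.
words <- c('single', "double")
print(class(words))
```

Asking for the class of a variable is a quick way to check whether R is treating it as numbers or as text.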
You now have a list of numbers and are ready to explore. In the chapters that follow we will examine the basic operations in R that will allow you to do some of the analyses required in class.
Reading a CSV file
We assume that the data file is in the format called “comma separated values” (csv). That is, each line contains a row of values which can be numbers or letters, and each value is separated by a comma. We also assume that the very first row contains a list of labels. The idea is that the labels in the top row are used to refer to the different columns of values.
First we read a very short data file. The data file is called “data.csv” and has three columns of data and six rows. The three columns are labeled “trial,” “mass,” and “velocity.” We can pretend that each row comes from an observation during one of two trials labeled “A” and “B.” A copy of the data file is shown below and is created in defiance of Werner Heisenberg:
data.csv
trial,mass,velocity
A,10,12
A,11,14
B,5,8
B,6,10
A,10.5,13
B,7,11
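The command to read the data file is read.csv. We have to give the command at least one argument, the name of the file. The following command will read in the data and assign it to a variable called “regular”. Typing the variable name then prints the data. In R 4.0 and later, text columns are read as plain character strings by default; add stringsAsFactors=TRUE to the command if you want “trial” treated as a factor, as it is in the output shown in this section.
> regular <- read.csv(file="data.csv")
> regular
  trial mass velocity
1     A 10.0       12
2     A 11.0       14
3     B  5.0        8
4     B  6.0       10
5     A 10.5       13
6     B  7.0       11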
The “summary” command shows a summary (such as the minimum, quartiles, median, mean, and maximum) of every variable present in the data file:
> summary(regular)
 trial      mass          velocity
 A:3   Min.   : 5.00   Min.   : 8.00
 B:3   1st Qu.: 6.25   1st Qu.:10.25
       Median : 8.50   Median :11.50
       Mean   : 8.25   Mean   :11.33
       3rd Qu.:10.38   3rd Qu.:12.75
       Max.   :11.00   Max.   :14.00
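To try the steps above without creating data.csv by hand, the following sketch writes the same six rows to a temporary file and reads them back. The temporary path (“csv_path”) and the explicit header, sep, and stringsAsFactors arguments are assumptions made for this illustration. The last one is only there so that “trial” is summarized as counts per level on recent versions of R.

```r
# Write the example data to a temporary file (the path is illustrative).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("trial,mass,velocity",
             "A,10,12", "A,11,14", "B,5,8",
             "B,6,10", "A,10.5,13", "B,7,11"),
           csv_path)

# header = TRUE: the first row holds the column labels.
# sep = ",": the values on each line are separated by commas.
# stringsAsFactors = TRUE: treat the text column as a factor.
regular <- read.csv(file = csv_path, header = TRUE, sep = ",",
                    stringsAsFactors = TRUE)

print(regular)
print(summary(regular))

unlink(csv_path)  # remove the temporary file
```

With your own data you would pass the real file name, such as "data.csv", instead of the temporary path.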
To get more information on the different options available you can use the help command:
> help(read.csv)
If R is not finding the file you are trying to read then it may be looking in the wrong folder/directory. If you are not sure about the current working directory you can use the dir() command to list the files and the getwd() command to determine the current working directory:
> dir()
[1] "fixedWidth.dat" "data.csv"       "trees91.csv"    "trees91.wk1"
[5] "w1.dat"
> getwd()
[1] "/home/black/write/class/stat/stat383-13F/dat"
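A quick check before reading can make a wrong-folder problem easier to spot. This is a minimal sketch, assuming the file is named data.csv as above; the variable name “target” is hypothetical.

```r
target <- "data.csv"   # file name used in this tutorial

# Show where R is currently looking for files.
cat("Working directory:", getwd(), "\n")

if (file.exists(target)) {
  regular <- read.csv(file = target)
} else {
  # list.files() is the same command as dir(): it shows the files
  # R can see in the current working directory.
  cat("Cannot find", target, "among:\n")
  print(list.files())
  # To point R at another folder, call setwd() with that folder's path.
}
```

If the file is listed but still cannot be read, check the spelling and the upper and lower case letters of the file name.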
The variable “regular” contains the three columns of data. Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a “$” to separate the two names:
> regular$trial
[1] A A B B A B
Levels: A B
> regular$mass
[1] 10.0 11.0  5.0  6.0 10.5  7.0
> regular$velocity
[1] 12 14  8 10 13 11
If you are not sure what columns are contained in the variable you can use the names command:
> names(regular)
[1] "trial"    "mass"     "velocity"
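Once a column is pulled out with “$”, it behaves like any other list of numbers. The sketch below rebuilds the same table with data.frame so that it runs without the file; the names “mass_a” and “mean_by_trial” are made up for illustration.

```r
# Same values as data.csv, typed in directly so no file is needed.
regular <- data.frame(
  trial    = factor(c("A", "A", "B", "B", "A", "B")),
  mass     = c(10, 11, 5, 6, 10.5, 7),
  velocity = c(12, 14, 8, 10, 13, 11)
)

# regular$mass and regular[["mass"]] return the same column.
same_column <- identical(regular$mass, regular[["mass"]])
print(same_column)

# Columns can be combined: masses recorded during trial A only.
mass_a <- regular$mass[regular$trial == "A"]
print(mass_a)

# Mean mass for each trial, computed with tapply().
mean_by_trial <- tapply(regular$mass, regular$trial, mean)
print(mean_by_trial)
```

The double-bracket form is handy when the column name is itself stored in a variable.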
That is how to read an external file in R.