Reshaping data in R
Reshaping data is an important step in data analysis and data transformation. Today we’ll look at a few of the functions R provides for it: rbind(), cbind() and merge(). The tutorial is easy to follow. I’m using the mtcars dataset, which comes built into R.
rbind()
rbind() stands for row bind. It stacks its arguments one below the other, so the rows of the second data frame are appended after the rows of the first. We’ll see how it works with a few examples.
First, here are the first six rows of mtcars:
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Here we create three subsets: data1 (rows 1 to 10), data2 (rows 11 to 20) and data3 (rows 1 to 10 without the 11th column, carb).
data1 <- mtcars[1:10, ]
data2 <- mtcars[11:20, ]
data3 <- mtcars[1:10, -11]
Let’s check the dimensions (number of rows and number of columns) of the new data frames with dim().
> dim(data1)
[1] 10 11
> dim(data2)
[1] 10 11
> dim(data3)
[1] 10 10
The data frames must have the same columns. data1 and data2 both have the 11 mtcars columns, so they stack into one 20-row data frame:
Rowbind <- rbind(data1, data2)
> dim(Rowbind)
[1] 20 11
Rowbind1 throws an error because the columns are not the same in both data frames (data2 has 11 columns, data3 only 10):
Rowbind1 <- rbind(data2, data3)
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match
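One way around this error, assuming you are happy to keep only the columns both frames share, is to subset each one to the common columns before binding. The sketch also checks a detail from the rbind() documentation: for data frames, columns are matched by name rather than by position. The names common, fixed, reversed and stacked are only illustrative.

```r
# Recreate the subsets from above (mtcars ships with base R)
data1 <- mtcars[1:10, ]
data2 <- mtcars[11:20, ]
data3 <- mtcars[1:10, -11]   # drops the 11th column, carb

# Keep only the columns both data frames have in common, then bind
common <- intersect(names(data2), names(data3))
fixed <- rbind(data2[, common], data3[, common])
dim(fixed)   # 20 10

# rbind() on data frames matches columns by name, not position,
# so a frame with its columns in reverse order still stacks correctly
reversed <- data2[, rev(names(data2))]
stacked <- rbind(data1, reversed)
names(stacked)[1:3]   # "mpg" "cyl" "disp": column order follows data1
```

Whether dropping carb is acceptable depends on your analysis; the alternative is to add the missing column to data3 before binding.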
cbind()
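cbind() stands for column bind. It places its arguments side by side, so the columns of the second data frame are added after the columns of the first. We’ll see how it works with a few examples, using data1 and data2 from above and a new 5-row subset, data4.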
data4 <- mtcars[11:15,]
ColBind <- cbind(data1, data2)
dim(ColBind)
[1] 10 22
ColBind1 <- cbind(data1, data4)
dim(ColBind1)
[1] 10 22
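If the lengths are not the same, cbind() recycles the shorter argument, repeating its values until they fill the length of the longest. That is why ColBind1 still has 10 rows: the 5 rows of data4 are used twice. With plain vectors it looks like this: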
> cbind(1:2, 1:10)
      [,1] [,2]
 [1,]    1    1
 [2,]    2    2
 [3,]    1    3
 [4,]    2    4
 [5,]    1    5
 [6,]    2    6
 [7,]    1    7
 [8,]    2    8
 [9,]    1    9
[10,]    2   10
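A minimal sketch of the same recycling rule for data frames, reusing data1 and data4 from above. The 3-row subset data7 is hypothetical, added only to show the failure case: for data frames, recycling happens only when the longer row count is a whole multiple of the shorter one; otherwise cbind() stops with an error.

```r
data1 <- mtcars[1:10, ]
data4 <- mtcars[11:15, ]   # 5 rows; 10 is a multiple of 5

recycled <- cbind(data1, data4)
dim(recycled)   # 10 22

# Column 12 is data4's mpg: its 5 values are repeated to fill 10 rows
recycled[, 12]

# Hypothetical 3-row subset: 10 is not a multiple of 3,
# so the data frame method raises an error instead of recycling
data7 <- mtcars[1:3, ]
msg <- tryCatch(cbind(data1, data7),
                error = function(e) conditionMessage(e))
msg   # an error about differing numbers of rows
```

Plain vectors are more lenient: a length that does not divide evenly is still recycled, with only a warning.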
merge()
merge() joins two data frames by common columns or row names, and can also perform other database-style join operations.
Let’s create two random samples of 10 rows each from the mtcars dataset.
data5 <- mtcars[sample(1:nrow(mtcars), 10), ]
data6 <- mtcars[sample(1:nrow(mtcars), 10), ]
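If the key column has the same name in both data frames, pass it to by. Because the samples are random, the number of matching rows changes from run to run and can even be zero.
merge(data5, data6, by = "mpg")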
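Because data5 and data6 are random, the result above differs every run. The sketch below uses two small fixed subsets instead (left and right are hypothetical names chosen for illustration) so the default behaviour is easy to see: merge() performs an inner join, keeping only key values found in both frames, and sorts the result by the key.

```r
# Small, hypothetical subsets of mtcars chosen so some mpg values overlap
left  <- mtcars[1:5, c("mpg", "cyl")]   # mpg 21.0, 21.0, 22.8, 21.4, 18.7
right <- mtcars[3:8, c("mpg", "hp")]    # mpg 22.8, 21.4, 18.7, 18.1, 14.3, 24.4

# Default: inner join on mpg, rows sorted by the key
inner <- merge(left, right, by = "mpg")
inner   # 3 rows, mpg 18.7, 21.4, 22.8

# Non-key columns present in both frames get the suffixes ".x" and ".y"
both_cyl <- merge(left, mtcars[3:8, c("mpg", "cyl")], by = "mpg")
names(both_cyl)   # "mpg" "cyl.x" "cyl.y"
```

The same suffixing happens in the data5/data6 example, where every column other than mpg exists in both frames.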
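If the key columns have different names, specify each one separately: by.x for the first data frame and by.y for the second. Here both are still mpg, so the result matches the one above:
merge(data5, data6, by.x = "mpg", by.y = "mpg")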
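In this sketch the key of the second frame is renamed to a hypothetical miles_per_gallon column, so by.x and by.y genuinely differ. It also tries two other options from the merge() documentation: all.x = TRUE for a left join, and by = "row.names" for joining on row names. The frames left and right are illustrative subsets, not part of the original tutorial.

```r
left  <- mtcars[1:5, c("mpg", "cyl")]
right <- mtcars[3:8, c("mpg", "hp")]
# Rename the key in right to a hypothetical name so the two keys differ
names(right)[names(right) == "mpg"] <- "miles_per_gallon"

# by.x names the key in the first frame, by.y the key in the second
joined <- merge(left, right, by.x = "mpg", by.y = "miles_per_gallon")
names(joined)   # "mpg" "cyl" "hp": the key keeps the by.x name

# all.x = TRUE keeps every row of left (a left join); unmatched hp is NA
left_join <- merge(left, right, by.x = "mpg", by.y = "miles_per_gallon",
                   all.x = TRUE)
nrow(left_join)   # 5

# by = "row.names" joins on row names; the key becomes a "Row.names" column
by_rows <- merge(left, right, by = "row.names")
nrow(by_rows)     # 3 (rows 3 to 5 of mtcars appear in both frames)
```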
Keep visiting Analytics Tuts for more tutorials.
Thanks for reading! Leave your suggestions and queries in the comments.