Subsetting Data In R



Data management in R can be somewhat challenging. I have always been able to subset observations (rows) without a problem, but I struggled with subsetting variables (columns) into a new object. Thanks to UCLA’s Institute for Digital Research and Education, I was able to grasp the concept, which saved me a lot of time. Another great resource for learning R is Quick-R.

download.file("http://www.openintro.org/stat/data/mlb11.RData", destfile = "mlb11.RData")
load("mlb11.RData")

bb <- mlb11
names(bb)  # Lists the variable (column) names. The bracketed number at the start of each output row is the position of the first name on that row.

##  [1] "team"         "runs"         "at_bats"      "hits"        
##  [5] "homeruns"     "bat_avg"      "strikeouts"   "stolen_bases"
##  [9] "wins"         "new_onbase"   "new_slug"     "new_obs"

bb1 <- bb[, c(2, 6, 10, 12)]  # Keep all rows and select columns by position: 2 (runs), 6 (bat_avg), 10 (new_onbase), 12 (new_obs).
names(bb1)  # Check to make sure we did it correctly.

## [1] "runs"       "bat_avg"    "new_onbase" "new_obs"
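Selecting by position works, but it silently picks the wrong columns if the column order ever changes. Here is a minimal sketch of the same idea using column names instead. The `toy` data frame is a hypothetical stand-in for `mlb11`, and its values are made-up placeholders, not real statistics.

```r
# Hypothetical stand-in for mlb11; the values are placeholders, not real data.
toy <- data.frame(
  team    = c("A", "B", "C"),
  runs    = c(700, 650, 600),
  at_bats = c(5500, 5400, 5450),
  bat_avg = c(0.270, 0.260, 0.250)
)

# Select columns by name; the result does not depend on column order.
keep <- c("runs", "bat_avg")
toy1 <- toy[, keep]
names(toy1)

# Positional equivalent, for comparison with the approach above.
toy2 <- toy[, c(2, 4)]
identical(toy1, toy2)
```

With the real data, the equivalent call would be `bb[, c("runs", "bat_avg", "new_onbase", "new_obs")]`.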

Now you can run an analysis on just these variables. For example, it is easy to view the correlations among them.

cor(bb1)

##              runs bat_avg new_onbase new_obs
## runs       1.0000  0.8100     0.9215  0.9669
## bat_avg    0.8100  1.0000     0.8823  0.8671
## new_onbase 0.9215  0.8823     1.0000  0.9373
## new_obs    0.9669  0.8671     0.9373  1.0000
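Note that `cor()` only accepts numeric input, which is one reason to subset first: a data frame that still contains a character column such as `team` raises an error. As a rough sketch, the numeric columns can also be picked out programmatically rather than by hand. The `toy` data frame below is again a hypothetical stand-in with placeholder values.

```r
# Hypothetical stand-in for mlb11; the values are placeholders, not real data.
toy <- data.frame(
  team    = c("A", "B", "C", "D"),
  runs    = c(700, 650, 600, 720),
  bat_avg = c(0.270, 0.260, 0.250, 0.275),
  stringsAsFactors = FALSE
)

# Flag which columns are numeric (TRUE/FALSE per column).
numeric_cols <- sapply(toy, is.numeric)

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if only one column qualifies.
toy_num <- toy[, numeric_cols, drop = FALSE]

# Correlation matrix of the remaining columns.
cor(toy_num)
```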

There you go!
