Subsetting Data In R

Data management in R can be somewhat challenging. I have always been able to subset observations (rows) without a problem, but I had struggled with subsetting variables (columns) into a new object. Thanks to UCLA’s Institute for Digital Research and Education, I was able to grasp the concept and saved a lot of time. Another great resource for learning about R is Quick R.
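
The difference between subsetting observations and subsetting variables comes down to which side of the comma in df[rows, columns] you fill in. Here is a minimal sketch on a small hypothetical data frame. The team names and values are made up for illustration and are not the mlb11 data used below.

```r
# Hypothetical toy data (made-up values, not the mlb11 data)
toy <- data.frame(
  team = c("Team A", "Team B", "Team C"),
  runs = c(700, 650, 800),
  wins = c(90, 80, 97),
  stringsAsFactors = FALSE
)

# Observations (rows) are chosen before the comma
rows_only <- toy[toy$wins > 85, ]          # teams with more than 85 wins, all columns

# Variables (columns) are chosen after the comma
cols_only <- toy[, c("team", "runs")]      # all teams, two columns

# Both at once: selected rows and selected columns
both <- toy[toy$wins > 85, c("team", "runs")]
print(both)
```

Leaving one side of the comma blank means "keep all of them", which is why toy[, ...] keeps every row.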

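download.file("", destfile = "mlb11.RData")
load("mlb11.RData")  # download.file() only saves the file; load() creates the mlb11 object

bb <- mlb11
names(bb)  # Lists the variable (column) names; each name's position in this vector is its column number.
           # The [1], [5], [9] markers show the position of the first name on each printed line.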

##  [1] "team"         "runs"         "at_bats"      "hits"        
##  [5] "homeruns"     "bat_avg"      "strikeouts"   "stolen_bases"
##  [9] "wins"         "new_onbase"   "new_slug"     "new_obs"
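
Instead of counting positions by eye, you can ask R for the column numbers. The sketch below uses base R's match(). For it to run without the data file, it types out the names printed above. With the real data you would pass names(bb) directly instead of the vars vector.

```r
# Column names as printed by names(bb) above; with the real data,
# use names(bb) instead of typing them out.
vars <- c("team", "runs", "at_bats", "hits", "homeruns", "bat_avg",
          "strikeouts", "stolen_bases", "wins", "new_onbase", "new_slug", "new_obs")

wanted <- c("runs", "bat_avg", "new_onbase", "new_obs")

# match() returns the position of each wanted name within vars
idx <- match(wanted, vars)
print(idx)  # 2 6 10 12
```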

bb1 <- bb[, c(2, 6, 10, 12)]  # Keep all rows (nothing before the comma) and columns 2 (runs), 6 (bat_avg), 10 (new_onbase) and 12 (new_obs)
names(bb1)  # Check to make sure we did it correctly.

## [1] "runs"       "bat_avg"    "new_onbase" "new_obs"
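
Column numbers work, but they will silently pick the wrong variable if the column order ever changes. Here is a minimal sketch of two name-based alternatives: indexing with a character vector, and base R's subset() with its select argument. It uses a hypothetical toy data frame that has the same 12 column names as mlb11 but placeholder values (1 to 36), not the real data.

```r
# Hypothetical toy data: same column names as mlb11, placeholder values 1..36
col_names <- c("team", "runs", "at_bats", "hits", "homeruns", "bat_avg",
               "strikeouts", "stolen_bases", "wins", "new_onbase", "new_slug", "new_obs")
toy <- as.data.frame(matrix(1:36, nrow = 3, dimnames = list(NULL, col_names)))

# 1. By position, as in the post
by_number <- toy[, c(2, 6, 10, 12)]

# 2. By name, with a character vector
by_name <- toy[, c("runs", "bat_avg", "new_onbase", "new_obs")]

# 3. By name, with subset(); select takes unquoted column names
by_subset <- subset(toy, select = c(runs, bat_avg, new_onbase, new_obs))

names(by_name)
all.equal(by_number, by_name)   # TRUE
all.equal(by_name, by_subset)   # TRUE
```

All three give the same four columns. The name-based versions also document themselves, so the comment listing which variable sits at which position is no longer needed.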

Now you are able to complete a data analysis on these variables. For example, it is easy to view the correlation between the selected variables:

round(cor(bb1), 4)  # Correlation matrix of the four columns, rounded to 4 decimal places

##              runs bat_avg new_onbase new_obs
## runs       1.0000  0.8100     0.9215  0.9669
## bat_avg    0.8100  1.0000     0.8823  0.8671
## new_onbase 0.9215  0.8823     1.0000  0.9373
## new_obs    0.9669  0.8671     0.9373  1.0000
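
The correlation matrix is itself a matrix, so the same [rows, columns] subsetting works on it. With the real data, something like cor(bb1)["runs", -1] would pull out each variable's correlation with runs. The sketch below instead rebuilds the matrix from the rounded values printed above so it runs without the data file.

```r
# Values copied from the rounded correlation matrix printed above
vars <- c("runs", "bat_avg", "new_onbase", "new_obs")
cm <- matrix(c(1.0000, 0.8100, 0.9215, 0.9669,
               0.8100, 1.0000, 0.8823, 0.8671,
               0.9215, 0.8823, 1.0000, 0.9373,
               0.9669, 0.8671, 0.9373, 1.0000),
             nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Row "runs", every column except runs itself
runs_cor <- cm["runs", setdiff(vars, "runs")]

sort(runs_cor, decreasing = TRUE)  # new_obs, new_onbase, bat_avg
names(which.max(runs_cor))         # "new_obs"
```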

There you go!

