Predicting Values Based on a Linear Model


This blog post is mostly a note to myself on what I have learned using R. The goal of this exercise was to learn how to predict values from a linear model, and this post is here to help me recall how to do it. The predicted values are expected Runs based on a player’s on-base percentage (OBP).

# Establish the data
Player = read.csv("Baseball Player Stats_2011.csv")
names(Player)
##  [1] "RK"     "Player" "Team"   "Pos"    "G"      "AB"     "R"     
## [8] "H" "X2B" "X3B" "HR" "RBI" "BB" "SO"
## [15] "SB" "CS" "AVG" "OBP" "SLGâ.." "OPS"
dim(Player)
## [1] 144  20
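One thing worth fixing up front: the slugging column came through as `SLGâ..`, which is an encoding artifact, most likely from a non-ASCII character (such as `%`) in the CSV header. A minimal sketch of renaming it, using a toy data frame that reproduces the garbled header (the real fix would apply the same line to `Player` as read above):

```r
# Toy stand-in for the real CSV read, reproducing the mangled header.
# check.names = FALSE keeps the name exactly as it arrived.
Player <- data.frame(OBP = c(0.338, 0.354),
                     "SLGâ.." = c(0.440, 0.522),
                     check.names = FALSE)

# Rename the garbled slugging-percentage column to something clean.
names(Player)[names(Player) == "SLGâ.."] <- "SLG"
names(Player)

# Alternatively, re-reading the file with an explicit encoding may avoid
# the problem entirely, e.g.:
# Player <- read.csv("Baseball Player Stats_2011.csv", fileEncoding = "UTF-8")
```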
# Split the data.frame into a Train set to build the linear model and a
# Test set to evaluate the model on new data.
set.seed(333)
TrainSamples = sample(1:144, size = (144/2), replace = F)
TrainPlayer = Player[TrainSamples, ]
TestPlayer = Player[-TrainSamples, ]
head(TrainPlayer)
##      RK      Player Team Pos   G  AB   R   H X2B X3B HR RBI BB  SO SB CS
## 68   68    Pagan, A   SF  CF 154 605  95 174  38  15  8  56 48  97 29  7
## 13   13    Craig, A  STL  1B 119 469  76 144  35   0 22  92 37  89  2  1
## 139 139  Escobar, Y  TOR  SS 145 558  58 141  22   1  9  51 35  70  5  1
## 81   81    Jeter, D  NYY  SS 159 683  99 216  32   0 15  58 45  90  9  4
## 3     3 Hamilton, J  TEX  LF 148 562 103 160  31   2 43 128 60 162  7  4
## 101 101  DeJesus, D  CHC  RF 148 506  76 133  28   8  9  50 61  89  7  8
##       AVG   OBP SLGâ..   OPS
## 68  0.288 0.338  0.440 0.778
## 13  0.307 0.354  0.522 0.876
## 139 0.253 0.300  0.344 0.644
## 81  0.316 0.362  0.429 0.791
## 3   0.285 0.354  0.577 0.930
## 101 0.263 0.350  0.403 0.753
# Linear Model for the training set.
lm1 = lm(R ~ OBP, data = TrainPlayer)
summary(lm1)
## 
## Call:
## lm(formula = R ~ OBP, data = TrainPlayer)
##
## Residuals:
##    Min     1Q Median     3Q    Max 
## -31.28  -9.90  -1.13   9.16  30.48 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     18.9       17.8    1.06    0.293   
## OBP            170.7       53.1    3.21    0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.9 on 70 degrees of freedom
## Multiple R-squared: 0.128,  Adjusted R-squared: 0.116 
## F-statistic: 10.3 on 1 and 70 DF,  p-value: 0.002
plot(TrainPlayer$OBP, TrainPlayer$R, pch = 19, col = "blue")
abline(lm1, lwd = 3)  # abline draws the fitted line directly, no sorting needed
(Plot: training-set Runs vs. OBP, with the fitted regression line overlaid.)
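Before leaning on the model for prediction, it is worth a quick glance at the residuals; a flat, patternless residual plot supports the straight-line fit. A minimal sketch on simulated data shaped like the OBP/Runs relationship above (the coefficients 19 and 171 here are stand-ins for illustration, not the real fit):

```r
set.seed(333)
# Simulated stand-in for the training data: Runs roughly linear in OBP.
OBP <- runif(72, 0.28, 0.40)
R   <- 19 + 171 * OBP + rnorm(72, sd = 14)
fit <- lm(R ~ OBP)

# Residuals vs fitted values: look for curvature or a fanning spread.
plot(fit$fitted.values, resid(fit), pch = 19,
     xlab = "Fitted Runs", ylab = "Residual")
abline(h = 0, lty = 2)

# Or the full built-in diagnostic set (residuals, Q-Q, scale-location,
# leverage) in one 2x2 panel:
par(mfrow = c(2, 2)); plot(fit); par(mfrow = c(1, 1))
```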
# This is where I learned how to predict for more than one new data point:
# pass a data.frame with one row per OBP value.
newdata = data.frame(OBP = c(0.35, 0.377))
predict(lm1, newdata, interval = "confidence")
##     fit   lwr   upr
## 1 78.62 74.95 82.29
## 2 83.22 77.63 88.82
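The whole point of holding out `TestPlayer` was to score the model on unseen players, so the natural last step is to predict Runs for the test set and measure the error. A sketch of that step, using simulated data in place of the CSV so it runs on its own (with the real data, use `lm1` and `TestPlayer` from above directly); it also shows `interval = "prediction"`, which gives the wider intervals appropriate for a single new player rather than the mean response:

```r
set.seed(333)
# Simulated stand-in for the full player table.
n      <- 144
OBP    <- runif(n, 0.28, 0.40)
R      <- 19 + 171 * OBP + rnorm(n, sd = 14)
Player <- data.frame(R = R, OBP = OBP)

# Same train/test split as above.
TrainSamples <- sample(1:n, size = n / 2, replace = FALSE)
TrainPlayer  <- Player[TrainSamples, ]
TestPlayer   <- Player[-TrainSamples, ]

lm1 <- lm(R ~ OBP, data = TrainPlayer)

# Predict Runs for the held-out players and compute the prediction error.
pred <- predict(lm1, newdata = TestPlayer)
rmse <- sqrt(mean((TestPlayer$R - pred)^2))
rmse  # root-mean-square prediction error, in Runs

# Prediction interval for one new player: wider than the confidence
# interval, since it covers individual variation, not just the fitted mean.
predict(lm1, data.frame(OBP = 0.35), interval = "prediction")
```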