Boxplots and Exploratory Graphs

One key aspect of data analysis is creating exploratory graphs. This point was emphasized in the Data Analysis course I took from Jeff Leek, and I have found it beneficial in my work. According to Leek, the purpose of an exploratory graph is to do the following:
1. Understand data properties
2. Find patterns in the data
3. Suggest modeling strategies
4. Debug analyses

An important chart for viewing data distributions is the boxplot. The boxplot below shows the distribution of runs scored by Major League players in the 2012 season.

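A. The black line in the blue box represents the median.
B. The top of the blue box represents the 75th percentile (third quartile) of the distribution.
C. The bottom of the blue box represents the 25th percentile (first quartile) of the distribution.
D. The top solid line (the upper whisker) marks the largest value that lies within 1.5 times the interquartile range (the height of the box) above the top of the box. This is the default in R's boxplot; it is not a fixed percentile such as the 90th.
E. The bottom solid line (the lower whisker) marks the smallest value that lies within 1.5 times the interquartile range below the bottom of the box.
F. The circle represents an outlier, meaning a value that falls beyond the whiskers.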
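The pieces of the boxplot described above can also be inspected as numbers. Here is a minimal sketch using base R's boxplot.stats(), which returns the values that boxplot() draws with its default settings. The runs vector is made up for illustration and is not the 2012 data.

```r
# Hypothetical runs-scored values, invented for this example
runs <- c(12, 25, 31, 40, 44, 52, 58, 63, 70, 77, 85, 135)

# boxplot.stats() returns the numbers behind a default boxplot:
#   stats[1] = end of lower whisker, stats[2] = bottom of box,
#   stats[3] = median, stats[4] = top of box, stats[5] = end of upper whisker
s <- boxplot.stats(runs)

box_height  <- s$stats[4] - s$stats[2]        # interquartile range of the box
upper_fence <- s$stats[4] + 1.5 * box_height  # whisker may not pass this value
lower_fence <- s$stats[2] - 1.5 * box_height

print(s$stats)  # whisker ends, box edges, and median
print(s$out)    # values drawn as circles (outliers)
```

Note that boxplot.stats() computes the box edges as "hinges", which can differ slightly from the 25th and 75th percentiles returned by quantile() for small samples.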

As you can see, the boxplot does a great job of summarizing the distribution of data points and it definitely fits the purpose for exploratory graphs.

Creating a boxplot is easy in R

Here is an example of the R code to create the boxplot, where Player is a data frame with the runs scored in column R.
boxplot(Player$R, col = "blue", main = "Runs Scored", ylab = "Player Runs Scored")
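To try this call without the original data, here is a self-contained sketch. The Player data frame below is a hypothetical stand-in with made-up values in column R; only the boxplot() call itself mirrors the one above.

```r
# Hypothetical stand-in for the Player data frame (values are invented)
Player <- data.frame(R = c(12, 25, 31, 40, 44, 52, 58, 63, 70, 77, 85, 135))

# Draw the boxplot; boxplot() also returns, invisibly, the numbers it plotted
b <- boxplot(Player$R,
             col  = "blue",
             main = "Runs Scored",
             ylab = "Player Runs Scored")

print(b$stats)  # whisker ends, box edges, and median
print(b$out)    # points drawn as outliers
```

The range argument of boxplot() controls how far the whiskers may extend from the box, in multiples of the interquartile range. Its default is 1.5, and setting range = 0 extends the whiskers to the data extremes so that no points are drawn as outliers.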

Creating a boxplot in MS Excel appears to be more challenging, but a Google search for boxplots in Excel will turn up tools to do the job.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s