One key aspect of data analysis is creating exploratory graphs. This point was emphasized in the Data Analysis course I took from Jeff Leek, and I have found it beneficial in my work. According to Leek, the purpose of an exploratory graph is to:
1. Understand data properties
2. Find patterns in the data
3. Suggest modeling strategies
4. Debug analyses
An important chart for viewing data distributions is the boxplot. The boxplot below shows the distribution of runs scored by Major League players in the 2012 season.
A. The black line in the blue box represents the median.
B. The top of the blue box represents the 75th percentile of the distribution.
C. The bottom of the blue box represents the 25th percentile of the distribution.
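D. The top solid line (the upper whisker) marks the largest value that lies within 1.5 times the interquartile range above the top of the box. This is the default in R's boxplot().
E. The bottom solid line (the lower whisker) marks the smallest value that lies within 1.5 times the interquartile range below the bottom of the box.
F. The circle represents an outlier, a value that lies beyond the whiskers.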
As you can see, the boxplot does a great job of summarizing the distribution of data points, and it fits the purpose of an exploratory graph well.
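To see the numbers behind the lines and circles, R's `boxplot.stats()` returns the five values a boxplot draws, plus any outliers. The snippet below is a minimal sketch: the `runs` vector is made up for illustration and is not the 2012 data.

```r
# Hypothetical run totals, invented for illustration only
runs <- c(12, 25, 31, 40, 44, 52, 58, 63, 70, 75, 81, 88, 129, 200)

# Default coef = 1.5: whiskers reach at most 1.5 times the box height
s <- boxplot.stats(runs)

s$stats  # lower whisker, bottom of box, median, top of box, upper whisker
s$out    # values beyond the whiskers, drawn as circles on the chart
```

With these made-up values, 200 falls beyond the upper whisker and is reported as an outlier, while 129 is the upper whisker itself.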
Creating a boxplot is easy in R
Here is an example of R code that creates a boxplot:
boxplot(Player$R, col = "blue", main = "Runs Scored", ylab = "Player Runs Scored")
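The call above assumes a data frame named `Player` with a column `R` holding runs scored. If you want to try it without the original data, the sketch below builds a stand-in `Player` from simulated values. The simulated numbers, the seed, and the Poisson distribution are assumptions for illustration only, not real 2012 statistics.

```r
# Stand-in for the author's data: simulated, not real 2012 statistics
set.seed(1)
Player <- data.frame(R = rpois(200, lambda = 45))

# Same call as above; boxplot() also invisibly returns what it drew
bp <- boxplot(Player$R, col = "blue", main = "Runs Scored",
              ylab = "Player Runs Scored")

bp$n  # number of observations summarized by the box
```

Swapping in your own data frame only requires that the column you pass to `boxplot()` is numeric.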
Creating a boxplot in MS Excel appears to be more challenging, but a Google search for boxplots in Excel will turn up tools that do the job.