Test of Data Normality

Before we can test data for normality, we need to establish what "normal" means. A normal distribution, also referred to as the Gaussian distribution or the bell curve, is symmetric about its mean. Values near the mean occur frequently, and values become progressively rarer the farther they lie from the mean. A normal distribution is called a "bell curve" because it forms a bell shape when the data is plotted on a graph. Many naturally occurring measurements are approximately normally distributed. The mean and the standard deviation (SD) are the two parameters that define a normal distribution: the mean sets the center of the curve and the standard deviation sets its spread. About 68 percent of observations fall within one standard deviation of the mean. The share of observations covered grows as the interval widens: about 95 percent fall within 2 SD and about 99.7 percent fall within 3 SD (see Figure 1).

Figure 1: A Normal Distribution
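
The 68, 95 and 99.7 percent figures can be checked numerically. The short Python sketch below is only an illustration: the function name `coverage_within` is made up for this example, and it relies on the standard result that the probability of a normal variable falling within k standard deviations of its mean equals erf(k / sqrt(2)).

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean (hypothetical helper name)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints the share of observations expected within k SD of the mean.
    print(f"within {k} SD: {coverage_within(k):.1%}")
```

The result does not depend on the particular mean or standard deviation, which is why the same percentages apply to any normal distribution.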

A normal distribution describes many naturally occurring phenomena. For instance, it can be used to model quantities such as the heights of people in a population sample, intelligence scores, or the ages of students in a classroom. Taking height as an example, when the heights in a population are plotted on a graph, the number of people (the frequency) at or near the average height, located at the center of the graph, is greater than the number at either extreme. The peak at the center of the curve therefore marks the mean height of the population, which is also the most frequent height.

Tests for Normality

Before applying parametric statistics to a dataset, statisticians first check whether the dataset is approximately normally distributed, in other words, whether it forms a bell curve when plotted on a graph. Many parametric procedures assume normality, so normally distributed data makes it possible to use these powerful methods. A dataset is not normally distributed if, when plotted on a chart, the curve skews to the left or right. In that case it may be necessary to “normalize” the data, that is, to transform it so that it more closely resembles the normal distribution, before performing parametric statistics.

The Kolmogorov-Smirnov and Shapiro-Wilk tests are two of the most widely used tests for normality. The Shapiro-Wilk test is used to determine whether a dataset or variable is normally distributed. It measures how closely the observed distribution matches a normal distribution, and it requires a sample of at least 3 observations. The null hypothesis is that the data is normally distributed. If the resulting p-value is less than the chosen alpha level (typically 0.05), the null hypothesis is rejected and the data is considered not normally distributed. If the p-value is greater than 0.05, the test finds no significant departure from normality, and the data can be treated as normally distributed.

The Kolmogorov-Smirnov test does the same job as the Shapiro-Wilk test in that it assesses whether data is normally distributed. It compares the observed distribution with a normal distribution defined by the sample mean and standard deviation. Comparatively, the Kolmogorov-Smirnov test is less powerful than most other normality tests, including the Shapiro-Wilk test. It is mostly included because of its historical popularity in statistics. Other examples of normality tests include the D’Agostino skewness test, the D’Agostino kurtosis test, and the D’Agostino omnibus test, to mention a few.
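
As a rough illustration of the p-value decision rule, the sketch below runs both tests in Python. It assumes the SciPy library is installed and uses its `scipy.stats.shapiro` and `scipy.stats.kstest` functions. The helper name `normality_report` and the two samples are invented for this example: one sample is built from evenly spaced normal quantiles (mean 170, SD 10), the other from exponential quantiles, which are strongly right-skewed.

```python
import math
from statistics import NormalDist, mean, stdev

from scipy import stats

ALPHA = 0.05  # significance level, chosen before running the tests


def normality_report(sample, alpha=ALPHA):
    """Run Shapiro-Wilk and Kolmogorov-Smirnov on one sample
    (hypothetical helper, not part of SciPy)."""
    sw_stat, sw_p = stats.shapiro(sample)
    # K-S against a normal distribution with the sample's own mean and SD.
    # Because these parameters are estimated from the same data, the
    # reported p-value is only approximate and tends to be too large.
    ks_stat, ks_p = stats.kstest(
        sample, "norm", args=(mean(sample), stdev(sample))
    )
    return {
        "shapiro_p": float(sw_p),
        "ks_p": float(ks_p),
        "shapiro_rejects": bool(sw_p < alpha),
        "ks_rejects": bool(ks_p < alpha),
    }


# Illustrative data only: deterministic quantiles instead of real measurements.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
bell_shaped = [NormalDist(170, 10).inv_cdf(p) for p in probs]
right_skewed = [-math.log(1 - p) for p in probs]

bell_report = normality_report(bell_shaped)
skewed_report = normality_report(right_skewed)
print("bell-shaped sample: ", bell_report)
print("right-skewed sample:", skewed_report)
```

A `True` value under `shapiro_rejects` or `ks_rejects` means the p-value fell below alpha and normality is rejected. A `False` value means only that the test found no significant departure from normality, not that normality has been proven.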
