FAQ #1: K-S Tests in SPSS

I decided to start a series of blogs on questions that I get asked a lot. When I say a series, I’m probably raising expectations unfairly: anyone who follows this blog will realise that I’m completely crap at writing blogs. Life gets busy. Sometimes I need to sleep. But only sometimes.

Anyway, I do get asked a lot about why there are two ways to do the Kolmogorov-Smirnov (K-S) test in SPSS. In fact, I got an email about it only this morning. I knew I’d answered this question many times before, but I couldn’t remember where I’d saved a response, so I figured that if I blogged about it I’d at least know where to find my answer next time. So, here it is. Notwithstanding my reservations about using the K-S test (you’ll have to wait until edition 4 of the SPSS book), there are three ways to get one from SPSS:

  1. Analyze > Descriptive Statistics > Explore > Plots > Normality plots with tests
  2. Analyze > Nonparametric Tests > One Sample … (or Legacy Dialogs > 1-Sample K-S)
  3. Tickle SPSS under the chin and whisper sweet nothings into its ear
These methods give different results. Why is that? Essentially (I think), if you use method 1 then SPSS applies Lilliefors’ correction, but if you use method 2 it doesn’t. If you use method 3 then you just look like a weirdo.
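To see how two routes can disagree on the same data, assuming the difference really is Lilliefors’ correction, here is a minimal Python sketch (standard library only; it is not SPSS’s own algorithm). It computes a single D statistic against a normal distribution whose mean and SD are estimated from the sample, then gets two p-values for that same D: an uncorrected one from the large-sample Kolmogorov distribution, which assumes the mean and SD were known in advance, and a Lilliefors-style one from a Monte Carlo simulation that re-estimates the mean and SD in every simulated sample. The function names, the simulated exponential “scores”, the 2,000 simulations and the seeds are all illustrative assumptions.

```python
import math
import random
import statistics


def ks_d_fitted_normal(sample):
    """K-S D between the sample's empirical CDF and a normal CDF whose
    mean and SD are estimated from that same sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uncorrected_p(d, n):
    """Large-sample Kolmogorov p-value, which treats the mean and SD as if
    they had been specified in advance (no Lilliefors correction)."""
    t = d * math.sqrt(n)
    if t < 0.2:
        return 1.0  # P(K > t) is essentially 1 here and the series below converges slowly
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * t) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, p))


def lilliefors_style_p(d, n, n_sim=2000, seed=1):
    """Monte Carlo p-value: simulate normal samples of size n, re-estimate the
    mean and SD each time, and count how often D is at least as large."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # D with estimated parameters ignores location and scale, so a standard normal is enough.
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_d_fitted_normal(sim) >= d:
            hits += 1
    return (hits + 1) / (n_sim + 1)


rng = random.Random(42)
scores = [rng.expovariate(1.0) for _ in range(30)]  # hypothetical, positively skewed scores
d = ks_d_fitted_normal(scores)
p_plain = uncorrected_p(d, len(scores))
p_lillie = lilliefors_style_p(d, len(scores))
print(f"D = {d:.3f}, uncorrected p = {p_plain:.3f}, Lilliefors-style p = {p_lillie:.3f}")
```

Both p-values describe exactly the same D; the only thing that changes is the reference distribution D is compared against, and the uncorrected p-value comes out noticeably larger. That gap is the conservatism described in the quoted passage.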
So, is it better to use Lilliefors’ correction or not? In the additional website material for my SPSS book, which no-one ever reads (the web material, not the book …), I wrote (self-plagiarism alert):
“If you want to test whether a model is a good fit to your data, you can use a goodness-of-fit test (you can read about these in the chapter on categorical data analysis in the book), which has a chi-square test statistic (with the associated distribution). One problem with this test is that it needs a certain sample size to be accurate. The K–S test was developed as a test of whether a distribution of scores matches a hypothesized distribution (Massey, 1951). One good thing about the test is that the distribution of the K–S test statistic does not depend on the hypothesized distribution, as long as that distribution is continuous and fully specified in advance (in other words, the hypothesized distribution doesn’t have to be a particular distribution). It is also what is known as an exact test, which means that it can be used on small samples. It also appears to have more power to detect deviations from the hypothesized distribution than the chi-square test (Lilliefors, 1967). However, one major limitation of the K–S test is that if the location (i.e. the mean) and scale (i.e. the standard deviation) parameters are estimated from the data, then the K–S test is very conservative, which means it tends to miss deviations from the distribution of interest (i.e. normal). What Lilliefors did was to use Monte Carlo simulations to adjust the critical values for the K–S test to make it less conservative (Lilliefors, 1967); his new values were about two-thirds the size of the standard values. He also reported that this test was more powerful than a standard chi-square test (and obviously than the standard K–S test).
Another test you’ll use to test normality is the Shapiro-Wilk test (Shapiro & Wilk, 1965), which was developed specifically to test whether a distribution is normal (whereas the K–S test can be used to test against distributions other than the normal). Shapiro and Wilk concluded that their test was ‘comparatively quite sensitive to a wide range of non-normality, even with samples as small as n = 20. It seems to be especially sensitive to asymmetry, long-tailedness and to some degree to short-tailedness.’ (p. 608). To test the power of these tests, they applied them to samples (n = 20) from various non-normal distributions. For each distribution they took 500 samples, which allowed them to see how many times (out of 500) each test correctly identified a deviation from normality (this is the power of the test). Their simulations (see table 7 in their paper) show that the S-W test is considerably more powerful at detecting deviations from normality than the K–S test. They confirmed this general conclusion in a much more extensive set of simulations (Shapiro, Wilk, & Chen, 1968).”
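The two claims about Lilliefors’ correction in that passage, that his critical values were roughly two-thirds of the standard ones and that the corrected test has more power, can be checked with a small simulation. This is a minimal Python sketch (standard library only), not Lilliefors’ original procedure: it estimates both 5% critical values for n = 20 from 4,000 simulated normal samples, then borrows Shapiro and Wilk’s design of 500 samples of n = 20, here drawn from an exponential distribution (my own choice of non-normal distribution), and counts how often each critical value flags the deviation. The seed, the number of simulations and the exponential alternative are assumptions.

```python
import random
import statistics

STANDARD_NORMAL = statistics.NormalDist(0.0, 1.0)


def ks_d(sample, dist):
    """K-S D between the sample's empirical CDF and the CDF of `dist`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def d_fitted(sample):
    """D against a normal whose mean and SD are estimated from the sample."""
    return ks_d(sample, statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample)))


def critical_values(n, n_sim, rng):
    """Estimate the 5% critical values of D when the null hypothesis (normality) is true."""
    d_known, d_estimated = [], []
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d_known.append(ks_d(sample, STANDARD_NORMAL))  # parameters known: the standard K-S case
        d_estimated.append(d_fitted(sample))  # parameters estimated: Lilliefors' case
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(d_known, n=20)[-1], statistics.quantiles(d_estimated, n=20)[-1]


rng = random.Random(2024)
n = 20
crit_standard, crit_lilliefors = critical_values(n, n_sim=4000, rng=rng)

# Power: 500 samples of n = 20 from a skewed (exponential) distribution. In both
# cases D uses estimated parameters; only the critical value it is compared with differs.
rejections_standard = rejections_lilliefors = 0
for _ in range(500):
    d = d_fitted([rng.expovariate(1.0) for _ in range(n)])
    rejections_standard += int(d > crit_standard)
    rejections_lilliefors += int(d > crit_lilliefors)
power_standard = rejections_standard / 500
power_lilliefors = rejections_lilliefors / 500

print(f"5% critical values: standard {crit_standard:.3f}, Lilliefors {crit_lilliefors:.3f} "
      f"(ratio {crit_lilliefors / crit_standard:.2f})")
print(f"Power against exponential data: standard {power_standard:.2f}, Lilliefors {power_lilliefors:.2f}")
```

Using the standard critical value with estimated parameters is exactly the “very conservative” situation: the test almost never rejects, even for clearly skewed data, whereas the smaller Lilliefors critical value catches many more of the deviations.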
So there you go. More people have probably read that now than when it was on the additional materials for the book. It looks like Lilliefors’ correction is a good thing (power-wise), but you probably don’t want to be using K-S tests anyway really, or, if you do, interpret them within the context of the size of your sample and look at graphical displays of your scores too.
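One reason to read a K-S result in the light of your sample size is that, with enough data, even a trivial departure from normality becomes “significant”. The sketch below is a rough Python illustration (standard library only) under assumptions of my own: scores come from a gamma distribution with shape 20, which is only mildly skewed, and significance is judged against Lilliefors’ large-sample 5% critical value of about 0.886/√n, which is intended for samples larger than 30. It counts how often 100 samples are flagged as non-normal at n = 40 and at n = 3,000; the seed and the number of repetitions are arbitrary.

```python
import math
import random
import statistics


def ks_d_fitted_normal(sample):
    """K-S D against a normal whose mean and SD are estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejection_rate(n, reps=100, seed=7):
    """Share of samples of size n that the Lilliefors-corrected test flags at the 5% level."""
    rng = random.Random(seed)
    critical = 0.886 / math.sqrt(n)  # Lilliefors' large-sample approximation (n > 30)
    flagged = 0
    for _ in range(reps):
        # Gamma with shape 20: skewness is 2 / sqrt(20), about 0.45, so only mildly non-normal.
        sample = [rng.gammavariate(20.0, 1.0) for _ in range(n)]
        if ks_d_fitted_normal(sample) > critical:
            flagged += 1
    return flagged / reps


rates = {n: rejection_rate(n) for n in (40, 3000)}
for n, rate in rates.items():
    print(f"n = {n:>4}: flagged as non-normal in {rate:.0%} of samples")
```

The shape of the distribution is identical in both runs; only the sample size changes. At n = 40 the test rarely notices the mild skew, while at n = 3,000 it flags almost every sample, which is why a significant K-S test on a big sample says little about whether the deviation actually matters.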