ANOVA Simplified Notes

Shirsh Verma
6 min read · Mar 21, 2022

Analysis of variance (ANOVA) is a statistical technique that is used to check if the means of two or more groups are significantly different from each other. ANOVA checks the impact of one or more factors by comparing the means of different samples.

F-Statistic

The statistic that measures whether the means of different samples are significantly different is called the F-ratio. The lower the F-ratio, the more similar the sample means are. When the F-ratio is low enough, we cannot reject the null hypothesis that the group means are equal.

F = Between group variability / Within group variability
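The ratio above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's code: the function name `f_ratio` and the scores are hypothetical, and the variabilities are taken as the usual mean squares (sum of squares divided by degrees of freedom).

```python
def f_ratio(groups):
    """Return the one-way ANOVA F-ratio for a list of samples."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability: how far each value is from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical scores for three groups (not from the article).
scores = [[7, 8, 9], [5, 6, 7], [2, 3, 4]]
f_value = f_ratio(scores)
print(f_value)
```

Groups whose means sit far apart relative to the spread inside each group give a large F-ratio; groups with nearly equal means give a value close to or below 1.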

The F-ratio follows Fisher's distribution, also called the F distribution.

The idea is very similar to the Class 11 physics concept of center of mass.

One Way ANOVA

One-way ANOVA is used when the samples are influenced by a single independent feature (factor).

  1. How do we decide that these three groups performed differently because of the different situations and not merely by chance?
  2. In a statistical sense, how different are these three samples from each other?
  3. What is the probability of group A students performing so differently than the other two groups?

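SS between = 54.6

SS within = 90.1

df between = 2

df within = 27

MS between = SS between / df between = 54.6 / 2 = 27.3

MS within = SS within / df within = 90.1 / 27 ≈ 3.34

F = MS between / MS within = 27.3 / 3.337 ≈ 8.18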
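The arithmetic above can be reproduced directly. This short sketch uses only the sums of squares and degrees of freedom quoted in the text; the variable names are illustrative.

```python
# Values quoted in the text.
ss_between, ss_within = 54.6, 90.1
df_between, df_within = 2, 27

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares.
f_value = ms_between / ms_within
print(round(ms_between, 2), round(ms_within, 2), round(f_value, 2))
```

Carrying the unrounded MS within through the division is what gives F ≈ 8.18; dividing by a value already rounded to two decimals shifts the last digit slightly.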

Calculating F critical

Using the F-table with the significance level (alpha) set to 0.05, df between = 2 and df within = 27, the critical value of F is approximately 3.35.
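The table lookup can also be done numerically. The sketch below is an assumption-laden shortcut, not a general method: it relies on the fact that when the numerator degrees of freedom equal 2 (as here), the upper-tail probability of the F distribution has the closed form P(F > x) = (1 + 2x/df_within)^(-df_within/2), which can be inverted by hand. The function name is hypothetical. For other degrees of freedom, a statistics library such as SciPy (`scipy.stats.f.ppf`) would be the usual choice.

```python
def f_critical_two_numerator_df(alpha, df_within):
    """Upper-tail critical value of the F(2, df_within) distribution.

    Valid ONLY when the numerator degrees of freedom are 2, where
    P(F > x) = (1 + 2*x/df_within) ** (-df_within / 2).
    Solving that expression for x at probability alpha gives the line below.
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


# alpha = 0.05, df between = 2, df within = 27 (values from the text).
f_crit = f_critical_two_numerator_df(0.05, 27)
print(round(f_crit, 2))
```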

A one-way ANOVA tells us that at least two groups are different from each other. But it won’t tell us which groups are different.

Here, we can see that the F-value is greater than the F-critical value for the selected alpha level (0.05). Therefore, we have evidence to reject the null hypothesis and say that at least one of the three samples has a significantly different mean and thus comes from a different population.
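An equivalent way to reach the same decision is to compare a p-value with alpha instead of comparing F with F-critical. The sketch below again assumes the special case of 2 numerator degrees of freedom, where the upper-tail probability has a closed form; the function name is hypothetical and the inputs are the values quoted in the text.

```python
def p_value_two_numerator_df(f_value, df_within):
    """P(F > f_value) for the F(2, df_within) distribution.

    Valid ONLY when the numerator degrees of freedom are 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


alpha = 0.05
f_value = (54.6 / 2) / (90.1 / 27)      # MS between / MS within
p_value = p_value_two_numerator_df(f_value, 27)

# Reject the null hypothesis when the p-value falls below alpha.
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

The two decision rules always agree: F exceeds F-critical exactly when the p-value is below alpha.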

There are commonly two types of ANOVA tests for univariate analysis: One-Way ANOVA and Two-Way ANOVA. One-way ANOVA is used when we are interested in studying the effect of one independent variable (IDV), or factor, on a population, whereas two-way ANOVA is used for studying the effects of two factors on a population at the same time. For multivariate analysis, the corresponding technique is called MANOVA, or Multi-variate ANOVA. (In the one-way ANOVA example here, the single factor is music.)

Two Way ANOVA

A few questions that two-way ANOVA can answer about this dataset are:

  1. Is music treatment the main factor affecting performance? In other words, do groups subjected to different music differ significantly in their test performance?
  2. Is age the main factor affecting performance? In other words, do students of different ages differ significantly in their test performance?
  3. Is there a significant interaction between the factors? In other words, how do age and music interact with regard to a student’s test performance? For example, it might be that younger students and older students react differently to the music treatment.
  4. Can any differences in one factor be found within another factor? In other words, can any differences in music and test performance be found in different age groups?

Two-way ANOVA tells us about the main effects and the interaction effect. A main effect is similar to a one-way ANOVA: the effects of music and of age are each measured separately. The interaction effect, in contrast, considers music and age at the same time.

That’s why a two-way ANOVA can have up to three hypotheses, which are as follows:

Two null hypotheses will be tested if we have placed only one observation in each cell. For this example, those hypotheses will be:
H1: All the music treatment groups have equal mean score.
H2: All the age groups have equal mean score.

For multiple observations in cells, we would also be testing a third hypothesis:
H3: The factors are independent or the interaction effect does not exist.

An F-statistic is computed for each hypothesis we are testing.
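As a rough sketch of how those three F-statistics arise, the function below handles a balanced two-way design with replication (the same number of observations, at least two, in every cell). The function name, the data layout and the scores are hypothetical and chosen for readability; this is not the article's dataset or the spreadsheet procedure it uses.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """F-ratios for a balanced two-way design with replication.

    cells[i][j] is the list of observations for level i of factor A and
    level j of factor B. Every cell must hold the same number n >= 2.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])

    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    a_mean = [mean(cell_mean[i]) for i in range(a)]
    b_mean = [mean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    grand = mean(a_mean)

    # Main effects: spread of each factor's level means around the grand mean.
    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_mean)
    # Interaction: what is left in the cell means after both main effects.
    ss_ab = n * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    # Within-cell (error) variability.
    ss_within = sum((x - cell_mean[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in cells[i][j])

    ms_within = ss_within / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_within,
        "B": (ss_b / (b - 1)) / ms_within,
        "AxB": (ss_ab / ((a - 1) * (b - 1))) / ms_within,
    }


# Hypothetical scores: factor A = music (2 levels), factor B = age (2 levels).
scores = [
    [[1, 3], [3, 5]],
    [[5, 7], [7, 9]],
]
f_stats = two_way_anova(scores)
print(f_stats)
```

In this toy data the cell means are perfectly additive, so the interaction F-ratio is zero while both main effects are non-zero.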

Using these variances, we compute the F-statistic for each main effect and for the interaction effect. The values are:

F1 = 12.16 (music)

F2 = 15.98 (age)

F12 = 0.36 (interaction)

The corresponding critical values from the F-table are:

Fcrit1 = 4.25

Fcrit2 = 3.40

Fcrit12 = 3.40

In the ANOVA output, the F-values for the sample and column rows, i.e. factor 1 (music) and factor 2 (age) respectively, are higher than their F-critical values. This means that each factor has a significant effect on the results of the students, and thus we can reject the null hypotheses for the factors.

The F-value for the interaction effect, however, is much lower than its F-critical value, so we cannot reject the third null hypothesis: there is no evidence that music and age have a combined (interaction) effect on the population.
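The three comparisons can be written out as a small check. The numbers are the F-values and critical values quoted above; the dictionary names are illustrative.

```python
# F-statistics and critical values quoted in the text.
f_values = {"music": 12.16, "age": 15.98, "interaction": 0.36}
f_critical = {"music": 4.25, "age": 3.40, "interaction": 3.40}

# An effect is significant when its F-value exceeds its critical value.
significant = {effect: f_values[effect] > f_critical[effect]
               for effect in f_values}

for effect, is_significant in significant.items():
    verdict = "reject" if is_significant else "fail to reject"
    print(f"{effect}: F = {f_values[effect]}, "
          f"F-critical = {f_critical[effect]} -> {verdict} the null hypothesis")
```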

Multi-variate ANOVA (MANOVA):

Until now, we were making conclusions on the performance of students based on just one test. Could there be a possibility that the music treatment helped improve the results of a subject like mathematics but would affect the results adversely for a theoretical subject like history?

How can we be sure that the treatment won’t be biased in such a case? Again, we take two groups of randomly selected students from a class and subject each group to one kind of music environment, i.e., constant music and no music. This time, however, we conduct two tests (maths and history) instead of just one. This way we can see how the treatment works for different kinds of subjects.

One IDV/factor (music) now affects two dependent variables (maths scores and history scores). This kind of problem is a multivariate case, and the technique used to solve it is known as MANOVA. Here, we will work on a specific case called one-factor MANOVA. Let us now see how our data looks:

Here we have one factor, music, with 2 levels. This factor is going to affect our two dependent variables, i.e., the test scores of maths and history. Denoting this information in terms of variables, we can say that we have L = 2 (2 different music treatment groups) and P = 2 (maths and history scores).

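A MANOVA test also has a null hypothesis and an alternate hypothesis. The null hypothesis is that the vectors of mean scores (maths, history) are equal across the music treatment groups; the alternate hypothesis is that at least one group's mean vector differs.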
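To make the comparison of group mean vectors concrete, here is a minimal sketch of Wilks' lambda, one commonly used MANOVA test statistic. This is an assumption for illustration: the article does not say which statistic its tool reports, and the function names and scores below are hypothetical. The sketch handles exactly two dependent variables, so 2x2 determinants are enough.

```python
def mean_vector(points):
    """Mean of a list of (maths, history) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def scatter_matrix(points, center):
    """Symmetric 2x2 sums of squares and cross-products, as (sxx, sxy, syy)."""
    sxx = sum((x - center[0]) ** 2 for x, _ in points)
    sxy = sum((x - center[0]) * (y - center[1]) for x, y in points)
    syy = sum((y - center[1]) ** 2 for _, y in points)
    return sxx, sxy, syy


def det2(matrix):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = matrix
    return sxx * syy - sxy ** 2


def wilks_lambda(groups):
    """Wilks' lambda = det(within-group scatter) / det(total scatter)."""
    all_points = [p for g in groups for p in g]
    total = scatter_matrix(all_points, mean_vector(all_points))
    parts = [scatter_matrix(g, mean_vector(g)) for g in groups]
    within = tuple(sum(part[i] for part in parts) for i in range(3))
    return det2(within) / det2(total)


# Hypothetical (maths, history) scores for L = 2 groups and P = 2 variables.
music = [(6, 8), (7, 9), (8, 7), (7, 8)]
no_music = [(6, 5), (7, 6), (8, 4), (7, 5)]
lam = wilks_lambda([music, no_music])
print(round(lam, 3))
```

Values of lambda close to 1 mean the groups' mean vectors are nearly identical; values close to 0 mean most of the total variation is between groups. Turning lambda into a p-value requires a further F approximation that is not shown here.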

We will implement MANOVA in Excel using the ‘RealStats’ add-in. It can be downloaded from here.

Here, we can see that the p-value for history lies in the significant region (it is less than 0.025), while the p-value for maths does not. This means that the music treatment had a significant effect in improving the performance of students in history but did not have a significant effect on their performance in maths.

Based on this, we might consider picking and choosing subjects where this music approach can be used.

More on https://www.analyticsvidhya.com/blog/2018/01/anova-analysis-of-variance/#:~:text=Analysis%20of%20variance%20(ANOVA)%20is,the%20means%20of%20different%20samples
