Math and Scientific Methods:
by Valerie Coskrey ©2005
Using the Chi-Square Statistic to Analyze the Results of an
Contents Quick Links
- A 50:50 Probability: The Coin Toss Example
- Scientific Method: Was it Really 50:50, or a Weighted Coin?
- Using Chi-Square χ² in Hypothesis Testing: Counting Alligators
- Random Sampling: Characteristics of Classmates
- Thought Experiments: What Color Are the Fish in the Aquarium?
- Calculating Chi-Square
- An Experiment To Do: How Many Purple Beads are in the Jar? Contains worksheet.
The Coin Toss Example: A 50:50 Probability
If you flip a coin, it will land either head up or tail up -- two possibilities. Therefore, we say that the probability of heads to tails is .50 to .50. Another way to say this is that in a coin toss, there is a 50% chance of the coin landing head up and a 50% chance of the coin landing tail up. If the coin is tossed a large number of times -- say, 100 times -- then the ratio of heads to tails would be close to 50:50.
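The claim that many tosses land close to 50:50 can be tried out with a short simulation. This is only a sketch: the function name `toss_coin` and the seed value are made up for illustration, and the simulated coin is assumed to be perfectly fair.

```python
import random

def toss_coin(n_tosses, seed=None):
    """Simulate n_tosses flips of a fair coin and return (heads, tails)."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads, n_tosses - heads

# Try a few sample sizes; the seed is an arbitrary illustrative choice.
for n in (10, 100, 10_000):
    heads, tails = toss_coin(n, seed=2005)
    print(f"{n} tosses: {heads} heads, {tails} tails ({heads / n:.1%} heads)")
```

Running it with different seeds shows that small runs can stray well away from 50:50, while large runs tend to stay close to it.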
Suppose you try this coin-tossing and get 47 heads to 53 tails. Is this ratio statistically different from 50:50 so that you would know that the coin is weighted to fall tail up most times? (No one wants to play craps with weighted dice, except maybe the person that owns the dice!) To answer this question, use the statistic χ² (Chi-square) to compare the actual, observed coin toss ratio to the theoretical, expected coin toss ratio. That is, 47:53 compared to 50:50.
In this activity you will learn how to calculate χ² and to use it to determine if the frequency of the actual data observed is the same frequency you would expect to obtain based on background knowledge of the situation. Biologists often use χ² in analyzing results of their observations made to answer specific questions. The use of such statistics to analyze data is part of the scientific method used by science to discover patterns in nature.
The term "Scientific Method" refers to the manner in which scientists approach a scientific problem. Some say that to be "scientific," a solution to a problem must be "falsifiable": tested in such a manner that if the solution is wrong, it fails the test. All agree that the solution must be testable. Experimentation is the testing of the possible solutions to problems that scientists consider. The possible solutions are called hypotheses.
The "Scientific Method" can be described as follows. Observations are made of phenomena dealing with a specific problem. Using these observations, hypotheses (predictions) are formulated. These hypotheses are tested in experiments; and the results, the data collected, of the experiments are analyzed in terms of whether one or more of the hypotheses actually predicted the results. Finally, conclusions are made and communicated to other scientists for criticism and repetition of the experiments. Then the cycle repeats. Reasonable explanations that are not refuted and therefore could be true based on the hypothesis testing are accepted for the time being as explanations of what is true.
Within this description are nuances of when to call concepts theory, hypothesis, data, results, and conclusion. These nuances become less confusing with the practice of talking about experiments. For now, just know that the confusion will clear up and that there are times when terms are interchangeable, so watch for that.
Using Statistics Like χ² in Hypothesis Testing
Suppose someone asked you to determine the ratio of male to female alligators in the state of Mississippi. How would you go about it? Would you need to count every alligator in the state and check each one for sex? No.
The study of statistics has shown that the ratio of male to female alligators in the state can be estimated with a high degree of accuracy (that is, getting pretty close to the actual ratio with the estimate) by using specific procedures. These procedures involve obtaining a random sample and taking a large enough sample of alligators to count. Furthermore, calculating a χ² statistic using the ratio obtained in one sampling tells if the ratio is different from an expected ratio.
Theoretically, since each sample (each set of counted alligators) comes from the same population (the set of all alligators in Mississippi), then each sample should give the same results. Combining a set of samples in another χ² calculation tells about the frequencies in the entire population with even greater accuracy. (Chi-square can also be used by combining a set of samples in another χ² calculation to tell if each sample represents the same ratio, or not.)
One key concept here is random sampling. To get a random sample, each member of the sample must be chosen at random. For example, if you wanted to take a random sample of the members of a lab class of 24, you could assign each member a number. Then you would toss a set of these numbers in a hat and shake. From the hat, you would draw several numbers -- say 6, about 1/4 of the class. The people whose numbers you drew would be a random sample of the members of the class. Statistically speaking, because these six were randomly chosen, their characteristics could be said to represent the characteristics of the whole class.
Of course, you could be more confident of your description of the class if the class were larger, and you could use more people in your sample; although you would most likely be using less than 1/4 the population of the class. The basic rule of random sampling is that any member of the whole group has an equal chance of being in the sample chosen to test.
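The numbers-in-a-hat procedure can be sketched in a few lines of Python. The variable names are illustrative assumptions; the class size of 24 and sample size of 6 come from the example above.

```python
import random

class_size = 24
sample_size = 6

# The slips of paper tossed into the hat: one number per class member.
roster = list(range(1, class_size + 1))

rng = random.Random()  # pass a seed, e.g. random.Random(1), for a repeatable draw
drawn = rng.sample(roster, sample_size)  # no number can be drawn twice

print("Numbers drawn from the hat:", sorted(drawn))
```

Because `sample` gives every number the same chance of being picked, it follows the basic rule of random sampling stated above.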
Imagine, if you will, a large aquarium filled with 500 fish, all the same size. Among the 500, five are red and the rest are blue. The ratio of red fish to the whole 500 fish is 1% (or 1/100). The blue fish ratio to the whole set of fish (also called the frequency of blue fish) is 99% (or .99).
Now imagine dipping a net into the tank and scooping up the first fish that comes by. What color would you expect it to be? Would you agree that there is a 99% chance that the fish in your net would be blue?
Now imagine that you tossed your fish back into the aquarium and netted a fish again. You do this ten times. Would you expect that most of the times you netted a blue fish? Suppose that you netted only blue fish, does this mean that there are no red fish to be netted? Of course not. You can see the red fish in the tank, you just haven't netted one yet.
Now for a more complicated set-up. Instead of a glass aquarium, a set of 500 fish are in an opaque box. You cannot see into it at all. This time you do not know what colors of fish are in the box. Now you dip your net into the box and net a blue fish, then toss it back. You do this 20 times. Each time you netted a blue fish. Would you say that most of the fish in the box are probably blue? Would you say that there are no red fish? Do you think you sampled enough fish to determine if any are red?
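A little arithmetic shows why a run of all-blue fish says so little about rare red fish. The sketch below assumes each netting is independent (the fish is tossed back each time). The function name is made up for illustration, and the 10% red case is a hypothetical comparison that does not appear in the thought experiment.

```python
def chance_all_blue(blue_fraction, n_draws):
    """Probability that every one of n_draws nettings (with replacement) is blue."""
    return blue_fraction ** n_draws

# Aquarium from the text: 99% blue, 10 nettings.
print(f"10 nettings, 1% red:  {chance_all_blue(0.99, 10):.3f}")

# Opaque box, 20 nettings, under two assumed compositions.
print(f"20 nettings, 1% red:  {chance_all_blue(0.99, 20):.3f}")
print(f"20 nettings, 10% red: {chance_all_blue(0.90, 20):.3f}")
```

With 1% red fish, netting only blue fish is the most likely outcome even after 20 tries (roughly 0.82). If 10% were red, an all-blue run of 20 would be unusual (roughly 0.12). So 20 blue fish suggest that red fish are uncommon, not that they are absent.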
Too small a sample is one type of sampling error that can occur in testing a hypothesis. Using non-random samples is another. In this fish tank, if the red fish always swam on the bottom of the tank, and you always dipped your net near the top, then you would not be getting a random sample of fish from the whole set of fish in the tank. Deliberately netting all 5 of the red fish would also be a sampling error of the non-random sample type, making you (or whoever you are trying to convince) believe that there are a large number of red fish in the tank.
Consider this: did tossing the fish back into the aquarium make a difference? This procedure was called random sampling with replacement. Imagine the next random samples that you take will not have the sample tossed back into the pot, as it were, and use the procedure called random sampling without replacement. You will wait until the sampling process is ended before putting the item back into the container. Each trial begins with all items in the container, but each sampling within a trial will not contain the items already removed for counting.
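To see how much replacement matters, the chance of netting only blue fish can be compared under both procedures. This is a sketch using the aquarium numbers (495 blue fish out of 500); the function names are illustrative assumptions.

```python
from math import comb

def p_all_blue_with_replacement(blue, total, draws):
    """Each fish is tossed back, so every draw has the same chance of blue."""
    return (blue / total) ** draws

def p_all_blue_without_replacement(blue, total, draws):
    """Fish are kept out: count the all-blue groups among all possible groups."""
    return comb(blue, draws) / comb(total, draws)

with_repl = p_all_blue_with_replacement(495, 500, 10)
without_repl = p_all_blue_without_replacement(495, 500, 10)

print(f"With replacement:    {with_repl:.4f}")
print(f"Without replacement: {without_repl:.4f}")
```

When the sample is small compared to the population, the two procedures give nearly the same probability. The difference grows as the sample takes up a larger share of the container.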
Go back now to the coin tossing experiment. By chance alone, about 50 out of 100 tosses should come up heads. If the ratio is significantly different from 50:50, then some reason other than chance is operating. The χ² statistic can tell if the observed frequency is simply what you would expect by chance or if some other factor is causing the results obtained in the experiment. To calculate the χ² statistic, follow the steps below and fill out the data table as shown.
1. Based on probability, the expected results of tossing a coin 100 times would be 50 heads and 50 tails.
2. An actual coin toss experiment resulted in an observed 45 heads and 55 tails.
3. Ask these questions.
Of the results: Is 45:55 statistically the same ratio as 50:50?
Of the experiment: Did the results of the experiment demonstrate that the number of heads to tails was just a matter of chance?
If the conclusion of the statistical analysis is that the result is not just a matter of chance, then the ratio can be said to differ from 50:50 due to a reason other than chance.
The purpose of the experiment and the reasoning behind it would suggest what the “reason other than chance” might be: the cause for the effect. In this way a cause for an effect is tested and accepted as probably true or verified--or not refuted and therefore accepted, depending on which philosopher of science you follow in defining concepts and actions. (In fact, the vocabulary used in discussing statistical results depends on one’s philosophical leanings. Personally, I accept Popper’s terminology and do not say the experiment verified a hypothesis, but that a hypothesis was not refuted and therefore can be accepted, tentatively.)
4. Complete a data chart like the one below.
|coin toss result|O|E|O-E|(O-E)²|(O-E)²/E|
|heads|45|50|-5| | |
|tails|55|50|5| | |
Note the different variables in the different columns of the chart. The letter "O" stands for observed value; "E" stands for expected value; and "O - E" stands for the difference between the two. The rows are labeled "heads" and "tails" and represent the two possible occurrences of the tossed coin.
Two more steps are required for calculating χ². The difference (O - E) must be squared and then that number must be divided by E. Study the two additional columns in the chart that represent the calculations of these two steps.
|coin toss results|O|E|O-E|(O-E)²|(O-E)²/E|
|heads|45|50|-5|25|0.5|
|tails|55|50|5|25|0.5|
5. Add the χ² values: 0.5 + 0.5 = 1.0.
6. The sum of 1.0 is then compared to a table of χ² values using degrees of freedom and probability expectations. The degrees of freedom (d.f.) is the number of possible occurrences minus 1.
Calculate the degrees of freedom in this experiment like this: heads and tails makes 2; 2 - 1 = 1. So there is 1 degree of freedom. You will use p < .05 as the cutoff: if the probability that a χ² this large is due to chance is less than .05, chance is discounted as the explanation. (Note: the number of possible occurrences, called categories, is n; and d.f. = n - 1.)
Below is a χ² value table with χ²s given in terms of degrees of freedom and probabilities.
Column 1 of the table gives the degrees of freedom. Row 1 gives the probability values that correspond to the χ² values listed in each degrees-of-freedom row.
The chi-square statistic is calculated from the data and the hypothetical value of the data. The calculated statistic is then located on the table in the row for the correct degrees of freedom for the experiment's number of possible categories of results. Read the probability value at the top of the column where the statistic falls.
Table adapted from www.richland.edu/james/lecture/m170/tbl-chi.html
7. Locate the χ² value of 1.0 obtained from your data analysis. Along the row for 1 degree of freedom, the value 1.0 falls between the values 0.016 and 2.71. These values correspond to probabilities of 0.90 and 0.10, so the probability for your χ² is greater than .05, or p > .05. Therefore, you can say that your observed ratio is statistically equal to the expected ratio. (In terms of the null hypothesis: the probability that a difference this large is due to chance is greater than .05, so chance cannot be discounted as the explanation.)
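The arithmetic in steps 1 through 6 can be checked with a short Python sketch. The function name `chi_square` and the dictionary layout are illustrative choices, not part of the original worksheet.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin toss data from the steps above.
observed = {"heads": 45, "tails": 55}
expected = {"heads": 50, "tails": 50}

categories = list(observed)
chi2 = chi_square(
    [observed[c] for c in categories],
    [expected[c] for c in categories],
)
degrees_of_freedom = len(categories) - 1  # d.f. = n - 1

print("chi-square:", chi2)                        # 0.5 + 0.5 = 1.0
print("degrees of freedom:", degrees_of_freedom)  # 2 - 1 = 1
```

The same function works for any number of categories, as long as the observed and expected counts are listed in the same order.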
With these results you can conclude that the data support the hypothesis that tossing a coin results in a 50:50 probability of getting heads (or tails), since p > .05.
Had your χ² value come out greater than 3.84 (the table value for p = .05 at 1 degree of freedom), then some ratio other than 50:50 was demonstrated and you would have had to offer a hypothesis other than chance to explain the results. In other words, your results would have differed from 50:50 not just by chance but because something in nature made the difference, and the population is probably not 50:50 because of this something of nature (a cause for this effect).
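For readers who want the probability itself instead of a table range, the case of 1 degree of freedom can be worked out with Python's standard library. This sketch relies on a standard identity: with 1 degree of freedom, the probability of a χ² at least as large as x equals erfc(√(x/2)). The function name is an illustrative assumption, and the formula does not apply to other degrees of freedom.

```python
from math import erfc, sqrt

def p_value_1df(chi2):
    """Probability of a chi-square this large or larger, for 1 degree of freedom only."""
    return erfc(sqrt(chi2 / 2))

for chi2 in (1.0, 2.71, 3.84):
    p = p_value_1df(chi2)
    verdict = "chance is discounted" if p < 0.05 else "chance cannot be discounted"
    print(f"chi-square = {chi2:<4}  p = {p:.3f}  ({verdict})")
```

The coin toss value of 1.0 gives a probability of roughly 0.32, well above .05. A value of 3.84 lands almost exactly on the .05 cutoff, which is why it serves as the dividing line.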
An Experiment To Do:
How Many Purple Beads are in the Jar?
In this lab you will estimate (predict) the number of beads of a particular color in a jar using a random sample of beads taken from a jar (population) of 200 beads.
(Quick Set-Up Hint: Have one person/group set up the jar for another person/group. CAUTION EACH PERSON/GROUP TO NOT TELL THE GROUP EXPERIMENTING ON A PARTICULAR JAR THE COUNT IN THE JAR. Have the Set-up Person/Group record the count, but hide the record from the Experimenters until the experimental data has been collected. This record contains the value of E for each color of bead in that jar. Exchange jars. After the beads have been collected and counted in the experiment, the Set-up Group should share the record of the jar’s bead count with the Experimenter Group.)
Consider the population of 200 beads in your jar. Each color of bead in your jar occurs in multiples of 25 (25, 50, 75 ...etc.). You are to estimate the number of beads of a given color in your jar. A possible hypothesis to use in testing your estimate is as follows:
One out of every four beads is purple.
This hypothesis equates to saying 1/4 of 200, or 50, beads are purple. Your two categories would be purple and not purple (therefore, your degrees of freedom would be ___).
Write down your estimate of the number of beads of the color assigned to you. Write an appropriate hypothesis.
You cannot open the jar to count the beads so you will have to take random samples and compare the observed results to the expected results. Make an appropriate table for recording your data below.
Now shake the jar thoroughly. Then take out 10 beads, one bead at a time. If you shake the jar again between taking each bead, the randomness is further assured. This is your first sample. Record your data. Calculate the χ² of this random sample. Compare your χ² to the χ² table of values above and draw a conclusion about the results of your experiment.
Record your conclusions in the blanks below.
(Note: If you shake out more than 10 beads, have your partner, with eyes closed, remove the excess beads and return them to the container.)
One trial is not adequate to draw conclusions with certainty. Scientific experiments are replicable. They can be repeated. Furthermore, each repetition should give the same results. Therefore, you will now conduct 20 trials of 10 beads each. (Each person at your table is to take 5 samples from the jar.) Be sure to replace the ten beads of the previous sample and shake thoroughly before taking the next sample. Remember, shaking keeps the beads randomized so that you will be getting a random sample for each trial.
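Before doing the lab by hand, the 20-trial procedure can be rehearsed in a simulation. The jar contents below (50 purple, 150 other) are a hypothetical set-up chosen for illustration, and `run_trials` is a made-up name. In the real experiment the count is hidden from the Experimenters.

```python
import random

def run_trials(jar, n_trials=20, sample_size=10, color="purple", seed=None):
    """Draw n_trials samples and return the count of `color` in each one."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        # Within a trial, beads come out one by one and are not put back.
        sample = rng.sample(jar, sample_size)
        counts.append(sample.count(color))
        # The jar list is never changed, which mirrors returning the
        # beads and shaking before the next trial.
    return counts

jar = ["purple"] * 50 + ["other"] * 150  # hypothetical jar of 200 beads
counts = run_trials(jar, seed=1)

print("Purple beads per trial:", counts)
print("Total purple in", len(counts) * 10, "beads drawn:", sum(counts))
```

Individual trials of 10 beads vary a good deal from one another. The combined total over 20 trials is what gets compared to the expected value.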
-----A Sample Calculation-----
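Suppose you hypothesize that 25% of the beads are purple. Then you take 20 samples of 10 beads each and count the purple beads in each sample. Say that the 20 samples produced a total of 56 purple beads. Since 20 samples of 10 beads make 200 beads drawn, the hypothesis predicts 50 purple beads and 150 beads that are not purple. The data of the 20 trials can be combined in a data chart like the one below.
|bead color|O|E|O-E|(O-E)²|(O-E)²/E|
|purple|56|50|6|36|0.72|
|not purple|144|150|-6|36|0.24|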
The sum of the χ² values (0.72 + 0.24) is 0.96. Locating the 0.96 value on the χ² table of values at 1 degree of freedom (2 - 1), see that 0.96 is between 0.46 and 2.71. This corresponds to a probability between 0.5 and 0.1. Since this is greater than the standard cutoff of 0.05, the difference between the observed and expected counts can be attributed to chance, and you can conclude that the frequency of purple beads in your samples is statistically equal to the hypothesized ratio of 1/4 purple beads.
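The sample calculation can be reproduced in Python starting from the hypothesized fraction. This is a sketch for the two-category case (the color of interest versus everything else); the function name and its parameters are illustrative assumptions.

```python
def chi_square_for_fraction(observed_hits, total_drawn, hypothesized_fraction):
    """Two-category chi-square: the chosen color versus all other colors.

    Returns the (O - E)^2 / E term for each category and their sum.
    """
    expected_hits = hypothesized_fraction * total_drawn
    observed = [observed_hits, total_drawn - observed_hits]
    expected = [expected_hits, total_drawn - expected_hits]
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return terms, sum(terms)

# 56 purple beads in 20 samples of 10, tested against 1/4 purple.
terms, chi2 = chi_square_for_fraction(56, 20 * 10, 0.25)

print("purple term:    ", round(terms[0], 2))  # 36 / 50
print("not purple term:", round(terms[1], 2))  # 36 / 150
print("chi-square:     ", round(chi2, 2))
```

Changing the first argument to your own total of purple beads gives the χ² for your table's 20 trials under the same hypothesis.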
Student Data and Questions
Now record your data from both the 5 trials that you did personally, and from the 20 trials that your table did. Make data charts for each set of trials. Calculate the χ²s and draw the relevant conclusions about the data.
1. Define the following terms: hypothesis, sample, population, non-random sample, random sample, Chi-square test.
2. You take one of the dice from your home casino kit and roll it 300 times. If you want to do a Chi-square (χ²) test on your results, how many degrees of freedom should you use?
3. Suppose you were only interested in the number of sixes you rolled. How many should you have gotten during your 300 rolls? How many degrees of freedom are involved in this problem?
4. You buy a trick coin in a magic shop that is supposed to give more tails than heads when it is tossed. You toss the coin 100 times and get 60 tails. Should you ask for your money back? Why or why not? Therefore, your degrees of freedom would be ___.
5. Write down your estimate of the number of beads of your assigned color.