Make a data set consisting of two groups, with 35 random observations per group. Repeat this 19 more times, i.e., there should be 20 sets, each consisting of 2 columns, for a total of 40 columns. Please have Excel pick these random numbers, each between 0 and 9 (whole numbers, not decimals). Run a uniform simulation with the following settings:

* Split across columns
* Use dynamic seed
* Do not round

[Alternatively, in Excel use the formula =RANDBETWEEN(0,9), but this is more tedious.]
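For readers who want to check the setup outside Excel, here is a minimal Python sketch of the same data layout: 20 sets, each with two columns of 35 whole numbers from 0 to 9. It assumes Python's `random.randint(0, 9)`, which includes both endpoints, is a fair stand-in for `RANDBETWEEN(0,9)`. The function name `make_data_sets` is hypothetical. Passing `seed=None` mimics a dynamic seed.

```python
import random

N_SETS = 20        # 20 data sets (pairs of columns)
N_PER_GROUP = 35   # 35 observations per column

def make_data_sets(n_sets=N_SETS, n=N_PER_GROUP, seed=None):
    """Return a list of (column_a, column_b) pairs of whole numbers 0 to 9.

    seed=None draws a fresh seed each run (like a dynamic seed);
    pass an integer for a reproducible, fixed seed.
    """
    rng = random.Random(seed)
    # randint(0, 9) includes both endpoints, like Excel's RANDBETWEEN(0, 9)
    return [
        ([rng.randint(0, 9) for _ in range(n)],
         [rng.randint(0, 9) for _ in range(n)])
        for _ in range(n_sets)
    ]

data_sets = make_data_sets()
print(len(data_sets) * 2, "columns")
```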

Next, run the two-sample t-test for independent samples on two columns at a time. This gives a p-value for testing whether the mean difference between the two columns is significant. Use a significance level (alpha) of 10%.

Click “Compute”.
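The sketch below shows the arithmetic behind each "Compute" click. It is not StatCrunch's code. It assumes pooled variances, which matches the DF of 68 (35 + 35 − 2) in the example table. The p-value comes from the Student t density, integrated numerically with Simpson's rule. All function names here are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area under the pdf from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps                      # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def pooled_t_test(x, y, alpha=0.10):
    """Two independent samples, pooled variance; returns one row of the Part 1 table."""
    n1, n2 = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)        # Sample Diff.
    df = n1 + n2 - 2                                      # DF (68 when n1 = n2 = 35)
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / df      # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))               # Std. Err.
    t = diff / se                                         # T-Stat
    p = two_sided_p(t, df)                                # P-value
    return diff, se, df, t, p, p < alpha                  # Significant?
```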

Since each column consists of __random__ numbers, and each column should have a mean of approximately 4.5, the null hypothesis is TRUE! There really is __no__ difference between the means of columns 1 and 2. Thus the p-value should usually be non-significant, i.e., p > 0.10 (about 90% of the time).
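Where does the 4.5 come from? Each whole number from 0 to 9 is equally likely, so the population mean is the plain average of those ten values. The short Python check below enumerates them. It also gives the population standard deviation, which matches the closed-form variance ((9 − 0 + 1)² − 1) / 12 = 8.25.

```python
import statistics

values = range(10)                    # each whole number 0, 1, ..., 9 is equally likely
pop_mean = statistics.mean(values)    # population mean
pop_sd = statistics.pstdev(values)    # population standard deviation
print(pop_mean, round(pop_sd, 2))
```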

So if a p-value is less than 0.10 (as in this example, p = 0.06199256), a Type I error has occurred. The probability of making a Type I error is alpha = α = 0.10.

P(Type I error) = P(reject H0 | H0 is true) = α = 0.10.

This should happen approximately 10% of the time.
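How many significant results should appear among 20 repeats? Assume the 20 tests are independent, since each uses fresh random columns. Then the count of Type I errors follows a Binomial(20, 0.10) distribution. The minimal sketch below uses only the standard library. It shows that about 2 false positives are expected, but 0, 1, 3, or 4 are also quite plausible. The function name is hypothetical.

```python
from math import comb

ALPHA = 0.10   # significance level
N_TESTS = 20   # number of repeated t-tests (Part 1)

def prob_k_type1_errors(k, n=N_TESTS, alpha=ALPHA):
    """P(exactly k of n independent tests are significant when H0 is true)."""
    return comb(n, k) * alpha ** k * (1 - alpha) ** (n - k)

expected = N_TESTS * ALPHA   # expected number of significant p-values
print("expected:", expected)
for k in range(6):
    print(k, round(prob_k_type1_errors(k), 3))
```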

Part 1:

__Repeat this process 19 more times__ (for a total of 20 tests) and create a table summarizing your results. Round to 2 decimal places. It should look like this:

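| Difference # | Sample Diff. | Std. Err. | DF | T-Stat | P-value | Significant? |
|---|---|---|---|---|---|---|
| 1 | -1.05 | 0.56 | 68 | -1.89 | 0.06 | Yes |
| 2 | | | | | | |
| 3 | | | | | | |
| … | | | | | | |
| 20 | | | | | | |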

Report all your findings and interpret what you saw across the twenty (20) repeats. Your report should answer the following questions:

Part 2:

How does the number (%) of significant p-values you found correspond to the theory of a Type I error?

Part 3:

What was the average of the 20 “Sample Diff.” values?

Part 4:

Explain why the number you reported in Part 3 makes intuitive sense.
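A worked equation can help with the intuition. Assume each column is 35 independent draws from the whole numbers 0 to 9. The population mean is 4.5 and the population variance is (10² − 1)/12 = 8.25. Under those assumptions, each sample difference is centered at 0. The average of 20 independent differences is also centered at 0, with a much smaller spread:

```latex
E[\bar{X}_1 - \bar{X}_2] = 4.5 - 4.5 = 0, \qquad
\mathrm{SD}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{8.25}{35} + \frac{8.25}{35}} \approx 0.69, \qquad
\mathrm{SD}\!\left(\frac{1}{20}\sum_{i=1}^{20} D_i\right) \approx \frac{0.69}{\sqrt{20}} \approx 0.15
```

Here D_i denotes the i-th “Sample Diff.” in the Part 1 table.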

Part 5:

Now do one last simulation, this time using 2000 random numbers (two columns of 1000 each, generated in Excel). This time, in addition to the mean difference and p-value, also have StatCrunch calculate:

* the mean of each group,
* the standard deviation of each group, and
* the 90% confidence interval for the difference between the two group means.

(The confidence interval is easily obtained by checking the ‘Confidence Interval’ button instead of ‘Hypothesis testing’ and entering 0.90 in ‘Level’, because the alpha we are using is 0.10.)
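As a cross-check on the StatCrunch output, here is a sketch of the 90% interval and p-value for two columns of 1000. It makes one simplifying assumption: it uses the normal critical value (about 1.645) instead of t*. With roughly 1998 degrees of freedom, t* differs only in the third decimal place. With equal group sizes, the pooled and unpooled standard errors coincide. The function name and the seed are hypothetical. The final printed flag illustrates the link between the interval and the p-value.

```python
import math
import random
import statistics
from statistics import NormalDist

def ci_and_p_large_sample(x, y, level=0.90):
    """Approximate CI for mean(x) - mean(y) and its two-sided p-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / n1 + statistics.variance(y) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.645 for a 90% interval
    lower, upper = diff - z_crit * se, diff + z_crit * se
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, se, (lower, upper), p

rng = random.Random(2024)   # hypothetical fixed seed so the run is reproducible
col1 = [rng.randint(0, 9) for _ in range(1000)]
col2 = [rng.randint(0, 9) for _ in range(1000)]
diff, se, (lo, hi), p = ci_and_p_large_sample(col1, col2)
# The 90% interval contains 0 exactly when p > 0.10 (same SE and critical value)
print(round(lo, 3), round(hi, 3), round(p, 3), lo < 0 < hi)
```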

Report your results here, as Part 5.

Part 6:

How do you explain the results you got for the 2000 random numbers? (Discuss the mean and standard deviation of each column of 1000 random numbers between 0 and 9.)
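The sketch below draws two columns of 1000 whole numbers from 0 to 9. It compares each column's sample mean and standard deviation with the population values: 4.5, and √8.25 ≈ 2.87. It assumes Python's `random.randint` as a stand-in for Excel's generator, and the seed is arbitrary. With 1000 values per column, the standard error of a column mean is only about 2.87/√1000 ≈ 0.09. That is why both columns land very close to 4.5.

```python
import math
import random
import statistics

POP_MEAN = 4.5            # (0 + 9) / 2
POP_SD = math.sqrt(8.25)  # sqrt((10**2 - 1) / 12), about 2.87

rng = random.Random(7)    # arbitrary fixed seed; use random.Random() for a dynamic seed
results = {}
for label in ("Column 1", "Column 2"):
    column = [rng.randint(0, 9) for _ in range(1000)]
    results[label] = (statistics.mean(column), statistics.stdev(column))

for label, (m, s) in results.items():
    print(f"{label}: mean={m:.3f} (pop. {POP_MEAN}), sd={s:.3f} (pop. {POP_SD:.3f})")
```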

Part 7:

How do you explain the results you got for the 2000 random numbers with respect to the confidence interval? Why was it so narrow? How does whether or not the confidence interval includes 0 relate to its p-value?
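To see why the interval is narrow, compare the expected 90% half-width for 35 per group with that for 1000 per group. This sketch assumes the population variance of 8.25. It also uses the normal critical value as a stand-in for t*, which slightly understates the width at 68 degrees of freedom. The names are hypothetical. The half-width shrinks by a factor of √(1000/35) ≈ 5.3.

```python
import math
from statistics import NormalDist

POP_VAR = 8.25                     # variance of whole numbers 0 to 9: (10**2 - 1) / 12
Z90 = NormalDist().inv_cdf(0.95)   # about 1.645; normal stand-in for t*

def expected_half_width(n, var=POP_VAR, z=Z90):
    """Approximate 90% CI half-width for a difference of two means, n per group."""
    se = math.sqrt(var / n + var / n)
    return z * se

for n in (35, 1000):
    print(n, round(expected_half_width(n), 3))
```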

Part 8:

If 100 students do this assignment (each using 2000 random numbers) with an alpha (significance level) of 10%, how many of the students would be expected to report a significant mean difference?
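A worked calculation may help. Assume each student's simulation is independent and the null hypothesis is true. Then the number of students reporting a significant difference follows a Binomial(100, 0.10) distribution. The sample size of 2000 narrows the interval, but it does not change the Type I error rate α:

```latex
E[\text{number significant}] = 100 \times 0.10 = 10, \qquad
\mathrm{SD} = \sqrt{100 \times 0.10 \times 0.90} = 3
```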

Please be sure to label each section Part 1, Part 2, …, Part 8.