Monday, October 25, 2010

UNIT 4 (TESTING)

 Symbols 


  • α, the probability of Type I error (rejecting a null hypothesis when it is in fact true)
  • n = sample size
  • n1 = sample 1 size
  • n2 = sample 2 size
  • x̄ = sample mean
  • μ0 = hypothesized population mean
  • μ1 = population 1 mean
  • μ2 = population 2 mean
  • σ = population standard deviation
  • σ² = population variance
  • s = sample standard deviation
  • s² = sample variance
  • s1 = sample 1 standard deviation
  • s2 = sample 2 standard deviation
  • t = t statistic
  • df = degrees of freedom
  • d̄ = sample mean of differences
  • d0 = hypothesized population mean difference
  • sd = standard deviation of differences
  • p̂ = x/n = sample proportion, unless specified otherwise
  • p0 = hypothesized population proportion
  • p1 = proportion 1
  • p2 = proportion 2
  • dp = hypothesized difference in proportion
  • min{n1,n2} = minimum of n1 and n2
  • x1 = n1p1
  • x2 = n2p2
  • χ² = Chi-squared statistic
  • F = F statistic


Theory of the t Distribution

According to the central limit theorem, the sampling distribution of a statistic (like a sample mean) will follow a normal distribution, as long as the sample size is sufficiently large. Therefore, when we know the standard deviation of the population, we can compute a z-score, and use the normal distribution to evaluate probabilities with the sample mean.
But sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occurs, statisticians rely on the distribution of the t statistic (also known as the t score), whose values are given by:
t = [ x - μ ] / [ s / sqrt( n ) ]
where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size. The distribution of the t statistic is called the t distribution or the Student t distribution.
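
As a minimal sketch, the formula above can be written as a small Python function. The name t_score and its parameter names are illustrative assumptions, and the figures passed in are the ones from the chain example later in this lesson.

```python
import math

def t_score(sample_mean, pop_mean, sample_sd, n):
    """Return t = (x - mu) / (s / sqrt(n)) for a sample of size n."""
    if n < 2:
        raise ValueError("need at least 2 observations")
    standard_error = sample_sd / math.sqrt(n)  # s / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Chain example: x = 19,800, mu = 20,000, s = 1750, n = 14
print(round(t_score(19800, 20000, 1750, 14), 4))  # -0.4276
```

The degrees of freedom that go with this t score are n - 1.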

Degrees of Freedom

There are actually many different t distributions. The particular form of the t distribution is determined by its degrees of freedom. The degrees of freedom refers to the number of independent observations in a set of data.
When estimating a mean score or a proportion from a single sample, the number of independent observations is equal to the sample size minus one. Hence, the distribution of the t statistic from samples of size 8 would be described by a t distribution having 8 - 1 or 7 degrees of freedom. Similarly, a t distribution having 15 degrees of freedom would be used with a sample of size 16.
For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.

Properties of the t Distribution

The t distribution has the following properties:
  • The mean of the distribution is equal to 0.
  • The variance is equal to v / ( v - 2 ), where v is the degrees of freedom (see the previous section) and v > 2.
  • The variance is always greater than 1, although it is close to 1 when there are many degrees of freedom. With infinite degrees of freedom, the t distribution is the same as the standard normal distribution.
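
To see how quickly the variance approaches 1, the variance formula can be tabulated for a few values of v. This is only an illustrative sketch; the function name t_variance is an assumption, not part of any library.

```python
def t_variance(df):
    """Variance of the t distribution, v / (v - 2), valid only for v > 2."""
    if df <= 2:
        raise ValueError("the formula v / (v - 2) requires v > 2")
    return df / (df - 2)

# The variance stays above 1 but shrinks toward 1 as df grows.
for v in (3, 5, 10, 30, 100, 1000):
    print(v, round(t_variance(v), 4))
```

With 3 degrees of freedom the variance is 3, while with 1000 degrees of freedom it is about 1.002, which is why the t distribution is treated as nearly standard normal for large samples.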

When to Use the t Distribution

The t distribution can be used with any statistic having a bell-shaped distribution (i.e., approximately normal). Guided by the central limit theorem, a common rule of thumb is that the sampling distribution of a statistic will be normal or nearly normal if any of the following conditions applies.
  • The population distribution is normal.
  • The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
  • The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.
The t distribution should not be used with small samples from populations that are not approximately normal.


Probability and the Student t Distribution

When a sample of size n is drawn from a population having a normal (or nearly normal) distribution, the sample mean can be transformed into a t score, using the equation presented at the beginning of this lesson. We repeat that equation below:
t = [ x - μ ] / [ s / sqrt( n ) ]
where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, n is the sample size, and degrees of freedom are equal to n - 1.
The t score produced by this transformation can be associated with a unique cumulative probability. This cumulative probability represents the likelihood of finding a sample mean less than or equal to x, given a random sample of size n.
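
The cumulative probability can be approximated numerically without special software. The sketch below integrates the t density with Simpson's rule and uses the symmetry of the distribution about 0. The function names t_pdf and t_cdf and the steps_per_unit setting are assumptions made for illustration, and this is not necessarily how any particular calculator computes the value.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps_per_unit=400):
    """Approximate P(T <= t) by Simpson's rule on [0, |t|] plus symmetry."""
    if t == 0:
        return 0.5
    upper = abs(t)
    n = max(2, 2 * math.ceil(upper * steps_per_unit / 2))  # even number of intervals
    h = upper / n
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

# A t score of -0.4276 with 13 degrees of freedom
print(round(t_cdf(-0.4276, 13), 3))  # 0.338
```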


Notation and t Scores

Statisticians use tα to represent the t-score that has a cumulative probability of (1 - α). For example, suppose we were interested in the t-score having a cumulative probability of 0.95. In this example, α would be equal to (1 - 0.95) or 0.05. We would refer to the t-score as t0.05.
Of course, the value of t0.05 depends on the number of degrees of freedom. For example, with 2 degrees of freedom, t0.05 is equal to 2.92; but with 20 degrees of freedom, t0.05 is equal to 1.725.
Note: Because the t distribution is symmetric about a mean of zero, the following is true.
t_α = -t_(1-α)       and       t_(1-α) = -t_α
Thus, if t0.05 = 2.92, then t0.95 = -2.92.
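
The values quoted above can be checked with a short numerical sketch. It finds tα by bisection on an approximate cumulative probability computed with Simpson's rule. The helper names (t_pdf, t_cdf, t_alpha) are hypothetical, and the numerical method is an assumption chosen to keep the example free of third-party libraries.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps_per_unit=200):
    """Approximate P(T <= t) by Simpson's rule on [0, |t|] plus symmetry."""
    if t == 0:
        return 0.5
    upper = abs(t)
    n = max(2, 2 * math.ceil(upper * steps_per_unit / 2))
    h = upper / n
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def t_alpha(alpha, df):
    """t score whose cumulative probability is 1 - alpha (0 < alpha < 1)."""
    target = 1.0 - alpha
    if target < 0.5:
        return -t_alpha(1.0 - alpha, df)  # symmetry about zero
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        lo, hi = hi, hi * 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_alpha(0.05, 2), 2))   # 2.92
print(round(t_alpha(0.05, 20), 3))  # 1.725
print(round(t_alpha(0.95, 2), 2))   # -2.92
```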

 Example 

  1. The Acme Chain Company claims that their chains have an average breaking strength of 20,000 pounds, with a standard deviation of 1750 pounds. Suppose a customer tests 14 randomly-selected chains. What is the probability that the average breaking strength in the test will be no more than 19,800 pounds?

    Solution:

    One strategy would be a two-step approach:

    • Compute a t score, assuming that the mean of the sample test is 19,800 pounds.
    • Determine the cumulative probability for that t score.

    We will follow that strategy here. First, we compute the t score:

    t = [ x - μ ] / [ s / sqrt( n ) ]
    t = (19,800 - 20,000) / [ 1750 / sqrt(14) ]
    t = ( -200 ) / [ (1750) / (3.74166) ] = ( -200 ) / (467.707) = -0.4276

    where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, n is the sample size, and t is the t score.

    Now, we can determine the cumulative probability for the t score. We know the following:

    • The t score is equal to -0.4276.
    • The number of degrees of freedom is equal to 13. (In situations like this, the number of degrees of freedom is equal to the number of observations minus 1. Hence, the number of degrees of freedom is equal to 14 - 1 or 13.)

    Now, we are ready to use the T Distribution Calculator. Since we have already computed the t score, we select "t score" from the drop-down box. Then, we enter the t score (-0.4276) and the degrees of freedom (13) into the calculator, and hit the Calculate button. The calculator reports that the cumulative probability is 0.338. Therefore, there is a 33.8% chance that the average breaking strength in the test will be no more than 19,800 pounds.

    Note: The strategy that we used required us to first compute a t score, and then use the T Distribution Calculator to find the cumulative probability. An alternative strategy, which does not require us to compute a t score, would be to use the calculator in the "Sample mean" mode. That strategy may be a little bit easier. It is illustrated in the next example.
  2. Let's look again at the problem that we addressed above in Example 1. This time, we will illustrate a different, easier strategy to solve the problem.

    Here, once again, is the problem: The Acme Chain Company claims that their chains have an average breaking strength of 20,000 pounds, with a standard deviation of 1750 pounds. Suppose a customer tests 14 randomly-selected chains. What is the probability that the average breaking strength in the test will be no more than 19,800 pounds?

    Solution:

    We know the following:

    • The population mean is 20,000.
    • The standard deviation is 1750.
    • The sample mean, for which we want to find a cumulative probability, is 19,800.
    • The number of degrees of freedom is 13. (In situations like this, the number of degrees of freedom is equal to the number of observations minus 1. Hence, the number of degrees of freedom is equal to 14 - 1 or 13.)

    First, we select "Sample mean" from the dropdown box, in the T Distribution Calculator. Then, we plug our known inputs (degrees of freedom, sample mean, standard deviation, and population mean) into the T Distribution Calculator and hit the Calculate button. The calculator reports that the cumulative probability is 0.338. Thus, there is a 33.8% probability that the average breaking strength of the 14 tested chains will be no more than 19,800 pounds.

    Note: This is the same answer that we found in Example 1. However, the approach that we followed in this example may be a little bit easier than the approach that we used in the previous example, since this approach does not require us to compute a t score.
  3. The school board administered an IQ test to 25 randomly selected teachers. They found that the average IQ score was 115 with a standard deviation of 11. Assume that the cumulative probability is 0.90. What population mean would have produced this sample result?

    Note: In this situation, a cumulative probability of 0.90 suggests that 90% of the random samples drawn from the teacher population will have an average IQ of 115 or less. This problem asks you to find the true population IQ for which this would be true.

    Solution:

    We know the following:

    • The cumulative probability is 0.90.
    • The standard deviation is 11.
    • The sample mean is 115.
    • The number of degrees of freedom is 24. (In situations like this, the number of degrees of freedom is equal to the number of observations minus 1. Hence, the number of degrees of freedom is equal to 25 - 1 or 24.)

    First, we select "Sample mean" from the dropdown box, in the T Distribution Calculator. Then, we plug the known inputs (cumulative probability, standard deviation, sample mean, and degrees of freedom) into the calculator and hit the Calculate button. The calculator reports that the population mean is 112.1.

    Here is what this means. Suppose we randomly sampled every possible combination of 25 teachers. If the true population mean were 112.1, we would expect 90% of our samples to have a sample mean of 115 or less.
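
For readers who want to see where the 112.1 in Example 3 comes from, the calculator's result can be reproduced by hand. The derivation below is a sketch that assumes the standard table value t0.10 ≈ 1.318 for 24 degrees of freedom; it solves the t score equation for the population mean.

```latex
\begin{aligned}
P\!\left(T \le \frac{\bar{x} - \mu}{s/\sqrt{n}}\right) = 0.90
  \quad &\Longrightarrow \quad
  \frac{\bar{x} - \mu}{s/\sqrt{n}} = t_{0.10} \approx 1.318 \quad (df = 24) \\
\mu &= \bar{x} - t_{0.10}\,\frac{s}{\sqrt{n}} \\
    &= 115 - 1.318 \times \frac{11}{\sqrt{25}} \\
    &= 115 - 2.90 \approx 112.1
\end{aligned}
```

The same rearrangement works for any cumulative probability: look up the t score for that probability and the given degrees of freedom, then subtract t times s / sqrt( n ) from the sample mean.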
