Last updated: 2021-06-22
Summary
- Product: Hand sanitizers that may pose health risks.
- Issue: Certain hand sanitizers are being recalled because they do not meet Health Canada’s requirements.
- What to do: Stop using the identified product lots below. Consult your health care professional if you have used any of these products and have health concerns. Report any health product adverse events or complaints to Health Canada. See the additional information on buying health products safely in the links below.
Visualization techniques
Pie Chart | Bar Chart | Histogram | Gantt Chart | Heat Map | Box and Whisker Plot | Waterfall Chart | Area Chart | Scatter Plot | Pictogram Chart | Timeline | Highlight Table | Bullet Graph | Choropleth Map | Word Cloud | Network Diagram | Correlation Matrices
Why People WILLINGLY Give Up Their Freedoms W/ Prof. Mattias Desmet | Aubrey Marcus Oct 20, 2021 www.youtube.com/watch?v=IqPJiM5Ir3A Time Stamps: 00:00- Intro 1:22- Statistics not adding up 3:30- Psychological Dynamics 9:50- Mass formation 17:45- Something very specific can happen under these conditions 25:10- 19th century mass formation 31:55- Mentacide 37:27- Asch experiment 41:35- Dangers of our current landscape 47:42- You need mass media for mass formation 53:53- A Third Way 58:49- Mechanism of totalitarianism 1:05:10- The importance of parallel structures 1:19:13- Consequences of speaking out?
Wizards of Odds: The Power of Probability https://www.youtube.com/watch?v=92A5iDjxgOg
Probability is the backbone of science, but how well do you understand it? Odds are, not as well as you think; it is a surprisingly subtle concept that is often misunderstood, sometimes even by professionals who use it to guide crucial and far-reaching decisions. In this program, experts from technology, physics, medicine, and programming explore the slippery side of probability and the powerful role it plays in modern life.
This program is part of the Big Ideas Series, made possible with support from the John Templeton Foundation. Original Program Date: May 30, 2015. Host: John Hockenberry. Participants: Robert Green, Leonard Mlodinow, Masoud Mohseni, Alan Peters
Skill One
- Statistical model: a family of distributions dependent on a parameter t.
- Parameter space: the range of possible values for a parameter.
- Estimator: a function t^ of data X_1,...,X_n with X_i ~ P_t, such that t^ is close to t.
- Hypothesis test: a function of the data X_1,...,X_n with X_i ~ P_t, taking values in {0,1}, equal to 0 with high probability when the null t = t_0 is true.
- Confidence set: a region C_n, a function of the data X_1,...,X_n with X_i ~ P_t, such that P(t in C_n) = 1 - a, for a in (0,1) fixed.
- Likelihood: the probability of occurrence of the observed sample given t.
- Log-likelihood: the logarithm of the likelihood function.
- Normalised log-likelihood: the log-likelihood divided by the number of samples.
- MLE: an estimator t^ which maximises the likelihood function given some data.
- Score function: the derivative of the log-likelihood.
- Kullback-Leibler distance: the expectation of log(f(X;t_0)) - log(f(X;t)) under P_{t_0}.
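A minimal numerical sketch of the likelihood, log-likelihood, and MLE entries above. The N(t, 1) model, the simulated data, and the grid search are illustrative choices, not part of the original notes; for the normal mean, the MLE found this way coincides with the sample average.

```python
import math
import random

def log_likelihood(t, xs):
    """Log-likelihood of mean t for an i.i.d. N(t, 1) sample xs."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - t) ** 2 for x in xs)

random.seed(0)
xs = [random.gauss(2.0, 1.0) for _ in range(500)]

# Grid-search MLE over t in [1.0, 3.0); for the normal mean it
# agrees with the sample average up to the grid resolution.
grid = [i / 1000 for i in range(1000, 3000)]
t_hat = max(grid, key=lambda t: log_likelihood(t, xs))
print(t_hat, sum(xs) / len(xs))
```

In a regular model you would maximise analytically via the score function; the grid search is only meant to make the "maximise the likelihood" definition concrete.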
Skill Two
- Regular: a model in which integration w.r.t. x and differentiation w.r.t. t can be freely interchanged.
- Fisher information: the variance of the score function.
- Asymptotic efficiency: n Var_t(t^) -> I(t)^-1 as n -> infinity.
- Consistent: an estimator which converges in probability to the true value of t as n -> infinity.
- Almost surely: convergence s.t. P(||X_n - X|| -> 0 as n -> infinity) = 1.
- In probability: convergence s.t. P(||X_n - X|| > e) -> 0 for all e > 0, as n -> infinity.
- In distribution: convergence s.t. P(X_n <= t) -> P(X <= t) as n -> infinity, wherever P(X <= t) is continuous in t.
- Observed Fisher information: the normalised sum of the derivative of the log-likelihood times its transpose.
- Wald statistic: W_n(t) = n (t^ - t_0)^T i^_n (t^ - t_0).
- Type I error: rejecting the null hypothesis when it is true.
- Type II error: accepting the null hypothesis when it is false.
- Score test: using the statistic (1/sqrt(n)) d/dt l_n(t), which converges in distribution to N(0, I(t_0)), to test a simple null hypothesis.
- Jeffreys prior: a prior distribution proportional to the square root of the determinant of the Fisher information.
- Action space: the set of possible actions for a decision rule.
- Decision rule: a function from the sample space to the action space.
- Risk: R(d, t_0) = E_{t_0}[L(d(X), t_0)].
- Bayes risk: R_pi(d) = E_pi[R(d, t)].
- Posterior risk: the expected loss over the posterior distribution.
- Minimax: a decision rule which attains the least maximum risk.
- Least favourable: a prior whose Bayes rule has greater Bayes risk than that of any other prior.
- Inadmissible: a decision rule for which there is an alternative rule whose risk is no greater for every t, and strictly less for some t.
- Admissible: not inadmissible.
- Risk set: the possible values of the risk, for fixed t, over all decision rules.
- James-Stein estimator: the estimator for the mean of a normal model of dimension p >= 3 given by (1 - (p - 2)/||X||^2) X.
- Covariance: of real-valued random variables X and Y, E[(X - E[X])(Y - E[Y])].
- Correlation: of real-valued random variables X and Y, the covariance divided by the square root of the product of their individual variances.
- Jackknife bias estimate: (n - 1)(Sum_i T_{(-i)}/n - T_n) for a biased estimator T_n.
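The jackknife bias estimate at the end of this list can be made concrete with a short sketch. The data values below are made up for illustration; applied to the plug-in variance (which divides by n and is biased by -sigma^2/n), the jackknife bias estimate is exactly -s^2/n, so subtracting it recovers the usual unbiased sample variance.

```python
def jackknife_bias(estimator, xs):
    """(n-1) * (average of leave-one-out estimates - full-sample estimate)."""
    n = len(xs)
    t_full = estimator(xs)
    loo = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return (n - 1) * (sum(loo) / n - t_full)

def plugin_var(xs):
    """Biased plug-in variance: divides by n, not n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

xs = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]   # illustrative data
print(jackknife_bias(plugin_var, xs))            # estimates the bias, -s^2/n
```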
Skill Three
- Importance sampling: to approximate the mean of g(X), where X ~ f, sample Z with density h and use g(Z) f(Z)/h(Z).
- Accept/reject: an algorithm to simulate Y ~ f by generating X ~ h, where f <= M h, and U ~ U([0,1]), accepting X if U < f(X)/(M h(X)); if not, try again.
- Empirical distribution: F_n(x) = (1/n) Sum_i 1{X_i <= x}, given observations X_1,...,X_n.
- Cramer-Rao [Theorem]: the variance of an estimator T' of T is at least (d/dT E_T(T'))^2 / (n I(T)), where I is the Fisher information.
- Continuous mapping [Theorem]: if g: S -> R is continuous and X_n -> X in some mode of convergence, then also g(X_n) -> g(X), in the same mode.
- Slutsky [Theorem]: if X_n -> X and Y_n -> c in distribution, then Y_n -> c in probability, X_n + Y_n -> X + c in distribution, X_n Y_n -> cX in distribution, and, if c != 0, X_n/Y_n -> X/c in distribution.
- WLLN: the mean of n observations converges in probability to the expectation.
- SLLN: the mean of n observations converges almost surely to the expectation.
- Central limit [Theorem]: sqrt(n)(mean - expectation) -> N(0, Var(X)) in distribution.
- Wilks [Theorem]: if H_0 has dimension d_0 and H_1 dimension d, the likelihood ratio statistic converges to a chi^2 on d - d_0 degrees of freedom.
- Bernstein-von Mises [Theorem]: if pi is a continuous prior, pi_n the posterior, and P_n ~ N(T_MLE, I(T)^-1/n), then the integral of |pi_n - P_n| -> 0 almost surely as n -> infinity.
- Stein [Theorem]: if X ~ N(T, 1) and g is differentiable and bounded, then E_T[g'(X)] = E_T[g(X)(X - T)] when finite.
- Glivenko-Cantelli [Theorem]: the empirical distribution converges uniformly almost surely to the true distribution.
- Kolmogorov-Smirnov [Theorem]: sqrt(n) ||F_n - F|| -> ||B|| in distribution, for a standard Brownian bridge B and F_n an empirical distribution.
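The accept/reject entry above can be sketched concretely. In this hedged example the target f(x) = 6x(1 - x) (a Beta(2,2) density with maximum 1.5), the Uniform(0,1) proposal h, and the bound M = 1.5 are all illustrative choices, not part of the original notes.

```python
import random

def accept_reject(f, M, n):
    """Simulate n draws from density f on [0, 1] using a Uniform(0, 1)
    proposal h, where f <= M * h (here h(x) = 1, so f <= M)."""
    out = []
    while len(out) < n:
        x = random.random()          # X ~ h
        u = random.random()          # U ~ U([0, 1])
        if u < f(x) / M:             # accept with probability f(x) / (M h(x))
            out.append(x)
    return out

random.seed(1)
f = lambda x: 6 * x * (1 - x)        # Beta(2, 2) density, maximum value 1.5
samples = accept_reject(f, M=1.5, n=20000)
print(sum(samples) / len(samples))   # should be near the Beta(2, 2) mean, 0.5
```

A smaller M (a tighter envelope) raises the acceptance rate, which is 1/M here; that is the usual design consideration when choosing the proposal h.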
How to randomize Andrew J. Vickers. J Soc Integr Oncol. 2006; 4(4): 194–198.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596474/
Blocking and stratification
Simple randomization has the disadvantage that it is quite possible to have important differences between groups simply by chance. For example, in a 40-patient trial, there is about a 15% chance of an imbalance worse than 24 patients in one group and 16 in the other. Simple randomization can also lead to imbalances on important prognostic factors.
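The quoted ~15% figure can be checked exactly: under simple randomization of 40 patients, the size of one group is Binomial(40, 1/2), and a split worse than 24:16 means 25 or more patients land in one group. A short stdlib sketch (the interpretation of "worse than 24:16" as X >= 25 or X <= 15 is my reading of the text):

```python
from math import comb

n = 40
total = 2 ** n
# P(X >= 25 or X <= 15) for X ~ Binomial(40, 1/2), by symmetry 2 * P(X >= 25).
p = sum(comb(n, k) for k in range(25, n + 1)) * 2 / total
print(round(p, 3))   # roughly 0.15, matching the ~15% figure quoted above
```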
Typical simple comparative experiment.
We have a group of engineers who are trying to improve the performance of a product and this product is Portland cement. What they've done is they've taken the original recipe for the mortar and they've modified it by adding polymer latex materials in an effort to reduce the setup time or the drying time of the mortar. This has been very successful.
They've observed a very dramatic change in the drying time. So that part of the experiment is over, and now what they're looking at is the tension bond strength: has adding this material to the recipe changed the bond strength of the cement?
To test this, they have prepared
- 10 samples of the modified mortar
- another 10 samples of the unmodified mortar, that's the original recipe
These are the bond-strength values they've observed in testing. This sample data will be used for the t-test.
How do we visualize this data?
1. dot diagram
2. stem-and-leaf plot
3. histogram.
1. Dot Diagram: a scale, either horizontal or vertical, portraying the sampled data as dots along that scale.
In this case, the two dot diagrams are stacked on top of each other with the modified mortar dot diagram on top and the unmodified mortar dot diagram on the bottom.
- The dot diagram gives information about the middle (center) of the data and the spread of the data.
Introduction and Experimental Design Basics: Comparative Experiments and Basic Statistical Concepts
Module 1 / Module 2
Module 2 on experimental design covers comparative experiments and the basic statistical methods used to analyze the data from those types of experiments. In all of these experiments the factor had only two levels; in later modules, the factor can have multiple levels, more than two.
Random samples and how to summarize data from those samples.
Numerical measures:
1. the sample average, or sample mean
2. the sample variance
3. the standard deviation
Graphical methods:
Populations versus samples; parameters of populations, such as the population mean, population variance, and standard deviation; and how to estimate those parameters with the sample data.
How to analyze data from simple comparative experiments using the hypothesis-testing framework.
The principal hypothesis-testing technique is the two-sample t-test.
Variations of the two-sample t-test are the primary analysis engine used for looking at data from these simple experiments.
Checking assumptions: why those assumptions are important, and how violations of those assumptions might threaten the validity of your experiment.
Module 3 on experimental design
Here the factor could have multiple levels, more than two.
The t-test that we studied before just doesn't work here. It does a really nice job of comparing the means of two factor levels, but it doesn't extend nicely to more than two factor levels; there's no really easy way to make it work. Of course, there are lots of practical situations where more than two factor levels are of interest.
Module 4
You can see the calculated average bond strength for both of these formulations: the tension bond strength is y-bar 1 = 16.76 for the modified mortar and y-bar 2 = 17.04 for the unmodified mortar.
It appears that the average of the modified mortar is probably a little lower than the average of the unmodified mortar; in fact, the numbers are 16.76 for the modified mortar and 17.04 for the original recipe. The spread of the observations is about the same: if you compare the spread of one sample to the spread of the other, they're very similar. So you might suspect that the averages, or means, have been affected by this change in the recipe, but perhaps not the inherent variability. Dot diagrams can get a little busy, though, and are not very easy to construct or to interpret for larger samples.
Histogram
This histogram is for 200 observations on metal recovery, or yield, from a smelting process. You can see that the average metal recovery is somewhere around 70 to 72 percent, and that there's a fair amount of variability: it goes all the way from the low 60s up to almost 85 percent. But the shape of the distribution is relatively symmetric.
Box plot construction (spread = whiskers)
These are the box plots for the Portland cement data that we've looked at earlier.
When you look at these box plots for our two formulations of the mortar, what do you notice?
Here's a picture that illustrates, visually, the hypothesis-testing framework.
The two diagrams are probability distributions.
The probability distribution on the left represents the population of measurements from factor level 1, or treatment 1, and the one on the right represents the population of measurements from factor level 2, or treatment 2.
In our problem,
each of these represent a different formulation of the Portland cement mortar.
Assume that these populations are normal random variables.
They're normally distributed observations.
The mean of population 1 is mu1 and the variance of that distribution is sigma 1 squared.
And on the right, those observations are also normally distributed, with mean mu2 and variance sigma 2 squared.
So this is the sampling situation.
This is the situation that we assume exists in the study.
The key thing here is that we're sampling from a normal distribution.
What we want to investigate is the claim that the means of these two populations are the same.
How we structure that
is in terms of a pair of statistical hypotheses.
H-naught is called the null hypothesis.
And that's the statement that says the two means are indeed equal.
So H-naught, mu1 equal to mu2 is the null hypothesis,
and H1 is the alternative hypothesis and that's the other state of nature.
And in this case, it would be that the two means are not the same.
How do we estimate these parameters?
We have a mean mu in each population and a variance sigma squared in each population. We estimate the population mean using the sample average y-bar.
The way you calculate the sample average is easy: you simply add all the observations in the sample together and divide by the sample size n.
To calculate the sample variance, one computes the difference between each observation in the sample and the sample average y-bar, squares those differences, and adds them up. Then we divide that sum by n - 1, and that estimates the variance sigma squared.
These are straightforward calculations.
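The two formulas above can be sketched directly in Python. The sample values here are made-up illustrative numbers, not the actual mortar data; the standard library's statistics module is used as a cross-check.

```python
import statistics

# A small illustrative sample (made-up numbers, not the mortar data).
sample = [16.9, 16.4, 17.2, 16.3, 16.5]

n = len(sample)
y_bar = sum(sample) / n                                  # sample average
s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)     # sample variance
s = s2 ** 0.5                                            # standard deviation

# Cross-check against the standard library implementations.
assert abs(y_bar - statistics.mean(sample)) < 1e-12
assert abs(s2 - statistics.variance(sample)) < 1e-12
print(y_bar, s2, s)
```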
Here are the results.
So how does the two-sample t-test work?
It uses the sample means to draw conclusions, or inferences, about the population means. Specifically, it uses the difference in those two means:
Y-bar 1 minus y-bar 2.
Well, if we plug in the sample data here, the difference in the sample averages, y-bar 1 - y-bar 2, turns out to be -0.28. So that's the difference in the sample means.
The way the t-test works is we then divide that difference in the sample means by the standard deviation of the difference in sample means.
So this ratio becomes a measure of how different the sample means are in standard deviation units. That's how this works.
Well, we know that the variance of an average, sigma squared of y-bar, is sigma squared, the variance of an individual observation, divided by n, the sample size; that's basic statistics that we've probably seen before. The variance of the difference in averages, sigma squared of (y-bar 1 - y-bar 2), is the sum of the variances of the two averages, sigma 1 squared over n1 plus sigma 2 squared over n2, as long as the two averages y-bar 1 and y-bar 2 are independent.
So this statement suggests a statistic of the form that you see here.
This ratio z-naught is y-bar 1 - y-bar 2, the difference in sample averages, divided by the square root of sigma 1 squared over n1 + sigma 2 squared over n2, which is the standard deviation of the difference in sample means.
Now, how do we use this information?
Well, if the variances were actually known, if we actually knew sigma 1 square and sigma 2 square, it turns out that this ratio z0 follows a normal distribution.
And in fact, if the two means are equal, if mu1 is equal to mu2, this ratio would have a standard normal distribution. That is a normal distribution with mean 0 and variance 1. And we could use that as the basis of a statistical test. And we're going to see how that works, right now.
Here's the way it works.
Now, we don't know the variances or standard deviations but let's assume we do.
Let's just make up a number.
Let's let sigma 1 and sigma 2 both be equal to 0.3 just for purposes of illustration, okay?
As soon as we know those two numbers, we can plug them into our test statistic z0; that's what we call the z-naught test statistic.
Okay, we plug in the numbers, we do the arithmetic, and that value of z-naught turns out to be -2.09.
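Plugging in the numbers quoted in these notes (sample means 16.76 and 17.04, the assumed common sigma of 0.3, and n = 10 samples per group) reproduces that value:

```python
import math

y_bar_diff = 16.76 - 17.04          # difference in sample means, -0.28
sigma1 = sigma2 = 0.3               # assumed known, for illustration only
n1 = n2 = 10

# z0 = difference in means / standard deviation of that difference
z0 = y_bar_diff / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
print(round(z0, 2))                 # -2.09, as in the text
```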
Okay, now here's how we use that information.
How unusual is this value of z-naught = -2.09 if the two population means are really equal?
Well, remember: if the means are equal, z-naught has a normal (0,1) distribution.
Well, in a normal (0,1) distribution, it turns out that 95% of the probability, or area under the normal curve, falls between the values -1.96 and 1.96.
1.96 is called the upper 2 1/2 percent point of the standard normal distribution, denoted z sub 0.025, and -1.96 is the lower 2 1/2 percent point.
So if the means are equal, 95% of the time you would expect to observe a value of z-naught in that interval, -1.96 up to +1.96.
So what about this value that we just calculated, -2.09?
That's pretty unusual, isn't it, if the means are equal.
This is a value that would only occur less than 5% of the time if the population means were equal.
So this is a fairly strong indication that those means are not equal.
You can find these z values from any standard normal table.
Z Score Table
This is a standard normal cumulative distribution table; it lists values of z from 0 up to about 3.99.
Most standard normal tables are organized like this one.
They only give you areas to the left of positive z scores or positive z values.
Now, that isn't really much of a problem, because the normal distribution is symmetric and so areas to the left of a negative z are the same as the areas to the right of a positive z.
It's very easy to use these tables; let me show you how I got that value of 1.96.
Simply look at the table and scan the table until you find 1.96.
Well, here's 1.96 right there.
The 1.9 row and the .06 column.
If you look at the entry in the body of the table, it is 0.975; that is the probability, or area, to the left of 1.96 on the standard normal curve.
So the upper tail area beyond z sub 0.025 is 1 minus that: 1 - 0.975 = 0.025.
So that's the upper 2 1/2 percent point of the standard normal distribution.
And you can use the normal table to calculate these probabilities, or to find these probabilities, or z score values very easily.
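Instead of scanning a printed table, the same values can be reproduced with Python's standard library (NormalDist lives in the statistics module):

```python
from statistics import NormalDist

z = NormalDist()                     # standard normal: mean 0, sd 1
z_crit = z.inv_cdf(0.975)            # upper 2.5% point of the standard normal
print(round(z_crit, 2))              # 1.96
print(round(z.cdf(1.96), 3))         # 0.975, the table entry for 1.96
```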
So if the variances were known, what would we conclude?
We would conclude that we should reject this null hypothesis; a statistician would say that we reject the null hypothesis at the 5% level of significance.
Because the calculated value of -2.09 is outside the + or -1.96 range that corresponds to 5% significance.
This is called a fixed significance level test.
Because we compared the value of the test statistic to a critical value, in this
case 1.96, that we typically select in advance before we run the experiment.
And the standard normal distribution is called the reference distribution for this test.
Now, there's another way to do this.
It's very popular and it's called the P-value approach.
The P-value is basically the observed actual significance level.
And for the Z-test, it's really easy to find the P-value.
And I'll show you how to do that next time.
P-value approach
Okay. In our last class, we talked about testing these hypotheses about our Portland cement mortar, and we concluded that we should reject the null hypothesis at the five percent level of significance. So, in other words, we have pretty strong evidence here that the alternative hypothesis is true.
We used a procedure that I call a fixed significance level test, because we proposed a critical value of 1.96 that gives me a five percent chance of being wrong if I conclude that the means are different when the value of the test statistic lies outside the range of minus 1.96 up to plus 1.96. This fixed significance level approach is very common, very widely used.
But there's another approach that is also very popular, and it's actually become popular because of the use of computer software to do these tests. That's called the P-value approach, and for a Z-test it's very easy to find the P-value. Here's how you do it.
Here is your standard normal table again, the one I showed you last time. We want to find the probability that the standard normal variable is greater than 2.09. Now, if you think about this for a moment, you might say, "Wait a minute. Didn't we calculate that Z_0 was equal to minus 2.09?" Yes, we did, but our table only contains positive values of Z. So we need to take the absolute value, enter the table, and find the probability of being greater than positive 2.09. So we go into the table and look up the value at 2.09.
There it is: 0.98169. That is the area to the left of 2.09. We need the area to the right of 2.09, which is 1 - 0.98169 = 0.01831.
So the P-value will be twice this probability. Why twice? Well, it's a two-sided test, and so you want half of the risk of being wrong to be on one side of zero and the other half on the other side. So the P-value for this test is twice the computed probability, which is 0.03662.
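The doubling step above can be reproduced with the standard library's NormalDist, avoiding the table lookup entirely:

```python
from statistics import NormalDist

z0 = -2.09
upper_tail = 1 - NormalDist().cdf(abs(z0))   # area to the right of +2.09
p_value = 2 * upper_tail                     # two-sided test: double the tail
print(round(upper_tail, 5), round(p_value, 5))
```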
So we would reject this null hypothesis at any level of significance greater than 0.03662. Typically, in most science and engineering applications, 0.05 is used as the cut-off, although frankly there's nothing magic about 0.05; you could use 0.01 or 0.02 or really any value you want. This value of 0.05 is basically a risk measure: it's the risk of being wrong when you conclude that the means are different. Depending on the consequences of that, you may choose larger or smaller values of the cut-off, depending on the context of the problem.
I believe that in the early stages of experimental work, where you're really doing a lot of discovery and trying to find out which factors in a system might be important, you can be a lot more liberal with your choice of a cut-off; you could use 0.1, or even 0.15 in some cases. But the problem is, if you wrongly conclude that a factor isn't important early on in research work, quite frequently what happens is that the factor is then ignored and we don't pay any attention to it for the rest of the work. If the factor really turns out to be important, that could have negative consequences on our work. So making some type I errors, that is, concluding that factors are important when they really aren't, in the early stages of research work is typically not that big a problem, because ultimately we will figure out whether those factors are important or not; but you don't want to throw away a useful one too early.
Now, the Z-test that we've just described works great if you know what the two population variances are, but we don't.
If we knew them, we'd be in great shape.
But what if you just plugged in the sample variances instead? Instead of sigma_1 squared in your Z-statistic, plug in s_1 squared, and instead of sigma_2 squared, plug in s_2 squared. Well, if the sample sizes are large enough, this works okay. By large, I mean that both of your sample sizes would have to be at least about 30; some people say 40. In other words, the Z-test is a very good large-sample test for the difference in means. So if the sample size is big, whether you know the variances or not is not as big a deal.
But many times that isn't possible, because your sample size is small. In fact, Gosset wrote a paper on the probable error of a mean that asked exactly that: what if the sample size is small?
Well, it turns out that if the sample size is small, you can't use this normal (0,1) distribution as your reference distribution anymore. So let's talk about using s_1 squared and s_2 squared to estimate the two variances.
Well, now your previous ratio, your Z-statistic, changes: it looks like this. Instead of sigmas, it's got s's. But remember, we're talking about the case where these variances are assumed to be equal.
So let's combine or pool the individual sample variances to get a single number.
What you see down at the bottom of this slide is the pooled estimate of variance, S squared sub P. It's a weighted average: we simply combine the two sample variances, s_1 squared and s_2 squared, in proportion to the sample sizes. When we plug that pooled estimate of variance in, we get the test statistic for the two-sample t-test, which some people call the pooled t-test because we've used this pooled estimate of variance.
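Using the summary numbers quoted in these notes for the mortar experiment (means 16.76 and 17.04, n = 10 per group, and the pooled variance of 0.081 computed below in the text), the pooled t-statistic can be reproduced as a quick sketch:

```python
import math

# Summary statistics quoted in the notes for the mortar experiment.
y_bar1, y_bar2 = 16.76, 17.04
n1 = n2 = 10
sp2 = 0.081                           # pooled variance S_p^2
sp = math.sqrt(sp2)                   # about 0.284

# Pooled two-sample t-statistic.
t0 = (y_bar1 - y_bar2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(round(t0, 2))                   # about -2.20, as in the text
```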
It works a lot like the Z-test that we described earlier.
Values of t_0 that are close to zero are consistent with the null hypothesis; values of t_0 that are very different from zero are consistent with the alternative. So t_0 is a distance measure, just like the Z-statistic was: it measures how far apart the averages are in standard deviation units. You can interpret t_0 as a signal-to-noise ratio: the numerator is the signal being generated by your sample data from your experiment, and the denominator is a measure of variability, scatter, spread, or noise.
So here's how we perform the two-sample, or pooled, t-test for the Portland cement problem. First of all, we calculate S squared sub P; that's straightforward, and we get a value of 0.081, whose square root is 0.284. Then we substitute that into our test statistic t_0, and we get minus 2.20 as the computed value. So the two sample means are a little bit more than two standard deviations apart.
Is this a large difference?
In other words, how unusual is
this value if the means are really equal?
Well, that's the question, of course, that Gosset answered: he developed the t-test specifically to answer it.
Here's a picture of a t-distribution.
The t-distribution looks a lot like the normal distribution,
it's symmetric around zero.
It has a little bit more spread in the tails than the normal distribution.
In this case, the spread in the t-distribution is controlled by something called the number of degrees of freedom. Here, the number of degrees of freedom is the sum of the two sample sizes minus 2, n_1 + n_2 - 2, so it'd be 18.
We can use a table of the t-distribution to find, let's say, the two-and-a-half percent point of T with 18 degrees of freedom, and that value turns out to be 2.101.
So minus 2.101 and plus 2.101 are the boundaries of what we call the critical region for our test. t_0, the computed value of our test statistic, falls into that lower critical region, so we would end up rejecting the null hypothesis. Here's the t-distribution table.
The rows are the number of degrees of freedom on the test and then the tail areas are the column headings. So we had 18 degrees of freedom and we want the 0.025 level. So that's two-and-a-half percent area in the upper tail and the T-value there is shown to be 2.101. So that's where that value came from.
In other words, a value of t_0 from your sample data that lies between minus 2.101 and plus 2.101 would be consistent with equality of means. It is of course possible that the means are equal and t_0 lies outside that range, but it's a rare event. So typically, when we find the value of t_0 that falls in that prescribed critical region, we reject the null hypothesis. By the way, you can also use a P-value approach to doing this, and we'll get into that in the not distant future.
Okay, thanks for listening, and we'll resume next time.
Pooled t-test and Two-sample t-test, pt 2
Last time, we were talking about the two-sample or the pooled t-test and we looked at our Portland cement mortar problem from that perspective.
We saw that the computed value of the two-sample t-test statistic was minus 2.20 and that fell into the lower critical region of our t-distribution with 18 degrees of freedom.
Now, that was a fixed significance level test, because we chose five percent to generate those critical values of plus and minus 2.101. The p-value, by contrast, is the probability, or area, in the tails below minus 2.20 and above plus 2.20, because it's a two-sided test. The p-value can in most cases be found by computer.
It is the risk of wrongly rejecting the null hypothesis of equal means,in other words, it measures how unusual the event is.
The exact p-value in our problem turns out to be 0.042 and I've found that from a computer program, but you can approximate the p-value with a t-table.
Most t-tables only give probabilities greater than positive values of t. So just like we had to do with the normal distribution z-statistic,
take the absolute value of t_0, which is minus 2.20, and turn it into a positive 2.20. Now, with 18 degrees of freedom, go into the t-table and see if you can find an exact value of 2.20. Well, you can't, but you can find values that bracket it.
2.101 is less than 2.20 and 2.552 is greater than 2.20. So you can bracket this value quite nicely. The right-tail probability for the smaller value, 2.101, is 0.025, and for the larger value, 2.552, it is 0.01. Now you have to double those because this is a two-sided test.
So the p-value has to lie between 0.02 and 0.05. Those are lower and upper bounds on the p-value, and we know that the actual p-value turns out to be 0.042.
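The table-bracketing step just described can be sketched in a few lines of Python. The two table entries for 18 degrees of freedom are the only values taken from the text; everything else is illustrative.

```python
# Bracket the two-sided p-value for |t0| = 2.20 with 18 degrees of freedom,
# using the two t-table entries quoted in the text.
table_df18 = {2.101: 0.025, 2.552: 0.010}  # critical value -> upper-tail area

t0_abs = 2.20

# Find the table entries that bracket |t0|.
lower_crit = max(t for t in table_df18 if t <= t0_abs)   # 2.101
upper_crit = min(t for t in table_df18 if t >= t0_abs)   # 2.552

# Double each one-sided tail area because the test is two-sided.
p_upper_bound = 2 * table_df18[lower_crit]  # 0.05
p_lower_bound = 2 * table_df18[upper_crit]  # 0.02

print(p_lower_bound, "< p-value <", p_upper_bound)  # 0.02 < p-value < 0.05
```

The exact p-value, 0.042, does indeed fall inside these bounds.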
We find that from a computer program. Here are some two-sample t-test results
from computer software. The upper part of this table is the output from a product called Minitab, which is a very nice,
very useful product for analyzing data. It's a good statistics package, and the output we're seeing there is the two-sample t-test
for the Portland cement data. If you look through that output, you will find the estimated difference and you will find the value of the t-statistic, minus 2.19. Now, I got minus 2.20. The computer carried a few more decimal places than I
did, and it has 18 degrees of freedom and the p-value is 0.042. At the bottom of the table is the output from JMP, and once again, the calculation is very similar. The t-ratio is 2.186.
Notice it's positive instead of negative because the software subtracted the means in a different order than I did. Then it gives you the standard error of the difference (that's the denominator of the t-ratio), 18 degrees of freedom, and the probability that the computed value is
greater than the absolute value of t: it's 0.0422, the two-sided p-value for this problem. So this is what computer output looks like, and you're going to get some guidance on how to actually use the software to obtain these numbers in another class.
Checking assumptions in the t-test.
Now remember we're assuming that the observations come from a normal distribution and we
have also assumed that the variance of those normal distributions is the same. So we have two normal distributions with equal variances,
but possibly unequal means. How do we check those assumptions? Well, an easy way,
a convenient way to do that is with normal probability plotting. Here is a normal probability plot of the tension bond strength data from both samples of our Portland cement experiment. The solid dots are the modified mortar and the little rectangular plotting symbols are the unmodified mortar.
Now when you look at this normal probability plot, the first thing I think that I see is that both of these samples tend to lie along straight lines and remember, in a normal probability plot, if the sample data does lie approximately along a straight line,that's some reasonable evidence that the samples are drawn from a normal distribution. So normality seems to be reasonable here.
It turns out that on the normal probability plot, the slope of the straight line is proportional to standard deviation.
So if the straight lines have similar or nearly identical slopes, then you feel pretty good about the assumption of constant variance. When I look at these plots, these lines, it looks to me like the slope of these two lines is very, very similar. Now if you're drawing these plots and interpreting them by hand, I always urge people to concentrate on the central portion of the plots when you visualize the straight line. Don't get too carried away with the tails, because the bulk of the probability is in the center of the plot and that's what you want to use in deciding where to draw the straight line.
How important are these assumptions? Well, the normality assumption is only moderately important. The t-test works pretty well even for moderate departures from normality. As long as the population is reasonably symmetric and reasonably unimodal, you're not going to have any real problems with the t-test. It's pretty robust to the normality assumption.
The constant variance assumption is more important. If you inadvertently make a wrong assumption there, it tends to impact the sensitivity of the test.
Its ability to detect differences is negatively impacted by that. So that's a more important assumption.
What about confidence intervals?
Well, I think that their biggest value is that they tell you how large the difference between the means might be, not just whether it is zero.
The 100 times 1 minus Alpha percent confidence interval on the difference in two means, assuming of course that we have identical variances, is given by the equation at the bottom of the slide.
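The interval being described, reconstructed in the text's own notation (slide not reproduced here), is:

```latex
\bar{y}_1 - \bar{y}_2 \;\pm\; t_{\alpha/2,\; n_1+n_2-2}\; S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
```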
You'll notice that S sub p is used for the standard deviation. Then these are the lower and upper Alpha over 2 percentage points of the t-distribution with n1 plus n2 minus 2 degrees of freedom. We can actually calculate these intervals pretty easily for the Portland cement mortar formulation problem. We know everything: we know the two sample averages, we know S square sub p, and we know that the appropriate t-percentile, the two-and-a-half percent point of t with 18 degrees of freedom, is 2.101. Plugging those numbers into our confidence interval equation gives us a 95 percent confidence interval on the difference in means from minus 0.55, that's the lower bound, up to minus 0.01. Another way to say that is the confidence interval is minus 0.28 plus or minus 0.27; the accuracy of this interval is plus or minus 0.27.
Notice, in looking at this interval, that zero is not in the interval. That's because we rejected the null hypothesis of equal means at the five percent level, and this is a 95 percent confidence interval. If we had been unable to reject that null hypothesis at the five percent level, this interval would have included zero. So isn't a confidence interval just another way to look at a hypothesis test? If the null hypothesis is not rejected, then the confidence interval on that parameter will include zero. So this is an alternate way to look at the results of the experiment, and it gives a little bit more information. By the way, if you look at the computer output that we had earlier, both of these computer packages report a confidence interval: Minitab reports one and so does JMP. The signs are different on the lower and upper bounds between these two outputs. Why is that? Well, that's because JMP did the calculations by subtracting the means in a different order than Minitab did.
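As a check on the arithmetic, here is a minimal sketch that reproduces the interval from the summary numbers quoted above:

```python
import math

# 95% confidence interval on mu1 - mu2 for the Portland cement mortar data,
# using the pooled t procedure and the summary numbers quoted in the text.
n1, n2 = 10, 10
ybar1, ybar2 = 16.76, 17.04
sp2 = 0.081                      # pooled variance S_p^2
t_crit = 2.101                   # t_{0.025, 18} from the t-table

diff = ybar1 - ybar2                                   # -0.28
margin = t_crit * math.sqrt(sp2) * math.sqrt(1/n1 + 1/n2)
lower, upper = diff - margin, diff + margin

print(round(lower, 2), round(upper, 2))  # -0.55 -0.01
```

Note that zero is outside the interval, matching the rejection of the null hypothesis at the five percent level.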
Module 2 on experimental design covered simple comparative experiments and some basic statistical methods that are used to analyze the data from those types of experiments. All of those experiments involved a factor with only two levels.
Random samples and how to summarize data from those samples.
Numerical measures:
1. the sample average or sample mean
2. sample variance
3. the sample standard deviation
Graphical methods:
Populations versus samples, and parameters of
populations like the population mean, the population variance, and the standard deviation. How to estimate parameters with the sample data.
How to analyze data from simple comparative experiments using the hypothesis-testing framework.
The principal hypothesis-testing technique is the two-sample t-test.
Variations of the two-sample t-test are the primary analysis engine used for looking at data from these simple experiments.
Checking assumptions: why those assumptions are important, and how violations of those assumptions might threaten the validity of your experiment.
Module 3 on experimental design
Here the factor could have multiple levels, more than two.
The t-test that we studied before just doesn't work here. It does a really nice job of comparing the means of two factor levels, but it doesn't work for more than two factor levels, and there's no really easy way to make it work. Of course, there are lots of practical situations where more than two factor levels are of interest.
Module 4
A typical simple comparative experiment. We have a group of engineers who are trying to improve the performance of a product, and this product is Portland cement mortar. What they've done is take the original recipe for the mortar and modify it by adding polymer latex materials, in an effort to reduce the setup time, or drying time, of the mortar. This has been very successful.
They've observed a very dramatic change in the drying time. So that part of the experiment is over, and now what they're looking at is the tension bond strength: has adding this material to the recipe changed the bond strength of the cement?
To test this, they have prepared
- 10 samples of the modified mortar
- another 10 samples of the unmodified mortar, that's the original recipe
and they've measured the tension bond strength of each in testing. This sample data will be used for the t-test.
How do we visualize this data?
1. dot diagram
2. stem-and-leaf plot
3. histogram.
1. Dot Diagram: a scale, either horizontal or vertical, portraying the sampled data as dots along that scale.
In this case, the two dot diagrams are stacked on top of each other with the modified mortar dot diagram on top and the unmodified mortar dot diagram on the bottom.
- gives information about the middle of the data and the spread of the data.
You can see the calculated average bond strength for both of these formulations: the tension bond strength y-bar 1 = 16.76 is for the modified mortar, and y-bar 2 = 17.04 is for the unmodified mortar.
It appears that the average of
the modified mortar is probably a little
lower than the average
of the unmodified mortar and in fact,
the numbers bear this out: 16.76 for
the modified mortar and 17.04 for the original recipe.
The spread of the observations is about the same.
That is if you look at the spread here and compare that to the spread here, they're very similar.
So you might suspect that
the averages or means might have
been affected by this change in the recipe, but perhaps not the inherent variability. Dot diagrams can get a little busy
and not very easy to
construct or to interpret for larger samples.
Dot diagram:
- best with small samples, no more than about 20 or maybe 30 observations.
Box plot:
- some people call these box-and-whisker plots.
- There are some rules about drawing these things that incorporate outliers or unusual values, but we're not going to get into those.
Z-test (a large-sample tool):
- the sample sizes for both of your samples would have to be at least about 30, some people say 40, or
- you know what the two population variances are.
Histogram:
- larger samples. This is a large-sample tool, the histogram.
- between 50 and 100 observations.
- you can get a lot of information from a histogram, but the reason you use histograms with large samples rather than small samples is
that the shape of a histogram is dramatically impacted by the number of bins that you choose and the width of those bins. If you have small samples, small changes in those parameters can dramatically affect the shape of the histogram. So that's why they work better with large samples.
- reference distributions.
- In general, the degrees of freedom for a t-test will always be equal to the number of degrees of freedom associated with the variance estimate in the test statistic.
- Assumptions of the two-sample t-test: samples are collected randomly from two populations; the measured variable is continuous and normally distributed, and measured on an interval or ratio scale. The test is powerful and robust.
- Two-sample t-test:
determines differences between two independent sample means from two different populations.
- divide the range of your variable into intervals or bins and usually these bins are of equal length or equal width and then we
- count the number of observations that fall in each one of these bins.
- Then we draw a diagram, where the horizontal scale is the variable of interest and the vertical scale is either the frequency or relative frequency of the counts.
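The bin-and-count step just described can be sketched in a few lines. The data and the bin choices here are hypothetical, just to show the mechanics:

```python
from collections import Counter

def histogram_counts(data, bin_width, start):
    """Count observations falling in equal-width bins
    [start + k*w, start + (k+1)*w), returned as {bin left edge: count}."""
    counts = Counter((x - start) // bin_width for x in data)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical metal-recovery-style data, for illustration only.
data = [62.1, 68.4, 70.0, 70.5, 71.2, 72.8, 74.9, 79.3, 84.6]
h = histogram_counts(data, bin_width=5, start=60)
print(h)  # {60.0: 1, 65.0: 1, 70.0: 5, 75.0: 1, 80.0: 1}
```

Plotting these counts as bars over the bin edges gives the histogram; changing `bin_width` reshapes it, which is exactly why small samples are risky here.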
This histogram is for 200 observations on metal recovery or yield from a smelting process. You can see that the average metal recovery is somewhere around 70 or 71 or 72 percent
and that there's a fair amount of variability. It goes all the way from about the
low 60s up to almost 85 percent. But the shape of the distribution is relatively symmetric.
Box plot construction (spread = whiskers)
These are the box plots for the Portland cement data that we've looked at earlier.
- have either a vertical or a horizontal box. These are vertical boxes. The lower edge of the box corresponds to the 25th percentile of the sample data and the upper edge of the box corresponds to the 75th percentile. The line in the middle is the 50th percentile, or the median. The lines that extend from the ends of the box are called whiskers.
- The whiskers extend to the minimum and maximum values that were observed in the sample.
When you look at these box plots for our two formulations of the mortar, what do you notice?
- In both cases the median line is in about the middle of the box.
- All right, that tells you that the sample is probably drawn from a symmetric distribution.
- If you look at the length of the boxes, including the whiskers, they're about the same in both of these displays. So that's an indication that the variability in the two populations is probably very similar.
- The other thing that you notice is that the central tendency of the unmodified mortar does appear to be higher than that of the modified recipe. That's an important issue, because if adding this material to the recipe really improved the cure time, that was a victory; but if it has a negative impact on strength, that may affect the usability of the product. So what one would probably want to do after seeing this data is investigate whether or not there is statistical evidence to support the claim that the mean tension bond strength in these two recipes is the same, and that's the problem that we'll start to address next.
Here's a picture that illustrates, visually, the hypothesis-testing framework.
The two diagrams are probability distributions.
The probability distribution on the left represents the population of measurements from factor level 1, or treatment 1,
and the one on the right represents the population of measurements from factor level 2, or treatment 2.
In our problem,
each of these represent a different formulation of the Portland cement mortar.
Assume that these populations are normal random variables.
They're normally distributed observations.
The mean of population 1 is mu1 and the variance of that distribution is sigma 1 square.
And on the right, those observations are also normally distributed,and that is a normal distribution with mean mu2 and variance sigma 2 square.
So this is the sampling situation.
This is the situation that we assume exists in the study.
The key thing here is that we're sampling from a normal distribution.
What we want to investigate is the claim that the means of these two populations are the same.
How we structure that
is in terms of a pair of statistical hypotheses.
H-naught is called the null hypothesis.
And that's the statement that says the two means are indeed equal.
So H-naught, mu1 equal to mu2 is the null hypothesis,
and H1 is the alternative hypothesis and that's the other state of nature.
And in this case, it would be that the two means are not the same.
How do we estimate these parameters?
We have a mean mu in each population and
we have a variance sigma square in each population.
The way we do this is by using the sample average
y-bar to estimate the population mean.
The way you calculate the sample average
is easy.
You simply add all the observations in the sample together and
divide by the sample size n.
To calculate the sample variance,
one simply computes the difference between each observation in the sample and
the sample average y-bar, squares those differences, and adds them up.
Then we divide that sum by n - 1, and that estimates the variance sigma square.
These are straightforward calculations.
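Those two calculations, written directly from the definitions above. The data values here are a small hypothetical sample, just to show the arithmetic:

```python
# Sample mean and sample variance, computed from the definitions.
# The data are a small hypothetical sample for illustration only.
sample = [16.85, 16.40, 17.21, 16.35, 16.52]
n = len(sample)

ybar = sum(sample) / n                                   # sample average
s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)      # sample variance
s = s2 ** 0.5                                            # sample standard deviation

print(round(ybar, 3), round(s2, 3), round(s, 3))  # 16.666 0.13 0.361
```

The `statistics` module's `mean` and `variance` functions compute the same quantities, so they make a convenient cross-check.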
Here are the results:
- For the new recipe, the modified mortar, the average bond strength is 16.76 and the sample variance is 0.1.
- The sample standard deviation is 0.316 and of course the sample size was 10.
- For the unmodified mortar, the original recipe, the sample average y-bar 2 is 17.04 and the sample variance is 0.061.
- And the sample standard deviation is 0.248.
- Again, the sample size is 10.
- Notice that those two standard deviations are not exactly the same but they're fairly close together and that's consistent with what we saw in the dot diagrams and in the stem-and-leaf plots for these two samples.
- We saw that there was a pretty sizably noticeable difference in the averages or in the means but the spread or variability in the samples was pretty similar.
So how does the two-sample t-test work?
- We're going to use the two-sample t-test to test this null hypothesis that says,
- the two means mu1 and mu2 are equal.
It uses the sample means to draw conclusions, or inferences, about the population means. And specifically, it uses the difference in those two means,
Y-bar 1 minus y-bar 2.
Well, if we plug in the sample data here, the difference in the sample averages, y-bar 1 minus y-bar 2, turns out to be -0.28. So that's the difference in the sample means.
The way the t-test works is we then divide that difference in the sample means by the standard deviation of the difference in sample means.
So this ratio becomes a measure of how different the sample means are in standard deviation units. That's how this works.
Well, we know that the variance of an average, sigma square of y-bar, is sigma square, the variance of an individual observation, divided by n, the sample size. That's basically statistics that we've probably seen before. The variance of the difference in averages, sigma square of y-bar 1 minus
y-bar 2, is the sum of those variances, sigma 1 square over n1 plus sigma 2 square over n2, as long as the two averages y-bar 1 and y-bar 2 are independent.
- And I think we can comfortably assume independence here, because these are two completely different samples that were generated at different times.
- They're random samples and the treatments were applied essentially in random sequence.
- So independence is probably a very reasonable assumption here.
So this suggests a statistic of the form that you see here.
This ratio z-naught has numerator y-bar 1 minus y-bar 2, the difference in sample averages, and denominator the square root of sigma 1 square over n1 plus sigma 2 square over n2, which is the standard deviation of the difference in sample means.
Now, how do we use this information?
Well, if the variances were actually known, if we actually knew sigma 1 square and sigma 2 square, it turns out that this ratio z0 follows a normal distribution.
And in fact, if the two means are equal, if mu1 is equal to mu2, this ratio would have a standard normal distribution. That is a normal distribution with mean 0 and variance 1. And we could use that as the basis of a statistical test. And we're going to see how that works, right now.
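Written out, the ratio described above is:

```latex
z_0 \;=\; \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\;\sim\; N(0, 1) \quad \text{when } \mu_1 = \mu_2 .
```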
Here's the way it works.
Now, we don't know the variances or standard deviations but let's assume we do.
Let's just make up a number.
Let's let sigma 1 and sigma 2 both be equal to 0.3 just for purposes of illustration, okay?
Soon as we know those two numbers, we can plug them into our test statistic z0.
It's what we call the z-naught test statistic.
Okay, we plug in the numbers, we do the arithmetic, and that value of z-naught turns out to be -2.09.
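A quick check of that arithmetic, using the sample averages quoted earlier and the made-up sigma of 0.3:

```python
import math

# z-test statistic for the mortar data, assuming (for illustration, as in
# the text) that both population standard deviations are known to be 0.3.
n1, n2 = 10, 10
ybar1, ybar2 = 16.76, 17.04
sigma1 = sigma2 = 0.3

z0 = (ybar1 - ybar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
print(round(z0, 2))  # -2.09
```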
Okay, now here's how we use that information.
How unusual is this value of z-naught = -2.09 if the two population means are really equal?
Well, remember z-naught, if the means are equal, has a normal 01 distribution.
Well, in a normal 01 distribution, it turns out that 95% of the probability or area under that normal curve falls between the values 1.96 and -1.96.
1.96 is called the upper 2 1/2 percent point of the normal distributions denoted z sub 0.025,
and -1.96 is the lower 0.025 percentage point of the standard normal.
So if the means are equal, 95% of the time you would expect to observe a value of z-naught in that interval, from -1.96 up to +1.96.
So what about this value that we just calculated, -2.09?
That's pretty unusual, isn't it, if the means are equal.
This is a value that would only occur less than 5% of the time if the population means were equal.
So this is a fairly strong indication that those means are not equal.
You can find these z values from any standard normal table.
Z-score table (standard normal table).
This is a standard normal cumulative distribution table, and it tabulates values of z from 0 up to about 3.99.
Most standard normal tables are organized like this one.
They only give you areas to the left of positive z scores or positive z values.
Now, that isn't really much of a problem, because the normal distribution is symmetric and so areas to the left of a negative z are the same as the areas to the right of a positive z.
So it's very easy to use these tables. Let me show you how I got that value of 1.96.
Simply look at the table and scan the table until you find 1.96.
Well, here's 1.96 right there.
The 1.9 row and the .06 column.
And if you look at the entry in the body of the table, it is 0.975; that is the probability or area to the left of 1.96 on the standard normal curve.
So the upper-tail area beyond 1.96 would be 1 minus that, which is 0.025.
So that's the upper 2 1/2 percent point of the standard normal distribution.
And you can use the normal table to calculate these probabilities, or to find these probabilities, or z score values very easily.
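Rather than scanning a printed table, the same lookups can be done with Python's standard-library `statistics.NormalDist` class, which mirrors what the table provides:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area to the left of 1.96, as read from the body of the table.
print(round(nd.cdf(1.96), 4))        # 0.975

# Going the other way: the upper 2.5% point of the standard normal.
print(round(nd.inv_cdf(0.975), 2))   # 1.96
```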
So if the variances were known, what would we conclude?
We would conclude that we should reject this null hypothesis, and a statistician would say we would reject this hypothesis, this null hypothesis at the 5% level of significance.
Because the calculated value of -2.09 is outside the + or -1.96 range that corresponds to 5% significance.
This is called a fixed significance level test.
Because we compared the value of the test statistic to a critical value, in this
case 1.96, that we typically select in advance before we run the experiment.
And the standard normal distribution is called the reference distribution for this test.
Now, there's another way to do this.
It's very popular and it's called the P-value approach.
The P-value is basically the observed actual significance level.
And for the Z-test, it's really easy to find the P-value.
And I'll show you how to do that next time.
P-value approach.
Okay. In our last class,
we talked about testing
these hypotheses about our Portland cement mortar,
and we concluded that we should reject
this null hypothesis at
the five percent level of significance.
So in other words,
we have pretty strong evidence
here that the alternative hypothesis is true.
We used a procedure that I call
a fixed significance level test because we suggested or
proposed a critical value of 1.96 that gives me
a five percent chance of being
wrong if I conclude that the means are different,
if the value of the test statistic lies outside
this range of minus 1.96 up to plus 1.96.
This fixed significance level approach
is very common, very widely used.
But there's another approach
that is also very popular and it's
actually become popular because of using
computer software to do these tests.
That's called the P-value approach and for a Z-test,
it's very easy to find the P-value.
Here's how you do it.
Here is your standard normal table
again that I showed you last time.
We want to find the probability
above or the probability that
the standard normal variable is greater than 2.09.
Now, if you think about this for a moment,
you say, "Wait a minute.
Didn't we calculate the Z_0 was equal to minus 2.09?"
Yes, we did, but
our table only contains positive values of Z.
So we need to take the absolute value of that,
and enter the table and find the probability
above, that is, greater than positive 2.09.
So we go into the table and we look
up the entry for z = 2.09.
There it is:
0.98169.
That is the area to the left of 2.09,
but we need the area to the right of 2.09,
and that is 0.01831.
Just subtract 0.98169 from one.
So the P-value will be twice this probability. Why twice?
Well, it's a two-sided test,
and so you want half of the risk of being wrong
to be on one side of
zero and the other half to be on the other.
So the P-value for
this test is actually twice this computed probability.
Well, twice that probability is 0.03662.
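The whole p-value calculation, done numerically instead of by table lookup:

```python
from statistics import NormalDist

# Two-sided p-value for z0 = -2.09, computed from the standard normal CDF
# instead of the printed table.
z0 = -2.09
tail = 1 - NormalDist().cdf(abs(z0))   # area to the right of 2.09
p_value = 2 * tail                     # double it: two-sided test

print(round(p_value, 5))  # 0.03662
```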
So we would reject this null hypothesis at
any level of significance that is less than 0.03662.
Typically, in most science and engineering applications,
0.05 is used as the cut-off.
Although frankly, there's nothing magic about 0.05,
you could use 0.01 or 0.02 or really any value you want.
This value of 0.05 is basically a risk measure.
It's the risk of you being
wrong when you conclude that the means are different.
Depending on the consequences of that,
you may choose larger or smaller values
of the cut-off depending on the context of the problem.
I believe that in the early stages of experimental work,
where you're really doing a lot of discovery and
you're trying to find
out which factors in a system might be important,
you could be a lot more
liberal with your choice of a cut-off.
You could use 0.1 or you could use
even 0.15 in some cases.
But the problem is if you
wrongly conclude that a factor isn't
important early on in research work,
quite frequently what happens is that factor is then
ignored and we don't pay
any attention to it for the rest of the work.
If the factor really turns out to be important,
that could have negative consequences on our work.
So making some, what we call type I errors,
that is concluding that
factors are important when they really
aren't in the early stages of research work,
that's typically not that big a problem because
ultimately we will figure
out that factors are important or not,
but you don't want to throw away
a potentially useful factor too early.
Now, the Z-test, which we've just described, works great if you know what the two population variances are, but we don't.
If we knew them, we'd be in great shape.
But what if you just plugged in
the sample variances instead?
Instead of Sigma_1 square in your Z-statistic,
plug in s_1 square,
and instead of Sigma_2 square in your Z-statistic,
plug in s_2 square.
Well, if the sample sizes
are large enough, this works okay.
By large, I mean that the sample sizes for both of
your samples would have to be at least about 30,
some people say 40.
In other words, the Z-test is
a very good large-sample test
for the difference in means.
So if the sample size is big,
whether you know the variances or not,
is not as big a deal.
But many times that isn't possible because your sample size is small. In fact, Gosset actually wrote a paper, "The Probable Error of a Mean," which
asked,
"But what if the sample size is small?"
Well, it turns out if the sample size is small,
you can't use this normal 0,1
distribution as your reference distribution anymore.
So let's talk about using
s_1 square and s_2 square to estimate the two variances.
Well, now your previous ratio, the Z-statistic, changes; it looks like this. Instead of Sigmas, it's got Ss. But now remember, we're talking about the case where these variances are assumed to be equal.
So let's combine or pool the individual sample variances to get a single number.
What you see down at the bottom of this slide is
the pooled estimate of variance S square_P.
The way this is done it's a weighted average.
We simply combine the two sample variances,
s_1 squared and s_2 squared,
in proportion to the sample sizes.
So this is a pooled estimate of
variance and when we plug that in,
then we get the test statistic for the two-sample T-test,
or some people call this
the pooled t-test because
we've used this pooled estimator variance.
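Written out, the pooled estimator and the resulting test statistic are:

```latex
S_p^2 \;=\; \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2},
\qquad
t_0 \;=\; \frac{\bar{y}_1 - \bar{y}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} .
```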
It works a lot like the Z-test that we described earlier.
Values of t_0 that are close to
zero are consistent with the null hypothesis.
Values of t_0 that are very different from
zero are consistent with the alternative.
So t_0 is a distance measure,
just like the Z-statistic was.
It measures how far apart
the averages are in standard deviation units.
You can interpret t_0 as a signal-to-noise ratio.
The numerator is a signal
that's being generated by
your sample data from your experiment,
and this thing down in
the bottom is a measure of variability,
scatter or spread or noise.
So you can think of t_0 as a signal-to-noise ratio.
So here's how we perform
the two-sample or pooled t-test
for the Portland cement problem.
First of all, we have to calculate S square of
P. That's straightforward and we
get a calculated value of
0.081 and the square root of that is 0.284.
So now, we substitute that into our test statistic t_0,
and we get minus
2.20 as the computed value of our test statistic.
So the two sample means are
a little bit more than two standard deviations apart.
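That arithmetic, sketched from the summary statistics quoted in the text. Note that carrying full precision from the rounded summary numbers gives about minus 2.21 rather than the minus 2.20 obtained with intermediate rounding:

```python
import math

# Pooled t-test arithmetic for the mortar data, from the summary
# statistics quoted in the text.
n1, n2 = 10, 10
ybar1, ybar2 = 16.76, 17.04
s1_sq, s2_sq = 0.100, 0.061

# Pooled variance: weighted average of the two sample variances.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
sp = math.sqrt(sp_sq)

t0 = (ybar1 - ybar2) / (sp * math.sqrt(1/n1 + 1/n2))
print(round(sp_sq, 4), round(t0, 2))
```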
Is this a large difference?
In other words, how unusual is
this value if the means are really equal?
Well, that's the question that, of course, Gosset answered. Gosset developed the t-test specifically to answer this question.
Here's a picture of a t-distribution.
The t-distribution looks a lot like the normal distribution,
it's symmetric around zero.
It has a little bit more spread in the tails than the normal distribution.
In this case, the spread in the t-distribution is controlled by something
called the number of degrees of freedom on T.
The number of degrees of freedom on T here would be the sum of the two sample sizes, N_1 plus N_2 minus 2. So it'd be 18.
We can use a table of the t-distribution to find, let's say, the two-and-a-half percent point of T with 18 degrees of freedom, and that value turns out to be 2.101.
So minus 2.101 and plus 2.101 would be the boundaries of what we call the critical region for our test. T_0, the computed value of our test statistic falls into that lower critical region. So we would end up rejecting that null hypothesis. Here's the t-distribution table.
The rows are the number of degrees of freedom on the test and then the tail areas are the column headings. So we had 18 degrees of freedom and we want the 0.025 level. So that's two-and-a-half percent area in the upper tail and the T-value there is shown to be 2.101. So that's where that value came from.
In other words, a value of t_0 from your sample data that lies between minus 2.101 and plus 2.101 would be consistent with equality of means. It is of course possible that the means are equal and t_0 lies outside that range, but it's a rare event. So typically, when we find the value of t_0 that falls in that prescribed critical region, we reject the null hypothesis. By the way, you can also use a P-value approach to doing this, and we'll get into that in the not distant future.
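The fixed-significance-level test just described can be sketched in a few lines of Python. The individual sample means below are assumed for illustration (the lecture quotes only their difference of about minus 0.28), and equal sample sizes of 10 per group are assumed, consistent with the 18 degrees of freedom:

```python
import math
from scipy import stats

# Portland cement example. Sample sizes of 10 per group are assumed
# (consistent with 18 degrees of freedom); the individual sample means
# are illustrative -- only their difference (about -0.28) and the
# pooled variance (0.081) appear in the lecture.
n1, n2 = 10, 10
ybar1, ybar2 = 16.76, 17.04   # modified vs. unmodified mortar (assumed values)
sp2 = 0.081                   # pooled variance S_p^2
sp = math.sqrt(sp2)           # about 0.284

# Two-sample pooled t-statistic: signal (difference in means) over
# noise (standard error of that difference).
t0 = (ybar1 - ybar2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Critical region boundaries at the 5% level: plus/minus t_{0.025, 18}.
df = n1 + n2 - 2
t_crit = stats.t.ppf(0.975, df)   # about 2.101

reject_h0 = abs(t0) > t_crit      # t0 falls in the lower critical region
```

With these numbers, t0 comes out near minus 2.20 and falls below minus 2.101, so the null hypothesis of equal means is rejected, matching the conclusion above.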
Okay, thanks for listening, and we'll resume next time.
Pooled t-test and Two-sample t-test, pt 2
Last time, we were talking about the two-sample or the pooled t-test and we looked at our Portland cement mortar problem from that perspective.
We saw that the computed value of the two-sample t-test statistic was minus 2.20 and that fell into the lower critical region of our t-distribution with 18 degrees of freedom.
Now, that was a fixed significance level test, because we used
a five percent significance level to generate those critical values of 2.101 and minus 2.101. The p-value, on the other hand, is the probability, or area in the tails,
below minus 2.20 and above plus 2.20, because it's a two-sided test. The p-value can be found in most cases by computers.
It is the risk of wrongly rejecting the null hypothesis of equal means; in other words, it measures how unusual the event is.
The exact p-value in our problem turns out to be 0.042 and I've found that from a computer program, but you can approximate the p-value with a t-table.
Most t-tables only give probabilities greater than positive values of t. So just like we had to do with the normal distribution z-statistic,
take the absolute value of t_0, which is minus 2.20, and turn it into a positive 2.20. Now, with 18 degrees of freedom, go into the t-table and see if you can find an exact value of 2.20. Well, you can't, but you can find values that bracket it.
2.101 is less than 2.20 and 2.552 is greater than 2.20. So you can bracket this value quite nicely. The right tail probability for the smaller value, 2.101, is 0.025, and for the larger value, 2.552, it is 0.01. Now you have to double those, because this is a two-sided test.
So the p-value has to lie between 0.02 and 0.05. Those are lower and upper bounds on the p-value, and we know that the actual p-value turns out to be 0.042.
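Both the table-bracketing argument and the exact computer calculation can be reproduced with SciPy: `stats.t.sf` gives the upper-tail area and `stats.t.ppf` gives the table percentiles. This is a sketch of the calculation, not the specific program used in the lecture:

```python
from scipy import stats

df = 18
t_abs = 2.20   # |t0| from the Portland cement problem (rounded)

# Exact two-sided p-value: twice the upper-tail area beyond |t0|.
# Close to the lecture's 0.042, which used the unrounded t0.
p_exact = 2 * stats.t.sf(t_abs, df)

# Table-style bracketing: |t0| lies between the 0.025 and 0.01
# upper-tail percentage points, so 2*0.01 < p < 2*0.025.
t_025 = stats.t.ppf(1 - 0.025, df)    # about 2.101
t_010 = stats.t.ppf(1 - 0.010, df)    # about 2.552
lower_bound, upper_bound = 2 * 0.010, 2 * 0.025
```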
We find that from a computer program. Here is some two-sample t-test results
from computer software. The upper part of this table is the output from a product called Minitab, which is a very nice, very useful product for analyzing data. It's a good statistics package, and the output we're seeing there is the two-sample t-test for the Portland cement data. If you look through that output, you will find the estimated difference, and you will find the value of the t-statistic, minus 2.19. Now, I got minus 2.20; the computer carried a few more decimal places than I did. It has 18 degrees of freedom, and the p-value is 0.042. At the bottom of the table is the output from JMP, and once again, the calculation is very similar. The t-ratio is 2.186. Notice it's positive instead of negative, because the software subtracted the means in a different order than I did. Then it gives you the standard error of the difference (that's the denominator of the t-ratio), 18 degrees of freedom, and the probability that the computed value is greater than the absolute value of t: it's 0.0422, the two-sided p-value for this problem. So this is what computer output looks like, and you're going to get some guidance on how to actually use the software to obtain these numbers in another class.

Checking assumptions in the t-test
Now remember we're assuming that the observations come from a normal distribution and we
have also assumed that the variance of those normal distributions is the same. So we have two normal distributions with equal variances,
but possibly unequal means. How do we check those assumptions? Well, an easy way,
a convenient way to do that is with normal probability plotting. Here is a normal probability plot of the tension bond strength data from both samples of our Portland cement experiment. The solid dots are the modified mortar, and the little rectangular plotting symbols are the unmodified mortar.
Now, when you look at this normal probability plot, the first thing I think that I see is that both of these samples tend to lie along straight lines. Remember, in a normal probability plot, if the sample data does lie approximately along a straight line, that's some reasonable evidence that the samples are drawn from a normal distribution. So normality seems to be reasonable here.
It turns out that on the normal probability plot, the slope of the straight line is proportional to standard deviation.
So if the straight lines have similar or nearly identical slopes, then you feel pretty good about the assumption of constant variance. When I look at these plots, it looks to me like the slopes of these two lines are very, very similar.

Now, if you're drawing these plots and interpreting them by hand, I always urge people to concentrate on the central portion of the plot when you visualize the straight line. Don't get too carried away with the tails, because the bulk of the probability is in the center of the plot, and that's what you want to use in deciding where to draw the straight line.

How important are these assumptions? Well, the normality assumption is only moderately important. The t-test works pretty well even for moderate departures from normality. As long as the population is reasonably symmetric and reasonably unimodal, you're not going to have any real problems with the t-test. It's pretty robust to the normality assumption.
The constant variance assumption is more important. If you inadvertently make a wrong assumption there, it tends to impact the sensitivity of the test.
Its ability to detect differences is negatively impacted by that. So that's a more important assumption.
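The normal-probability-plot check described above can be sketched with SciPy's `probplot`. The actual data values are not reproduced in the lecture, so two synthetic normal samples with equal spread stand in for the mortar data here:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two mortar samples (the actual data
# values are not in the lecture): two normal samples with equal spread,
# mirroring the constant-variance situation we want to check.
rng = np.random.default_rng(42)
modified = rng.normal(loc=16.76, scale=0.28, size=10)
unmodified = rng.normal(loc=17.04, scale=0.28, size=10)

# probplot orders each sample against normal quantiles and fits a
# straight line; the fitted slope estimates the standard deviation, so
# similar slopes support the constant-variance assumption, and a high
# correlation r supports normality.
(_, _), (slope_mod, _, r_mod) = stats.probplot(modified)
(_, _), (slope_unmod, _, r_unmod) = stats.probplot(unmodified)

slope_ratio = slope_mod / slope_unmod   # near 1 when the spreads match
```

In practice you would plot both samples on the same axes (for example with matplotlib) and judge the slopes visually, concentrating on the central portion of each plot as advised above.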
Well, I think that its biggest value is that, for simple comparative experiments, it gives you an objective basis for making decisions. It removes judgment from the decision-making process, and that's really important. That's the value of statistics in experimental work: it lends scientific objectivity to our analysis of the data.
The t-test is quite versatile. It can be used for a lot of things. One of the things it can be used for is to test all of the relevant hypotheses in a two-level factorial design, because remember, all of those hypotheses involve comparing the mean response on one side of the cube to the mean response on the other side of the cube. Remember that discussion.
Finally, one of the things that we sometimes like to do, in addition to a test of hypothesis, is to construct a confidence interval on the difference in means. This is because, while hypothesis testing gives you an objective statement that the means either are different or are not, it doesn't really specify how different they are. That's what confidence intervals do.
Confidence intervals: a useful way to interpret data from designed experiments
A confidence interval is typically a statement of the form P(L <= Theta <= U) = 1 - Alpha, where Theta is the parameter of interest, and L and U are called the lower and upper confidence limits.
The 100(1 - Alpha) percent confidence interval on the difference in two means, assuming of course that we have identical variances, is:
ybar_1 - ybar_2 - t_{Alpha/2, n1+n2-2} S_p sqrt(1/n1 + 1/n2) <= mu_1 - mu_2 <= ybar_1 - ybar_2 + t_{Alpha/2, n1+n2-2} S_p sqrt(1/n1 + 1/n2)
You'll notice that S sub p is used for the standard deviation. Then these are the lower and upper Alpha-over-2 percentage points of the t-distribution with n1 plus n2 minus 2 degrees of freedom. We can actually calculate these intervals pretty easily for the Portland cement mortar formulation problem. We know everything: we know the two sample averages, we know S square sub p, and we know that the appropriate t percentile, the two-and-a-half percent point of t with 18 degrees of freedom, is 2.101. So plugging those numbers into our confidence interval equation gives us a 95 percent confidence interval on the difference in means from minus 0.55, that's the lower bound, up to minus 0.01. Another way to say that is that the confidence interval is minus 0.28 plus or minus 0.27; the accuracy of this interval is plus or minus 0.27.

Notice, in looking at this interval, that zero is not in the interval. That's because we rejected the null hypothesis of equal means at the five percent level, and this is a 95 percent confidence interval. If we had been unable to reject that null hypothesis at the five percent level, this interval would have included zero. So isn't looking at a confidence interval just another way to look at a hypothesis test? If the null hypothesis is not rejected, then the confidence interval on that parameter will include zero. So this is an alternate way to look at the results of the experiment, and it gives you a little bit more information. By the way, if you look at the computer output that we had earlier, both of these computer packages report a confidence interval: Minitab reports the confidence interval here, and JMP reports the confidence interval here. The signs are different on the lower and upper bounds between these two computer outputs. Why is that? Well, that's because JMP did the calculations by subtracting the means in a different order than Minitab did.
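The 95 percent interval above can be reproduced directly from the summary statistics. As before, equal sample sizes of 10 are assumed, and the difference in means and pooled variance are the ones quoted in the lecture. A minimal sketch:

```python
import math
from scipy import stats

n1 = n2 = 10                 # assumed equal sample sizes (18 df total)
diff = -0.28                 # difference in sample means, modified - unmodified
sp = math.sqrt(0.081)        # pooled standard deviation, about 0.284
df = n1 + n2 - 2

# 95% CI: (ybar1 - ybar2) +/- t_{0.025, 18} * S_p * sqrt(1/n1 + 1/n2)
t_crit = stats.t.ppf(0.975, df)                  # about 2.101
half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)

lower, upper = diff - half_width, diff + half_width
# Roughly (-0.55, -0.01): zero is excluded, agreeing with the rejection
# of the null hypothesis at the 5% level.
```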