Chapter 23 Graphical View of the Data Dot Diagram, Fig. 2.1, pp. 24
Chapter 24 If you have a large sample, a histogram may be useful
Chapter 25 Box Plots, Fig. 2.3, pp. 26
Chapter 26 The Hypothesis Testing Framework Statistical hypothesis testing is a useful framework for many experimental situations Origins of the methodology date from the early 1900s We will use a procedure known as the two- sample t-test
Chapter 27 The Hypothesis Testing Framework Sampling from a normal distribution Statistical hypotheses:
Chapter 28 Estimation of Parameters
Chapter 29 Summary Statistics (pg. 36) Formulation 1 New recipe Formulation 2 Original recipe
Chapter 210 How the Two-Sample t-Test Works:
Chapter 211 How the Two-Sample t-Test Works:
Chapter 212 How the Two-Sample t-Test Works: Values of t 0 that are near zero are consistent with the null hypothesis Values of t 0 that are very different from zero are consistent with the alternative hypothesis t 0 is a distance measure-how far apart the averages are expressed in standard deviation units Notice the interpretation of t 0 as a signal-to-noise ratio
Chapter 213 The Two-Sample (Pooled) t-Test
Chapter 214 William Sealy Gosset (1876, 1937) Gosset's interest in barley cultivation led him to speculate that design of experiments should aim, not only at improving the average yield, but also at breeding varieties whose yield was insensitive (robust) to variation in soil and climate. Developed the t-test (1908) Gosset was a friend of both Karl Pearson and R.A. Fisher, an achievement, for each had a monumental ego and a loathing for the other. Gosset was a modest man who cut short an admirer with the comment that Fisher would have discovered it all anyway.
Chapter 215 The Two-Sample (Pooled) t-Test So far, we havent really done any statistics We need an objective basis for deciding how large the test statistic t 0 really is In 1908, W. S. Gosset derived the reference distribution for t 0 … called the t distribution Tables of the t distribution – see textbook appendix t 0 = -2.20
Chapter 216 The Two-Sample (Pooled) t-Test A value of t 0 between –2.101 and 2.101 is consistent with equality of means It is possible for the means to be equal and t 0 to exceed either 2.101 or –2.101, but it would be a rare event … leads to the conclusion that the means are different Could also use the P-value approach t 0 = -2.20
Chapter 217 The Two-Sample (Pooled) t-Test The P-value is the area (probability) in the tails of the t-distribution beyond -2.20 + the probability beyond +2.20 (its a two-sided test) The P-value is a measure of how unusual the value of the test statistic is given that the null hypothesis is true The P-value the risk of wrongly rejecting the null hypothesis of equal means (it measures rareness of the event) The P-value in our problem is P = 0.042 t 0 = -2.20
Chapter 218 Computer Two-Sample t-Test Results
Chapter 219 Checking Assumptions – The Normal Probability Plot
Chapter 220 Importance of the t-Test Provides an objective framework for simple comparative experiments Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one side of the cube versus the mean response at the opposite side of the cube
Chapter 221 Confidence Intervals (See pg. 44) Hypothesis testing gives an objective statement concerning the difference in means, but it doesnt specify how different they are General form of a confidence interval The 100(1- α)% confidence interval on the difference in two means:
Chapter 223 A função t.test no R t.test(stats) Student's t-Test Description Performs one and two sample t-tests on vectors of data. Usage t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,...)
Chapter 224 Argumentos da função t.test x - a (non-empty) numeric vector of data values. y -an optional (non-empty) numeric vector of data values. alternative - a character string specifying the alternative hypothesis, must be one of two.sided" (default), "greater" or "less". You can specify just the initial letter. mu - a number indicating the true value of the mean (or difference in means if you are performing a two sample test). paired - a logical indicating whether you want a paired t-test. var.equal - a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
Chapter 225 Argumentos da função t.test conf.level - confidence level of the interval. formula - a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups. data - an optional matrix or data frame containing the variables in the formula. subset - an optional vector specifying a subset of observations to be used. na.action - a function which indicates what should happen when the data contain NA s. Defaults to getOption("na.action").
Chapter 226 Exemplo dos dados sobre cimento Arquivo em cimento.txt com nome das variáveis. Ler e realizar o teste t no R.
Chapter 227 Usando o R dados=read.table(m://aulas//flavia//cimento.txt,header=T) stripchart(dados,at=c(1,1.1)) boxplot(dados)
Chapter 228 t.test(dados$m,dados$u,alternative="two.sided",var.equal=T,paired=F,conf.level=.95) Two Sample t-test data: dados$m and dados$u t = -2.1869, df = 18, p-value = 0.0422 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.54507339 -0.01092661 sample estimates: mean of x mean of y 16.764 17.042
Chapter 229 Comparando as variâncias Dadas duas amostras independentes de duas distribuições normais, antes de realizar o teste t, para comparar as médias, é necessário verificar se é razoável ou não considerar variâncias iguais ou não, para saber se adotaremos o teste t pooled (combinado) ou se adotaremos uma aproximação para o número de graus de liberdade da distribuição amostral da estatística de teste, adotando uma aproximação e não a distribuição exata.
Chapter 230 Se as amostras provêm de fato de populações normais temos que a variância amostral a menos de constante tem distribuição de qui-quadrado com número de graus de liberdade n-1, em que n é o tamanho da amostra. Como as amostras são independentes, segue que a menos da constante, as duas variâncias amostrais são independentemente distribuídas segundo uma distribuição de qui-quadrado.
Chapter 231 Resumindo...
Chapter 232 Teste de igualdade das variâncias Sob a hipótese de que as variâncias são iguais, segue que a estatística de teste é dada pela razão das variâncias amostrais e, num teste bilateral de nível de significância α, rejeitaremos a hipótese nula se:
Chapter 233 No R está disponível a função var.test F test to compare two variances data: dados$m and dados$u F = 1.6293, num df = 9, denom df = 9, p-value = 0.4785 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.4046845 6.5593806 sample estimates: ratio of variances 1.629257 var.test(dados$m,dados$u,ratio=1,alternative="two.sided",conf.level=0.95)