Lecture 8: Hypothesis testing (Part 1)

Economics 326 — Econometrics II

Author

Vadim Marmer, UBC

Hypothesis testing

  • Hypothesis testing is one of the fundamental problems in statistics.

  • A hypothesis is (usually) an assertion about the unknown population parameters such as \beta _{1} in Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}.

  • Using the data, the econometrician has to determine whether an assertion is true or false.

  • Example: Phillips curve: \text{Unemployment}_{t}=\beta _{0}+\beta _{1}\text{Inflation}_{t}+U_{t}.

    In this example, we are interested in testing if \beta _{1}=0 (no Phillips curve) against \beta _{1}<0 (Phillips curve).

Null and alternative hypotheses

  • Usually, we have two competing hypotheses, and we want to draw a conclusion, based on the data, as to which of the hypotheses is true.

  • Null hypothesis, denoted as H_{0}: A hypothesis that is held to be true, unless the data provides sufficient evidence against it.

  • Alternative hypothesis, denoted as H_{1}: A hypothesis against which the null is tested. It is held to be true if the null is found false.

  • Usually, the econometrician has to carry the “burden of proof,” and the case that he is interested in is stated as H_{1}.

  • The econometrician has to prove that his assertion (H_{1}) is true by showing that the data rejects H_{0}.

  • The two hypotheses must be disjoint: either H_{0} is true or H_{1} is true, but never both simultaneously.

Decision rule

  • The econometrician has to choose between H_{0} and H_{1}.

  • The decision rule that leads the econometrician to reject or not to reject H_{0} is based on a test statistic, which is a function of the data \left\{ \left( Y_{i},X_{i}\right) :i=1,\ldots ,n\right\} .

  • Usually, one rejects H_{0} if the test statistic falls into a critical region. A critical region is constructed by taking into account the probability of making a wrong decision.

Errors

  • There are two types of errors that the econometrician can make:

                         Truth: H_0        Truth: H_1
    Decision: H_0        \checkmark        Type II error
    Decision: H_1        Type I error      \checkmark
  • Type I error is the error of rejecting H_{0} when H_{0} is true.

  • The probability of Type I error is denoted by \alpha and called significance level or size of a test: P\left( \text{Type I error}\right) =P\left( \text{reject }H_{0}|H_{0}\text{ is true}\right) =\alpha .

  • Type II error is the error of not rejecting H_{0} when H_{1} is true.

  • Power of a test: 1-P\left( \text{Type II error}\right) =1-P\left( \text{Accept }H_{0}|H_{0}\text{ is false}\right) .

Errors

  • The decision rule depends on a test statistic T.

  • The real line is split into two regions: acceptance region and rejection region (critical region).

  • When T is in the acceptance region, we accept H_{0} (and risk making a Type II error).

  • When T is in the rejection (critical) region, we reject H_{0} (and risk making a Type I error).

  • Unfortunately, the probabilities of Type I and II errors are inversely related. By decreasing the probability of Type I error \alpha , one makes the critical region smaller, which increases the probability of the Type II error. Thus it is impossible to make both errors arbitrarily small.

  • By convention, \alpha is chosen to be a small number, for example, \alpha =0.01,0.05, or 0.10. (This is in agreement with the econometrician carrying the burden of proof).

Steps

  • The following are the steps of hypothesis testing:

    1. Specify H_{0} and H_{1}.

    2. Choose the significance level \alpha .

    3. Define a decision rule (critical region).

    4. Perform the test: given the data, compute the test statistic and check whether it falls into the critical region.

  • The decision depends on the significance level \alpha: larger values of \alpha correspond to bigger critical regions (probability of Type I error is larger).

  • It is easier to reject the null for larger values of \alpha.

  • p-value: Given the data, the smallest significance level at which the null can be rejected.
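
  • A minimal numerical sketch (in Python, with a hypothetical value of a test statistic that is N(0,1) under H_{0}) of how the two-sided p-value relates to the significance level:

    ```python
    from scipy.stats import norm

    # Hypothetical observed value of a test statistic that is N(0, 1) under H_0.
    t_observed = 2.10

    # Two-sided p-value: the smallest alpha at which |T| > z_{1 - alpha/2} rejects.
    p_value = 2 * (1 - norm.cdf(abs(t_observed)))
    print(p_value)  # approximately 0.036

    # Larger alpha makes rejection easier: reject at 0.05 and 0.10, not at 0.01.
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, "reject H0" if p_value < alpha else "do not reject H0")
    ```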

Assumptions

  • \beta _{1} is unknown, and we have to rely on its OLS estimator \hat{\beta}_{1}.

  • We need to know the distribution of \hat{\beta}_{1} or of certain functions of it.

  • We will assume that the assumptions of the Normal Classical Linear Regression model are satisfied:

    1. Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i},\qquad i=1,\ldots ,n.

    2. \mathrm{E}\left[U_{i}\mid X_{1},\ldots ,X_{n}\right] =0 for all i’s.

    3. \mathrm{E}\left[U_{i}^{2}\mid X_{1},\ldots ,X_{n}\right] =\sigma ^{2} for all i’s.

    4. \mathrm{E}\left[U_{i}U_{j}\mid X_{1},\ldots ,X_{n}\right] =0 for all i\neq j.

    5. U’s are jointly normally distributed conditional on X’s.

  • To derive the distribution of \hat{\beta}_{1}, we need to understand the normal distribution.

Normal distribution

  • A normal rv is a continuous rv that can take on any real value. The PDF of a normal rv X is f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \text{ where } \mu = \mathrm{E}\left[X\right] \text{ and } \sigma^2 = \mathrm{Var}\left(X\right). We usually write X \sim N(\mu, \sigma^2).

  • If X \sim N(\mu, \sigma^2), then a + bX \sim N(a + b\mu, b^2\sigma^2).
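
  • A quick Monte Carlo check of this property (a sketch with arbitrary illustrative values of a, b, \mu, and \sigma^2):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma2 = 1.0, 4.0      # illustrative parameters of X ~ N(mu, sigma2)
    a, b = 2.0, -3.0           # illustrative constants

    x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)
    y = a + b * x

    # Sample mean and variance of a + bX should be close to a + b*mu and b^2*sigma2.
    print(y.mean(), a + b * mu)       # both approximately -1
    print(y.var(), b**2 * sigma2)     # both approximately 36
    ```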

Standard normal distribution

  • Standard normal rv has \mu = 0 and \sigma^2 = 1. Its PDF is \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right).

  • Symmetric around zero (mean): if Z \sim N(0, 1), P(Z > z) = P(Z < -z).

  • Thin tails: P(-1.96 \leq Z \leq 1.96) = 0.95.

  • If X \sim N(\mu, \sigma^2), then (X - \mu)/\sigma \sim N(0, 1).
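
  • These facts can be checked numerically with the standard normal CDF and quantile function (a sketch using scipy.stats.norm):

    ```python
    from scipy.stats import norm

    # Symmetry around zero: P(Z > z) = P(Z < -z).
    z = 1.3
    print(1 - norm.cdf(z), norm.cdf(-z))       # identical

    # Thin tails: P(-1.96 <= Z <= 1.96) is approximately 0.95.
    print(norm.cdf(1.96) - norm.cdf(-1.96))    # about 0.950

    # Standardization: the 0.975 quantile of N(mu, sigma^2), standardized, is z_{0.975}.
    mu, sigma = 5.0, 2.0                       # illustrative values
    q = norm.ppf(0.975, loc=mu, scale=sigma)
    print((q - mu) / sigma, norm.ppf(0.975))   # both about 1.96
    ```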

Bivariate normal distribution

  • X and Y have a bivariate normal distribution if their joint PDF is given by: f(x, y) = \frac{1}{2\pi\sqrt{(1-\rho^2) \sigma_X^2 \sigma_Y^2}} \exp\left[-\frac{Q}{2(1-\rho^2)}\right], where Q = \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y},

    \mu_X = \mathrm{E}\left[X\right], \mu_Y = \mathrm{E}\left[Y\right], \sigma_X^2 = \mathrm{Var}\left(X\right), \sigma_Y^2 = \mathrm{Var}\left(Y\right), and \rho = \mathrm{Corr}(X, Y).
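
  • A sketch comparing the formula above with scipy's bivariate normal density at a single point (all parameter values are illustrative):

    ```python
    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative parameters.
    mu_x, mu_y = 1.0, -0.5
    sig_x, sig_y, rho = 2.0, 1.5, 0.6

    def f(x, y):
        """Bivariate normal PDF written exactly as in the formula above."""
        q = ((x - mu_x)**2 / sig_x**2 + (y - mu_y)**2 / sig_y**2
             - 2 * rho * (x - mu_x) * (y - mu_y) / (sig_x * sig_y))
        return np.exp(-q / (2 * (1 - rho**2))) / (
            2 * np.pi * np.sqrt((1 - rho**2) * sig_x**2 * sig_y**2))

    cov = [[sig_x**2, rho * sig_x * sig_y],
           [rho * sig_x * sig_y, sig_y**2]]
    print(f(0.3, 0.8))
    print(multivariate_normal([mu_x, mu_y], cov).pdf([0.3, 0.8]))  # same value
    ```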

Properties of bivariate normal

If X and Y have a bivariate normal distribution:

  • a + bX + cY \sim N(\mu^*, (\sigma^*)^2), where \mu^* = a + b\mu_X + c\mu_Y, \quad (\sigma^*)^2 = b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\rho\sigma_X\sigma_Y.

  • \mathrm{Cov}\left(X, Y\right) = 0 \Longrightarrow X and Y are independent.

  • Can be generalized to more than 2 variables (multivariate normal).
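
  • A Monte Carlo sketch of the first property above (a, b, c, and the distribution parameters are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    mu_x, mu_y = 1.0, 2.0
    sig_x, sig_y, rho = 1.5, 0.5, -0.4
    a, b, c = 0.5, 2.0, -1.0

    cov = [[sig_x**2, rho * sig_x * sig_y],
           [rho * sig_x * sig_y, sig_y**2]]
    xy = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000)
    w = a + b * xy[:, 0] + c * xy[:, 1]

    # Simulated mean and variance of a + bX + cY vs. the formulas for mu* and (sigma*)^2.
    print(w.mean(), a + b * mu_x + c * mu_y)
    print(w.var(), b**2 * sig_x**2 + c**2 * sig_y**2 + 2 * b * c * rho * sig_x * sig_y)
    ```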

Normality of the OLS estimator

  • Assume that U_{i}’s are jointly normally distributed conditional on X’s (Assumption 5).

  • Then Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i} are also jointly normally distributed conditional on X’s.

  • Since \hat{\beta}_{1}=\sum_{i=1}^{n}w_{i}Y_{i}, where w_{i}=\frac{X_{i}-\bar{X}}{\sum_{l=1}^{n}\left( X_{l}-\bar{X}\right) ^{2}} depend only on the X_i’s, \hat{\beta}_{1} is also normally distributed conditional on the X_i’s.

  • Conditionally on \mathbf{X}: \hat{\beta}_{1}\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right) \right) \text{, where }\mathrm{Var}\left(\hat{\beta}_{1}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.
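
  • A simulation sketch of this result: hold the X_i's fixed, redraw the normal errors many times, and compare the simulated mean and variance of \hat{\beta}_1 with \beta_1 and \sigma^2/\sum_{i=1}^{n}(X_i - \bar{X})^2 (all parameter values are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, beta0, beta1, sigma = 50, 1.0, 2.0, 3.0   # illustrative values

    x = rng.uniform(0, 10, size=n)               # X's are held fixed across draws
    w = (x - x.mean()) / ((x - x.mean())**2).sum()

    beta1_hats = []
    for _ in range(20_000):
        u = rng.normal(0, sigma, size=n)         # normal errors (Assumption 5)
        y = beta0 + beta1 * x + u
        beta1_hats.append((w * y).sum())         # beta1_hat = sum_i w_i * Y_i

    beta1_hats = np.array(beta1_hats)
    print(beta1_hats.mean(), beta1)                                 # close
    print(beta1_hats.var(), sigma**2 / ((x - x.mean())**2).sum())   # close
    ```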

Two-sided tests

  • For Y_{i}=\beta _{0}+\beta _{1}X_{i}+U_{i}, consider testing H_{0}:\beta _{1}=\beta _{1,0}, against H_{1}:\beta _{1}\neq \beta _{1,0}.

  • \beta _{1} is the true unknown value of the slope parameter.

  • \beta _{1,0} is a known number specified by the econometrician. (For example \beta _{1,0} is zero if you want to test \beta _{1}=0).

  • Such a test is called two-sided because the alternative hypothesis H_{1} does not specify in which direction \beta _{1} can deviate from the asserted value \beta _{1,0}.

Two-sided tests when \sigma^2 is known (infeasible test)

  • Suppose for a moment that \sigma ^{2} is known.

  • Consider the following test statistic: T=\frac{\hat{\beta}_{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }},\text{ where }\mathrm{Var}\left(\hat{\beta}_{1}\right) =\frac{\sigma ^{2}}{\sum_{i=1}^{n}\left( X_{i}-\bar{X}\right) ^{2}}.

  • Consider the following decision rule (test): \text{Reject }H_{0}:\beta _{1}=\beta _{1,0}\text{ when }\left\vert T\right\vert >z_{1-\alpha /2}, where z_{1-\alpha /2} is the \left( 1-\alpha /2\right) quantile of the standard normal distribution (critical value).
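
  • A sketch of this (infeasible) decision rule in Python; the data, \sigma^2, and \beta_{1,0} below are hypothetical inputs:

    ```python
    import numpy as np
    from scipy.stats import norm

    def infeasible_t_test(y, x, beta1_null, sigma2, alpha=0.05):
        """Two-sided test of H0: beta1 = beta1_null when sigma^2 is known."""
        w = (x - x.mean()) / ((x - x.mean())**2).sum()
        beta1_hat = (w * y).sum()                        # OLS slope estimate
        var_beta1 = sigma2 / ((x - x.mean())**2).sum()   # Var(beta1_hat), sigma^2 known
        t = (beta1_hat - beta1_null) / np.sqrt(var_beta1)
        crit = norm.ppf(1 - alpha / 2)                   # critical value z_{1 - alpha/2}
        return t, crit, abs(t) > crit                    # True in last slot: reject H0

    # Hypothetical data with true beta1 = 0.5 and sigma = 2, testing H0: beta1 = 0.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=30)
    y = 1.0 + 0.5 * x + rng.normal(0, 2.0, size=30)
    print(infeasible_t_test(y, x, beta1_null=0.0, sigma2=4.0))
    ```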

Test validity and power

  • We need to establish that:

    1. The test is valid, where validity means that the test has correct size, i.e., P\left( \text{Type I error}\right) =\alpha: P\left( \left\vert T\right\vert >z_{1-\alpha /2}|\beta _{1}=\beta _{1,0}\right) =\alpha .

    2. The test has power: when \beta _{1}\neq \beta _{1,0} (H_{0} is false), the test rejects H_{0} with probability that exceeds \alpha: P\left( \left\vert T\right\vert >z_{1-\alpha /2}|\beta _{1}\neq \beta _{1,0}\right) >\alpha .

  • We want P\left( \left\vert T\right\vert >z_{1-\alpha /2}|\beta _{1}\neq \beta _{1,0}\right) to be as large as possible.

  • Note that P\left( \left\vert T\right\vert >z_{1-\alpha /2}|\beta _{1}\neq \beta _{1,0}\right) depends on the true value \beta _{1}.

The distribution of T when \sigma^2 is known (infeasible test)

  • Write \begin{aligned} T &=\frac{\hat{\beta}_{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}=\frac{\hat{\beta}_{1}-\beta _{1}+\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }} \\ &=\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}+\frac{\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}. \end{aligned}

  • Under our assumptions and conditionally on X’s: \hat{\beta}_{1}\sim N\left( \beta _{1},\mathrm{Var}\left(\hat{\beta}_{1}\right) \right) ,\text{ or }\frac{\hat{\beta}_{1}-\beta _{1}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}\sim N\left( 0,1\right) .

  • We have that conditionally on X’s: T\sim N\left( \frac{\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }},1\right) .

Size when \sigma^2 is known (infeasible test)

  • We have that T\sim N\left( \frac{\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }},1\right) .

  • When H_{0}:\beta _{1}=\beta _{1,0} is true, T\overset{H_{0}}{\sim }N\left( 0,1\right).

  • We reject H_{0} when \left\vert T\right\vert >z_{1-\alpha /2}\Leftrightarrow T>z_{1-\alpha /2}\text{ or }T<-z_{1-\alpha /2}.

  • Let Z\sim N\left( 0,1\right) . \begin{aligned} P\left( \text{Reject }H_{0}|H_{0}\text{ is true}\right) &=P\left( Z>z_{1-\alpha /2}\right) +P\left( Z<-z_{1-\alpha /2}\right) \\ &=\alpha /2+\alpha /2=\alpha \end{aligned}
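
  • A Monte Carlo sketch confirming that the rejection frequency under H_{0} is close to \alpha (illustrative design, with the X's held fixed):

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n, beta0, beta1_0, sigma, alpha = 40, 1.0, 2.0, 1.5, 0.05
    x = rng.uniform(0, 5, size=n)
    sxx = ((x - x.mean())**2).sum()
    crit = norm.ppf(1 - alpha / 2)

    reps, rejections = 20_000, 0
    for _ in range(reps):
        u = rng.normal(0, sigma, size=n)
        y = beta0 + beta1_0 * x + u                      # data generated under H0
        beta1_hat = ((x - x.mean()) * y).sum() / sxx
        t = (beta1_hat - beta1_0) / np.sqrt(sigma**2 / sxx)
        rejections += abs(t) > crit

    print(rejections / reps)   # close to alpha = 0.05
    ```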


Power of the test when \sigma^2 is known (infeasible test)

  • Under H_{1}, \beta _{1}-\beta _{1,0}\neq 0 and the distribution of T is not centered at zero: T\sim N\left( \frac{\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }},1\right) .

  • When \beta _{1}-\beta _{1,0}>0, the distribution of T is shifted to the right of zero, so P\left( T>z_{1-\alpha /2}\right) exceeds \alpha /2 and the overall rejection probability exceeds \alpha; the case \beta _{1}-\beta _{1,0}<0 is symmetric, with the distribution shifted to the left.

  • Rejection probability exceeds \alpha under H_{1}: power increases with the distance from H_{0} (\left\vert \beta _{1,0}-\beta _{1}\right\vert) and decreases with \mathrm{Var}\left(\hat{\beta}_{1}\right).
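
  • Since T \sim N(\delta ,1) with \delta =\frac{\beta _{1}-\beta _{1,0}}{\sqrt{\mathrm{Var}\left(\hat{\beta}_{1}\right) }}, the power equals P\left( T>z_{1-\alpha /2}\right) +P\left( T<-z_{1-\alpha /2}\right) =1-\Phi \left( z_{1-\alpha /2}-\delta \right) +\Phi \left( -z_{1-\alpha /2}-\delta \right). A sketch of this calculation for a few illustrative values of \delta:

    ```python
    from scipy.stats import norm

    def power_two_sided(delta, alpha=0.05):
        """Power of the two-sided test when T ~ N(delta, 1)."""
        z = norm.ppf(1 - alpha / 2)
        return (1 - norm.cdf(z - delta)) + norm.cdf(-z - delta)

    # Power equals alpha at delta = 0 (the size) and grows with |delta|,
    # i.e. with |beta1 - beta1_0| and with smaller Var(beta1_hat).
    for delta in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(delta, round(power_two_sided(delta), 3))
    ```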