Lecture 2: Review of Probability

Economics 326 — Introduction to Econometrics II

Author

Vadim Marmer, UBC

Randomness

  • Random experiment: an experiment whose outcome cannot be predicted with certainty, even when the experiment is repeated under the same conditions.

  • Event: a collection of outcomes of a random experiment.

  • Probability: a function from events to the interval [0, 1] satisfying:

    • If \Omega is a collection of all possible outcomes, P(\Omega) = 1.
    • If A is an event, P(A) \geq 0.
    • If A_1, A_2, \ldots is a sequence of disjoint events, P(A_1 \text{ or } A_2 \text{ or } \ldots) = P(A_1) + P(A_2) + \ldots.

Random variable

  • Random variable: a numerical representation of the outcomes of a random experiment; it assigns a number to each possible outcome.

  • Coin-flipping example:

    Outcome   X   Y    Z
    Heads     0   1   -1
    Tails     1   0    1
  • Rolling a die:

    Outcome   X   Y
    1         1   0
    2         2   1
    3         3   0
    4         4   1
    5         5   0
    6         6   1

Summation operator

  • Let \{x_i: i = 1, \ldots, n\} be a sequence of numbers. \sum_{i=1}^{n} x_i = x_1 + x_2 + \ldots + x_n.

  • For a constant c: \sum_{i=1}^{n} c = nc. \sum_{i=1}^{n} cx_i = c(x_1 + x_2 + \ldots + x_n) = c\sum_{i=1}^{n} x_i.

Summation operator (continued)

  • Let \{y_i: i = 1, \ldots, n\} be another sequence of numbers, and a, b be two constants: \sum_{i=1}^{n}(ax_i + by_i) = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} y_i.

  • But: \sum_{i=1}^{n} x_i y_i \neq \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i. \sum_{i=1}^{n} \frac{x_i}{y_i} \neq \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}. \sum_{i=1}^{n} x_i^2 \neq \left(\sum_{i=1}^{n} x_i\right)^2.
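
A minimal numerical sketch of these summation rules (assuming NumPy is available; the arrays x and y are made-up example values):

```python
import numpy as np

# Made-up example sequences and constants.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
a, b = 2.0, -1.0

# Linearity: sum(a*x_i + b*y_i) = a*sum(x_i) + b*sum(y_i)
print(np.isclose(np.sum(a * x + b * y), a * np.sum(x) + b * np.sum(y)))  # True

# The three "but" cases: equality does NOT hold in general.
print(np.sum(x * y), np.sum(x) * np.sum(y))    # 32.0 vs 90.0
print(np.sum(x / y), np.sum(x) / np.sum(y))    # ~1.15 vs 0.4
print(np.sum(x ** 2), np.sum(x) ** 2)          # 14.0 vs 36.0
```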

Discrete random variables

We often distinguish between discrete and continuous random variables.

  • A discrete random variable takes on only a finite or countably infinite number of values.

  • The distribution of a discrete random variable is a list of all possible values and the probability that each value would occur:

    Value        x_1   x_2   \ldots   x_n
    Probability  p_1   p_2   \ldots   p_n

    Here p_i denotes the probability that the random variable X takes on the value x_i: p_i = P(X = x_i). The function mapping x_i to p_i is called the Probability Mass Function (PMF). Each p_i is between 0 and 1, and \sum_{i=1}^{n} p_i = 1.
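
As a small sketch (the values and probabilities below are hypothetical), a discrete distribution can be stored as a value-to-probability mapping and checked against the two PMF requirements:

```python
# Hypothetical discrete distribution: p_i = P(X = x_i).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# Each p_i lies in [0, 1] and the probabilities sum to one.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

print(pmf[1])  # P(X = 1) = 0.5
```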

Example: Bernoulli distribution

  • Consider a single trial with two outcomes: “success” (with probability p) or “failure” (with probability 1-p).

  • Define the random variable: X = \begin{cases} 1 & \text{if success} \\ 0 & \text{if failure} \end{cases}

  • Then X follows a Bernoulli distribution: X \sim Bernoulli(p).

  • PMF: P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}.
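
A minimal sketch of the Bernoulli PMF formula above (the function name bernoulli_pmf and the value p = 0.3 are illustrative choices, not part of the original notes):

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # illustrative success probability
print(bernoulli_pmf(1, p))  # 0.3 = P(success)
print(bernoulli_pmf(0, p))  # 0.7 = P(failure)
```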

Discrete random variables (continued)

  • Indicator function: \mathbf{1}(x_i \leq x) = \begin{cases} 1 & \text{if } x_i \leq x \\ 0 & \text{if } x_i > x \end{cases}

  • Cumulative Distribution Function (CDF): F(x) = P(X \leq x) = \sum_i p_i \mathbf{1}(x_i \leq x).

  • F(x) is non-decreasing.

  • For discrete random variables, the CDF is a step function.
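
The step-function CDF can be computed directly from the indicator-sum formula F(x) = \sum_i p_i \mathbf{1}(x_i \leq x). A minimal sketch with a hypothetical support and probabilities:

```python
# Hypothetical discrete distribution.
values = [0, 1, 2]
probs  = [0.2, 0.5, 0.3]

def cdf(x):
    # The indicator 1(x_i <= x) is the boolean (xi <= x), which Python treats as 0/1.
    return sum(p * (xi <= x) for xi, p in zip(values, probs))

print(cdf(-0.5), cdf(0), cdf(1.5), cdf(2))  # approximately 0, 0.2, 0.7, 1 -- a step function
```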

Example: CDF of Bernoulli(0.3)

With p = 0.3: F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1-p = 0.7 & \text{if } 0 \leq x < 1 \\ 1 & \text{if } x \geq 1 \end{cases}

Continuous random variable

  • A random variable is continuously distributed if the range of possible values it can take is uncountably infinite (for example, the real line).

  • A continuous random variable takes on any particular value with probability zero.

  • For continuous random variables, the CDF is continuous and differentiable.

  • The derivative of the CDF is called the Probability Density Function (PDF): f(x) = \frac{dF(x)}{dx} \text{ and } F(x) = \int_{-\infty}^{x} f(u) du; \int_{-\infty}^{\infty} f(x) dx = 1.

  • Since F(x) is non-decreasing, f(x) \geq 0 for all x.

Example: Uniform distribution

  • A random variable X follows a Uniform distribution on [0, 1], written X \sim Uniform(0, 1), if it is equally likely to take any value in [0, 1].

  • PDF: f(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}

  • CDF: F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x \leq 1 \\ 1 & \text{if } x > 1 \end{cases}
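
A quick numerical check of the relationship F(x) = \int_{-\infty}^{x} f(u) du for the Uniform(0, 1) case (a sketch assuming NumPy is available; the helper names uniform_pdf and uniform_cdf_numeric are made up for this example):

```python
import numpy as np

def uniform_pdf(x):
    # f(x) = 1 on [0, 1], 0 otherwise
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

def uniform_cdf_numeric(x, n=100_000):
    # Simple Riemann sum of the PDF from below the support up to x.
    grid = np.linspace(-1.0, x, n)
    dx = grid[1] - grid[0]
    return np.sum(uniform_pdf(grid)) * dx

for x in (0.25, 0.5, 0.9):
    print(round(uniform_cdf_numeric(x), 3), "vs", x)  # approximately equal to F(x) = x
```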

Joint distribution (discrete)

  • Two random variables X, Y

               y_1       y_2       \cdots   y_m       Marginal
    x_1        p_{11}    p_{12}    \cdots   p_{1m}    p_1^X = \sum_{j=1}^{m} p_{1j}
    x_2        p_{21}    p_{22}    \cdots   p_{2m}    p_2^X = \sum_{j=1}^{m} p_{2j}
    \vdots     \vdots    \vdots             \vdots    \vdots
    x_n        p_{n1}    p_{n2}    \cdots   p_{nm}    p_n^X = \sum_{j=1}^{m} p_{nj}

    Joint PMF: p_{ij} = P(X = x_i, Y = y_j).

    Marginal PMF: p_i^X = P(X = x_i) = \sum_{j=1}^{m} p_{ij}.

  • Conditional Distribution: If P(X = x_1) \neq 0, p_j^{Y|X=x_1} = P(Y = y_j | X = x_1) = \frac{P(Y = y_j, X = x_1)}{P(X = x_1)} = \frac{p_{1j}}{p_1^X}.
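
A minimal sketch of these formulas on a small joint PMF (assuming NumPy is available; the 2x3 table of probabilities is hypothetical):

```python
import numpy as np

# Hypothetical joint PMF p_ij = P(X = x_i, Y = y_j); rows index X, columns index Y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.15, 0.15]])
assert np.isclose(joint.sum(), 1.0)

# Marginal PMF of X: sum over j (over the columns).
p_X = joint.sum(axis=1)                 # [0.40, 0.60]

# Conditional PMF of Y given X = x_1 (row 0): p_{1j} / p_1^X.
p_Y_given_x1 = joint[0, :] / p_X[0]     # [0.25, 0.50, 0.25]

print(p_X, p_Y_given_x1, p_Y_given_x1.sum())  # the conditional PMF sums to 1
```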

Joint distribution (continuous)

  • Joint PDF: f_{X,Y}(x, y) and \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) dx dy = 1.

  • Marginal PDF: f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) dy.

  • Conditional PDF: f_{Y|X=x}(y|x) = f_{X,Y}(x, y) / f_X(x).

Independence

  • Two (discrete) random variables are independent if for all x, y: P(X = x, Y = y) = P(X = x) P(Y = y).

  • If independent: P(Y = y | X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = P(Y = y).

  • Two continuous random variables are independent if for all x, y: f_{X,Y}(x, y) = f_X(x) f_Y(y).

  • If independent, f_{Y|X}(y|x) = f_Y(y) for all x.

Expected value

  • Let g be some function. Then
    Eg(X) = \sum_i g(x_i) p_i \text{ (discrete)}, \qquad Eg(X) = \int g(x) f(x) dx \text{ (continuous).}
    Expectation is a transformation of a distribution (PMF or PDF) and is a constant!

  • Mean (center of a distribution): EX = \sum_i x_i p_i \text{ or } EX = \int x f(x) dx.

  • Variance (spread of a distribution): Var(X) = E(X - EX)^2, with
    Var(X) = \sum_i (x_i - EX)^2 p_i \text{ (discrete)} \quad \text{or} \quad Var(X) = \int (x - EX)^2 f(x) dx \text{ (continuous).}

  • Standard deviation: \sqrt{Var(X)}.

Example: Bernoulli distribution (continued)

  • Recall: X \sim Bernoulli(p) takes values \{0, 1\} with P(X=1) = p and P(X=0) = 1-p.

  • Mean: E(X) = 0 \cdot (1-p) + 1 \cdot p = p.

  • Variance: E(X^2) = 0^2 \cdot (1-p) + 1^2 \cdot p = p. Var(X) = E(X^2) - (EX)^2 = p - p^2 = p(1-p).
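
A quick Monte Carlo check of E(X) = p and Var(X) = p(1-p) (a sketch assuming NumPy is available; p and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 1_000_000
x = rng.binomial(1, p, size=n)   # Bernoulli(p) draws

print(x.mean(), p)               # both close to 0.3
print(x.var(), p * (1 - p))      # both close to 0.21
```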

Example: Uniform distribution (continued)

  • Recall: X \sim Uniform(0, 1) has PDF f(x) = 1 for x \in [0, 1].

  • Mean: E(X) = \int_0^1 x \cdot 1 \, dx = \left. \frac{x^2}{2} \right|_0^1 = \frac{1}{2}.

  • Variance: E(X^2) = \int_0^1 x^2 \cdot 1 \, dx = \left. \frac{x^3}{3} \right|_0^1 = \frac{1}{3}. Var(X) = E(X^2) - (EX)^2 = \frac{1}{3} - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}.
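
The same kind of Monte Carlo check for the Uniform(0, 1) case, E(X) = 1/2 and Var(X) = 1/12 (a sketch assuming NumPy is available; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=1_000_000)

print(u.mean(), 1 / 2)    # approximately 0.5
print(u.var(), 1 / 12)    # approximately 0.0833
```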

Properties

  • If c is a constant, Ec = c, and Var(c) = E(c - Ec)^2 = (c - c)^2 = 0.

  • Linearity: E(a + bX) = \sum_i (a + bx_i) p_i = a \sum_i p_i + b \sum_i x_i p_i = a + bEX.

  • Re-centering: a random variable X - EX has mean zero: E(X - EX) = EX - E(EX) = EX - EX = 0.

Properties (continued)

  • Variance formula: Var(X) = EX^2 - (EX)^2 \begin{align*} Var(X) &= E(X - EX)^2 \\ &= E[(X - EX)(X - EX)] \\ &= E[(X - EX)X - (X - EX) \cdot EX] \\ &= E[(X - EX)X] - E[(X - EX) \cdot EX] \\ &= E[X^2 - X \cdot EX] - EX \cdot E(X - EX) \\ &= EX^2 - EX \cdot EX - EX \cdot 0\\ & = EX^2 - (EX)^2 \end{align*}

  • If EX = 0 then Var(X) = EX^2.

Properties (continued)

  • Var(a + bX) = b^2 Var(X) \begin{align*} Var(a + bX) &= E[(a + bX) - E(a + bX)]^2\\ & = E[a + bX - a - bEX]^2 \\ &= E[bX - bEX]^2 = E[b^2(X - EX)^2] \\ &= b^2 E(X - EX)^2 \\ &= b^2 Var(X). \end{align*}

  • Re-scaling: Let Var(X) = \sigma^2, so the standard deviation is \sigma: Var\left(\frac{X}{\sigma}\right) = \frac{1}{\sigma^2} Var(X) = 1.
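
A numerical illustration of Var(a + bX) = b^2 Var(X) and of re-scaling by the standard deviation (a sketch assuming NumPy is available; the distribution of X and the constants a, b are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)   # Var(X) = 9
a, b = 5.0, -2.0

print(np.var(a + b * x), b ** 2 * np.var(x))   # both close to 36
print(np.var(x / np.std(x)))                   # close to 1
```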

Covariance

  • Covariance: Let X, Y be two random variables.
    Cov(X, Y) = E[(X - EX)(Y - EY)].
    Discrete: Cov(X, Y) = \sum_i \sum_j (x_i - EX)(y_j - EY) \cdot P(X = x_i, Y = y_j).
    Continuous: Cov(X, Y) = \int \int (x - EX)(y - EY) f_{X,Y}(x, y) dx dy.

  • Shortcut formula: Cov(X, Y) = E(XY) - E(X) E(Y), since Cov(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - EX \cdot EY.

Properties of covariance

  • Cov(X, c) = 0 for any constant c.

  • Cov(X, X) = Var(X).

  • Cov(X, Y) = Cov(Y, X).

  • Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).

  • Cov(a_1 + b_1 X, a_2 + b_2 Y) = b_1 b_2 Cov(X, Y).

  • If X and Y are independent then Cov(X, Y) = 0.

  • Var(X \pm Y) = Var(X) + Var(Y) \pm 2Cov(X, Y).
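
A numerical sketch of the shortcut formula and of Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on simulated data (assuming NumPy is available; the data-generating process below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)       # correlated with x by construction

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy, np.cov(x, y, bias=True)[0, 1])              # essentially the same number
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)  # essentially the same number
```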

Correlation

  • Correlation coefficient: Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}}.

  • Cauchy-Schwarz inequality: |Cov(X, Y)| \leq \sqrt{Var(X) Var(Y)} and therefore -1 \leq Corr(X, Y) \leq 1.

  • Corr(X, Y) = \pm 1 \Leftrightarrow Y = a + bX for some constants a and b with b \neq 0.

Conditional expectation

  • Suppose you know that X = x. You can update your expectation of Y using the conditional expectation:
    E(Y | X = x) = \sum_i y_i P(Y = y_i | X = x) \text{ (discrete)}, \qquad E(Y | X = x) = \int y f_{Y|X}(y|x) dy \text{ (continuous).}

  • E(Y | X = x) is a constant.

  • E(Y | X) is a function of X and is therefore a random variable (the uncertainty about X has not been realized yet):
    E(Y | X) = \sum_i y_i P(Y = y_i | X) = g(X) \text{ (discrete)}, \qquad E(Y | X) = \int y f_{Y|X}(y|X) dy = g(X) \text{ (continuous),}
    for some function g that depends on the PMF (PDF).

Properties of conditional expectation

  • Conditional expectation satisfies all the properties of unconditional expectation.

  • Once you condition on X, you can treat any function of X as a constant: E(h_1(X) + h_2(X) Y | X) = h_1(X) + h_2(X) E(Y | X), for any functions h_1 and h_2.

  • Law of Iterated Expectation (LIE): E[E(Y | X)] = E(Y), E(E(Y | X, Z) | X) = E(Y | X).

  • Conditional variance: Var(Y | X) = E[(Y - E(Y | X))^2 | X].

  • Mean independence: Y is mean independent of X if E(Y | X) = E(Y), a constant.
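
A minimal sketch computing E(Y | X = x_i) from a joint PMF and verifying the Law of Iterated Expectations, E[E(Y | X)] = E(Y) (assuming NumPy is available; the joint PMF and the values of X and Y are hypothetical):

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([10.0, 20.0, 30.0])
joint = np.array([[0.10, 0.20, 0.10],    # p_ij = P(X = x_i, Y = y_j)
                  [0.30, 0.15, 0.15]])

p_X = joint.sum(axis=1)                           # marginal PMF of X
E_Y_given_X = (joint * y_vals).sum(axis=1) / p_X  # one number per value of X

EY_via_LIE = np.sum(E_Y_given_X * p_X)            # E[E(Y | X)]
EY_direct = np.sum(joint.sum(axis=0) * y_vals)    # E(Y) from the marginal of Y
print(E_Y_given_X, EY_via_LIE, EY_direct)         # the last two coincide
```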

Relationship between different concepts of independence

\begin{array}{c} X \text{ and } Y \text{ are independent} \\ \Downarrow \\ E(Y | X) = \text{constant (mean independence)} \\ \Downarrow \\ Cov(X, Y) = 0 \text{ (uncorrelatedness)} \end{array}

Normal distribution

  • A normal rv is a continuous rv that can take on any value. The PDF of a normal rv X is f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \text{ where} \mu = EX \text{ and } \sigma^2 = Var(X). We usually write X \sim N(\mu, \sigma^2).

  • If X \sim N(\mu, \sigma^2), then a + bX \sim N(a + b\mu, b^2\sigma^2).

Standard Normal distribution

  • Standard Normal rv has \mu = 0 and \sigma^2 = 1. Its PDF is \phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right).

  • Symmetric around zero (mean): if Z \sim N(0, 1), P(Z > z) = P(Z < -z).

  • Thin tails: P(-1.96 \leq Z \leq 1.96) \approx 0.95.

  • If X \sim N(\mu, \sigma^2), then (X - \mu)/\sigma \sim N(0, 1).
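
A quick check of the symmetry, the 95% interval, and standardization (a sketch assuming SciPy is available; the values of mu and sigma are illustrative):

```python
from scipy.stats import norm

# Symmetry and the 95% interval of the standard normal.
print(norm.sf(1.96), norm.cdf(-1.96))        # P(Z > z) = P(Z < -z)
print(norm.cdf(1.96) - norm.cdf(-1.96))      # approximately 0.95

# Standardization: if X ~ N(mu, sigma^2), then (X - mu)/sigma ~ N(0, 1).
mu, sigma = 2.0, 3.0
print(norm(mu, sigma).cdf(mu + 1.96 * sigma))  # equals norm.cdf(1.96)
```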

Bivariate Normal distribution

  • X and Y have a bivariate normal distribution if their joint PDF is given by: f(x, y) = \frac{1}{2\pi\sqrt{(1-\rho^2) \sigma_X^2 \sigma_Y^2}} \exp\left[-\frac{Q}{2(1-\rho^2)}\right], where Q = \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - 2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y},

    \mu_X = E(X), \mu_Y = E(Y), \sigma_X^2 = Var(X), \sigma_Y^2 = Var(Y), and \rho = Corr(X, Y).

Properties of Bivariate Normal distribution

If X and Y have a bivariate normal distribution:

  • a + bX + cY \sim N(\mu^*, (\sigma^*)^2), where \mu^* = a + b\mu_X + c\mu_Y, \quad (\sigma^*)^2 = b^2\sigma_X^2 + c^2\sigma_Y^2 + 2bc\rho\sigma_X\sigma_Y.

  • Cov(X, Y) = 0 \Longrightarrow X and Y are independent.

  • E(Y | X) = \mu_Y + \frac{Cov(X, Y)}{\sigma_X^2}(X - \mu_X).

  • Can be generalized to more than 2 variables (multivariate normal).
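
A simulation sketch of the conditional expectation property, comparing the average of Y among observations with X near a point x_0 to the line \mu_Y + (Cov(X, Y)/\sigma_X^2)(x_0 - \mu_X) (assuming NumPy is available; the mean vector, covariance matrix, and evaluation points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
x, y = rng.multivariate_normal(mu, cov, size=1_000_000).T

slope = cov[0, 1] / cov[0, 0]                  # Cov(X, Y) / Var(X)
for x0 in (-1.0, 1.0, 3.0):
    mask = np.abs(x - x0) < 0.05               # observations with X near x0
    print(y[mask].mean(), mu[1] + slope * (x0 - mu[0]))  # close to each other
```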

Appendix: The Cauchy-Schwarz Inequality

  • Claim: |Cov(X, Y)| \leq \sqrt{Var(X) Var(Y)}.

  • Proof: Define U = Y - \beta X, where \beta = \frac{Cov(X, Y)}{Var(X)},

    • Note that \beta is a constant!
    • Also note the connection to regression and OLS in the definition of \beta.
  • Since variances are always non-negative:

    \begin{alignat*}{2} 0 & \leq Var(U) &&\\ & = Var(Y - \beta X) &&\quad (\text{def. of } U)\\ & = Var(Y) + Var(\beta X) - 2Cov(Y, \beta X) &&\quad (\text{prop. of var.})\\ & = Var(Y) + \beta^2 Var(X) - 2\beta Cov(X, Y) &&\quad (\text{prop. of var., cov.})\\ & = Var(Y) + \underbrace{\left(\frac{Cov(X, Y)}{Var(X)}\right)^2}_{=\beta^2} Var(X) &&\\ & \qquad - 2 \underbrace{\left(\frac{Cov(X, Y)}{Var(X)} \right)}_{=\beta}Cov(X, Y) &&\quad (\text{def. of } \beta)\\ & = Var(Y) + \frac{Cov(X, Y)^2}{Var(X)} - 2 \frac{Cov(X, Y)^2}{Var(X)} &&\\ & = Var(Y) - \frac{Cov(X, Y)^2}{Var(X)}. && \end{alignat*}

  • Rearranging: Cov(X, Y)^2 \leq Var(X) Var(Y)
  • or |Cov(X, Y)| \leq \sqrt{Var(X) Var(Y)}.