In this section we will introduce certain transformations of random variables for which the expected value of the transformation is the transformation of the expected value. We will also study the variance of certain transformations of random variables.
A linear rescaling is a transformation of the form \(g(u) = a + bu\). Recall that in Section 3.8.1 we observed, via simulation, how a linear rescaling of a random variable affects its expected value and standard deviation.
Formally, if \(X\) is a random variable and \(a, b\) are non-random constants then
\[\begin{aligned} \textrm{E}(aX + b) & = a\textrm{E}(X) + b\\ \textrm{SD}(aX + b) & = |a|\textrm{SD}(X)\\ \textrm{Var}(aX + b) & = a^2\textrm{Var}(X) \end{aligned}\]
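For instance, here is a minimal simulation sketch of these properties using NumPy (not necessarily the tools from the Colab activities); the Uniform(1, 4) distribution and the constants \(a = 2\), \(b = 5\) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Simulate many values of U ~ Uniform(1, 4) and the linear rescaling 2U + 5
u = rng.uniform(1, 4, size=1_000_000)
a, b = 2, 5
x = a * u + b

# E(aU + b) = a E(U) + b and SD(aU + b) = |a| SD(U)
print(x.mean(), a * u.mean() + b)   # both approximately 10 = 2(2.5) + 5
print(x.std(), abs(a) * u.std())    # both approximately 2(0.866)
```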
Example 5.28 Spin the Uniform(1, 4) spinner twice and let \(U_1\) be the first spin, \(U_2\) the second, and \(X = U_1 + U_2\) the sum.
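A quick simulation sketch, assuming the spinner returns values uniformly on the continuous interval [1, 4] and that the two spins are independent, suggests what linearity of expected value will confirm: the long run average value of the sum is \(2.5 + 2.5 = 5\).

```python
import numpy as np

rng = np.random.default_rng(12345)

# Two independent spins of the Uniform(1, 4) spinner
u1 = rng.uniform(1, 4, size=1_000_000)
u2 = rng.uniform(1, 4, size=1_000_000)
x = u1 + u2

# E(X) = E(U1) + E(U2) = 2.5 + 2.5 = 5
print(u1.mean(), u2.mean(), x.mean())  # approximately 2.5, 2.5, 5.0
```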
In the previous example, the values \(U_1\) and \(U_2\) came from separate spins so they were unrelated. What about the expected value of \(X+Y\) when \(X\) and \(Y\) are correlated?
Example 5.29 Recall the Colab activity where you simulated pairs of SAT Math ( \(X\) ) and Reading ( \(Y\) ) scores from Bivariate Normal distributions with different correlations. You considered the distribution of the sum \(T=X+Y\) and difference \(D= X - Y\) . Did changing the correlation affect the distribution of \(T\) ? Of \(D\) ? Did changing the correlation affect the expected value of \(T\) ? Of \(D\) ?
You should have observed that, yes, changing the correlation affected the distribution of \(T\) and \(D\) mainly by changing the degree of variability. However, you should have also observed that the expected value of \(T\) did not change as the correlation changed (after accounting for simulation margin of error). Similarly, the expected value of \(D\) did not change as the correlation changed.
Linearity of expected value. For any two random variables \(X\) and \(Y\) , \[ \textrm{E}(X + Y) = \textrm{E}(X) + \textrm{E}(Y) \] That is, the expected value of the sum is the sum of expected values, regardless of how the random variables are related. Therefore, you only need to know the marginal distributions of \(X\) and \(Y\) to find the expected value of their sum. (But keep in mind that the distribution of \(X+Y\) will depend on the joint distribution of \(X\) and \(Y\) .)
Linearity of expected value follows from simple arithmetic properties of numbers. Whether in the short run or the long run, \[ \text{Average of } (X + Y) = \text{Average of } X + \text{Average of } Y \] regardless of the joint distribution of \(X\) and \(Y\) . For example, for the two \((X, Y)\) pairs (4, 3) and (2, 1) \[ \text{Average of } (X+Y) = \frac{(4+3)+(2+1)}{2} = \frac{4+2}{2} + \frac{3+1}{2} = \text{Average of } X + \text{Average of } Y. \]
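The same arithmetic holds for any collection of paired values; a quick check with the two pairs from the text makes this concrete.

```python
import numpy as np

# The two (X, Y) pairs from the text: (4, 3) and (2, 1)
x = np.array([4, 2])
y = np.array([3, 1])

# Average of the pairwise sums equals the sum of the averages: 5 = 3 + 2
print((x + y).mean(), x.mean() + y.mean())
```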
A linear combination of two random variables \(X\) and \(Y\) is of the form \(aX + bY\) where \(a\) and \(b\) are non-random constants. Combining properties of linear rescaling with linearity of expected value yields the expected value of a linear combination \[ \textrm{E}(aX + bY) = a\textrm{E}(X)+b\textrm{E}(Y) \] For example, \(\textrm{E}(X - Y) = \textrm{E}(X) - \textrm{E}(Y)\) . The left side above represents the “long way”: find the distribution of \(aX + bY\) , which will depend on the joint distribution of \(X\) and \(Y\) , and then use the definition of expected value. The right side is the “short way”: find the expected values of \(X\) and \(Y\) , which only requires their marginal distributions, and plug those numbers into the transformation formula. Similar to LOTUS, linearity of expected value provides a way to find the expected value of certain random variables without first finding the distribution of the random variables.
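As a sketch of the “short way” versus the “long way”, the following simulates correlated \((X, Y)\) pairs from a Bivariate Normal distribution (borrowing the SAT-like means and standard deviations from Example 5.34 below, with an arbitrary correlation of 0.7) and checks that the simulated average of \(X - Y\) matches \(\textrm{E}(X) - \textrm{E}(Y)\) even though \(X\) and \(Y\) are strongly dependent.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Illustrative parameters: E(X) = 527, E(Y) = 533, SDs 107 and 100, correlation 0.7
mean = [527, 533]
sd_x, sd_y, rho = 107, 100, 0.7
cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
x, y = rng.multivariate_normal(mean, cov, size=1_000_000).T

a, b = 1, -1  # the difference D = X - Y
d = a * x + b * y

# "Long way" (simulate the linear combination) vs "short way" (plug in marginal means)
print(d.mean())                   # approximately -6
print(a * mean[0] + b * mean[1])  # exactly 527 - 533 = -6
```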
Linearity of expected value extends naturally to more than two random variables.
Example 5.30 Recall the matching problem in Example 5.1. We showed that the expected value of the number of matches \(Y\) is \(\textrm{E}(Y)=1\) when \(n=4\) . Now consider a general \(n\) : there are \(n\) rocks that are shuffled and placed uniformly at random in \(n\) spots with one rock per spot. Let \(Y\) be the number of matches. Can you find a general formula for \(\textrm{E}(Y)\) ?
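Here is a sketch of the indicator argument: write \(Y\) as a sum of indicators, one for each rock, and apply linearity of expected value; the only input needed is the marginal probability \(1/n\) that any particular rock lands in its own spot. \[\begin{aligned} Y & = \textrm{I}_{A_1} + \textrm{I}_{A_2} + \cdots + \textrm{I}_{A_n}, \qquad A_i = \{\text{rock } i \text{ is placed in spot } i\},\\ \textrm{E}(Y) & = \textrm{E}(\textrm{I}_{A_1}) + \cdots + \textrm{E}(\textrm{I}_{A_n}) = \textrm{P}(A_1) + \cdots + \textrm{P}(A_n) = n\left(\frac{1}{n}\right) = 1. \end{aligned}\]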
The answer to the previous problem is not an approximation: the expected value of the number of matches is equal to 1 for any \(n\) . We think that’s pretty amazing. (We’ll see some even more amazing results for this problem LATER.) Notice that we computed the expected value without first finding the distribution of \(Y\) .
Intuitively, if the rocks are placed in the spots uniformly at random, then the probability that rock \(i\) is placed in the correct spot should be the same for all the rocks, \(1/n\) . But you might have said: “but if rock 1 goes in spot 1, there are only \(n-1\) rocks that can go in spot 2, so the probability that rock 2 goes in spot 2 is \(1/(n-1)\) ”. That is true if rock 1 goes in spot 1. However, when computing the marginal probability that rock 2 goes in spot 2, we don’t know whether rock 1 went in spot 1 or not, so the probability needs to account for both cases. There is a difference between marginal/unconditional probability and conditional probability, which we will discuss in more detail LATER.
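A simulation sketch of the matching problem (representing a shuffle as a random permutation and counting fixed points) is consistent with this: the average number of matches hovers around 1 for any \(n\).

```python
import numpy as np

rng = np.random.default_rng(12345)

def average_matches(n, reps=10_000):
    """Average number of rocks that land in their own spot over many shuffles."""
    total = 0
    for _ in range(reps):
        placement = rng.permutation(n)               # spot each rock lands in
        total += np.sum(placement == np.arange(n))   # count matches (fixed points)
    return total / reps

for n in [4, 10, 52]:
    print(n, average_matches(n))  # approximately 1 for every n
```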
When a problem asks “find the expected number of…” it’s a good idea to try using indicator random variables and linearity of expected value.
Let \(A_1, A_2, \ldots, A_n\) be a collection of \(n\) events. Suppose event \(A_i\) occurs with marginal probability \(p_i\) . Let \(N = \textrm{I}_{A_1} + \textrm{I}_{A_2} + \cdots + \textrm{I}_{A_n}\) be the random variable which counts the number of the events in the collection which occur. Then the expected number of events that occur is the sum of the event probabilities. \[ \textrm{E}(N) = \sum_{i=1}^n p_i. \] If each event has the same probability, \(p_i \equiv p\) , then \(\textrm{E}(N)\) is equal to \(np\) . These formulas for the expected number of events are true regardless of whether there is any association between the events (that is, regardless of whether the events are independent).
Example 5.31 Kids wake up during the night. On any given night, the five kids wake up with probabilities \(1/14\) , \(2/7\) , \(1/30\) , \(1/2\) , and \(6/7\) , respectively.
If any kid wakes up, they’re likely to wake other kids up too. Find the expected number of kids that wake up on any given night.
Simply add the probabilities: \(1/14 + 2/7 + 1/30 + 1/2 + 6/7 \approx 1.75\) . The expected number of kids that wake up on a given night is about 1.75; over many nights, on average 1.75 kids wake up per night.
The fact that kids wake each other up implies that the events are not independent, but this is irrelevant here. Because of linearity of expected value, we only need to know the marginal probability of each event (which is provided) in order to determine the expected number of events that occur. (The distribution of the number of kids that wake up would depend on the relationships between the events, but not the long run average value.)
Example 5.32 Consider an RV \(X\) with \(\textrm{Var}(X)=1\) . What is \(\textrm{Var}(2X)\) ? Walt notes that \(2X\) is a linear rescaling of \(X\) and says the answer is \(2^2\textrm{Var}(X) = 4\) . Jesse writes \(2X = X + X\) and says the answer is \(\textrm{Var}(X) + \textrm{Var}(X) = 2\) . Who is correct?
Walt is correctly using properties of linear rescaling. Jesse is assuming that the variance of a sum is the sum of the variances, which is not true in general. We’ll see why below.
When two variables are correlated, the degree of the association will affect the variability of linear combinations of the two variables.
Example 5.33 Recall the Colab activity where you simulated pairs of SAT Math ( \(X\) ) and Reading ( \(Y\) ) scores from Bivariate Normal distributions with different correlations. (See also Section 3.9.) You considered the distribution of the sum \(T=X+Y\) and difference \(D= X - Y\) . Did changing the correlation affect the variance of \(T\) ? Of \(D\) ?
Variance of sums and differences of random variables. \[\begin{aligned} \textrm{Var}(X + Y) & = \textrm{Var}(X) + \textrm{Var}(Y) + 2\textrm{Cov}(X, Y)\\ \textrm{Var}(X - Y) & = \textrm{Var}(X) + \textrm{Var}(Y) - 2\textrm{Cov}(X, Y) \end{aligned}\]
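A simulation sketch illustrates these formulas; the Bivariate Normal parameters below are arbitrary illustrative choices (the same SAT-like means and standard deviations as in Example 5.34, with correlation 0.5).

```python
import numpy as np

rng = np.random.default_rng(12345)

# Illustrative parameters
sd_x, sd_y, rho = 107, 100, 0.5
cov_xy = rho * sd_x * sd_y
cov = [[sd_x**2, cov_xy], [cov_xy, sd_y**2]]
x, y = rng.multivariate_normal([527, 533], cov, size=1_000_000).T

print(np.var(x + y))                   # simulated Var(X + Y)
print(sd_x**2 + sd_y**2 + 2 * cov_xy)  # Var(X) + Var(Y) + 2 Cov(X, Y)
print(np.var(x - y))                   # simulated Var(X - Y)
print(sd_x**2 + sd_y**2 - 2 * cov_xy)  # Var(X) + Var(Y) - 2 Cov(X, Y)
```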
Example 5.34 Assume that SAT Math ( \(X\) ) and Reading ( \(Y\) ) follow a Bivariate Normal distribution, Math scores have mean 527 and standard deviation 107, and Reading scores have mean 533 and standard deviation 100. Compute \(\textrm{E}(X + Y)\) and \(\textrm{SD}(X+Y)\) for each of the following correlations.
If \(X\) and \(Y\) have a positive correlation, then \(\textrm{Cov}(X, Y) > 0\) , so \(\textrm{SD}(X+Y)\) is greater than \(\sqrt{107^2 + 100^2} \approx 146.5\) , the value it would take if the scores were uncorrelated.
If \(X\) and \(Y\) have a negative correlation, then \(\textrm{Cov}(X, Y) < 0\) , so \(\textrm{SD}(X+Y)\) is less than 146.5. In either case, \(\textrm{E}(X + Y) = 527 + 533 = 1060\) , regardless of the correlation.
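Since the specific correlation values from the activity aren't reproduced here, the short computation below uses a few placeholder correlations to show the pattern: the expected value of the sum is always \(527 + 533 = 1060\), while the standard deviation of the sum increases with the correlation.

```python
import numpy as np

mu_x, mu_y = 527, 533
sd_x, sd_y = 107, 100

# Placeholder correlations, not necessarily the ones asked about in the example
for rho in [0.8, 0.4, 0, -0.4, -0.8]:
    var_sum = sd_x**2 + sd_y**2 + 2 * rho * sd_x * sd_y
    print(rho, mu_x + mu_y, round(np.sqrt(var_sum), 1))  # E(X+Y) is 1060 in every row
```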
The variance of the sum is the sum of the variances if and only if \(X\) and \(Y\) are uncorrelated. \[\begin{aligned} \textrm{Var}(X+Y) & = \textrm{Var}(X) + \textrm{Var}(Y)\qquad \text{if } X \text{ and } Y \text{ are uncorrelated}\\ \textrm{Var}(X-Y) & = \textrm{Var}(X) + \textrm{Var}(Y)\qquad \text{if } X \text{ and } Y \text{ are uncorrelated} \end{aligned}\]
The formulas for variance of sums and differences are applications of several more general properties of covariance. Let \(X,Y,U,V\) be random variables and \(a,b,c,d\) be non-random constants. \[\begin{aligned} \textrm{Cov}(X, Y) & = \textrm{Cov}(Y, X)\\ \textrm{Cov}(X, X) & = \textrm{Var}(X)\\ \textrm{Cov}(aX + b, cY + d) & = ac\,\textrm{Cov}(X, Y)\\ \textrm{Cov}(X + Y, U + V) & = \textrm{Cov}(X, U) + \textrm{Cov}(X, V) + \textrm{Cov}(Y, U) + \textrm{Cov}(Y, V) \end{aligned}\]
The last two properties together are called bilinearity of covariance. These properties extend naturally to sums involving more than two random variables. To compute the covariance between two sums of random variables, compute the covariance between each component random variable in the first sum and each component random variable in the second sum, and sum these covariances.
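For example, bilinearity yields a general formula for the variance of a linear combination, which contains the sum and difference formulas above as the special cases \(a = b = 1\) and \(a = 1, b = -1\): \[\begin{aligned} \textrm{Var}(aX + bY) & = \textrm{Cov}(aX + bY, aX + bY)\\ & = a^2\textrm{Var}(X) + b^2\textrm{Var}(Y) + 2ab\,\textrm{Cov}(X, Y) \end{aligned}\]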
Example 5.35 Let \(X\) be the number of two-point field goals a basketball player makes in a game, \(Y\) the number of three-point field goals made, and \(Z\) the number of free throws made (worth one point each). Assume \(X\) , \(Y\) , \(Z\) have standard deviations of 2.5, 3.7, 1.8, respectively, and \(\textrm{Cov}(X,Y) = 0.1\) , \(\textrm{Cov}(X, Z) = 0.3\) , \(\textrm{Cov}(Y,Z) = -0.5\) .
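The example's specific question isn't reproduced above; assuming it asks for the standard deviation of the player's total points \(T = 2X + 3Y + Z\) (an assumption on our part), bilinearity gives the computation sketched below.

```python
import numpy as np

# Standard deviations and covariances given in the example
sd_x, sd_y, sd_z = 2.5, 3.7, 1.8
cov_xy, cov_xz, cov_yz = 0.1, 0.3, -0.5

# Assumed question: total points T = 2X + 3Y + Z
# Var(T) = Cov(T, T), expanded term by term using bilinearity
var_t = (2**2 * sd_x**2 + 3**2 * sd_y**2 + 1**2 * sd_z**2
         + 2 * (2 * 3 * cov_xy + 2 * 1 * cov_xz + 3 * 1 * cov_yz))
print(var_t, np.sqrt(var_t))  # approximately 150.85 and 12.3
```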