11 Continuous Random Variables
11.1 Objectives
Define and properly use in context all new terminology, to include: probability density function (pdf) and cumulative distribution function (cdf) for continuous random variables.
Given a continuous random variable, find probabilities using the pdf and/or the cdf.
Find the mean and variance of a continuous random variable.
11.2 Continuous random variables
In the last chapter, we introduced random variables and explored discrete random variables. In this chapter, we will move into continuous random variables, their properties, their distribution functions, and how they differ from discrete random variables.
Recall that a continuous random variable has a domain that is a continuous interval (or possibly a group of intervals). For example, let \(Y\) be the random variable corresponding to the height of a randomly selected individual. While our measurement will necessitate “discretizing” height to some degree, technically, height is a continuous random variable since a person could measure 67.3 inches or 67.4 inches or anything in between.
11.2.1 Continuous distribution functions
So how do we describe the randomness of continuous random variables? In the case of discrete random variables, the probability mass function (pmf) and the cumulative distribution function (cdf) are used to describe randomness. However, recall that the pmf is a function that returns the probability that the random variable takes the inputted value. Due to the nature of continuous random variables, the probability that a continuous random variable takes on any one individual value is technically 0. Thus, a pmf cannot apply to a continuous random variable.
Rather, we describe the randomness of continuous random variables with the probability density function (pdf) and the cumulative distribution function (cdf). Note that the cdf has the same interpretation and application as in the discrete case.
11.2.2 Probability density function
Let \(X\) be a continuous random variable. The probability density function (pdf) of \(X\), denoted \(f_X(x)\), is a function that describes how the probability of \(X\) is distributed across its possible values. It is important to note that in the continuous case, \(f_X(x)\neq \mbox{P}(X=x)\), since the probability that \(X\) takes any one individual value is 0.
The pdf is a function. The input of a pdf is any real number. The output is known as the density. The pdf has three main properties:
1. \(f_X(x)\geq 0\) for all \(x\)
2. \(\int_{S_X} f_X(x)\mathop{}\!\mathrm{d}x = 1\), where \(S_X\) is the domain (support) of \(X\)
3. \(\mbox{P}(X\in A)=\int_{x\in A} f_X(x)\mathop{}\!\mathrm{d}x\), or, for an interval, \(\mbox{P}(a \leq X \leq b)=\int_{a}^{b} f_X(x)\mathop{}\!\mathrm{d}x\)
Properties 2 and 3 imply that the area underneath a pdf represents probability. By property 1, the pdf cannot take negative values; however, unlike a probability, a density value may be greater than 1.
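These properties can be checked numerically for a specific density. Below is a minimal base-R sketch, assuming the density \(f_X(x)=2x\) on \([0,1]\) that is used in the example later in this section; `check_pdf` is a hypothetical helper (not from any package), and evaluating on a grid can only suggest non-negativity, not prove it. The final line evaluates \(f_X(1)=2\), a reminder that a density value is not a probability.

```r
# Candidate pdf: f(x) = 2x on [0, 1] and 0 elsewhere (vectorized, as integrate() requires)
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# Hypothetical helper: numerically check properties 1 and 2 on the support [lower, upper]
check_pdf <- function(pdf, lower, upper, n_grid = 1001) {
  grid <- seq(lower, upper, length.out = n_grid)
  nonneg <- all(pdf(grid) >= 0)                # property 1, checked on a grid only
  total <- integrate(pdf, lower, upper)$value  # property 2: total area under the pdf
  list(nonnegative = nonneg, total_area = total)
}

check_pdf(f, 0, 1)

# Property 3: P(0.2 <= X <= 0.6) is the area under the pdf from 0.2 to 0.6
integrate(f, 0.2, 0.6)$value   # exact value: 0.6^2 - 0.2^2 = 0.32

# A density value can exceed 1
f(1)
```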
11.2.3 Cumulative distribution function
The cumulative distribution function (cdf) of a continuous random variable has the same interpretation as it does for a discrete random variable. It is a function whose input is any real number and whose output is the probability that the random variable takes a value less than or equal to the input. It is denoted \(F_X\) and is given by: \[ F_X(x)=\mbox{P}(X\leq x)=\int_{-\infty}^x f_X(t) \mathop{}\!\mathrm{d}t \]
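A consequence of this definition, used repeatedly in the examples, is that interval probabilities are differences of the cdf. For any \(a < b\), splitting the integral at \(a\) gives
\[
\mbox{P}(a < X \leq b) = \int_{a}^{b} f_X(t)\mathop{}\!\mathrm{d}t = \int_{-\infty}^{b} f_X(t)\mathop{}\!\mathrm{d}t - \int_{-\infty}^{a} f_X(t)\mathop{}\!\mathrm{d}t = F_X(b) - F_X(a)
\]
Because \(\mbox{P}(X=a)=0\) for a continuous random variable, the same difference also equals \(\mbox{P}(a \leq X \leq b)\).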
Example:
Let \(X\) be a continuous random variable with \(f_X(x)=2x\) where \(0 \leq x \leq 1\). Verify that \(f\) is a valid pdf. Find the cdf of \(X\). Also, find the following probabilities: \(\mbox{P}(X<0.5)\), \(\mbox{P}(X>0.5)\), and \(\mbox{P}(0.1\leq X < 0.75)\). Finally, find the median of \(X\).
To verify that \(f\) is a valid pdf, we simply note that \(f_X(x) \geq 0\) on the range \(0 \leq x \leq 1\). Also, we note that \(\int_0^1 2x \mathop{}\!\mathrm{d}x = x^2\bigg|_0^1 = 1\).
Using R, we find
integrate(function(x)2*x, 0, 1)
## 1 with absolute error < 1.1e-14
Or we can use the mosaicCalc package to find the antiderivative. If the package is not installed, you can use the Packages tab in RStudio or type install.packages("mosaicCalc") at the command prompt. Then load the library.
library(mosaicCalc)  # provides antiD()
(Fx <- antiD(2*x ~ x))
## function (x, C = 0)
## x^2 + C
Fx(1) - Fx(0)
## [1] 1
Graphically, the pdf is displayed in Figure 11.1.
For \(0 \leq x \leq 1\), the cdf of \(X\) is found by
\[
\int_0^x 2t \mathop{}\!\mathrm{d}t = t^2\bigg|_0^x = x^2
\]
This is the antiderivative Fx found with antiD() in the calculations above (with C = 0).
So, \[ F_X(x)=\left\{ \begin{array}{ll} 0, & x<0 \\ x^2, & 0\leq x \leq 1 \\ 1, & x>1 \end{array}\right. \]
The plot of the cdf of \(X\) is shown in Figure 11.2.
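The closed-form cdf can be cross-checked by integrating the pdf numerically. This base-R sketch assumes the support is an interval \([lower, upper]\) with no probability outside it; `make_cdf`, `F_num`, and `F_exact` are illustrative names, not library functions.

```r
# pdf of the example: 2x on [0, 1], 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# Numerical cdf built from the pdf, assuming all probability lies in [lower, upper]:
# F(x) = 0 for x <= lower, otherwise the integral of the pdf from lower to min(x, upper)
make_cdf <- function(pdf, lower, upper) {
  function(x) {
    sapply(x, function(xi) {
      if (xi <= lower) return(0)
      integrate(pdf, lower, min(xi, upper))$value
    })
  }
}

F_num <- make_cdf(f, lower = 0, upper = 1)

# Piecewise cdf from the text: 0 below 0, x^2 on [0, 1], 1 above 1
F_exact <- function(x) ifelse(x < 0, 0, ifelse(x > 1, 1, x^2))

x_vals <- c(-0.5, 0.25, 0.5, 0.9, 1.5)
cbind(x = x_vals, numeric = F_num(x_vals), exact = F_exact(x_vals))
```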
Probabilities are found either by integrating the pdf or using the cdf:
\(\mbox{P}(X < 0.5)=\mbox{P}(X\leq 0.5)=F_X(0.5)=0.5^2=0.25\). See Figure 11.3.
\(\mbox{P}(X > 0.5) = 1-\mbox{P}(X\leq 0.5)=1-0.25 = 0.75\). See Figure 11.4.
\(\mbox{P}(0.1\leq X < 0.75) = \int_{0.1}^{0.75}2x\mathop{}\!\mathrm{d}x = 0.75^2 - 0.1^2 = 0.5525\). See Figure 11.5.
integrate(function(x)2*x, 0.1, 0.75)
## 0.5525 with absolute error < 6.1e-15
Alternatively, \(\mbox{P}(0.1\leq X < 0.75) = \mbox{P}(X < 0.75) -\mbox{P}(X \leq 0.1) = F_X(0.75)-F_X(0.1)=0.75^2-0.1^2 =0.5525\)
Fx(0.75) - Fx(0.1)
## [1] 0.5525
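If mosaicCalc is not available, the same probabilities can be computed in base R. This is a sketch assuming the piecewise cdf derived above; clamping the input into \([0,1]\) with `pmin()` and `pmax()` reproduces the 0 and 1 branches, so the function is valid for any real input (unlike `x^2` alone). The name `F_X` is ours.

```r
# Piecewise cdf of the example: 0 for x < 0, x^2 on [0, 1], 1 for x > 1
F_X <- function(x) pmin(pmax(x, 0), 1)^2

p_less    <- F_X(0.5)                # P(X < 0.5)
p_greater <- 1 - F_X(0.5)            # P(X > 0.5)
p_between <- F_X(0.75) - F_X(0.1)    # P(0.1 <= X < 0.75)

c(p_less, p_greater, p_between)
F_X(c(-1, 2))                        # outside the support the cdf is 0 and 1
```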
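Notice that for a continuous random variable, we are loose about whether an inequality is strict (\(<\)) or not (\(\leq\)). This is because for a continuous random variable \(\mbox{P}(X=x)=0\) for every \(x\). This is not true for discrete random variables, where \(\mbox{P}(X<x)\) and \(\mbox{P}(X\leq x)\) can differ, so do not get sloppy when working with them.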
The median of \(X\) is the value \(x\) such that \(\mbox{P}(X\leq x)=0.5\); since the area over a single point is 0, this is the same as \(\mbox{P}(X<x)=0.5\). Using the cdf, we simply solve \(x^2=0.5\) for \(x\). Thus, the median of \(X\) is \(\sqrt{0.5}\approx 0.707\).
Or using R
## [1] 0.7071067
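Because the median is the 0.5 quantile, it can also be found by root-finding on \(F_X(x)-0.5\), which helps when the equation has no convenient closed form. This is a sketch using base R's `uniroot()`, assuming the cdf \(F_X(x)=x^2\) on \([0,1]\); `quantile_from_cdf` is a hypothetical helper, and we do not claim this is how the output above was produced.

```r
# cdf of the example on its support [0, 1]
cdf <- function(x) x^2

# Hypothetical helper: the p-th quantile solves cdf(x) = p on [lower, upper]
quantile_from_cdf <- function(cdf, p, lower, upper) {
  uniroot(function(x) cdf(x) - p, interval = c(lower, upper), tol = 1e-10)$root
}

med <- quantile_from_cdf(cdf, 0.5, 0, 1)   # median
q25 <- quantile_from_cdf(cdf, 0.25, 0, 1)  # first quartile; exact value sqrt(0.25) = 0.5
c(median = med, closed_form = sqrt(0.5))
```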
11.2.4 Simulation
As in the discrete case, we can simulate a continuous random variable if we have the inverse of its cdf. The range of the cdf is \([0,1]\), so we generate a uniform random number in this interval and then apply the inverse cdf to obtain a value of the random variable. We use the following pseudo code:
1. Generate a uniform random number \(U\) in the interval \([0,1]\).
2. Compute \(X = F_{X}^{-1}(U)\).
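The two steps above translate almost directly into code. The following is a minimal base-R sketch, separate from the mosaic-based code used in this chapter; `r_inverse_cdf` is a hypothetical helper, the seed is arbitrary, and it assumes the inverse cdf is available as a vectorized R function.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Hypothetical helper implementing the two steps of the pseudo code
r_inverse_cdf <- function(n, inv_cdf) {
  u <- runif(n, min = 0, max = 1)  # step 1: U ~ Uniform(0, 1)
  inv_cdf(u)                       # step 2: X = F^{-1}(U)
}

# For the example, F(x) = x^2 on [0, 1], so F^{-1}(u) = sqrt(u)
x_sim <- r_inverse_cdf(10000, sqrt)
summary(x_sim)
median(x_sim)  # should be near the exact median sqrt(0.5), about 0.707
```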
In R, for our example (where \(F_X(x)=x^2\) on \([0,1]\), so \(F_X^{-1}(u)=\sqrt{u}\)), this looks like the following.
## [1] 0.6137365
inspect(results)
##
## quantitative variables:
## name class min Q1 median Q3 max mean
## 1 sqrt numeric 0.005321359 0.4977011 0.7084257 0.8656665 0.9999873 0.6669452
## sd n missing
## 1 0.2358056 10000 0
Figure 11.6 is a density plot of the simulated values.
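A density plot is a visual check; a numerical companion is to compare the empirical cdf of simulated values with the exact cdf \(F_X(x)=x^2\). The sketch below uses base R's `ecdf()`; the seed and evaluation points are arbitrary choices, and the draws are generated fresh here, so they are not the `results` used above.

```r
set.seed(2024)                   # arbitrary seed
x_sim <- sqrt(runif(10000))      # inverse-transform draws: F^{-1}(u) = sqrt(u)

F_hat <- ecdf(x_sim)             # empirical cdf of the simulated values
q <- c(0.1, 0.3, 0.5, 0.7, 0.9)
cbind(q = q, empirical = F_hat(q), exact = q^2)

# Largest gap between the empirical and exact cdf over a fine grid
grid <- seq(0, 1, by = 0.001)
max_gap <- max(abs(F_hat(grid) - grid^2))
max_gap
```

With 10000 draws, the largest gap should be small; a large value would point to a mistake in the inverse cdf.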
11.3 Moments
As with discrete random variables, moments can be calculated to summarize characteristics such as center and spread. In the discrete case, expectation is found by multiplying each possible value by its associated probability and summing across the domain (\(\mbox{E}(X)=\sum_x x\cdot f_X(x)\)). In the continuous case, the domain of \(X\) is a continuum of values, and each value is weighted by the density rather than by a probability. From your calculus days, recall that the continuous analogue of a sum is an integral.
Let \(g(X)\) be any function of \(X\). The expectation of \(g(X)\) is found by: \[ \mbox{E}(g(X)) = \int_{S_X} g(x)f_X(x)\mathop{}\!\mathrm{d}x \]
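The expectation formula translates directly into a numerical integral. This is a base-R sketch assuming the example density \(f_X(x)=2x\) on \([0,1]\); `expect_g` is a hypothetical helper that takes \(g\), the pdf, and the endpoints of the support.

```r
# pdf of the running example on its support [0, 1]
f <- function(x) 2 * x

# Hypothetical helper: E(g(X)) = integral of g(x) * f(x) over [lower, upper]
expect_g <- function(g, pdf, lower, upper) {
  integrate(function(x) g(x) * pdf(x), lower, upper)$value
}

expect_g(function(x) x,   f, 0, 1)   # E(X):   exact value 2/3
expect_g(function(x) x^2, f, 0, 1)   # E(X^2): exact value 1/2
expect_g(function(x) 1,   f, 0, 1)   # E(1):   the total area, 1
```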
11.3.1 Mean and variance
Let \(X\) be a continuous random variable. The mean of \(X\), or \(\mu_X\), is simply \(\mbox{E}(X)\). Thus, \[ \mbox{E}(X)=\int_{S_X}x\cdot f_X(x)\mathop{}\!\mathrm{d}x \]
As in the discrete case, the variance of \(X\) is the expected squared difference from the mean, or \(\mbox{E}[(X-\mu_X)^2]\). Thus, \[ \sigma^2_X = \mbox{Var}(X)=\mbox{E}[(X-\mu_X)^2]= \int_{S_X} (x-\mu_X)^2\cdot f_X(x) \mathop{}\!\mathrm{d}x \]
Recall homework problem 6 from the last chapter. In this problem, you showed that \(\mbox{Var}(X)=\mbox{E}(X^2)-\mbox{E}(X)^2\). Thus, \[ \mbox{Var}(X)=\mbox{E}(X^2)-\mbox{E}(X)^2 = \int_{S_X} x^2\cdot f_X(x)\mathop{}\!\mathrm{d}x - \mu_X^2 \]
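In the continuous case, the identity follows by expanding the square inside the integral, then using the definition of \(\mu_X\) and property 2 of the pdf (total area 1):
\[
\begin{aligned}
\mbox{Var}(X) &= \int_{S_X} (x-\mu_X)^2 f_X(x)\mathop{}\!\mathrm{d}x \\
&= \int_{S_X} x^2 f_X(x)\mathop{}\!\mathrm{d}x - 2\mu_X\int_{S_X} x f_X(x)\mathop{}\!\mathrm{d}x + \mu_X^2\int_{S_X} f_X(x)\mathop{}\!\mathrm{d}x \\
&= \mbox{E}(X^2) - 2\mu_X\cdot\mu_X + \mu_X^2\cdot 1 = \mbox{E}(X^2) - \mu_X^2
\end{aligned}
\]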
Example:
Consider the random variable \(X\) from above. Find the mean and variance of \(X\). \[ \mu_X= \mbox{E}(X)=\int_0^1 x\cdot 2x\mathop{}\!\mathrm{d}x = \frac{2x^3}{3}\bigg|_0^1 = \frac{2}{3}=0.667 \]
Side note: Since the mean of \(X\) is smaller than the median of \(X\), we say that \(X\) is skewed to the left, or negatively skewed.
Using R:
integrate(function(x)x*2*x, 0, 1)
## 0.6666667 with absolute error < 7.4e-15
Or using antiD()
Ex <- antiD(2*x^2 ~ x)
Ex(1) - Ex(0)
## [1] 0.6666667
Using our simulation.
mean(~sqrt, data = results)
## [1] 0.6669452
\[ \sigma^2_X = \mbox{Var}(X)= \mbox{E}(X^2)-\mbox{E}(X)^2 = \int_0^1 x^2\cdot 2x\mathop{}\!\mathrm{d}x - \left(\frac{2}{3}\right)^2 = \frac{2x^4}{4}\bigg|_0^1-\frac{4}{9}=\frac{1}{2}-\frac{4}{9}=\frac{1}{18}=0.056 \]
integrate(function(x)x^2*2*x, 0, 1)$value - (2/3)^2
## [1] 0.05555556
or
Vx <- antiD(x^2*2*x ~ x)
Vx(1) - Vx(0) - (2/3)^2
## [1] 0.05555556
var(~sqrt, data = results)*9999/10000  # var() divides by n - 1; rescale to divide by n = 10000
## [1] 0.05559873
And finally, the standard deviation of \(X\) is \(\sigma_X = \sqrt{\sigma^2_X}=\sqrt{1/18}=0.236\).
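As a check that the definition and the shortcut formula agree, this base-R sketch computes the variance of the example both ways with `integrate()`, along with the standard deviation. It assumes the pdf \(f_X(x)=2x\) on \([0,1]\); the variable names are ours.

```r
f <- function(x) 2 * x                                # pdf on [0, 1]
mu <- integrate(function(x) x * f(x), 0, 1)$value     # E(X) = 2/3

# Definition: E[(X - mu)^2]
var_def <- integrate(function(x) (x - mu)^2 * f(x), 0, 1)$value
# Shortcut: E(X^2) - mu^2
var_short <- integrate(function(x) x^2 * f(x), 0, 1)$value - mu^2

c(definition = var_def, shortcut = var_short, exact = 1/18)
sqrt(var_def)   # standard deviation, about 0.236
```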
11.4 Homework Problems
1. Let \(X\) be a continuous random variable on the domain \(-k \leq X \leq k\). Also, let \(f(x)=\frac{x^2}{18}\).
    - Assume that \(f(x)\) is a valid pdf. Find the value of \(k\).
    - Plot the pdf of \(X\).
    - Find and plot the cdf of \(X\).
    - Find \(\mbox{P}(X<1)\).
    - Find \(\mbox{P}(1.5<X\leq 2.5)\).
    - Find the 80th percentile of \(X\) (the value \(x\) for which 80% of the distribution is to the left of that value).
    - Find the value \(x\) such that \(\mbox{P}(-x \leq X \leq x)=0.4\).
    - Find the mean and variance of \(X\).
    - Simulate 10000 values from this distribution and plot the density.
2. Let \(X\) be a continuous random variable. Prove that the cdf of \(X\), \(F_X(x)\), is a non-decreasing function. (Hint: show that for any \(a < b\), \(F_X(a) \leq F_X(b)\).)