Objectives
Descriptive Statistical Modeling
2 - Data Basics
Define and use properly in context all new terminology, to include: case, observational unit, variables, data frame, associated variables, independent variables, and discrete and continuous variables.
Identify and define the different types of variables.
Given a study description, describe the research question.
In R, create a scatterplot and determine the association of two numerical variables from the plot.
3 - Overview of Data Collection Principles
Define and use properly in context all new terminology, to include: population, sample, anecdotal evidence, bias, simple random sample, systematic sample, non-response bias, representative sample, convenience sample, explanatory variable, response variable, observational study, cohort, experiment, randomized experiment, and placebo.
From a description of a research project, be able to describe the population of interest, the generalizability of the study, the explanatory and response variables, whether it is observational or experimental, and determine the type of sample.
In the context of a problem, explain how to draw a sample using each type of sampling procedure.
4 - Studies
Define and use properly in context all new terminology, to include: confounding variable, prospective study, retrospective study, simple random sampling, stratified sampling, strata, cluster sampling, multistage sampling, experiment, randomized experiment, control, replicate, blocking, treatment group, control group, blinded study, placebo, placebo effect, and double-blind.
Given a study description, be able to describe the study using correct terminology.
Given a scenario, describe flaws in reasoning and propose study and sampling designs.
5 - Numerical Data
Define and use properly in context all new terminology, to include: scatterplot, dot plot, mean, distribution, point estimate, weighted mean, histogram, data density, right skewed, left skewed, symmetric, mode, unimodal, bimodal, multimodal, variance, standard deviation, box plot, median, interquartile range, first quartile, third quartile, whiskers, outlier, robust estimate, transformation.
In R, generate summary statistics for a numerical variable, including breaking down summary statistics by groups.
In R, generate appropriate graphical summaries of numerical variables.
Interpret and explain output both graphically and numerically.
6 - Categorical Data
Define and use properly in context all new terminology, to include: factor, contingency table, marginal counts, joint counts, frequency table, relative frequency table, bar plot, conditioning, segmented bar plot, mosaic plot, pie chart, side-by-side box plot, density plot.
In R, generate tables for categorical variable(s).
In R, generate appropriate graphical summaries of categorical and numerical variables.
Interpret and explain output both graphically and numerically.
Probability Modeling
8 - Probability Rules
Define and use properly in context all new terminology related to probability, including: sample space, outcome, event, subset, intersection, union, complement, probability, mutually exclusive, exhaustive, independent, multiplication rule, permutation, combination.
Apply basic probability and counting rules to find probabilities.
Describe the basic axioms of probability.
Use R to calculate and simulate probabilities of events.
9 - Conditional Probability
Define conditional probability and distinguish it from joint probability.
Find a conditional probability using its definition.
Using conditional probability, determine whether two events are independent.
Apply Bayes’ Rule mathematically and via simulation.
10 - Random Variables
Define and use properly in context all new terminology, to include: random variable, discrete random variable, continuous random variable, mixed random variable, distribution function, probability mass function, cumulative distribution function, moment, expectation, mean, variance.
Given a discrete random variable, obtain the pmf and cdf, and use them to obtain probabilities of events.
Simulate random variables for a discrete distribution.
Find the moments of a discrete random variable.
Find the expected value of a linear transformation of a random variable.
11 - Continuous Random Variables
Define and properly use in context all new terminology, to include: probability density function (pdf) and cumulative distribution function (cdf) for continuous random variables.
Given a continuous random variable, find probabilities using the pdf and/or the cdf.
Find the mean and variance of a continuous random variable.
12 - Named Discrete Distributions
Recognize and set up for use common discrete distributions (Uniform, Binomial, Poisson, Hypergeometric) to include parameters, assumptions, and moments.
Use R to calculate probabilities and quantiles involving random variables with common discrete distributions.
13 - Named Continuous Distributions
Recognize when to use common continuous distributions (Uniform, Exponential, Gamma, Normal, Weibull, and Beta), identify parameters, and find moments.
Use R to calculate probabilities and quantiles involving random variables with common continuous distributions.
Understand the relationship between the Poisson process and the Poisson and Exponential distributions.
Know when the memoryless property applies, and use it.
14 - Multivariate Distributions
Define (and distinguish between) the terms joint probability mass/density function, marginal pmf/pdf, and conditional pmf/pdf.
Given a joint pmf/pdf, obtain the marginal and conditional pmfs/pdfs.
Use joint, marginal and conditional pmfs/pdfs to obtain probabilities.
15 - Multivariate Expectation
Given a joint pmf/pdf, obtain means and variances of random variables and functions of random variables.
Define the terms covariance and correlation, and given a joint pmf/pdf, obtain the covariance and correlation between two random variables.
Given a joint pmf/pdf, determine whether random variables are independent of one another.
Find conditional expectations.
16 - Transformations
Given a discrete random variable, determine the distribution of a transformation of that random variable.
Given a continuous random variable, use the cdf method to determine the distribution of a transformation of that random variable.
Use simulation methods to find the distribution of a transformation of one or more random variables.
Inferential Statistical Modeling
18 - Hypothesis Testing Case Study
Define and use properly in context all new terminology, to include: point estimate, null hypothesis, alternative hypothesis, hypothesis test, randomization, permutation test, test statistic, and \(p\)-value.
Conduct all four steps of a hypothesis test using a randomization test.
19 - Hypothesis Testing with Simulation
Know and properly use the terminology of a hypothesis test, to include: null hypothesis, alternative hypothesis, test statistic, \(p\)-value, randomization test, one-sided test, two-sided test, statistically significant, significance level, type I error, type II error, false positive, false negative, null distribution, and sampling distribution.
Conduct all four steps of a hypothesis test using randomization.
Discuss and explain the ideas of decision errors, one-sided versus two-sided tests, and the choice of a significance level.
20 - Hypothesis Testing with Known Distributions
Know and properly use the terminology of a hypothesis test, to include: permutation test, exact test, null hypothesis, alternative hypothesis, test statistic, \(p\)-value, and power.
Conduct all four steps of a hypothesis test using probability models.
21 - Hypothesis Testing with the Central Limit Theorem
Explain the central limit theorem and when it can be used for inference.
Conduct hypothesis tests of a single mean and proportion using the CLT and R.
Explain how the \(t\) distribution relates to the normal distribution, where it is used, and how changing parameters impacts the shape of the distribution.
22 - Additional Hypothesis Tests
Conduct and interpret a goodness of fit test and a test of independence between two categorical variables, using both Pearson’s chi-squared and randomization.
Explain how the chi-squared distribution relates to the normal distribution, where it is used, and how changing parameters impacts the shape of the distribution.
Conduct and interpret a hypothesis test for equality of two means and equality of two variances using both permutation and the CLT.
Conduct and interpret a hypothesis test for paired data.
Know and check the assumptions for Pearson’s chi-squared and two-sample \(t\) tests.
23 - Analysis of Variance
Conduct and interpret a hypothesis test for equality of two or more means using both permutation and the \(F\) distribution.
Know and check the assumptions for ANOVA.
24 - Confidence Intervals
Using asymptotic methods based on the normal distribution, construct and interpret a confidence interval for an unknown parameter.
Describe the relationships between confidence intervals, confidence level, and sample size.
Describe the relationships between confidence intervals and hypothesis testing.
Calculate confidence intervals for proportions using three different approaches in R: explicit calculation, binom.test(), and prop_test().
25 - Bootstrap
Use the bootstrap to estimate the standard error of a sample statistic.
Using bootstrap methods, obtain and interpret a confidence interval for an unknown parameter, based on a random sample.
Describe the advantages, disadvantages, and assumptions behind bootstrapping for confidence intervals.
Predictive Statistical Modeling
26 - Linear Regression Case Study
Using R, generate a linear regression model and use it to produce a prediction model.
Using plots, check the assumptions of a linear regression model.
27 - Linear Regression Basics
Obtain parameter estimates of a simple linear regression model, given a sample of data.
Interpret the coefficients of a simple linear regression.
Create a scatterplot with a regression line.
Explain and check the assumptions of linear regression.
Use and be able to explain all new terminology, to include: response, predictor, linear regression, simple linear regression, coefficients, residual, extrapolation.
28 - Linear Regression Inference
Given a simple linear regression model, conduct inference on the coefficients \(\beta_0\) and \(\beta_1\).
Given a simple linear regression model, calculate the predicted response for a given value of the predictor.
Build and interpret confidence and prediction intervals for values of the response variable.
29 - Linear Regression Diagnostics
Obtain and interpret \(R\)-squared and the \(F\)-statistic.
Use R to evaluate the assumptions of a linear model.
Identify and explain outliers and leverage points.
30 - Simulation-Based Linear Regression
Using the bootstrap, generate confidence intervals and estimates of standard error for parameter estimates from a linear regression model.
Generate and interpret bootstrap confidence intervals for predicted values.
Generate bootstrap samples by resampling rows of the data and by resampling residuals, and explain why you might prefer one method over the other.
Interpret regression coefficients for a linear model with a categorical explanatory variable.
31 - Multiple Linear Regression
Create and interpret a model with multiple predictors and check assumptions.
Generate and interpret confidence intervals for estimates.
Explain adjusted \(R^2\) and multicollinearity.
Interpret regression coefficients for a linear model with multiple predictors.
Build and interpret models with higher order terms.