Preface
This book is based on the notes we created for our students as part of a one-semester course on probability and statistics. We developed these notes from three primary resources. The most important is the OpenIntro Introductory Statistics with Randomization and Simulation (Diez, Barr, and Çetinkaya-Rundel 2014) book. In places, we have used their notes and homework problems; in most cases, however, we have altered their work to fit our needs. The second most important book for our work is Introduction to Probability and Statistics Using R (Kerns 2010). Finally, we have used some examples, code, and ideas from the first edition of Pruim’s book, Foundations and Applications of Statistics: An Introduction Using R (R. J. Pruim 2011).
0.1 Who is this book for?
We designed this book for a study of statistics that emphasizes computational ideas and minimizes algebraic symbol manipulation. Although we do discuss traditional small-sample, normal-based inference and some of the classical probability distributions, we rely heavily on ideas such as simulation, permutations, and the bootstrap. As a result, students with a background in differential and integral calculus should be well prepared to succeed with this book.
This book makes extensive use of the R programming language. In particular, we focus on both the tidyverse and mosaic packages. We include a significant amount of code in our notes and frequently demonstrate multiple ways of completing a task. We have used this book with sophomore and junior college students.
0.2 Book structure and how to use it
This book is divided into four parts. Each part begins with a case study that introduces many of the main ideas of that part. Each chapter is designed to be a stand-alone 50-minute lesson. Within each chapter, we provide learning objectives and exercises that can be worked in class.
This book assumes students have access to R. We also keep the number of homework problems to a reasonable level and assign all of them.
The four parts of the book are:
- Descriptive Statistical Modeling: This part introduces the student to data collection methods, summary statistics, visual summaries, and exploratory data analysis.
- Probability Modeling: We discuss the foundational ideas of probability, counting methods, and common distributions. We use both calculus and simulation to find moments and probabilities. We introduce basic ideas of multivariate probability. We also cover method of moments and maximum likelihood estimators.
- Inferential Statistical Modeling: We discuss many of the basic inference ideas found in a traditional introductory statistics class, but we add the ideas of bootstrap and permutation methods.
- Predictive Statistical Modeling: The final part introduces prediction methods, mainly in the form of linear regression. This part also includes inference for regression.
The learning outcomes for this course are to use computational and mathematical statistical/probabilistic concepts for:
- Developing probabilistic models.
- Developing statistical models for description, inference, and prediction.
- Advancing practical and theoretical analytic experience and skills.
0.3 Prerequisites
To take this course, students are expected to have completed calculus up through and including integral calculus. The course does include multivariate ideas, but they are easily taught and do not require Calculus III. We do not assume the students have any programming experience and, thus, we include a great deal of code. We have historically supplemented the course with DataCamp courses. We have also used RStudio Cloud to help students get started in R without the burden of installing and maintaining software.
0.4 Packages
These notes make use of the following packages in R: knitr (Xie 2023b), rmarkdown (Allaire et al. 2023), mosaic (R. Pruim, Kaplan, and Horton 2022), mosaicCalc (Kaplan, Pruim, and Horton 2022), tidyverse (Wickham 2023), ISLR (James et al. 2021), vcd (Meyer, Zeileis, and Hornik 2023), ggplot2 (Wickham et al. 2023), MASS (Ripley 2023), openintro (Çetinkaya-Rundel et al. 2022), broom (Robinson, Hayes, and Couch 2023), infer (Bray et al. 2022), kableExtra (Zhu 2021), and DT (Xie, Cheng, and Tan 2023).
Solutions Manual
The accompanying solutions manual is available here.
0.5 Acknowledgements
We have been lucky to have numerous open-source resources to help facilitate this work. Thank you to those who helped correct mistakes, including Skyler Royse.
This book was written using the bookdown package (Xie 2023a).
This book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.