Open Educational Resources

Document Type


Publication Date

Spring 5-26-2022


This is a self-contained course in data science and machine learning using R. It covers the philosophy of modeling with data, prediction via linear models, machine learning methods including support vector machines and random forests, probability estimation and asymmetric costs using logistic and probit regression, underfitting vs. overfitting, model validation, handling missingness, and much more. It also includes formal instruction in data manipulation using dplyr and data.table, visualization using ggplot2, and statistical computing.


This work is an Rmd file, which can be opened with RStudio; RStudio in turn requires the R language.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.