Open Educational Resources
Document Type
Textbook
Publication Date
Spring 5-26-2022
Abstract
This is a self-contained course in data science and machine learning using R. It covers the philosophy of modeling with data, prediction via linear models, machine learning including support vector machines and random forests, probability estimation and asymmetric costs using logistic regression and probit regression, underfitting vs. overfitting, model validation, handling missingness, and much more. It also includes formal instruction in data manipulation using dplyr and data.table, visualization using ggplot2, and statistical computing.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Comments
This work is an Rmd file, which can be opened with RStudio (rstudio.com). RStudio requires the R language (https://www.r-project.org/).