Data Science and Predictive Analytics
The first edition of the textbook Data Science and Predictive Analytics: Biomedical and Health Applications using R, authored by Ivo D. Dinov, was published in August 2018 by Springer. The second edition of the book was printed in 2023.
This textbook covers some of the core mathematical foundations, computational techniques, and artificial intelligence approaches used in data science research and applications.
Using the statistical computing platform R and a broad range of biomedical case studies, the 23 chapters of the first edition provide explicit examples of importing, exporting, processing, modeling, visualizing, and interpreting large, multivariate, incomplete, heterogeneous, and longitudinal datasets.
Structure
First edition table of contents
The first edition of the Data Science and Predictive Analytics textbook is divided into the following 23 chapters, each progressively building on the previous content.
- Motivation
- Foundations of R
- Managing Data in R
- Data Visualization
- Linear Algebra & Matrix Computing
- Dimensionality Reduction
- Lazy Learning: Classification Using Nearest Neighbors
- Probabilistic Learning: Classification Using Naive Bayes
- Decision Tree Divide and Conquer Classification
- Forecasting Numeric Data Using Regression Models
- Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
- Apriori Association Rules Learning
- k-Means Clustering
- Model Performance Assessment
- Improving Model Performance
- Specialized Machine Learning Topics
- Variable/Feature Selection
- Regularized Linear Modeling and Controlled Variable Selection
- Big Longitudinal Data Analysis
- Natural Language Processing/Text Mining
- Prediction and Internal Statistical Cross Validation
- Function Optimization
- Deep Learning, Neural Networks
Second edition table of contents
- Introduction
- Basic Visualization and Exploratory Data Analytics
- Linear Algebra, Matrix Computing, and Regression Modeling
- Linear and Nonlinear Dimensionality Reduction
- Supervised Classification
- Black Box Machine Learning Methods
- Qualitative Learning Methods—Text Mining, Natural Language Processing, and Apriori Association Rules Learning
- Unsupervised Clustering
- Model Performance Assessment, Validation, and Improvement
- Specialized Machine Learning Topics
- Variable Importance and Feature Selection
- Big Longitudinal Data Analysis
- Function Optimization
- Deep Learning, Neural Networks
Reception
As of January 17, 2021, the electronic version of the book's first edition is freely available on SpringerLink and has been downloaded over 6 million times. The textbook is available worldwide in print and electronic formats in many college and university libraries and has been used for data science, computational statistics, and analytics classes at various institutions.