The field of statistical learning encompasses a variety of computational tools for modeling and understanding complex data. In this introductory course, we will explore many of the most popular of these tools, such as sparse regression, classification trees, boosting and support vector machines. In addition to unpacking the mathematics underlying the computational methods, students will also gain hands-on experience in applying these techniques to real datasets using R.
Prerequisite: MTH 220 (or an equivalent intro. statistics course), or permission of the instructor.
| Date | Topic | Lab | Guest | Assignments |
| 09-11 | Introduction to Machine Learning (p.1-28) | | | |
| 09-13 | Evaluating Models (p.29-51) | R and RMarkdown Demo | | |
| 09-18 | Simple and Multiple Linear Regression (p.59-82) | Introduction to Python | | |
| 09-20 | Assumptions and Other Potential Problems (p.82-119) | Linear Regression | | A1 out |
| 09-25 | Intro. to Classification, KNN (p.129-130, p.37-42, p.104-109) | K-Nearest Neighbors | | |
| 09-27 | Logistic Regression (p.130-138) | Logistic Regression | | A2 out; A1 due |
| 10-02 | Discriminant Analysis (p.138-150) | LDA/QDA | Ben Miller, MITLL | |
| 10-04 | Classification Wrap-up (p.151-154) | Comparing methods | S. Chaitanya, UMass | A3 out; A2 due |
| 10-09 | NO CLASSES - FALL BREAK | | | |
| 10-11 | Resampling Methods (p.175-190) | CV & bootstrap | | A4 out; A3 due |
| 10-16 | Best Subset and Stepwise Selection (p.205-210) | Subset Selection | | |
| 10-18 | Estimating Error w/ Cross-Validation (p.210-214) | Selection by CV | | A5 out; A4 due |
| 10-23 | Ridge Regression and the Lasso (p.214-228) | RR & the Lasso | | |
| 10-25 | PCR and PLS (p.228-244) | Dimension Reduction | | A6 out; A5 due |
| 10-30 | Machine Learning in the Wild | | R. Caceres, MITLL | |
| 11-01 | Final Project Workshop I | | | FP1 out; A6 due |
| 11-06 | Polynomial Regression and Step Functions (p.265-270) | Polynomials & Step Functions | | |
| 11-08 | Splines and GAMs (p.271-287) | Splines & GAMs | | A7 out; FP1 due |
| 11-13 | Decision and Classification Trees (p.303-316) | | | |
| 11-15 | Bagging, Random Forests, and Boosting (p.316-324) | Decision Trees | | FP2 out; A7 due |
| 11-20 | NO CLASSES - Jordan Sick | | | |
| 11-22 | NO CLASSES - THANKSGIVING | | | |
| 11-27 | Maximal Margin and Support Vector Classifiers (p.337-355) | FP Workshop II | | FP3 out; FP2 due |
| 11-29 | Multiclass SVMs (p.355-359) | SVMs for Classification | | |
| 12-04 | K-Means and Hierarchical Clustering (p.385-401) | Clustering | | A8 out; FP3 due |
| 12-06 | Neural Networks | | G. Grinstein | FP3 due |
| 12-11 | Advanced Topics | | | A8 due |
| 12-13 | Final Project Demonstrations | | | |
Assignments and Deliverables: A problem set will be assigned at the end of each section (for a total of 8 assignments). The problem set will be due the following week. The course will culminate in a final project applying statistical learning techniques to a dataset of your choice.
Late submissions will be assessed a penalty of 10% per day. Extensions must be requested at least 48 hours in advance or be accompanied by notification from the student's class dean.
To help students gain hands-on experience in applying statistical learning techniques, this course will include many in-class lab sessions. The labs will be conducted primarily in R, with some supplemental Python exercises at the instructor's discretion. Students are encouraged to work in pairs during these labs.
Lab responses are due 24 hours after the lab is released.
RStudio is great for statistical analysis.
Python is useful for data ingestion, cleaning, formatting, and general wrangling.
Students enrolled in this course have free, unlimited access to DataCamp, generously provided by DataCamp for the Classroom.
The Spinelli Center for Quantitative Learning is a great place to get help brushing up on stats.
Required Reading
| R1 | An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani | Free (pdf); (supplemental material) |
| Assignments | 40% |
| Labs | 20% |
| Final Project | 20% |
| Class Participation | 20% |
| Total | 100% |
Note that the final grade is based on my judgment of your work. Although the grade will be based largely on the percentages shown above, I will give extra credit for excellent work and out-of-the-box thinking. Similarly, while "class participation" is somewhat subjective and not one-size-fits-all, I will take note of contributions in class that demonstrate intellectual curiosity or a clear understanding of a topic, as well as comments that help others in class learn a difficult concept.
Smith is committed to providing support services and reasonable accommodations to all students with disabilities. To request an accommodation, please register with the Disability Services Office at the beginning of the semester. To do so, call (413) 585-2071 to arrange an appointment with Laura Rauscher, Director of Disability Services.
Some of the materials used in this course are derived from lectures, notes, or similar courses taught elsewhere. Appropriate references will be included on all such material.