Syllabus (in progress)
Key dates/times
- Class times: MW 6:10 - 7:25
- Tues Jan 22, classes start
- Monday May 6, last day of class
- May 10 - 17, Finals
Lecture Outlines
Week 1
- Monday Jan 21st (No Class)
- Wednesday Jan 23rd
- Intro to class
- Introduce computer setup requirements
Homework
- Computer setup workshops after class on Wednesday and over the weekend
Week 2
- Monday Jan 28th
* Unix intro
- Wednesday Jan 30th
* Python intro
- Friday/Saturday: Software carpentry (SWC) workshop
Homework
- Basic Python/Unix exercises you should be able to do if you attended the SWC workshop
- Modify the Python command line interface (CLI) template
- Change what it does to lines
- Add an option
- Use in a pipe
- Commit to repo
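The course's CLI template is not reproduced here. What follows is a hypothetical stand-in that shows the usual shape of such a filter: read lines from stdin or a file, transform each one, and write the result to stdout so the script can sit in a pipe. The names transform_lines and main and the --upper option are illustrative assumptions.

```python
import argparse
import sys


def transform_lines(lines, upper=False):
    """Strip trailing whitespace from each line; optionally upper-case it."""
    for line in lines:
        out = line.rstrip()
        if upper:
            out = out.upper()
        yield out


def main(argv=None):
    """Parse options, read input lines, and write transformed lines to stdout."""
    parser = argparse.ArgumentParser(description="Transform lines of text.")
    parser.add_argument("infile", nargs="?", default="-",
                        help="input file; '-' or omitted means stdin")
    parser.add_argument("-u", "--upper", action="store_true",
                        help="convert each line to upper case")
    args = parser.parse_args(argv)
    infile = sys.stdin if args.infile == "-" else open(args.infile)
    try:
        for out in transform_lines(infile, upper=args.upper):
            sys.stdout.write(out + "\n")
    finally:
        if infile is not sys.stdin:
            infile.close()
```

Saved as a script (hypothetically transform.py) ending with the standard if __name__ == "__main__": main() guard, it could be used as cat notes.txt | python transform.py --upper | sort. Keeping the per-line logic in its own function makes "change what it does to lines" a one-function edit, and new options only touch main.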
Week 3
- Monday Feb 4th
- Linear Regression
- Linear solver
- Pseudocode
- Numpy code
- Wednesday Feb 6th
- L2 regularization/Gaussian priors
- Why numpy?
- Python class structure
Homework
- Analyze regularization and overfitting on synthetic data
- Groups build a linear regression submodule within a larger regression module
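The Week 3 linear solver and L2 regularization topics can be illustrated with a minimal numpy sketch of ridge regression. The function name fit_ridge and the synthetic data are illustrative assumptions, not the course's regression module. The function solves the regularized normal equations (X^T X + lam I) w = X^T y. With lam = 0 this is ordinary least squares. For lam > 0 it is the MAP estimate under a zero-mean Gaussian prior on w with variance sigma^2 / lam, where sigma^2 is the noise variance.

```python
import numpy as np


def fit_ridge(X, y, lam=0.0):
    """Coefficients w minimizing ||X w - y||^2 + lam * ||w||^2.

    Solves the regularized normal equations (X^T X + lam * I) w = X^T y.
    No intercept is fit: center the data or add a column of ones first
    (a ones column would then be penalized too).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y
    # Solving the linear system is cheaper and more stable than inverting A.
    return np.linalg.solve(A, b)


# Synthetic data: y = 2*x0 - x1 + small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)

w_ols = fit_ridge(X, y, lam=0.0)     # ordinary least squares
w_ridge = fit_ridge(X, y, lam=50.0)  # coefficients shrunk toward zero
```

Increasing lam trades a little bias for lower variance, which is the overfitting trade-off the homework explores. When X^T X is ill-conditioned and lam = 0, np.linalg.lstsq is the safer solver.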
Week 4
- Monday Feb 11th
- Introduce data set
- Pandas for exploratory data analysis (EDA)
- Wednesday Feb 13th
- Statistical analysis of data
- Goodness of fit metrics
- Quantifying error
- Correlation coefficients
- Pearson
- (Brownian) distance covariance
Homework
- Analyze two-variable interactions and give various summaries/details.
- Analyze regularization and overfitting on real data
- Write detailed report on data problem. One team presents in class.
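The two correlation measures behave differently on nonlinear data. The pure-Python sketch below uses hypothetical function names. It computes the Pearson coefficient and the sample distance correlation of Székely, Rizzo, and Bakirov in its V-statistic form. The recipe is to double-center the pairwise distance matrices of x and y, average their elementwise product to get dCov^2, and normalize by the distance variances.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _double_centered(v):
    """Double-centered matrix of pairwise distances |v_j - v_k|."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # d is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    return sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n ** 2


def distance_correlation(x, y):
    """Sample distance correlation, between 0 and 1."""
    dcov = distance_covariance_sq(x, y)
    dvar_x = distance_covariance_sq(x, x)
    dvar_y = distance_covariance_sq(y, y)
    return math.sqrt(dcov / math.sqrt(dvar_x * dvar_y))


xs = [-2, -1, 0, 1, 2]
ys = [v * v for v in xs]             # deterministic but nonlinear dependence
print(pearson(xs, ys))               # 0.0: no linear association
print(distance_correlation(xs, ys))  # about 0.52: dependence detected
```

For y = x^2 on a symmetric grid, Pearson's r is exactly zero, yet the distance correlation is clearly positive. In the population, distance correlation is zero only when the variables are independent. The O(n^2) distance matrices make this sketch suitable only for small samples.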
Week 5
- Monday Feb 18th
- Logistic Regression
- L1 regularization/Laplacian priors
- Wednesday Feb 20th
- Vanilla logistic regression
- Convexity of the loss function
- Newton's method
- L1 regularized solver
* Transformation to constrained problem
Homework
- Add a logistic regression submodule to the regression module
- Analyze regularization and overfitting on synthetic data
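Newton's method for vanilla logistic regression fits in a few lines of numpy. This sketch assumes 0/1 labels and a design matrix that already contains an intercept column. The name fit_logistic_newton is hypothetical. Each step solves H step = g, where g = X^T (p - y) is the gradient of the negative log-likelihood and H = X^T diag(p(1 - p)) X is its Hessian. Because the loss is convex, H is positive semidefinite and the iteration typically converges in a handful of steps. For perfectly separable data, however, the MLE does not exist and H becomes nearly singular.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_newton(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton's method.

    X: (n, d) design matrix; include a column of ones for an intercept.
    y: (n,) array of 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                       # gradient of the negative log-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])  # X^T diag(p(1-p)) X
        step = np.linalg.solve(hess, grad)
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w


# Synthetic data with true (intercept, slope) = (-0.5, 2.0).
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x])
y = (rng.random(2000) < sigmoid(X @ np.array([-0.5, 2.0]))).astype(float)
w_hat = fit_logistic_newton(X, y)
```

The L1-regularized fit needs a different solver because the penalty is not differentiable at zero. The constrained reformulation listed for Wednesday addresses this.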
Week 6
- Monday Feb 25th
- Introduce data set
- More pandas and EDA
- Wednesday Feb 27th
- Statistical analysis of data
- Confidence intervals and p-values for coefficients
- ROC curves
- Pseudo R-squared
Homework
- Analyze regularization and overfitting on real data
- Write detailed report on data problem. One team presents in class.
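ROC curves and a pseudo R-squared can both be computed directly from predicted probabilities. In this pure-Python sketch (names are hypothetical), the ROC curve sweeps the decision threshold from the highest score down and records false and true positive rates. The AUC is the trapezoidal area under that curve. McFadden's pseudo R-squared is 1 - log L(model) / log L(intercept-only model). McFadden's is only one of several pseudo R-squared definitions, and the lecture may use a different one.

```python
import math


def roc_curve(labels, scores):
    """Return (fpr, tpr) lists, lowering the threshold one distinct score at a time."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all examples tied at this score so ties give a single point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


def mcfadden_r2(labels, probs):
    """McFadden's pseudo R^2: 1 - logL(model) / logL(intercept-only)."""
    p_bar = sum(labels) / len(labels)
    ll_model = sum(math.log(p) if y == 1 else math.log(1 - p)
                   for y, p in zip(labels, probs))
    ll_null = sum(math.log(p_bar) if y == 1 else math.log(1 - p_bar)
                  for y in labels)
    return 1 - ll_model / ll_null


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(labels, scores)
print(auc(fpr, tpr))  # 0.75
```

The intercept-only log-likelihood is undefined when every label is the same, so mcfadden_r2 assumes both classes are present.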
Week 7
- Monday Mar 4th
- Regular expressions
- Regex tutorial with the Python re module
- Using the Python nltk and pattern modules
- Wednesday Mar 6th
- Text processing and NLP
- Bayesian classifiers
Homework
- Basic regex and text processing exercises
- Write a basic Bayesian classifier class
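One concrete Bayesian classifier is multinomial naive Bayes with add-one (Laplace) smoothing. The sketch below uses a regular expression from the re module for tokenization. The class name, the tokenizer pattern, and the toy spam/ham data are assumptions for illustration, and the lecture's classifier may differ.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lower-case the text and extract runs of letters and apostrophes."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc) = log P(label) + sum of log P(word | label)."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        v = len(self.vocab)
        score = math.log(self.label_counts[label] / self.n_docs)
        for word in tokenize(doc):
            if word in self.vocab:  # skip words never seen in training
                score += math.log((counts[word] + 1) / (total + v))
        return score

    def predict(self, doc):
        """Return the label with the highest posterior score."""
        return max(self.label_counts, key=lambda lab: self.log_posterior(doc, lab))


docs = ["cheap pills, buy now!", "win cash now",
        "meeting at noon", "lunch meeting tomorrow?"]
labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesClassifier().fit(docs, labels)
print(clf.predict("buy cheap cash"))  # spam
print(clf.predict("noon meeting"))    # ham
```

Working in log space avoids underflow when many small word probabilities are multiplied. Smoothing keeps a word unseen in one class from driving that class's probability to zero.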
Week 8
- Monday Mar 11th
* Introduce data
- Wednesday Mar 13th
- Statistical analysis of data
Homework
- Write detailed report on data problem. One team presents in class.
Week 9
- Monday Mar 25th
- Intro to HTML and XML
- Using the Python Beautiful Soup module
- Wednesday Mar 27th
* Web Scraping
Homework
- Write a web scraper + entity extractor class
- Combine with the Bayesian Classifier
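The scraping step can be sketched with the standard library's html.parser in place of Beautiful Soup. This is a deliberate swap so the example has no third-party dependency. LinkExtractor is a hypothetical class that collects absolute link URLs and their link text. In Beautiful Soup the same idea is roughly soup.find_all("a") followed by reading each tag's href attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (absolute href, link text) pairs from <a> tags."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


html = ('<p>See <a href="/news/1">Story <b>one</b></a> and '
        '<a href="http://example.com/x">X</a>.</p>')
parser = LinkExtractor(base_url="http://example.com")
parser.feed(html)
parser.close()
print(parser.links)
```

Fetching pages (for example with urllib.request.urlopen) and extracting entities from the collected text are left out. The link text is the kind of input the Bayesian classifier from Week 7 could then score.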
Week 10
- Monday Apr 1st
* Introduce data
- Wednesday Apr 3rd
- Statistical analysis of data
Homework
- Write detailed report on data problem. One team presents in class.
Week 11
- Monday Apr 8th
- Time Series Analysis I - ARMA
- Stationarity
- AR, MA processes
- Maximum likelihood estimation
- Wednesday Apr 10th
- Time Series Analysis II - ARCH
- Heteroskedasticity in time series
- Volatility clustering
- Testing for ARCH/GARCH errors
Homework
- Write a JSON parser, fetch daily/monthly stock data, and fit ARMA models to it.
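The simplest case of the ARMA homework is an AR(1) model, x_t = c + phi x_{t-1} + e_t, with Gaussian noise. The pure-Python sketch below uses hypothetical function names. It simulates such a series and estimates it by conditional maximum likelihood. Conditioning on the first observation, the Gaussian likelihood is maximized in (c, phi) by least squares of x_t on x_{t-1}, and the MLE of sigma^2 is the mean squared residual. Exact MLE for general ARMA(p, q) models needs a numerical optimizer (for example statsmodels' ARIMA class), which is not shown.

```python
import math
import random


def simulate_ar1(phi, c, sigma, n, seed=0):
    """Simulate x_t = c + phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [c / (1 - phi)]  # start at the stationary mean (assumes |phi| < 1)
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Conditional Gaussian MLE of (c, phi, sigma) for an AR(1) model."""
    prev, curr = x[:-1], x[1:]
    n = len(curr)
    mp, mc = sum(prev) / n, sum(curr) / n
    # Least-squares slope and intercept of x_t regressed on x_{t-1}.
    phi = (sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
           / sum((a - mp) ** 2 for a in prev))
    c = mc - phi * mp
    resid = [b - c - phi * a for a, b in zip(prev, curr)]
    sigma = math.sqrt(sum(r * r for r in resid) / n)  # MLE divides by n
    return c, phi, sigma


x = simulate_ar1(phi=0.7, c=1.0, sigma=0.5, n=5000, seed=42)
c_hat, phi_hat, sigma_hat = fit_ar1(x)
```

The AR(1) process is stationary only when |phi| < 1. For stock data, this kind of model is usually fit to returns rather than prices, since price series are typically not stationary.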
Week 12
- Monday Apr 15th
- Introduce dataset
- Pandas timeseries API
- Generating ranges
- Date field access
- Resampling
- Intraday filtering
- Wednesday Apr 17th
- Missing data (look-ahead bias, etc.)
- Moving window statistics
- Seasonality adjustment
- Exponential decay time weighting
Homework
- Moving window covariance estimation for predictive portfolio risk analysis
- Write detailed report on data problem. One team presents in class.
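Two of the Week 12 moving-window ideas can be written out directly in a pure-Python sketch (names are hypothetical). rolling_mean uses only current and past observations, so it introduces no look-ahead bias. ewma applies exponential decay weighting, with alpha chosen so that an observation's weight halves every halflife steps. In pandas the corresponding calls are Series.rolling(window).mean() and Series.ewm(halflife=..., adjust=False).mean(). The sketch assumes the adjust=False (recursive) convention.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing moving average; None until `window` observations are available."""
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the observation leaving the window
        out.append(total / window if len(buf) == window else None)
    return out


def ewma(values, halflife):
    """Exponentially weighted mean: m_t = alpha * x_t + (1 - alpha) * m_{t-1}, m_0 = x_0.

    alpha = 1 - 0.5 ** (1 / halflife), so weights halve every `halflife` steps.
    """
    alpha = 1 - 0.5 ** (1 / halflife)
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


print(rolling_mean([1, 2, 3, 4], 2))  # [None, 1.5, 2.5, 3.5]
print(ewma([0, 1], halflife=1))       # [0, 0.5]
```

The same trailing-window pattern extends to the covariance estimates in the homework. Estimates built only from data available at each date keep the backtest free of look-ahead bias.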
Additional topics that may be covered in weeks 13, 14, 15
- Causal inference
- Parallel computing
- Cluster computing
- MapReduce