Key dates/times

  • Class times: MW 6:10 - 7:25
  • Tues Jan 22, classes start
  • Monday May 6, last day of class
  • May 10 - 17, Finals

Lecture Outlines

Week 1

  1. Monday Jan 21st (No Class)
  2. Wednesday Jan 23rd
    • Intro to class
    • Introduce computer setup requirements

Homework

  • Computer setup workshops after class on Wednesday and over the weekend

Week 2

  1. Monday Jan 28th
    • Unix intro
  2. Wednesday Jan 30th
    • Python intro
  3. Friday/Saturday: Software Carpentry (SWC) workshop

Homework

  • Basic Python/Unix exercises that you should find easy if you went to the SWC workshop

  • Modify Python command line interface (CLI) template

    • Change what it does to lines
    • Add an option
    • Use in a pipe
    • Commit to repo
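
A minimal sketch of what such a line-filter CLI might look like. The actual class template is not reproduced here, so the names `transform`, `main`, and the `--upper` option are hypothetical stand-ins. The sketch reads lines from stdin, changes each one, and writes to stdout so that it can sit in a Unix pipe.

```python
import argparse
import sys


def transform(line, upper=False):
    """Strip the trailing newline and optionally upper-case the line."""
    line = line.rstrip("\n")
    return line.upper() if upper else line


def main(argv=None, infile=sys.stdin, outfile=sys.stdout):
    """Parse options, then filter infile to outfile one line at a time."""
    parser = argparse.ArgumentParser(description="Filter lines from stdin to stdout.")
    # Hypothetical option, standing in for the "Add an option" step.
    parser.add_argument("--upper", action="store_true", help="upper-case each line")
    args = parser.parse_args(argv)
    for line in infile:
        outfile.write(transform(line, upper=args.upper) + "\n")


# In a script file, finish with:
#     if __name__ == "__main__":
#         main()
```

Saved as a script (say, a hypothetical `filter.py`) with the guard above added, it could be used in a pipe as `cat notes.txt | python filter.py --upper | sort`.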

Week 3

  1. Monday Feb 4th
    • Linear Regression
    • Linear solver
      • Pseudocode
      • Numpy code
  2. Wednesday Feb 6th
    • L2 regularization/Gaussian priors
    • Why numpy?
    • Class structure

Homework

  • Analyze regularization and overfitting on synthetic data
  • Groups build linear regression module as part of a regression module
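
As a rough illustration of the Week 3 material, the sketch below fits a linear model on synthetic data with and without L2 regularization by solving the normal equations in NumPy. The function name `ridge_fit`, the data sizes, and the regularization strength are assumptions chosen for the example, not part of the course module.

```python
import numpy as np


def ridge_fit(X, y, lam=0.0):
    """Solve (X^T X + lam * I) w = X^T y for w; lam=0 gives ordinary least squares."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data: a known weight vector plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y)               # no regularization
w_ridge = ridge_fit(X, y, lam=10.0)   # L2 penalty shrinks the weights toward zero

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

Comparing the two weight vectors, and repeating the fit with fewer samples or more features, is one way to see how the penalty trades bias for variance.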

Week 4

  1. Monday Feb 11th
    • Introduce data set
    • Pandas for exploratory data analysis (EDA)
  2. Wednesday Feb 13th
    • Statistical analysis of data
      • Goodness of fit metrics
      • Quantifying error
    • Correlation coefficients
      • Pearson
      • (Brownian) distance covariance

Homework

  • Analyze two-variable interactions and give various summaries/details.
  • Analyze regularization and overfitting on real data
  • Write detailed report on data problem. One team presents in class.

Week 5

  1. Monday Feb 18th
    • Logistic Regression
    • L1 regularization/Laplace priors
  2. Wednesday Feb 20th
    • Vanilla logistic regression
    • Convexity of the loss function
    • Newton's method
    • L1 regularized solver
      • Transformation to constrained problem
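
A minimal sketch of Newton's method applied to unregularized logistic regression, assuming labels in {0, 1} and data that is not linearly separable. The names `sigmoid` and `logistic_newton`, the synthetic data, and the fixed iteration count are illustrative choices, not the course's solver.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_newton(X, y, n_iter=25):
    """Minimize the logistic negative log-likelihood with plain Newton steps."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                        # gradient of the loss
        hess = X.T @ (X * (p * (1 - p))[:, None])   # Hessian: X^T diag(p(1-p)) X
        w = w - np.linalg.solve(hess, grad)
    return w


# Synthetic data drawn from a logistic model with known weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -1.0])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)

w_hat = logistic_newton(X, y)
print("Estimated weights:", w_hat)
```

Because the loss is convex, the Hessian is positive semidefinite and the Newton step is well defined whenever the Hessian is invertible. A production solver would add a convergence check and a line search.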

Homework

  • Add logistic regression module to the regression module
  • Analyze regularization and overfitting on synthetic data

Week 6

  1. Monday Feb 25th
    • Introduce data set
    • More pandas and EDA
  2. Wednesday Feb 27th
    • Statistical analysis of data
    • Confidence intervals and p-values for coefficients
    • ROC curves
    • Pseudo R squared

Homework

  • Analyze regularization and overfitting on real data
  • Write detailed report on data problem. One team presents in class.

Week 7

  1. Monday Mar 4th
    • Regular expressions
    • Regex tutorial with the Python re module
    • Using the Python nltk and pattern modules
  2. Wednesday Mar 6th
    • Text processing and NLP
    • Bayesian classifiers

Homework

  • Basic regex and text processing exercises
  • Write a basic Bayesian classifier class
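
One possible shape for such a classifier, sketched with only the standard library: a multinomial naive Bayes model over regex-extracted word tokens with add-one smoothing. The class name `NaiveBayes`, the tokenizer pattern, and the toy training texts are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lowercase word tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Runs of letters and apostrophes, lower-cased.
        return re.findall(r"[a-z']+", text.lower())

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in self.tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)  # smoothed log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set, invented for the example.
clf = NaiveBayes().fit(
    ["cheap pills buy now", "meeting at noon tomorrow",
     "buy cheap watches", "lunch meeting moved"],
    ["spam", "ham", "spam", "ham"],
)
print(clf.predict("buy cheap pills"))
```

Working in log space avoids underflow when multiplying many small word probabilities.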

Week 8

  1. Monday Mar 11th
    • Introduce data
  2. Wednesday Mar 13th
    • Statistical analysis of data
    • Write detailed report on data problem. One team presents in class.

Week 9

  1. Monday Mar 25th
    • Intro to HTML and XML
    • Using the Python Beautiful Soup module
  2. Wednesday Mar 27th
    • Web Scraping

Homework

  • Write a web scraper + entity extractor class
  • Combine with the Bayesian Classifier

Week 10

  1. Monday Apr 1st
    • Introduce data
  2. Wednesday Apr 3rd
    • Statistical analysis of data
    • Write detailed report on data problem. One team presents in class.

Week 11

  1. Monday Apr 8th
    • Time Series Analysis I - ARMA
      • Stationarity
      • AR, MA processes
      • Maximum likelihood estimation
  2. Wednesday Apr 10th
    • Time Series Analysis II - ARCH
      • Heteroskedasticity in time series
      • Clustering
      • Testing for ARCH/GARCH errors

Homework

  • Write a JSON parser, fetch data, and fit ARMA models to daily/monthly stock data.

Week 12

  1. Monday Apr 15th
    • Introduce dataset
    • Pandas timeseries API
      • Generating ranges
      • Date field access
      • Resampling
      • Intraday filtering
  2. Wednesday Apr 17th
    • Missing data (look-ahead bias, etc.)
    • Moving window statistics
    • Seasonality adjustment
    • Exponential decay time weighting

Homework

  • Moving window covariance estimation for predictive portfolio risk analysis
  • Write detailed report on data problem. One team presents in class.

Additional topics that may be covered in weeks 13, 14, 15

  • Causal inference
  • Parallel computing
  • Cluster computing
  • MapReduce