# Syllabus (in progress)

# Key dates/times

- Class times: MW 6:10 - 7:25
- Tues Jan 22, classes start
- Monday May 6, last day of class
- May 10 - 17, Finals

# Lecture Outlines

## Week 1

- Monday Jan 21st (No Class)
- Wednesday Jan 23rd
    * Intro to class
    * Introduce computer setup requirements

#### Homework

- Computer setup workshops after class on Wednesday and over the weekend

## Week 2

- Monday Jan 28th
    * Unix intro

- Wednesday Jan 30th
    * Python intro

- Friday/Saturday: Software Carpentry (SWC) workshop

#### Homework

- Basic Python/Unix exercises covering material from the SWC workshop
- Modify the Python command line interface (CLI) template
    * Change what it does to lines
    * Add an option
    * Use it in a pipe
    * Commit to the repo
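
As a rough idea of the kind of script this exercise works with, here is a minimal sketch of a line-oriented CLI. It is not the course template; the names `transform`, `main`, and the `--upper` option are made up for illustration.

```python
"""Hypothetical line-filter CLI (illustrative only, not the course template)."""
import argparse
import sys


def transform(line, upper=False):
    """Strip trailing whitespace and optionally upper-case the line."""
    line = line.rstrip()
    return line.upper() if upper else line


def main(argv=None, infile=None, outfile=None):
    """Read lines from infile, transform each one, and write it to outfile."""
    parser = argparse.ArgumentParser(description="Transform lines from stdin.")
    # An example of an added option; the flag name is an assumption.
    parser.add_argument("--upper", action="store_true",
                        help="upper-case each line")
    args = parser.parse_args(argv)
    # Default to stdin/stdout so the script can sit in the middle of a pipe.
    infile = sys.stdin if infile is None else infile
    outfile = sys.stdout if outfile is None else outfile
    for line in infile:
        outfile.write(transform(line, upper=args.upper) + "\n")
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn this into a script that can be used in a pipe, for example `cat notes.txt | python filter.py --upper | sort` (the file names here are placeholders).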

## Week 3

- Monday Feb 4th
    * Linear Regression
    * Linear solver
    * Pseudocode
    * Numpy code

- Wednesday Feb 6th
    * L2 regularization/Gaussian priors
    * Why numpy?
    * Class structure
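
A minimal sketch of what the linear solver with L2 regularization might look like in numpy, assuming the closed-form normal equations are used. The function name `ridge_fit`, the penalty parameter `lam`, and the synthetic data are illustrative assumptions, not the code used in class.

```python
import numpy as np


def ridge_fit(X, y, lam=0.0):
    """Solve (X^T X + lam * I) w = X^T y for the weight vector w."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data: y is a noisy linear function of three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_ols = ridge_fit(X, y)              # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the weights toward zero
```

With `lam > 0` the solution is the MAP estimate under a zero-mean Gaussian prior on the weights, which is the connection between L2 regularization and Gaussian priors.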

#### Homework

- Analyze regularization and overfitting on synthetic data
- Groups build linear regression module as part of a regression module

## Week 4

- Monday Feb 11th
    * Introduce data set
    * Pandas for exploratory data analysis (EDA)

- Wednesday Feb 13th
    * Statistical analysis of data
        - Goodness of fit metrics
        - Quantifying error
    * Correlation coefficients
        - Pearson
        - (Brownian) distance covariance

#### Homework

- Analyze two-variable interactions and give various summaries/details.
- Analyze regularization and overfitting on real data
- Write detailed report on data problem. One team presents in class.

## Week 5

- Monday Feb 18th
    * Logistic Regression
    * L1 regularization/Laplacian priors

- Wednesday Feb 20th
    * Vanilla logistic regression
    * Convexity of the loss function
    * Newton's method
    * L1 regularized solver
        - Transformation to constrained problem
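
For orientation, a minimal sketch of Newton's method applied to unregularized ("vanilla") logistic regression in numpy. The function names, iteration limits, and synthetic data are assumptions for illustration; the L1 regularized solver is not covered by this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_newton(X, y, n_iter=25, tol=1e-8):
    """Fit unregularized logistic regression with full Newton steps."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                        # gradient of the negative log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # Hessian X^T diag(p(1-p)) X
        step = np.linalg.solve(hess, grad)
        w -= step
        if np.linalg.norm(step) < tol:
            break
    return w


# Synthetic, non-separable data with an intercept column.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([0.5, -1.0, 2.0])
y = (rng.uniform(size=n) < sigmoid(X @ w_true)).astype(float)

w_hat = logistic_newton(X, y)
```

Because the loss is convex, the Hessian is positive semidefinite, so each Newton step is a descent direction whenever the linear solve is well posed. The sketch assumes the classes are not perfectly separable; on separable data the weights grow without bound.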

#### Homework

- Add logistic regression module to the regression module
- Analyze regularization and overfitting on synthetic data

## Week 6

- Monday Feb 25th
    * Introduce data set
    * More pandas and EDA

- Wednesday Feb 27th
    * Statistical analysis of data
        - Confidence intervals and p-values for coefficients
        - ROC curves
        - Pseudo R squared

#### Homework

- Analyze regularization and overfitting on real data
- Write detailed report on data problem. One team presents in class.

## Week 7

- Monday Mar 4th
    * Regular expressions
    * Regex tutorial with Python re module
    * Using the Python nltk and pattern modules

- Wednesday Mar 6th
    * Text processing and NLP
    * Bayesian classifiers
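
A small sketch of how the `re` module might be used to turn raw text into word counts, the kind of feature a Bayesian text classifier typically consumes. The token pattern and the names `TOKEN_RE` and `tokenize` are illustrative assumptions.

```python
import re
from collections import Counter

# Lower-case words, optionally with one internal apostrophe (e.g. "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Return the lower-cased word tokens found in text."""
    return TOKEN_RE.findall(text.lower())


# Bag-of-words counts for a single document.
counts = Counter(tokenize("Don't panic: the spam filter isn't panicking, and the ham is fine."))
```

Digits and punctuation are dropped by this pattern; whether that is appropriate depends on the data set.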

#### Homework

- Basic regex and text processing exercises
- Write a basic Bayesian classifier class

## Week 8

- Monday Mar 11th
    * Introduce data

- Wednesday Mar 13th
    * Statistical analysis of data
    * Write detailed report on data problem. One team presents in class.

## Week 9

- Monday Mar 25th
    * Intro to HTML and XML
    * Using the Python Beautiful Soup module

- Wednesday Mar 27th
    * Web Scraping

#### Homework

- Write a web scraper + entity extractor class
- Combine with the Bayesian Classifier

## Week 10

- Monday Apr 1st
    * Introduce data

- Wednesday Apr 3rd
    * Statistical analysis of data
    * Write detailed report on data problem. One team presents in class.

## Week 11

- Monday Apr 8th
    * Time Series Analysis I - ARMA
        - Stationarity
        - AR, MA processes
        - Maximum likelihood estimation

- Wednesday Apr 10th
    * Time Series Analysis II - ARCH
        - Heteroskedasticity in time series
        - Clustering
        - Testing for ARCH/GARCH errors

#### Homework

- Write a JSON parser, fetch data, and fit ARMA models to daily/monthly stock data.

## Week 12

- Monday Apr 15th
    * Introduce dataset
    * Pandas timeseries API
        - Generating ranges
        - Date field access
        - Resampling
        - Intraday filtering

- Wednesday Apr 17th
    * Missing data (look-ahead bias, etc.)
    * Moving window statistics
    * Seasonality adjustment
    * Exponential decay time weighting

#### Homework

- Moving window covariance estimation for predictive portfolio risk analysis
- Write detailed report on data problem. One team presents in class.

## Additional topics that may be covered in weeks 13, 14, 15

- Causal inference
- Parallel computing
- Cluster computing
- MapReduce