What this class is not

This class is not a traditional statistics course, although much of the material will be rooted in statistical analysis. It is not a computer science course, although you will program a lot and, we hope, become better at doing so in different environments. And it is not a machine learning course, although ML techniques will be foundational material for the lectures. You will not always have clean data sets to start with; you will get data sets as we have encountered them while working in the data science space over the last few years. The data sets might be messy and unstructured, and it might not always be clear how to extract the relevant signals for the problem at hand. However, this is part of the fun, and we hope you will agree by the time May rolls around.


What this class is

This class is an introduction to the collection of techniques we have found indispensable when working in the data science space. There will be significant emphasis on understanding the relevant statistics and applying them properly. The class will teach you to write good code and to use collaborative tools to do so because, after all, if you intend to build things for other people to use, there is no other option. We will talk about staple machine learning algorithms and techniques, going into some depth about the background mathematics, but we will always return to implementing those techniques in Python libraries to be used for subsequent data analysis. Sometimes you will have to find, get, process, and clean data before taking initial steps in any kind of statistical modeling. In short, you will get a taste of the day-to-day in the data science world and walk away with the foundational knowledge and toolkit that will allow you to build solutions in this relatively new and exciting area.




Published

20 December 2012
