Practical Data Science

Course Number: 
05-839
Semester and Units: 
Spring: 12 units
Course Description: 

This course covers techniques and technologies for creating data-driven interfaces. You will learn about the entire data pipeline, from sensing, to cleaning data, to different forms of analysis and computation. Course website: http://data.cmubi.org/

Syllabus: http://data.cmubi.org/syllabus

This syllabus is tentative, but it gives a flavor of what we hope to cover in the course.

Introduction

  • identifying the questions you want to answer
  • identifying the data required to answer those questions
  • transforming data to answers

Collecting data

  • Sources to collect from: clicks, sensors, mobile phones, etc.
  • APIs for the social web and OAuth
  • Common data formats: XML, JSON, CSV, …
  • Sampling and bias in data collection

Cleaning data

  • Understanding your data
  • Data quality: coherence, correctness, completeness, and accountability
  • Common problems with data

Tools for analyzing data

  • Exploratory analysis, distributions and their meanings
  • Causality
  • Transformations and features
  • Usable machine learning

Visualization

  • What, why and how not to visualize
  • Perceptual issues in visualization
  • What makes a good visualization; narrative
  • Visualizing big data

Instructors:
Jennifer Mankoff