Machine Learning on Big Data with MapReduce
Course objectives:
Participants will learn to adapt and execute machine learning algorithms in the MapReduce framework. By the end of the class, participants should be able to write their own machine learning algorithms for MapReduce and run them on Amazon Web Services (AWS). Amazon is providing AWS credits for class participants.
Participants will learn to write mappers and reducers in Python for “hadoop-streaming”. For most of the class we will use “mrjob”, an open-source framework developed at Yelp that lets class members program mappers and reducers in Python. mrjob can then run the same mapper and reducer code locally without Hadoop, on Amazon Web Services, or on a private Hadoop cluster. This simplifies the programming tasks.
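To give a feel for the mapper and reducer pattern described above, here is a minimal word-count sketch in plain Python. It is not mrjob's API and not class material: the names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` only imitates, in memory, the sort-and-group step that Hadoop performs between the map and reduce phases. It needs neither Hadoop nor mrjob installed.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    for word in line.strip().lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all the counts that were grouped under one word."""
    yield word, sum(counts)


def run_local(lines):
    """Hypothetical stand-in for the framework: map, sort by key, then reduce."""
    # Map phase: apply the mapper to every input line.
    mapped = [pair for line in lines for pair in mapper(line)]
    # Shuffle phase: sorting by key puts equal keys next to each other.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key.
    results = {}
    for word, group in groupby(mapped, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            results[key] = total
    return results


word_counts = run_local(["big data big clusters", "data on clusters"])
```

The mapper sees one line at a time and the reducer sees one key at a time, so neither function needs the whole data set in memory. That property is what allows a framework to spread the same two functions across many machines.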
Schedule: Here's a tentative schedule to give a rough idea of what we intend to cover. This may change somewhat to meet the interests of the class participants.
Week/Date | Topic | Notes
Week 1 | Implementing Algorithms on Big Data |
April 13 | MapReduce, Hadoop Streaming, Mahout, Amazon (AWS, EMR) | mapReduce
April 14 | mrjob - Jimmy Retzlaff from Yelp |
Week 2 | Clustering |
April 20 | Canopy Clustering |
April 21 | K-means, EM |
Week 3 | Supervised Learning |
April 27 | Regularized Regression - glmnet algo for elasticnet |
April 28 | SVM - Pegasos algo for two-class and one-class, extensions |
Week 4 | Recommender systems |
May 4 | Background and simple recommender system |
May 5 | SVD methods, SVD on mapReduce, Lanczos algo |
Week 5 | Frequent ItemSet Implementations |
May 11 | tbd |
May 12 | tbd |
Prerequisites:
-Facility with undergraduate-level math and statistics (vector calculus, density functions, etc.)
-Comfort with basic Python programming (version 2.6 or 2.7, NOT version 3).
-Some familiarity with NumPy (the "random" family of functions, matrix(), array())
-mrjob and boto installed (both are Python packages)
-Familiarity with basic machine learning.
Background Material:
Here's a page with links to Python tutorials to help you learn Python: python references. DO NOT INSTALL Python VERSION 3 - it has incompatibilities. You can find Python at www.python.org
Thank you to Amazon Web Services for sponsoring this class.