Week 2 Homework Assignment

Last edited by mike@mbowles.com 13 years, 4 months ago

Pick the problems you are interested in studying.

 

1.  Rewrite the k-means clustering algorithm so that it selects the initial k centroids truly at random.

2.  Rewrite k-means clustering to use a separate reducer for each of the k cluster centers.

3.  Generate test data to test the k-means algorithm for scalability. Use the inputGen.py function to expand the input.txt data set along several dimensions: more data points, more centroids, and more attributes. (inputGen.py writes lines with two real numbers per line, i.e., per sample; increase that significantly.) Generate data sets large enough that a single processor takes a significant amount of time, then run with 3 processors and more to determine the limits of linear run-time reduction. Is that limit different when you increase the number of attribute dimensions rather than the number of data points?

4.  Code up the canopy clustering algorithm outlined in class.

5.  How would you combine canopy clustering and k-means?
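For problem 1, here is a minimal single-machine sketch of k-means whose initial centroids are k distinct data points drawn at random (rather than, say, the first k points of the file). It assumes points are tuples of floats and uses Euclidean distance. By default it draws from `random.SystemRandom`, which uses the operating system's randomness source; passing a `seed` switches to the ordinary seeded generator so runs can be reproduced. The function names are illustrative, not taken from the class code.

```python
import math
import random


def init_centroids(points, k, seed=None):
    """Pick k distinct data points uniformly at random as the initial centroids."""
    # SystemRandom draws from the OS randomness source and cannot be seeded;
    # a seeded Random is used only when reproducible runs are wanted.
    rng = random.SystemRandom() if seed is None else random.Random(seed)
    return [list(p) for p in rng.sample(points, k)]


def kmeans(points, k, iterations=20, seed=None):
    """Plain k-means (Lloyd's iterations) starting from random initial centroids."""
    centroids = init_centroids(points, k, seed)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(sorted(kmeans(points, 2)))  # prints [[0.0, 0.5], [10.0, 10.5]]
```

On larger data, running it several times without a seed and comparing the results is a quick way to see how much the final clusters depend on the random start.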
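For problem 2, a sketch of one k-means iteration written as map, partition, and reduce steps and simulated in plain Python. It assumes that "a separate reducer for each cluster center" means running k reducers and routing cluster key i to reducer i; in an actual Hadoop job that routing would come from setting the number of reduce tasks to k and supplying a matching partitioner. All function names here are hypothetical.

```python
import math
from collections import defaultdict


def mapper(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, point


def partition(key, num_reducers):
    """Route cluster key i to reducer i; with num_reducers == k each reducer owns one center."""
    return key % num_reducers


def reducer(key, values):
    """Reduce step: the new center of cluster `key` is the mean of its points."""
    return key, [sum(col) / len(values) for col in zip(*values)]


def kmeans_iteration(points, centroids):
    """Simulate one MapReduce pass with k reducers, one per cluster center."""
    k = len(centroids)
    # reducer_inputs[r] holds the grouped (key -> values) input of reducer r
    reducer_inputs = [defaultdict(list) for _ in range(k)]
    for p in points:
        key, value = mapper(p, centroids)
        reducer_inputs[partition(key, k)][key].append(value)
    new_centroids = list(centroids)  # a center with no points keeps its old position
    for grouped in reducer_inputs:
        for key, values in grouped.items():
            i, center = reducer(key, values)
            new_centroids[i] = center
    return new_centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_iteration(points, [[0.0, 0.0], [10.0, 10.0]]))  # prints [[0.0, 0.5], [10.0, 10.5]]
```

One design point worth noticing: with one reducer per center, the load on each reducer is proportional to its cluster's size, so unbalanced clusters lead to unbalanced reducers.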
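For problem 3, the source of inputGen.py is not shown on this page, so the following is a hypothetical stand-in rather than a modification of it. It draws Gaussian blobs around randomly placed centers, with the number of points, centers, and attributes all configurable, and writes one sample per line with space-separated values (the separator is an assumption about the input.txt format). Points come from a generator, so very large files can be written without holding the whole data set in memory.

```python
import random


def generate_points(n_points, n_centers, n_dims, spread=1.0, box=100.0, seed=None):
    """Yield n_points samples in n_dims dimensions, drawn as Gaussian blobs
    around n_centers centers placed uniformly at random in [-box, box]^n_dims."""
    rng = random.Random(seed)
    centers = [[rng.uniform(-box, box) for _ in range(n_dims)] for _ in range(n_centers)]
    for _ in range(n_points):
        center = rng.choice(centers)
        yield [rng.gauss(mu, spread) for mu in center]


def format_line(point):
    """One sample per line, attributes separated by single spaces (assumed format)."""
    return " ".join(f"{x:.6f}" for x in point)


def write_points(path, points):
    """Stream samples to disk so the data set never has to fit in memory."""
    with open(path, "w") as f:
        for p in points:
            f.write(format_line(p) + "\n")
```

For example, `write_points("input_large.txt", generate_points(5_000_000, 50, 100, seed=42))` produces a file of 5 million 100-attribute samples; generating one series that grows `n_points` with `n_dims` fixed, and another that grows `n_dims` with `n_points` fixed, gives the two cases the last question asks you to compare.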
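For problems 4 and 5, a sketch of canopy clustering in its commonly described form (the version outlined in class may differ in details). It uses two thresholds T1 > T2 > 0: take a remaining point as a canopy center, put every still-remaining point within T1 into that canopy, and drop every point within T2 from further consideration. Canopy clustering normally uses a cheap, approximate distance; Euclidean distance is used here only for simplicity. One common answer to problem 5, shown here, is to use the canopy centers as the initial k-means centroids, which also picks k for you. Another is to compute k-means distances only between points and centroids that share a canopy. The function names are hypothetical.

```python
import math


def canopy(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2 > 0).
    Returns (center, members) pairs; a point may belong to several canopies."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]  # any remaining point works; take the first
        dists = [math.dist(center, p) for p in remaining]
        members = [p for p, d in zip(remaining, dists) if d < t1]
        # Points within t2 (including the center itself) can no longer start a canopy.
        remaining = [p for p, d in zip(remaining, dists) if d >= t2]
        canopies.append((center, members))
    return canopies


def kmeans_from_canopies(points, t1, t2, iterations=20):
    """k-means whose initial centroids (and hence k) come from the canopy centers."""
    centroids = [list(center) for center, _ in canopy(points, t1, t2)]
    k = len(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(len(canopy(points, t1=5.0, t2=2.0)))           # prints 2
print(kmeans_from_canopies(points, t1=5.0, t2=2.0))  # prints [[0.0, 0.5], [10.0, 10.5]]
```

Because the center is always within T2 of itself, each pass removes at least one point, so the canopy loop always terminates.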
