kMeans-Canopy


References on k-means clustering:

     wikipedia: http://en.wikipedia.org/wiki/K-means_clustering

     visualizations: http://siebn.de/other/yakmeans/

                         http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

     slides from Tan et al., Introduction to Data Mining: chap8_basic_cluster_analysis.ppt

 

How to map-reduce k-means:

     MapReducefor-k-Means.pdf

     kMeanAlgo-MRChoices.ppt
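
A minimal sketch of one k-means iteration phrased as a map step and a reduce step. This is one common decomposition, not necessarily the one chosen in the documents above. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) are hypothetical, and the in-memory grouping only stands in for the shuffle phase of a real MapReduce framework.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(point, centroids):
    """Map: emit (cluster id, (point, 1)) for the nearest centroid."""
    yield nearest(point, centroids), (point, 1)


def kmeans_reduce(cluster_id, values):
    """Reduce: average all points assigned to one cluster id."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for vec, n in values:
        for d in range(dims):
            total[d] += vec[d]
        count += n
    return cluster_id, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the framework's shuffle
    for p in points:
        for key, value in kmeans_map(p, centroids):
            groups[key].append(value)
    new_centroids = list(centroids)  # clusters with no points keep their centroid
    for key, values in groups.items():
        _, new_centroids[key] = kmeans_reduce(key, values)
    return new_centroids
```

The driver would call `kmeans_iteration` repeatedly until the centroids stop moving. Emitting `(point, 1)` pairs rather than bare points is what would let a combiner pre-sum partial results on each mapper.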

 

References on canopy clustering:

     original paper on canopy clustering: canopy-kdd00.pdf

     Mahout entry on canopy clustering: https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
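
A rough single-machine sketch of the canopy idea, assuming the usual two-threshold formulation: a loose threshold `t1` and a tight threshold `t2`, with `t1 > t2`. The function name `canopy_clusters` and its parameters are hypothetical, and this is not Mahout's implementation. Here each center is compared against all points, so canopies may overlap.

```python
def canopy_clusters(points, t1, t2, distance):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold (canopy membership); t2 is the tight
    threshold (removal from the candidate list). Requires t1 > t2 >= 0.
    """
    if not (t1 > t2 >= 0):
        raise ValueError("expected t1 > t2 >= 0")
    remaining = list(points)  # candidates that can still become centers
    canopies = []
    while remaining:
        center = remaining[0]
        # every point within the loose threshold joins this canopy
        members = [p for p in points if distance(center, p) <= t1]
        canopies.append((center, members))
        # points within the tight threshold can no longer start a canopy;
        # the center itself is at distance 0, so the list always shrinks
        remaining = [p for p in remaining if distance(center, p) > t2]
    return canopies
```

The resulting canopy centers are often used as initial centroids for k-means, with `distance` being a cheap approximate metric.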

 

How to get a random sample of k elements from a streaming input of unknown length:

     Slides from Stanford course on big data mining: 15-streams.pdf (see slides 13-15)
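
One standard technique for this problem is reservoir sampling. The sketch below is a minimal version of it, with the function name `reservoir_sample` and the optional `rng` parameter chosen here for illustration; it is not taken from the slides.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    The stream is read once and its length need not be known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # the first k items fill the reservoir directly
            reservoir.append(item)
        else:
            # keep the n-th item with probability k/n, replacing a
            # uniformly chosen slot
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

After n items have been seen, each of them is in the reservoir with probability k/n. If the stream has fewer than k items, all of them are returned.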