Model-based clustering ◼ Algorithm optimizes a probabilistic model criterion ◼ Clustering is usually done by the Expectation Maximization (EM) algorithm ◆ Gives a soft variant of the K-means algorithm ◆ Assume k clusters: {c1, c2, …, ck} ◆ Assume a probabilistic model of categories that allows computing P(ci | E) for each category, ci, for a given example, E. ◆ Parameters θ = {P(ci), P(wj | ci): i ∈ {1,…,k}, j ∈ {1,…,|V|}}
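Under this model, the posterior P(ci | E) for a bag-of-words example E follows from Bayes' rule: P(ci | E) ∝ P(ci) · Π_j P(wj | ci). A minimal sketch with hypothetical toy parameters (the numbers below are invented for illustration, not from the slides):

```python
import math

# Hypothetical toy parameters theta = {P(ci), P(wj | ci)} with k = 2, |V| = 3.
priors = [0.6, 0.4]            # P(c1), P(c2)
word_probs = [
    [0.5, 0.3, 0.2],           # P(wj | c1)
    [0.2, 0.3, 0.5],           # P(wj | c2)
]

def posterior(doc, priors, word_probs):
    """P(ci | E) ∝ P(ci) * prod_j P(wj | ci); doc is a list of word indices."""
    logs = [math.log(p_c) + sum(math.log(p_w[w]) for w in doc)
            for p_c, p_w in zip(priors, word_probs)]
    m = max(logs)                       # log-sum-exp for numerical stability
    exps = [math.exp(l - m) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

print(posterior([0, 0, 2], priors, word_probs))
```

Working in log space avoids underflow when documents contain many words, since the product of many small probabilities quickly rounds to zero in floating point.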
Expectation Maximization (EM) Algorithm ◼ Iterative method for learning a probabilistic categorization model from unsupervised data. ◼ Initially assume a random assignment of examples to categories. ◼ Learn an initial probabilistic model by estimating model parameters θ from this randomly labeled data. ◼ Iterate the following two steps until convergence: ◆ Expectation (E-step): Compute P(ci | E) for each example given the current model, and probabilistically re-label the examples based on these posterior probability estimates. ◆ Maximization (M-step): Re-estimate the model parameters, θ, from the probabilistically re-labeled data.
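The full loop can be sketched for bag-of-words documents as below. This is a minimal illustration, not a production implementation: the documents and cluster count are invented toy data, Laplace smoothing is added to keep word probabilities nonzero, and a fixed iteration count stands in for a proper convergence test.

```python
import random, math

random.seed(0)
docs = [[0, 0, 1], [0, 1, 0], [2, 3, 3], [3, 3, 2]]  # toy docs as word indices
k, V = 2, 4                                          # clusters, vocabulary size

# Initially assume a random soft assignment of examples to categories.
resp = []
for _ in docs:
    row = [random.random() for _ in range(k)]
    s = sum(row)
    resp.append([r / s for r in row])

for _ in range(50):  # iterate until "convergence" (fixed count here)
    # M-step: re-estimate theta = {P(ci), P(wj|ci)} from the soft labels.
    # On the first pass this learns the initial model from the random labels.
    priors = [sum(resp[d][i] for d in range(len(docs))) / len(docs)
              for i in range(k)]
    word_probs = []
    for i in range(k):
        counts = [1.0] * V                 # Laplace smoothing (add-one)
        for d, doc in enumerate(docs):
            for w in doc:
                counts[w] += resp[d][i]
        total = sum(counts)
        word_probs.append([c / total for c in counts])

    # E-step: compute P(ci | E) under the current model and re-label.
    new_resp = []
    for doc in docs:
        logs = [math.log(priors[i]) +
                sum(math.log(word_probs[i][w]) for w in doc)
                for i in range(k)]
        m = max(logs)                      # log-sum-exp normalization
        exps = [math.exp(l - m) for l in logs]
        z = sum(exps)
        new_resp.append([e / z for e in exps])
    resp = new_resp

print(resp)
```

Because the responsibilities are probabilities rather than hard labels, each example contributes fractionally to every cluster's parameter estimates, which is exactly what makes this a soft variant of K-means.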