Think about it…
◼ How to represent text documents and categories?
  ◼ Vectors & Regions
  ◼ String & Language (models)
◼ How to build categorization functions?
  ◼ Closeness/Similarity to regions
  ◼ Probability to generate the string/language model
K-Nearest Neighbors
Classes in a Vector Space
[Figure: the classes Government, Science, and Arts in a vector space]
Classification Using Vector Spaces
◼ Each training doc is a point (vector) labeled by its topic (= class)
◼ Hypothesis: docs of the same class form a contiguous region of space
◼ We define surfaces to delineate classes in space
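To make "each training doc is a point (vector)" concrete, here is a minimal sketch in Python. It assumes a plain term-count representation and cosine similarity as the closeness measure. The slides do not name either choice, and the toy documents and the helper names `to_vector` and `cosine` are made up for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a document to a sparse term-count vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical training docs: each becomes a labeled point in term space.
training = [
    ("senate passes budget bill", "Government"),
    ("parliament votes on tax bill", "Government"),
    ("telescope detects distant galaxy", "Science"),
    ("museum opens painting exhibition", "Arts"),
]
labeled_vectors = [(to_vector(text), label) for text, label in training]
```

Under the contiguity hypothesis, two Government docs should be closer to each other than a Government doc is to a Science doc. In this toy data, the first two docs share the term "bill", while the first and third share no terms at all.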
Test Document = Government
[Figure: the test document among the classes Government, Science, and Arts]
◼ Is the similarity hypothesis true in general?
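Assigning a test document to the class of its nearest training points can be sketched as a k-nearest-neighbors majority vote. This is only an illustration: the term-count vectors, cosine similarity, `k=3`, the toy corpus, and the function name `knn_classify` are all assumptions that do not come from the slides.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a document to a sparse term-count vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(test_text, training, k=3):
    """Return the majority class among the k training docs most similar to test_text."""
    test_vec = to_vector(test_text)
    scored = sorted(
        ((cosine(test_vec, to_vector(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training documents.
training = [
    ("senate passes budget bill", "Government"),
    ("parliament votes on tax bill", "Government"),
    ("president signs budget law", "Government"),
    ("telescope detects distant galaxy", "Science"),
    ("lab measures particle mass", "Science"),
    ("museum opens painting exhibition", "Arts"),
    ("orchestra performs new symphony", "Arts"),
]

print(knn_classify("senate votes on budget law", training))
```

On this toy corpus, the three nearest neighbors of the test document are all Government docs, so the vote returns "Government". Whether real collections satisfy the similarity hypothesis this cleanly is exactly the question the slide raises.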