Similarity Metrics ◼ Nearest neighbor method depends on a similarity (or distance) metric. ◼ Simplest for continuous m-dimensional instance space is Euclidean distance. ◼ Simplest for m-dimensional binary instance space is Hamming distance (number of feature values that differ). ◼ For text, cosine similarity of tf.idf-weighted vectors is typically most effective.
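These three metrics are simple enough to write out directly. The following is a minimal, illustrative Python sketch (function names and the plain-list vector representation are assumptions for illustration, not part of any particular library):

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two continuous m-dimensional vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def hamming_distance(x, y):
    """Number of positions at which two binary feature vectors differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def cosine_similarity(x, y):
    """Cosine of the angle between two (e.g. tf.idf-weighted) vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0
```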
Illustration of 3 Nearest Neighbor for Text Vector Space
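Since the original figure is not reproduced here, the following hypothetical Python sketch shows what the illustrated 3-nearest-neighbor decision amounts to: score every training vector against the test vector (reusing cosine_similarity from the sketch above) and take a majority vote over the top 3. The function and variable names are assumptions for illustration only.

```python
from collections import Counter

def knn_classify(test_vec, train_vecs, train_labels, k=3, sim=cosine_similarity):
    """Label a test document by majority vote over its k most similar training documents."""
    # Score every training document against the test document.
    scored = sorted(
        ((sim(test_vec, v), label) for v, label in zip(train_vecs, train_labels)),
        reverse=True,
    )
    # Majority vote over the k nearest neighbors.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```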
Nearest Neighbor with Inverted Index ◼ Naively finding nearest neighbors requires a linear search through |D| documents in the collection. ◼ But determining the k nearest neighbors is the same as determining the top-k best retrievals using the test document as a query to a database of training documents. ◼ Use standard vector space inverted index methods to find the k nearest neighbors. ◼ Testing time: O(B·|Vt|), where Vt is the set of distinct words in the test document and B is the average number of training documents in which a test-document word appears. ◼ Typically B << |D|.
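A minimal sketch of this idea, assuming tf.idf-weighted postings of the form (doc_id, weight) and length-normalized vectors (so accumulated dot products equal cosine similarities); the data structures and names are assumptions for illustration, not a specific library API:

```python
from collections import defaultdict
import heapq

def knn_via_inverted_index(test_doc_weights, index, k=3):
    """Find the k training documents most similar to the test document.

    test_doc_weights: dict mapping term -> tf.idf weight in the test document.
    index: inverted index mapping term -> list of (training_doc_id, tf.idf weight).
    Only training documents sharing at least one term with the test document are
    ever touched, so the cost is O(B * |Vt|) rather than a scan over all |D| documents.
    """
    scores = defaultdict(float)
    for term, w_test in test_doc_weights.items():
        for doc_id, w_train in index.get(term, []):
            scores[doc_id] += w_test * w_train  # accumulate dot-product contributions
    # Top-k retrieval by accumulated score (equals cosine similarity for normalized vectors).
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```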
kNN: Discussion ◼ No training necessary ◼ No feature selection necessary ◼ Scales well with large number of classes ◼ Don’t need to train n classifiers for n classes ◼ Classes can influence each other ◼ Small changes to one class can have ripple effect ◼ Scores can be hard to convert to probabilities
Naïve Bayes