Structured + Unstructured Data

Key idea: optimal parse of an (unstructured) offer with respect to the product specification

Semantic parse of offers: tagging, plausible parse
– Plausible parse: a combination of tags such that each attribute has a distinct value
– The number of plausible parses depends on ambiguities

Example offer: Panasonic Lumix DMC-FX07 digital camera 7.2 megapixel, 2.5”, 3.6x, LCD monitor
Tags: Panasonic (brand), Lumix (product line), DMC-FX07 (model), 7.2 megapixel (resolution), 2.5” (diagonal, height, or width), 3.6x (zoom), LCD monitor (display type)
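To make "plausible parse" concrete, here is a minimal sketch that enumerates every combination of tags in which no attribute receives two values. It assumes each tagged token takes exactly one of its candidate attributes and that untagged words such as "digital camera" are simply left out; the names `plausible_parses` and `CANDIDATE_TAGS` are hypothetical.

```python
from itertools import product

# Hypothetical tagging of the example offer: each token maps to the
# attributes it could denote. Only "2.5”" is ambiguous here.
CANDIDATE_TAGS = {
    "Panasonic": ["brand"],
    "Lumix": ["product line"],
    "DMC-FX07": ["model"],
    "7.2 megapixel": ["resolution"],
    "2.5”": ["diagonal", "height", "width"],
    "3.6x": ["zoom"],
    "LCD monitor": ["display type"],
}

def plausible_parses(candidate_tags):
    """Yield attribute -> token maps in which every attribute is used at most once."""
    tokens = list(candidate_tags)
    for choice in product(*(candidate_tags[t] for t in tokens)):
        if len(set(choice)) == len(choice):  # reject parses that reuse an attribute
            yield dict(zip(choice, tokens))

parses = list(plausible_parses(CANDIDATE_TAGS))
print(len(parses))  # one parse per reading of "2.5”"
```

The count grows with the ambiguity: if two tokens could each be "height" or "width", only the two assignments that give them different attributes survive, out of four combinations.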
Semantic parse of offers: tagging, plausible parse, optimal parse
– The optimal parse depends on the product specification

Example: the same offer parsed against two product specifications
Product specification 1: brand Panasonic, product line Lumix, model DMC-FX05, diagonal 2.5 in
Product specification 2: brand Panasonic, model DMC-FX07, resolution 7.2 megapixel, zoom 3.6x
[Figure: optimal parse of the offer "Panasonic Lumix DMC-FX07 digital camera 7.2 megapixel, 2.5”, 3.6x, LCD monitor" with respect to each specification]
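One way to see why the optimal parse depends on the specification: pick, among the plausible parses, the one that agrees best with the specification at hand. The scoring rule below (count the attributes whose parsed value equals the specification's value) is an assumption for illustration, not necessarily the scoring used in the original work, and it assumes values are already normalised (e.g. 2.5” written as "2.5 in"). All names are hypothetical.

```python
from itertools import product

def plausible_parses(candidate_tags):
    """Yield attribute -> token maps in which every attribute is used at most once."""
    tokens = list(candidate_tags)
    for choice in product(*(candidate_tags[t] for t in tokens)):
        if len(set(choice)) == len(choice):
            yield dict(zip(choice, tokens))

def agreement(parse, spec):
    """Hypothetical score: number of attributes whose parsed value equals the spec value."""
    return sum(1 for attr, value in parse.items() if spec.get(attr) == value)

def optimal_parse(candidate_tags, spec):
    """Plausible parse that agrees best with the given specification (first one on ties)."""
    return max(plausible_parses(candidate_tags), key=lambda p: agreement(p, spec))

# Candidate tags for the example offer, with values assumed to be normalised.
OFFER_TAGS = {
    "Panasonic": ["brand"],
    "Lumix": ["product line"],
    "DMC-FX07": ["model"],
    "7.2 megapixel": ["resolution"],
    "2.5 in": ["diagonal", "height", "width"],
    "3.6x": ["zoom"],
}

spec_1 = {"brand": "Panasonic", "product line": "Lumix",
          "model": "DMC-FX05", "diagonal": "2.5 in"}
parse_1 = optimal_parse(OFFER_TAGS, spec_1)
print(parse_1.get("diagonal"))  # 2.5 in
```

Against specification 1, reading 2.5” as the diagonal agrees on three attributes (brand, product line, diagonal) versus two for the height or width readings, so the diagonal reading wins; a specification that listed a 2.5 in height would instead select the height reading.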
Finding the specification with the largest match probability is now easy:
– Similarity feature vector between offer and specification: {-1, 0, 1}*
– Use binary logistic regression to learn the weight of each feature
– Blocking 1: use a classifier to categorize offers into product categories
– Blocking 2: identify candidates with ≥ 1 high-weighted feature
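The matching steps above can be sketched as follows, under several assumptions. Each feature is assumed to compare one attribute (+1 if offer and specification agree, -1 if both have it but disagree, 0 if either side lacks it). The training pairs and attribute list are made up. Logistic regression is fit with plain batch gradient descent. Blocking 2 is read as "keep specifications that agree with the offer on at least one of the top-weighted attributes". Blocking 1 (the category classifier) is omitted. All function names are hypothetical.

```python
import math

# Hypothetical attribute order for the feature vector.
ATTRIBUTES = ["brand", "model", "diagonal", "zoom"]

def feature_vector(parse, spec):
    """+1 if offer and spec agree on an attribute, -1 if both have it but
    disagree, 0 if either side is missing it."""
    vec = []
    for attr in ATTRIBUTES:
        if attr not in parse or attr not in spec:
            vec.append(0)
        elif parse[attr] == spec[attr]:
            vec.append(1)
        else:
            vec.append(-1)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(examples, lr=0.5, epochs=1000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n_features = len(examples[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in examples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b

def match_probability(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def blocking_candidates(parse, specs, w, top_k=1):
    """Blocking 2 (hypothetical rule): keep specs that agree with the offer
    on at least one of the top_k highest-weighted attributes."""
    top = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:top_k]
    return {name: spec for name, spec in specs.items()
            if any(feature_vector(parse, spec)[i] == 1 for i in top)}

# Hypothetical labelled pairs: (feature vector, 1 = same product, 0 = different).
TRAINING = [
    ([1, 1, 0, 1], 1), ([1, 1, 1, 0], 1), ([0, 1, 1, 1], 1),
    ([1, -1, 1, 0], 0), ([-1, -1, 0, 0], 0), ([1, -1, 0, -1], 0),
]

# Assumed optimal parse of the example offer (values normalised).
offer_parse = {"brand": "Panasonic", "product line": "Lumix", "model": "DMC-FX07",
               "resolution": "7.2 megapixel", "diagonal": "2.5 in", "zoom": "3.6x"}
specs = {
    "spec1": {"brand": "Panasonic", "product line": "Lumix",
              "model": "DMC-FX05", "diagonal": "2.5 in"},
    "spec2": {"brand": "Panasonic", "model": "DMC-FX07",
              "resolution": "7.2 megapixel", "zoom": "3.6x"},
}

w, b = train_logistic_regression(TRAINING)
candidates = blocking_candidates(offer_parse, specs, w)
best = max(candidates,
           key=lambda name: match_probability(feature_vector(offer_parse, specs[name]), w, b))
print(best)
```

Because the specification with the largest match probability is simply an argmax over candidates, the two blocking steps matter mainly for cost: they shrink the set of specifications that must be scored for each offer.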