[Figure: knowledge base logos: Freebase, DBpedia, Wikipedia, NELL, Probase, YAGO, The Knowledge Graph]
Our Solution
• World Knowledge enabled learning
  – Millions of entities and concepts
  – Billions of relationships
• Grounding texts to knowledge bases
[Figure: distribution of the 40,633,831 Wikipedia articles across language editions (as of 1 July 2016); legible slices: English (12.8%), Swedish (8.2%), Cebuano (6.5%), German (4.8%), Dutch (4.6%), Russian (3.3%), Italian (3.2%), Spanish (3.1%), Waray-Waray (3.1%), other (46%)]
[Figure: dataless classification pipeline: label names and documents are mapped to the same space using (cross-lingual) world knowledge, document and label similarities are computed, and labels are chosen]
[Figure: plot with axis "Size of shared CLESA title space"]
Classification without Supervision
• Label names carry a lot of information
  – We can use world knowledge as features
  – Classify documents into English labels
  – 179 languages with Wikipedia
• July 15 08:30–09:55:
  – Machine Learning 19: Classification 2
M. Chang, L. Ratinov, D. Roth, V. Srikumar: Importance of Semantic Representation: Dataless Classification. AAAI’08.
Y. Song, D. Roth: On dataless hierarchical text classification. AAAI’14.
Y. Song, D. Roth: Unsupervised Sparse Vector Densification for Short Text Similarity. HLT-NAACL’15.
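The idea on this slide (embed label names and documents in one knowledge-derived space, compare them, pick the closest label) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-entry `CONCEPTS` table is a toy stand-in for a Wikipedia concept index, the word-overlap weighting in `embed` is a simplification rather than the weighting used in the cited papers, and the function names are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a Wikipedia-derived concept index (assumed, not real data):
# each concept (article title) maps to words associated with it.
CONCEPTS = {
    "Basketball": {"sports", "game", "team", "player", "court"},
    "Football": {"sports", "game", "team", "player", "goal"},
    "Stock market": {"finance", "stock", "market", "shares", "investor"},
    "Bank": {"finance", "bank", "money", "loan", "investor"},
}


def embed(text):
    """Map text to a sparse vector over concepts (a simplified ESA-like representation)."""
    words = Counter(text.lower().split())
    vec = {}
    for concept, vocab in CONCEPTS.items():
        # Weight of a concept = number of word occurrences that the concept covers.
        weight = sum(count for word, count in words.items() if word in vocab)
        if weight:
            vec[concept] = float(weight)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(document, label_names):
    """Pick the label whose name is most similar to the document; no training data is used."""
    doc_vec = embed(document)
    scores = {label: cosine(doc_vec, embed(label)) for label in label_names}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    labels = ["sports", "finance"]
    print(classify("the team won the game after a late goal", labels))
    print(classify("investor sold shares as the stock fell", labels))
```

No labeled documents appear anywhere in the sketch: the only supervision is the label names themselves plus the concept table, which is the sense in which the classification is "dataless".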
[Figure: logos of structured world knowledge bases: Freebase, NELL, Wikipedia, Probase, DBpedia, YAGO]
This Talk: Structured World Knowledge Enabled Learning and Text Mining
With help of machine learning algorithms
[Document similarity in ICDM’15] [Document clustering in KDD’15] [Document classification in AAAI’16] [Item recommendation, ongoing]
Different domains: tweets, blogs, websites, medical, psychology
More general and effective machine learning/data mining
[Relation clustering in IJCAI’15] [Similarity search in SDM’16] [Paraphrasing in ACL’13] [Data type refinement, ongoing]
Structured world knowledge bases
Outline
• Motivation
  – Two Challenges
    • Representation
    • Labels
• Text Categorization via HIN
  – HIN construction from texts
  – From HIN similarity to clustering and classification
  – World knowledge indirect supervision
• Conclusions and future work
[Figure: two example texts and the HIN built from them, with nodes such as Obama, Bush, United States, Springfield, Illinois, and Old State Capitol]
Example texts:
"On Feb 10, 2007, Obama announced his candidacy for President of the United States in front of the Old State Capitol located in Springfield, Illinois."
"Bush portrayed himself as a compassionate conservative, implying he was more suitable than other Republicans to go to lead the United States."
Text Categorization via HIN
• How to convert unstructured texts to HINs?
• What can we do with the HINs?