Copyright © 2014 Pearson Education, Inc.
Slide 5-6: A High-Level Depiction of IBM Watson's DeepQA Architecture
[Figure: pipeline diagram with step markers 1 to 5. Main flow: Question, Question analysis, Query decomposition, Hypothesis generation, Soft filtering, Hypothesis and evidence scoring, Synthesis, Final merging and ranking, Answer and confidence. The Hypothesis generation, Soft filtering, and Hypothesis and evidence scoring stages are repeated across parallel branches. Supporting components: Answer sources (Primary search, Candidate answer generation), Evidence sources (Support evidence retrieval, Deep evidence scoring), and Trained models.]
Slide 5-7: Text Mining Concepts
▪ 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
▪ Unstructured corporate data is doubling in size every 18 months
▪ Tapping into these information sources is not optional; it is necessary to stay competitive
▪ Answer: text mining
▪ A semi-automated process of extracting knowledge from unstructured data sources
▪ a.k.a. text data mining or knowledge discovery in textual databases
Slide 5-8: Data Mining versus Text Mining
▪ Both seek novel and useful patterns
▪ Both are semi-automated processes
▪ The difference is the nature of the data:
▪ Structured versus unstructured data
▪ Structured data: in databases
▪ Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
▪ Text mining: first, impose structure on the data, then mine the structured data
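The two-step idea on this slide (impose structure, then mine) can be illustrated with a minimal Python sketch. It is not taken from the slides: the helper names `tokenize`, `term_document_matrix`, and `cosine`, the sample documents, and the choice of a bag-of-words matrix with cosine similarity are all assumptions made for illustration. Real text mining tools typically add steps such as stop-word removal and stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Step 1, impose structure: one row per document, one column per term."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def cosine(u, v):
    """Step 2, mine the structured data: similarity of two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sample documents (illustrative only).
docs = [
    "Court orders cite earlier court orders.",
    "Quarterly reports summarize quarterly revenue.",
    "The court issued new orders.",
]

vocab, matrix = term_document_matrix(docs)
sim_legal = cosine(matrix[0], matrix[2])    # two documents about court orders
sim_mixed = cosine(matrix[0], matrix[1])    # a legal and a finance document
print(vocab)
print(sim_legal > sim_mixed)
```

Once the text is in matrix form, any technique that works on structured data (clustering, classification, association rules) can be applied to the rows. Cosine similarity is used here only because it is short to write.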
Slide 5-9: Text Mining Concepts
▪ Benefits of text mining are obvious, especially in text-rich data environments
▪ e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.
▪ Electronic communication records (e.g., Email)
▪ Spam filtering
▪ Email prioritization and categorization
▪ Automatic response generation
Slide 5-10: Text Mining Application Areas
▪ Information extraction
▪ Topic tracking
▪ Summarization
▪ Categorization
▪ Clustering
▪ Concept linking
▪ Question answering