SGWrap System
⚫ SGWrap = Schema Guided Wrapper Generation
[Figure: the user interacts with the SGWrap System, which generates a Wrapper Program; the Wrapper Program runs on HTML pages and outputs data.]
SGWrap System
⚫ SGWrap consists of three main parts:
  ○ SGWrap Runtime (Runtime, for short) provides the services that give access to our algorithms for web page content extraction. It is the underlying functional layer of the whole system: to reuse or integrate a wrapper, you must also reuse or integrate the Runtime.
  ○ SGWrap Compiler (Compiler, for short) compiles SGWrap rules into a wrapper, in both source code form and bytecode form. It works like a translator: the generated source code is human-readable and can be modified to meet special needs. The bytecode is produced with Java's compiler, javac.exe.
  ○ Visual SGWrap is a visual tool for generating rules. The user interacts with it through simple selecting and clicking operations, and it then computes the appropriate rules.
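The slide says the Compiler emits wrapper source code and then relies on Java's compiler for the bytecode. The slides do not show how SGWrap invokes javac, so the following is only a minimal sketch of that general pattern using the standard javax.tools API. The class name DemoWrapper, its extract() method, and the helper names compile and call are hypothetical and are not part of SGWrap.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileSketch {

    /** Writes source to dir/className.java, compiles it, and returns the javac exit code (0 = success). */
    static int compile(Path dir, String className, String source) throws Exception {
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        // getSystemJavaCompiler() returns null when only a JRE (no JDK) is available.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK is required to compile generated source");
        }
        // Equivalent to running: javac -d <dir> <file>
        return javac.run(null, null, null, "-d", dir.toString(), file.toString());
    }

    /** Loads the compiled class from dir and invokes its public static no-argument method. */
    static Object call(Path dir, String className, String method) throws Exception {
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Method m = loader.loadClass(className).getMethod(method);
            return m.invoke(null);
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("wrapper");
        // Hypothetical stand-in for generated wrapper source; a real wrapper would extract page content.
        String source = "public class DemoWrapper {"
                + " public static String extract() { return \"data\"; } }";
        System.out.println("javac exit code: " + compile(dir, "DemoWrapper", source));
        System.out.println("wrapper output: " + call(dir, "DemoWrapper", "extract"));
    }
}
```

Keeping the source form as an intermediate step is what makes the generated wrapper editable by hand before it is compiled.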
SGWrap System – basic usage
[Screenshot: Visual SGWrap with the page D:\Robots.htm loaded. The Schema/Rule panel offers the actions Open DTD, Add Mapping, Remove Mapping, Generate Rule, and Save Rules. Under the "Web robots" schema, a data item is shown with DataPath /HTML/BODY/TABLE/TBODY/TR[1]/TD[0]/A, MetaData [None], and Function <none>.]
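The DataPath in the screenshot, /HTML/BODY/TABLE/TBODY/TR[1]/TD[0]/A, reads as a walk down the page's element tree. The slides do not define the path semantics, so the sketch below assumes that each step names a child element and that [k] is a 0-based index among same-named siblings (suggested by TD[0]). It uses the standard Java DOM parser on a small well-formed sample page rather than the SGWrap Runtime, whose API is not shown here; DataPathSketch, child, extract, and parse are hypothetical names, and the sample rows are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DataPathSketch {

    /** Returns the index-th (0-based) child element of parent with the given tag name, or null. */
    static Element child(Element parent, String tag, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equalsIgnoreCase(tag)) {
                if (seen == index) {
                    return (Element) n;
                }
                seen++;
            }
        }
        return null;
    }

    /** Walks a path such as /HTML/BODY/TABLE/TBODY/TR[1]/TD[0]/A; returns the target's text, or null. */
    static String extract(Document doc, String path) {
        String[] steps = path.substring(1).split("/"); // drop the leading '/'
        Element current = null;
        for (int i = 0; i < steps.length; i++) {
            String tag = steps[i];
            int index = 0; // assumption: a step without [k] means the first match
            int open = tag.indexOf('[');
            if (open >= 0) {
                index = Integer.parseInt(tag.substring(open + 1, tag.length() - 1));
                tag = tag.substring(0, open);
            }
            if (i == 0) {
                // The first step must match the document's root element.
                Element root = doc.getDocumentElement();
                current = (index == 0 && root.getNodeName().equalsIgnoreCase(tag)) ? root : null;
            } else {
                current = child(current, tag, index);
            }
            if (current == null) {
                return null;
            }
        }
        return current.getTextContent().trim();
    }

    /** Parses a well-formed (XHTML-like) page; real-world HTML would need a tolerant parser. */
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<HTML><BODY><TABLE><TBODY>"
                + "<TR><TD><A>First robot</A></TD><TD>indexing</TD></TR>"
                + "<TR><TD><A>Second robot</A></TD><TD>maintenance</TD></TR>"
                + "</TBODY></TABLE></BODY></HTML>";
        Document doc = parse(page);
        // Second row (TR[1]), first cell (TD[0]), link text.
        System.out.println(extract(doc, "/HTML/BODY/TABLE/TBODY/TR[1]/TD[0]/A")); // prints "Second robot"
    }
}
```

A path of this kind is brittle when the page layout changes, which is one reason the generated rules are tested against real pages before a wrapper is applied.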
SGWrap System – basic usage
⚫ 3 steps:
  ○ Design rules using Visual SGWrap.
  ○ Compile the rules into a program using SGWrapC.
  ○ Test and apply the wrapper using SGWrap (Runtime).
⚫ A tutorial is available at http://idke.ruc.edu.cn/sgwrap/doc/A-10-Minutes-Tutorial.html (also included in the documentation of each installation).
Welcome to http://idke.ruc.edu.cn/sgwrap
[Screenshot: the SGWrap (Schema Guided Wrapper Generation) System homepage, with the sections Introduction, News, Updates, Download, Document, Background, History, Publications, Developer, Contact, and Acknowledgement.]
What is SGWrap System: Schema Guided Wrapper Generation System (SGWrap, for short) is a toolkit for web page information extraction. It semi-automatically generates programs called wrappers, built from extraction rules, through user interaction. A wrapper for a set of web pages is a program that extracts content from the pages and outputs structured data for further processing. A wrapper for a given set of pages, materialized as a Java program by the SGWrap system, can be easily generated with the system's Visual SGWrap tool and can be reused or integrated in many information systems.