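Key Points of This Section
• Regular expressions are commonly used for text matching:
  – searching for strings
  – replacing strings
  – breaking the input into a sequence of tokens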
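The three uses above can be sketched in Python with the standard `re` module. This is a minimal illustration, not taken from the slides: the sample input and the token kinds `NUM`, `ID`, `OP` and `WS` are hypothetical choices made for the example.

```python
import re

text = "x = 42; y = x + 7"

# Searching: locate the first run of digits.
first_number = re.search(r"\d+", text)
print(first_number.group(), first_number.start())  # 42 4

# Replacing: rename the variable x (whole word only) to count.
renamed = re.sub(r"\bx\b", "count", text)
print(renamed)  # count = 42; y = count + 7

# Tokenizing: one named group per (hypothetical) token kind; WS is skipped.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[=+;])|(?P<WS>\s+)")

def tokenize(s):
    """Split s into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)  # anchored at pos, unlike search()
        if m is None:
            raise ValueError(f"unexpected character {s[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize(text))
```

Using `match(s, pos)` in the tokenizer loop ensures each token starts exactly where the previous one ended, so an unrecognized character raises an error instead of being silently skipped.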
Applications of Regular Expressions
• Use #1: Text-processing the web
  – The web is full of data, but it's in text form, meant for humans to read
• Screenscraping
  – extracting the data you want from screen output
  – these days, the output format is HTML
• Examples:
  – extract the tour schedule of your favorite bands from Ticketmaster
  – web sites as web services: convert an address to geo coordinates
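Screenscraping with a regular expression might look like the sketch below. The HTML fragment, its `class` names and the dates are all hypothetical; a real ticketing site uses its own markup, and the pattern has to be written against whatever that markup actually is.

```python
import re

# Hypothetical fragment of a concert-listing page (not real site markup).
html = """
<ul class="events">
  <li><span class="date">2024-03-02</span> <span class="city">Chicago</span></li>
  <li><span class="date">2024-03-05</span> <span class="city">Denver</span></li>
</ul>
"""

# Capture the date and the city from each event entry.
EVENT_RE = re.compile(
    r'<span class="date">(\d{4}-\d{2}-\d{2})</span>\s*'
    r'<span class="city">([^<]+)</span>'
)

def extract_events(page):
    """Return (date, city) pairs found in the page."""
    return EVENT_RE.findall(page)

print(extract_events(html))
# [('2024-03-02', 'Chicago'), ('2024-03-05', 'Denver')]
```

This works when the markup is narrow and predictable; for arbitrary HTML, an HTML parser such as Python's `html.parser` is usually more robust than a regex.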
Applications of Regular Expressions
• Use #2: Text processing in general
  – a spectrum of uses, from small to big
• Small fixes:
  – replacing "ugly quotes" with “smart quotes”
  – converting files between operating systems
• Bigger tasks:
  – spell checking formatted documents (HTML): must extract the text first
  – pretty printing code: find comments, etc.; add format directives
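The two small fixes can each be a single `re.sub` call. This is a minimal sketch under stated assumptions: `smarten_quotes` assumes straight double quotes come in balanced, non-nested pairs, and `to_unix_newlines` treats "converting between operating systems" as normalizing line endings to LF; both function names are hypothetical.

```python
import re

def smarten_quotes(text):
    # Replace each pair of straight double quotes with curly ones.
    # Assumes quotes are balanced and not nested.
    return re.sub(r'"([^"]*)"', "\u201c\\1\u201d", text)

def to_unix_newlines(text):
    # Convert Windows (CRLF) and classic Mac (CR) line endings to LF.
    return re.sub(r"\r\n?", "\n", text)

print(smarten_quotes('She said "hello".'))  # She said “hello”.
print(repr(to_unix_newlines("a\r\nb\rc\n")))  # 'a\nb\nc\n'
```

The group reference `\1` carries the quoted text into the replacement, so only the quote characters change.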
Applications of Regular Expressions
• Use #3: Program processing
  – especially on the web
• On the Web, procedure calls = HTTP requests
  – “procedure arguments” are passed as strings
  – argument extraction can be done with regular expressions
• Other uses:
  – extract the components of an email address
  – obfuscation: to obfuscate all JS functions except those called from scripts embedded in HTML, scan the web page for the names of functions called from HTML and leave those unobfuscated
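The three program-processing tasks above can be sketched as follows. Everything here is illustrative: the URL, the parameter names and the helper names are hypothetical, the email pattern is a deliberately loose approximation rather than a full RFC 5322 grammar, and the handler scan only finds the first function called in inline `on...="..."` attributes.

```python
import re

# Argument extraction: name=value pairs from a URL query string.
QUERY_PARAM_RE = re.compile(r"[?&]([^=&#]+)=([^&#]*)")

def query_args(url):
    # Values are NOT percent-decoded; urllib.parse.parse_qs does that.
    return dict(QUERY_PARAM_RE.findall(url))

# Email components: a loose user@domain.tld pattern (not RFC-complete).
EMAIL_RE = re.compile(r"^(?P<user>[^@\s]+)@(?P<domain>[^@\s]+\.[A-Za-z]{2,})$")

def email_parts(address):
    m = EMAIL_RE.match(address)
    return m.groupdict() if m else None

# Obfuscation helper: names of functions called from inline event handlers,
# e.g. onclick="play(3)". Only the first call in each attribute is found.
HANDLER_RE = re.compile(r'\bon[a-z]+\s*=\s*["\']\s*([A-Za-z_$][\w$]*)\s*\(')

def handler_functions(html):
    return set(HANDLER_RE.findall(html))

print(query_args("/tour?band=example&city=Boston"))
print(email_parts("alice@example.com"))
print(handler_functions('<button onclick="play(3)">Play</button>'))
```

For production query parsing, the standard library's `urllib.parse` handles decoding and repeated keys; the regex version only shows the idea.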
Regular Expression Tutorial
• Focus on two languages:
  – JavaScript
  – Python
A key rule common to both: given a string and a regex, find the first position in the string where a match is possible (except for the match() function in Python, which must match at the beginning of the string).
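The leftmost-match rule can be seen directly in Python. In this sketch, `re.search` scans the string from left to right and stops at the earliest position where the pattern can match, even when a longer match exists further on, while `re.match` only tries position 0. JavaScript's `RegExp.prototype.exec` and `String.prototype.search` apply the same leftmost rule; the example uses Python only, and the sample string is arbitrary.

```python
import re

s = "cabbb"

# At index 1, "b+" fails but the alternative "a" succeeds, so search() stops
# there, even though "bbb" later in the string would be a longer match.
m = re.search(r"b+|a", s)
print(m.group(), m.start())  # a 1

# match() only tries position 0 ("c"), where neither alternative matches.
print(re.match(r"b+|a", s))  # None

# Once a start position is found, the greedy + takes as many b's as it can.
print(re.search(r"b+", s).span())  # (2, 5)
```

The last line shows the two rules working together: the earliest start wins first, and only then does greediness decide how far the match extends.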