Terminology of Languages
• Alphabet: a finite set of symbols (e.g., the ASCII characters)
• String:
  – A finite sequence of symbols drawn from an alphabet
  – "Sentence" and "word" are also used as synonyms for "string"
  – $\varepsilon$ is the empty string
  – $|s|$ is the length of string $s$
• Language: a set of strings over some fixed alphabet
  – $\emptyset$, the empty set, is a language
  – $\{\varepsilon\}$, the set containing only the empty string, is a language
  – The set of well-formed C programs is a language
  – The set of all possible identifiers is a language
• Operators on strings:
  – Concatenation: $xy$ represents the concatenation of strings $x$ and $y$; $s\varepsilon = s$ and $\varepsilon s = s$
  – Exponentiation: $s^n = s s \cdots s$ ($n$ times), with $s^0 = \varepsilon$
CS308 Compiler Theory
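The string operators above can be tried out directly. The following is a minimal sketch in Python, assuming ordinary Python strings stand in for strings over an alphabet; the names `EPSILON` and `power` are illustrative and do not come from the slides.

```python
EPSILON = ""  # the empty string ε (illustrative name)


def power(s: str, n: int) -> str:
    """Return s^n: s concatenated with itself n times, with s^0 = ε."""
    result = EPSILON
    for _ in range(n):
        result = result + s
    return result


x, y = "ab", "c"
print(x + y)         # concatenation xy
print(len(EPSILON))  # |ε|, the length of the empty string
print(power(x, 3))   # s^3
```

Concatenating with `EPSILON` on either side leaves a string unchanged, which mirrors the identities $s\varepsilon = s$ and $\varepsilon s = s$.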
Operations on Languages
• Concatenation:
  – $L_1 L_2 = \{ s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \}$
• Union:
  – $L_1 \cup L_2 = \{ s \mid s \in L_1 \text{ or } s \in L_2 \}$
• Exponentiation:
  – $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL$
• Kleene closure:
  – $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure:
  – $L^+ = \bigcup_{i=1}^{\infty} L^i$
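For finite languages, these operations can be sketched with Python sets of strings. The function names below are illustrative assumptions, not part of the slides. Because $L^*$ and $L^+$ are infinite whenever $L$ contains a non-empty string, the sketch only computes the finite unions up to a chosen bound `k`.

```python
def concat(l1, l2):
    """L1 L2 = { s1 s2 | s1 in L1 and s2 in L2 }."""
    return {s1 + s2 for s1 in l1 for s2 in l2}


def union(l1, l2):
    """L1 ∪ L2 = { s | s in L1 or s in L2 }."""
    return l1 | l2


def power(l, n):
    """L^n, with L^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, l)
    return result


def kleene_up_to(l, k):
    """Finite approximation of L*: the union of L^0 through L^k."""
    result = set()
    for i in range(k + 1):
        result |= power(l, i)
    return result


def positive_up_to(l, k):
    """Finite approximation of L+: the union of L^1 through L^k."""
    result = set()
    for i in range(1, k + 1):
        result |= power(l, i)
    return result


letters = {"a", "b"}
digits = {"0", "1"}
print(sorted(concat(letters, digits)))
print(sorted(kleene_up_to({"a"}, 2)))
```

The only difference between the two approximations is whether the $i = 0$ term is included, so $\varepsilon$ belongs to $L^+$ only if it already belongs to $L$.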
Regular Expressions (Rules)
Regular expressions over alphabet $\Sigma$:

Reg. Expr                  Language it denotes
$\varepsilon$              $\{\varepsilon\}$
$a \in \Sigma$             $\{a\}$
$(r_1) \mid (r_2)$         $L(r_1) \cup L(r_2)$
$(r_1)(r_2)$               $L(r_1) L(r_2)$
$(r)^*$                    $(L(r))^*$
$(r)$                      $L(r)$

• $(r)^+ = (r)(r)^*$
• $(r)? = (r) \mid \varepsilon$
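The two shorthand identities can be spot-checked with Python's standard `re` module, whose `|`, `*`, `+`, and `?` operators correspond to the ones in the table. This is only a sketch: the helper `same_language` and the sample strings are illustrative assumptions, and agreement on a handful of samples is a sanity check, not a proof of equivalence.

```python
import re


def same_language(p1, p2, samples):
    """Check that two patterns accept exactly the same strings among the samples."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s))
        for s in samples
    )


samples = ["", "ab", "abab", "ababab", "aba", "b"]

# (r)+ = (r)(r)*, here with r = ab
plus_ok = same_language(r"(?:ab)+", r"(?:ab)(?:ab)*", samples)

# (r)? = (r) | ε, where the empty alternative plays the role of ε
opt_ok = same_language(r"(?:ab)?", r"(?:ab)|", samples)

print(plus_ok, opt_ok)
```

`re.fullmatch` is used rather than `re.search` so that a pattern must describe the whole string, matching the notion of a string belonging to $L(r)$.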
Finite Automata
• A recognizer for a language is a program that takes a string $x$ and answers "yes" if $x$ is a sentence of that language, and "no" otherwise.
• We call the recognizer of tokens a finite automaton.
• A finite automaton can be deterministic (DFA) or non-deterministic (NFA).
• This means that we may use either a deterministic or a non-deterministic automaton as a lexical analyzer.
• Both deterministic and non-deterministic finite automata recognize regular sets.
• Which one?
  – Deterministic: faster recognizer, but it may take more space
  – Non-deterministic: slower, but it may take less space
  – Deterministic automata are widely used in lexical analyzers.
• First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.
  – Algorithm 1: Regular Expression → NFA → DFA (two steps: first to NFA, then to DFA)
  – Algorithm 2: Regular Expression → DFA (directly convert a regular expression into a DFA)
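To make the idea of a recognizer concrete, here is a minimal sketch of a table-driven DFA in Python. It assumes a hypothetical identifier token of the form letter (letter | digit)*, with underscore counted as a letter; the state names, the `char_class` helper, and the token definition are illustrative choices, not taken from the slides. The table is written by hand here rather than produced by either conversion algorithm.

```python
def char_class(c):
    """Map an input character to the symbol class used by the transition table."""
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"


# Deterministic transition table: at most one next state per (state, symbol class).
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}


def recognizes(x):
    """Answer True ("yes") if x is an identifier, False ("no") otherwise."""
    state = "start"
    for c in x:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


print(recognizes("count1"), recognizes("1count"))
```

Each input character costs one table lookup, which is the sense in which a deterministic recognizer is fast; the space cost lies in the size of the table.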
Non-Deterministic Finite Automaton (NFA)
• A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
  – $S$: a set of states
  – $\Sigma$: a set of input symbols (the alphabet)
  – move: a transition function that maps state-symbol pairs to sets of states
  – $s_0$: a start (initial) state
  – $F$: a set of accepting (final) states
• $\varepsilon$-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any input symbol.
• An NFA accepts a string $x$ if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out $x$ ($\varepsilon$ labels contribute nothing to the spelled string).
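The acceptance condition can be checked by tracking the set of all states reachable on the input read so far. The sketch below assumes a small hand-written NFA for the language (a|b)*abb with one extra $\varepsilon$-transition out of the start state; the state numbering, the use of `None` as the $\varepsilon$ label, and the function names are illustrative assumptions.

```python
EPS = None  # label used for ε-transitions (illustrative choice)

# move: (state, symbol) -> set of next states
MOVE = {
    (0, EPS): {1},
    (1, "a"): {1, 2},  # non-deterministic: two possible next states on 'a'
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
START = 0
ACCEPTING = {4}


def epsilon_closure(states):
    """All states reachable from the given states using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in MOVE.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(x):
    """True iff some path from the start state to an accepting state spells x."""
    current = epsilon_closure({START})
    for c in x:
        reached = set()
        for s in current:
            reached |= MOVE.get((s, c), set())
        current = epsilon_closure(reached)
    return bool(current & ACCEPTING)


print(accepts("aabb"), accepts("ab"))
```

Following every possible path at once avoids backtracking, at the cost of doing work proportional to the number of active states for each input symbol.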