Presentation is loading. Please wait.

Presentation is loading. Please wait.

Last Lecture Revisited

Similar presentations


Presentation on theme: "Last Lecture Revisited"— Presentation transcript:

1 Last Lecture Revisited
What and Why to study this course? Def. of translator and compiler Structure of compiler, front-end, back-end “Pass”

2 Chapter 2. Lexical Analysis
Lexical Analyzer Syntax Analyzer Symbol Table Token Get next token Source Code 词法分析器 语法分析器 符号表 记号 取下一个记号 源程序 2

3 What lexical analysis does:
Lexical analyzer Tokens Source code

4 2.1 Token and attributes 2.1.1 Token、pattern、lexeme
Lexeme:strings in source code。 Token:a token represent a type of strings subject to some rules (patterns) 。 Pattern: determines how a token relates to lexemes letters sequences lexeme token pattern

5 An example letters token var count : integer ; count = 5 ; lexeme
sequences lexeme token pattern eg: var count : integer ; count = 5 ; lexeme

6 Token Examples Description of Patterns var var var for for for
Tokens and patterns Token Examples Description of Patterns var var var for for for relation < , < = , = , … < or <= or = or … id sum, count, D5 letter and number string starting with a letter num 3.1, 10, 2.8 E12 any numerical value literal “seg. error” the string within “ ”

7 noun Dalian, Sea names of things Token Examples Pattern description
relation < , < = , = , … < or <= or = or … id sum, count, D5 letter and number string starting with a letter Token Examples Pattern description Chinese Zemin Jiang male/female with China nationality American Clinton,Obama male/female with USA nationality

8 2.1 词法记号及属性 Some issues in the history
Should we omit the blanks or not? DO 8 I  DO8I  3. 75 DO 8 I  3, 75 Should we reserve the keywords or not? IF THEN THEN THEN=ELSE;ELSE …

9 2.1.2 Attributes of tokens position := initial + rate * 60
id,pointer to the position item in the symble table assign _ op,  id,pointer to the initial item in the symble table add_op,+  id,pointer to the rate item in the symble table  mul_ op, * num,60

10 Exercise of symbol table:
Write a piece of code to calculate the number of words, the number of distinct words. eg: I love Dalian and I love DLUT Word number:7 Distinct word number:5

11 2.1.3 Lexical Errors It’s hard to find the following error, since lexical analysis is locally insightful. fi (a == f (x) ) … An exception: 123.

12 2.1.3 Lexical Errors Recovery “emergency model” “recovery try”
delete an additional character insert a missing character replace an invalid character with an valid one swap two adjacent characters

13 2.2 Description and Identification of Tokens
2.2.1 String and Language Alphabet table:set of symbol,eg. = {0,1} String:sequence of symbols, eg.0110, Language:set of strings {,0,00,000,…}, {},  Sentence:string in a language empty string 字母 组合 语言 集合 字母表 词法单元 模式 记号

14 Operators over strings
2.2.1 String and Language Operators over strings concatenation : xy,s = s = s Product(exponent): s0=,si=si -1s(i > 0)

15 Operators over languages
plus:L∪M = {s | s L 或 s  M } concatenation :LM = {st | s  L 且 t  M} exponent:L0是{ },Li是Li -1L closure:L = L0 ∪ L1 ∪ L2 ∪… positive closure: L+ = L1 ∪ L2 ∪… e.g. 2.2(p14) L: { A, B, …, Z, a, b, …, z }, D: { 0, 1, …, 9 } L∪D, LD, L6, L*, L(L∪D )*, D+

16 2.2.2 Regular Expression (RE)
正规式是用于说明词法单元如何到对应到词法记号的模式。与非形式化的描述相比,正规式更具形式化,更加精确。 2.2.2 Regular Expression (RE) Regular expression: According to a set of rules, formal regulations show how to form languages from scratch.

17 RE language defined description
 {} a {a} a   (r) | (s) L(r)∪L(s) r and s are FEs (r)(s) L(r)L(s) r and s are FEs (r)* (L(r))* r is a FE (r) L(r) r is a FE Priorities of operators: * > concatenation > |

18 Some examples of RE  = {a, b}
a | b {a, b} (a | b) (a | b ) {aa, ab, ba, bb} aa | ab | ba | bb {aa, ab, ba, bb} a* all the strings consisting of ‘a’ (a | b)* all the strings consisting of ‘a’ and ‘b’

19 这样就保证了,每个名字对应的正规式中使用的各种符号已经在前面定义了,从而可以避免递归定义的情况。
2.2.3 Names of RE For brevity, we can name some REs。 d1  r1 d2  r2 . . . dn  rn di must be different from each other Every ri must be a FE over  {d1, d2, …, di-1 } 这样就保证了,每个名字对应的正规式中使用的各种符号已经在前面定义了,从而可以避免递归定义的情况。

20 Examples of RE IDs in Pascal letter  A | B | … | Z | a | b | … | z
digit  0 | 1 | … | 9 id  letter(letter|digit)*

21 Unsigned numbers in Pascal
Simplified notations: r+=rr* r?=r|  [a-z]=a|b|c|…|z Unsigned numbers in Pascal 1946,11.28,63.6E8,1.99E6 digit  0 | 1 | … | 9 digits  digit digit* optional_fraction  .digits| optional_exponent  (E ( + |  |  ) digits ) |  numdigits optional_fraction optional_exponent Simplified notations: num  digit+ (.digit+)? (E(+|)? digit+)?

22 relop  < | < = | = | < > | > | > =
while  while do  do relop  < | < = | = | < > | > | > = id  letter (letter | digit )* num  digit+ (.digit+)? (E (+ | )? digit+)? delim  blank | tab | newline ws  delim+ Actually, the tokens are the RE names

23 Summary 源程序字符流 顺序组合 词法单元 词法记号 模式 非形式化描述 形式化描述 正规式 字母 组合 串 语言 集合 字母表 名字
词法分析器工作原理: 源程序字符流 顺序组合 词法单元 词法记号 模式 非形式化描述 形式化描述 正规式 字母 组合 语言 集合 字母表 名字 词法分析器 记号(token)流 源代码

24

25

26


Download ppt "Last Lecture Revisited"

Similar presentations


Ads by Google