Syntax-Directed Translation 語法導引之意譯 Syntax-Directed Translation 此章介紹如何將語法依附以語意屬性值; 即 SDT. We associate information with a programming language construct by attaching attributes to the grammar symbols representing the construct. Values for attributes are computed by “semantic rules” associated with the grammar productions. There are two notations for associating semantic rules with productions, syntax-directed definitions and translation schemes. Syntax-directed definitions are high-level specifications for translations. They hide many implementation details and free the user from having to specify explicitly the order in which translation takes place. Translation schemes indicate the order in which semantic rules are to be evaluated, so they allow some implementation details to be shown. 2019/1/16
The translation of the token stream is the result obtained by evaluating the semantic rules(conceptual view of SDT) Input string Parse tree Evaluation order For semantic rules Dependency graph To translate a programming language construct, a compiler may need to keep track of many quantities besides the code generated for the construct. For example, the compiler may need to know the type of the construct, or the location of the first instruction in the target code, or the number of instructions generated. We therefore talk abstractly about attributes associated with constructs. An attribute may represent any quantity, e.g., a type, a string, a memory location, or whatever. A syntax-directed definition uses a context-free grammar to specify the syntactic structure of the input. With each grammar symbol, it associates a set of attributes, and with each production, a set of semantic rules for computing values of the attributes associated with the symbols appearing in that production. The grammar and the set of semantic rules constitute the syntax directed definition. 2019/1/16
Attribute Grammars Def: An attribute grammar is a cfg G = (S, N, T, P) with the following additions: For each grammar symbol x there is a set A(x) of attribute values Each rule has a set of functions that define certain attributes of the nonterminals in the rule Each rule has a (possibly empty) set of predicates to check for attribute consistency Attribute grammars are grammars to which have been added attributes, attribute computation functions, and predicate functions. 2019/1/16
Attribute Grammars Let X0 X1 ... Xn be a rule Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes Functions of the form I(Xj) = f(A(X0), ... , A(Xj-1)), for 1 <= j <= n, define inherited attributes Initially, there are intrinsic attributes on the leaves ... 2019/1/16
Attribute Grammars Example: expressions of the form id + id BNF: id's can be either int_type or real_type types of the two id's must be the same type of the expression must match it's expected type(from top down) BNF: <expr> <var> + <var> <var> id Attributes: actual_type - synthesized for <var> and <expr> expected_type - inherited for <expr> 2019/1/16
<assign> <var> = <expr> <expr> <var>1 + <var>2 <expr> <var> <var> A | B Syntax rule: <assign> <var> = <expr> Semantic rule: <expr>.expected_type <var>.actual_type Rule 1 Syntax rule: <expr> <var>1 + <var>2 Semantic rule: <expr>.actual_type if (<var>1 .actual_type = int) and (<var>2 .actual_type = int) then int else real end if Predicate: <expr>.actual_type = <expr>.expected_type Rule 2 2019/1/16
Syntax rule: <expr> <var> Semantic rule: <expr>.actual_type <var>.actual_type Predicate: <expr>.actual_type = <expr>.expected_type Rule 3 Syntax rule: <var> A | B Semantic rule: <var>.actual_type look-up(<var>.string) Rule 4 To compute the attribute values, the following is a possible sequence: <var>.actual_type look-up(A) (Rule 4) <expr>.expected_type <var>.actual_type (Rule 1) <var>1 .actual_type look-up(A) (Rule 4) <var>2 .actual_type look-up(A) (Rule 4) <expr>.actual_type either int or real (Rule 2) <expr>.expected_type = <expr>.actual_type is either True or False (Rule 2) 2019/1/16
<assign> T or F ? ⑥ <expr> ⑤ ⑤ ② <var> <var>1 Expected_type Actual_type <expr> ⑤ ⑤ ② Actual_type Actual_type Actual_type <var> <var>1 <var>2 ① ③ ④ A = A + B Now you may take a close look at the Fig.3.8 of a “Fully attributed” parse tree. 2019/1/16
Attribute Grammars How are attribute values computed? If all attributes were inherited, the tree could be decorated in top-down order. If all attributes were synthesized, the tree could be decorated in bottom-up order. In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used. 2019/1/16
Assignment statements Now let us take a look on forward branching problem. 此相為一語意處理相(semantic processing),將有中間碼(intermediate code)產生. 今以例示之. Assignment statements E.code: sequence of three-address code to evaluate E. E.place: place where E’s value resides at runtime. E id {E.code := null, E.place := id.place} E E(1) * E(2) {E.place := Newtemp( ), E.code := E(1) .code // E(2).code // gen( E.place “:=“ E(1) .place “*” E(2).place) } E -E(1) {E.place := Newtemp( ), E.code := E(1).code // gen( E.place “:=“ ‘uminus’ E(1). Place) } A id:=E {A.code := E.code // gen(id.place “:=“ E.place) } 其語意之處理過程以 Bubble 圖示如次頁: 2019/1/16
.place id a “ “ := id * -id$ id a “ “ id b “ “ := E b “ “ id a “ “ := Bubble Input a := b * -c $ id := id * -id$ .place .CODE id a “ “ := id * -id$ id a “ “ id b “ “ := * -id $ E b “ “ id a “ “ := id $ * - id c “ “ id a “ “ E b “ “ := $ * - E b “ “ E c “ “ id a “ “ := * - $ E t1 t1:= -c E b “ “ id a “ “ := * $ 2019/1/16
id a “ “ E t2 t2 := b*t1 := $ S a a := t2 $ t1 := -c t2 := b*t1 a := t2 semantic processing 但是有些不妙的問題卻發生了:(1) forward branching, (2)inserting GOTO instruction. 2019/1/16
forward branching <label> ::= id: { if id.place.quad is defined then error(“Duplicate label defined”); else id.place.quad = NextQuad; backpatch( id.place.fwdref, NextQuad); } <stmt> ::= goto id { if id.place.quad is defined then gen(‘goto’ id.place.quad) : append( id.place.fwdref, NextQuad) ; gen( ‘goto’ ?); } Backpatch 是一旦找到id 之location 值, 則 回填任何與此 id相關之等待goto?處 append 是將此未定義之id.place.fwdref位置linking 到list 中,以做為未來backpatch之用. 2019/1/16
inserting "goto?" S ::= if E then S1 <else> S2 fi {backpatch(E.true, S1.1stquad); backpatch(E.false, S2.1stquad); S.next := merge(<else>.quad,S1.next,S2.next)} <else> ::= else {<else>.quad := Nextquad; gen(‘goto’ ?) } S ::= while E do S1 done {backpatch(E.false, S1.next ); backpatch( E.true, S1.1stquad); S.next = E.false; S.1stquad = E.1stquad; gen( ‘goto’ E.1stquad) } 也也人將文法改成 S ::= if E then S1 M else S2 fi M ::= 2019/1/16
語法導引之轉譯技術(今) 前言 1. 在靜態文法結構上,附加一些屬性值,而使語言的建構增添一些額外資訊( 這些 屬性值由語意規則計算而來;而文法規則與語意規則是相互對應的) 2. 自文法產生語意需經過兩種步驟: (a) Syntax Directed Definitions; 它僅定義比較抽象的規格,不涉及 實作細節 順序. (b) Translation Schemes; 它表達實作細節 與順序. 此二者觀念上均為了下述工作而來: 語法基本單元 語法樹 相依圖 語意規則計算 此過程正是 Syntax Directed Translation 之概念; 為了效率考量,實際的Translation可能不必明顯構作 解析樹 或 相依圖,而在one-pass方式下執行語意規則(此即所謂的 L-Attributed definitions). 2019/1/16
Syntax Directed Definition (SDD) SDD 是 Context-Free Grammar 之擴充,它對每一個文法符號( T & N) 賦予一組屬性( 即定義 Bubble). 這組屬性又可分 Synthesized attributes 及 Inherited attributes 兩種. 每一個屬性即就是我們在 programming languages 中所說的 object之 an attribute; ; 它可以是 a name, a value, a string , a date type, a location or anything else. 在語法解析樹中,所有non-terminals之屬性值均是由此節點相對應的語意規則(semantic rules)之計算而得. 其中合成屬性值來自其decendants之屬性的合成,而繼承屬性值則來自其ancestors或siblings(同輩)屬性之傳承而得. Observations 若Non-terminals之屬性值完全被定義,則不必synthesized及inherited之處理.但是一般而言,一棵語法樹(parse tree)是需要添加(annotating)屬性才能繼續往前編譯; 而屬性之添加需要透過相依圖( dependency graph)而導出一組計算語意規則之順序,然後藉著計算這些規則而建構各節點之屬性值. 2019/1/16
SDD型式 在SDD中,每一條語法規則 A ----> 對應一組型式為 b ::= f(c1 , . . . ,ck) 之語意規則. 此處 f 是一個函數,它滿足下述兩者之一: (1) b 是 A 的合成函數值,且 ci 是文法規則右邊之文法符號之屬性, 或 (2) b 是 中某 nonterminal之inherited屬性,且 ci 是 A 或文法規則 右邊任一文法符號之屬性 . 不管哪一種情況, b 屬性與 ci屬性均相依 . 有時 SDD 會產生side effect . 若有此情況,則可將此 f 函數看成 A 定義之虛擬合成屬性值. (PS: 所謂 side effect 即程式語言中所指之事) 2019/1/16
圖5.2 A desk calculator's SDD production Semantic Rules L En print (E.val) E E1 + T E.val := E1.val + T.val E T E.val := T.val T T1 * F T.val := T1.val * F.val T F T.val := F.val F (E) F.val := E.val F digit F.val := digit.lexval 圖5.2 A desk calculator's SDD 2019/1/16
說明: (1) token中之digit 有一合成屬性 lexval , 其值是 lexical analyzer所供給( 終端符號只含合成屬性) . (2) L En 之語意規則是把文法右邊算術式E之值印出,這點可以想 像成 L 的虛擬屬性. (3) 本SDD 整個看來均屬 “合成屬性”(bottom-up 表現) . (4) 若SDD只用到合成屬性,則謂之 S屬性定義(S-Attributed Definition). 其計算順序必可由下而上方式處理之. 一棵語法解析樹若標上了S屬性 (Synthesized attribute), 則成為 annotated parse tree. 使用 LR parser 可建構 S 屬性定義. 2019/1/16
圖5.3 L n 用 3*5+4n 這個輸入字串與上上一頁之SDD,可得annotated parse tree E.val = 19 n E.val = 15 + T.val=4 T.val=15 F.val=4 T.val=3 * digit.lexcal=4 F.val=5 F.val=3 digit.lexcal=5 digit.lexcal=3 圖5.3 2019/1/16 3*5+4n 之標上屬性的語法解析樹
圖5.4 用L.in 繼承式屬性之SDD S-attributes 已看過,現在轉為研究 Inherited Attributes. 眾所周知,繼承式屬性即表示解析樹中節點之屬性值來自此節點之祖先或同輩各節點之屬性值. 何以要有此 Inherited Attributes ? Ans: 因為有些屬性來自parse tree之由上而下順序情況時之上層節點. 舉例來說: 像 data type, upper bound 或 lower bound 等皆然. 用例示說明此觀點:(例 5.3) 先定義宣告部之文法及其對應之語意規則 production Semantic Rules D TL L.in := T.type (Inherited) T int T.type := integer (Synthesized) T real T.type := real (Synthesized) L L1,id L1.in := L.in (Inherited) addtype(id.entry, L.in) (Inherited) L id addtype(id.entry, L.in) (Inherited) 圖5.4 用L.in 繼承式屬性之SDD 2019/1/16
Parse tree with inherited attributes in at each node 說明 資料型態屬性由上而下傳遞過去,因此需要繼承式SDD. 本例中用 .in 屬性表示 inherited attributes. 上例中也有兩處使用 synthesized attributes. Addtype 乃一 procedure, 作用是將 type (即 L.in) 加到 id 所在之table中. 先前已說過,SDD不管“計算時之順序”, 因此圖5.4及圖5.2可說是“比較高層次的說明”. 下圖之箭頭方向代表“繼承式”屬性之傳遞方向.而且在每一個L節點中,Semantic Rules叫用 procedure - addtype 以便將此節點後代或右邊後代之 id 的 type (實數或整數) 加入符號表之中. D T.type = real L.in = real id3 L.in = real , real Fig. 5.5 Parse tree with inherited attributes in at each node labeled L. id2 L.in = real , 2019/1/16 id1
相依圖(Dependency Graph) 既然屬性由繼承式與合成式兩者而來,因此以語法解析樹這種無向圖形為基礎,可以構作出一有向圖形(即相依圖),用來表示所有節點之繼承屬性與合成屬性之相依關系. 先看例示: 假設文法 A XY 所對應的 Semantic Rules 是 A.a := f(X.x, Y.y), 則此 A.a屬性必是合成式;即由下而上方式合成之,且依賴 X.x 與 Y.y 兩屬性. 如果在語法解析樹中用到此語法,則在相依圖中就會有三個節點: A.a, X.x, 及 Y.y, 而且分別自 X. x 及 Y.y 到 A.a 會有兩條邊線用來表示“合成”理念. 用特定例子說明: Production Semantic Rules E E1 + E2 E.val := E1.val + E2.val 則相依圖如下: val E Fig. 5.6 E.val is synthesized from E1.val and E2.val. E2 + E1 val 2019/1/16 val
同理,若文法 A XY 之Semantic Rules 中有一條語意規則是 X.i := g(A.a, Y.y), 則自 A.a 與 Y.y 分別各有一調有向邊線指向 X.i 節點. 而此 屬性由於是繼承式,故箭頭若不是水平走向,就必然是向下走向. 將此兩種走向屬性值綜合之 ,可定義圖5.5 之相依圖如下: D L T 4 type in 5 6 , id3 in 7 L 3 entry 8 real , id2 in 9 L 10 2 entry id1 1 entry 2019/1/16 Fig. 5.7 Dependency graph for parse tree of Fig. 5.5.
說明 (1) 相依圖上之節點都標示了數字,用於表示 topological sort, 亦即計算語意規則的順序. (2) 文法 D TL 對應之 Semantic Rules L.in := T.type , 因此也一條 有向邊線自4號節點 T.type 連到 5 號節點 L.in(當然屬於 inherited attribute). 同理, 觀察 L L1, id 之 L1.in := L.in , 只不過對應此 L 之 Semantic Rules 有 addtype 程序會產生一個虛擬屬性(dummy attribute) ---- 即 id.entry 之屬性由 L.in 而來. 至此,我們可詳細地思量“如之何構作出一個語法解析樹之相依圖”. 其演算法(algorithm)之大意如次頁所示: 2019/1/16
FOR 對應於 n 之文法的每一條語意規則 b:=f(c1, . . ,ck) DO FOR i := 1 TO k DO FOR 語法解析樹上每一個節點 n DO FOR 節點 n 上每一個屬性 a DO 為 a 在相依圖上構作一個節點; FOR 語法解析樹上每一個節點 n DO FOR 對應於 n 之文法的每一條語意規則 b:=f(c1, . . ,ck) DO FOR i := 1 TO k DO 從節點 ci 到 b 節點劃一條有向邊線; PS: 若節點之屬性有 a1, a2, . . . , am , 則實線有向圖有 m 種矣 ! 2019/1/16
計算語意規則之順序 對於一個有向非循環圖形(Directed Acyclic Graph), 其拓樸排序如資料結構一書所示. 若將相依圖執行 topological sorting, 則這些節點之排序結果正是計算 Semantic Rules 之順序. [再觀察 Fig. 5.7 可知此理] In summary, Semantic Processings 包括 (1) 定義 SDD 及 (2) 設計translation scheme. SDD 已如前述針對各 production rule 定義 semantic rules. 而 translation scheme 即 (a) 用文法建構一個語法解析樹, (b) 用上一頁的algorithm建構相依圖, (c ) 對相依圖執行拓樸排序, 最后 (d) 求得各語意規則之計算順序,按順序執行各semantic rules,而得轉譯碼. 2019/1/16
實例: 根據上述,經過topological sorting 之后,圖5.7 可得下述程式. [ 若用 an 表示在相依圖編號為 n 之節點的對應屬性 ] a4 := real; a5 := a4; addtype( id3.entry, a5); a7 := a5; addtype( id2.entry, a7); a9 := a7; addtype( id1.entry, a9); 執行此程式,則 real 這個型別資訊可存入符號 表中各 id 之 type 欄位內. 現在,若相依圖中有 cycle 存在,則只好回到舊版書所提之方法 ----- Rule - Based Methods. 還好第三種方法是不管語意規則,而使用“某種固定的計算順序”. 由於後兩者不必構作相依圖,因此 Time 與 Space 相對地有效率. 2019/1/16