Dept. of Information Management OCIT February, 2002 資料庫基本概念 Dept. of Information Management OCIT February, 2002
Informal design guidelines for relational schemas Informal measures of quality for relation schema design Semantics of the attributes Reducing the redundant values in tuples Reducing the null values in tuples Disallowing spurious tuples
Semantics of the relational attributes A certain meaning is associated with the attributes. The easier it is to explain the semantics of the relation, the better will be the relation schema design. Guideline 1: Do not combine attributes from multiple entity types and relationship types into a single relation.
Redundant information in tuples and update anomalies Grouping attributes into relation schemas has a significant effect on storage space. Update anomalies Insertion Deletion Modification Guideline 2: No update anomalies occur in the relation.
Both schemas suffer from update anomalies EMP_DEPT ENAME SSN BDATE ADDRESS DNUMBER DNAME DMGRSSN EMP_PROJ SSN PNUMBER HOURS ENAME PNAME PLOCATION FD1 FD2 FD3
Null values in tuples How to account for nulls when aggregate operations such as COUNT or SUM are applied Nulls can have multiple interpretations: Not applying to the tuples Unknown values Known but absent values Guideline 3: Avoid placing attributes with nulls in a base relation. If nulls are unavoidable, make sure that they apply in exceptional cases only
正規化 (Normalization) a process for assigning attributes to entities to reduce data redundancies and to help eliminate the data anomalies. Normalization works through a series of stages called normal forms: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) The highest level of normalization is not always desirable.
正規化 None 1NF 1NF 2NF 3NF 4NF
First normal form (1NF) Disallow multivalued and composite attributes. Disallow relations within relations. That is, the domains of attributes must include only atomic values.
第一正規化 (1NF) 1. 必須為row-column的二維式table 2. table的每一筆資料(row)只描述一件事情 3. 每一欄位只含有單一事物的特性(欄位的唯一性) 4. 每一筆row的欄位內只允許存放單一值 5. 每個欄位名稱必須是獨一無二的 6. 沒有任何兩筆資料是相同的 7. row或欄位的先後順予是無關緊要的
第一階正規化 地址 性別 姓名 薪水 員工 員工編號 第一階正規化 員工資料表
Second normal form (2NF) X Y is a full functionally dependency if removal of any attribute A from X means that the dependency does not hold any more. R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
第二正規化 (2NF) 第二正規化的表格必須合下面條件: 移去部分相關性 結論:消除功能相依(Functional Dependency) 所謂功能相依是指表格和表格之間的相互關係,若某個表格中有兩個欄位A及B,當A欄位值可推導出B欄位值,稱功能相依性。 即若一關連R,其屬性Y功能相關於屬性X,記作R.X→R.Y;若且唯若R中有二個X值相同時,其Y值亦相同。
第二階正規化 第二階正規化 員工參與計畫資料表 員工編號 計畫編號 參與時數 姓名 性別 地址 計劃部門 計劃地址 參與資料表 員工編號 員工資料表 員工編號 姓名 性別 地址 計劃資料表 計畫編號 計劃部門 計劃地址
Third normal form X Y is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X Z and Z Y hold. R is in 3NF if it is in 2NF and no nonprime attribute of R is transitively dependent on the primary key.
第三階正規化 部門編號 薪水 地址 性別 姓名 員工編號 部門 管理者
第三正規化 (3NF) 消除遞移相依(Transitive Dependency) 所謂遞移相依是指在一個表格中,如果某一欄位值可決定其他欄位值,但這些欄位中又存在某一欄位可以決定剩餘欄位值,稱遞移相依性。若有上述情況存在.如果在刪除資料時,可能會造成其他資料損毀。 一個FD若R.A→R.B且R.B→R.C則,R.A→R.C成立,此種相關性稱為遞移相關。
所有非主索引欄位間不應該有函數相依的關係 所有非瑣碎函數的被相依屬性必然是超鍵值屬性 正規化步驟 全部欄位 完成第一階正規化 完成第二階正規化 完成第三階正規化 完成Boyce-Codd正規化 關聯式資料庫 將欄位皆解成最小資料欄位 非主索引位完全相依於主索引欄位 所有非主索引欄位間不應該有函數相依的關係 所有非瑣碎函數的被相依屬性必然是超鍵值屬性 消除多值相依 對於Y多值相依於X,X必然是超鍵值屬性