APPENDIX B Advanced Relational Database Design In this appendix we cover advanced topics in relational database design.We first present the theory of multivalued dependencies,including a set of sound and complete inference rules for multivalued dependencies.We then present PJNF and DKNF,two normal forms based on classes of constraints that generalize multivalued dependencies. In this chapter we illustrate our concepts using a bank enterprise with the schema shown in Figure 2.15. B.1 Multivalued Dependencies As we did for functional dependencies and 3NF and BCNF,we shall need to determine all the multivalued dependencies that are logically implied by a given set of multivalued dependencies. B.1.1 Theory of Multivalued Dependencies We take the same approach here that we did earlier for functional dependencies. Let D denote a set of functional and multivalued dependencies.The closure D+of D is the set of all functional and multivalued dependencies logically implied by D.As we did for functional dependencies,we can compute D+from D,using the formal definitions of functional dependencies and multivalued dependencies. However,it is usually easier to reason about sets of dependencies by using a system of inference rules. The following list of inference rules for functional and multivalued dependen- cies is sound and complete.Recall that sound rules do not generate any dependencies that are not logically implied by D,and complete rules allow us to generate all dependencies in D+.The first three rules are Armstrong's axioms,which we saw earlier in Chapter 8. 1.Reflexivity rule.If a is a set of attributes,and B C a,then aB holds
APPENDIX B Advanced Relational Database Design In this appendix we cover advanced topics in relational database design. We first present the theory of multivalued dependencies, including a set of sound and complete inference rules for multivalued dependencies. We then present PJNF and DKNF, two normal forms based on classes of constraints that generalize multivalued dependencies. In this chapter we illustrate our concepts using a bank enterprise with the schema shown in Figure 2.15. B.1 Multivalued Dependencies As we did for functional dependencies and 3NF and BCNF, we shall need to determine all the multivalued dependencies that are logically implied by a given set of multivalued dependencies. B.1.1 Theory of Multivalued Dependencies We take the same approach here that we did earlier for functional dependencies. Let D denote a set of functional and multivalued dependencies. The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D. As we did for functional dependencies, we can compute D+ from D, using the formal definitions of functional dependencies and multivalued dependencies. However, it is usually easier to reason about sets of dependencies by using a system of inference rules. The following list of inference rules forfunctional and multivalued dependencies is sound and complete. Recall thatsound rules do not generate any dependencies that are not logically implied by D, and complete rules allow us to generate all dependencies in D+. The first three rules are Armstrong’s axioms, which we saw earlier in Chapter 8. 1. Reflexivity rule. If is a set of attributes, and ⊆ , then → holds. 1
Appendix B Advanced Relational Database Design 2.Augmentation rule.If aB holds,and y is a set of attributes,then ya→yβholds. 3.Transitivity rule.Ifa→B holds,.andB→y holds,then o→y holds. 4.Complementation rule..Ifa→→B holds.,then a→yR-B-a holds.. 5.Multivalued augmentation rule..Ifa→B holds,andy≤RandδsY, then yo→8B holds. 6.Multivalued transitivity rule.fa→→B holds,andB→→y holds,then a→→y-B holds. 7.Replication rule.Ifa→B holds,.then a→→B. &.Coalescence rule.Ifa→B holds,,and y B,and there is aδsuch thatδ sR,andδnB=i,andδ→Y,then a→y holds.. The bibliographical notes provide references to proofs that the preceding rules are sound and complete.The following examples provide insight into how the formal proofs proceed. Let R =(A,B,C,G,H,I)be a relation schema.Suppose that ABC holds.The definition of multivalued dependencies implies that,if h[A]=t2[A], then there exist tuples t3 and t4 such that [A]=2[A]=t[A=t4[A] t3[BC]=t[BC] tIGHI=GHI] t[GHI=t[GHI t4[BC]=[BC] The complementation rule states that,if A>BC,then A>GHI.Observe that f3 and t4 satisfy the definition of A>GHI if we simply change the subscripts. We can provide similar justification for rules 5 and 6(see Exercise B.2)using the definition of multivalued dependencies. Rule 7,the replication rule,involves functional and multivalued dependen- cies.Suppose that A-BC holds on R.If h[A]t2[A]and h[BC]t2[BC],then h and t2 themselves serve as the tuples t3 and t4 required by the definition of the multivalued dependency A→→BC. Rule 8,the coalescence rule,is the most difficult of the eight rules to verify (see Exercise B.4). We can simplify the computation of the closure of D by using the following rules,which we can prove using rules 1 to 8(see Exercise B.5): ·Multivalued union rule.fa→B holds,anda→→y holds,then a→→βy holds ·Intersection rule.fa→B holds,,anda→y holds,then o→→Bny holds
2 Appendix B Advanced Relational Database Design 2. Augmentation rule. If → holds, and is a set of attributes, then → holds. 3. Transitivity rule. If → holds, and → holds, then → holds. 4. Complementation rule. If →→ holds, then →→ R − − holds. 5. Multivalued augmentation rule. If →→ holds, and ⊆ R and ⊆ , then →→ holds. 6. Multivalued transitivity rule. If →→ holds, and →→ holds, then →→ − holds. 7. Replication rule. If → holds, then →→ . 8. Coalescence rule. If →→ holds, and ⊆ , and there is a such that ⊆ R, and ∩ = ∅, and → , then → holds. The bibliographical notes provide references to proofs that the preceding rules are sound and complete. The following examples provide insight into how the formal proofs proceed. Let R = (A, B,C, G, H, I) be a relation schema. Suppose that A →→ BC holds. The definition of multivalued dependencies implies that, if t1[A] = t2[A], then there exist tuples t3 and t4 such that t1[A] = t2[A] = t3[A] = t4[A] t3[BC] = t1[BC] t3[GHI] = t2[GHI] t4[GHI] = t1[GHI] t4[BC] = t2[BC] The complementation rule states that, if A →→ BC , then A →→ GHI. Observe that t3 and t4 satisfy the definition of A→→GHIif we simply change the subscripts. We can provide similar justification for rules 5 and 6 (see Exercise B.2) using the definition of multivalued dependencies. Rule 7, the replication rule, involves functional and multivalued dependencies. Suppose that A → BC holds on R. If t1[A] = t2[A] and t1[BC] = t2[BC], then t1 and t2 themselves serve as the tuples t3 and t4 required by the definition of the multivalued dependency A →→ BC. Rule 8, the coalescence rule, is the most difficult of the eight rules to verify (see Exercise B.4). We can simplify the computation of the closure of D by using the following rules, which we can prove using rules 1 to 8 (see Exercise B.5): • Multivalued union rule. If →→ holds, and →→ holds, then →→ holds. • Intersection rule. If →→ holds, and →→ holds, then →→ ∩ holds.
B.1 Multivalued Dependencies 3 ·Difference rule.Ifa→→B holds,anda→y holds,then o→→B-y holds anda→yy-B holds. Let us apply our rules to the following example.Let R=(A,B,C,G.H.I) with the following set of dependencies D given: A→→B B→→H1 CG→H We list several members of D+here: ·A→CGHI:Since A→B,the complementation rule(rule4)implies that A→→R-B-AR-B-A=CGHI,s0A→→CGHI. 。A→→Hl:Since A→B and B→→Hl,the multivalued transitivity rule(rule 6)implies that A→→HI-B.Since H-B=HI,A→→Hl. .B-H:To show this fact,we need to apply the coalescence rule (rule 8). B→→HI holds.Since HC HI and CG→H and CGnHI=O,we satisfy the statement of the coalescence rule,with a being B,B being HI,being CG,and y being H.We conclude that BH. ·A→→CG:We already know that A→→CGHI and A→HL.By the difference rule,A→→CGHI-HI.Since CGHI-HI=CG,A→→CG. B.1.2 Dependency Preservation The question of dependency preservation when we have multivalued dependen- cies is not as simple as it is when we have only functional dependencies. A decomposition of schema R into schemas R1,R2,...,R is a dependency- preserving decomposition with respect to a set D of functional and multivalued dependencies if,for every set of relationsr(R1).r2(R2)....,rn(R)such that for all i,ri satisfies D:(the restriction of D to R),there exists a relationr(R)that satisfies D and for which ri nIg(r)for all i. Let us apply the 4NF decomposition algorithm of Figure 8.16 to the schema R=(A,B,C,G,H,I)with D={A→yB,B→→Hl,CG→H.We shall then test the resulting decomposition for dependency preservation.We first need to compute the closure of D.The nontrivial dependencies in closure include all the dependencies in D,and the multivalued dependency A->HI,as we saw in Section B.1.1. R is not in 4NF.Observe that AB is not trivial,yet A is not a superkey. Using AB in the first iteration of the while loop,we replace R with two schemas,(A,B)and (A,C.G.H.1).It is easy to see that (A,B)is in 4NF since all multivalued dependencies that hold on (A,B)are trivial.However,the schema (A.C,G.H.I)is not in 4NF.Applying the multivalued dependency CG-H
B.1 Multivalued Dependencies 3 • Difference rule. If →→ holds, and →→ holds, then →→ − holds and →→ − holds. Let us apply our rules to the following example. Let R = (A, B,C, G, H, I) with the following set of dependencies D given: A →→ B B →→ HI CG → H We list several members of D+ here: • A →→ CGHI: Since A →→ B, the complementation rule (rule 4) implies that A →→ R − B − A. R − B − A = CGHI, so A →→ CGHI. • A →→ HI: Since A →→ B and B →→ HI, the multivalued transitivity rule (rule 6) implies that A →→ HI − B. Since HI − B = HI, A →→ HI. • B → H: To show this fact, we need to apply the coalescence rule (rule 8). B →→ HI holds. Since H ⊆ HI and CG → H and CG ∩ HI = ∅, we satisfy the statement of the coalescence rule, with being B, being HI, being CG, and being H. We conclude that B → H. • A →→ CG: We already know that A →→CGHI and A →→ HI. By the difference rule, A →→ CGHI − HI. Since CGHI − HI = CG, A →→ CG. B.1.2 Dependency Preservation The question of dependency preservation when we have multivalued dependencies is not as simple as it is when we have only functional dependencies. A decomposition of schema R into schemas R1, R2,..., Rn is a dependencypreserving decomposition with respect to a set D of functional and multivalued dependencies if, for every set of relations r1(R1),r2(R2),...,rn(Rn) such that for all i, ri satisfies Di (the restriction of D to Ri), there exists a relation r(R) that satisfies D and for which ri = Ri(r) for all i. Let us apply the 4NF decomposition algorithm of Figure 8.16 to the schema R = (A, B,C, G, H, I) with D = {A →→ B, B →→ HI, CG → H}. We shall then test the resulting decomposition for dependency preservation. We first need to compute the closure of D. The nontrivial dependencies in closure include all the dependencies in D, and the multivalued dependency A →→ HI, as we saw in Section B.1.1. R is not in 4NF. Observe that A →→ B is not trivial, yet A is not a superkey. Using A →→ B in the first iteration of the while loop, we replace R with two schemas, (A, B) and (A,C, G, H, I). It is easy to see that (A, B) is in 4NF since all multivalued dependencies that hold on (A, B) are trivial. However, the schema (A,C, G, H, I) is not in 4NF. Applying the multivalued dependency CG→→H
Appendix B Advanced Relational Database Design r1:A B a1 b1 a2 b1 T2:CGH c181h1 c2&2h2 r3:A1 a111 a212 r4: ACG a1c181 a2C282 Figure B.1 Projection of relation r onto a 4NF decomposition of R. (which follows from the given functional dependency CG->H by the replication rule),we replace(A,C,G,H.,I)with the two schemas(C,G,H)and (A,C,G,I). Schema(C,G,H)is in 4NF,but schema(A,C,G,I)is not.To see that(A,C,G,I) is not in4NF,we note that since A→→HI is in D+,A→→I is in the restriction of D to(A,C,G,I).Thus,in a third iteration of the while loop,we replace(A,C,G,I) with two schemas (A.1)and (A.C.G).The algorithm then terminates and the resulting 4NF decomposition is {(A,B).(C.G.H),(A,I),(A,C,G)}. This 4NF decomposition is not dependency preserving,since it fails to preserve the multivalued dependency BHI.Consider Figure B.1,which shows the four relations that may result from the projection of a relation on(A,B.C,G,H,I) onto the four schemas of our decomposition.The restriction of D to (A,B)is A>B and some trivial dependencies.It is easy to see that ri satisfies AB, because there is no pair of tuples with the same A value.Observe that r2 satisfies all functional and multivalued dependencies,since no two tuples in r2 have the same value on any attribute.A similar statement can be made for r3 and r4. Therefore,the decomposed version of our database satisfies all the dependencies in the restriction of D.However,there is no relation r on (A,B,C,G,H,I)that satisfies D and decomposes into ri.r2,r3,and r4.Figure B.2 shows the relation r=rh凶r2r3r4.Relation r does not satisfy B→→Hl.Any relation s containing r and satisfying B>HI must include the tuple(a2,b1,c2,82,hi,i1). However,IcGH (s)includes a tuple (c2.82.h)that is not in r2.Thus,our decomposition fails to detect a violation of BHI. We have seen that,if we are given a set of multivalued and functional depen- dencies,it is advantageous to find a database design that meets the three criteria of
4 Appendix B Advanced Relational Database Design r1 : A B 1 b1 2 b a a a a a a 1 r2 : C G H c1 g1 h1 c2 g2 h2 r3 : A I 1 1 2 2 r4 : A C G 1 c1 g1 2 c2 g2 i i Figure B.1 Projection of relation r onto a 4NF decomposition of R. (which follows from the given functional dependency CG → H by the replication rule), we replace (A,C, G, H, I) with the two schemas (C, G, H) and (A,C, G, I). Schema (C, G, H) is in 4NF, but schema (A,C, G, I) is not. To see that (A,C, G, I) is not in 4NF, we note that since A →→ HI is in D+, A →→ I is in the restriction of D to (A,C, G, I). Thus, in a third iteration of the while loop, we replace (A,C, G, I) with two schemas (A, I) and (A,C, G). The algorithm then terminates and the resulting 4NF decomposition is {(A, B), (C, G, H), (A, I), (A,C, G)}. This 4NFdecomposition is not dependency preserving, since it fails to preserve the multivalued dependency B →→ H I. Consider Figure B.1, which shows the four relations that may result from the projection of a relation on (A, B,C, G, H, I) onto the four schemas of our decomposition. The restriction of D to (A, B) is A →→ B and some trivial dependencies. It is easy to see that r1 satisfies A →→ B, because there is no pair of tuples with the same A value. Observe that r2 satisfies all functional and multivalued dependencies, since no two tuples in r2 have the same value on any attribute. A similar statement can be made for r3 and r4. Therefore, the decomposed version of our database satisfies all the dependencies in the restriction of D. However, there is no relation r on (A, B,C, G, H, I) that satisfies D and decomposes into r1,r2,r3, and r4. Figure B.2 shows the relation r = r1 ✶ r2 ✶ r3 ✶ r4. Relation r does not satisfy B →→ HI. Any relation s containing r and satisfying B →→ HI must include the tuple (a2, b1, c2, g2, h1, i1). However, CGH (s) includes a tuple (c2, g2, h1) that is not in r2. Thus, our decomposition fails to detect a violation of B →→ HI. We have seen that, if we are given a set of multivalued and functional dependencies, it is advantageous to find a database design that meets the three criteria of
B.2 Join Dependencies 1.4NF 2.Dependency preservation 3.Lossless join If all we have are functional dependencies,then the first criterion is just BCNF. We have seen also that it is not always possible to meet all three of these criteria.We succeeded in finding such a decomposition for the bank example,but failed for the example of schema R =(A,B,C,G,H,I). When we cannot achieve our three goals,we have to compromise on 4NF or dependency preservation. B.2 Join Dependencies We have seen that the lossless-join property is one of several properties of a good database design.Indeed,this property is essential:Without it,information is lost. When we restrict the set of legal relations to those satisfying a set of functional and multivalued dependencies,we are able to use these dependencies to show that certain decompositions are lossless-join decompositions. Because of the importance of the concept of lossless join,it is useful to be able to constrain the set of legal relations over a schema R to those relations for which a given decomposition is a lossless-join decomposition.In this section,we define such a constraint,called a join dependency.Just as types of dependency led to other normal forms,join dependencies will lead to a normal form called project-join normal form(PJNF). B.2.1 Definition of Join Dependencies Let R be a relation schema and R1,R2.....R be a decomposition of R.The join dependency *(R1,R2,...,R)is used to restrict the set of legal relations to those for which Ri,R2.....Rn is a lossless-join decomposition of R.Formally,if R R1U R2 U...U Rn,we say that a relation r(R)satisfies the join dependency *(R1,R2,,R)if r=ΠR1(r)凶ΠR2(r)·凶ΠR(r) A join dependency is trivial if one of the R;is R itself. ABCG HI a1 b1 c1 81 h1 i1 a201 C2 82 h2i2 Figure B.2 A relation r(R)that does not satisfy BHI
B.2 Join Dependencies 5 1. 4NF 2. Dependency preservation 3. Lossless join If all we have are functional dependencies, then the first criterion is just BCNF. We have seen also that it is not always possible to meet all three of these criteria. We succeeded in finding such a decomposition for the bank example, but failed for the example of schema R = (A, B, C, G, H, I). When we cannot achieve our three goals, we have to compromise on 4NF or dependency preservation. B.2 Join Dependencies We have seen that the lossless-join property is one of several properties of a good database design. Indeed, this property is essential: Without it, information is lost. When we restrict the set of legal relations to those satisfying a set of functional and multivalued dependencies, we are able to use these dependencies to show that certain decompositions are lossless-join decompositions. Because of the importance of the concept of lossless join, it is useful to be able to constrain the set of legal relations over a schema R to those relations for which a given decomposition is a lossless-join decomposition. In this section, we define such a constraint, called a join dependency. Just as types of dependency led to other normal forms, join dependencies will lead to a normal form called project-join normal form (PJNF). B.2.1 Definition of Join Dependencies Let R be a relation schema and R1, R2,..., Rn be a decomposition of R. The join dependency *(R1, R2,..., Rn) is used to restrict the set of legal relations to those for which R1, R2,..., Rn is a lossless-join decomposition of R. Formally, if R = R1 ∪ R2 ∪ ... ∪ Rn, we say that a relation r(R) satisfies the join dependency *(R1, R2,..., Rn) if r = R1 (r) ✶ R2 (r) ✶ ··· ✶ Rn (r) A join dependency is trivial if one of the Ri is R itself. A B C G H I a1 b1 c1 g1 h1 i i 1 a2 b1 c2 g2 h2 2 Figure B.2 A relation r(R) that does not satisfy B →→ HI.