u **Database design consists in grouping attributes that form “good” relation schemas.**

u **Normalization: a design process to reduce redundancies and update anomalies in a relational schema.**

– Result: a set of decomposed relations that meet certain normal form tests

– Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF)

– A process that is based on keys and functional dependencies among the attributes of a relation

]]>

**Normalized databases have a design that reflects the true dependencies between tracked quantities, allowing quick updates to data with little risk of introducing inconsistencies.**

**Instead of attempting to lump all information into one table, data is spread out logically into many tables.**

**Normalizing the data is decomposing a single relation into a set of smaller relations which satisfy the constraints of the original relation.**

**Redundancy can be solved by decomposing the tables. However certain new problems are caused by decomposition.**

**Normalization helps us to make a conscious decision to avoid redundancy keeping the pros and cons in mind.**

**One can only describe a database as having a normal form if the relationships between quantities have been rigorously defined.**

**It is possible to use set theory to express this knowledge once a problem domain has been fully understood, but most database designers model the relationships in terms of an “idealized schema”. (The mathematical support came back into play in proofs regarding the process of transforming from one form to another.)**

**The transformation of conceptual model to computer representation format is known as Normalization.**

** **

** **

]]>

u **Important concept for normalization.**

u **Functional Dependency (FD)**

– **Are constraints that are derived from the meaning and interrelationships of the data attributes**

– **If A and B are attributes of relation R, B is functionally dependent on A (denoted A ****–>**** B), or A functionally determines B, if each value of A in R is associated with exactly one value of B in R.**

– **Each value of A determines a value of B**

– **A and B can be sets of attributes.**

u FDs are derived from the real-world constraints on the attributes

u Social security number determines employee name

– SSN –> ENAME

u Project number determines project name and location

– PNUMBER –> {PNAME, PLOCATION}

u Employee ssn and project number determines the hours per week that the employee works on the project

– {SSN, PNUMBER} –> HOURS

u **An FD is a property of the attributes in the schema R.**

u **The constraint must hold on every relation instance.**

u **If K is a key of R, then K functionally determines all attributes in R **

– (since we never have two distinct tuples with t1[K]=t2[K])

u **FDs are nontrivial. (StaffNo–>****StaffNo is trivial)**

**Given TEXT we know the COURSE.**

** TEXT ****–> COURSE**

u **TEXT maps to a single value of COURSE**

** **

]]>

**To create an unnormalized table:**

– **Transform data from information source (e.g. form) into table format with columns and rows.**

]]>

– **It does not allow non-atomic attributes.**

** **

**An unnormalized relation must be converted to a 1NF relation**

– **First identify a primary key, then**

– **place repeating data along with copy of the original key attribute(s) into a separate relation.**

– **Making relation to satisfy 4NF**

]]>

– **A and B are attributes of a relation, **

– **B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.**

**2NF – A relation that is in 1NF and every non-prime attribute is fully functionally dependent on the primary key.**

– **Prime attribute: An attribute that is member of the primary key K**

– **Example: EMP_PROJ(SSN, Pnum, Hours, Ename, Pname, Ploc)**

** {SSN,Pnum}–>****Hours, SSN****–>****Ename, Pnum****–>****{Pname, Ploc}**

** * Relation is not in 2NF**

]]>

**If partial dependencies exist on the primary key remove them by placing them in a new relation along with copy of their determinant.**

** EMP_PROJ(SSN, Pnum, Hours, Ename, Pname, Ploc)**

** {SSN,Pnum}–>****Hours, SSN–>****Ename, Pnum–>****{Pname, Ploc}**

** EMP_PROJ decomposed into:**

** EMPLOYEE(SSN, Pnum, Hours) , {SSN,Pnum}****–>****Hours**

** WORKS_ON(SSN, Ename) , SSN****–>****Ename**

** PROJECT(Pnum,Pname, Ploc) , Pnum–>****{Pname, Ploc}**

]]>

– **A, B and C are attributes of a relation such that if A ****–>**** B and B ****–>**** C, **

– **then C is transitively dependent on A through B. (Provided that A is not functionally dependent on B or C).**

** **

**3NF – A relation that is in 1NF and 2NF and in which no non-prime attribute is transitively dependent on the primary key.**

]]>

**EMP_DEPT(Enum, Ename, Sal, Dnum, Dname, Mgr)**

**Enum****–>****{Ename, Sal, Dnum}, Dnum****–>****{Dname, Mgr}**

**second FD is transitive through Dnum**

**Decompose EMP_DEPT into **

**EMP(Enum, Ename, Sal, Dnum) and DEPT(Dnum, Dname, Mgr)**

** **

]]>