6-relationalmodel.htm;
updated 2/15/01
LOGICAL DATABASE DESIGN
and the RELATIONAL MODEL
Introduction.
-
This set of notes focuses on
logical database design while emphasizing the Relational
Model as an implementation DBMS software platform.
-
We focus on the attributes of
the Relational Model.
-
We also focus on the transformation
of an ER model diagram into a set of
relational
table structures.
-
Many case tools can be used
to automate this transformation; however, you need to understand the nature
of the transformation.
-
This chapter also covers Normalization
which is a theoretical process for determining if a table structure is
going to be "well-structured" and easily
maintainable.
Review of Attributes
of the Relational Data Model.
-
The relational model is credited
to E. F. Codd (1970) who was working at IBM
-
Basic Definitions:
-
Data Structure:
Data are organized into tables (often
called relations, not to be confused
with relationships) with rows and columns.
-
Data Manipulation:
The structured query language (SQL)
is used to manipulate data in the tables.
-
Data Integrity:
The DBMS has facilities to enforce
business rules to maintain data integrity,
e.g., not to delete a CUSTOMER row
when there is a corresponding outstanding row in the CUSTOMER_ORDERS
table.
-
Relational Data Structure:
-
A table
is a two-dimensional representation of data.
-
Each column
is named and represents a field..
-
A table can have any number
of rows (within physical storage limitations).
-
Each row
is a record.
-
We could delete all of the rows
and
the table would still exist.
-
The shorthand notation to express
the structure of a table is:
TABLE_NAME( PrimaryKeyField1,
Field2, Field3, ... )
example:
EMPLOYEE( EmpID,
EmpName, Department, DateHired )
Note that the "..." simply
means there could be additional fields.
-
Relational Keys:
-
Primary Key: column
(or combination of columns) that uniquely identifies a row. We underline
the primary key.
-
Composite Key:
a primary key that includes more than one column.
-
Foreign Key: a
field that links one table to another
table. A table can have an unlimited number of foreign keys linking
to other tables. We use a dashed
underline for foreign keys.
-
Properties of Tables (Relations):
-
Each table in a database has
a unique name.
-
Each entry in a table at the
intersection of a row/column can only store a single
value (principle of atomicity).
No multivalued fields are allowed in a table.
-
Each row is unique.
-
Each column has a unique
field name.
-
The sequence
of rows as well as columns is insignificant and can be interchanged without
changing the data.
Transforming Extended
Entity-Relationship (EER) Diagrams to Relational Tables.
-
Steps in Conversion:
-
1. Represent entities.
-
2. Represent relationships.
-
3. Normalize the tables.
-
4. Merge tables with the same
primary key (based on the existing database schema for the firm - the system
you are currently designing will probably add to an existing schema).
Entity Types.
-
In transforming entities into
tables, recall there are three types of entities:
-
1. Regular
Entities: have independent existence and generally represent
real-world objects such as customers, products, departments, etc.
-
2. Weak
Entities: cannot exist except with identifying relationships
with an owner (regular) entity type.
-
3. Associate
entities: formed from many-to-many relationships between
other entity types. These may be gerunds.
Transforming Entities.
-
The figure shown below shows
the CUSTOMER entity.
-
Transform this entity to a relational
table structure by creating a single table with a column for each attribute
and the primary key attribute underlined as the primary key column for
the table.
CUSTOMER(CustomerNumber,
CustomerName, AccountBalance,
CustomerPhone)
Transforming Entities
with Multivalued Attributes.
-
One problem with the CUSTOMER
table structure given above is that the CustomerPhone
column is a multivalued attribute.
-
A table with sample data might
look like the following:
| CustomerNumber |
CustomerName |
AccountBalance |
CustomerPhone |
| 14112 |
Mike McClary |
114.53 |
555-1212
555-3434 |
| 15267 |
Anita Ritter |
453.78 |
555-2424 |
| 34589 |
Steve Young |
543.30 |
555-6774
555-3443 |
-
Note that this table violates
the principle of atomicity - there is an attempt to store more than one
CustomerPhone
in a field - and this is impossible.
-
The solution is to remove the
multivalued attribute to a separate table
along with the primary key (to
link back to the original table). The attribute becomes part of a
composite key. The new solution consists of two tables with sample
data as shown here.
CUSTOMER(CustomerNumber,
CustomerName, AccountBalance)
| CustomerNumber |
CustomerName |
AccountBalance |
| 14112 |
Mike McClary |
114.53 |
| 15267 |
Anita Ritter |
453.78 |
| 34589 |
Steve Young |
543.30 |
CUSTOMER_PHONE(CustomerNumber,
CustomerPhone)
| CustomerNumber |
CustomerPhone |
| 14112 |
555-1212 |
| 14112 |
555-3434 |
| 15267 |
555-2424 |
| 34589 |
555-6774 |
| 34589 |
555-3443 |
Transforming Binary 1:N
Relationships for Regular Entities.
-
The image shown below gives
both 1:N and N:N
relationships. We will focus first on the 1:N
relationship named PlacesOrder.
-
First, you transform each entity
to a table (or set of tables if there is a multivalued attribute(s)) as
described above. Each entity becomes a single table.
-
Here we show the CUSTOMER
and CUSTOMER_PHONE tables that we created
above, assuming the attributes remain unchanged. We also show the
CUSTOMER_ORDER
table assuming the attributes shown now as columns for the table.
CUSTOMER(CustomerNumber,
CustomerName, AccountBalance)
CUSTOMER_PHONE(CustomerNumber,
CustomerPhone)
CUSTOMER_ORDER(OrderNumber,
OrderDate, OrderAmount,
DeliveryDate,
CustomerNumber)
| OrderNumber |
OrderDate |
OrderAmount |
DeliveryDate |
CustomerNumber |
| 54321 |
2/8/XX |
550.96 |
2/15/XX |
34589 |
| 76389 |
2/9/XX |
110.43 |
2/16/XX |
15267 |
| 76390 |
2/9/XX |
675.75 |
2/14/XX |
34589 |
-
The relationship is implemented
implicitly by copying the primary key from the one-side
of the relationship (CustomerNumber)
into the table on the many side (CUSTOMER_ORDER).
Note that here we underline with a dashed-underline to denote the foreign
key or linking column.
-
Question:
What if the primary key of the one-side is a composite key?
-
Answer:
Then all columns that comprise the composite key are copied to the many
side to become the foreign key.
Transforming Binary N:N Relationships.
-
Refer to the previous figure
showing the N:N relationship between
the CUSTOMER_ORDER and PRODUCT
entities named Orderline.
-
Each entity becomes a single
table. We have previously modeled the CUSTOMER_ORDER
table above. The PRODUCT table
structure is given here assuming the attributes shown as columns.
CUSTOMER_ORDER(OrderNumber,
OrderDate, OrderAmount,
DeliveryDate,
CustomerNumber)
PRODUCT(ProductNumber,
Description, DesiredPrice, QtyStocked)
| ProductNumber |
Description |
DesiredPrice |
QtyStocked |
| 15567 |
Table, Dining, Oak |
500.00 |
4 |
| 15568 |
Chair, with Arms, Oak |
125.00 |
25 |
| 15569 |
Chair, without Arms, Oak |
110.00 |
77 |
| 62899 |
Lamp, Brass |
175.00 |
10 |
ORDERLINE(OrderNumber,
ProductNumber,
QtyOrdered, NegotiatedPrice)
| OrderNumber |
ProductNumber |
QtyOrdered |
NegotiatedPrice |
| 76389 |
15567 |
1 |
450.00 |
| 76389 |
15568 |
2 |
125.00 |
| 76389 |
15569 |
4 |
100.00 |
| 76390 |
15567 |
1 |
490.00 |
| 76390 |
62899 |
2 |
175.00 |
-
Primary
keys are identified for each entity.
-
The relationship is implemented
by creating a table (ORDERLINE) to
represent the relationship symbol. This is known as a linking
table or intersection table.
-
A primary
composite key for the intersection table is composed of the
primary key from both entities that participate in the relationship.
Note that NO Foreign Keys are required as the primary key
for the intersection table serves to link to the other two tables that
participate in the relationship.
-
Relationship data are stored
in the intersection table.
Insufficient Primary
Keys for Intersection Tables.
-
This situation may occur where
the relationship between PATIENT and
TREATMENT
is N:N. A patient may have the
same treatment many different times. A treatment, e.g., chest X-ray,
appendectomy, etc., may be given to different patients.
-
The solution is to include the
DateTime
attribute as part of the key in the intersection table.
PATIENT(PatientID, PatientName,
RoomLocation)
TREATMENT(TreatmentID,
TreatmentDescription)
PATIENT_TREATMENT(PatientID,
TreatmentID,
DateTimeTreated,
Result)
Transforming Binary 1:1
Relationships.
-
These relationships do not arise
very often. As before, each entity becomes a table.
-
The selection of a Foreign
Key to link the two tables is somewhat arbitrary, and should
be made based upon an analysis of the primary type of query that will be
used to access the data -- termed access path
analysis.
-
Consider the example shown in
this ER diagram.
-
Some employees will not have
an assigned parking place, but this has no impact on the table structure.
Given below are table structures for the EMPLOYEE
and PARKING_PLACE tables that show
two alternative table structures to link the tables to represent the AssignedTo
relationship by using Foreign Keys.
EMPLOYEE(SSN, EmpName,
EmpJobTitle, DateHired, ParkingID)
PARKING_PLACE(ParkingID,
CoveredOrNot, Category)
| SSN |
EmpName |
EmpJobTitle |
DateHired |
ParkingID |
| 444-44-4444 |
Tracey Conner |
Lecturer |
7/15/XX |
45 |
| 555-55-5555 |
Doug Bock |
Professor |
8/15/XX |
NULL |
| 222-22-2222 |
Mary Sumner |
Professor |
1/4/XX |
23 |
| ParkingID |
CoveredOrNot |
Category |
| 45 |
NO |
B |
| 62 |
YES |
C |
| 23 |
YES |
A |
OR
EMPLOYEE(SSN, EmpName,
EmpJobTitle, DateHired)
PARKING_PLACE(ParkingID,
CoveredOrNot, Category, SSN)
| SSN |
EmpName |
EmpJobTitle |
DateHired |
| 444-44-4444 |
Tracey Conner |
Lecturer |
7/15/XX |
| 555-55-5555 |
Doug Bock |
Professor |
8/15/XX |
| 222-22-2222 |
Mary Sumner |
Professor |
1/4/XX |
| ParkingID |
CoveredOrNot |
Category |
SSN |
| 45 |
NO |
B |
444-44-4444 |
| 62 |
YES |
C |
NULL |
| 23 |
YES |
A |
222-22-2222 |
-
Compare the two solutions.
Which is better? Depends upon the path analysis.
Transforming 1:1 and
1:N Unary Relationships.
-
Consider the example 1:1
and 1:N Unary relationships shown here.
-
We cover the 1:1
Marriage relationship first. The PERSON
entity becomes a table, and the PersonID
field is the primary key. A foreign key field is created named MarriedToPersonID.
The foreign key and primary key share the same domain of valid values.
PERSON(PersonID, PersonName,
MarriedToPersonID)
| PersonID |
PersonName |
MarriedToPersonID |
| 4 |
John Brown |
6 |
| 3 |
Susan Smith |
1 |
| 6 |
Mary Brown |
4 |
| 1 |
Barry Smith |
3 |
-
The 1:N
Unary relationship is modeled exactly the same way as the 1:1
Unary relationship. Consider the Supervise
relationship shown in the figure above. The primary key field is
EmployeeID. The relationship
is implemented by creating a ManagerEmployeeID
field that has the same domain of valid values as the EmployeeID
field.
EMPLOYEE(EmployeeID,
EmployeeName, BirthDate,
ManagerEmployeeID)
| EmployeeID |
EmployeeName |
BirthDate |
ManagerEmployeeID |
| 42 |
Tom |
01/01/XX |
NULL |
| 67 |
Bill |
05/05/XX |
42 |
| 75 |
Alice |
07/07/XX |
67 |
| 45 |
Susan |
05/07/XX |
42 |
Transforming Unary
N:N Relationships.
-
This is an example of the Bill
of Materials N:N Unary relationship. Each Item
is produced by using other Items, e.g.,
an automobile may be produced by combining Engine, Chassis, and Body items.
-
The Entity becomes a table.
-
Relationship also becomes a
table.
-
The Intersection table has composite
key with shared domain. The PartNumber
from the ITEM table becomes part of
the primary key in the BILL_OF_MATERIALS
table. We create another column named ComponentPartNumber
(the component is used to build the Item) as part of the composite key.
We also store the quantity of the component required to build the item.
ITEM(PartNumber, ItemName,
ItemCost)
|
PartNumber
|
ItemName |
ItemCost
|
|
42
|
Cylinder
Head |
$550
|
|
55
|
Engine |
$2,500
|
|
18
|
Engine
Block |
$1,800
|
|
44
|
Exhaust
Valve |
$45
|
| ... |
... |
... |
BILL_OF_MATERIALS(PartNumber,
ComponentPartNumber,
QtyToManufacture)
|
PartNumber
|
ComponentPartNumber
|
QtyToManufacture
|
|
55
|
42
|
1
|
|
55
|
18
|
1
|
|
42
|
44
|
4
|
| .... |
.... |
.... |
Transforming Ternary
Relationships.
-
Recall the simple Shipment
relationship shown below.
-
When a relationship is ternary,
each entity becomes a single table.
-
Like the Binary
N:N, the Shipment ternary
relationship becomes a table.
-
All key fields from each entity
are included in the intersection table.
-
The primary key of intersection
table includes key fields from "many"
side of relationship, e.g., for a N:N:N
relationship, the primary key from each entity is used to form a composite
key in the intersection table..
-
If the relationship is 1:N:N,
then the key field from the entity on the "1" side of the relationship
becomes a foreign key field in the intersection table.
VENDOR(VendorNumber,
VendorName, ...)
WAREHOUSE(WarehouseNumber,
Location, ...)
ITEM(ItemNumber, Description,
Cost, ... )
SHIPMENT(VendorNumber,
WarehouseNumber,
ItemNumber,
DateTimeShipped,
QtyShipped)
|
VendorNumber
|
VendorName
|
|
10
|
ABC Company
|
|
20
|
XYZ Company
|
|
....
|
....
|
|
WarehouseId
|
Location
|
|
100
|
Chicago
|
|
200
|
St. Louis
|
|
ItemNumber
|
Description
|
Cost
|
|
1000
|
Table
|
$1500
|
|
2000
|
Chair
|
$250
|
|
3000
|
Lamp
|
$175
|
|
ItemNumber
|
WarehouseNumber
|
VendorNumber
|
DateTimeShipped
|
QtyShipped
|
|
1000
|
100
|
10
|
01/01/XX
|
1
|
|
2000
|
200
|
20
|
01/01/XX
|
3
|
|
1000
|
100
|
10
|
01/02/XX
|
1
|
|
....
|
....
|
....
|
|
....
|
-
Note the inclusion of the DateTimeShipped
column as part of the key. This is to guarantee uniqueness since
the combination of ItemNumber, WarehouseNumber,
and VendorNumber are not unique.
Transforming the Complex
Shipment Relationship.
-
Recall the complex Shipping
relationship shown here.
-
Here each entity and each gerund
become tables.
-
The 1:N
relationships are implemented with foreign keys in the SHIPPMENT
table.
-
The N:N relationship between
SHIPMENT
and ITEM is implemented as a separate
table.
-
The table structures are shown
here with example table data.
-
Note that the ShipmentNumber
primary key ensures uniqueness so the DateShipped field is not included
as part of the key.
VENDOR(VendorNumber,
VendorName, ...)
WAREHOUSE(WarehouseNumber,
Location, ...)
ITEM(ItemNumber, Description,
Cost, ... )
SHIPMENT(ShipmentNumber,
WarehouseNumber,
VendorNumber,
DateShipped)
ITEMS_ON_SHIPMENT(ShipmentNumber,
ItemNumber,
QtyShipped)
|
VendorNumber
|
VendorName
|
|
10
|
ABC Company
|
|
20
|
XYZ Company
|
|
....
|
....
|
|
WarehouseId
|
Location
|
|
100
|
Chicago
|
|
200
|
St. Louis
|
|
...
|
...
|
|
ItemNumber
|
Description
|
Cost
|
|
1000
|
Table
|
$1500
|
|
2000
|
Chair
|
$250
|
|
3000
|
Lamp
|
$175
|
|
...
|
...
|
...
|
|
ShipmentNumber
|
WarehouseNumber
|
VendorNumber
|
DateTimeShipped
|
|
0001
|
100
|
10
|
01/01/XX
|
|
0002
|
200
|
20
|
01/01/XX
|
|
0003
|
100
|
10
|
01/02/XX
|
|
....
|
....
|
....
|
|
|
ShipmentNumber
|
ItemNumber
|
QtyShipped
|
|
0001
|
1000
|
1
|
|
0002
|
2000
|
3
|
|
0003
|
1000
|
1
|
|
...
|
...
|
...
|
Transforming Associative
Relationships.
-
Consider the Associative
Relationship shown below. Recall that this type of relationship
may be modeled as a Gerund with two
1:N
relationships instead of one N:N relationship.
-
Each entity becomes a table
and the primary keys are as indicated in the ER diagram.
-
The 1:N
relationships are implemented with foreign keys in the CERTIFICATE table.
EMPLOYEE(EmployeeID,
EmpName, ...)
COURSE(CourseID, Title,
PurchasePrice, ...)
CERTIFICATE(CertNumber,
DateCompleted,
EmployeeID, CourseID)
| EmployeeID |
EmpName |
| 0001 |
Tom Jones |
| 0002 |
Barry Manilow |
| 0003 |
Sandra Bullock |
| CourseID |
Title |
PurchasePrice |
| CMIS450 |
Database Design |
$1500 |
| CMIS464 |
Applied Operating Systems |
$1750 |
| CMIS470 |
Structured Design |
$1300 |
| CertNumber |
DateCompleted |
EmployeeID |
CourseID |
| 14562 |
1/15/XX |
0001 |
CMIS450 |
| 14563 |
1/15/XX |
0002 |
CMIS470 |
| 14563 |
1/30/XX |
0001 |
CMIS470 |
Transforming Generalization
Hierarchy (Supertype-Subtype) to Relationships.
-
This is a type of relationship
where the database designer has to make a determination about which of
two ways to model the Super-Subtype. Refer to the diagram below for
an example.
-
Relational Modeling Solution
#1 (the classical solution).
-
Each entity becomes a table.
-
Each table has the same
primary key.
-
You can improve searching for
the subtype of a supertype by using a coded subtype
field in the supertype table -- shown below as the EmployeeSubType
field.
EMPLOYEE(EmployeeID,
Name, DateHired, EmployeeSubType)
HOURLY(EmployeeID,
Wage)
SALARIED(EmployeeID,
StockOption, Salary)
-
Relational Modeling Solution
#2 (single table solution).
-
Implement the diagram as a single
table.
-
Allow NULL
fields for the various subtypes. In the table structure shown below,
Wage
would be NULL for the Salaried
employees and StockOption and Salary
would be NULL for the Hourly
employees.
EMPLOYEE(EmployeeID,
Name, DateHired, Wage,
StockOption,
Salary)
NORMALIZATION.
Introduction.
-
This is a formal method to check
tables for potential data storage problems termed anomalies.
-
In order to clarify the discussion,
we will formalize the definition of several terms.
-
KEYS:
The
term KEY is often confusing because it has different meanings during design
and implementation of a system.
-
DESIGN:
During design,
KEY means a combination
of one or more attributes (columns) of a relational table that uniquely
identify rows in the table.
-
KEY
guarantees uniqueness; no two rows can be identical.
-
IMPLEMENTATION:
During implementation, the term KEY
is a column on which the DBMS builds an index or other data structure,
to allow quick access to rows. Such keys need not be unique - they
may be secondary keys enabling access to a SET of rows.
-
Sometimes the terms Logical
Key and Physical Key are
used to distinguish between these two meanings.
-
INDEXES:
Since
a physical key is usually an index, we often use the term Index
for a physical key.
-
Indexes are created to:
-
Allow quick access.
-
Facilitate sorting or sorted
order access.
-
Insure no duplicates if the
keyword UNIQUE is used when defining
an index.
Functional Dependency.
-
FUNCTIONAL
DEPENDENCY:
A Functional Dependency (FD) is a relationship
between or among attributes, i.e. given a unique value for one or more
attributes, such as the CustomerAccountNumber,
we can find a corresponding value for another attribute such as the CustomerBalance
attribute.
-
Equations represent functional
dependencies. Consider the equation:
TotalPrice = ItemPrice
x Quantity
-
Here, TotalPrice
is functionally dependent on both ItemPrice
and Quantity.
-
Unlike equations, however, FDs
cannot be worked out using arithmetic; instead, they are listed in the
database.
-
FDs
are written following standard notation. For example, if the attribute
A
determines the attribute B, we write
the notation:
A -> B (read A functionally
determines B)
-
The attribute "A"
is called a determinant, here it is
a determinant of B.
Rules of Functional Dependencies:
-
If W,
X,
Y, and Z
are attributes of a table, then:
-
X -> X
-
(reflexive rule - not terribly
useful, simply means that if we know X, then we know X).
-
If X-> Y, then XZ ->Y
-
(augmentation rule - Note Y
is not really dependent on Z, but if we know X and can determine a value
for Y from X, then knowing Z has no effect on our ability to determine
a value for Y).
-
If X -> Y and X ->Z, then X
-> YZ
-
(union rule - useful for combining
tables).
-
If X -> Y then X -> Z if Z is
a subset of Y
-
If X -> Y and Y ->Z, then X
-> Z
-
(transitivity rule - useful
for avoiding transitive dependencies).
-
If X -> Y and YZ -> W, then
XZ -> W
-
(pseudotransitivity rule - useful
in understanding multivalued dependencies.
Well-Structured Tables (Relations).
-
A well-structured table contains
a minimum amount of redundancy and allows users to insert, modify, and
delete the rows in a table without errors or inconsistencies.
-
Examine the table below which
is not well-structured. In this example, the university is tracking which
Employees
enroll in University Courses. Employees
are listed who have taken (or are taking) Courses.
|
EmpId
|
Name
|
Dept
|
Salary
|
Course
|
DateTook
|
Fee
|
|
130
|
Margaret
|
Math
|
45,000
|
Calculus
|
01/15
|
150
|
|
130
|
Margaret
|
Math
|
45,000
|
Biology
|
02/15
|
200
|
|
200
|
Susan
|
Sci
|
38,000
|
Biology
|
01/15
|
200
|
|
250
|
Chris
|
Math
|
52,000
|
Calculus
|
03/15
|
150
|
|
250
|
Chris
|
Math
|
52,000
|
Biology
|
03/15
|
200
|
|
425
|
Bill
|
Math
|
48,000
|
Algebra
|
03/15
|
200
|
|
425
|
Bill
|
Math
|
48,000
|
Calculus
|
04/15
|
150
|
Problems With This Table:
-
Redundancy of data storage.
-
Potential inconsistencies on
updating data.
What is the Primary Key of
this table?
To determine an
appropriate key, you should first examine the FDs.
EmpId -> Name, Dept, Salary
Course -> Fee
EmpId, Course -> DateTook
-
Assuming employees only take
a course once (no time dependencies), then uniqueness for the rows in the
table is ensured by a composite key of EmpId +
Course.
Data Anomalies.
-
Data
Anomalies are problems with data storage caused by poorly structured
tables. Refer to the table above.
-
Insertion
Anomaly. If the primary key is EmpId
+ Course, to add a new employee, the employee must first be
enrolled in a course. If an employee is not enrolled in a course, then
the COURSE column that is part of the
composite primary key will be null,
and null key values are not allowed.
-
Deletion
Anomaly. Deleting data for Employee #425 (Bill) causes us to
lose data about Algebra and the course fee for Algebra because Bill is
the only employee who has enrolled in Algebra.
-
Modification
Anomaly. If the fee for Calculus is increased, the data must
be updated for more than one row.
-
Note there is also a time-sensitivity
between EmpId and Course
since an employee could take a course many times, but the table does not
track this fact.
First Normal Form (1NF).
Remove Repeating Groups.
-
Consider the following table
where redundant data are eliminated from the view:
|
EmpId
|
Name
|
Dept
|
Salary
|
Course
|
DateTook
|
Fee
|
|
130
|
Margaret
|
Math
|
45,000
|
Calculus
|
01/15
|
150
|
| |
|
|
|
Biology
|
02/15
|
200
|
|
200
|
Susan
|
Sci
|
38,000
|
Biology
|
01/15
|
200
|
|
250
|
Chris
|
Math
|
52,000
|
Calculus
|
03/15
|
150
|
| |
|
|
|
Biology
|
03/15
|
200
|
|
425
|
Bill
|
Math
|
48,000
|
Algebra
|
03/15
|
200
|
| |
|
|
|
Calculus
|
04/15
|
150
|
-
In order to store information
for employees who take more than one course, a possible table structure
is:
EMPLOYEE (EmpId, Name,
Dept, Salary, Course1, DateTook1, Fee1, Course2, DateTook2, Fee2, …)
-
Obviously there is a problem
anticipating the number of times the group (Course, DateTook, and Fee will
repeat).
-
A better table structure (allowing
for the fact that there is significant data redundancy) and a composite
primary key of EmpId + Course, is:
EMPLOYEE (EmpId, Name,
Dept, Salary, Course, DateTook, Fee)
|
EmpId
|
Name
|
Dept
|
Salary
|
Course
|
DateTook
|
Fee
|
|
130
|
Margaret
|
Math
|
45,000
|
Calculus
|
01/15
|
150
|
|
130
|
Margaret
|
Math
|
45,000
|
Biology
|
02/15
|
200
|
|
200
|
Susan
|
Sci
|
38,000
|
Biology
|
01/15
|
200
|
|
250
|
Chris
|
Math
|
52,000
|
Calculus
|
03/15
|
150
|
|
250
|
Chris
|
Math
|
52,000
|
Biology
|
03/15
|
200
|
|
425
|
Bill
|
Math
|
48,000
|
Algebra
|
03/15
|
200
|
|
425
|
Bill
|
Math
|
48,000
|
Calculus
|
04/15
|
150
|
-
An obvious problem associated
with the above solution is the storage of a lot of redundant data.
We can eliminate this problem by further normalizing the table.
Second Normal Form (2NF).
Remove Partial Dependencies.
-
There are many potential problems
associated with the data redundancy. These anomalies
occur because not all attributes are fully
dependent on the primary key. Note that this type of problem
only arises when the key is a composite
key.
-
The FDs here are:
EmpId -> Name, Dept, Salary
Course -> Fee
EmpId, Course -> DateTook
-
Solution: Divide the
table into two or more tables. Do this by removing attributes dependent
on part of the key to a separate table with just that part of the key as
the primary key. This results in three tables for this particular modeling
problem:
EMPLOYEE (EmpId,
Name, Dept, Salary )
|
EmpId
|
Name
|
Dept
|
Salary
|
|
130
|
Margaret
|
Math
|
45,000
|
|
200
|
Susan
|
Sci
|
38,000
|
|
250
|
Chris
|
Math
|
52,000
|
|
425
|
Bill
|
Math
|
48,000
|
COURSE (Course, Fee
)
|
Course
|
Fee
|
|
Calculus
|
150
|
|
Biology
|
200
|
|
Algebra
|
200
|
TOOK_COURSE ( EmpId,
Course,
DateTook )
|
EmpId
|
Course
|
DateTook
|
|
130
|
Calculus
|
01/15
|
|
130
|
Biology
|
02/15
|
|
200
|
Biology
|
01/15
|
|
250
|
Calculus
|
03/15
|
|
250
|
Biology
|
03/15
|
|
425
|
Algebra
|
03/15
|
|
425
|
Calculus
|
04/15
|
-
Note that DateTook
should be included as part of the primary key if employees can take a course
more than once and the appropriate table and primary key become:
TOOK_COURSE ( EmpId,
Course,
DateTook
)
Third Normal Form (3NF).
Remove Transitive Dependencies.
-
A transitive
dependency is a functional dependency between twoor
more non-key attributes where one of the attributes is a determinant,
but not part of the primary key and usually not a candidate
key. Consider the following table.
|
CustNo
|
Name
|
Salesperson
|
Region
|
|
8023
|
Anderson
|
Smith
|
South
|
|
9167
|
Bancroft
|
Hicks
|
West
|
|
7924
|
Hobbs
|
Smith
|
South
|
|
6837
|
Tucker
|
Hernandez
|
East
|
|
8596
|
Arnold
|
Hicks
|
West
|
CustNo -> Name, Salesperson,
Region
Salesperson -> Region
-
Note that Salesperson,
a non-key attribute, determines Region,
another non-key attribute.
-
Note Salespersons
are assigned to regions and that Salesperson
is a candidate key since it is a determinant
of Region.
-
Solution: Again the Table must
be divided into two or more tables, depending upon how many different transitive
dependencies exist. Usually only one exists in a table, but more
than one is certainly possible.
SALES ( CustNo, Name,
Salesperson
)
|
CustNo
|
Name
|
Salesperson
|
|
8023
|
Anderson
|
Smith
|
|
9167
|
Bancroft
|
Hicks
|
|
7924
|
Hobbs
|
Smith
|
|
6837
|
Tucker
|
Hernandez
|
|
8596
|
Arnold
|
Hicks
|
SPERSON (Salesperson,
Region )
|
Salesperson
|
Region
|
|
Smith
|
South
|
|
Hicks
|
West
|
|
Hernandez
|
East
|
Boyce-Codd Normal Form
(BCNF). Every Determinant Is A Candidate Key. Resolve Anomalies For More
Than One Candidate Key.
-
If a table has more than
one candidate key, data storage
anomalies may result even if the table is in 3NF.
-
This situation arises when the
candidate
key is not identified as part of or the primary
key, and two of the candidate keys have overlapping
attributes. Consider the following table.
|
Player
|
Position
|
Coach
|
|
Earl
|
Fullback
|
Joe
|
|
John
|
Guard
|
Ed
|
|
Tony
|
Fullback
|
Pete
|
|
Earl
|
Tackle
|
Jim
|
|
Mack
|
Fullback
|
Joe
|
-
Each player may play more than
one position (note Earl).
-
Each position can have more
than one coach (note Fullback position).
-
For each position a player is
assigned only one coach.
-
Each coach coaches only a single
position.
-
Each coach can coach several
players who play a position.
-
Anomalies: If John
leaves the team, we lose the fact that Ed
coaches Guards. If we have Mel
as a Quarterback coach, we cannot insert
this information into the table until a player is assigned to play Quarterback
because the Player column would be
null
and that is not allowed for a column that is part of the primary key.
-
No single attribute is a candidate
key (determines the other two attributes).
-
FDs for the table are:
Player, Position -> Coach
Player, Coach -> Position
(by the augmentation rule).
Coach -> Position
-
Confirm for yourself that the
table is in 3NF.
-
Solution: Divide the
table into two tables. Place the attribute that is a determinant,
but not a primary key (Coach) in a
separate table with the attributes that it determines and make it the key.
-
Here this refers to the COACH
table since the attribute Coach is
a determinant, but not a candidate key.
POSITION ( PLAYER,
COACH
)
|
Player
|
Coach
|
|
Earl
|
Joe
|
|
John
|
Ed
|
|
Tony
|
Pete
|
|
Earl
|
Jim
|
|
Mack
|
Joe
|
COACH (COACH, POSITION
)
|
Coach
|
Position
|
|
Joe
|
FB
|
|
Ed
|
G
|
|
Pete
|
FB
|
|
Jim
|
T
|
Fourth Normal Form (4NF).
Remove Multivalued Dependencies.
-
Once BCNF
has been achieved, there are no additional anomalies that can result from
functional dependency problems. However, problems can arise from multi-valued
dependencies.
-
In order for this type of problem
to occur, there must be at least three attributes in a table with the following
associations:
A ->> B (read A is associated
with multiple values of B)
A ->> C (read A is associated
with multiple values of C)
B <-/-> C (read B and
C are independent of one another)
-
Consider the situation where
each course can have several instructors
and each course uses several textbooks,
but the instructors do not select the
textbooks
(they are independent of one another).
Course ->> Instructor
Course ->> Textbook
Instructor <-/-> Textbook
-
In the following table the primary
key is a composite of all three attributes: Course
+ Instructor + Textbook.
-
Note that this represents in
an E-R diagram model the intersection table for a ternary
relationship.
-
The 4NF problem arises when
the data should have been modeled as a set of two
binary relationships instead of a ternary
relationship.
OFFERING (Course, Instructor,
Textbook)
-- Note the composite key -- this table structure is WRONG!!!!
|
Course
|
Instructor
|
Textbook
|
|
|
Mngt
|
White
|
Drucker
|
|
|
Mngt
|
Green
|
Drucker
|
|
|
Finance
|
Gray
|
Weston
|
|
|
Finance
|
Gray
|
Gilford
|
|
|
Mngt
|
White
|
Peters
|
(new row)
|
|
Mngt
|
Green
|
Peters
|
(new row)
|
-
If a new textbook (one written
by Peters) is adopted for use in the
Mngt
course by all instructors (department mandates usage of the book), then
it is necessary to add multiple rows to this table.
-
The new rows would be those
indicated above as new and it is the problem of having to add multiple
rows that is the anomaly here.
-
Solution: Divide the
table into two new tables:
INSTRUCTOR(Course,
Instructor)
|
Course
|
Instructor
|
|
Mngt
|
White
|
|
Mngt
|
Green
|
|
Finance
|
Gray
|
TEXT(Course, Textbook)
|
Course
|
Textbook
|
|
Mngt
|
Drucker
|
|
Finance
|
Weston
|
|
Finance
|
Gilford
|
|
Mngt
|
Peters
|
-
When information about the Peters
book for the Mngt course is added,
only a single row is added to the TEXT
table.
Fifth Normal Form (5NF).
Join Dependency.
-
This problem arises if a table
is divided into two or more tables, but it is not possible to join the
tables together again to create the original view of the data without inaccurate
join results.
-
This problem arises in the case
of ternary relationships and is covered
in detail with examples in my article, "Entity Relationship Modeling and
Normalization Errors."
-
Basically, this occurs when
a relationship that must be modeled as a ternary
relationship is incorrectly modeled as a set of two or more
binary
relationships. This is essentially the opposite problem that arises with
4NF.
-
We will study this problem in
more detail by reading the article on normalization errors.
END OF NOTES