In the relational data model, a superkey is any set of attributes that uniquely identifies each tuple of a relation.[1][2] Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.
The set of all attributes is always a superkey (the trivial superkey). Tuples in a relation are by definition unique, with duplicates removed after each operation, so the set of all attributes is always uniquely valued for every tuple. A candidate key (or minimal superkey) is a superkey that cannot be reduced to a smaller superkey by removing an attribute.[3]
For example, in an employee schema with attributes employeeID, name, job, and departmentID, if employeeID values are unique, then employeeID combined with any or all of the other attributes can uniquely identify tuples in the table. Each combination, {employeeID}, {employeeID, name}, {employeeID, name, job}, and so on, is a superkey. {employeeID} is a candidate key, since no proper subset of its attributes is also a superkey. {employeeID, name, job, departmentID} is the trivial superkey.
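As a minimal sketch of the employee example, the following Python checks which attribute sets identify rows uniquely in one small employee table, and whether a given superkey is minimal. The sample rows and the helper names `is_superkey` and `is_candidate_key` are assumptions made for illustration, and the check covers only this one set of rows.

```python
# Hypothetical sample rows: only employeeID distinguishes rows 1 and 3.
EMPLOYEES = [
    {"employeeID": 1, "name": "Ana", "job": "Engineer", "departmentID": 10},
    {"employeeID": 2, "name": "Ben", "job": "Engineer", "departmentID": 10},
    {"employeeID": 3, "name": "Ana", "job": "Engineer", "departmentID": 10},
]


def is_superkey(rows, attrs):
    """Return True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """Return True if attrs is a superkey and removing any attribute breaks it."""
    attrs = list(attrs)
    if not is_superkey(rows, attrs):
        return False
    # Every superset of a superkey is a superkey, so it is enough
    # to test the subsets obtained by dropping a single attribute.
    return not any(
        is_superkey(rows, attrs[:i] + attrs[i + 1:]) for i in range(len(attrs))
    )


print(is_superkey(EMPLOYEES, ["employeeID", "name"]))       # True
print(is_candidate_key(EMPLOYEES, ["employeeID", "name"]))  # False
print(is_candidate_key(EMPLOYEES, ["employeeID"]))          # True
```

Testing only single-attribute removals keeps the minimality check linear in the number of attributes instead of enumerating every proper subset.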
If attribute set K is a superkey of relation R, then at all times it is the case that the projection of R over K has the same cardinality as R itself.
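The cardinality condition can be checked directly on a single instance. The sketch below is illustrative only: `project` and `has_same_cardinality` are hypothetical helper names, and the three rows are made-up data.

```python
def project(relation, attrs):
    """Projection with set semantics: duplicate result tuples collapse."""
    return {tuple(row[a] for a in attrs) for row in relation}


def has_same_cardinality(relation, attrs, all_attrs):
    """Return True if the projection of R over attrs has as many tuples as R."""
    # Projecting onto all attributes removes duplicate rows, giving |R|.
    return len(project(relation, attrs)) == len(project(relation, all_attrs))


ALL_ATTRS = ("employeeID", "departmentID")
R = [
    {"employeeID": 1, "departmentID": 10},
    {"employeeID": 2, "departmentID": 10},
    {"employeeID": 3, "departmentID": 20},
]

print(has_same_cardinality(R, ("employeeID",), ALL_ATTRS))    # True
print(has_same_cardinality(R, ("departmentID",), ALL_ATTRS))  # False
```

Projecting onto departmentID collapses the first two rows into one tuple, so the projection has two tuples while R has three.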
Monarch Name | Monarch Number | Royal House |
---|---|---|
Edward | II | Plantagenet |
Edward | III | Plantagenet |
Richard | III | Plantagenet |
Henry | IV | Lancaster |
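To find the superkeys of this relation instance, first list all the sets of attributes: {}, {Monarch Name}, {Monarch Number}, {Royal House}, {Monarch Name, Monarch Number}, {Monarch Name, Royal House}, {Monarch Number, Royal House}, and {Monarch Name, Monarch Number, Royal House}.
Second, eliminate all the sets that do not meet the superkey requirement. For example, {Monarch Name, Royal House} cannot be a superkey because two distinct tuples, (Edward, II, Plantagenet) and (Edward, III, Plantagenet), share the same attribute values (Edward, Plantagenet).
Finally, after elimination, the remaining sets of attributes are the only possible superkeys in this example: {Monarch Name, Monarch Number}, which is also the candidate key, and {Monarch Name, Monarch Number, Royal House}.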
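The three steps can also be carried out mechanically. The following sketch takes the monarch table above as its only input; the function names are hypothetical, and the result describes this one instance only.

```python
from itertools import combinations

ATTRS = ("Monarch Name", "Monarch Number", "Royal House")
MONARCHS = [
    ("Edward", "II", "Plantagenet"),
    ("Edward", "III", "Plantagenet"),
    ("Richard", "III", "Plantagenet"),
    ("Henry", "IV", "Lancaster"),
]


def all_attribute_sets(attrs):
    """Step 1: every subset of the attributes, including the empty set."""
    return [c for r in range(len(attrs) + 1) for c in combinations(attrs, r)]


def identifies_rows(rows, attrs, subset):
    """Step 2: a subset survives only if no two rows share its values."""
    idx = [attrs.index(a) for a in subset]
    projected = {tuple(row[i] for i in idx) for row in rows}
    return len(projected) == len(set(rows))


def superkeys(rows, attrs):
    """Step 3: the attribute sets left after elimination."""
    return [s for s in all_attribute_sets(attrs) if identifies_rows(rows, attrs, s)]


for key in superkeys(MONARCHS, ATTRS):
    print(key)
# ('Monarch Name', 'Monarch Number')
# ('Monarch Name', 'Monarch Number', 'Royal House')
```

Enumerating all subsets is exponential in the number of attributes, which is acceptable for a three-column example but not for wide schemas.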
In reality, superkeys cannot be determined simply by examining one set of tuples in a relation. A superkey defines a functional dependency constraint on a relation schema, and that constraint must hold for every possible instance of the schema.
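To illustrate why a single instance is not enough, the sketch below shows an attribute that looks like a key in a partial sample but fails once another tuple from the monarch table is added. The function name `unique_on` and the choice of a three-row sample are assumptions made for illustration.

```python
def unique_on(rows, columns):
    """Return True if the given column positions take distinct values on every row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(set(keys)) == len(keys)


# A partial sample of the monarch table (column 0 is Monarch Name).
sample = [
    ("Edward", "II", "Plantagenet"),
    ("Richard", "III", "Plantagenet"),
    ("Henry", "IV", "Lancaster"),
]

# On the sample alone, Monarch Name appears to identify every tuple.
print(unique_on(sample, [0]))    # True

# Adding one more valid tuple shows that it is not a superkey of the schema.
extended = sample + [("Edward", "III", "Plantagenet")]
print(unique_on(extended, [0]))  # False
```

A passing check on sample data is therefore only evidence, not proof; the key must be declared as a constraint of the schema.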