Foreign key


A foreign key is a set of attributes in a table that refers to the primary key of another table, linking the two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint: the tuples formed by the foreign key attributes in one relation, R, must also exist in some other (not necessarily distinct) relation, S, and the corresponding attributes in S must form a candidate key of S. [1] [2] [3]


In other words, a foreign key is a set of attributes that references a candidate key. For example, a table called TEAM may have an attribute, MEMBER_NAME, which is a foreign key referencing a candidate key, PERSON_NAME, in the PERSON table. Since MEMBER_NAME is a foreign key, any value existing as the name of a member in TEAM must also exist as a person's name in the PERSON table; in other words, every member of a TEAM is also a PERSON.
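The TEAM/PERSON rule can be tried out with Python's standard sqlite3 module. This is a minimal sketch under stated assumptions: SQLite merely stands in for "a relational DBMS", the lowercase table names and the add_member helper are hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

# Assumption: SQLite as the DBMS; table and helper names are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE person (person_name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE team ("
    " team_name TEXT NOT NULL,"
    " member_name TEXT NOT NULL REFERENCES person(person_name))"
)
conn.execute("INSERT INTO person VALUES ('Ada')")


def add_member(team: str, member: str) -> bool:
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute("INSERT INTO team VALUES (?, ?)", (team, member))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_member("Analysts", "Ada"))    # True: Ada exists in PERSON
print(add_member("Analysts", "Grace"))  # False: Grace does not
```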


Summary

The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table. [4] In database relational modeling and implementation, a candidate key is a set of zero or more attributes, the values of which are guaranteed to be unique for each tuple (row) in a relation. The value or combination of values of candidate key attributes for any tuple cannot be duplicated for any other tuple in that relation.

Since the purpose of the foreign key is to identify a particular row of the referenced table, it is generally required that the foreign key either equal the candidate key in some row of the referenced table or have no value (the NULL value). [2] This rule is called a referential integrity constraint between the two tables. [5] Because violations of these constraints can be the source of many database problems, most database management systems provide mechanisms to ensure that every non-null foreign key corresponds to a row of the referenced table. [6] [7] [8]
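The "match a row, or be NULL" rule can be checked directly. A minimal sketch, assuming SQLite via Python's sqlite3 module; the customer/customer_order tables and the try_order helper are hypothetical. The foreign key column is left nullable, so NULL is accepted without any lookup, while a non-null value must exist in the parent table.

```python
import sqlite3

# Assumption: SQLite; table names and helper are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
# customer_id is nullable: NULL means "no referenced row" and is not checked.
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id) VALUES (1)")


def try_order(order_id, customer_id):
    """Insert an order and report whether the foreign key accepted it."""
    try:
        conn.execute("INSERT INTO customer_order VALUES (?, ?)", (order_id, customer_id))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


print(try_order(100, 1))     # accepted: customer 1 exists
print(try_order(101, None))  # accepted: NULL foreign key
print(try_order(102, 99))    # rejected: no customer 99
```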

For example, consider a database with two tables: a CUSTOMER table that includes all customer data and an ORDER table that includes all customer orders. Suppose the business requires that each order refer to a single customer. To reflect this in the database, a foreign key column is added to the ORDER table (e.g., CUSTOMERID), which references the primary key of CUSTOMER (e.g., ID). Because the primary key of a table must be unique, and because CUSTOMERID contains only values from that primary key field, we may assume that, when it has a value, CUSTOMERID identifies the particular customer that placed the order. However, this can no longer be assumed if the ORDER table is not kept up to date when rows of the CUSTOMER table are deleted or the ID column is altered, and working with these tables may become more difficult. Many real-world databases work around this problem by 'inactivating' referenced rows of the master table rather than physically deleting them, or by complex update programs that modify every reference to a key value when that value changes.
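The 'inactivating' workaround can be sketched as follows, assuming SQLite via Python's sqlite3 module. The active flag column is a hypothetical soft-delete marker, and the order table is named customer_order because ORDER is a reserved word in SQL. The physical delete is rejected by the default (NO ACTION) rule, while the soft delete keeps every CUSTOMERID pointing at a real row.

```python
import sqlite3

# Assumption: SQLite; 'active' is a hypothetical soft-delete flag.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " active INTEGER NOT NULL DEFAULT 1)"
)
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (500, 1)")

# A physical delete would orphan order 500, so the foreign key rejects it.
try:
    conn.execute("DELETE FROM customer WHERE id = 1")
except sqlite3.IntegrityError as exc:
    print("delete rejected:", exc)

# Inactivating keeps the row, so the order still identifies its customer.
conn.execute("UPDATE customer SET active = 0 WHERE id = 1")
row = conn.execute(
    "SELECT o.order_id, c.name, c.active FROM customer_order AS o "
    "JOIN customer AS c ON c.id = o.customer_id"
).fetchone()
print(row)  # (500, 'Acme', 0)
```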

Foreign keys play an essential role in database design. One important part of database design is making sure that relationships between real-world entities are reflected in the database by references, using foreign keys to refer from one table to another. [9] Another important part of database design is database normalization, in which tables are broken apart and foreign keys make it possible for them to be reconstructed. [10]

Multiple rows in the referencing (or child) table may refer to the same row in the referenced (or parent) table. In this case, the relationship between the two tables is called a one-to-many relationship: one row of the referenced table corresponds to many rows of the referencing table.
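A one-to-many relationship, and the way a join along the foreign key reassembles data that normalization split apart, can be sketched in SQLite through Python's sqlite3 module. The tables and sample rows are hypothetical; two child rows point at the same parent row.

```python
import sqlite3

# Assumption: SQLite; tables and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE customer_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
# Orders 10 and 11 both refer to customer 1: one parent, many children.
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Joining along the foreign key brings parent data back to each child row.
orders_per_customer = conn.execute(
    "SELECT c.name, COUNT(o.order_id) FROM customer AS c "
    "LEFT JOIN customer_order AS o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
print(orders_per_customer)  # [('Acme', 2), ('Globex', 1)]
```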

In addition, the child and parent table may, in fact, be the same table, i.e., the foreign key refers back to the same table. Such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key. In database management systems, it is declared like any other foreign key except that its REFERENCES clause names its own table; queries that follow it typically join the table to itself under two aliases.
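A self-referencing foreign key can be sketched with a hypothetical employee table whose manager_id column refers back to the same table, assuming SQLite via Python's sqlite3 module. NULL marks the top of the hierarchy, an unknown manager is rejected, and a recursive query walks the chain by repeatedly joining the table to itself.

```python
import sqlite3

# Assumption: SQLite; the employee table is a hypothetical example.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
# manager_id refers back to the same table; NULL marks the top of the hierarchy.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " manager_id INTEGER REFERENCES employee(emp_id))"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Dana", None), (2, "Eli", 1), (3, "Fay", 2)],
)

try:
    conn.execute("INSERT INTO employee VALUES (4, 'Gus', 42)")  # no employee 42
except sqlite3.IntegrityError:
    print("rejected: manager 42 does not exist")

# Walk upward from Fay; each step joins employee to the previous result.
chain = conn.execute(
    "WITH RECURSIVE up(name, manager_id, depth) AS ("
    " SELECT name, manager_id, 0 FROM employee WHERE name = 'Fay'"
    " UNION ALL"
    " SELECT e.name, e.manager_id, up.depth + 1"
    " FROM employee AS e JOIN up ON e.emp_id = up.manager_id)"
    " SELECT name FROM up ORDER BY depth"
).fetchall()
print([name for (name,) in chain])  # ['Fay', 'Eli', 'Dana']
```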

A table may have multiple foreign keys, and each foreign key can have a different parent table. Each foreign key is enforced independently by the database system. Foreign keys can also be chained, since a parent table may itself be the child of another table, so cascading relationships spanning several tables can be established.
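Independent enforcement of several foreign keys can be sketched with a hypothetical shipment table that references both a supplier table and a part table, assuming SQLite via Python's sqlite3 module. A row is accepted only if every one of its foreign keys finds its parent.

```python
import sqlite3

# Assumption: SQLite; supplier/part/shipment and ship() are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE supplier (sid INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE part (pid INTEGER PRIMARY KEY)")
# Two foreign keys, each with its own parent table.
conn.execute(
    "CREATE TABLE shipment ("
    " sid INTEGER NOT NULL REFERENCES supplier(sid),"
    " pid INTEGER NOT NULL REFERENCES part(pid),"
    " qty INTEGER NOT NULL,"
    " PRIMARY KEY (sid, pid))"
)
conn.execute("INSERT INTO supplier VALUES (1)")
conn.execute("INSERT INTO part VALUES (7)")


def ship(sid, pid, qty):
    """Insert a shipment and report whether all constraints accepted it."""
    try:
        conn.execute("INSERT INTO shipment VALUES (?, ?, ?)", (sid, pid, qty))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


print(ship(1, 7, 100))  # accepted: both parents exist
print(ship(1, 8, 100))  # rejected: part 8 does not exist
print(ship(2, 7, 100))  # rejected: supplier 2 does not exist
```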

Defining foreign keys

A foreign key is defined as an attribute or set of attributes in a relation whose values match a primary key in another relation. SQL:2003 defines syntax for adding such a constraint to an existing table, using an ALTER TABLE statement with an ADD FOREIGN KEY clause. Likewise, foreign keys can be defined as part of the CREATE TABLE SQL statement, as in the following example. Omitting the column list in the REFERENCES clause implies that the foreign key references the primary key of the referenced table.

CREATE TABLE child_table (
    col1 INTEGER PRIMARY KEY,
    col2 CHARACTER VARYING(20),
    col3 INTEGER,
    col4 INTEGER,
    FOREIGN KEY (col3, col4) REFERENCES parent_table (col1, col2) ON DELETE CASCADE
)

If the foreign key is a single column only, the column can be marked as such using the following syntax:

CREATE TABLE child_table (
    col1 INTEGER PRIMARY KEY,
    col2 CHARACTER VARYING(20),
    col3 INTEGER,
    col4 INTEGER REFERENCES parent_table (col1) ON DELETE CASCADE
)
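The composite-key form above can be run in SQLite through Python's sqlite3 module. This sketch assumes a parent_table definition, which the text does not give: here its (col1, col2) pair is declared as the primary key, because SQLite requires referenced columns to be a primary key or carry a UNIQUE constraint. A child row whose foreign key columns are NULL is not checked and is therefore untouched by the cascade.

```python
import sqlite3

# Assumption: SQLite; parent_table's definition is hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE parent_table ("
    " col1 INTEGER, col2 INTEGER, PRIMARY KEY (col1, col2))"
)
# Child definition as in the composite-key example above.
conn.execute(
    "CREATE TABLE child_table ("
    " col1 INTEGER PRIMARY KEY,"
    " col2 CHARACTER VARYING(20),"
    " col3 INTEGER,"
    " col4 INTEGER,"
    " FOREIGN KEY (col3, col4) REFERENCES parent_table (col1, col2) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO parent_table VALUES (1, 2)")
conn.executemany(
    "INSERT INTO child_table VALUES (?, ?, ?, ?)",
    [(10, "a", 1, 2), (11, "b", 1, 2), (12, "c", None, None)],
)

# Deleting the parent removes both child rows that reference (1, 2).
conn.execute("DELETE FROM parent_table WHERE col1 = 1 AND col2 = 2")
print(conn.execute("SELECT col1 FROM child_table").fetchall())  # [(12,)]
```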

Some database systems also provide a vendor-specific system stored procedure for declaring foreign keys, such as sp_foreignkey in Sybase:

sp_foreignkey child_table, parent_table, col3, col4

Referential actions

Because the database management system enforces referential constraints, it must ensure data integrity if rows in a referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references have to be considered. SQL:2003 specifies five different referential actions that can take place in such cases:

CASCADE

Whenever rows in the parent (referenced) table are deleted (or updated), the respective rows of the child (referencing) table with a matching foreign key column will be deleted (or updated) as well. This is called a cascade delete (or update).
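Cascade update and cascade delete can be sketched in SQLite through Python's sqlite3 module. The parent/child tables are hypothetical. Renumbering a parent key rewrites the matching child foreign keys, and deleting the parent removes its child rows.

```python
import sqlite3

# Assumption: SQLite; parent/child tables are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(id)"
    " ON UPDATE CASCADE ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO parent VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO child VALUES (?, ?)", [(10, 1), (11, 1), (20, 2)])

# Cascade update: children of parent 1 follow the new key value.
conn.execute("UPDATE parent SET id = 5 WHERE id = 1")
after_update = conn.execute("SELECT id, parent_id FROM child ORDER BY id").fetchall()
print(after_update)  # [(10, 5), (11, 5), (20, 2)]

# Cascade delete: removing parent 5 removes its children too.
conn.execute("DELETE FROM parent WHERE id = 5")
after_delete = conn.execute("SELECT id, parent_id FROM child ORDER BY id").fetchall()
print(after_delete)  # [(20, 2)]
```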

RESTRICT

A value cannot be updated or deleted when a row exists in a referencing or child table that references the value in the referenced table.

Similarly, a row cannot be deleted as long as there is a reference to it from a referencing or child table.

To understand RESTRICT (and CASCADE) better, it may help to notice a difference that is not immediately obvious. The referential action CASCADE modifies the behavior of the child table itself, where the word CASCADE is used. For example, ON DELETE CASCADE effectively says "when the referenced row is deleted from the other (master) table, delete from me as well". The referential action RESTRICT, however, modifies the behavior of the master table, not the child table, even though the word RESTRICT appears in the child table's definition and not in the master table's. So ON DELETE RESTRICT effectively says: "when someone tries to delete the row from the other (master) table, prevent that deletion" (and, as a consequence, nothing is deleted from the child table either).

RESTRICT is not supported by Microsoft SQL Server 2012 and earlier.
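The point that RESTRICT is written on the child but governs the master can be sketched in SQLite through Python's sqlite3 module. The master/detail tables and the attempt helper are hypothetical. Deleting the referenced row or changing its key is blocked, while updating a non-key column of the master row is still allowed.

```python
import sqlite3

# Assumption: SQLite; master/detail and attempt() are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, label TEXT)")
# RESTRICT is declared on detail, but it constrains changes made to master.
conn.execute(
    "CREATE TABLE detail (id INTEGER PRIMARY KEY,"
    " master_id INTEGER REFERENCES master(id)"
    " ON UPDATE RESTRICT ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO master VALUES (1, 'old')")
conn.execute("INSERT INTO detail VALUES (10, 1)")


def attempt(sql):
    """Run a statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "blocked"


print(attempt("DELETE FROM master WHERE id = 1"))                # blocked
print(attempt("UPDATE master SET id = 2 WHERE id = 1"))          # blocked
print(attempt("UPDATE master SET label = 'new' WHERE id = 1"))   # ok: key unchanged
```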

NO ACTION

NO ACTION and RESTRICT are very much alike. The main difference between NO ACTION and RESTRICT is that with NO ACTION the referential integrity check is done after trying to alter the table. RESTRICT does the check before trying to execute the UPDATE or DELETE statement. Both referential actions act the same if the referential integrity check fails: the UPDATE or DELETE statement will result in an error.

In other words, when an UPDATE or DELETE statement is executed on the referenced table using the referential action NO ACTION, the DBMS verifies at the end of the statement execution that none of the referential relationships are violated. RESTRICT, by contrast, rejects the operation as soon as it would affect a referenced row. With NO ACTION, the triggers or the semantics of the statement itself may yield an end state in which no foreign key relationships are violated by the time the constraint is finally checked, thus allowing the statement to complete successfully.
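The timing difference can be sketched in SQLite through Python's sqlite3 module. This relies on SQLite-specific behavior, so treat it as an illustration rather than a statement about every DBMS: the constraint is declared DEFERRABLE INITIALLY DEFERRED, which moves the NO ACTION check from the end of the statement to COMMIT, whereas SQLite still applies RESTRICT immediately. The transaction deletes a referenced parent row and re-inserts it before committing; the replace_parent helper and table names are hypothetical.

```python
import sqlite3


def replace_parent(action: str) -> str:
    """Delete and re-insert a referenced parent row inside one transaction.

    Assumption: SQLite semantics; 'action' is NO ACTION or RESTRICT.
    """
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child (id INTEGER PRIMARY KEY,"
        f" parent_id INTEGER REFERENCES parent(id) ON DELETE {action}"
        " DEFERRABLE INITIALLY DEFERRED)"
    )
    conn.execute("INSERT INTO parent VALUES (1)")
    conn.execute("INSERT INTO child VALUES (10, 1)")
    conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")  # child 10 briefly orphaned
        conn.execute("INSERT INTO parent VALUES (1)")    # reference valid again
        conn.execute("COMMIT")                           # deferred check happens here
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return "rejected"
    finally:
        conn.close()


print(replace_parent("NO ACTION"))  # committed: end state is consistent
print(replace_parent("RESTRICT"))   # rejected: the DELETE fails at once
```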

SET NULL, SET DEFAULT

In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON DELETE and ON UPDATE: the value of the affected referencing attributes is changed to NULL for SET NULL, and to the specified default value for SET DEFAULT.
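Both actions can be sketched in SQLite through Python's sqlite3 module, using hypothetical dept and staff tables. One detail worth noting for SET DEFAULT: the default value is itself a foreign key value, so in SQLite it must exist in the parent table (here a hypothetical "Unassigned" department 0) or the action fails.

```python
import sqlite3

# Assumption: SQLite; dept/staff tables are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(0, "Unassigned"), (1, "Housewares")])
conn.execute(
    "CREATE TABLE staff_null (id INTEGER PRIMARY KEY,"
    " dept_id INTEGER REFERENCES dept(id) ON DELETE SET NULL)"
)
# The default (0) must itself exist in dept for SET DEFAULT to succeed.
conn.execute(
    "CREATE TABLE staff_default (id INTEGER PRIMARY KEY,"
    " dept_id INTEGER DEFAULT 0 REFERENCES dept(id) ON DELETE SET DEFAULT)"
)
conn.execute("INSERT INTO staff_null VALUES (1, 1)")
conn.execute("INSERT INTO staff_default VALUES (1, 1)")

conn.execute("DELETE FROM dept WHERE id = 1")
null_row = conn.execute("SELECT dept_id FROM staff_null").fetchone()
default_row = conn.execute("SELECT dept_id FROM staff_default").fetchone()
print(null_row)     # (None,)
print(default_row)  # (0,)
```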

Triggers

Referential actions are generally implemented as implied triggers (i.e., triggers with system-generated names, often hidden). As such, they are subject to the same limitations as user-defined triggers, and their order of execution relative to other triggers may need to be considered. In some cases it may become necessary to replace the referential action with an equivalent user-defined trigger to ensure proper execution order, or to work around mutating-table limitations.
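Replacing a referential action with an equivalent user-defined trigger can be sketched in SQLite through Python's sqlite3 module. The foreign key is declared without an action, so on its own it would block the delete; a hypothetical BEFORE DELETE trigger named parent_delete_cascade removes the child rows first, doing the work ON DELETE CASCADE would otherwise do. Trigger syntax and ordering rules differ between systems, so this is an illustration of the idea only.

```python
import sqlite3

# Assumption: SQLite; table and trigger names are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
# No referential action: the constraint alone would reject the delete below.
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES parent(id))"
)
# User-defined trigger standing in for ON DELETE CASCADE.
conn.execute(
    "CREATE TRIGGER parent_delete_cascade BEFORE DELETE ON parent "
    "BEGIN DELETE FROM child WHERE parent_id = OLD.id; END"
)
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO child VALUES (?, ?)", [(10, 1), (11, 1)])

conn.execute("DELETE FROM parent WHERE id = 1")  # succeeds: children removed first
print(conn.execute("SELECT COUNT(*) FROM child").fetchone())  # (0,)
```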

Another important limitation appears with transaction isolation: your changes to a row may not be able to fully cascade, because the row is referenced by data your transaction cannot "see" and therefore cannot cascade onto. For example, suppose your transaction is renumbering a customer account while a concurrent transaction is creating a new invoice for that same customer. A CASCADE rule may fix all the invoice rows your transaction can see, keeping them consistent with the renumbered customer row, but it will not reach into the other transaction to fix the data there. Because the database cannot guarantee consistent data when both transactions commit, one of them will be forced to roll back (often on a first-come-first-served basis).

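As an illustration of user-defined trigger syntax, the following MySQL example defines a trigger that adds each inserted amount to the user variable @sum before the row is stored:

CREATE TABLE account (acct_num INT, amount DECIMAL(10,2));
CREATE TRIGGER ins_sum BEFORE INSERT ON account
    FOR EACH ROW SET @sum = @sum + NEW.amount;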

Example

To illustrate foreign keys, suppose an accounts database has a table of invoices, and each invoice is associated with a particular supplier. Supplier details (such as name and address) are kept in a separate table, and each supplier is given a 'supplier number' to identify it. Each invoice record has an attribute containing the supplier number for that invoice. The supplier number is thus the primary key of the Supplier table, and the foreign key in the Invoice table points to that primary key. The relational schema is as follows, with primary keys marked [PK] and foreign keys marked [FK].

  Supplier (SupplierNumber [PK], Name, Address)
  Invoice (InvoiceNumber [PK], Text, SupplierNumber [FK, references Supplier])

The corresponding Data Definition Language statement is as follows.

CREATE TABLE Supplier (
    SupplierNumber INTEGER NOT NULL,
    Name           VARCHAR(20) NOT NULL,
    Address        VARCHAR(50) NOT NULL,
    CONSTRAINT supplier_pk PRIMARY KEY (SupplierNumber),
    CONSTRAINT number_value CHECK (SupplierNumber > 0)
);

CREATE TABLE Invoice (
    InvoiceNumber  INTEGER NOT NULL,
    Text           VARCHAR(4096),
    SupplierNumber INTEGER NOT NULL,
    CONSTRAINT invoice_pk PRIMARY KEY (InvoiceNumber),
    CONSTRAINT inumber_value CHECK (InvoiceNumber > 0),
    CONSTRAINT supplier_fk FOREIGN KEY (SupplierNumber)
        REFERENCES Supplier (SupplierNumber)
        ON UPDATE CASCADE ON DELETE RESTRICT
);
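The Supplier/Invoice DDL can be exercised in SQLite through Python's sqlite3 module. This sketch assumes SQLite accepts the statements unchanged (it does support named CONSTRAINT, CHECK, and both referential actions used here); the sample rows are hypothetical. ON UPDATE CASCADE carries a renumbered supplier over to its invoices, and ON DELETE RESTRICT blocks removing a supplier that still has invoices.

```python
import sqlite3

# Assumption: SQLite; the sample rows are hypothetical.
DDL = """
CREATE TABLE Supplier (
    SupplierNumber INTEGER NOT NULL,
    Name           VARCHAR(20) NOT NULL,
    Address        VARCHAR(50) NOT NULL,
    CONSTRAINT supplier_pk PRIMARY KEY (SupplierNumber),
    CONSTRAINT number_value CHECK (SupplierNumber > 0)
);
CREATE TABLE Invoice (
    InvoiceNumber  INTEGER NOT NULL,
    Text           VARCHAR(4096),
    SupplierNumber INTEGER NOT NULL,
    CONSTRAINT invoice_pk PRIMARY KEY (InvoiceNumber),
    CONSTRAINT inumber_value CHECK (InvoiceNumber > 0),
    CONSTRAINT supplier_fk FOREIGN KEY (SupplierNumber)
        REFERENCES Supplier (SupplierNumber)
        ON UPDATE CASCADE ON DELETE RESTRICT
);
"""

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(DDL)
conn.execute("INSERT INTO Supplier VALUES (1, 'Acme', '1 Main St')")
conn.execute("INSERT INTO Invoice VALUES (100, 'Widgets', 1)")

# ON UPDATE CASCADE: renumbering the supplier updates its invoices.
conn.execute("UPDATE Supplier SET SupplierNumber = 7 WHERE SupplierNumber = 1")
invoice_supplier = conn.execute("SELECT SupplierNumber FROM Invoice").fetchone()
print(invoice_supplier)  # (7,)

# ON DELETE RESTRICT: a supplier that still has invoices cannot be removed.
try:
    conn.execute("DELETE FROM Supplier WHERE SupplierNumber = 7")
except sqlite3.IntegrityError:
    print("delete blocked")
```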


References

  1. Coronel, Carlos (2010). Database Systems: Design, Implementation, and Management. Independence, KY: South-Western/Cengage Learning. p. 65. ISBN 978-0-538-74884-1.
  2. Elmasri, Ramez (2011). Fundamentals of Database Systems. Addison-Wesley. pp. 73–74. ISBN 978-0-13-608620-8.
  3. Date, C. J. (1996). A guide to the SQL standard. Addison-Wesley. p. 206. ISBN 978-0201964264.
  4. Sheldon, Robert (2005). Beginning MySQL. John Wiley & Sons. pp. 119–122. ISBN 0-7645-7950-9.
  5. "Database Basics — Foreign Keys". Retrieved 2010-03-13.
  6. MySQL AB (2006). MySQL Administrator's Guide and Language Reference. Sams Publishing. p. 40. ISBN 0-672-32870-4.
  7. Powell, Gavin (2004). Oracle SQL: Jumpstart with Examples. Elsevier. p. 11. ASIN B008IU3AHY.
  8. Mullins, Craig (2012). DB2 developer's guide. IBM Press. ASIN B007Y6K9TK.
  9. Sheldon, Robert (2005). Beginning MySQL. John Wiley & Sons. p. 156. ISBN 0-7645-7950-9.
  10. Garcia-Molina, Hector (2009). Database Systems: The Complete Book. Prentice Hall. pp. 93–95. ISBN 978-0-13-187325-4.