SQL
The Structured Query Language is the standard for data query and manipulation
in relational database systems. It implements many (but not all) concepts
of the relational model.
History
-
early 1970s: Donald Chamberlin and Raymond Boyce develop SEQUEL as query
language, later renamed to SQL (Structured Query Language)
-
based on Codd's Relational Model
-
declarative rather than procedural as in earlier navigational DB systems
(network and hierarchical)
-
System R (for relational) as first implementation, IBM Research in early
1970s
-
proof of concept that automatic query optimizer could produce decent
performance for relational system
-
DB2 in 1983 as commercial product
-
ANSI Standard 1986, ISO 1987
-
Major revision SQL92 (SQL2)
-
SQL99 (SQL3)
-
SQL 2003/2006/2008/...
Table vs. Relation
The basic concept of the SQL implementation is the table. When going from
the relation model to the SQL implementation relations become tables.
An SQL table is similar to a relation, with some important differences:
- Doubles:
- A relation contains a set of tuples, which means that by definition
there are no doubles.
- An SQL table and the result of an SQL query can contain several rows with identical values.
- Order:
- Since there is no order defined for the elements of a set, any order is valid
when listing the tuples of a relation.
- Results of SELECT statements can be ordered at will with the ORDER BY clause.
Data Types
- The relational model is not concerned with technical implementation.
- SQL DBMS must know in great detail the type of data stored in tables.
Numbers:
-
precision: number of significant digits
-
scale: position of the least significant digit to the right of the
decimal
-
e.g. 1234.56 has precision 6, scale 2, defined as NUMERIC(6,2)
SQL92 standard numeric data types:
-
INT (precision)
-
exact numeric type
-
scale 0
-
NUMERIC (precision, scale)
-
exact numeric type
-
i.e. rounding and truncating dictated by predefined formula
-
decimal portion exactly scale
-
DECIMAL (precision, scale)
-
exact numeric type
-
decimal portion at least scale
-
REAL
-
approximate numeric type
-
binary precision based on manufacturer limit
-
default precision must be less than DOUBLE PRECISION
-
DOUBLE PRECISION
-
approximate numeric type
-
binary precision based on manufacturer limit
-
default precision must be greater than REAL
-
FLOAT (precision)
-
approximate numeric type
-
DOUBLE with given precision
Postgres: DECIMAL(p, s) and NUMERIC(p, s) equivalent.
Other Types:
-
CHAR (size)
-
fixed length
-
unused space empty
-
VARCHAR (size)
-
variable length up to size
-
no unused space
-
DATE
- Various input formats, some manufacturer depdendent, e.g. '2012-04-01'
- Calendar functions such as length of time, weekday
-
BOOLEAN
A desirable Feature
Conceptually, types are distinct from their representation, e.g.
TYPE POINT
POSSREP CARTESIAN ( X RATIONAL, Y RATIONAL )
POSSREP POLAR (R RATIONAL, THETA RATIONAL )
Associated idea: valid operations, ADT, e.g. certain operations with
part weights and quantities make sense, while others don't:
-
Valid: P.WEIGHT * SP.QTY
-
Invalid: P.WEIGHT + SP.QTY
SQL92: named types in terms of primitive types, no operation restrictions
Postgres:
create domain ssn char(10);
create table emp (id ssn not null, name char(20) not null, primary key(id));
insert into emp (id, name) values ('846312041972', 'Jones');
ERROR: value too long for type character(10)