Johann Mitlöhner, 2014-2020
SQL (often pronounced sequel) is the query language supported by all major relational database management systems (DBMS). It includes a DDL (data definition language).
There is a lot of effort to standardize SQL; however, both commercial and free DBMS offer many convenient extensions. For any non-trivial application the choice of DBMS is still a long-term commitment.
PostgresQL is a free and Open Source DBMS. It is available for all major operating systems. On major Linux distributions the package postgresql is available via the package manager; the client program psql can be started on the command line (in a terminal window) to communicate with the DBMS:
psql -h postgresql.student-db.svc.cluster.local -U h12345678
As another option, the Jupyter notebook sql extension can be used.
We plan to implement a very simple shopping cart system. During discussions with key users (who have a better than average understanding of the processes) we arrive at the following ER diagram:
Customers place orders; each order contains one or more items, and each item refers to exactly one product. After some more discussion with the key users we learn the essential attributes of the enitities. In the database we define the following tables:
The ITEM table will be used for shopping cart items and order items: products are added via the web browser anonymously, using the session id. When an order is placed the customer id is requested, the ORDR field is set to a newly generated order number, and the SESS field is set to NULL. These steps can be implemented in various manners, e.g. via the PHP module of the Apache web server.
The table is the basic object in a relational database. It consists of columns with names and datatypes:
create table cust ( id varchar(80) primary key, name varchar(80), address varchar(80) );
The PRIMARY KEY
The datatype VARCHAR(80)
Table and column names:
Do not use German umlauts in table names or column names!
Note that text must be enclosed in (straight single) quotes.
insert into cust (id, name) values ('DF', 'Douglas Fairbanks'); insert into cust (id, name) values ('BK', 'Buster Keaton'); insert into cust (id, name) values ('RV', 'Rudolph Valentino'); insert into cust (id, name) values ('ML', 'Myrna Loy');
Missing values are set to NULL.
If all values are supplied in the proper order then the INSERT statement works without the list of column names before the VALUES keyword, but this depends on the current state of the table: if the table is altered (columns added or dropped) the statement without the column names will no longer work. Interactively this is not much of a problem, since the error is immediately apparent and easily fixed, but in a programmatic environment (e.g. in PHP scripts) always state the column names before the VALUES clause.
☆ If all people on the planet were our customers, could we store their data on a single hard disk?
☆ Are 100 characters on screen always 100 bytes on disk?
select * from cust;
Select all columns from table CUST.
select id, name from cust;
Specify list of columns.
select name from cust where id = 'DF';
The WHERE clause restricts the resulting rows.
update cust set name = 'newname' where id = 'someid';
Without WHERE clause all records are updated!
More than one value can be updated in a single statement:
update item set ordr = 12, sess = NULL where sess = '59ed53dd...';
delete from cust where id = 'someid';
Some records are deleted, but the table remains.
Note: without the WHERE clause all records are deleted!
drop table sometable;
Table and content are deleted.
alter table x add column b char(20);
For the small sample application it will be easier to drop a table and create it again as desired.
This is non-standard, Postgres only:
create table prod ( id varchar(80) primary key, name varchar(80), price numeric(8,2), stock integer );
Price is numeric with 2 decimal points and a total of 8 digits
insert into prod (id, name, price, stock) values ('HUB1', 'USB Hub 4x', 15.90, 5); insert into prod (id, name, price, stock) values ('SD08', 'SDHC Card 8 GB', 5.80, 10); insert into prod (id, name, price, stock) values ('HD05', 'HD IDE 500 GB', 55.90, 2); insert into prod (id, name, price, stock) values ('MON1', 'LCD Monitor 19 XL', 65.90, 2);
create table ordr ( id integer primary key, cust varchar(80) references cust(id) on delete set null, odate date );
Foreign keys ensure data integrity, but they also restrict operations on the database; they can be a blessing or a curse. Decide carefully on the options, or whether to use foreign keys at all.
insert into ordr (id, cust, odate) values (1, 'DF', '2014-01-13'); insert into ordr (id, cust, odate) values (2, 'RV', '2014-02-17'); insert into ordr (id, cust, odate) values (3, 'BK', '2014-03-12'); insert into ordr (id, cust, odate) values (4, 'DF', '2014-03-23');
create table item ( ordr integer references ordr(id) on delete set null, prod varchar(80) references prod(id) on delete set null, qty integer, sess varchar(80) );
This table also serves as a shopping cart. The application will set either the session id or the order number.
insert into item (prod, qty, ordr) values ('HUB1', 1, 1); insert into item (prod, qty, ordr) values ('HD05', 2, 1); insert into item (prod, qty, ordr) values ('MON1', 1, 2); insert into item (prod, qty, ordr) values ('HD05', 1, 2); insert into item (prod, qty, ordr) values ('HD05', 1, 3); insert into item (prod, qty, ordr) values ('MON1', 1, 4);
Combine records from several tables with a condition, usually checking foreign key values:
select name, odate from cust join ordr on cust.id = ordr.cust;
name | odate -------------------+------------ Douglas Fairbanks | 2014-01-13 Rudolph Valentino | 2014-02-17 Buster Keaton | 2014-03-12 Douglas Fairbanks | 2014-03-23 (4 rows)
Only customers with orders appear in the result, same as
select name, odate from cust inner join ordr on cust.id = ordr.cust;
I.e. INNER join is default. Use LEFT join for all records of the left table:
select name, odate from cust LEFT join ordr on cust.id = ordr.cust ;
name | odate -------------------+------------ Buster Keaton | 2014-03-12 Douglas Fairbanks | 2014-01-13 Douglas Fairbanks | 2014-03-23 Myrna Loy | Rudolph Valentino | 2014-02-17 (5 rows)
The functions count(), sum(), min(), max(), avg() work on sets of records defined by the GROUP BY clause:
select cust, count(*) from ordr group by cust; cust | count ------+------- BK | 1 DF | 2 RV | 1 (3 rows)
Count the number of orders for each customer that appears in the ORDR table. The second column is named after the function; rename the second column using the AS keyword:
select cust, count(*) as orders from ordr group by cust; cust | orders ------+------- BK | 1 DF | 2 RV | 1 (3 rows)
The WHERE clause selects rows before groups and aggregates are computed. Therefore, it cannot be used with aggregate functions:
select cust, count(*) from ordr where count(*) > 1 group by cust; ERROR: aggregates not allowed in WHERE clause
The HAVING clause achieves the desired effect:
select cust, count(*) from ordr group by cust having count(*) > 1;
ALIAS: Sometimes a statement becomes more elegant by renaming a result column using the AS clause. Integration with other software components such as PHP scripts is another reason for renaming columns. Such an alias can then be used in the GROUP BY and ORDER BY part, but not in the WHERE and HAVING clauses.
select cust, count(*) as orders from ordr group by cust order by orders desc limit 5;
Here the option DESC sorts descending, and LIMIT drops everything after the first 5 records.
Note the order of the clauses: WHERE - GROUP BY - HAVING - ORDER BY - LIMIT
The default of the JOIN operation is INNER JOIN: only rows matching in both tables are selected.
Assume we want to get a listing showing the number of orders for each customer. Here is a simple approach:
select cust.name, cust.id, count(*) from cust join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Buster Keaton | BK | 1 (3 rows)
select cust.name, cust.id, count(*) from cust LEFT join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Myrna Loy | ML | 1 Buster Keaton | BK | 1 (4 rows)
The LEFT JOIN returns all rows from the left table.
select cust.name, cust.id, count(ordr.id) from cust left join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Myrna Loy | ML | 0 Buster Keaton | BK | 1 (4 rows)
☆ Create tables for student - enroll - course, insert some data, and produce the
☆ number of courses enrolled per student, for all students
☆ number of students per course, for all courses
☆ students not enrolling in any course
☆ courses with fewer than two students
Sequences are a reliable source of unique identifiers; every result from a nextval() query is guaranteed to be unique, including concurrent multi-user access:
create sequence seq1; select nextval('seq1'); select nextval('seq1'); ...
The numbers from the sequence can be used e.g. for order IDs. Typically an application program such as a PHP script will select from the sequence and then use that value in INSERT and UPDATE statements.
Note that nextval('seqname') is non-standard PostgresQL.
The expression current_date in SQL statements evaluates to the current date:
select current_date; date ------------ 2014-07-25 (1 row)
This is useful e.g. for generating ORDR records. Note that current_date can be used in INSERT and UPDATE statements as well:
insert into ordr (id, cust, odate) values (12, 'BK', current_date);
☆ We want to import a list of football players where each player is associated with a club in a separate table:
Create a table club with columns id and name, both varchar. id is the primary key.
Create a table player with columns id, name, and club. id is integer and the primary key. name and club are varchar. club references id in table club.
We want to import the file player.csv into table player. Download that file into your home directory, e.g.
Preview the file e.g. in the spreadsheet, or download it and view it in your favorite editor. Make sure all values in the column club occur in your club table as id. Also, make sure that the id values for the players are not used in your player table.
Run the psql command in your home directory and enter:
\copy player from 'player.csv' delimiter ',' CSV HEADER;
☠ There is also the COPY command (without the backslash) which is run by the DBMS and requires root priviledges for security reasons; the \copy command is run by the psql client and does not require additional priviledges.
☆ Exercise: import the data from the file pl.csv into a (different!) table for football players and transfer values! Then you can use aggregate functions for statistics, such as average transfer value by club.
Your database files are not stored under your home directory. Use the command pg_dump to backup your database (on the command line, not within psql):
pg_dump dbuserid > backup.sql
The file backup.sql contains the SQL commands to restore all tables. Feed it to psql to restore your database:
psql < backup.sql
There is currently no mechanism to set a size limit on a DB or user in Postgres. However, you can use this command to check the total size in bytes of a database:
Although many query problems can be solved with various types of joins some situations are more conveniently expressed with subqueries. Typically the result of a subquery is used with an operator in a Select statement:
SELECT ... FROM ... WHERE expression OPERATOR (SELECT ...)
E.g., SELECT name FROM customer WHERE salary > (SELECT ...)
For some operators the subquery must return either exactly one value, while others (such as the IN operator) can deal with a list of values.