Johann Mitlöhner, 2014
SQL (often pronounced sequel) is the query language supported by all major relational database management systems (DBMS). It includes a DDL (data definition language).
There is a lot of effort to standardize SQL; however, both commercial and free DBMS offer many convenient extensions. For any non-trivial application the choice of DBMS is still a long-term commitment.
PostgresQL is a free and Open Source DBMS. It is available for all major operating systems. On major Linux distributions the package postgresql is available via the package manager; the client program can be started on the command line (in a terminal window) be entering
psql
To save typing the passwort every time edit the file .pgpass in your home directory and insert this line:
*:*:*:yourdbuserid:yourdbpasswd
On the command line issue the following command (for moderate security):
chmod go-r .pgpass
We plan to implement a very simple shopping cart system. During discussions with key users (who have a better than average understanding of the processes) we arrive at the following ER-Diagram:
Customers place orders; each order contains one or more items, and each item refers to exactly one product. After some more discussion with the key users we learn the essential attributes of the enitities. In the database we define the following tables:
The ITEM table will be used for shopping cart items and order items: products are added via the web browser anonymously, using the session id. When an order is placed the customer id is requested, the ORDR field is set to a newly generated order number, and the SESS field is set to NULL. These steps can be implemented in various manners, e.g. via the PHP module of the Apache web server.
The table is the basic object in a relational database. It consists of columns with names and datatypes:
create table cust ( id varchar(80) primary key, name varchar(80), address varchar(80), email varchar(80), pw varchar(80) );
The PRIMARY KEY
The datatype VARCHAR(80)
Recommended characters in table names and column names are a-z, A-Z, 0-9, and _. Any other characters should not be used, even if they seem to work within Postgres; otherwise, integration with components such as PHP and R will yield some nasty surprises.
Do not use German umlauts in table names or column names!
Note that text must be enclosed in (straight single) quotes.
insert into cust (id, name) values ('DF', 'Douglas Fairbanks'); insert into cust (id, name) values ('BK', 'Buster Keaton'); insert into cust (id, name) values ('RV', 'Rudolph Valentino'); insert into cust (id, name) values ('ML', 'Myrna Loy');
Missing values are set to NULL.
If all values are supplied in the proper order then the INSERT statement works without the list of column names before the VALUES keyword, but this depends on the current state of the table: if the table is altered (columns added or dropped) the statement without the column names will no longer work. Interactively this is not much of a problem, since the error is immediately apparent and easily fixed, but in a programmatic environment (e.g. in PHP scripts) always state the column names before the VALUES clause.
select * from cust;
Select all columns from table CUST.
select id, name from cust;
Specify list of columns.
select name from cust where id = 'DF';
The WHERE clause restricts the resulting rows.
update cust set name = 'newname' where id = 'someid';
Without WHERE clause all records are updated!
More than one value can be updated in a single statement:
update item set ordr = 12, sess = NULL where sess = '59ed53dd...';
delete from cust where id = 'someid';
Some records are deleted, but the table remains.
Note: without the WHERE clause all records are deleted!
drop table sometable;
Table and content are deleted.
alter table x add column b char(20);
For the small sample application it will be easier to drop a table and create it again as desired.
This is non-standard, Postgres only:
create table prod ( id varchar(80) primary key, name varchar(80), price numeric(8,2), stock integer );
Price is numeric with 2 decimal points and a total of 8 digits
insert into prod (id, name, price, stock) values ('HUB1', 'USB Hub 4x', 15.90, 5); insert into prod (id, name, price, stock) values ('SD08', 'SDHC Card 8 GB', 5.80, 10); insert into prod (id, name, price, stock) values ('HD05', 'HD IDE 500 GB', 55.90, 2); insert into prod (id, name, price, stock) values ('MON1', 'LCD Monitor 19 XL', 65.90, 2);
create table ordr ( id integer primary key, cust varchar(80) references cust(id) on delete set null, odate date );
Foreign keys ensure data integrity, but they also restrict operations on the database; they can be a blessing or a curse. Decide carefully on the options, or whether to use foreign keys at all.
insert into ordr (id, cust, odate) values (1, 'DF', '2014-01-13'); insert into ordr (id, cust, odate) values (2, 'RV', '2014-02-17'); insert into ordr (id, cust, odate) values (3, 'BK', '2014-03-12'); insert into ordr (id, cust, odate) values (4, 'DF', '2014-03-23');
create table item ( ordr integer references ordr(id) on delete set null, prod varchar(80) references prod(id) on delete set null, qty integer, sess varchar(80) );
This table also serves as a shopping cart. The application will set either the session id or the order number.
insert into item (prod, qty, ordr) values ('HUB1', 1, 1); insert into item (prod, qty, ordr) values ('HD05', 2, 1); insert into item (prod, qty, ordr) values ('MON1', 1, 2); insert into item (prod, qty, ordr) values ('HD05', 1, 2); insert into item (prod, qty, ordr) values ('HD05', 1, 3); insert into item (prod, qty, ordr) values ('MON1', 1, 4);
Combine records from several tables with a condition, usually checking foreign key values:
select name, odate from cust join ordr on cust.id = ordr.cust;
name | odate -------------------+------------ Douglas Fairbanks | 2014-01-13 Rudolph Valentino | 2014-02-17 Buster Keaton | 2014-03-12 Douglas Fairbanks | 2014-03-23 (4 rows)
Only customers with orders appear in the result, same as
select name, odate from cust inner join ordr on cust.id = ordr.cust;
I.e. INNER join is default. Use LEFT join for all records of the left table:
select name, odate from cust LEFT join ordr on cust.id = ordr.cust ;
name | odate -------------------+------------ Buster Keaton | 2014-03-12 Douglas Fairbanks | 2014-01-13 Douglas Fairbanks | 2014-03-23 Myrna Loy | Rudolph Valentino | 2014-02-17 (5 rows)
The functions count(), sum(), min(), max(), avg() work on sets of records defined by the GROUP BY clause:
select cust, count(*) from ordr group by cust; cust | count ------+------- BK | 1 DF | 2 RV | 1 (3 rows)
Count the number of orders for each customer that appears in the ORDR table. The second column is named after the function; rename the second column using the AS keyword:
select cust, count(*) as orders from ordr group by cust; cust | orders ------+------- BK | 1 DF | 2 RV | 1 (3 rows)
The WHERE clause selects rows before groups and aggregates are computed. Therefore, it cannot be used with aggregate functions:
select cust, count(*) from ordr where count(*) > 1 group by cust; ERROR: aggregates not allowed in WHERE clause
The HAVING clause achieves the desired effect:
select cust, count(*) from ordr group by cust having count(*) > 1;
Sometimes a statement becomes more elegant by renaming a result column using the AS clause. Such an alias can then be used in the GROUP BY and ORDER BY part, but not in the WHERE and HAVING clauses (in PostgresQL 8.1).
select cust, count(*) as orders from ordr group by cust order by orders desc limit 5;
Here the option DESC sorts descending, and LIMIT drops everything after the first 5 records.
Note the order of the clauses: WHERE - GROUP BY - HAVING - ORDER BY - LIMIT
Assume we want to get a listing showing the number of orders for each customer. Here is a simple approach:
select cust.name, cust.id, count(*) from cust join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Buster Keaton | BK | 1 (3 rows)
select cust.name, cust.id, count(*) from cust LEFT join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Myrna Loy | ML | 1 Buster Keaton | BK | 1 (4 rows)
select cust.name, cust.id, count(ordr.id) from cust left join ordr on ordr.cust = cust.id group by cust.name, cust.id ;
name | id | count -------------------+----+------- Rudolph Valentino | RV | 1 Douglas Fairbanks | DF | 2 Myrna Loy | ML | 0 Buster Keaton | BK | 1 (4 rows)
Sequences are a reliable source of unique identifiers; every result from a nextval() query is guaranteed to be unique, including concurrent multi-user access:
create sequence seq1; select nextval('seq1'); select nextval('seq1'); ...
The numbers from the sequence can be used e.g. for order IDs. Typically an application program such as a PHP script will select from the sequence and then use that value in INSERT and UPDATE statements.
Note that nextval('seqname') is non-standard PostgresQL.
The expression current_date in SQL statements evaluates to the current date:
select current_date; date ------------ 2014-07-25 (1 row)
This is useful e.g. for generating ORDR records. Note that current_date can be used in INSERT and UPDATE statements as well:
insert into ordr (id, cust, odate) values (12, 'BK', current_date);
\copy wage from 'data.txt' delimiter ',' CSV;
The file data.txt contains the field values separated by commas:
Smith,27 Johnson,22 Williams, 30
Putting double quotes around text values will also work. Spaces around numbers are ignored.
Your database files are not stored under your home directory. Use the command pg_dump to backup your database (on the command line, not within psql):
pg_dump dbuserid > backup.sql
The file backup.sql contains the SQL commands to restore all tables. Feed it to psql to restore your database:
psql < backup.sql