SQL Basics and Shopping Cart Example

Johann Mitlöhner, 2014

Structured Query Language

SQL (often pronounced sequel) is the query language supported by all major relational database management systems (DBMS). It includes a DDL (data definition language).

There is a lot of effort to standardize SQL; however, both commercial and free DBMS offer many convenient extensions. For any non-trivial application the choice of DBMS is still a long-term commitment.

PostgresQL is a free and Open Source DBMS. It is available for all major operating systems. On major Linux distributions the package postgresql is available via the package manager; the client program can be started on the command line (in a terminal window) be entering

psql

To save typing the passwort every time edit the file .pgpass in your home directory and insert this line:

*:*:*:yourdbuserid:yourdbpasswd

On the command line issue the following command (for moderate security):

chmod go-r .pgpass

Simple Shopping Cart System

We plan to implement a very simple shopping cart system. During discussions with key users (who have a better than average understanding of the processes) we arrive at the following ER-Diagram:

Customers place orders; each order contains one or more items, and each item refers to exactly one product. After some more discussion with the key users we learn the essential attributes of the enitities. In the database we define the following tables:

The ITEM table will be used for shopping cart items and order items: products are added via the web browser anonymously, using the session id. When an order is placed the customer id is requested, the ORDR field is set to a newly generated order number, and the SESS field is set to NULL. These steps can be implemented in various manners, e.g. via the PHP module of the Apache web server.

CREATE TABLE

The table is the basic object in a relational database. It consists of columns with names and datatypes:


create table cust (
  id varchar(80) primary key,
  name  varchar(80),
  address varchar(80),
  email varchar(80),
  pw varchar(80)
);

The PRIMARY KEY

The datatype VARCHAR(80)

Recommended characters in table names and column names are a-z, A-Z, 0-9, and _. Any other characters should not be used, even if they seem to work within Postgres; otherwise, integration with components such as PHP and R will yield some nasty surprises.

Do not use German umlauts in table names or column names!

INSERT

Note that text must be enclosed in (straight single) quotes.


insert into cust (id, name) values ('DF', 'Douglas Fairbanks');
insert into cust (id, name) values ('BK', 'Buster Keaton');
insert into cust (id, name) values ('RV', 'Rudolph Valentino');
insert into cust (id, name) values ('ML', 'Myrna Loy');

Missing values are set to NULL.

If all values are supplied in the proper order then the INSERT statement works without the list of column names before the VALUES keyword, but this depends on the current state of the table: if the table is altered (columns added or dropped) the statement without the column names will no longer work. Interactively this is not much of a problem, since the error is immediately apparent and easily fixed, but in a programmatic environment (e.g. in PHP scripts) always state the column names before the VALUES clause.

SELECT


select * from cust;

Select all columns from table CUST.

select id, name from cust;

Specify list of columns.

select name from cust where id = 'DF';

The WHERE clause restricts the resulting rows.

UPDATE

update cust set name = 'newname' where id = 'someid';

Without WHERE clause all records are updated!

More than one value can be updated in a single statement:

update item set ordr = 12, sess = NULL where sess = '59ed53dd...';

DELETE and DROP

delete from cust where id = 'someid';

Some records are deleted, but the table remains.

Note: without the WHERE clause all records are deleted!

drop table sometable;

Table and content are deleted.

ALTER TABLE

alter table x add column b char(20);

For the small sample application it will be easier to drop a table and create it again as desired.

Information on tables

This is non-standard, Postgres only:

Product


create table prod (
  id varchar(80) primary key,
  name varchar(80),
  price numeric(8,2),
  stock integer
);

Price is numeric with 2 decimal points and a total of 8 digits


insert into prod (id, name, price, stock) values ('HUB1', 'USB Hub 4x', 15.90, 5);
insert into prod (id, name, price, stock) values ('SD08', 'SDHC Card 8 GB', 5.80, 10);
insert into prod (id, name, price, stock) values ('HD05', 'HD IDE 500 GB', 55.90, 2);
insert into prod (id, name, price, stock) values ('MON1', 'LCD Monitor 19 XL', 65.90, 2);

Order


create table ordr (
  id integer primary key,
  cust varchar(80) references cust(id) on delete set null,
  odate date
);

Foreign keys ensure data integrity, but they also restrict operations on the database; they can be a blessing or a curse. Decide carefully on the options, or whether to use foreign keys at all.


insert into ordr (id, cust, odate) values (1, 'DF', '2014-01-13');
insert into ordr (id, cust, odate) values (2, 'RV', '2014-02-17');
insert into ordr (id, cust, odate) values (3, 'BK', '2014-03-12');
insert into ordr (id, cust, odate) values (4, 'DF', '2014-03-23');

Item


create table item (
  ordr integer references ordr(id) on delete set null,
  prod varchar(80) references prod(id) on delete set null,
  qty integer,
  sess varchar(80)
);

This table also serves as a shopping cart. The application will set either the session id or the order number.


insert into item (prod, qty, ordr) values ('HUB1', 1, 1);
insert into item (prod, qty, ordr) values ('HD05', 2, 1);
insert into item (prod, qty, ordr) values ('MON1', 1, 2);
insert into item (prod, qty, ordr) values ('HD05', 1, 2);
insert into item (prod, qty, ordr) values ('HD05', 1, 3);
insert into item (prod, qty, ordr) values ('MON1', 1, 4);

JOIN

Combine records from several tables with a condition, usually checking foreign key values:


select name, odate 
from cust join ordr on cust.id = ordr.cust;
       name        |   odate    
-------------------+------------
 Douglas Fairbanks | 2014-01-13
 Rudolph Valentino | 2014-02-17
 Buster Keaton     | 2014-03-12
 Douglas Fairbanks | 2014-03-23
(4 rows)

Only customers with orders appear in the result, same as


select name, odate 
from cust inner join ordr on cust.id = ordr.cust;

I.e. INNER join is default. Use LEFT join for all records of the left table:


select name, odate 
from cust LEFT join ordr on cust.id = ordr.cust ;
       name        |   odate    
-------------------+------------
 Buster Keaton     | 2014-03-12
 Douglas Fairbanks | 2014-01-13
 Douglas Fairbanks | 2014-03-23
 Myrna Loy         | 
 Rudolph Valentino | 2014-02-17
(5 rows)

Aggregate Functions

The functions count(), sum(), min(), max(), avg() work on sets of records defined by the GROUP BY clause:


select cust, count(*) from ordr group by cust;

 cust | count 
------+-------
 BK   |     1
 DF   |     2
 RV   |     1
(3 rows)

Count the number of orders for each customer that appears in the ORDR table. The second column is named after the function; rename the second column using the AS keyword:

select cust, count(*) as orders from ordr group by cust;

 cust | orders 
------+-------
 BK   |     1
 DF   |     2
 RV   |     1
(3 rows)

WHERE and HAVING

The WHERE clause selects rows before groups and aggregates are computed. Therefore, it cannot be used with aggregate functions:

select cust, count(*) from ordr 
where count(*) > 1 group by cust;
ERROR: aggregates not allowed in WHERE clause

The HAVING clause achieves the desired effect:

select cust, count(*) from ordr 
group by cust having count(*) > 1; 

Sometimes a statement becomes more elegant by renaming a result column using the AS clause. Such an alias can then be used in the GROUP BY and ORDER BY part, but not in the WHERE and HAVING clauses (in PostgresQL 8.1).

select cust, count(*) as orders from ordr 
group by cust order by orders desc limit 5;

Here the option DESC sorts descending, and LIMIT drops everything after the first 5 records.

Note the order of the clauses: WHERE - GROUP BY - HAVING - ORDER BY - LIMIT

Aggregates on Inner and Left Join

Assume we want to get a listing showing the number of orders for each customer. Here is a simple approach:


select cust.name, cust.id, count(*) 
from cust join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;
       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Buster Keaton     | BK |     1
(3 rows)

select cust.name, cust.id, count(*) 
from cust LEFT join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;
       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Myrna Loy         | ML |     1
 Buster Keaton     | BK |     1
(4 rows)

select cust.name, cust.id, count(ordr.id) 
from cust left join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;
       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Myrna Loy         | ML |     0
 Buster Keaton     | BK |     1
(4 rows)

Sequences

Sequences are a reliable source of unique identifiers; every result from a nextval() query is guaranteed to be unique, including concurrent multi-user access:

create sequence seq1;
select nextval('seq1');
select nextval('seq1');
...

The numbers from the sequence can be used e.g. for order IDs. Typically an application program such as a PHP script will select from the sequence and then use that value in INSERT and UPDATE statements.

Note that nextval('seqname') is non-standard PostgresQL.

Date

The expression current_date in SQL statements evaluates to the current date:

select current_date;
    date    
------------
 2014-07-25
(1 row)

This is useful e.g. for generating ORDR records. Note that current_date can be used in INSERT and UPDATE statements as well:

insert into ordr (id, cust, odate) values (12, 'BK', current_date);

Import from CSV

The following (Postgres) command reads rows from the file data.txt and insert them into the table wage (with a character and a numeric column):
\copy wage from 'data.txt' delimiter ',' CSV;

The file data.txt contains the field values separated by commas:

Smith,27
Johnson,22
Williams, 30

Putting double quotes around text values will also work. Spaces around numbers are ignored.

Backup

Your database files are not stored under your home directory. Use the command pg_dump to backup your database (on the command line, not within psql):

pg_dump dbuserid > backup.sql

The file backup.sql contains the SQL commands to restore all tables. Feed it to psql to restore your database:

psql < backup.sql

Further Reading