SQL Shopping Cart Example

Johann Mitlöhner, 2014-2020

Structured Query Language

SQL (often pronounced sequel) is the query language supported by all major relational database management systems (DBMS). It includes a DDL (data definition language).

There is a lot of effort to standardize SQL; however, both commercial and free DBMS offer many convenient extensions. For any non-trivial application the choice of DBMS is still a long-term commitment.

PostgresQL is a free and Open Source DBMS. It is available for all major operating systems. On major Linux distributions the package postgresql is available via the package manager; the client program psql can be started on the command line (in a terminal window) to communicate with the DBMS:

psql -h postgresql.student-db.svc.cluster.local -U h12345678

As another option, the Jupyter notebook sql extension can be used. However, some commands (such as \copy) only work in psql; note that you can start a terminal from within Jupyter.

Simple Shopping Cart System

We plan to implement a very simple shopping cart system. During discussions with key users (who have a better than average understanding of the processes) we arrive at the following ER diagram:

Customers place orders; each order contains one or more items, and each item refers to exactly one product. After some more discussion with the key users we learn the essential attributes of the enitities. In the database we define the following tables:

CUST (id, name, address) for data on customers
PROD (id, name, price, stock) for data on products
ORDR (id, cust, odate) for data on orders
ITEM (prod, qty, sess, ordr) for shopping cart content and order items

The ITEM table will be used for shopping cart items and order items: products are added via the web browser anonymously, using the session id. When an order is placed the customer id is requested, the ORDR field is set to a newly generated order number, and the SESS field is set to NULL. These steps can be implemented in various manners, e.g. via the PHP module of the Apache web server.

CREATE TABLE

The table is the basic object in a relational database. It consists of columns with names and datatypes:


create table cust (
  id varchar(80) primary key,
  name  varchar(80),
  address varchar(80)
);

The PRIMARY KEY

ensures unique values for the column ID
creates an index for fast access

The datatype VARCHAR(80)

defines a character field with maximum size
in Postgres there is no performance loss compared to fixed-length CHAR()
the non-standard Postgres version VARCHAR() accepts strings of any length

Table and column names:

Recommended characters in table names and column names are a-z, A-Z, and the digits 0-9 (not as the first character). Underscore _ should work as well. Any other characters should not be used, even if they seem to work within the DB; otherwise, integration with components such as PHP and R can yield some nasty surprises.
Case sensitivity: do not rely on it, even if it seems to work in your DB. Choose a convention and stick to it, e.g., everything upper case OR everything lower case.
In Postgres all identifiers that are not double-quoted are folded to lower case; you can enforce case with double quotes, but double-quoted names are one sure way to maintenance hell.

Do not use German umlauts in table names or column names!

INSERT

Note that text must be enclosed in (straight single) quotes.


insert into cust (id, name) values ('DF', 'Douglas Fairbanks');
insert into cust (id, name) values ('BK', 'Buster Keaton');
insert into cust (id, name) values ('RV', 'Rudolph Valentino');
insert into cust (id, name) values ('ML', 'Myrna Loy');

Missing values are set to NULL.

If all values are supplied in the proper order then the INSERT statement works without the list of column names before the VALUES keyword, but this depends on the current state of the table: if the table is altered (columns added or dropped) the statement without the column names will no longer work. Interactively this is not much of a problem, since the error is immediately apparent and easily fixed, but in a programmatic environment (e.g. in PHP scripts) always state the column names before the VALUES clause.

☆ If all people on the planet were our customers, could we store their data on a single hard disk?

At the time of writing (2018) a typical desktop computer has a 1 TB hard disk.
Ignore space for index (fraction of main table).
Assume that we need about 100 bytes for the average record.
That means there are about 10 records in one KB, and since 1 TB = 1024 GB = 1024 * 1024 MB = 1024 * 1024 * 1024 KB we can store about 10,000,000,000 records.
The answer is Yes!

☆ Are 100 characters on screen always 100 bytes on disk?

This depends on the encoding:
- In ASCII each character is encoded in one byte. ASCII contains a total of 128 characters: those used in the English language, some commonly used punctuation marks and signs, and control characters.
- ISO 8859-1 (Latin-1) also encodes each character in one byte, but in addition to the ASCII characters it also contains characters used in western European languages, such as French accents and German umlauts.
- UTF-8 is a variable length encoding: the ASCII characters are encoded in one byte, the other characters in the Unicode set are encoded in two, three, or four bytes. The Unicode set contains over 100,000 characters.
If a text contains only ASCII characters, the number of bytes is equal to the number of characters.
If the text contains non-ASCII characters and UTF-8 is used, the number of bytes is larger than the number of characters.
There is a general move towards UTF-8 but various applications still use other encodings which continues to cause problems.
Since the encoding of the ASCII characters is the same in all encodings mentioned above, a text only containing ASCII characters is usually safe from encoding problems.

SELECT


select * from cust;

Select all columns from table CUST.

select id, name from cust;

Specify list of columns.

select name from cust where id = 'DF';

The WHERE clause restricts the resulting rows.

UPDATE

update cust set name = 'newname' where id = 'someid';

Without WHERE clause all records are updated!

More than one value can be updated in a single statement:

update item set ordr = 12, sess = NULL where sess = '59ed53dd...';

DELETE and DROP

delete from cust where id = 'someid';

Some records are deleted, but the table remains.

Note: without the WHERE clause all records are deleted!

drop table sometable;

Table and content are deleted.

ALTER TABLE

alter table x add column b char(20);

For the small sample application it will be easier to drop a table and create it again as desired.

Information on tables

This is non-standard, Postgres only:

\d ... list of all objects
\d cust ... definition of table CUST

Product


create table prod (
  id varchar(80) primary key,
  name varchar(80),
  price numeric(8,2),
  stock integer
);

Price is numeric with 2 decimal points and a total of 8 digits


insert into prod (id, name, price, stock) values ('HUB1', 'USB Hub 4x', 15.90, 5);
insert into prod (id, name, price, stock) values ('SD08', 'SDHC Card 8 GB', 5.80, 10);
insert into prod (id, name, price, stock) values ('HD05', 'HD IDE 500 GB', 55.90, 2);
insert into prod (id, name, price, stock) values ('MON1', 'LCD Monitor 19 XL', 65.90, 2);

Order


create table ordr (
  id integer primary key,
  cust varchar(80) references cust(id) on delete set null,
  odate date
);

ORDER is a reserved word in SQL, as part of the ORDER BY clause.
DATE is another reserved word for the datatype date.
The Foreign key CUST refers to customer IDs; the field is set to NULL if the customer in table CUST is deleted.

Foreign keys ensure data integrity, but they also restrict operations on the database; they can be a blessing or a curse. Decide carefully on the options, or whether to use foreign keys at all.


insert into ordr (id, cust, odate) values (1, 'DF', '2014-01-13');
insert into ordr (id, cust, odate) values (2, 'RV', '2014-02-17');
insert into ordr (id, cust, odate) values (3, 'BK', '2014-03-12');
insert into ordr (id, cust, odate) values (4, 'DF', '2014-03-23');

Item


create table item (
  ordr integer references ordr(id) on delete set null,
  prod varchar(80) references prod(id) on delete set null,
  qty integer,
  sess varchar(80)
);

This table also serves as a shopping cart. The application will set either the session id or the order number.


insert into item (prod, qty, ordr) values ('HUB1', 1, 1);
insert into item (prod, qty, ordr) values ('HD05', 2, 1);
insert into item (prod, qty, ordr) values ('MON1', 1, 2);
insert into item (prod, qty, ordr) values ('HD05', 1, 2);
insert into item (prod, qty, ordr) values ('HD05', 1, 3);
insert into item (prod, qty, ordr) values ('MON1', 1, 4);

JOIN

Combine records from several tables with a condition, usually checking foreign key values:


select name, odate 
from cust join ordr on cust.id = ordr.cust;

       name        |   odate    
-------------------+------------
 Douglas Fairbanks | 2014-01-13
 Rudolph Valentino | 2014-02-17
 Buster Keaton     | 2014-03-12
 Douglas Fairbanks | 2014-03-23
(4 rows)

Only customers with orders appear in the result, same as


select name, odate 
from cust inner join ordr on cust.id = ordr.cust;

I.e. INNER join is default. Use LEFT join for all records of the left table:


select name, odate 
from cust LEFT join ordr on cust.id = ordr.cust ;

       name        |   odate    
-------------------+------------
 Buster Keaton     | 2014-03-12
 Douglas Fairbanks | 2014-01-13
 Douglas Fairbanks | 2014-03-23
 Myrna Loy         | 
 Rudolph Valentino | 2014-02-17
(5 rows)

All customers appear in the result.
Missing ODATEs are set to NULL in the result.

Aggregate Functions

The functions count(), sum(), min(), max(), avg() work on sets of records defined by the GROUP BY clause:


select cust, count(*) from ordr group by cust;

 cust | count 
------+-------
 BK   |     1
 DF   |     2
 RV   |     1
(3 rows)

Count the number of orders for each customer that appears in the ORDR table. The second column is named after the function; rename the second column using the AS keyword:

select cust, count(*) as orders from ordr group by cust;

 cust | orders 
------+-------
 BK   |     1
 DF   |     2
 RV   |     1
(3 rows)

WHERE and HAVING

The WHERE clause selects rows before groups and aggregates are computed. Therefore, it cannot be used with aggregate functions:

select cust, count(*) from ordr 
where count(*) > 1 group by cust;
ERROR: aggregates not allowed in WHERE clause

The HAVING clause achieves the desired effect:

select cust, count(*) from ordr 
group by cust having count(*) > 1;

ALIAS: Sometimes a statement becomes more elegant by renaming a result column using the AS clause. Integration with other software components such as PHP scripts is another reason for renaming columns. Such an alias can then be used in the GROUP BY and ORDER BY part, but not in the WHERE and HAVING clauses.

select cust, count(*) as orders from ordr 
group by cust order by orders desc limit 5;

Here the option DESC sorts descending, and LIMIT drops everything after the first 5 records.

Note the order of the clauses: WHERE - GROUP BY - HAVING - ORDER BY - LIMIT

Aggregates on Inner Join and Left/Right Join

The default of the JOIN operation is INNER JOIN: only rows matching in both tables are selected.

Assume we want to get a listing showing the number of orders for each customer. Here is a simple approach:


select cust.name, cust.id, count(*) 
from cust join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;

       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Buster Keaton     | BK |     1
(3 rows)

Count the number of records in the join for each customer.
However, customers with no orders do not appear in the result.
Therefore, the list is incomplete.


select cust.name, cust.id, count(*) 
from cust LEFT join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;

       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Myrna Loy         | ML |     1
 Buster Keaton     | BK |     1
(4 rows)

Count the number of records for each customer.
Because of the LEFT join every customer appears in the result.
The LEFT JOIN returns all rows from the left table.
Therefore, customers with no orders still get one record in the join.
However, this is still not the number of orders for each customer!


select cust.name, cust.id, count(ordr.id) 
from cust left join ordr on ordr.cust = cust.id 
group by cust.name, cust.id ;

       name        | id | count 
-------------------+----+-------
 Rudolph Valentino | RV |     1
 Douglas Fairbanks | DF |     2
 Myrna Loy         | ML |     0
 Buster Keaton     | BK |     1
(4 rows)

Count the number of order IDs for each customer in the CUST table.
NULLs do not count.
This is the number of orders for each customer.

☆ Create tables for student - enroll - course, insert some data, and produce the following listings:
☆ number of courses enrolled per student, for all students
☆ number of students per course, for all courses
☆ students not enrolling in any course
☆ courses with fewer than two students

Sequences

Sequences are a reliable source of unique identifiers; every result from a nextval() query is guaranteed to be unique, including concurrent multi-user access:

create sequence seq1;
select nextval('seq1');
select nextval('seq1');
...

The numbers from the sequence can be used e.g. for order IDs. Typically an application program such as a PHP script will select from the sequence and then use that value in INSERT and UPDATE statements.

Note that nextval('seqname') is non-standard PostgresQL.

Date

The expression current_date in SQL statements evaluates to the current date:

select current_date;
    date    
------------
 2014-07-25
(1 row)

This is useful e.g. for generating ORDR records. Note that current_date can be used in INSERT and UPDATE statements as well:

insert into ordr (id, cust, odate) values (12, 'BK', current_date);

Import from CSV

The Postgres command \copy reads rows from a given file in your current working directory and inserts them into a given table; this table must already exist and have the corresponding columns with suitable data types. If additional constraints are defined (such as primary or foreign keys) they must also be satisfied.

☆ We want to import a list of football players where each player is associated with a club in a separate table:

Create a table club with columns id and name, both varchar. id is the primary key.

Create a table player with columns id, name, and club. id is integer and the primary key. name and club are varchar. club references id in table club.

We want to import the file player.csv into table player. Download that file into your home directory, e.g.

wget http://mitloehner.com/lehre/sql/player.csv

Preview the file e.g. in the spreadsheet, or download it and view it in your favorite editor. Make sure all values in the column club occur in your club table as id. Also, make sure that the id values for the players are not used in your player table.

Run the psql command in your home directory and enter:

\copy player from 'player.csv' delimiter ',' CSV HEADER;

The \copy command is specific to Postgres.
This command does not work with the SQL extension in Jupyter notebooks, only on the command line using psql.
player is the name of the table that will receive the data
The name of the data file is given with quotes. That file must be in the working directory i.e. the directory where you started the psql session
This particular file actually uses comma as the delimiter; this is not always the case with CSV files
The file starts with a header line that contains the names of the columns in the file - this line will be ignored

☠ There is also the COPY command (without the backslash) which is run by the DBMS and requires root priviledges for security reasons; the \copy command is run by the psql client and does not require additional priviledges.

☆ Exercise: import the data from the file pl.csv into a (different!) table for football players and transfer values! Then you can use aggregate functions for statistics, such as average transfer value by club.

Backup

Your database files are not stored under your home directory. Use the command pg_dump to backup your database (on the command line, not within psql):

pg_dump dbuserid > backup.sql

The file backup.sql contains the SQL commands to restore all tables. Feed it to psql to restore your database:

psql < backup.sql

DB Size and Quota

There is currently no mechanism to set a size limit on a DB or user in Postgres. However, you can use this command to check the total size in bytes of a database:

SELECT pg_database_size('dbuserid');

Subqueries

Although many query problems can be solved with various types of joins some situations are more conveniently expressed with subqueries. Typically the result of a subquery is used with an operator in a Select statement:

SELECT ... FROM ... WHERE expression OPERATOR (SELECT ...)

E.g., SELECT name FROM customer WHERE salary > (SELECT ...)

For some operators the subquery must return either exactly one value, while others (such as the IN operator) can deal with a list of values.