回顾

在 「Relational Model」 下构建查询语句的方式分为两种:Procedural 和 Non-Procedural。第一节课中已经介绍了 Relational Algebra,它属于 Procedural 类型,而本节将介绍的 SQL 属于 Non-Procedural 类型。使用 SQL 构建查询时,用户只需要指定它们想要的数据,而不需要关心数据获取的方式,DBMS 负责理解用户的查询语义,选择最优的方式构建查询计划。

SQL 历史

  • SEQUAL from IBM’s System R prototype
    • Structured English Query Language
    • Adopted by Oracle in the 1970s
  • IBM releases DB2 in 1983
  • ANSI Standard in 1986. ISO in 1987
    • Structued Query Language
      当前 SQL 的标准是 SQL 2016,而目前大部分 DBMSs 至少支持 「SQL-92」 标准

SQL 主要特性介绍

概要

本节使用的示例数据库如下所示:

student(sid, name, login, age, gpa)

sid name login age gpa
53666 Kanye kayne@cs 39 4.0
53668 Bieber jbieber@cs 22 3.9

enrolled(sid, cid, grade)

course(cid, name)

聚合(Aggregates)

Aggregates 通常返回一个值,它们包括:

  • AVG(col)
  • MIN(col)
  • MAX(col)
  • SUM(col)
  • COUNT(col)

举例:

1
2
3
4
5
6
7
8
SELECT COUNT(login) AS cnt
FROM student WHERE login LIKE '%@cs';

SELECT COUNT(*) AS cnt
FROM student WHERE login LIKE '%@cs';

SELECT COUNT(1) AS cnt
FROM student WHERE login LIKE '%@cs';

1
2
SELECT AVG(gpa), COUNT(sid)
FROM student WHERE login LIKE '%@cs';
1
2
SELECT COUNT(DISTINCT login)
FROM student WHERE login LIKE '%@cs';

aggregate 与其它通常的查询列不可以混用,比如:

1
2
3
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid;

注意:
不同 DBMSs 的输出结果不一样,严格模式下,DBMS 应该抛错。

分组(Group by)

group by 就是把记录按某种方式分成多组,对每组记录分别做 aggregates 操作,如求每门课所有学生的 GPA 平均值:

1
2
3
4
SELECT AVG(s.gpa), e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid;

所有非 aggregates 操作的字段,都必须出现在 group by 语句,如下面示例中的 e.cid 和 s.name:

1
2
3
4
SELECT AVG(s.gpa), e.cid, s.name
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid, s.name;

再过滤(Having)

基于 aggregation 结果的过滤条件不能写在 WHERE 中,而应放在 HAVING 中,如:

1
2
3
4
5
SELECT AVG(s.gpa) AS avg_gpa, e.cid
FROM enrolled AS e, student AS s
WHERE e.sid = s.sid
GROUP BY e.cid
HAVING avg_gpa > 3.9;

重定向输出(Output Redirection)

将查询结果储存到另一张表上:

  • 该表必须是已经存在的表
  • 该表的列数,以及每列的数据类型必须相同
    1
    2
    3
    CREATE TABLE CourseIds (
    SELECT DISTINCT cid FROM enrolled
    );

输出控制(Output Control)

排序(Order By)

语法:ORDER BY <column*> [ASC|DESC]

1
2
3
SELECT sid, grade FROM enrolled
WHERE cid = '15-721'
ORDER BY grade;

按多个字段分别排序:

1
2
3
SELECT sid FROM enrolled
WHERE cid = '15-721'
ORDER BY grade DESC, sid ASC;

限制(Limit)

语法:LIMIT <count> [offset]

1
2
3
4
5
6
7
SELECT sid, name FROM student
WHERE login LIKE '%@cs'
LIMIT 10;

SELECT sid, name FROM student
WHERE login LIKE '%@cs'
LIMIT 20 OFFSET 10;

嵌套查询(Nested Queries)

nested queries 包含 「inner queries」 和 「outer queries」,前者可以出现在 query 的任何位置,且 inner queries 可以引用 outer queries 中的表信息。

例 1:获取所有参与 ‘15-445’ 这门课所有学生的姓名:

1
2
3
4
5
6
7
8
9
10
SELECT name FROM student
WHERE sid IN (
SELECT sid FROM enrolled
WHERE cid = '15-445'
);

SELECT (SELECT S.name FROM student AS S
WHERE S.sid = E.sid) AS sname
FROM enrolled AS E
WHERE cid = '15-445';

语法中支持的谓词包括:

  • ALL: 所有 inner queries 返回的记录都必须满足条件
  • ANY:任意 inner queries 返回的记录满足条件即可
  • IN:与 ANY 等价
  • EXISTS:inner queries 返回的表不为空
    1
    2
    3
    4
    5
    SELECT name FROM student
    WHERE sid ANY (
    SELECT sid FROM enrolled
    WHERE cid = '15-445'
    )
    例 2:找到至少参与一门课程的所有学生中,id 最大的
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    SELECT sid, name FROM student
    WHERE sid >= ALL (
    SELECT sid FROM enrolled
    );

    SELECT sid, name FROM student
    WHERE sid IN (
    SELECT MAX(sid) FROM enrolled
    );

    SELECT sid, name FROM student
    WHERE sid IN (
    SELECT sid FROM enrolled
    ORDER BY sid DESC LIMIT 1
    );
    例 3:找到所有没有学生参与的课程
    1
    2
    3
    4
    5
    SELECT * FROM course
    WHERE NOT EXISTS (
    SELECT * FROM enrolled
    WHERE course.cid = enrolled.cid
    );
    nested queries 比较难被优化(具体原因暂不知道)

通用表达式(Common Table Expressions)

在一些复杂查询中,创建一些中间表能够使得这些查询逻辑更加清晰:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
WITH cteName AS (
SELECT 1
)
SELECT * FROM cteName

WITH cteName (col1, col2) AS (
SELECT 1, 2
)
SELECT col1 + col2 FROM cteName

WITH
cteName1(col) AS (SELECT 1),
cteName2(col) AS (SELECT 2)
SELECT C1.col + C2.col FROM cteName1 AS C1, cteName2 AS C2;

例 1:找到所有参与课程的学生中 id 最大的
1
2
3
4
5
WITH cteSource(maxId) AS (
SELECT MAX(sid) FROM enrolled
)
SELECT name FROM student, cteSource
WHERE student.sid = cteSource.maxId

例 2:打印 1-10
1
2
3
4
5
6
7
WITH RECURSIVE cteSource (counter) AS (
(SELECT 1)
UNION ALL
(SELECT counter + 1 FROM cteSource
WHERE counter < 10)
)
SELECT * FROM cteSource;

字符串操作(String Operations)

DBMS String Case String Quotes
SQL-92 Sensitive Single Only
Postgres Sensitive Single Only
MySQL InSensitive Single/Double
SQLite Sensitive Single/Double
Oracle Sensitive Single Only

如在 condition 中判断两个字符串忽略大小写后是否相等:

1
2
3
4
5
/* SQL-92 */
WHERE UPPER(name) = UPPER('KaNyE')

/* MySQL */
WHERE name = "KaNyE"

字符串匹配(String Matching)

1
2
3
4
5
SELECT * FROM enrolled AS e
WHERE e.cid LIKE '15-%';

SELECT * FROM student AS s
WHERE s.login LIKE '%@c_';

字符串操作(String Operations)

不同 DBMS 有不同的 string 函数(没有完全遵从 SQL-92 标准),如连接两个 strings

1
2
3
4
5
6
7
8
9
/* SQL-92 */
SELECT name FROM student
WHERE login = LOWER(name) || '@cs';

/* MySQL */
SELECT name FROM student
WHERE login = LOWER(name) + '@cs';
SELECT name FROM student
WHERE login = CONCAT(LOWER(name), '@cs')

时间类型操作(Date/Time Operations)

不同的 DBMS 中的定义和用法不一样,具体见各 DBMS 的文档。

窗口/开窗函数(Window Functions)

主要用于在一组记录中,对每一条记录进行一些计算,如:
例 1:

1
2
SELECT *, ROW_NUMBER() OVER () AS row_num
FROM enrolled;

例 2:
1
2
3
4
SELECT cid, sid,
ROW_NUMBER() OVER (PARTITION BY cid)
FROM enrolled
ORDER BY cid;

例 3:找到每门课获得最高分的学生
1
2
3
4
5
6
SELECT * FROM (
SELECT *,
RANK() OVER (PARTITION BY cid ORDER BY grade ASC) AS rank
FROM enrolled
) AS ranking
WHERE ranking.rank = 1

补充

参考