第 11 章VIP章节 - SQL语言艺术

种情况下，使用正规连接、关联子查询，还是非关联子查询，

要根据不同条件的过滤能力和已存在哪些索引而定。

例如，由于不太畅销，我们不再检索订购蝙蝠车的人，而是查找上周六购买某种肥皂的客户。

此时，我们的查询语句为：

select distinct orders.custid

from orders

join orderdetail

on (orderdetail.ordid = orders.ordid)

join articles

on (articles.artid = orderdetail.artid)

where articles.artncom = 'SOAP'

and

这个处理流程很合逻辑，该逻辑和商品具有高可选择xìng时相反：先取得商品，再取得包含商品

的明细订单，最后处理订单。对目前讨论的肥皂订单的情况而言，我们应该先取得在较短期间

内下的少量订单，再检查哪些订单涉及肥皂。从实践角度来看，我们将使用完全不同的索引：

第一个例子需要orderdetail表的商品名称、商品ID这两个字段上的索引，以及orders表的主键

orderid上的索引；而此肥皂订单的例子需要orders表日期字段的索引、orderdetail表的订单ID字

段的索引，以及articles表的主键orderid上的索引。当然，我们首先假设索引对上述两例都是最

佳方式。

要知道哪些客户在上星期六买了肥皂，最明显而自然的选择是使用关联子查询：

select distinct orders.custid

from orders

where

and exists (select 1

from orderdetail

join articles

on (articles.artid = orderdetail.artid)

where articles.artncom = 'SOAP'

and orderdetails.ordid = orders.ordid)

在这个方法中，为了使关联子查询速度较快，需要orderdetail表的ordid字段上有索引（就可以

通过主键artid取得商品，无需其他索引）。

第3章已提到，事务处理型数据库（transactional database）的索引是种奢侈，因为它处在经常更

改的环境中，维护的成本很高。于是选择“次佳”解决方案：当表orderdetail 上的索引并不重要，

而且也有充足理由不再另建索引时，我们考虑以下方式：

select distinct orders.custid

from orders,

(select orderdetails.ordid

from orderdetail,

articles

where articles.artid = orderdetail.artid

and articles.artncom = 'SOAP') as sub_q

where sub_q.ordid = orders.ordid

and

这第二个方法对索引的要求有所不同：如果商品数量不超过数百万项，即使artncom字段上没有

索引，基于商品名称条件的查询xìng能也不错。表orderdetail的artid字段可能也不需索引：如果商

品很畅销，出现在许多订单中，则表orderdetail和articles之间的连接通过哈希或合并连接comrge

join）更高效，而artid字段上的索引会引起嵌套的循环。与第一种方法相比，第二种方法属于索

引较少的解决方案。一方面，我们无法承受为表的每个字段建立索引；另一方面，应用中都有

一些“次要的”查询，它们不太重要，对响应时间要求也不苛刻，索引较少的解决方案完全满足

它们的要求。

总结：为现存的查询增加搜索条件，可能彻底改变先前的构想：修改过的查询成了新查询。

多个间接宽泛条件的jiāo集

Small Intersection, Indirect Broad Criteria

为了构造查询条件，需要连接（join）源表之外的表，并在条件中使用该表的字段，就叫间接条

件（indirect criterion）。正如上一节“多个宽泛条件的jiāo集”的情况，通过两个或多个宽泛条件的

jiāo集处理获取小结果集，是项艰难的工作；若是涉及多次joincāo作，或者对中心表（central table）

进行joincāo作，则会更加困难——这是典型的“星形schema（star schema）”（第10章详细讨论），

实际的数据库系统中经常遇到。对于多个可选择xìng差的条件，一些罕见的组合要求我们预测哪

些地方会执行完整扫描。当牵涉到多个表时，这种情况颇值得研究。

DBMS引擎的执行始于一个表、一个索引或一个分区，就算DBMS引擎能并行处理数据也是如

此。虽然由多个大型数据集合的jiāo集所定义的结果集非常小，但前期的全表扫描、两次扫描等

问题依然存在，还可能在结果上执行嵌套循环（nested loop）、哈希连接

（hash join）或合并连接comrge join）。此时，困难在于确定结果集的哪种表组合产生的记录数

最少。这就好比，找到防线最弱的环节，然后利用它获得最终结果。

下面通过一个实际的Oracle 案例说明这种情况。原始查询相当复杂，有两个表在from 子句中

都出现了两次，虽然表本身不太庞大（大的包含700 000 行数据），但传递给查询的九个参数可

选择xìng都太差：

select (data from ttex_a,

ttex_b,

ttraoma,

topeoma,

ttypobj,

ttrcap_a,

ttrcap_b,

trgppdt,

tstg_a)

from ttrcapp ttrcap_a,

ttrcapp ttrcap_b,

tstg tstg_a,

topeoma,

ttraoma,

ttex ttex_a,

ttex ttex_b,

tbooks,

tpdt,

trgppdt,

ttypobj

where ( ttraoma.txnum = topeoma.txnum )

and ( ttraoma.bkcod = tbooks.trscod )

and ( ttex_b.trscod = tbooks.permor )

and ( ttraoma.trscod = ttrcap_a.valnumcod )

and ( ttex_a.nttcod = ttrcap_b.valnumcod )

and ( ttypobj.objtyp = ttraoma.objtyp )

and ( ttraoma.trscod = ttex_a.trscod )

and ( ttrcap_a.colcod = :0 ) -- not selective

and ( ttrcap_b.colcod = :1 ) -- not selective

and ( ttraoma.pdtcod = tpdt.pdtcod )

and ( tpdt.risktyp = trgppdt.risktyp )

and ( tpdt.riskflg = trgppdt.riskflg )

and ( tpdt.pdtcod = trgppdt.pdtcod )

and ( trgppdt.risktyp = :2 ) -- not selective

and ( trgppdt.riskflg = :3 ) -- not selective

and ( ttraoma.txnum = tstg_a.txnum )

and ( ttrcap_a.refcod = :5 ) -- not selective

and ( ttrcap_b.refcod = :6 ) -- not selective

and ( tstg_a.risktyp = :4 ) -- not selective

and ( tstg_a.chncod = :7) -- not selective

and ( tstg_a.stgnum = :8 ) -- not selective

我们提供适当的参数（这里以:0 到:8 代表）执行此查询：耗时超过25 秒，返回记录不到20

虎虎漫画小说

第 11 章免费阅读

虎虎漫画小说

第 11 章 免费阅读

第 11 章免费阅读