0%

PostgreSQL技术-索引

PostgreSQL技术-索引

为了更加高效地查询数据, 我们应该考虑使用索引.

在PostgreSQL中, 我们可以通过如下地方式创建索引

1
2
3
4
5
6
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ INCLUDE ( column_name [, ...] ) ]
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]

1. btree

介绍

绝大部分场景下btree索引均适用. 该索引可用来进行范围查找, 支持以下操作符

  • >,>=,=,<, <=
  • like
  • between
  • in

同时,该类索引支持多字段索引, 只用索引的扫描,覆盖索引唯一约束

但是该索引无法使用长度较长的key. 否则就会出现: 54000] ERROR: index row requires 21792 bytes, maximum size is 8191

用法

创建表与数据

1
2
3
4
5
6
7
8
create table test_btree
(
key varchar(32),
value int
);
insert into test_btree
select md5(random()::text), (random() * 1000000)::int
from generate_series(1, 1000000);

创建索引

1
create index test_btree_key on test_btree (key) include (value);

查询示例

1
2
3
select sum(value)
from test_btree
where key in ('323816e2d4bb47d85d669b90089a2382', 'f3180734688c85d1cf5b173b261e9521');

查询计划

1
2
3
4
5
6
Aggregate  (cost=8.89..8.90 rows=1 width=8) (actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan using test_btree_key on test_btree (cost=0.42..8.88 rows=2 width=4) (actual time=0.016..0.021 rows=2 loops=1)
" Index Cond: (key = ANY ('{323816e2d4bb47d85d669b90089a2382,f3180734688c85d1cf5b173b261e9521}'::text[]))"
Heap Fetches: 0
Planning Time: 0.052 ms
Execution Time: 0.038 ms

2. hash

介绍

hash索引适用于字段value非常长的场景. 只支持等值连接。

用法

创建表

1
2
3
4
5
6
7
8
create table test_hash
(
key text,
value int
);
insert into test_hash
select md5(random()::text), (random() * 1000000)::int
from generate_series(1, 10000);

创建索引

1
create index test_hash_key on test_hash using hash (key);

查询示例

1
2
3
select key, value
from test_hash
where key = '70913971e3676ba202faba804a12f9bd';

查询计划

1
2
3
4
Index Scan using test_hash_key on test_hash  (cost=0.00..8.02 rows=1 width=37) (actual time=0.010..0.011 rows=1 loops=1)
Index Cond: (key = '70913971e3676ba202faba804a12f9bd'::text)
Planning Time: 0.037 ms
Execution Time: 0.021 ms

3. GiST

介绍

Gist表示通用搜索树, 它是一种平衡的树结构的访问方法. 它可以用来解决空间,范围等几何问题. 例如,对于各种数据类型,执行各类操作符

名称 索引数据类型 可索引操作符 排序操作符
box_ops box && &> &< `&< >><<<<
circle_ops circle && &> &< `&< >><<<<
inet_ops inet, cidr && >> >>= > >= <> << <<= < <= =
point_ops point >> >^ << <@ <@ <@ <^ ~= <->
poly_ops polygon && &> &< `&< >><<<<
range_ops 任何范围类型 && &> &< >> << <@ `- -=@>@>`
tsquery_ops tsquery <@ @>
tsvector_ops tsvector @@

用法

创建表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
create table test_gist
(
range int4range,
value int,
exclude using gist (range with &&)
);
insert into test_gist
with a as (
select (random() * 1000)::int v
from generate_series(1, 100)
)
select int4range(v, v + 11), (random() * 100)::int
from a
on conflict do nothing;

查询示例

1
2
3
select range, value
from test_gist
where range @> 400;

查询计划

1
2
3
4
5
6
7
Bitmap Heap Scan on test_gist  (cost=4.19..13.66 rows=6 width=36) (actual time=0.049..0.049 rows=1 loops=1)
Recheck Cond: (range @> 400)
Heap Blocks: exact=1
-> Bitmap Index Scan on test_gist_range_excl (cost=0.00..4.19 rows=6 width=0) (actual time=0.009..0.009 rows=1 loops=1)
Index Cond: (range @> 400)
Planning Time: 0.142 ms
Execution Time: 0.068 ms

4. SP-GiST

介绍

SP-GiST是空间分割的(Space-Partitioned)GiST的省略语. 他的使用场景与GiST类似. 它支持二维点的SP-Gist操作符如下

  • <<, 是否严格在左
  • >>, 是否严格在右
  • ~=, 与…相同
  • <@, 包含或在…上
  • <^, 低于(允许接触)
  • >^,高于(允许接触)

用法

创建表

1
2
3
4
5
6
7
8
create table test_spgist
(
key point,
value int
);
insert into test_spgist
select point(random() * 1000, random() * 1000), (random() * 10000)::int
from generate_series(1, 1000000);

创建索引

1
create index test_spgist_key on test_spgist using spgist (key);

查询示例

1
2
3
select key, value
from test_spgist
where key >> point(100, 100);

查询计划

1
2
3
4
5
6
7
Bitmap Heap Scan on test_spgist  (cost=2951.28..10571.28 rows=100000 width=20) (actual time=64.614..119.100 rows=899566 loops=1)
" Recheck Cond: (key >> '(100,100)'::point)"
Heap Blocks: exact=6370
-> Bitmap Index Scan on test_spgist_key (cost=0.00..2926.28 rows=100000 width=0) (actual time=64.098..64.099 rows=899566 loops=1)
" Index Cond: (key >> '(100,100)'::point)"
Planning Time: 0.048 ms
Execution Time: 135.882 ms

5. GIN

介绍

GIN为通用倒排索引, 用于处理索引项为组合值的情况. 比如所搜包含特定词的文档, 或拥有特定值的数组. 对于一维数组, 它支持以下的操作符

  • <@, 被包含
  • @>, 包含
  • =, 等于
  • &&, 重叠

用法

创建表

1
2
3
4
5
6
7
8
create table test_gin
(
uuid uuid,
tags integer[]
);
insert into test_gin
select gen_random_uuid(), array [(random() * 100)::int,(random() * 100)::int,(random() * 100)::int]
from generate_series(1, 1000);

创建索引

1
create index test_gin_tags on test_gin using gin (tags);

查询示例

1
2
3
select uuid, tags
from test_gin
where tags @> array [10, 2];

查询计划

1
2
3
4
5
6
7
Bitmap Heap Scan on test_gin  (cost=12.00..16.02 rows=1 width=49) (actual time=0.018..0.019 rows=1 loops=1)
" Recheck Cond: (tags @> '{10,2}'::integer[])"
Heap Blocks: exact=1
-> Bitmap Index Scan on test_gin_tags (cost=0.00..12.00 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1)
" Index Cond: (tags @> '{10,2}'::integer[])"
Planning Time: 0.078 ms
Execution Time: 0.065 ms

6. BRIN

介绍

BRIN索引(块范围索引, Block Range Indexes)维护不同范围内数据块的统计信息. 相比于btree索引, BRIN索引占用的空间更小.该索引用于线性相关较强的精确和范围查询.

用法

创建表

1
2
3
4
5
6
7
8
create table test_brin
(
key int,
value int
);
insert into test_brin
select g, (random() * 1000000)::int
from generate_series(1, 1000000) g;

创建索引

1
create index test_brin_key on test_brin (key) include (value);

查询示例

1
2
3
select key, value
from test_brin
where key between 11100 and 11150;

查询计划

1
2
3
4
5
Index Only Scan using test_brin_key on test_brin  (cost=0.42..5.42 rows=50 width=8) (actual time=0.014..0.018 rows=51 loops=1)
Index Cond: ((key >= 11100) AND (key <= 11150))
Heap Fetches: 0
Planning Time: 0.046 ms
Execution Time: 0.029 ms

7. bloom

介绍

bloom索引虽然只能用于等值查询,但可通过多列字段索引高效的查询每一种可能的情况.

用法

创建表

1
2
3
4
5
create table test_bloom as
select md5(random()::text) as i1,
md5(random()::text) as i2,
md5(random()::text) as i3
from generate_series(1, 1000000);

创建索引

1
create index test_bloom_all on test_bloom using bloom (i1, i2, i3);

查询示例

1
2
3
select i1,i2,i3
from test_bloom
where i2 = '2efeb41703124b0c5a3ac7cfbc86029f';

查询计划

1
2
3
4
5
6
7
8
Bitmap Heap Scan on test_bloom  (cost=15348.00..15352.01 rows=1 width=99) (actual time=5.933..8.660 rows=1 loops=1)
Recheck Cond: (i2 = '2efeb41703124b0c5a3ac7cfbc86029f'::text)
Rows Removed by Index Recheck: 4380
Heap Blocks: exact=3890
-> Bitmap Index Scan on test_bloom_all (cost=0.00..15348.00 rows=1 width=0) (actual time=5.615..5.616 rows=4381 loops=1)
Index Cond: (i2 = '2efeb41703124b0c5a3ac7cfbc86029f'::text)
Planning Time: 0.039 ms
Execution Time: 8.678 ms