On-Disk Bitmap Index In Bizgres
Ayush Parashar
aparashar@greenplum.com and
Jie Zhang
jzhang@greenplum.com
1
Agenda
Introduction to On-Disk Bitmap Index
Bitmap index creation
Bitmap index creation performance - index size and
creation time
Performance with varying cardinality
Query performance
Summary
2
Introduction
Access method specially efficient for low-
cardinality column of high-dimensional fact table
Takes a fraction of space as compared to B-tree
Very less index creation time
Very effective on queries with multiple-conditions
in where clause
3
Introduction: continued..
Consider the following data:
Qu i ckTi me ™ an d a
TIFF (LZW) de co mp re ss or
a re ne ed ed to se e thi s p i ctu re .
4
Introduction: B-Tree representation
QuickTime™ and a
TIFF (Uncompressed) decompre ssor
are neede d to see this picture.
5
Introduction: Bitmap representation
QuickTime™ and a
TIFF (LZW) decompressor
are neede d to see this picture.
6
Bitmap index creation
Steps in bitmap-index creation
Build bin indexes
Apply encoding schemes: Equality encoding scheme
used in the present implementation
Apply compression schemes
Contributors: Jie Zhang, Mark Kirkwood, Gavin
Sherry
7
Bitmap index creation: continued..
QuickTime™ and a
TIFF (LZW) decompressor
are neede d to see this picture.
8
On-disk Bitmap Index - Performance
Table definition from OSDL - DBT 3 benchmark
Size of table is corresponding to scale factor 10 for DBT 3
Table Name Indexed Column Data Type Cardinality
LINEITEM L_SHIPMODE character(10) 7
LINEITEM L_QUANTITY numeric(15,2) 50
LINEITEM L_LINENUMBER integer 7
LINEITEM L_SHIPMODE, L_QUANTITY character(10), 350
numeric(15,2)
* multi-column index
ORDERS O_ORDERSTATUS character(1) 3
ORDERS O_ORDERPRIORITY character(15) 5
CUSTOMER C_MKTSEGMENT character(10) 5
CUSTOMER C_NATIONKEY integer 25
9
On-disk Bitmap Index - Index Size Performance
Index size - Size of bitmap index is a fraction of B-tree index
3000 2845
2500
Size of the index (in MBytes)
2000 1804 1804
1500 1285
1000
580
500 321
117 176
58 59 5 11 1 45 2 32
0
l_s hipmode l_quantity l_linenum ber l_s m & l_q o_orders tatus o_orderpriority c_mktsegment c_nationkey
Bitmap B-tree
10
On-disk Bitmap Index - Creation Time Performance
Index creation time: Up to 7 times faster index creation
3500
Time take n to c rea te the index (sec onds )
2933.4
3000
2500
2217.1
2000
1500
937.8 948.7
1000
679.2
547.2
454.9 412.4
500 374.6
241.4
83.5 108.5
10.9 51.4 8.3 9.3
0
l_s hipmode l_quantity l_linenum ber l_s m & l_q o_orders tatus o_orderpriority c_mktsegment c_nationkey
Bitmap B-Tree
11
On-disk Bitmap Index - Performance with varying cardinality
Index size with varying cardinality: Total rows 250 million
Variation in index size with change in column cardinality
7000
6000
5000
Index size in MB
4000
Bitmap Index
B-tree Index
3000
2000
1000
0
0 2000 4000 6000 8000 10000 12000
Column Cardinality 50, 100, 500, 1000, 2000, 4000, 10000 (total rows 250 million)
12
On-disk Bitmap Index - Performance with varying cardinality
Index creation time with varying cardinality: Total rows 250
million
Variation in index creation time with varying cardinality
1600
1400
1200
Index creation time in seconds
1000
Bitmap Index
800
B-tree Index
600
400
200
0
0 2000 4000 6000 8000 10000 12000
Column Cardinality: 50, 100, 500, 1000, 2000, 4000, 10000 (total rows 250 million)
13
On-disk Bitmap Index - Query Performance
Query 1 Query 2
SELECT sum(lineitem.l_discount)
SELECT avg(lineitem.l_tax)
FROM
FROM
lineitem, orders, customer, nation
lineitem, orders
WHERE
WHERE
nation.n_name='UNITED STATES' AND
orders.o_orderstatus='F' AND
customer.c_mktsegment='AUTOMOBILE' AND
orders.o_orderpriority='4-NOT SPECIFIED' AND
orders.o_orderstatus='P' AND
lineitem.l_linenumber=5 AND
orders.o_orderpriority='2-HIGH' AND
lineitem.l_shipmode='TRUCK' AND
lineitem.l_quantity=5 AND
lineitem.l_quantity=2 AND
lineitem.l_shipmode='AIR' AND
orders.o_orderkey=lineitem.l_orderkey;
lineitem.l_linenumber=5 AND
customer.c_custkey=orders.o_custkey AND
orders.o_orderkey=lineitem.l_orderkey AND
nation.n_nationkey=customer.c_nationkey;
Query 3
SELECT count(*) FROM lineitem WHERE l_linenumber=1;
Query 4
SELECT count(*) FROM lineitem WHERE l_linenumber in (1,2) AND l_shipmode IN ('RAIL','TRUCK');
Query 5
SELECT count(*) FROM lineitem WHERE l_linenumber=5 AND l_shipmode='RAIL' AND l_quantity=18;
14
On-disk Bitmap Index - Query Performance
Query Performance:
Run1, Run2 and Run3 indicate that the same query has been run consecutively three-times
450
Time taken for query completion (seconds)
400
39 39
38 6.5 2.8 5.1 38 38
38 6.6 6.7 8.7
34 35
35 0.8 6.7 2.2
350
300 28 28
28 3.4 3.3 3.3
250
200 18 1.8 1 0.5
18 18
150 13 6.3
10 1.4 .4 .7
99 98 10 10
10 8.6 9.1 9.3
100
60 68
60 .4 .6 .3 53 .9
50
1.7 1.7 0.8 0.6
0
Query 1 Query 2 Query 3 Query 4 Query 5
Bitmap - Ru n1 Bitmap - Ru n2 Bitmap - Ru n3 B-tree - Run1
B-tree - Run2 B-tree - Run3
15
On-disk Bitmap Index - Query Performance
Example: Query I explain plan output
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=713682.71..713682.72 rows=1 width=9)
-> Merge Join (cost=706473.11..713678.58 rows=1649 width=9)
Merge Cond: ("outer".l_orderkey = "inner".o_orderkey)
-> Sort (cost=122798.17..122841.70 rows=17412 width=13)
Sort Key: lineitem.l_orderkey
-> Bitmap Heap Scan on lineitem (cost=57440.64..121571.69 rows=17412 width=13)
Recheck Cond: ((l_quantity = 2::numeric) AND (l_linenumber = 5) AND (l_shipmode = 'TRUCK'::bpchar))
-> BitmapAnd (cost=57440.64..57440.64 rows=17412 width=0)
-> Bitmap Index Scan on l_quantity_bm_idx(on-disk bitmap index) (cost=0.00..4500.02 rows=1199721 width=0)
Index Cond: (l_quantity = 2::numeric)
-> Bitmap Index Scan on l_linenumber_bm_idx(on-disk bitmap index) (cost=0.00..22779.89 rows=6278540 width=0)
Index Cond: (l_linenumber = 5)
-> Bitmap Index Scan on l_shipmode_bm_idx(on-disk bitmap index) (cost=0.00..30160.23 rows=8318065 width=0)
Index Cond: (l_shipmode = 'TRUCK'::bpchar)
-> Sort (cost=583674.94..587225.94 rows=1420397 width=4)
Sort Key: orders.o_orderkey
-> Bitmap Heap Scan on orders (cost=10318.21..372165.55 rows=1420397 width=4)
Recheck Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
Filter: (o_orderstatus = 'F'::bpchar)
-> Bitmap Index Scan on o_orderpriority_bm_idx(on-disk bitmap index) (cost=0.00..10318.21 rows=2869489 width=0)
Index Cond: (o_orderpriority = '4-NOT SPECIFIED'::bpchar)
(21 rows)
16
Summary
On-disk Bitmap Index: New feature in Bizgres 0.9
Provides dramatic improvements in index creation
time and space used by the index
Dramatically improves response time for large
classes of ad hoc data-warehousing queries
17
Thanks
Questions..?
References
Bitmap Index Design and Evaluation: Chee-Yong Chan and Yannis E. Ioannidis
Compressed bitmap indices for efficient query processing: Kesheng Wu Ekow J. Otoo
Arie Shoshani
On-Disk Bitmap Index Performance in Bizgres 0.9 - A Greenplum Whitepaper:
http://bgn.greenplum.com
http://www.bizgres.org/home.php
18