Click to See Complete Forum and Search --> : Bitmap indexes


pando
11-08-2000, 11:19 AM
Hi

Does anyone knows how bitmap index works internally? The difference between it and b*tree


cheers

denevge
11-09-2000, 04:59 AM
A bitmap index is a 2-dimensional table of bits.
when a certain value is OK, the bit is 1, otherwise it's 0

eg, table demo ( switchA char(1) )

values

A
B
A
C
D
E

the following table is created
--1 2 3 4 5 6" -> ID. There's a function that transforms this value to the ROWID if the row is selected
A 1 0 1 0 0 0
B 0 1 0 0 0 0
C 0 0 0 1 0 0
D 0 0 0 0 1 0
E 0 0 0 0 0 1
|--> possible values

When you do select ... where switchA = 'A'
--> all ID's where the 'A'-bit = 1 are selected, The ID is converted to a ROWID using a function. Now we have the rowid's off the records.


Hope this helps
Gert

pando
11-09-2000, 05:14 AM
Hi Thanks for the reply Devenge it certainly helped a bit, a bit because I dont see how this helps to improve the perfomance in a low cardinality column :P

For example if the table you put has 100 'A's will each 'A' get a different combination of 1 and 0?

what do you mean the 'A'-bit = 1
A minus bit?

denevge
11-09-2000, 06:10 AM
what is a bitmap index ?

I said it's a 2-dimensional array.
each element of the array has 2 elements. the value + a serie off 1's and/or 0's. ( = bitmap )
each bit in the bitmap represents a row in the table.
If the row in the table has the same value as the 'value'- element,
the bit in the bitmap for that row is 1, otherwise it's 0.

Let's take my example and put it in a array

value+bitmap
--------------------
A + 101000
B + 010000
C + 000100
D + 000010
E + 000001

When selecting switchA = 'A'
the bitmap that corresponds to the value 'A' is taken.
==> 101000
then we can identify the positions in the bitmap where bit=1
==> position 1
position 3
then the positions are converted to the ROWID's
==> ROWID's of the records in the table

the same for where switchA='D'
==> 000010
==> postion 5 ==> rowid

Why is it faster than a 'normal' index
--> it takes less space
Why is it fast when we use multiple bitmap indexes
--> because we can do a binary operation before converting the positions to rowid
-->==> example
-->==> where switchA='A' and myspecialcode='Y'
-->==> we get the bitmap from switchA='A' : 101000
-->==> we get the bitmap from myspecialcode='Y': let assume the bitmap is 001000
-->==> 101000 AND 001000 = 001000
-->==> where do we have 1 in bitmap : position 3
-->==> position 3 ==> converted to rowid
why is it preferred for low cardinality column
--> for each different value, a value+bitmap combination is made. So if you have to much different values, the size of the bitmap indexes become to big.
--> although I known somebody : table with 10.000.000 records / column with about 48.000 different values and still a bitmap index gives a big performance gain.

I hope this was better then the first reply.

Regards
Gert

pando
11-09-2000, 07:06 AM
I think more or less I am getting the idea, I still dont see if there will be different combination of value+bitmap though

for example if I have these values

A
B
A
A
C

will it be like this ?


value+bitmap
A 101000
B 010000
A 101000
......

or

value+bitmap
A 101000
B 010000
A 101001

denevge
11-09-2000, 07:31 AM
A 10110 -> record 1,3,4 have value A
B 01000 -> record 2 has value B
C 00001 -> record 5 has value C

when we add a new row with value D the new array will be

A 101100
B 010000
C 000010
D 000001
--> added value D
--> added a bit to every bitmap

when again we add a new row with value B the new array will be

A 1011000
B 0100001
C 0000100
D 0000010
--> added a bit to every bitmap

Gert

pando
11-09-2000, 08:14 AM
Ok I see it now, the bitmap is really 1 and 0 add together which becomes the number of rows :o

so if I have 10 rows

I will have 10 1`s and o's
if 11 rows 11 1's and 0's and so on...

pando
11-09-2000, 08:27 AM
by the way, bitmap index is not suitable for high cardinality columns because of SIZE issue? Because i think it´perfomance would be great in any query

rcaballe
11-09-2000, 10:24 AM
It's not suitable for HC because the RDBMS will make the comparison with all the records just to get a few records.
For this kind is better b-tree because the RDBMS is going to navigate through the index to find the few records.
I have bitmapped indexes with 150 million of records and 100 thousand of values and it's still good enough.
The examples shown before are good, but they are really vertical not horizontal, I mean A,B,C,etc are the columns of the index, if you add a record with a new value it gonna be a NEW column in the index, with all the problems you know there are.
A tip:
Create your table=>loaded up=>alter table name minimize_records_per_block=>create all the bitmap indexes on the table.

TerryD
11-09-2000, 11:21 AM
Just be careful if your updating a column with a bitmap index on.
Because the smallest part of a bitmap index that can be locked is a bitmap segment (up to half a block). This actually contains a number of rows, so, if you have some parallel updates (like we have) - even though you may be updating records on say a primary key, you may still get deadlocking issues due locking on the index.

Bitmap indexes were put on our system ages ago to help performance, but they've introduced a whole host of problems - so the application was coded around it (i.e. commit's in cursor loops to stop locking), and now this is introducing snapshot too old.

Terry

aracnid
11-09-2000, 11:24 AM
i found them good on select (or static tables) but disastereos on sqlloader (go often on direct load)

rcaballe
11-09-2000, 11:25 AM
Bitmap indexes are not good for OLTP.
The exception is a table not frequently updated in an OLTP DB. (catalog)

carp
11-09-2000, 12:34 PM
At the risk of muddying the waters, let me throw in my 2 cents' worth:

Contrary to an earlier entry, a bitmap index is NOT a table - it's an index. Internally, it is structured like a B* Tree index (to maintain performance when searching for a key value), but the index entries are different.

In a "regular" index, each entry is composed of a key value and a rowid.

In a bitmap index each entry is composed of a key value, the first and last data blocks for each contiguous string of blocks containing the key value, and the bitmap that represents the rows within the indicated string of blocks. Furthermore, in order to conserve space, Oracle uses a compression algorithm.

For example, if your table looked like this:

Block 1:
A
B
A
A
Block 2:
C
D
D
B
Block 3:
A
B
E
E

Then the entries in your bitmap index might look like this:

A: Blk1: Blk1: 1x1 1x0 2x1
A: Blk3: Blk3: 1x1 3x0
B: Blk1: Blk3: 1x0 1x1 5x0 1x1 1x0 1x1 2x0
C: Blk2: Blk2: 1x1 3x0
D: Blk2: Blk2: 1x0 2x1 1x0
E: Blk3: Blk3: 2x0 2x1

So you DO pay a small performance penalty due to the compression/decompression of bitmaps. But this can be more than compensated for in the compactness of the index (notice that in our trivial example, we only needed 6 index entries to represent 12 rows - conceivably you could index a table with a million rows and 4 key values with only 4 index entries!) and the capability for fast boolean logic that has already been discussed.

You pay a HUGE penalty if you are updating the values on which the bitmap index is based. For instance, if we
update the table to change the value of the second row in Block 3 from 'B' to 'C', we have to find the proper index entry for 'B', decompress the bitmap, find the appropriate bit, change it from 1 to 0, then determine that the string of contiguous blocks containing 'B' has changed from Blk1:Blk3 to Blk1:Blk2, so we have to eliminate the bits representing Block 3, and finally recompress the bitmap. THEN we have to find the appropriate entry for C and decompress it. We then find that we now have a new contiguous block that this bitmap has to reflect, so the end block entry is changed from Blk2 to Blk3 and the bitmap for the 'C' entry goes from
1x1 3x0
to
1x1 4x0 1x1 2x0
and then we have to recompress it.

This overhead is why Oracle recommends you NOT use bitmap indexes if you are going to do frequent DML on the key values or you have a relatively small table. Bitmap indexes were intended primarily for use in data warehouses, where tables are millions of rows long and DML is rare (except during periodic refreshes, which tend to take the form of inserts rather than updates).

TerryD
11-09-2000, 12:47 PM
Carp - Good answer and really explains things!

When we were having Snapshot too old problems the other day we found some code which contained commits accross fetches... Tut Tut we thought, made the rollback segments bigger and removed them.

Next day - Out of index tablespace...? Weird.
A bitmap index had grown from 8Mb to over 500Mb during the update of 300,000 records. So, back in the commits go since the app is coded to run pants without them (hence my cacheing question thread!).

I thought I was going mad, and couldn't work out how an index had grown so much, but what you're saying is that it recompresses after updating... Would that be right?

rcaballe
11-09-2000, 12:47 PM
Very nice!
Thank you!
I have found that a rebuild of bitmap indx are very helpful is your case Terry
Just a question:
Why 2 As ? Why not like B?

[Edited by rcaballe on 11-09-2000 at 12:57 PM]

pando
11-09-2000, 06:19 PM
just wondering what is

A: Blk1: Blk1: 1x1 1x0 2x1

the 1x1 1x0 and 2x1 stand for?

carp
11-09-2000, 06:25 PM
Sorry - that was the notation I was using to indicate a compressed entry.
So
1x1 3x0 2x1
would indicate a bitmap that looks like
100011
when decompressed.

denevge
11-10-2000, 06:18 AM
I knew how the bitmap part worked,
but not how it was stored in the datafiles.

Now ik know both.

Tanx Carp
Gert