-
Hash Partitioning and Partition-Wise Joins
Hi,
For the ETL process of a Data Warehouse project I have to join 2 tables (~10M rows) over their primary key.
I was thinking of hash partitioning these 2 tables over their PK. Through this the database (9.2) should be able to do a Hash-Hash Full Partition-wise Join. For more detail about that you can have a look at:
http://download-west.oracle.com/docs...part.htm#98291
What I'm looking for are some documents or recommandation concerning the number of hash partitions to create depending on the number of rows of the tables, CPU of the server or any other parameters.
I would be grateful if someone could give some input.
Mike
-
Have you done any assessment of how long it would take to join those record sets without partitioning? That would be my first step. Start by setting what you think is a reasonable performance target, and try a non-parallel join between non-partitioned tables (with the segments nicely spread out over your disk subsystem of course, and a memory allocation sufficient to avoid disk sorting). Compare performance against your target, and you maybe pleasantly surprised to find that you don't need to dabble any further.
Now, supposing you just want to do it as fast as possible, and are prepared to spend time on the issue. On 9.2 you should be using DBMS_STATS.GATHER_SYSTEM_STATS() to give Oracle some hard statistics on the relative performance on your system components, and this will let the optimizer make a better estimate of a suitable degree of parallelism for the process.
If your sole concern is the performance of this join, then I think it would be difficult to fault a high number of partitions -- sixteen or thirty-two, say. Remember that you want to use a number of partitions equal to a power of two (2, 4, 8 ...) in order to achieve an even distribution of rows-per-partition. A high number of partitions would give the optimizer plenty of room for distributing the partition-wise joins over multiple parallel server processes.
You might like to back off from a high number of partitions depending on other factors, such as the manageability of local indexes, or the other types of queries accessing these tables.
Anyway, I'd definately make sure that there are meaningful, measurable gains to be had from implementing this partitioning scheme before you plunge ahead.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|