Does splitting the big fact table make a difference?
There is a big fact table (70 GB) in our database, holding 11 years of data. The table is partitioned by year. If I split the table into two tables, one for archived data (about 7 years) and one for new data (about 4 years), will it improve query performance? The new-data table is the one queries access most.
How could I find out?
How is this table accessed?
Does partition pruning occur?
Why do you partition such a large table only by year?
Is the table often joined with another large table?
Which version of Oracle do you use?
Why do you think that the I/O generated by this table is a performance problem?
1. There are sometimes full table scans on this table. Even though I created indexes on all the columns used by the query, full table scans still appear in the execution plan.
2. Partition pruning does occur.
3. The nature of the data requires us to partition it by year.
4. The table is often joined with other tables. Those tables are relatively small, around 8 GB, 4 GB, and so on.
6. After we load historical data, performance seems to get worse.
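One way to see whether a given query is really pruning, and where the full scans come from, is to look at the Pstart/Pstop columns of its execution plan. A minimal sketch (the table, column, and literal below are made up for illustration; substitute your own query):

```sql
-- Hypothetical fact table and predicate; replace with your slow query.
EXPLAIN PLAN FOR
SELECT SUM(f.amount)
FROM   big_fact f
WHERE  f.year_id = 2004;

-- Pstart/Pstop in the output show which partitions are touched.
-- A full scan of only the 2004 partition is very different from
-- PARTITION RANGE ALL, which scans every partition in the table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);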
Is the table partitioned by a YEAR identifier, or is there a date column with a lower level of granularity by which it is partitioned (e.g. month, day, hour)?
Do you understand why full scans sometimes occur, even if everything is indexed?
Have you measured performance problems after loading historical data, and understood what has changed to cause any problems?
1. The table is partitioned by a YEAR identifier. What would be the difference if it were partitioned another way? Why do you ask?
2. No, I don't know why full scans occur. I assume it's the selectivity of the indexes, or that the cost of a full scan is less than that of an index scan. How can I find out why the optimizer chooses a full scan instead of using the indexes?
3. The performance problem was measured by response time. The change is the data volume. What would change after the data volume increased?
Well, if you are getting true full table scans, then how do you propose to run those queries if you split the table into multiple tables? If you are getting full partition scans (i.e. partition pruning but no index-based access), then the total size of the table is not relevant; it's just the size of the partitions being accessed that matters.
Your difference in performance might be due to a number of issues. For example, the optimizer might be making a bad choice about the best access method because you haven't updated both global and partition statistics. Alternatively the optimizer might just be responding to the different data volumes correctly, and an increased data volume gives longer response times.
I don't think that the answer is to split the table - you have already effectively done that by partitioning. The key would just be to make sure that the optimizer is leveraging the partitioning by giving you the pruning that you expect.
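One common cause of a plan flip after a large load is that only one level of statistics gets refreshed. A hedged sketch of gathering global, partition, and index statistics together with DBMS_STATS (the schema and table names are placeholders):

```sql
-- Gather statistics at all levels (global, partition, subpartition)
-- and cascade to the indexes. DWH/BIG_FACT are placeholder names.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'DWH',
    tabname     => 'BIG_FACT',
    granularity => 'ALL',   -- global + partition + subpartition stats
    cascade     => TRUE     -- also gather statistics on the indexes
  );
END;
/
```

With stale global statistics the optimizer can badly misestimate cardinalities even when each partition's statistics are fresh, which is one way a 30% volume increase turns into a very different plan.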
I guess that it is not the reading of the data that takes most of the time but the "group by", "order by", and the joins with the dimension tables.
Are you using table compression? It's an easy way to reduce the I/O.
1. The global and partition statistics are updated right after data loading. The data volume is only 30% larger, but the response time is about 10 times slower.
2. How often should the indexes be analyzed? The indexes are all created with COMPUTE STATISTICS. How big is the impact on database performance if we analyze during the daytime (using the DBMS_STATS package)?
3. The big fact table is range-hash partitioned. Oracle so far does not support table compression on a composite-partitioned table, does it? Have you ever compressed a composite-partitioned table? How much would it improve performance? What's the disadvantage of just compressing the partitions without update?
Where have you seen that Oracle doesn't support compression on a composite-partitioned table?
As far as I remember you can't define it at the subpartition level, but that doesn't mean the subpartitions are not compressed if compression has been enabled at the partition level.
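If compression does turn out to be available for your setup, it is declared at the table or partition level and inherited downward. A sketch of compressing one old yearly partition on a plain range-partitioned table (partition and index names are made up; on a composite-partitioned table you would move the individual subpartitions rather than the partition):

```sql
-- Mark the historical partition as compressed (affects future segments)
-- and move it so the existing rows are actually rewritten compressed.
ALTER TABLE big_fact MODIFY PARTITION p_1997 COMPRESS;
ALTER TABLE big_fact MOVE PARTITION p_1997;

-- Moving a partition marks its local index partitions UNUSABLE,
-- so they have to be rebuilt afterwards.
ALTER INDEX big_fact_ix REBUILD PARTITION p_1997;
```

Compressing only the old, rarely updated partitions is the usual pattern, since conventional DML against compressed segments can be more expensive.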
1. Then you need to run a trace on the SQL before and after the load, and see what the extra response time is attributable to. Maybe a change in execution plan, for example, or maybe something else.
2. When you analyze the table, use DBMS_STATS and specify cascade => true.
Only you can answer on the impact of analyzing during the day time, but if you have made no effort to physically order the rows you are adding then you can most likely use a much lower estimate percentage than you think. Have a look at this: http://www.jlcomp.demon.co.uk/stats_i.html
3. "What's the disadvantage of just compressing the partitions without update?" What do you mean here?
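For point 1, the classic way to attribute the extra response time is an extended SQL trace (event 10046) of the same statement before and after the load. A sketch (the tracefile identifier is arbitrary):

```sql
-- Turn on extended SQL trace, including wait events and bind values,
-- for the current session only.
ALTER SESSION SET tracefile_identifier = 'fact_before_load';
ALTER SESSION SET events '10046 trace name context forever, level 12';

-- ... run the slow query here ...

ALTER SESSION SET events '10046 trace name context off';
```

Format the resulting trace file with tkprof and compare the two runs: the CPU time, wait events, and rows per plan step will show whether the slowdown is a changed execution plan or simply more work on the same plan.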