External Tables and Parallel Queries Not Working?
We're running 10gR2 (with the 10.2.0.3 patch set) on RHEL 4.5. We're testing access to our external tables, using them as staging tables to transform logs into a star schema - so each external table will have multiple SELECT statements run against it (one per dimension, plus the fact load).
The files for the external tables are plain tab-delimited text files sitting on a 5-disk RAID5 LUN on an EMC CLARiiON (so access should be rather fast - and is via filesystem commands). The Oracle server is an 8-CPU machine with 16GB RAM and has no other load at this point (aside from my tests).
So, I set up the external table below. I then ran a simple select count(*) from z_ext_users with autotrace turned on. The plan shows me that it's doing parallel access to the file (I see the old "PX BLOCK ITERATOR" and other PX steps), but when I watch v$session, I see AT MOST 2 processes running. I then tried the parallel hint ("select /*+ parallel(a,8) */ ..."), but that didn't help. The query takes approx 50 seconds for about 2 million records in the file.
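A sketch of one way to count the active PX slaves per query coordinator, using v$px_session (I'm assuming the 10gR2 column names here):

```sql
-- count active parallel slaves per query coordinator;
-- qcsid is the coordinator's own SID, so exclude it from the count
select qcsid, count(*) as slave_count
from   v$px_session
where  sid <> qcsid
group  by qcsid;
```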
If I CTAS the external table to a new regular table and run the select count(*) on the regular table, the query takes about 2 seconds (and runs with about 10 threads by default - no hints - as I would expect).
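Roughly what I ran for the comparison (the heap table name is just an example):

```sql
-- copy the external table into a regular heap table, then count it
create table z_users_heap nologging
as select * from z_ext_users;

select count(*) from z_users_heap;
```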
So, what's the issue here? What determines the degree of parallelism with external tables? Is there something in my setup that isn't helping the optimizer? I was expecting similar (maybe not identical) query performance.
There was one statement in the docs (http://download-west.oracle.com/docs...t_concepts.htm) that confused me - maybe someone can translate the second bullet point for me (could it have something to do with us being UTF-8?):
The following file, record, and data characteristics make it impossible for a file to be processed in parallel:
* Sequential data sources (such as a tape drive or pipe)
* Data in any multibyte character set whose character boundaries cannot be determined starting at an arbitrary byte in the middle of a string
This restriction does not apply to any datafile with a fixed number of bytes per record.
* Records with the VAR format
Here is the create statement:
create table z_ext_users
( ...                                  -- (a few leading columns snipped)
, version number
, urltracker varchar2(255)
, invitationcode varchar2(255)
, username varchar2(40)
)
organization external
( type oracle_loader
  default directory oracle_home
  access parameters
  ( records delimited by newline
    fields terminated by '\t'
    missing field values are null
  )
  location ('...')                     -- (file name snipped)
)
reject limit 10000;
If you're not using some funky multibyte character set, then you don't have to worry about that. Do you have representative statistics set on the external table?
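Also, since the hint didn't seem to take: you could try declaring a degree on the table itself (a sketch - I believe 10gR2 lets you alter the degree on an external table, but I haven't double-checked):

```sql
-- give the external table an explicit default degree of parallelism
alter table z_ext_users parallel 8;

select count(*) from z_ext_users;
```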
No, I didn't have statistics on it - good point - so I used dbms_stats.set_table_stats and set the number of rows and row length. Should I have set any other info? (I couldn't think of anything else I'd know programmatically.)
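For reference, this is roughly what I ran (the numbers are ballpark figures for my file):

```sql
begin
  dbms_stats.set_table_stats(
    ownname => user,
    tabname => 'Z_EXT_USERS',
    numrows => 2000000,   -- rough row count in the file
    avgrlen => 100);      -- rough average row length in bytes
end;
/
```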
Once I did that, I did see more parallel threads (about 6 by default). But I still can't force more parallelism via the hint (even if I set it to 8 or 10, I get 6). Plus, it's still taking a minute to run (vs 2 secs on a normal table).
So, overall, I'm a tad better off (seeing parallel), but still a bit confused on how Oracle is treating these external tables.
BTW, do you consider UTF-8 a "funky multibyte character set"? I'm hoping that's not the issue (doesn't sound like it).