query concurrency problems

**Vladimir Atehor** · 06-01-2005, 12:32 PM

Hello all.

I'm a newbie here so I'll try to be as clear as possible.

We have an application made of applets that all connect to an 8i database in an HP 2 proc, 5 GB, 12 disks server.

The table design has a main table called NMObject, that has around 400.000 records, and basically contains some basic attributes like "name", of our entities. the other attributes of an entity are stored in a separate table. Which table corresponds to an entity depends on the entity's class, so there are entities like "Network" that have a table called "NMNetwork" containing some of its attributes, while the name and "generic" attributes reside on the NMObject table.

The "big" table is NMObject, most other tables are small (< 10.000 records). The application constists of applets that connec to to the database and perform hand-written sql. No J2EE, no entity beans.

Our customer complained about low response times. response times were really high (40 seconds for a simple join on nmobject indexed query) and we knew all the time was spent waiting for queries to complete.

All the performance problems mentioned here are related to query (select) performance. inserts operations are not a problem now.

We did 2 things to improve performance:

1. We found out many tables (specially NMObject) had too many extents. We redesigned the entire storage of all tables to have few extents now and in the future, intensive insert tables were made to have to have higher FREELISTS and INITRANS values, and we tuned PCTUSED and PCTFREE according to observed space usage behavior. all the tables were re-deployed, so the new storage is effective as if it was the original.

2. We tuned the complex queries using Explain Plan on analyzed tables, and modified them to reduce cost and execution time. Production server analyzes tables frequently.

We have created plenty of "query-based" "as selective as possible" indexes, but sometimes the CBO chooses not to use them.

We got the following results:

The response time for a single user connected to the database dropepd significantly (from 40s to 3 or 4 seconds). Our customer says that for a single user, the response times are now what was desired. However, for concurrent users the response times are still very high. 3 users get 10 times the response time of 1 user.

We then performed the following tests on our development server (which is not a monster machine like the production server of our customer):

a) Using ssh to connect tot he database server, and then opening an sqlplus connection from the server itself, we @ran a script containing 5 complex queries extracted from an operation performed by one of the applets. Then we measured the response time experienced by a single user, and then the time experienced by 6 concurrent users.
The single user completed the queries in 5.5 seconds. The concurrent users got 22 seconds. 6 times the users, 4 times the response time.

b) The same test as before, but this time the users were not connected locally trough sqlplus, but remotely from other machines using jdbc (trough a java query tool). 1 user got a response time of 13.8 seconds, while 6 users got a response time of 157 seconds. 6 times the users, 11 times the response time.

We are now very concerned about this, specially because we no longer know what to do. We already improved sotrage statements of the tables. We already improved the queries using explain plan, the cost and execution time of many queries was significantly reduced because of this. However, it only stays cool for a single user, with as few as 2 or 3 users, response times skyrocket and we dont' know why.

One theory is that there is a contention on this NMObject table because most queries have to go to that table to request the "name" of any entity (networks, links, customers, services...). its the largest table with 400.00 records, and altough queries were tuned, there are enough indexes verified to be in use according to explain plan, etc etc, maybe there are just too many queries on the same table and that contention causes serialization that kills prformance. This theory implies that there might be a design flaw with the whole "NMObject" table idea. This is suggested by the by the fact that when tuning queries, we know plan costs and execution times are very low until we join anything with NMObject. If the join is outer, or when there has to be more than 1 access to nmobject, it gets worse.

Other theory says that Oracle is proven to be able to take hundreds of concurrent users on the same table even if its a large table. According to this theory, the mproblem is more of oracle administration and since we are developers and not expert DBAs its because of our ignorance in oracle options and configuration that we dont know what to do, but then an expert would point us in the right direction. This theory implies that either the DBAs of our customer are not experts either, or our queries can be done some other way that improves concurrency, and our method of using explain plan to find the "lowest cost" "fastest execution time for single user" failed to find the perfect queries. We acknowledge that we have used almost no trace/tkprof in our tuning.

Given this scenario, we can use any help any of you can provide us. ¿which can be the problem that causes the response time to raise 11 times when the number of concurrent users is raised to 6?. Is there anything we can do or are we doomed?

Here is a representative "complex query" from our application, which was tuned using explain plan, in case it helps:

PHP Code:


SELECT AFFECTNETWORK.ID,o.name AS NAME,c.NAME,AFFECTNETWORK.RESOURCEID,AFFECTNETWORK.IDEVENT,AFFECTNETWORK.ALARMID,AFFECTNETWORK.AFFECTATIONTYPE ,AFFECTNETWORK.COMMENTS

FROM 

(select eventaffectationresources.ID, RESOURCEID , IDEVENT, ALARMID, AFFECTATIONTYPE , EVENTAFFECTATIONRESOURCES.COMMENTS

FROM eventaffectationresources, NMNETWORK, events

WHERE EVENTAFFECTATIONRESOURCES.resourcetype ='Network' 

AND (idevent = 18945 OR parentid = 18945)

AND NMNETWORK.networkobject = RESOURCEID 

AND eventaffectationresources.idevent = events.id) AFFECTNETWORK, NMOBJECT O,NMCLASS C

WHERE

O.ID =AFFECTNETWORK.RESOURCEID

AND O.CLASSID = C.ID

Thank you in advance

**slimdave** · 06-01-2005, 01:28 PM

I think that one path forward is hinted at in your question -- use tracing to find out exactly what these queries are waiting on.

Are you familiar with event 10046 tracing? It can write to the trace file a list of all the wait events involved in processing the query, and hence tells you why a 22 second query took 22 seconds.

briefly, you issue

Code:

alter session set events '10046 trace name context forever, level 8';

and then execute the queries.

10046 is pretty well documented on various sites, and googling will get you a lot more detail.

**tamilselvan** · 06-01-2005, 06:49 PM

Did you try setting 10046 events as said by Slimdave?

Here are my questions:
Is the "in-line" view necessary?
Can you not write a simple SQL statement?
Can you post the explain plan and init.ora file?

Tamil

**Vladimir Atehor** · 06-03-2005, 10:08 AM

I have requested to set tracing for the 10046 event on the production server for a day. (I think it wouldn't work on the development server that doesn't have the concurrency workload). I'll have tkprof output next week, I'll post it here by then.

Thank you all.

**slimdave** · 06-03-2005, 11:52 AM

You intend to set it for the whole instance? You'd better have a big pile of very fast disks handy then -- I wouldn't do that myself.

I would just login to SQL*Plus, set the event, then execute that representative SQL. See if anything obvious is flagged in the trace file.

Thread: query concurrency problems

Thread Tools

Display

query concurrency problems

Posting Permissions