The previous
article in this series reviewed Oracle 10g's enhancements to analytical
functions, including the use of partitioned outer joins to densify data
within multiple dimensions. As that article promised, I will now delve into the
new MODEL clause
that grants any Oracle developer or DBA the power to produce complex
inter-row calculations using just the database engine, thus eliminating the
need to massage data output via a third-party tool.
The Scenarios: Oh, What A Tangled Web Our Users Weave ...
As I mentioned in my previous article, these two scenarios
were among the more complex I had seen recently:
-
A user in Marketing needed a report that broke down sales within
time periods, product types, and regions, including percentages of the
total sales calculated at different control breaks.
-
An Accounting department user needed a report that calculated the
eventual value of a company's outstanding cash balance if particular
assets were invested at different rates of interest, or over different time
periods.
My first inclination would have been to tell my developers
that they should only use Oracle to aggregate the required data, then export
the data and whip up the final results through a third-party tool - perhaps
nothing more complex than a Microsoft Excel spreadsheet - to accomplish these
analyses. However, that was before I explored Oracle 10g's new MODEL clause.
Overview: The MODEL Clause
Let's explored how I used the MODEL clause to solve the first scenario above. I
will modify the SH.SALES_MIDWEST_ONLY
view created for the previous article so that view now selects a slightly
different subset of products, but still within the Midwestern states. Listing 3.1
shows the revised view.
Next, I will construct some sample SQL to demonstrate the
new MODEL clause:
SELECT state, prod, total_sales
FROM sh.sales_midwest_only
WHERE prod BETWEEN 125 AND 130
GROUP BY state, prod
MODEL
PARTITION BY (state)
DIMENSION BY (prod)
MEASURES (SUM(sold) AS total_sales)
IGNORE NAV
UNIQUE DIMENSION
RULES UPSERT
(
total_sales[99910] =
total_sales[prod=125] + total_sales[prod=126],
total_sales[99920] =
total_sales[prod=127]
+ total_sales[prod=128]
+ total_sales[prod=129]
+ total_sales[prod=130],
total_sales[99999] =
total_sales[99920]
- total_sales[99910]
)
ORDER BY state, prod;
When this SQL statement is executed, Oracle retrieves
the selected result set, places the result set into memory and then
allows the MODEL clause
to divvy up all returned rows into a multi-dimensional array. For
example, a three-dimensional array can be visualized as a cube that has been
cut into several horizontal slices, vertical slices, and layers. Continuing
this cubist analogy (apologies to Pablo Picasso!), the MODEL clause also lets me apply rules that
perform inter-row calculations on the data that is stored at each
intersection of these slices and layers. (For the sake of simplicity and my own
sanity, I promise to keep all examples under four dimensions.)
But wait it gets better! MODEL also lets me create additional sets of
values for example, brand new rows that include the same columns as the other
rows in the result set by direct assignment of values to the new row's cells.
I find it useful to picture this feature just as if I'd inserted a new row into
an Excel spreadsheet, and then added a formula, perhaps a SUM() or just simple
addition or subtraction, to calculate values into that new row's cells based on
the contents of other cells in the spreadsheet.
Since this may be the first time that you have seen the MODEL clause
in action, I will break down this rather complex statement into its components.
Let's first take a look at the statements that classify the returned columns of
this query into three distinct types:
-
Partition Columns. The PARTITION BY statement tells Oracle which column(s)
will be used to split up the result set into partitions. The partitions
are defined by the values present for each column specified in this clause. In
the example above, I am using the values returned for the STATE column to partition
my result set.
-
Dimension Columns. The DIMENSION BY statement tells Oracle which
column(s) will be used to split up the result set into dimensions, which
are used to break out rows within a partition. I am using the values
returned for just one column the Product Identifier to create my dimension
values for the result set.
-
Measure Columns. Finally, the MEASURES statement tells Oracle the column(s) on
which inter-row calculations may be performed. It helps me to visualize the
result set if I think of the values in the partition and dimension columns as
subscripts for each intersecting cube in the result set, and the values stored
within each cube as the candidates for the measurements I am going to be
taking.
Enforcing Result Set Uniqueness. Two statements let
me control how to handle uniqueness within the returned result set. UNIQUE DIMENSION,
the default, tells Oracle that the columns specified in the PARTITION BY
and DIMENSION BY
columns must form a unique key for the query. In this example, however, I have
specified UNIQUE SINGLE REFERENCE
to tell Oracle to only check single-cell references on the right-hand side of
the rule instead of the entire result set.
Handling Missing Values. What happens when there are
no values present at an intersection? MODEL also provides the IGNORE_NAV statement to tell Oracle to ignore
missing (i.e. null and absent) values. Since I specified IGNORE_NAV in
this example, Oracle will place a zero in any numeric columns, 01-JAN-2000 for
dates, an empty string (not a NULL!) for character datatypes, and a NULL value
for any other datatypes. If I had not specified IGNORE_NAV, Oracle would
instead assume the default, KEEP_NAV, was in force and would instead return NULLs
for both null and absent values.
Rules and Cell Assignments
Next, I will turn my attention to how the MODEL clause
defines the business rules for calculating values, and how those rules will be
applied to the result set:
Applying Rules to Measure Columns. Each value that is
found in a MEASURE column can be operated upon by one or more rules.
Much like a simple algebraic equation, a rule has two components:
-
The left-hand side of the rule simply specifies which
cells are to be updated (or created see the next sections!) as a result of
the rule's right-hand side instructions.
-
The right-hand side evaluates to the values to be assigned
to the cells specified on the left-hand side of the rule.
Rule Processing Order. MODEL also lets me define the order in which
rules can be applied to the result set. Since I specified the default
method, SEQUENTIAL ORDER,
in this query, Oracle will apply the rules to the result set in the order
that the rules have been specified. However, if AUTOMATIC ORDER is picked instead, Oracle will
select which rule is to be processed based on the order of the individual
rules' dependency. It is important to note that if AUTOMATIC ORDER is selected, the end result is
that a cell will be assigned a value just once. On the other hand, when SEQUENTIAL ORDER is
specified, a cell could be assigned a value several times based on the order of
the rules being applied, and the value based on the last applied rule becomes
the final value.
Processing Rule Assignments. MODEL provides two methods for processing these
value assignments:
-
The UPDATE
directive (not to be confused with the UPDATE DML statement) tells Oracle to
only apply the rules to the cells referenced on the left hand side of
the rule that it finds in the multi-dimensional result set. If no match is
found, nothing happens.
-
On the other hand, the UPSERT directive instructs Oracle to apply the
rules to all cells that match the left-hand side of the rule
specification, as well as insert new rows for any that do not exist in the
result set. It is important to note that this is the default behavior,
but that it only matters when positional referencing is being used on the
left-hand side of the rule, and one cell is being referenced.
Cell Assignments. The most complex set of statements
describe which cells should participate in the rules, so naturally I have saved
them for last. Fortunately, there are just two types, symbolic and positional.
-
A symbolic reference uses a Boolean condition to tell the
rule which specific single-dimension column the rule should be applied to. For
example, in our sample SQL statement, this rule:
total_sales[99910] =
total_sales[prod=125] + total_sales[prod=126],
tells Oracle to create a new row in
the total_sales
dimension that's indexed by value 99910, and add all values in the rows indexed
by Product ID 125 and Product ID 126 into the columns in that new row.
-
As its name implies, a rule with a positional reference
determines which column in the dimension to apply the rule based on the column's
relative position in the array described by the DIMENSION BY clause. For example, I can tell a
rule to calculate a value in a target cell in the current row based on the
difference between two source cells. However, one of the source cells could come
from the row above the current row and the other source cell could be
two rows below the current row. I will demonstrate positional reference in a
later example.
Listing 3.2
shows the original SQL statement, and the results returned from its execution.
Another important difference in symbolic versus positional
referencing is the way NULL values are treated. For example, if I was
referencing a single cell in a two-dimensional array and I used a symbolic
reference, the rule would not be applied; but if I used a positional
reference for that same single cell, the rule would be applied. A slightly
modified version of the previous query that uses positional rather than symbolic
references is shown in Listing 3.3,
and it effectively demonstrates this difference.
It Never Ends: Performing Multiple Calculations Repeatedly With ITERATE
Finally, the MODEL clause offers the ITERATE directive to perform the same calculation
a specific number of times until either the maximum iterations have been
completed, or a threshold value has been reached that forces the iteration to
be terminated. To complete the example solution for my second scenario
calculating outstanding cash balances based on the value of particular assets
-- I have constructed a new table, SH.BALANCE_SHEETS, and populated it with some
sample data representing several balance sheet accounts (see Listing 3.4).
Here are the business rules my users have supplied for
projecting cash balances:
-
Compute the value of Cash Discounts (Account #2020) as the
sum of Cash on Hand (#1010) plus Accounts Receivable (#2010), then multiply by
30% to reflect Net 30 as our typical Cash Discount terms.
-
Calculate the value of Earned Interest (Account #2030) by
adding together Cash on Hand plus Accounts Receivable, then subtracting Cash Discounts,
and then multiplying by 7.5% to reflect our expected interest rate.
-
Derive Taxable Dividends (Account #3020) by adding
together Cash on Hand and Accounts Receivable, subtracting Cash Discounts,
adding Earned Interest, and then multiplying by 15% to reflect our expected tax
rate.
-
Finally, feed these resulting balances back into the
calculation as the beginning balance for each account during the next
iteration of the calculation.
This would be relatively simple to do in an Excel
spreadsheet, of course, but with ITERATE I can now also perform it right on the
Oracle server. Here is the SQL statement I constructed against SH.BALANCE_SHEETS
to accomplish this:
SELECT
acct#, fb
FROM sh.balance_sheets
MODEL
DIMENSION BY (acct#)
MEASURES (balance fb)
RULES ITERATE (3)
(
fb['3020'] =
((fb['1010'] + fb['2010'] - fb[2020] + fb[2030]) * 0.15)
,fb['2030'] = ((fb['1010'] + fb['2010'] - fb[2020]) * 0.075)
,fb['2020'] = (fb['1010'] + fb['2010'] * 0.30)
);
Listing 3.5
shows the results of executing this query for one, two, and three iterations.
Conclusion
Oracle 10g's new inter-row calculation capabilities
significantly expand the already-powerful suite of analytical functions so that
any developer or Oracle DBA can perform complex, advanced data modeling and
reporting in multiple dimensions, all without the use of third-party software
to perform these analyses. Oracle 10g's extensions to Structured Query Language
have insured it is among the most robust, flexible, and fully featured of any
of the relational premier database management systems.
References and Additional Reading
While there is no substitute for direct experience, reading
the manual is not a bad idea, either. I have drawn upon the following Oracle
10g documentation for the deeper technical details of this article:
B10736-01 Oracle Database
Data Warehousing Guide, Chapter 22
B10750-01 Oracle Database
New Features Guide
B10759-01 SQL Reference
Previous
Back to DBAsupport.com