In the last
article, we covered some of the possible implementations Oracle uses to
perform searches. Closely related to searching is sorting, and that raises the
question of how Oracle performs this key piece of functionality. If you’ve done
it once, you’ve done it a million times, and that is to include an ORDER BY
clause in a SELECT statement, and one of the main goals of this article is to
help foster an appreciation of what the RDBMS does or has to go through when
you toss in the ubiquitous ORDER BY clause. Fundamentally, how do you sort a
list? Do you start at one end and work your through to the end, finding the
first ranked element (could be highest or lowest), return to the start of the
list (minus the position you just found as being the first) and start over? And
how does MIN and MAX fit into this sorting or ordering of elements?
Let’s start with MIN and MAX, as these can be boiled down to
trivial cases. That’s not “trivial” in the sense of we have nothing better to
do, but trivial in the sense of mind-numbing exotic proofs your Real Analysis
professor used to show by filling up every chalkboard in the classroom. In this
case, the search for MIN and MAX means we only care about the current min or
max value. Once a candidate is compared to the current leader, it either
replaces it or is discarded, never to be bothered with again. The search for
MIN/MAX should then have one benefit over a complete sort: only one pass
through the list is needed.
So what is of more interest to us is how Oracle sorts a
list. And list can mean more than one source of data. For example, if joining and
sorting two tables, would it be more efficient to first sort each table and
then perform a merge-sort, or would it be better to perform the full join, and
then sort the results? If you are familiar with execution plans, you may
already suspect the answer given the fact Oracle often uses a merge join in
such cases (a sort merge join). The idea is simple: sort one list, then
the other (concurrently if possible), start at the top of both (sub) lists and
merge them together to produce the final list. And to extend this idea, the
source can be just one list that is recursively divided or divided into buckets
(think partitions). Taking a large piece of work and breaking it up into
smaller units of work, commonly known as divide and conquer, has widespread use
in Oracle.
Speaking of lists, how is the data arrayed within a list? The
ideal (or trivial) case would be sorting an already sorted (in the same order)
list. A relatively (depending on the algorithm used) worst case would be
sorting an already sorted list, but this time you are reversing the sort order.
Why would you do that? Probably because you didn’t know the data was already
sorted or you simply need the list presented in the opposite order. Where does
the data live? If it came from an index, there may not be as much work to do
since the index could have stored the data in sorted order to begin with. But,
if the data comes from a table, who knows how it is stored? Four possible cases
(or five, if the last one is extended a bit) are shown below. Is completely (as
much as it can be) random data easier or harder to sort than something which is
nearly sorted, reversed, or relatively unique (the extended case being 100%
unique)?

Figure 1. Various
Arrays of Data (from http://www.sorting-algorithms.com/)
The answer to that question depends on the sorting algorithm
used. Since practically all of what Oracle does is designed to be performed in
the most efficient manner possible (including human efforts to introduce the
worst possible SQL), it would be reasonable to assume that the RDBMS uses the
best sorting algorithm known to man. If there were one and only one best
sorting algorithm, great, but that is not the case. With respect to sorting
algorithms, there is no one best method. So, which method does Oracle use? Like
the answer to many other questions related to optimization, it depends.
What are some of the choices? These are typically presented
in algorithm design classes, and the list includes insertion, selection, bubble
(and variations thereof), quicksort, merge and heap. For small lists (small
being relative, but generally less than 1000 elements), some algorithms are
faster than others, but even within the same algorithm, how the data is arrayed
can make a difference. As an interesting exercise, one could code some of the
algorithms using PL/SQL, populate arrays/collections with data, and time the
sorting results. When using the ORDER BY clause in SQL, which method does
Oracle choose to use when and where? If you were designing your own RDBMS, you
would have to re-invent this wheel on your own. We know sorting is a key
element of using and viewing data, and countless forum questions have been
registered about sort_area_size.
For a visual representation of how sorting algorithms look
with respect to performance (look at the relative performance, not the absolute
times), go to http://www.sorting-algorithms.com/,
a site developed by David R. Martin of Boston College. Other Web sites show
visual animations much like this one, but sorting-algorithms.com makes
choosing, viewing, and comparing methods easy to do.
Another factor to consider when it comes to sorting concerns
the rules used to sort data. If we compare Apples to Apples, does it matter
which element comes first, that is, do we preserve the order as is? Does it
matter if Apples comes before or after APPLES or even apples? And to make
things more interesting, what if your language spells apples as applés? Sorting
numbers is pretty much a no-brainer, but sorting strings introduces much more
complexity, and that leads to the next topic in this series: linguistic
sorting. There is an amazing amount of rules and considerations to take into
account when sorting strings. If all you’ve ever worked with is your own
language set, then seeing how Oracle deals with other languages will be
somewhat of an eye-opener.
In Closing
Again, the emphasis here is to gain an appreciation of what
Oracle has to offer with respect to searching and sorting, and especially so when
it comes to linguistic sorting. You may not know this, but you do have some
control over how sorts are performed within your database’s character set and
you can compare Apples to apples, as it turns out.
Previous
Next
Back to DBAsupport.com