The time has come to make account "numbers" in
your customer database real numbers or real alphanumeric strings across the
board. Why would you ever choose to do something like that, especially if you
are dealing with millions of records? One reason is that you may be forced into
doing so because your company was acquired, and the database integration
between your system and your new parent company requires such an undertaking.
Another reason is based on the rampant use of Social Security Numbers (SSN) as
a means of identification. Chances are you have online accounts, which use all
or part of your SSN as the account number, which, for what it's worth, SSNs
were never intended to be used for anything other than taxation and tracking
Social Security benefits. Many colleges, as an example, have moved away from
using the SSN as the primary means of identification for students.
Another reason for performing a conversion may be that you
have exhausted the original pool of possible account numbers or that your
random alphanumeric string generation scheme has been generating duplicates. So
whatever the reason, how would you go about salvaging actual numeric strings
and identifying those which need to be converted? Here is an example to help
clarify the situation.
Suppose you have the following account numbers: A12345,
009431, and 783652. By quick inspection, A12345 obviously needs to be converted
because of the letter "A." The third account number is a "real"
number, so nothing needs to be done with it. The second account number is a bit
tricky. The characters, so to speak, are all numeric, but is 009431 a number
you really want to keep given that most systems ignore leading zeroes? Moreover,
if the new system has account numbers all the same length (six in this
scenario), then 9431 fails to meet the minimum length. The end result is that
the salvageable account numbers, which are truly numeric, must also be tested
for length (or a minimum value).
What schemes are at your disposal to identify strings one
way or the other? Math, translation, date functions, and regular expressions
are but four ways to approach and solve this problem.
Approach 1: Using TRANSLATE
This isn't pretty, but it will identify a real number.
SELECT <column> FROM <table> WHERE
length(<column>)- length( TRANSLATE(<column>, CHR(1)||TRANSLATE(<column>, CHR(1)||'1234567890', CHR(1) ), CHR(1) ) ) = 0;
If you want to test for a string, then use "> 0"
instead of testing "= 0." I have inserted the three account numbers
from before into a test table.
SQL> create table account_test
2 (acctnum varchar2(6));
Table created.
SQL> insert into account_test values('A12345');
1 row created.
SQL> insert into account_test values('009341');
1 row created.
SQL> insert into account_test values(783652);
1 row created.
SQL> commit;
Commit complete.
Note that in the last insert statement, the account number
was inserted without the single quotation marks, that is, it went in as a
number, so how does Oracle view this now: number or varchar2 string?
SQL> SELECT acctnum "ACCTNUM" FROM account_test WHERE
2 length(acctnum)-
3 length(TRANSLATE(acctnum,CHR(1)||TRANSLATE(acctnum,CHR(1)||'1234567890',
CHR(1)),CHR(1)))=0;
ACCTNUM
-------
009341
783652
This approach, although the length/translate part looks
cumbersome, has one distinct advantage in that the results will always be one
way or the other. How do you account for the length of the number so that
009341 is filtered out? This leads to a math approach.
Approach 2: Using Math
Adding an extra "and length(acctnum/1)=6" or
similar operation has the desired effect of filtering 009341. A distinct
disadvantage of the math approach is that when using it by itself, you will run
into an error situation when trying to divide ‘A12345' by one.
SQL> select '009341'/1 from dual;
'009341'/1
----------
9341
SQL> select 'A12345'/1 from dual;
select 'A12345'/1 from dual
*
ERROR at line 1:
ORA-01722: invalid number
The math approach is still valid if you account for the
ORA-01722 error, and one way to do this is to use a nested PL/SQL block within
another PL/SQL block so you can deal with the exception without exiting the
main processing loop.
SQL> set serveroutput on
SQL> declare
2 v_result number;
3 cursor c is
4 select acctnum
5 from account_test;
6 begin
7 for r in c loop
8 begin
9 select r.acctnum/1 into v_result from dual;
10 dbms_output.put_line(r.acctnum||' is a number');
11 exception
12 when others then
13 dbms_output.put_line(r.acctnum||' is a NOT number');
14 end;
15 end loop;
16 end;
17 /
A12345 is a NOT number
009341 is a number
783652 is a number
PL/SQL procedure successfully completed.
A slight modification to the select into v_result statement will
perform the filtering operation from before, so 009341 will be flagged as not
being a number (at least not in the sense of what we're looking for).
Using WHEN OTHERS has the undesired effect of masking other
errors, so a better exception handling condition, that is, one which
specifically catches "invalid number" conditions, would be ideal.
Fortunately, there happens to be a built-in predefined handler named
INVALID_NUMBER. If all you are familiar with is the OTHERS condition, you owe
it to yourself to see what Oracle (specifically, PL/SQL) has to offer in its
list of Predefined
PL/SQL Exceptions.
So far, the results have been simply output to the
user/terminal. Instead of using a select statement (by itself), the result set
can be steered into one of two tables via an "insert into other_table
select …" statement.
Approach 3: Using a Date Function
An infrequently used date format option is Julian date. A
simple example of using the Julian date format to convert numbers to words is
shown below.
SQL> select to_char(to_date(009341,'J'), 'JSP') "WORDS" from dual;
WORDS
-------------------------------------
NINE THOUSAND THREE HUNDRED FORTY-ONE
What happens when you try to convert A12345 into words using
this approach? I will use the same type of PL/SQL block as before, but toss in
an extra exception handling condition.
SQL> declare
2 v_result varchar2(300);
3 cursor c is
4 select acctnum
5 from account_test;
6 begin
7 for r in c loop
8 begin
9 select to_char(to_date(r.acctnum ,'J'), 'JSP') into v_result from dual;
10 dbms_output.put_line(r.acctnum||' is a number');
11 exception
12 when value_error then
13 dbms_output.put_line(r.acctnum||' is a NOT number');
14 when others then
15 dbms_output.put_line(sqlerrm);
16 end;
17 end loop;
18 end;
19 /
ORA-01858: a non-numeric character was found where a numeric was expected
009341 is a number
783652 is a number
PL/SQL procedure successfully completed.
The VALUE_ERROR handler didn't catch the ORA-01858 condition
and is why the output shows the SQL error message. The inner exception block
can be modified to catch/test for specific error numbers, but accounting for
all of the possible types of errors of this nature quickly makes the date
approach – at least as far as the account number conversion project is
concerned – somewhat cumbersome. Our last approach is elegant and easy to
implement.
Approach 4: Using Regular Expressions
A new feature in Oracle
(which release?) deals with regular expressions. Long a mainstay of
other languages (to include UNIX), regular expression functionality had been
missing from Oracle. This approach is so simple and elegant to use that you may
wonder why the other three were even mentioned in the first place. The only
requirement for using REGEXP is that you must be using, obviously, a version of
Oracle that supports it. You know there are companies still using version
7.3.4, and given that REGEXP first appeared several versions later, well, that
answers why the other approaches merit consideration.
So what does REGEXP_LIKE do for us? Let's go straight to the
"only digits" search condition.
SQL> select acctnum from account_test
2 where regexp_like (acctnum,'^[[:digit:]]+$');
ACCTNUM
-------
009341
783652
Again, a slight modification to the query will filter out
the too short 009341 number. The REGEXP_LIKE function performs exactly the same
as the first approach using TRANSLATE, that is, results come back one way or
the other.
A Slight Extension
Taking either the first or last approach and converting the
query/returned result into a function call adds an extremely useful utility
users/Oracle have been missing since day one. How do you test for "not a
number" or "is a number" in Oracle? By using the first or last
approach, you just performed that test. Turn it into a function returning true
or false, and grant execute permissions as appropriate, and you can now perform
NaN(whatever) tests.
In Closing
The examples in this article are not purely academic because
they can be applied to many commonly encountered day-to-day situations. Data
hygiene is of critical importance to a database. Suppose you are building (or
managing) a database application, which takes orders from customers. How do you
check if a phone number, bank routing number, bank account number, or credit
card number is all digits? Alternatively, what if you need to identify
customers by country (using the US and Canada as examples)? US ZIP codes are
different from Canadian postal codes. A NON all-digit postal code is a good bet
that you are dealing with a Canadian address. A common theme in my articles is
that there is almost always more than one way to solve a problem, and the
problem of separating numbers from letters is no exception.
Back to DBAsupport.com