In first difference between numeric and char(varchar2) datatypes
is tiny if they have the same length and I don't care about it.
But, as "common theoretical" problem, this difference exists.
1)char(varchar2) fields Oracle store as ascii/ebcdic/unicode characters
with maximum length 4000 characters.
Most CPU platforms on assembler/mashine code level have to use
commands like 'long compare' and 'long moving' for work with this type of
Binary structure of these commands has only 1 byte for describe length of operation,
then maximum preparing data length will be no more then 256 bytes and Oracle
(and any other program) must use set of mashine commands for comparing
(for example) two varchar2 fields.
2)numeric(number) fields Oracle store in binary-decimal numeric system and
(as i know) most CPU platforms on assembler/mashine code/ level have
set of special command for work with this type of data (PACK group) and
any program can use only 1 mashine command in sort/find and so on operations.
In this case Oracle has to implement two independent parts of assempler
code for execution comparing or moving char(varchar2) datatype and number datatype.
I guess, this is internal nature of difference of number and varchar2 datatypes
as a primary key field.
Fair enough, slimdave, and all good points - sorry for the rant (but it *is* my hallmark)
With the tight, non-sensical series of numbers and letters as a string PK, I was referring to jmodic's point that string PKs can actually be tighter than numeric PKs because of the greater availability of values (base-10 as opposed to base-26, or 36, or 64 or 128, depending on what characters you allow.)
As for pre-existing, commonly-accepted, static, small string pks, like state codes, or currency codes, or country codes - they are definitely an exception and preferable to surrogate keys.
So, here are the choices as I see them, either:
1 - There is a standard, short, acceptable string key in existence
2 - Somebody creates a meaningless series of characters
3 - Somebody creates a meaningless numeric key
4 - Somebody creates a meaningful short string key (abbreviations)
5 - An existing meaningful number is used
6 - An existing meaningful string is used
So, 1 is fine. 2 is not terrible, but rarely done, and harder to implement than 3. 3 is best, IMHO. 4-6 suffer from meaningful keys, and also usually result in keys that are larger than they would be with surrogate keys. Sometimes a lot larger. They are almost never smaller. I don't see it as a bad assumption that surrogate keys are tighter.
Shestakov seems to be making the further point that I didn't have the knowledge to support myself - that numeric comparisons are faster than string comparisons.
So again, I just wanted to make sure that the point is clear for the up-and-coming dbas that meaningless, numeric, single-field, sequence-populated, surrogate keys are the best choice 99% of the time, IMHO. Existing, commonly-accepted string codes (states, countries, currencies, etc.) are notable exceptions. And as slimdave noted, we covered a lot of detail here to help with that last 1%
Click Here to Expand Forum to Full Width