Hello ppl,
Are there any disadvantages to keeping a varchar column as your primary key?
Thanks
Performance might be slightly affected during inserts.
Abhay.
Thanks Abhay.
So it would even affect the performance of tables referencing this column, wouldn't it?
It is the other way around this time...Quote:
Originally posted by aspdba
Thanks Abhay.
So it would even affect the performance of tables referencing this column, wouldn't it?
I bet it would be a real challenge to find any difference in performance inserting into a table with a varchar2 pk compared to char, number or date. If the best data type for the column that is the PK happens to be varchar2, then use it.
There are plenty of other issues to think about in database design -- forget about this one, it's an irrelevance.
Slimdave:Quote:
Originally posted by slimdave
I bet it would be a real challenge to find any difference in performance inserting into a table with a varchar2 pk compared to char, number or date.
It does matter: if we think logically, searching a character would be slower than searching a number. One character will use 8 bits to be stored on the hard disk and a number would use 4 bits.
During an insert or update Oracle has to first search whether the string exists in that column; the same goes if the column is a number, & obviously searching a number is much faster than searching a string...
Consider an example:
It might appear that a difference of 0.03 secs is not a big deal... and it is not, because you have just some 20 records so it is fast... had you had some 10 million records then this could have grown to maybe minutes.Code:SQL> create table test1 ( Emp_Name Varchar(8) primary key );
Table created.
Elapsed: 00:00:03.08
SQL> create table test2 ( Emp_Number Number(8) primary key );
Table created.
Elapsed: 00:00:00.01
-- Added some 20 distinct records of 8-character values in table test1 & 8-digit values in test2......
SQL> update test1 set emp_name='AbhaySK1' where emp_name='AbhaySK2';
update test1 set emp_name='AbhaySK1' where emp_name='AbhaySK2'
*
ERROR at line 1:
ORA-00001: unique constraint (ABHAY.SYS_C001262) violated
Elapsed: 00:00:00.03
SQL> update test2 set emp_number = 44444444 where emp_number = 22222222;
update test2 set emp_number = 44444444 where emp_number = 22222222
*
ERROR at line 1:
ORA-00001: unique constraint (ABHAY.SYS_C001263) violated
Elapsed: 00:00:00.00
-- Searching a number will be faster than characters & that's obvious.
It would be of importance if he were to have huge tables with crores of records....Quote:
There are plenty of other issues to think about in database design -- forget about this one, it's an irrelevance.
Abhay.
Is that what passes for a real test nowadays, then? I think not. Don't show me a 0.03 sec walltime difference on a constraint violation in a twenty-row table and then extrapolate it to 10 million rows; do a real test. Are you telling me that an application is going to try inserting 10 million identical PK values into a twenty-row table?
If it makes logical sense for the PK to be numeric, store it as numeric. If it needs to be varchar2, then store it as that.
slimdave:
Now the discussion is diverting... the question was not whether it is logical to have a PK of varchar or number...
The question was whether a PK of varchar would hinder performance...
And obviously yes... as compared to PKs with numbers...
The example I gave there is just to illustrate the fact that an insert or update with varchar as PK would take more time...
And yes... you can logically comprehend that by extrapolating the graph (from the above example)... with many records, say 1 lakh, in the table, and trying to update/insert... you will surely have to wait for say 30 secs for 1 row to be updated/inserted successfully.
If it were a numeric type, I bet it would just take 2 to 3 secs to update/insert with the same number of recs as above, i.e. 1 lakh.
Abhay.
Your demonstration has no meaning whatsoever, so extrapolating it is a waste of time. A wall clock measurement of 0.03 seconds, vs 0.00 seconds, tells us nothing at all.
Firstly, measure how long it takes to successfully insert a row in CPU time -- not wall clock time, not PK violation.
Then take account of the need to actually store text. I don't believe that anyone would seriously consider storing a numeric PK in a varchar2 column, so the need must be to store a unique text string, right? If you do that with a numeric PK you must be talking about a synthetic PK, so the test case ought to be ...
create table test1 ( Emp_Number Number primary key ,Emp_Name Varchar2(8) Unique);
... and ...
create table test2 ( Emp_Name Varchar2(8) Primary Key);
Test inserts into those structures.
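For what it's worth, a minimal sketch of that test in PL/SQL (table and column names follow the DDL above; dbms_utility.get_cpu_time assumes 10g or later - on older releases substitute dbms_utility.get_time, which returns elapsed centiseconds rather than CPU centiseconds):
Code:set serveroutput on

declare
  t0 number;
begin
  -- synthetic numeric PK plus varchar2 UK
  t0 := dbms_utility.get_cpu_time;
  for i in 1 .. 100000 loop
    insert into test1 (emp_number, emp_name) values (i, 'E' || lpad(i, 7, '0'));
  end loop;
  dbms_output.put_line('test1: ' || (dbms_utility.get_cpu_time - t0) || ' cs CPU');

  -- varchar2 PK only
  t0 := dbms_utility.get_cpu_time;
  for i in 1 .. 100000 loop
    insert into test2 (emp_name) values ('E' || lpad(i, 7, '0'));
  end loop;
  dbms_output.put_line('test2: ' || (dbms_utility.get_cpu_time - t0) || ' cs CPU');

  rollback;  -- keep the test repeatable
end;
/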
Yes, it might tempt you to give meaning to the key. I'm a strong believer in meaningless PKs (hence use a sequence).Quote:
Originally posted by aspdba
Are there any kind of disadvantages keeping a varchar column as your primary key?
That's a good logical reason to use a numeric PK, although it might need the varchar2 UK also and hence be less performant.
Abhay, no offence, but your example illustrates nothing whatsoever! You can't be serious with your conclusion that the example you gave really illustrates that updating a varchar PK is slower than updating a numeric PK! This would be simply too ridiculous. If I extrapolate from your childishly simple example, what would you think of the following statement:Quote:
Originally posted by abhaysk
The example I gave there is just to illustrate the fact that an insert or update with varchar as PK would take more time...
"As abhay's example shows, it is perfectly evident, that creating an empty table with varchar column as a primary key is about 300 times slower than creating the table with number column as a PK. Just a look at the following timings from abhay's example prooves this point:
SQL> create table test1 ( Emp_Name Varchar(8) primary key );
Table created.
Elapsed: 00:00:03.08
SQL> create table test2 ( Emp_Number Number(8) primary key );
Table created.
Elapsed: 00:00:00.01
QED."
Would anyone seriously give any credibility to the above "proof"? I don't think so.
If you really want to prove your point, then you should at least produce a representative test case. And I totally agree with slimdave when he says that "it would be a real challenge to find any difference in performance inserting into a table with a varchar2 pk compared to number" (because that's exactly what you were claiming in your first reply).
And BTW, your "logical" explanation of why searching in a character indexed column would be slower than searching in a number indexed column only shows that you don't understand how Oracle stores number datatype values in the database:
Quote:
It does matter: if we think logically, searching a character would be slower than searching a number. One character will use 8 bits to be stored on the hard disk and a number would use 4 bits.
Hooray for Jurij!
Quote:
Originally posted by jmodic
Abhay, no offence, but your example illustrates nothing whatsoever! You can't be serious with your conclusion that the example you gave really illustrates that updating a varchar PK is slower than updating a numeric PK! This would be simply too ridiculous. If I extrapolate from your childishly simple example, what would you think of the following statement:
Would anyone seriously give any credibility to the above "proof"? I don't think so.
If you really want to prove your point, then you should at least produce a representative test case. And I totally agree with slimdave when he says that "it would be a real challenge to find any difference in performance inserting into a table with a varchar2 pk compared to number" (because that's exactly what you were claiming in your first reply).
And BTW, your "logical" explanation of why searching in a character indexed column would be slower than searching in a number indexed column only shows that you don't understand how Oracle stores number datatype values in the database:
First of all let me explain to you how characters or numbers are stored at the hardware level (I mean at the hard disk level & not even the OS level).
Take for example the integer 0: it requires 4 bits to store (as I said in my previous post, & there too I meant the hardware level and not the OS or index level, as those are logical rather than physical)......
And those 4 bits would be '0000' as binary bit values.
Now take how the character 'a' would be stored: it requires 8 bits & would be stored as its ASCII equivalent 97, which in binary is '01100001'.
So now for the CPU to compare the numeric digit 0 with 0 or 1 or so... let's assume it would take 'x' nano secs...
So logically the comparison of the character 'a' with 'a' or 'b' or so would rather take much more time, say 'y' nano secs,
and obviously 'y' is at least 2 times 'x'.
Now let's come to logical storage, as you mentioned, of the values of numbers & chars in indexes and so on...
If you define char or varchar(2) then each character would require 1 byte to be stored at the hardware level.
If you define number then each digit will be stored in mantissa and exponent form, which would also require 1 byte, since the mantissa part would require 4 bits and the exponent would require 4 bits... if it's a 2-digit number then it would require 12 bits...
Now let's come to our question as to why a PK with varchar would be slower than number or integer during insert or update.
During an insert on a PK column Oracle would search for the string, if it's a char type, to check whether it existed in the table... though it may search through the index, it still needs the help of the CPU to compare the string value...
And as I explained above, the CPU would take at least 2 times longer than if it were to compare with a number...
This will not be significant if the table has few records... but of course yes if it contains a huge amount of data...
My point was to show why a char search takes more than a number search... and I hope this explains it.
Well, I thought logically and not childishly.
Abhay.
well how about practically?
When you talk about practical things... it would be noticeable if the table contains huge data.
Abhay.
No, I mean have you tested it in a real environment?
You are talking nonsense here. "Hardware level" (as you call it) doesn't store numbers or chars or dates or whatever - it only stores bits - zeroes and ones. That's all. When you store data in the database, your OS or your controller or your disk doesn't know and doesn't care what kind of data you want to store - it simply writes the whole database block that DBWR has sent for writing. So all the formatting of 0s and 1s in a database block is done by Oracle, and the database block is then simply broken down into multiple OS blocks which are written directly to disk, exactly the way Oracle has prepared them. And FYI, the number 0 stored in an Oracle database doesn't occupy only 4 bits of your hard disk storage, it occupies exactly 1 byte (8 bits). The number 1 will occupy 2 bytes, -1 will occupy 3 bytes, 999 will also require 3 bytes, while for example the number 10000000000 will require only 2 bytes of your storage.Quote:
Originally posted by abhaysk
First of all let me explain to you how characters or numbers are stored at the hardware level (I mean at the hard disk level & not even the OS level).
Take for example the integer 0: it requires 4 bits to store (as I said in my previous post, & there too I meant the hardware level and not the OS or index level, as those are logical rather than physical)......
And those 4 bits would be '0000' as binary bit values.
Now take how the character 'a' would be stored: it requires 8 bits & would be stored as its ASCII equivalent 97, which in binary is '01100001'.
That's not true either. Each character stored in your database will occupy 1 or 2 bytes of your disk storage, depending on your database characterset.Quote:
If you define char or varchar(2) then each character would require 1 byte to be stored at the hardware level.
That's nonsense too. Read the previous explanation about how numbers are stored...Quote:
If you define number then each digit will be stored in mantissa and exponent form, which would also require 1 byte, since the mantissa part would require 4 bits and the exponent would require 4 bits... if it's a 2-digit number then it would require 12 bits...
Wrong conclusions due to previously explained misunderstandings. The time difference even on a big table would be negligible, I can assure you.Quote:
During an insert on a PK column Oracle would search for the string, if it's a char type, to check whether it existed in the table... though it may search through the index, it still needs the help of the CPU to compare the string value...
And as I explained above, the CPU would take at least 2 times longer than if it were to compare with a number...
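For reference, these storage sizes can be checked directly with Oracle's VSIZE and DUMP functions - a quick sketch (the expected sizes are the ones stated above):
Code:SQL> select vsize(0), vsize(1), vsize(-1), vsize(999), vsize(10000000000) from dual;
-- expected: 1, 2, 3, 3 and 2 bytes respectively
SQL> select dump(999) from dual;
-- DUMP shows the exponent byte and the base-100 mantissa digits Oracle actually stores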
When did I say it stored numbers or chars or so...Quote:
Originally posted by jmodic
You are talking nonsense here. "Hardware level" (as you call it) doesn't store numbers or chars or dates or whatever - it only stores bits - zeroes and ones.
If you read the post carefully, I specified the binary equivalent of the number or char which would be stored on the disk.
Quote:
So all the formatting of 0s and 1s in a database block is done by Oracle, and the database block is then simply broken down into multiple OS blocks which are written directly to disk, exactly the way Oracle has prepared them.
Yes, it's done by Oracle, and it follows ASCII standards (one example of a character set)... that is why I had given the conversion of the char 'a' and the number 0, assuming a 1-byte character set & ASCII conversion.
Is there any difference in their impact on global warming?
They certainly tend to generate a lot of hot air.
I have created 2 tables, T1 and T2.
Table T1
Name Null? Type
--------------------- -------- -------------------
ROW_NUM NOT NULL NUMBER
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
Table T2
Name Null? Type
--------------------- -------- -------------------
ROW_NUM NOT NULL VARCHAR2(10)
OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(18)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
I have PKs on ROW_NUM columns on both tables.
select count(1) from t1;
COUNT(1)
----------
1890288
14:20:10 H8DEVW3>select count(1) from t2;
COUNT(1)
----------
1890288
Both tables have same number of rows (1.8 Million) and same data.
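The column list matches ALL_OBJECTS plus a generated ROW_NUM, so a setup along these lines would reproduce it (a hypothetical sketch, not the poster's actual script - the source rows would need to be multiplied up to reach ~1.8M):
Code:create table t1 as
select rownum as row_num, o.* from all_objects o;

create table t2 as
select to_char(rownum) as row_num, o.* from all_objects o;

alter table t1 add constraint t1_pk primary key (row_num);
alter table t2 add constraint t2_pk primary key (row_num);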
Here is my test results:
14:10:23 H8DEVW3>update t1 set row_num = 10 where row_num = 40000;
update t1 set row_num = 10 where row_num = 40000
*
ERROR at line 1:
ORA-00001: unique constraint (SYSADM.T1_PK) violated
Elapsed: 00:00:00.80
14:11:10 H8DEVW3>update t2 set row_num ='10' where row_num ='40000';
update t2 set row_num ='10' where row_num ='40000'
*
ERROR at line 1:
ORA-00001: unique constraint (SYSADM.T2_PK) violated
Elapsed: 00:00:00.80
In my test I found no time difference for searching a unique key in T1_PK and T2_PK.
However, the space required for T1_PK Index is less than that of T2_PK index.
BLOCKS NAME LF_ROWS LF_BLKS BR_BLKS BR_ROWS BTREE_SPACE USED_SPACE
---------- --------- ---------- ---------- ---------- ---------- ----------- ----------
4480 T1_PK 1890288 4070 7 4069 32599916 29262117
BLOCKS NAME LF_ROWS LF_BLKS BR_BLKS BR_ROWS BTREE_SPACE USED_SPACE
---------- --------- ---------- ---------- ---------- ---------- ----------- ----------
5120 T2_PK 1890288 4588 9 4587 36757900 32975669
T2_PK index needed an extra 518 (4588 - 4070) blocks for the leaf and an additional 2 blocks for the branch because of the varchar2 data type.
Certainly the number data type requires less space than the varchar2 data type. Any search on an index requires a minimum of 3 logical I/Os. That is why I got the same elapsed time from both indexes. If I start searching 1000s of rows using the index, I will get a different response time.
But if you are going to search a single row or fewer than 100 rows using the index, then either data type will work efficiently. Please remember that defining a data type for a column is entirely dependent on the application.
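Figures like the LF_BLKS/BTREE_SPACE ones above typically come from INDEX_STATS, which ANALYZE ... VALIDATE STRUCTURE populates one index at a time - a sketch for anyone who wants to reproduce the numbers (index names as in the post above):
Code:SQL> analyze index t1_pk validate structure;
SQL> select name, lf_rows, lf_blks, br_blks, br_rows, btree_space, used_space
  2  from index_stats;
-- INDEX_STATS only holds the most recently validated index, so repeat for t2_pk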
Just to return to my original point lest it be forgotten: basing the choice of what data type should be used for a PK on any alleged performance benefits of one type over another is not a good way to spend your valuable work hours. You are chasing performance benefits of fractions of a percent for your application, and you risk degrading the flexibility/supportability/expandability of your data model.
i) use a data type that is appropriate for the data being stored.
ii) if your pk is synthetic, and thus probably numeric, then NUMBER is evidently the correct choice.
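A minimal sketch of point ii), with illustrative names:
Code:create sequence dept_seq;

create table dept
( dept_id   number       primary key      -- synthetic, meaningless key
, dept_name varchar2(30) not null unique  -- the natural identifier, kept unique separately
);

insert into dept (dept_id, dept_name)
values (dept_seq.nextval, 'Accounting');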
First of all, I don't want to object to any of tamilselvan's observations, because they are all justified and logical. However I do find the following to deserve some additional explanation:
The above is true only under the presumption that in the VARCHAR2 column you are actually storing numeric values (converted to strings) - in that case of course the space consumed for the same numeric values will be greater in a VARCHAR2 column than if they are stored in a NUMBER column. Of course no one in his right mind would want to store, let's say, ******* values in a VARCHAR column. However you can also think of it another way:Quote:
Originally posted by tamilselvan
Certainly the number data type requires less space than the varchar2 data type. Any search on an index requires a minimum of 3 logical I/Os. That is why I got the same elapsed time from both indexes. If I start searching 1000s of rows using the index, I will get a different response time.
If we want to assign unique ******* values to those 1.8 million rows, those ******* numbers will on average consume somewhere between 4 and 5 bytes each - so those numbers will require about 8 million bytes of storage (that's just my estimate, don't take it literally).
Now let's say that we assign unique VARCHAR2 values to those 1.8 million records, where values can be composed of normal ALPHANUMERIC characters (A-Z, a-z, 0-9, if we set aside those other more "exotic" characters). With 62 symbols available, 3 characters give only 62^3 = 238,328 combinations, but 4 characters give 62^4 = 14,776,336 - so each of those 1.8 million unique values needs to consume no more than 4 bytes, in total about 7 million bytes (with a single-byte characterset).
So in effect, uniquely identifying 1.8 million rows with ******* numeric values will require roughly 15% more space than if we use "normal" alphanumeric values! That's quite normal, because in effect in the first case we use a base-10 numeric system, while in the second case we are using a base-62 (or something similar) numeric system, which is much more efficient storage-wise. Of course it is a totally different issue which of the two options has more practical use.....
So my point is: saying that using a numeric datatype for the PK is more efficient compared to the VARCHAR2 datatype because of the saving in space consumed by the underlying index is only true *under certain circumstances* - in general it is just the other way around!
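Putting rough numbers on that counting argument (a back-of-envelope check):
Code:-- 7 decimal digits distinguish 10^7 = 10,000,000 values, while
-- 4 alphanumeric characters already distinguish 62^4 = 14,776,336
select power(10, 7) as decimal_7, power(62, 4) as alnum_4 from dual;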
I think slimdave has really given the most accurate bottom line of this whole issue:
Everything else is just speculation....Quote:
Originally posted by slimdave
i) use a data type that is appropriate for the data being stored.
ii) if your pk is synthetic, and thus probably numeric, then NUMBER is evidently the correct choice.
AMEN!Quote:
Originally posted by slimdave
i) use a data type that is appropriate for the data being stored.
ii) if your pk is synthetic, and thus probably numeric, then NUMBER is evidently the correct choice.
The time difference would be of the order of milliseconds... PS: see the demo below from my test environment.Quote:
Originally posted by jmodic
You are talking nonsense here.
Wrong conclusions due to previously explained misunderstandings. The time difference even on a big table would be negligible, I can assure you.
The Pin column of reporting_PRODUCT_Hier is of varchar type.
The Pin column of temp_reporting_PRODUCT_Hier is of number type.
If you look at the example you will see that the worst-case difference would be 150 ms for one record insert.Code:US18>select count(*) from reporting_PRODUCT_Hier;
COUNT(*)
----------
421051
Elapsed: 00:00:00.34
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=130 Card=1)
1 0 SORT (AGGREGATE)
2 1 INDEX (FAST FULL SCAN) OF 'PK_REPORTING_PRODUCT_HIER' (U
NIQUE) (Cost=130 Card=421043)
US18>select count(*) from temp_reporting_PRODUCT_Hier;
COUNT(*)
----------
421047
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=99 Card=1)
1 0 SORT (AGGREGATE)
2 1 INDEX (FAST FULL SCAN) OF 'PK_TEMP_REPORTING_PRODUCT_HIE
R' (UNIQUE) (Cost=99 Card=421047)
US18>insert into reporting_PRODUCT_Hier(pin,fordate) values('023252','20-Mar-03');
1 row created.
Elapsed: 00:00:00.94
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=2604 Card=421043 Byt
es=133891674)
US18>insert into temp_reporting_PRODUCT_Hier(pin,fordate) values(023252,'20-Mar-03');
1 row created.
Elapsed: 00:00:00.15
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=2568 Card=421047 Byt
es=131787711)
US18>insert into reporting_PRODUCT_Hier(pin,fordate) values('akjshj1','20-Mar-03');
1 row created.
Elapsed: 00:00:00.31
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=2604 Card=421043 Byt
es=133891674)
US18>insert into temp_reporting_PRODUCT_Hier(pin,fordate) values(9827341,'20-Mar-03');
1 row created.
Elapsed: 00:00:00.16
Execution Plan
----------------------------------------------------------
0 INSERT STATEMENT Optimizer=CHOOSE (Cost=2568 Card=421047 Byt
es=131787711)
Now say you had to insert 100 records... taking the worst case, that would be around 15 secs more than what you require for the number datatype.
Now for 1000 it would require 150 secs more time...
For 10000 records it would be 1500 secs more time, & this would be unacceptable.
And moreover, what I stated in my first post was
and indeed it will.Quote:
Performance might be slightly affected during inserts.
Abhay.
And one more thing: it is all application dependent whether to use varchar as PK or not... I didn't say you shouldn't use it... but yes, performance would slightly be affected.
Abhay.
Still "measuring" CPU time using elapsed time eh?
:rolleyes:
You still haven't proved anything about performance. Give us a proper measurement and then we'll talk again.
There is another thing to consider. Using surrogate keys could cause your application to need more joins, thus negating your 15 millisecond performance gain. I think that the real answer depends on what type of data you are trying to store. Whether or not to use varchar2's or numbers depends on a number of factors. There are situations where varchar2's would be preferable and there are situations where they would not be. One other consideration is that if you are using a true surrogate key then the value of the key is probably derived from a sequence. Fetching the sequence might take a few milliseconds, thus throwing off your numbers. :D
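To illustrate the extra-join point, an illustrative schema (not anything from this thread):
Code:-- with a surrogate key, filtering employees by department name needs a join
select e.emp_name
from   emp e, dept d
where  d.dept_id   = e.dept_id
and    d.dept_name = 'Accounting';

-- versus a design where the natural key is carried in the child table directly
select emp_name
from   emp
where  dept_name = 'Accounting';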
Hi all,
I know I haven't been around for a while, and I completely confess to not having the patience to read this entire thread. However, I simply cannot believe that this argument is still going on. Okay, yes, if someone actually goes through the trouble of making some nonsensical series of letters and characters to make a unique key, then okay, maybe storing a varchar is more efficient. HOWEVER, this is very rarely the case and you all know it. People usually use string PKs because they have some existing string value that they want to use, like Department Name. Now we should all be able to agree that such a value will definitely not be more efficient than a surrogate, numeric PK.
And efficiency of space is far more important than slimdave appears to realize. This is because the size of the data has a direct effect on the efficiency of the buffer cache. The buffer cache is one of the most important pieces of performance, what with disk accesses being orders of magnitude slower than memory accesses. Further, there is a geometric curve to cache utilization. If your data is 3 times larger than it should be, this means that only 1/3 of what could fit in the cache actually does. This means that you are 3 times more likely to need to read from disk. Since the data is 3 times larger, it will take you 3 times as long to read it. So your database now does 9 times more disk accesses than it should have.
Further, keys are even more important than other fields. Let's look at Department Name. In an accounting application, for example, just how many tables will contain the FK of the Department Name? Lots. Further, if that were a dynamic table, all of these FKs might have indexes on them (to avoid the idiotic locking issue, if nothing else). So now, we have this very large field duplicated in many tables, and further duplicated in many indexes. This single, bad PK choice is now costing a whole lot of cache and thus a whole lot of performance.
So, yes, the difference specifically between a string and a numeric value may be small, given similar sizes, and a string might actually be able to be tighter, but first of all, who actually does that? What most novices will take away from this argument is that it doesn't matter if I use large strings as PKs - jmodic and slimdave said that was fine. Large PKs need to be avoided at all costs!
Now, even when people *do* attempt to use 'tight' string PKs, they often run into another problem. That of meaningful keys, which I have also vented against on many occasions here. For Department names, people might try to make some cute, 3-letter abbreviations for each department. Of course, when this company merges with another one that is twice as big, that little abbreviation scheme goes out the window. And on and on. There are many benefits to having meaningless keys, including performance benefits, and there are many maintenance, concurrency, and other problems with using meaningful keys. I can elaborate if need be.
There are also many problems with multiple-field keys, which is another side-effect of using string keys. People use string keys because they have them and they mean something and they're easy. They then get propagated to other tables. When a cross-reference table is created, it gets the keys from the 2 original tables because they have meaning... When a child table is created, it gets the key from the parent plus another field, because the person is thinking that way. When what they *should* be doing is creating a surrogate, single-field, numeric, sequence-populated PK for each table. While this may not be the best solution in 100% of cases, it will be a better solution than what is actually done 95% of the time!
This solution, in general, leads to tighter, faster keys, tighter indexes, better cache utilization, a smaller, faster database with less concurrency and maintenance headaches down the line (which are caused by meaningful and multi-field keys).
So I do agree with slimdave on one point - this is not something that you should waste cycles worrying about - follow this simple rule to a better database. There are always special-case exceptions, but the practice, IMHO, for each case is to assume that a surrogate key is better until proven otherwise.
- Chris
I got into this because of some ridiculous pseudo-measurements that attempted to prove the benefits of numerics over varchars using bogus methodologies, but don't take my defence of varchars as being a dismissal of synthetic numeric keys.
Why would anyone do this? No-one has suggested doing it - it would be a dumb idea.Quote:
if someone actually goes through the trouble of making some nonsensical series of letters and characters to make a unique key, then okay, maybe storing a varchar is more efficient. HOWEVER, this is very rarely the case and you all know it.
Haven't even mentioned field size, but you seem to be assuming that varchar2's must be lengthy fields -- could be a varchar2(2) though couldn't it? Long varchars are surely a bad idea for key fields, so that's a good point.Quote:
efficiency of space is far more important than slimdave appears to realize.
I doubt it -- jmodic and I never said any such thing, so I don't see how they could. Again, you're assuming that varchar2's are lengthy fields.Quote:
What most novices will take away from this argument is that it doesn't matter if I use large strings as PKs - jmodic and slimdave said that was fine.
No-one's arguing about the benefits of synthetic keys; they are generally the best choice, but there are cases where a synthetic key is neither required nor beneficial - how about US State abbreviations, or international currency abbreviations (USD, GBP etc)? Nothing wrong with using those as varchar2's - see the sketch after the quote below.Quote:
There are many benefits to having meaningless keys, including performance benefits, and there are many maintenance, concurrency, and other problems with using meaningful keys. I can elaborate if need be.
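A minimal sketch of such a natural-key case (ISO currency codes; illustrative):
Code:create table currency
( currency_code char(3)      primary key  -- 'USD', 'GBP', ... (ISO 4217)
, description   varchar2(40) not null
);

-- child tables can reference the code directly; no synthetic key needed
create table price
( item_id       number       not null
, currency_code char(3)      not null references currency
, amount        number(12,2) not null
);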
Sure, use a synthetic numeric key for such cases. It is the appropriate choice.Quote:
People use string keys because they have them and they mean something and they're easy. They then get propagated to other tables. When a cross-reference table is created, it gets the keys from the 2 original tables because they have meaning... When a child table is created, it gets the key from the parent plus another field, because the person is thinking that way. When what they *should* be doing is creating a surrogate, single-field, numeric, sequence-populated PK for each table. While this may not be the best solution in 100% of cases, it will be a better solution than what is actually done 95% of the time!
Yes, synthetic keys are a reasonable choice in 99% of cases. Let's not just dismiss the other 1% of special cases.Quote:
So I do agree with slimdave on one point - this is not something that you should waste cycles worrying about - follow this simple rule to a better database. There are always special-case exceptions, but the practice, IMHO, for each case is to assume that a surrogate key is better until proven otherwise.
Anyway, this lengthy discussion has turned out pretty well, I think, as we've covered a lot of diverse considerations.
First, the difference between numeric and char (varchar2) datatypes
is tiny if they have the same length, and I don't much care about it.
But as a "common theoretical" problem, this difference exists.
1) Oracle stores char (varchar2) fields as ASCII/EBCDIC/Unicode characters,
with a maximum length of 4000 characters.
Most CPU platforms, at the assembler/machine-code level, have to use
instructions like 'long compare' and 'long move' to work with this type of
data.
The binary structure of these instructions has only 1 byte to describe the length of the operation,
so the maximum operand length will be no more than 256 bytes, and Oracle
(and any other program) must use a set of machine instructions for comparing
(for example) two varchar2 fields.
2) Oracle stores numeric (number) fields in a binary-coded decimal numeric system, and
(as far as I know) most CPU platforms, at the assembler/machine-code level, have
a set of special instructions for working with this type of data (the PACK group), so
a program can use only 1 machine instruction in sort/find and similar operations.
In this case Oracle has to implement two independent pieces of assembler
code for comparing or moving the char (varchar2) datatype and the number datatype.
I guess this is the internal nature of the difference between the number and varchar2 datatypes
as primary key fields.
Fair enough, slimdave, and all good points - sorry for the rant (but it *is* my hallmark;))
With the tight, nonsensical series of numbers and letters as a string PK, I was referring to jmodic's point that string PKs can actually be tighter than numeric PKs because of the greater availability of values per position (base-10 as opposed to base-26, or 36, or 64, or 128, depending on what characters you allow).
As for pre-existing, commonly-accepted, static, small string pks, like state codes, or currency codes, or country codes - they are definitely an exception and preferable to surrogate keys.
So, here are the choices as I see them, either:
1 - There is a standard, short, acceptable string key in existence
2 - Somebody creates a meaningless series of characters
3 - Somebody creates a meaningless numeric key
4 - Somebody creates a meaningful short string key (abbreviations)
5 - An existing meaningful number is used
6 - An existing meaningful string is used
So, 1 is fine. 2 is not terrible, but rarely done, and harder to implement than 3. 3 is best, IMHO. 4-6 suffer from meaningful keys, and also usually result in keys that are larger than they would be with surrogate keys. Sometimes a lot larger. They are almost never smaller. I don't see it as a bad assumption that surrogate keys are tighter.
Shestakov seems to be making the further point that I didn't have the knowledge to support myself - that numeric comparisons are faster than string comparisons.
So again, I just wanted to make sure that the point is clear for the up-and-coming DBAs: meaningless, numeric, single-field, sequence-populated surrogate keys are the best choice 99% of the time, IMHO. Existing, commonly-accepted string codes (states, countries, currencies, etc.) are notable exceptions. And as slimdave noted, we covered a lot of detail here to help with that last 1% :)
- Chris