DBAsupport.com Forums - Powered by vBulletin
Results 1 to 8 of 8

Thread: String v/s Integer as primary key

  1. #1
    Join Date
    Jul 2003
    Posts
    136

    Cool String v/s Integer as primary key

    Design/Modeling Question: How bad/different can 2-3 tables join be with a string field (individual) as primary key versus having additional integer field (individual) as primary key. This table can have upto 100000 records.

    Please advice.

  2. #2
    Join Date
    Feb 2008
    Location
    SP, Brasil
    Posts
    7
    There is no difference.

    The only point that mades an integer PK better is trigger value generating via sequence.

  3. #3
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Sorry, but there is a huge difference. All PKs should be numeric for multiple reasons. One of which is space. I have attached part of a document that describes why string PKs are bad.
    Attached Files Attached Files
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  4. #4
    Join Date
    Mar 2007
    Location
    Ft. Lauderdale, FL
    Posts
    3,555
    I'm sorry but your paper assumes you have a donkey modeling your data. Who's going to use a varchar2(50) "department description" column as a PK?
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

  5. #5
    Join Date
    Feb 2008
    Location
    SP, Brasil
    Posts
    7

    Smile

    It is true, chrisrlong. No doubt about it.

    Of course if you use a numeric pk let's say NUMBER(5) you will use less space than a VARCHAR2(200) PK. This is not insignificant and sure, in this case, a numeric pk will be better.

    But what is the difference between NUMBER(5) and VARCHAR2(5) disk space or memory usages? As long as I can remember it's very small.

    So, if you compare same size columns, there will be no pratical difference.

    I think this was daljitsb point when asked this question...

    If there is no much situations where a small string can unique identify some thing, it's another question.

    My point is: if you use same size columns, there will be no significant difference in order to join, filter, insert, select, update or delete if the primary key is a number or varchar2.

    Sure, maybe I'm wrong. I made a quick search for disk usage of each datatype in order to be sure, but didn't find nothig about it.

  6. #6
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Quote Originally Posted by PAVB
    I'm sorry but your paper assumes you have a donkey modeling your data. Who's going to use a varchar2(50) "department description" column as a PK?
    Well, it was a department 'name', and not a description, and I've seen it more times than I can count.

    In general, when I've had that question, it is because someone already has a string in mind that will uniquely identify each row, and that string column generally falls into one of two categories:
    - The existing name of something (always larger than a numeric ID)
    - A 'meaningful' code for something, such as a shipping code that, when broken into component parts, specifies the ship-from location, the ship-to location, and the date, or something equally ridiculous. (Again, always larger than a numeric ID)

    I have *yet* to have someone say: "Hey, can I use a 2-byte string instead of a 2-byte number to be the PK on this table".

    Because at that point, what is the purpose of using a string anyway? If you truly have a choice, why wouldn't you use the sequence-generated numeric PK? Why would you try to use some other non-automatically-populated string value?

    The next question is: Where does that string value come from anyway? The user?

    If that is the case, then we have another reason not to use that string field as a PK: It is user-entered and users can make mistakes.

    I have attached another snippet from my book that describes the issue of meaningful data and other issues with the definition of Primary Keys.

    And finally, NUMBER(5) will take less storage space than VARCHAR2(5).

    First, remember that different character sets store string data differently. Sometimes it's 1-byte per character and sometimes it is 2. But even at 1 byte per character, you're losing out. 1 byte of numeric data can store a value of up to 255. To store the same thing in a string variable would take 3 bytes. The NUMBER datatype can store a number with up to 38 precision in only 21-22 bytes. So unless I'm missing something, a numeric column will almost *always* be tighter than an equivalent string column.

    The only way I can see a string winning is if the string is used more like a number with greater than base 10, like base 16 (or up to base 255 if we could use every possible character). So, say we used 'ZZZ0' to represent 35*36^4+35*36^3+35*36^2+0, which is 60464880. For a numeric column, we would need at least 30 bits to store the same number, as 2^30-1 = 1073741823. And since the closest number of bytes is 4 (32 bits), it would still take the same amount of space (4 bytes).

    As always, I'm more than willing to be proven wrong, but it would seem to me that there is almost no good reason for using a string PK. As a matter of fact, the only time I do is when there is a 'universally'- accepted set of codes for something, such as a States list or something similar.
    Attached Files Attached Files
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  7. #7
    Join Date
    Feb 2008
    Location
    SP, Brasil
    Posts
    7
    So, we get to the fallowing points:

    1) There's not much situations when a small string can identify something. I agree whith that.

    2) User entering PK's for tables is not cool. It's a bad user interface. And can lead to numerous troubles. I agree.

    3) There is diferences between storing numbers and varchars. Number are always small. I agree. I never developed a system in witch was necessary to store hexadecimal numbers.

    But I don't agree that is a RULE saying "Numbers are better".

    You listed one example of that: State lists.

    There are numerous business specific data that are not in the silly cases you listed, are small and meaningfull. They are, usually, lookup tables, so user doesn't usually insert data. They can have varchar2 columns and no problems with space.

    As I can't find some documentation about it, I made my example:

    CREATE TABLE test_space( col_1 NUMBER(10));

    BEGIN
    FOR i IN 1..1000000 LOOP
    INSERT INTO test_space(col_1) VALUES (i);
    END LOOP;
    COMMIT;
    END;
    /

    /*
    Takes this space:
    DADOS2 10088
    */

    DROP TABLE test_space;

    CREATE TABLE test_space( col_1 VARCHAR2(10));

    BEGIN
    FOR i IN 1..1000000 LOOP
    INSERT INTO test_space(col_1) VALUES (To_char(i));
    END LOOP;
    COMMIT;
    END;
    /

    /*
    Takes this space:
    DADOS2 13208
    */

    So, to a million record table (ten times bigger than 100000 mentioned in daljitsb's question), we have about 3M size difference. It's something arround 30% bigger. A lot, sure, but if I use a numeric pk it needs to be referenced in, at least, 3,33 other tables to justify the extra space on index. As there is no 3,33 tables, we need to join this table to 4 other tables all of then with a million records to justify a numeric pk just for space saving.

    So, there are 'cases' and 'cases'. They need to be evaluated and specifyed to know witch is better.

    And, correct me if I'm wrong, assuming that space used is equal for memory saving, there is no difference to a oracle server to join two tables using varchar or number columns...

    That's why I said that there is no difference.
    Last edited by ViniciusPacheco; 02-14-2008 at 04:00 PM.

  8. #8
    Join Date
    Mar 2007
    Location
    Ft. Lauderdale, FL
    Posts
    3,555
    Quote Originally Posted by chrisrlong
    Well, it was a department 'name', and not a description, and I've seen it more times than I can count.
    So my second statement is correct, a donkey was in charge of data modeling and in that case you have to remember what Ron White would say in a case like this: "you can't fix stupid"

    It's never been the arrows, it always been the archer.
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width