DBAsupport.com Forums - Powered by vBulletin
Page 1 of 3 123 LastLast
Results 1 to 10 of 23

Thread: Opinions on END_DT implementations

  1. #1
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339

    Opinions on END_DT implementations

    Okay, assume we have some historical data that has date ranges over which the data was applicable.

    Maybe an Employment History, where we have a START_DT and an END_DT. Obviously the START_DT column will always be filled, but what about the END_DT? If the person is currently employed, then their latest record will have a START_DT, but no END_DT yet, right?

    The question is: What are the pros and cons of keeping this field NULL versus storing some arbitrary date (such as 1/1/3000)?

    From what I can see:

    - NULL is harder to query. Must do WHERE :TargetDate < NVL(END_DT,:FutureDate) or something similar.
    - A function-based index could be created here to somewhat mitigate that.
    - Using some future constant value is easier to query: WHERE :TargetDate < END_DT
    - The statistics for the column get screwed up - the optimizer will now think we have dates running up until 1/1/3000
    - A histogram could somewhat mitigate that

    Am I missing anything?

    I'm looking for any facts or opinions that anyone has about this situation.

    Thanks,

    - Chris
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  2. #2
    Join Date
    May 2002
    Posts
    2,645
    Add another column currently_employed with Y/N values, leave end_dt null until the person is no longer currently employed. Trigger to update the currently_employed column to N when a date is entered in end_dt.

    Pro of using null is being able to search based on who is currently employed. Bitmap index for Y or N values on the column. No make believe dates involved. Con of having to alter the table, write a trigger, but is only a one time effort.

  3. #3
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    While I have no issue with such trigger-populated fields, it would unfortunately not solve the basic issue. Finding the 'current' record is not the main issue. I'm talking about any such tables with date ranges and anything one might do with them, such as joining multiple tables with range to each other, or searching for the records that were valid on a given date, or that overlap a given range, or ended before a give date, etc. Every query that used that field would have to be able to handle a NULL value (if that is the chosen implementation), and not just a query looking for the 'current' record.

    Thanks anyway, and keep the feedback coming,

    - Chris
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  4. #4
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    I would definately use NULL myself. Like you say there are some disadvantages, but it is the "correct" approach, and is ntuitively understood by anyone who has to maintain the system. Also, any disadvantages can be overcome pretty easily.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  5. #5
    Join Date
    Dec 2001
    Location
    UK
    Posts
    1,684
    Hi.

    I agree with the previous post. Use the NULL. Why use 2 columns when 1 will do. Remember, you can always use Function Based Indexes to improve performance so that's not an issue.

    http://www.oracle-base.com/Articles/...sedIndexes.asp

    Triggers... Avoid them like the plague unless there's no other way to solve your problem. If you do use them put no code execpt a package call in them. This will imporve performance loads as triggers are not cached your session so any code must be parsed each time it's called. See:

    http://asktom.oracle.com/pls/ask/f?p...7507706875149,

    Cheers

    Tim...
    Last edited by TimHall; 09-04-2003 at 03:50 AM.
    Tim...
    OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
    OCA PL/SQL Developer
    Oracle ACE Director
    My website: oracle-base.com
    My blog: oracle-base.com/blog

  6. #6
    Join Date
    Nov 2002
    Location
    Geneva Switzerland
    Posts
    3,142
    I'm with slimdave on this. In a previous db it was decided not to deal with NULL birthdates and so set unknown ones to 1/1/1700 - in the end it would have been easier to do it properly with NULL's.

  7. #7
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Originally posted by slimdave
    I would definately use NULL myself. Like you say there are some disadvantages, but it is the "correct" approach, and is ntuitively understood by anyone who has to maintain the system.
    That's the problem, though - I'm not so sure in this case. An open end date can be almost as 'correctly' described as being in effect forever, hence a future date. As for intuitive code, WHERE :TargetDate < END_DT is more intuitive than WHERE :TargetDate < NVL(END_DT,SYSDATE+1) or something similar.

    Originally posted by TimHall
    Triggers... Avoid them like the plague unless there's no other way to solve your problem.
    Not to digress on this point, but I have to note my strong objection to this statement. De-normalized fields and sequence-populated fields should always be trigger-filled, IMHO.

    Originally posted by TimHall
    If you do use them put no code execpt a package call in them. This will imporve performance loads as triggers are not cached your session so any code must be parsed each time it's called.
    Good point - agreed - but does not invalidate the usefulness of triggers.

    Originally posted by DaPi
    I'm with slimdave on this. In a previous db it was decided not to deal with NULL birthdates and so set unknown ones to 1/1/1700 - in the end it would have been easier to do it properly with NULL's.
    In general, NULLs mean NULL, and should never be substituted with some other value meaning NULL. For reference, here is a previous rant of mine on the subject
    http://www.dbasupport.com/forums/sho...ULL+ridiculous

    But again, in this case it does not seem to be such a blatant 'lie'. We are not saying "we don't know their age, so we put in 99". We are saying "We don't know when it will end, but it will definitely be somewhere in the future, so leave the record in effect until 'the end of time'".

    So again, if we go back to our possible implementations of SQL:

    WHERE :TargetDate < END_DT (with a future date instead of null)
    versus
    WHERE :TargetDate < NVL(END_DT,SYSDATE+1)

    However, the second query will not work for any future dates. Let's take an example of contracts. Some contracts are set to start today and end in a year, for instance. Therefore a future END_DT is possible. Therefore, some queries might exist to do forecasts and answer questions like 'What contracts will still be in place next month?'
    The generic query of WHERE :TargetDate < NVL(END_DT,SYSDATE+1)would now miss those contracts with open end dates, because :TargetDate would have a value larger than SYSDATE+1.

    So, what one might do is replace SYSDATE+1 with some future date constant and make the query WHERE :TargetDate < NVL(END_DT,c_FutureDate). This would now allow future date queries. Following this convention, we would now be treating NULL as c_FutureDate in all queries. If so, why not simply replace NULL with c_FutureDate in the actual records, thereby simplifying the SQL and eliminating the need for a function-based index?

    Again, I am looking for any opinions or arguments, as I am still not convinced that replacing the NULL is better, but the argument is pretty strong.

    - Chris
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

  8. #8
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Originally posted by chrisrlong

    So again, if we go back to our possible implementations of SQL:

    WHERE :TargetDate < END_DT (with a future date instead of null)
    versus
    WHERE :TargetDate < NVL(END_DT,SYSDATE+1)

    However, the second query will not work for any future dates. Let's take an example of contracts. Some contracts are set to start today and end in a year, for instance. Therefore a future END_DT is possible. Therefore, some queries might exist to do forecasts and answer questions like 'What contracts will still be in place next month?'
    The generic query of WHERE :TargetDate < NVL(END_DT,SYSDATE+1)would now miss those contracts with open end dates, because :TargetDate would have a value larger than SYSDATE+1.

    So, what one might do is replace SYSDATE+1 with some future date constant and make the query WHERE :TargetDate < NVL(END_DT,c_FutureDate). This would now allow future date queries. Following this convention, we would now be treating NULL as c_FutureDate in all queries. If so, why not simply replace NULL with c_FutureDate in the actual records, thereby simplifying the SQL and eliminating the need for a function-based index?

    Again, I am looking for any opinions or arguments, as I am still not convinced that replacing the NULL is better, but the argument is pretty strong.

    - Chris
    I think that is one of the cases that is easy to overcome by constructing the queries as either ...

    Code:
    ...
    Where
       (:TargetDate < end_date or
        end_date is null)
    ...
    ... or ...
    Code:
    ...
    Where
       :TargetDate < Nvl(end_date,:TargetDate+1)
    ...
    The second case is more compact, but the first case is probably more amenable to index scans.
    Last edited by slimdave; 09-04-2003 at 01:49 PM.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  9. #9
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    User requirements play an important role before a table and columns design take place.
    Take Contract example described by Chrislong, if user is asking what projects are going to end next month, then I would design the table in such way that, it would have 2 columns - actual end date, estimated end date, and to answer the question I would use estimated end date.

    When the end date is null, it means the project is not complete, otherwise it is still going on.

    If we add many conflict descriptions to a single attribute, then finally we end up with mess.

    Tamil

  10. #10
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Originally posted by slimdave
    The second case is more compact, but the first case is probably more amenable to index scans.
    Actually, because of the OR and the fact that NULLS are not stored in indexes, neither construct is going to favor an index.

    Originally posted by tamilselvan
    User requirements play an important role before a table and columns design take place.
    Take Contract example described by Chrislong, if user is asking what projects are going to end next month, then I would design the table in such way that, it would have 2 columns - actual end date, estimated end date, and to answer the question I would use estimated end date.

    When the end date is null, it means the project is not complete, otherwise it is still going on.

    If we add many conflict descriptions to a single attribute, then finally we end up with mess.

    Tamil
    Interesting, but I don't see how that solves or even addresses the problem.

    Mind you, I fully agree with the concept of a singular definition per field, but this almost feels like a singular definition split across two fields.

    So we now have two fields instead of one. Okay, what is the difference in the data that is held? Since we don't have an end_dt for some records, they are null. But what if we don't have a real estimated end date either? Then are those null as well? Then we have two fields that always have the same data. Obviously pointless. So I'm assuming that you mean to fill this new field with the 'future date constant' we've been talking about when end_dt is null.

    Now, if the estimated date is filled with a future date, then it has the following advantages over the actual end date:
    - It is easier to use
    - It can be used for future queries

    Further, since one pretty much *has* to assume that when the actual end date is finally set, the estimated date will have to match it, then the estimated date field will most likely be used all the time instead of the actual end date field. So what has been gained? We now have two fields and only ever use one anyway.

    IF one wanted to distinguish between a 'real' end date and an 'estimated' one, a 'flag' field would seem to make more sense. But again, it doesn't necessarily answer the specific problem of whether to populate an 'unknown' estimated end date as null or with a future date.


    - Chris
    Christopher R. Long
    ChrisRLong@HotMail.Com
    But that's just my opinion. I could be wrong

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width