Opinions on END_DT implementations

**chrisrlong** · 09-03-2003, 04:54 PM

Okay, assume we have some historical data that has date ranges over which the data was applicable.

Maybe an Employment History, where we have a START_DT and an END_DT. Obviously the START_DT column will always be filled, but what about the END_DT? If the person is currently employed, then their latest record will have a START_DT, but no END_DT yet, right?

The question is: What are the pros and cons of keeping this field NULL versus storing some arbitrary date (such as 1/1/3000)?

From what I can see:

- NULL is harder to query. Must do WHERE :TargetDate < NVL(END_DT,:FutureDate) or something similar.
- A function-based index could be created here to somewhat mitigate that.
- Using some future constant value is easier to query: WHERE :TargetDate < END_DT
- The statistics for the column get screwed up - the optimizer will now think we have dates running up until 1/1/3000
- A histogram could somewhat mitigate that

Am I missing anything?

I'm looking for any facts or opinions that anyone has about this situation.

Thanks,

- Chris

**stecal** · 09-03-2003, 05:19 PM

Add another column currently_employed with Y/N values, leave end_dt null until the person is no longer currently employed. Trigger to update the currently_employed column to N when a date is entered in end_dt.

Pro of using null is being able to search based on who is currently employed. Bitmap index for Y or N values on the column. No make believe dates involved. Con of having to alter the table, write a trigger, but is only a one time effort.

**chrisrlong** · 09-03-2003, 05:49 PM

While I have no issue with such trigger-populated fields, it would unfortunately not solve the basic issue. Finding the 'current' record is not the main issue. I'm talking about any such tables with date ranges and anything one might do with them, such as joining multiple tables with range to each other, or searching for the records that were valid on a given date, or that overlap a given range, or ended before a give date, etc. Every query that used that field would have to be able to handle a NULL value (if that is the chosen implementation), and not just a query looking for the 'current' record.

Thanks anyway, and keep the feedback coming,

- Chris

**slimdave** · 09-03-2003, 09:27 PM

I would definately use NULL myself. Like you say there are some disadvantages, but it is the "correct" approach, and is ntuitively understood by anyone who has to maintain the system. Also, any disadvantages can be overcome pretty easily.

**TimHall** · 09-04-2003, 03:46 AM

Hi.

I agree with the previous post. Use the NULL. Why use 2 columns when 1 will do. Remember, you can always use Function Based Indexes to improve performance so that's not an issue.

http://www.oracle-base.com/Articles/...sedIndexes.asp

Triggers... Avoid them like the plague unless there's no other way to solve your problem. If you do use them put no code execpt a package call in them. This will imporve performance loads as triggers are not cached your session so any code must be parsed each time it's called. See:

http://asktom.oracle.com/pls/ask/f?p...7507706875149,

Cheers

Tim...

**DaPi** · 09-04-2003, 07:03 AM

I'm with slimdave on this. In a previous db it was decided not to deal with NULL birthdates and so set unknown ones to 1/1/1700 - in the end it would have been easier to do it properly with NULL's.

**chrisrlong** · 09-04-2003, 09:17 AM

Originally posted by slimdave
I would definately use NULL myself. Like you say there are some disadvantages, but it is the "correct" approach, and is ntuitively understood by anyone who has to maintain the system.

That's the problem, though - I'm not so sure in this case. An open end date can be almost as 'correctly' described as being in effect forever, hence a future date. As for intuitive code, WHERE :TargetDate < END_DT is more intuitive than WHERE :TargetDate < NVL(END_DT,SYSDATE+1) or something similar.

Originally posted by TimHall
Triggers... Avoid them like the plague unless there's no other way to solve your problem.

Not to digress on this point, but I have to note my strong objection to this statement. De-normalized fields and sequence-populated fields should always be trigger-filled, IMHO.

Originally posted by TimHall
If you do use them put no code execpt a package call in them. This will imporve performance loads as triggers are not cached your session so any code must be parsed each time it's called.

Good point - agreed - but does not invalidate the usefulness of triggers.

Originally posted by DaPi
I'm with slimdave on this. In a previous db it was decided not to deal with NULL birthdates and so set unknown ones to 1/1/1700 - in the end it would have been easier to do it properly with NULL's.

In general, NULLs mean NULL, and should never be substituted with some other value meaning NULL. For reference, here is a previous rant of mine on the subject
http://www.dbasupport.com/forums/sho...ULL+ridiculous

But again, in this case it does not seem to be such a blatant 'lie'. We are not saying "we don't know their age, so we put in 99". We are saying "We don't know when it will end, but it will definitely be somewhere in the future, so leave the record in effect until 'the end of time'".

So again, if we go back to our possible implementations of SQL:

WHERE :TargetDate < END_DT (with a future date instead of null)
versus
WHERE :TargetDate < NVL(END_DT,SYSDATE+1)

However, the second query will not work for any future dates. Let's take an example of contracts. Some contracts are set to start today and end in a year, for instance. Therefore a future END_DT is possible. Therefore, some queries might exist to do forecasts and answer questions like 'What contracts will still be in place next month?'
The generic query of WHERE :TargetDate < NVL(END_DT,SYSDATE+1)would now miss those contracts with open end dates, because :TargetDate would have a value larger than SYSDATE+1.

So, what one might do is replace SYSDATE+1 with some future date constant and make the query WHERE :TargetDate < NVL(END_DT,c_FutureDate). This would now allow future date queries. Following this convention, we would now be treating NULL as c_FutureDate in all queries. If so, why not simply replace NULL with c_FutureDate in the actual records, thereby simplifying the SQL and eliminating the need for a function-based index?

Again, I am looking for any opinions or arguments, as I am still not convinced that replacing the NULL is better, but the argument is pretty strong.

- Chris

**slimdave** · 09-04-2003, 09:31 AM

Originally posted by chrisrlong

So again, if we go back to our possible implementations of SQL:

WHERE :TargetDate < END_DT (with a future date instead of null)
versus
WHERE :TargetDate < NVL(END_DT,SYSDATE+1)

However, the second query will not work for any future dates. Let's take an example of contracts. Some contracts are set to start today and end in a year, for instance. Therefore a future END_DT is possible. Therefore, some queries might exist to do forecasts and answer questions like 'What contracts will still be in place next month?'
The generic query of WHERE :TargetDate < NVL(END_DT,SYSDATE+1)would now miss those contracts with open end dates, because :TargetDate would have a value larger than SYSDATE+1.

So, what one might do is replace SYSDATE+1 with some future date constant and make the query WHERE :TargetDate < NVL(END_DT,c_FutureDate). This would now allow future date queries. Following this convention, we would now be treating NULL as c_FutureDate in all queries. If so, why not simply replace NULL with c_FutureDate in the actual records, thereby simplifying the SQL and eliminating the need for a function-based index?

Again, I am looking for any opinions or arguments, as I am still not convinced that replacing the NULL is better, but the argument is pretty strong.

- Chris

I think that is one of the cases that is easy to overcome by constructing the queries as either ...

Code:

...
Where
   (:TargetDate < end_date or
    end_date is null)
...

... or ...

Code:

...
Where
   :TargetDate < Nvl(end_date,:TargetDate+1)
...

The second case is more compact, but the first case is probably more amenable to index scans.

**tamilselvan** · 09-04-2003, 10:36 AM

User requirements play an important role before a table and columns design take place.
Take Contract example described by Chrislong, if user is asking what projects are going to end next month, then I would design the table in such way that, it would have 2 columns - actual end date, estimated end date, and to answer the question I would use estimated end date.

When the end date is null, it means the project is not complete, otherwise it is still going on.

If we add many conflict descriptions to a single attribute, then finally we end up with mess.

Tamil

**chrisrlong** · 09-04-2003, 12:58 PM

Originally posted by slimdave
The second case is more compact, but the first case is probably more amenable to index scans.

Actually, because of the OR and the fact that NULLS are not stored in indexes, neither construct is going to favor an index.

Originally posted by tamilselvan
User requirements play an important role before a table and columns design take place.
Take Contract example described by Chrislong, if user is asking what projects are going to end next month, then I would design the table in such way that, it would have 2 columns - actual end date, estimated end date, and to answer the question I would use estimated end date.

When the end date is null, it means the project is not complete, otherwise it is still going on.

If we add many conflict descriptions to a single attribute, then finally we end up with mess.

Tamil

Interesting, but I don't see how that solves or even addresses the problem.

Mind you, I fully agree with the concept of a singular definition per field, but this almost feels like a singular definition split across two fields.

So we now have two fields instead of one. Okay, what is the difference in the data that is held? Since we don't have an end_dt for some records, they are null. But what if we don't have a real estimated end date either? Then are those null as well? Then we have two fields that always have the same data. Obviously pointless. So I'm assuming that you mean to fill this new field with the 'future date constant' we've been talking about when end_dt is null.

Now, if the estimated date is filled with a future date, then it has the following advantages over the actual end date:
- It is easier to use
- It can be used for future queries

Further, since one pretty much *has* to assume that when the actual end date is finally set, the estimated date will have to match it, then the estimated date field will most likely be used all the time instead of the actual end date field. So what has been gained? We now have two fields and only ever use one anyway.

IF one wanted to distinguish between a 'real' end date and an 'estimated' one, a 'flag' field would seem to make more sense. But again, it doesn't necessarily answer the specific problem of whether to populate an 'unknown' estimated end date as null or with a future date.

- Chris

Thread: Opinions on END_DT implementations

Thread Tools

Display

Opinions on END_DT implementations

Posting Permissions