-
Calculate percentage null per column
Hi All you code kiddies!!
Just had a quick request from a client. He want to know the percentage of nulls for each column in his schema.
So output would be
Table_Name Column_Name Percent_Null
Anyone got a quick solution in their bag of tricks??
I'll be working on it in the meantime....
Thanks.
-
How about
Code:
scott@ACME> select a.table_name,a.column_name,round((a.NUM_NULLS/b.NUM_ROWS)*100,0) percen_null
2 from all_tab_columns a,all_tables b
3 where a.table_name='EMPLOYEES'
4 and a.table_name=b.table_name
5 /
TABLE_NAME COLUMN_NAME PERCEN_NULL
------------------------------ ------------------------------ -----------
EMPLOYEES EMPLOYEE_ID 0
EMPLOYEES FIRST_NAME 0
EMPLOYEES LAST_NAME 0
EMPLOYEES EMAIL 0
EMPLOYEES PHONE_NUMBER 0
EMPLOYEES HIRE_DATE 0
EMPLOYEES JOB_ID 0
EMPLOYEES SALARY 0
EMPLOYEES COMMISSION_PCT 67
EMPLOYEES MANAGER_ID 1
EMPLOYEES DEPARTMENT_ID 1
11 rows selected.
Make sure you have analyzed your tables first.
HTH
Amar
"There is a difference between knowing the path and walking the path."
-
Thanks for that adewri,
but I need a completely accurate output, NUM_NULLS just isn't accurate enough.
-
then do compute statistics then?
-
Unfortunately, the tables are too big for a 'compute stats'.
Maybe I should post this in the Developers forum ?
-
How many significant figures does the client envision on this, if they want something completely accurate? You have three rows and one of the m is null, what's the completely accurate percentage for that?
See http://oraclesponge.blogspot.com/200...-sampling.html and the article by JL I reference there http://www.jlcomp.demon.co.uk/stats_i.html for information on the accuracy of statistics.
-
Tables have multi-million rows but...
No worries........ It's sorted it. One table scan as well......
Code:
set verify off
var result_set refcursor
set echo off
col "Table" Format a30
col "Column" Format a30
DECLARE
cursor c1 is SELECT table_name,column_name from user_tab_columns
WHERE table_name = UPPER('&&tablename');
myquery varchar2(32000) := NULL;
mysql varchar2(32000) := NULL;
BEGIN
FOR C in C1 LOOP
myquery := myquery || ',sum(nvl2('||c.column_name||',0,1)) '||c.column_name;
mysql := mysql || 'UNION ALL select '''||c.table_name||'''"Table",'
|| ''''||c.column_name ||'''"Column",'
||' ROUND('||c.column_name|| '/total*100) "Percent Nulls" from temp ';
END LOOP;
myquery := 'SELECT '||substr(myquery,2) ||',SUM(1) Total FROM &tablename';
myquery := 'with temp as ('||myquery||')' || SUBSTR(mysql,10) || 'ORDER by "Column"';
open :result_set for myquery;
END;
/
print result_set
undefine tablename
-
You might compare that with the cost of gathering statistics on the table ... it's mostly the same query
-
I was under the impression that computing statistics on very large tables was frowned upon ??
Not sure where I got that impression from, is it unfounded ??
-
I don't think it necessarily is ... it's just that you can get extremely accurate statistics without going that far. Data with heavy clustering of values generally needs a higher percentage.
I see from
Code:
ROUND('||c.column_name|| '/total*100)
that you are not really looking for a completely accurate answer here .. it is possible that an analyze with an estimation percent of less than 10 would have got you into the ballpark. When a person (especially a manager!) says "I need a completely accurate answer" it usually makes me shrug ... what are they going to do if the percentage of nulls is 13% that they wouldn't do if it was 14%? What if it was 13.25%? I really doubt that this makes any difference.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|