Calculate percentage null per column

Thread: Calculate percentage null per column

1. Calculate percentage null per column

Hi All you code kiddies!!

Just had a quick request from a client. He want to know the percentage of nulls for each column in his schema.

So output would be

Table_Name Column_Name Percent_Null

Anyone got a quick solution in their bag of tricks??

I'll be working on it in the meantime....

Thanks.

Code:
```scott@ACME> select a.table_name,a.column_name,round((a.NUM_NULLS/b.NUM_ROWS)*100,0) percen_null
2    from all_tab_columns a,all_tables b
3   where a.table_name='EMPLOYEES'
4     and a.table_name=b.table_name
5  /

TABLE_NAME                     COLUMN_NAME                    PERCEN_NULL
------------------------------ ------------------------------ -----------
EMPLOYEES                      EMPLOYEE_ID                              0
EMPLOYEES                      FIRST_NAME                               0
EMPLOYEES                      LAST_NAME                                0
EMPLOYEES                      EMAIL                                    0
EMPLOYEES                      PHONE_NUMBER                             0
EMPLOYEES                      HIRE_DATE                                0
EMPLOYEES                      JOB_ID                                   0
EMPLOYEES                      SALARY                                   0
EMPLOYEES                      COMMISSION_PCT                          67
EMPLOYEES                      MANAGER_ID                               1
EMPLOYEES                      DEPARTMENT_ID                            1

11 rows selected.```
Make sure you have analyzed your tables first.

HTH

but I need a completely accurate output, NUM_NULLS just isn't accurate enough.

4. Pando & Company
Join Date
Jun 2000
Location
Posts
7,447
then do compute statistics then?

5. Unfortunately, the tables are too big for a 'compute stats'.

Maybe I should post this in the Developers forum ?

6. Foreign Script Kiddie
Join Date
Aug 2002
Location
Posts
5,253
How many significant figures does the client envision on this, if they want something completely accurate? You have three rows and one of the m is null, what's the completely accurate percentage for that?

See http://oraclesponge.blogspot.com/200...-sampling.html and the article by JL I reference there http://www.jlcomp.demon.co.uk/stats_i.html for information on the accuracy of statistics.

7. Tables have multi-million rows but...
No worries........ It's sorted it. One table scan as well......

Code:
```set verify off
var result_set refcursor
set echo off
col "Table" Format a30
col "Column" Format a30

DECLARE
cursor c1 is SELECT table_name,column_name from user_tab_columns
WHERE table_name = UPPER('&&tablename');
myquery   varchar2(32000) := NULL;
mysql     varchar2(32000) := NULL;
BEGIN
FOR C in C1 LOOP
myquery := myquery || ',sum(nvl2('||c.column_name||',0,1)) '||c.column_name;
mysql   := mysql || 'UNION ALL select '''||c.table_name||'''"Table",'
|| ''''||c.column_name ||'''"Column",'
||' ROUND('||c.column_name|| '/total*100) "Percent Nulls" from temp ';
END LOOP;
myquery :=  'SELECT '||substr(myquery,2) ||',SUM(1) Total FROM &tablename';
myquery := 'with temp as ('||myquery||')' || SUBSTR(mysql,10) || 'ORDER by "Column"';

open :result_set for myquery;
END;
/
print result_set
undefine tablename```

8. Foreign Script Kiddie
Join Date
Aug 2002
Location
Posts
5,253
You might compare that with the cost of gathering statistics on the table ... it's mostly the same query

9. I was under the impression that computing statistics on very large tables was frowned upon ??

Not sure where I got that impression from, is it unfounded ??

10. Foreign Script Kiddie
Join Date
Aug 2002
Location
Posts
5,253
I don't think it necessarily is ... it's just that you can get extremely accurate statistics without going that far. Data with heavy clustering of values generally needs a higher percentage.

I see from
Code:
`ROUND('||c.column_name|| '/total*100)`
that you are not really looking for a completely accurate answer here .. it is possible that an analyze with an estimation percent of less than 10 would have got you into the ballpark. When a person (especially a manager!) says "I need a completely accurate answer" it usually makes me shrug ... what are they going to do if the percentage of nulls is 13% that they wouldn't do if it was 14%? What if it was 13.25%? I really doubt that this makes any difference.

Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•