Something very strange cropped up recently.
The alert log shows one database hanging because it can't archive:
O/S-Error: There is not enough space on the disk.
It's Windoze 2K.
This happened at 20:16 on a Sunday evening, when the building was empty and I was safely at home.
Ninety minutes later the redo log was archived successfully and the database carried on its merry way.
The alert.log shows 90 minutes' worth of these errors as it tries to archive sequence #7708, and then it suddenly archives it. I certainly didn't intervene and no-one else in the department says they did, so I'm a tad perplexed.
A space check on this disk done ten days previously showed 8.36 GB free (it's only a small system). As part of regular maintenance, Support clear out older archived logs, so I can't see how many were generated since that last check.
Rather than plough through the alert logs, I thought I'd check v$archived_log. This shows me all the archival activity but only from a point AFTER the archival/space error.
02-Jun: space check, 8.36 GB free.
12-Jun: archival/space issue.
21-Jun (today): space check, 5.5 GB free.
The v$archived_log view only has entries after 14 June. The instance was last rebooted on 11th May. What has happened to v$archived_log entries between 11th May and 14 June?
There were no incidents reported by users, which is why this has only come to light now. I'm puzzled as to what else could have produced the ORA-19504 other than a full disk, and also as to what happened to the v$archived_log entries.
(It's Standard Edition 220.127.116.11)
Maybe NOT that strange. The IT manager now admits:
"I deleted some archive redo one Sunday (can’t remember when) - I left 3 days worth as the disk was full – might have been the week before? The impetus for it was that the server was effectively down – I got a ZoneAlert text message ... As per the usual ,I deleted archive redo"
He never bothered telling anyone or recording what he did.
I still have the issue with the v$archived_log view, though!
Look at the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter, the ARCHIVED LOG section of v$controlfile_record_section, and the MAXLOGHISTORY setting your control file was created with.
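As a rough cross-check (a sketch of my own, not from the thread): CONTROL_FILE_RECORD_KEEP_TIME defaults to 7 days, and reusable control file records older than that may be overwritten once their section is full. Assuming the default is still in place on this database (check with SHOW PARAMETER control_file_record_keep_time), the oldest archived-log record guaranteed to survive on 21 June is from 14 June, which matches what you are seeing. The year below is hypothetical, since the thread only gives day and month.

```python
from datetime import date, timedelta


def oldest_protected_record(today: date, keep_time_days: int = 7) -> date:
    """Date before which reusable control file records may be overwritten.

    keep_time_days stands in for CONTROL_FILE_RECORD_KEEP_TIME (default 7).
    It is passed in by hand here, not read from the database. Older records
    are only overwritten when their control file section needs the space.
    """
    return today - timedelta(days=keep_time_days)


# Hypothetical year: the thread only gives day and month.
print(oldest_protected_record(date(2008, 6, 21)))  # 2008-06-14
```

This only gives the guaranteed minimum; older records survive until the section actually needs the space.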
Originally Posted by JMac
I know, I know.
What can you do, though? He still hasn't said "I should have told you" or "sorry!!!" or anything, in fact. He's my boss - and a vindictive, out-for-himself, pompous idiot at that. I've had to word an all-inclusive email to the department reminding them to keep me informed and to fill in the server logs whenever they do anything on a production Oracle machine.
But what else goes on that they don't tell me? What other idiot sticks his fingers in? (And believe me, there are plenty here.)
If it goes tits up you can bet he won't admit to anything and he'll be saying ... "Oooh Oracle - that's John's responsibility".
Time to change the "oracle" password. Of course, he's probably got the root password so that doesn't help...
The real problem is not an Oracle problem; the real problem is:
(1) missing monitoring
(2) a missing Incident Management process
(3) a missing Problem Management process
(2) and (3) are management problems. This is what YOU can, and should, do:
(1) Immediately implement monitoring for the archive log destination space.
Oracle Enterprise Manager (OEM) is very good, but check whether its monitoring functionality requires an extra license.
Otherwise, any simple script that sends an email or SMS is better than nothing.
(2) Immediately implement monitoring of the alert log.
Same comments as above.
A database should not be declared "production" or "live" before this is in place.
(3) Implement automated archiving of archived logs to tape.
Note: there is a long way between "installing and starting" a database and "operating" it.
(4) Purchase books about ITIL and read the chapters on "Incident Management" and "Problem Management" (which, of course, requires recording/documenting each incident and problem).
Do that yourself (even a spreadsheet is better than nothing). Later you might convince your manager that these processes should be introduced officially. I prefer "proving that this can be done by doing it myself" to endless academic discussions about when this "major project" with "significant cost for a tool" will be started... (A SOX audit might give that an immediate kick.)
(5) Include the archived log space in your capacity planning activities and ensure that, for standard operations, there is sufficient reserve depending on the archiving-to-tape interval. (Of course, that requires some statistical analysis of historic data plus an understanding of load situations.)
(6) Then there is still the situation where your archive destination fills up because of some "unusual" activity, e.g. someone "repairing" data by updating most rows in a 100-million-row table, or similar activity while applying an application change (e.g. adding a column with "not null default 0" to a 100-million-row table). You need to address this with an awareness workshop for developers and application support staff, and don't forget to invite the change manager.
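For point (1), here is a minimal sketch of the "simple script" idea, assuming Python is available on the server and that the hypothetical ARCHIVE_DEST and MIN_FREE_GB values are replaced with your own archive destination and threshold. It only prints the result; wiring it to email (e.g. Python's smtplib) or SMS and scheduling it (Task Scheduler on Windows) is left to you.

```python
import shutil
from typing import Optional

# Hypothetical settings: point ARCHIVE_DEST at your archive log destination.
ARCHIVE_DEST = "."
MIN_FREE_GB = 2.0


def check_archive_space(dest: str, min_free_gb: float) -> Optional[str]:
    """Return an alert message if free space on dest's volume is below min_free_gb, else None."""
    free_gb = shutil.disk_usage(dest).free / 1024**3
    if free_gb < min_free_gb:
        return (f"ALERT: only {free_gb:.2f} GB free on {dest} "
                f"(threshold {min_free_gb} GB)")
    return None


if __name__ == "__main__":
    message = check_archive_space(ARCHIVE_DEST, MIN_FREE_GB)
    # Replace this print with an email/SMS call in a real deployment.
    print(message or f"OK: {ARCHIVE_DEST} has at least {MIN_FREE_GB} GB free")
```

Pick the threshold so that it fires well before the disk is full, leaving time for someone to react before the archiver hangs.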
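For point (2), a similar sketch for the alert log. It reads only what was appended since the last run, tracked as a byte offset that you would need to persist between runs (e.g. in a small state file), and returns lines containing ORA- errors or the O/S-Error text seen in this thread. The file path and the patterns are assumptions; adjust them for your own alert log.

```python
import os
import re
from typing import List, Tuple

# Patterns worth alerting on: any ORA-nnnnn error plus the O/S-Error lines
# seen in this thread. Extend as needed.
ALERT_PATTERN = re.compile(r"ORA-\d{5}|O/S-Error")


def scan_alert_log(path: str, offset: int = 0) -> Tuple[int, List[str]]:
    """Scan the alert log from a byte offset.

    Returns (new_offset, matching_lines). If the file is now smaller than
    the saved offset (e.g. it was renamed and recreated), start again from 0.
    """
    if offset > os.path.getsize(path):
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a half-written last line is read next time.
    data = data[: data.rfind(b"\n") + 1]
    text = data.decode("utf-8", errors="replace")
    hits = [line for line in text.splitlines() if ALERT_PATTERN.search(line)]
    return offset + len(data), hits
```

Any non-empty list of hits can then be passed to the same email/SMS hook as the space check.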
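For point (5), one simple way to size the reserve, sketched under my own assumptions: take the archived-log volume per day (v$archived_log has BLOCKS, BLOCK_SIZE and COMPLETION_TIME columns you can sum per day, though only for as long as the control file keeps those records), find the worst run of consecutive days equal to your tape interval, and add a safety margin. The 1.5 safety factor is an arbitrary placeholder, not a recommendation.

```python
from typing import Sequence


def required_archive_space_gb(daily_gb: Sequence[float], interval_days: int,
                              safety_factor: float = 1.5) -> float:
    """Worst archived-log volume over any interval_days consecutive days, times a margin."""
    if not 1 <= interval_days <= len(daily_gb):
        raise ValueError("interval_days must be between 1 and len(daily_gb)")
    # Sliding-window sum: start with the first window, then shift one day at a time.
    window = sum(daily_gb[:interval_days])
    worst = window
    for i in range(interval_days, len(daily_gb)):
        window += daily_gb[i] - daily_gb[i - interval_days]
        worst = max(worst, window)
    return worst * safety_factor
```

With a week between tape runs you would call required_archive_space_gb(history, 7) and compare the result against the free space on the archive destination. Using the worst window rather than the average is deliberate: month-end runs and bulk updates are exactly what fill the disk.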