During a moderately heavy load, the database wrote the "Failed to archive log" message to the alert log. I understand that this is an informational message and can be ignored. The issue is that around the time this message appeared in the alert log, the database seemed to hang: logins took up to 30 seconds and active sessions stalled. I think the problem is I/O contention between the redo logs and the archive log processes, as they are on the same physical drives. The Log File Sync wait event is relatively high compared to the other wait events (~400 ms/sec at peak).
Can anyone confirm my theory or offer an alternative explanation for this behavior? Thanks.
The key to understanding what is delaying 'log file sync' is to compare average times waited for 'log file sync' and 'log file parallel write':
* If they are nearly the same, then redo logfile I/O is causing the delay and the guidelines for tuning it should be followed.
* If 'log file parallel write' is significantly smaller, then the delay is caused by other parts of the redo logging mechanism that run during a COMMIT/ROLLBACK and are not I/O-related. Sometimes there will be latch contention on the redo latches, evidenced by 'latch free' or 'LGWR wait for redo copy' wait events.
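
To make the comparison above concrete, here is a rough Python sketch of the decision rule. It assumes you have already collected the average wait for each event (for example from V$SYSTEM_EVENT or an AWR/Statspack report) and converted both to milliseconds. The function name `diagnose_log_file_sync` and the 0.8 cutoff for "nearly the same" are illustrative assumptions, not Oracle-defined values; adjust the cutoff to your own judgment.

```python
def diagnose_log_file_sync(avg_sync_ms, avg_write_ms, io_ratio=0.8):
    """Classify the likely cause of high 'log file sync' waits.

    avg_sync_ms  -- average wait for 'log file sync', in milliseconds
    avg_write_ms -- average wait for 'log file parallel write', in milliseconds
    io_ratio     -- hypothetical cutoff: if the write time is at least this
                    fraction of the sync time, the two count as "nearly the same"
    """
    if avg_sync_ms <= 0:
        raise ValueError("avg_sync_ms must be positive")

    # Share of each commit wait that is spent in the physical redo write.
    ratio = avg_write_ms / avg_sync_ms

    if ratio >= io_ratio:
        # The write time accounts for most of the sync time.
        return "redo log I/O"
    # The write is fast, so the time is lost elsewhere in redo logging.
    return "non-I/O redo overhead"


if __name__ == "__main__":
    # Made-up sample figures, only to show the two branches.
    print(diagnose_log_file_sync(10.0, 9.5))
    print(diagnose_log_file_sync(10.0, 2.0))
```

In the first sample call the write takes almost the whole sync time, which points at the redo log disks; in the second it takes only a fifth, which points at latch contention or other non-I/O overhead.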