That’s the second thing I do when I get a call of the “the system is slow” type.
The first thing I do is run “top” (or whichever alternative for the OS) and check the overall CPU usage. IO wait on system using sar command.
Below sql is very helpful ,
select round(avg(max(cnt_tot)) over (order by sample_time RANGE BETWEEN INTERVAL '5' minute PRECEDING AND current row)) as avg,max(cnt_tot) as tot,
SAMPLE_time,max(cnt) as cnt, event,substr(sq.sql_text,1), ash.sql_id, ash.sql_child_number chd,/* plsql_entry_object_id, plsql_entry_subprogram_id,
plsql_object_id, plsql_subprogram_id, session_state, qc_session_id, qc_instance_id, blocking_session, blocking_session_status, blocking_session_serial#, event_id, event#, seq#, p1text, p1, p2text, p2, p3text, p3, wait_class, wait_class_id, wait_time, time_waited, xid, current_obj#, current_file#, current_block#, program, module, action, client_id*/to_char(round(sum(elapsed_time) / nullif(sum(executions), 0) / 1000000, 6), '9,999,999,990.999999') as "sec p",round(sum(disk_reads) / nullif(sum(executions), 0), 0) as "disk p", round(sum(buffer_gets) / nullif(sum(executions), 0), 0) as "gets p",round(sum(rows_processed) / nullif(sum(executions), 0), 0) as "rows p",round(sum(cpu_time) / 1000000 / nullif(sum(executions), 0), 3) as "cpu p", sum(executions) as exec, sum(users_opening) as open,sum(users_executing) as e
from ( select sum(count(*)) over (partition by sample_time) as cnt_tot, count(*) as cnt,SAMPLE_time, event,sql_id,sql_child_number from gv$active_session_history where 1=1
and sample_time > sysdate - interval '3' hour
group by event, sql_id,sql_child_number, SAMPLE_time having count(*) >= 2) ash, gv$sql sq
where ash.sql_id = sq.sql_id(+) and ash.sql_child_number=sq.child_number (+)
group by event,sql_text,ash.sql_id,ash.sql_child_number, ash.SAMPLE_time order by sample_time desc;
It has two “variables” that you can adjust: how far back to look (I use two hours), and how aggressively to look for problems (having count(*) >= 3).
Here’s the explanations of each column:
AVG — the average "load" (active sessions) over a 5-minute interval. This should help you spot a problem when you scroll through the results.
TOT — total "load" (active sessions) for that sample time. RAC users: each RAC node will have its own sample time, within 1 second of each other, but not exactly spot-on. So, even if you have sessions waiting on the same event, they will not be grouped together. I kind of like it this way, for now.
SAMPLE_TIME — self-explanatory
CNT — the number of active sessions waiting on the same event and query
EVENT — the event been waited on
SQL_TEXT — self-explanatory, except when empty which means either not found in shared pool or not available in ASH
SQL_ID — if you need to find the SQL
CHD — the child number being executed
No comments:
Post a Comment