[20200808]优化的困惑10.txt

来源:这里教程网 时间:2026-03-03 16:05:04 作者:

[20200808]优化的困惑10.txt --//前一段时间遇到的问题,链接http://www.itpub.net/thread-2137868-1-1.html,CPU使用率100%. --//我下载对方提供的awr报表,顺便说一下.对方提供的awr html文件无法正常跳转.必须使用vim执行如下: --// :%s+http://172.19.11.136/awr/dongguan/DGMESDB-mesdb/202007/mesdb_73613_73614_202007220700_202007220800.html++ --//很容易定位里面的问题,逻辑读太高,消耗大量CPU资源,而里面sql语句使用rule提示. --//但是我开始看时还是遇到1点点疑惑. --//注:数据库版本是10.2.0.5.0,操作系统:windows 64位版本的,实际上我个人建议不要在windows下跑数据库,特别在生产环境,无形增 --//加问题定位的难度. 1.DB Time问题:             Snap Id  Snap Time           Sessions  Cursors/Session Begin Snap: 73613    22-Jul-20 07:00:36  206       24.9 End Snap:   73614    22-Jul-20 08:00:16  212       23.2 Elapsed:             59.66 (mins)                                      DB Time:             1,070.16 (mins)          --//取样1个小时,而DB Time达到1070/60 = 17.83 小时.而实际的CPU数量是8.: Operating System Statistics Statistic               Total AVG_BUSY_TIME           354,972 AVG_IDLE_TIME           2,432 AVG_SYS_TIME            4,388 AVG_USER_TIME           350,487 BUSY_TIME               2,840,695 IDLE_TIME               19,653 SYS_TIME                36,011 USER_TIME               2,804,684 RSRC_MGR_CPU_WAIT_TIME  0 VM_IN_BYTES             -1.5E+19 VM_OUT_BYTES            2.1E+16 PHYSICAL_MEMORY_BYTES   85,862,932,480 NUM_CPUS                8 NUM_CPU_CORES           8 --//实际的CPU数量仅仅8个,感觉不大可能DB time大于8小时. --//再回过头看. Top 5 Timed Events Event                           Waits    Time(s)  Avg Wait(ms) % Total Call Time  Wait Class CPU time                                 29,020                45.2                 latch: cache buffers chains     101,977  4,455    44           6.9                Concurrency db file sequential read         261,308  3,864    15           6.0                User I/O log file sync                   124,032  3,540    29           5.5                Commit direct path read temp           38,162   1,321    35           2.1                User I/O --//注意观察列 Total Call Time,CPU time仅仅占45.2. 这些前5名相加仅仅45.2+6.9+6.0+5.5+2.1 = 65.7. --//还有什么其它事件占3X%呢? --//另外latch: cache buffers chains 的Avg Wait(ms) 达到 44ms,真心觉得有一些夸张...CPU已经消耗殆尽了. --//顺便说一下,我以前学习oracle一直认为CPU资源很难消耗100%.除非人为"破坏".直到随着学习的深入... 2.继续: --//再看SQL ordered by Elapsed Time部分. SQL ordered by Elapsed Time     Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.     % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100     Total DB Time (s): 64,209     Captured SQL account for 55.0% of Total --//注意Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code. Elapsed Time (s)  CPU Time (s)  Executions  Elap per Exec (s)  % Total DB Time  SQL Id         SQL Module  SQL Text 6,346             2,655         164,771     0.04               9.88             gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=... 6,110             2,551         164,791     0.04               9.52             0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR... 5,389             2,601         164,835     0.03               8.39             8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=... 5,316             2,563         164,829     0.03               8.28             6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA... --//一定是sql_id=gyc05t01rz49h里面调用sql_id=0tpg5r73s1146语句. --//sql_id=gyc05t01rz49h格式化如下: DECLARE    p_owner         VARCHAR (30) := :owner;    p_object_name   VARCHAR (30) := :object_name;    p_proc_name     VARCHAR (30) := :procedure_name;    p_overload      NUMBER := :overload;    p_object_type   VARCHAR (19);    p_count         NUMBER; BEGIN    IF p_owner IS NULL    THEN       BEGIN          SELECT /*+ RULE */                object_type, owner            INTO p_object_type, p_owner            FROM sys.all_objects           WHERE     owner = SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')                 AND object_type IN                        ('PROCEDURE'                        ,'FUNCTION'                        ,'PACKAGE'                        ,'PACKAGE BODY'                        ,'SYNONYM')                 AND object_name = p_object_name                 AND ROWNUM <= 1;       EXCEPTION          WHEN NO_DATA_FOUND          THEN             BEGIN                SELECT /*+ RULE */                      object_type, owner                  INTO p_object_type, p_owner                  FROM sys.all_objects                 WHERE     owner = 'PUBLIC'                       AND object_type IN                              ('PROCEDURE'                              ,'FUNCTION'                              ,'PACKAGE'                              ,'PACKAGE BODY'                              ,'SYNONYM')                       AND object_name = p_object_name                       AND ROWNUM <= 1;             EXCEPTION                WHEN NO_DATA_FOUND                THEN                   IF p_proc_name IS NOT NULL                   THEN                      p_owner := p_object_name;                      p_object_name := p_proc_name;                      p_proc_name := NULL;                   ELSE                      RAISE;                   END IF;             END;       END;    END IF;    IF p_object_type IS NULL    THEN       SELECT /*+ RULE */             object_type         INTO p_object_type         FROM sys.all_objects        WHERE     object_type IN                     ('PROCEDURE'                     ,'FUNCTION'                     ,'PACKAGE'                     ,'PACKAGE BODY'                     ,'SYNONYM')              AND owner = p_owner              AND object_name = p_object_name              AND ROWNUM <= 1;    END IF;    WHILE p_object_type = 'SYNONYM'    LOOP       SELECT /*+ RULE */             table_owner, table_name         INTO p_owner, p_object_name         FROM sys.all_synonyms        WHERE owner = p_owner AND synonym_name = p_object_name;       SELECT /*+ RULE */             object_type         INTO p_object_type         FROM sys.all_objects        WHERE owner = p_owner AND object_name = p_object_name AND ROWNUM <= 1;    END LOOP;    SELECT /*+ RULE */          COUNT (*)      INTO p_count      FROM sys.all_procedures     WHERE     owner = p_owner           AND object_name = p_object_name           AND (   procedure_name = p_proc_name                OR p_proc_name IS NULL AND procedure_name IS NULL);    IF p_count < p_overload    THEN       RAISE NO_DATA_FOUND;    END IF;    :owner := p_owner;    :object_name := p_object_name;    :procedure_name := p_proc_name;    SELECT /*+ RULE */          COUNT (*)      INTO :param_count      FROM sys.all_arguments     WHERE     owner = p_owner           AND (       p_proc_name IS NULL                   AND package_name IS NULL                   AND object_name = p_object_name                OR     p_proc_name IS NOT NULL                   AND package_name = p_object_name                   AND object_name = p_proc_name                   AND NVL (overload, 1) = p_overload)           AND data_type IS NOT NULL; END; --//sql_id = 0tpg5r73s1146. SELECT /*+ RULE */       COUNT (*)   FROM SYS.ALL_ARGUMENTS  WHERE     OWNER = :B4        AND (   :B2 IS NULL AND PACKAGE_NAME IS NULL AND OBJECT_NAME = :B3             OR     :B2 IS NOT NULL                AND PACKAGE_NAME = :B3                AND OBJECT_NAME = :B2                AND NVL (OVERLOAD, 1) = :B1)        AND DATA_TYPE IS NOT NULL; --//仔细比较可以发现sql_id = 0tpg5r73s1146正是前面存储过程最后要执行的语句. --//我有点奇怪的是使用rule提示.这些存储过程是开发写的吗?还是程序后台自动生成执行的,这个也是我的疑问. --//这样就很好理解我为什么看到db time很高了,db time的执行时间被重复计算了. --//再看SQL ordered by Gets部分. SQL ordered by Gets Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code. Total Buffer Gets: 8,982,272,506 Captured SQL account for 29.5% of Total Buffer Gets  Executions  Gets per Exec  %Total  CPU Time (s)  Elapsed Time (s)  SQL Id                 SQL Module  SQL Text 945,581,260  164,771     5,738.76       10.53   2655.28       6345.51           gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=... 940,649,793  164,835     5,706.61       10.47   2600.58       5389.32           8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=... 940,453,081  164,791     5,706.94       10.47   2550.73       6109.69           0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR... 940,296,517  164,829     5,704.68       10.47   2562.72       5316.13           6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA... --//可以相互验证.其实每次并不是很多,仅仅57XX 逻辑读,但是执行频率非常高. --//164771/3600 = 45.76次/秒. 3.解决方案: --//设置 alter system set "_optimizer_ignore_hints"=true看看. SYS@test> alter system set "_optimizer_ignore_hints"=true scope=memory; System altered. --//也可以在session 设置"_optimizer_ignore_hints"=true;参数. 通过登陆触发器设置. --//分析数据字典以及固定对象. execute sys.dbms_stats.GATHER_FIXED_OBJECTS_STATS() execute sys.dbms_stats.GATHER_DICTIONARY_STATS() --//可惜提出解决方案后,楼主一直没给出反馈建议.也不知道我分析是否正确. --//还有1个非常无赖的方案就是玩一点点欺骗,改名视图sys.all_arguments,然后建立一张真实的表代替. --//sys.all_arguments里面实际上记录调用函数,存储过程的参数. --//还有我记忆遇到程序大量调用SYS.ALL_ARGUMENTS是使用deliph程序开发的. --//不过我以前遇到的情况不同,执行语句里面没owner,而是直接写ALL_ARGUMENTS,这样更加简单,我直接在用户的schema下建议一张 --//ALL_ARGUMENTS表,来欺骗应用程序,风险就是如果改存储过程带入参数发生变化,ALL_ARGUMENTS也要同步更新.不知道不更新情况会怎 --//样?? --//当然解决完上述问题,还可能需要优化剩下的sql语句,不过lz没反馈,只能分析到这里了.比如: a4g0jrkkr3ggn --//格式化:   SELECT COUNT (*) AS fail_QTY, T1.WORK_ORDER, T2.DEFECT_DESC     FROM (SELECT a.recid                 ,a.DEFECT_id                 ,A.WORK_ORDER                 ,a.SERIAL_NUMBER                 ,a.rec_time                 ,B.Defect_Code                 ,ROW_NUMBER () OVER ( PARTITION BY A.WORK_ORDER, A.SERIAL_NUMBER ORDER BY A.rec_time ASC, B.Defect_Code DESC)                     AS rowindex             FROM sajet.G_SN_DEFECT a, sajet.sys_defect B            WHERE     A.PROCESS_ID = :ProcessId                  AND A.WORK_ORDER = :WO                  AND a.defect_id = B.defect_id                  AND a.SERIAL_NUMBER IN                         (SELECT SERIAL_NUMBER                            FROM (SELECT A.WORK_ORDER                                        ,a.SERIAL_NUMBER                                        ,a.CURRENT_STATUS                                        ,a.OUT_PROCESS_TIME                                        ,RANK () OVER ( PARTITION BY A.WORK_ORDER ,A.SERIAL_NUMBER ORDER BY a.OUT_PROCESS_TIME)                                            AS rowindex                                    FROM sajet.G_SN_TRAVEL a                                   WHERE     A.WORK_ORDER = :WO                                         AND A.PROCESS_ID = :ProcessId) b                           WHERE     b.CURRENT_STATUS = 1                                 AND b.rowindex = 1                                 AND TO_CHAR ( b.OUT_PROCESS_TIME ,'yyyymmddHH24') BETWEEN :STime AND :ETime)) T1         ,sajet.SYS_DEFECT T2    WHERE T1.rowindex = 1 AND T1.DEFECT_id = T2.DEFECT_id GROUP BY T1.WORK_ORDER, T2.DEFECT_DESC ORDER BY COUNT (*) DESC; --//使用TO_CHAR函数,以及内层IN使用分析函数是否多余,理论讲只要exists应该就可以了.看看是否有优化余地. 4.继续分析: --//看看Load Profile部分: Load Profile                   Per Second    Per Transaction Redo size:        338,989.88    2,438.14 Logical reads:    2,509,093.86  18,046.33 Block changes:    3,404.38      24.49 Physical reads:   284.63        2.05 Physical writes:  257.32        1.85 User calls:       975.01        7.01 Parses:           925.49        6.66 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hard parses:      33.63         0.24 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sorts:            940.09        6.76 Logons:           0.64          0.00 Executes:         10,063.58     72.38 Transactions:     139.04                            --//硬分析不是很大.但是在SQL ordered by CPU Time部分出现: CPU Time (s)  Elapsed Time (s)  Executions  CPU per Exec (s)  % Total  % Total DB Time  SQL Id           SQL Module     SQL Text 2,655         6,346             164,771     0.02              9.15     9.88             gyc05t01rz49h  TGS2.exe       declare p_owner varchar(30) :=... 2,601         5,389             164,835     0.02              8.96     8.39             8cd1y3p3rnsys  TGS2.exe       declare p_owner varchar(30) :=... 2,563         5,316             164,829     0.02              8.83     8.28             6x6aknt5p0uqd  TGS2.exe       SELECT /*+ RULE */ ARGUMENT_NA... 2,551         6,110             164,791     0.02              8.79     9.52             0tpg5r73s1146  TGS2.exe       SELECT /*+ RULE */ COUNT(*) FR... 848           984               30          28.26             2.92     1.53             67086ft5pmkbn                 DECLARE job BINARY_INTEGER := ... 582           1,174             6,607       0.09              2.01     1.83             04t2q2nf3pd4b  APService.exe  begin SAJET.SJ_CKRT_INPUTMATER... 579           1,165             6,607       0.09              1.99     1.81             cca7qc96c2k2h  APService.exe  SELECT COUNT (DISTINCT STATION... 293           736               128,775     0.00              1.01     1.15             7dupqruy76j0u  TGS2.exe       BEGIN COMMAND_CODE(:TSAJET1, :... 213           380               690         0.31              0.73     0.59             gdb3vjjk0m4xs  APService.exe  select max(END_TIME) from SAJE... 211           392               690         0.31              0.73     0.61             88jc1wztmqbky  APService.exe  select min(END_TIME) from SAJE... --//sql_id=gdb3vjjk0m4xs gdb3vjjk0m4xs     select max(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075 SELECT max(END_TIME)   FROM SAJET.G_WH_HR A  WHERE A.WORK_ORDER = '20405946'    AND PROCESS_ID   = '100085'    AND SHIFT_ID IN ( 10000017)    AND SHIFT_DATE   = '2020/07/21'    AND PDLINE_ID    = 10075 --//没有使用绑定变量,应该可以有优化余地. 88jc1wztmqbky     select min(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075 SELECT min(END_TIME)   FROM SAJET.G_WH_HR A  WHERE A.WORK_ORDER = '20405946'    AND PROCESS_ID   = '100085'    AND SHIFT_ID IN ( 10000017)    AND SHIFT_DATE   = '2020/07/21'    AND PDLINE_ID    = 10075 --//没有使用绑定变量,应该可以有优化余地.我估计程序应该有很多类似的语句.awr报表仅仅出现2次.

相关推荐