[20240512]没有建表引发的灾难.txt --//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql --//语句,正常情况下应该报ORA-00942: table or view does not exist,很奇怪的同事反映应用前台根本不报错,真不知道开发如何写代 --//码,难道没做例外处理以及记录吗?导致前台无法应用程序无法执行任何操作,业务停顿接近1个小时. --//在测试环境模拟看看出现这样的情况,看看如何诊断和分析问题. 1.环境: SCOTT@test01p> @ver1 PORT_STRING VERSION BANNER CON_ID -------------------- ---------- ---------------------------------------------------------------------------- ------ IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production 0 $ ./sql_idz.sh 'select count(1) from deptxxx' 0 sql_text = select count(1) from deptxxx\0 full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F hash_value(10) = 2945576271 sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag 2.建立测试环境: $ cat m2.txt set verify off declare v_id number; begin for i in 1 .. &&1 loop select count(1) into v_id from deptxxx; end loop; end ; / quit SCOTT@test01p> @ m2.txt 20 select count(1) into v_id from deptxxx; * ERROR at line 5: ORA-06550: line 5, column 40: PL/SQL: ORA-00942: table or view does not exist ORA-06550: line 5, column 9: PL/SQL: SQL Statement ignored --//不能通过PL/SQL测试,等于无法编译自动中断执行.也就是仅仅调用1次, 3.换一个测试方式: $ cat m2.sh #! /bin/bash sqlplus -s -l scott/btbtms@test01p <<EOF >/dev/null set feedback off select count(1) from deptxxx; / ... --//snap / quit EOF --//主:中间的/太长,我加入20000个/,我的测试环境cygwin有点慢.如果是linux环境,你可以写成$(seq 20000|xargs -IQ echo /). --//这样执行有一个缺点就是先要展开里面/才能执行,这样在测试环境时有点延迟执行,我的测试环境延迟实在是无法忍受. --//我估计在linux执行也存在类似问题,可以尝试改成$(seq 1000|xargs -IQ echo /). --//实在不行还是写成20000个/的形式. --//开始测试,每次并发10个执行. $ ~/bin/zzdate ; seq 100 | xargs -IQ -P 10 bash -c ./m2.sh trunc(sysdate)+19/24+56/1440+51/86400 == 2024/05/12 19:56:51 == "timestamp'2024-05-12 19:56:51'" ... SYS@test> @ ashtop event 1=1 trunc(sysdate)+19/24+56/1440+51/86400 sysdate Total Distinct Distinct Distinct Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1 ------- ------- ------- ----------------------------- ------------------- ------------------- ---------- -------- ----------- 150 3.8 54% | kksfbc child completion 2024-05-12 19:56:56 2024-05-12 19:57:30 1 32 32 69 1.7 25% | library cache lock 2024-05-12 19:56:59 2024-05-12 19:57:30 1 17 17 19 .5 7% | cursor: pin S wait on X 2024-05-12 19:56:57 2024-05-12 19:57:28 1 11 11 19 .5 7% | 2024-05-12 19:56:57 2024-05-12 19:57:30 1 18 18 11 .3 4% | SQL*Net break/reset to client 2024-05-12 19:56:54 2024-05-12 19:57:28 1 10 10 9 .2 3% | library cache: mutex X 2024-05-12 19:56:59 2024-05-12 19:57:29 1 8 8 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 .0 0% | control file sequential read 2024-05-12 19:57:26 2024-05-12 19:57:26 1 1 1 7 rows selected. --//2024/05/12 19:56:51过了8秒以后出现library cache lock,library cache: mutex X. SYS@test> @ ashtop sql_id,event 1=1 trunc(sysdate)+19/24+56/1440+51/86400 sysdate Total Distinct Distinct Distinct Seconds AAS %This SQL_ID EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1 ------- ------- ------- ------------- ----------------------------- ------------------- ------------------- ---------- -------- ----------- 633 4.1 49% | 2utgtqyrt3vag library cache lock 2024-05-12 19:56:59 2024-05-12 19:59:24 1 113 113 411 2.7 32% | 2utgtqyrt3vag kksfbc child completion 2024-05-12 19:56:56 2024-05-12 19:59:23 1 121 121 84 .5 7% | 2utgtqyrt3vag 2024-05-12 19:56:57 2024-05-12 19:59:24 1 77 77 55 .4 4% | 2utgtqyrt3vag cursor: pin S wait on X 2024-05-12 19:56:57 2024-05-12 19:59:24 1 39 39 36 .2 3% | 2utgtqyrt3vag library cache: mutex X 2024-05-12 19:56:59 2024-05-12 19:59:13 1 33 33 35 .2 3% | SQL*Net break/reset to client 2024-05-12 19:56:54 2024-05-12 19:59:24 1 32 32 14 .1 1% | library cache: mutex X 2024-05-12 19:57:08 2024-05-12 19:59:14 1 12 12 7 .0 1% | 2024-05-12 19:57:24 2024-05-12 19:59:17 1 7 7 2 .0 0% | 2utgtqyrt3vag cursor: mutex S 2024-05-12 19:58:23 2024-05-12 19:58:23 1 1 1 2 .0 0% | cbs6jb3jmg9nm 2024-05-12 19:57:31 2024-05-12 19:59:20 2 2 2 2 .0 0% | control file sequential read 2024-05-12 19:57:26 2024-05-12 19:57:32 1 2 2 1 .0 0% | 2utgtqyrt3vag library cache: bucket mutex X 2024-05-12 19:59:24 2024-05-12 19:59:24 1 1 1 12 rows selected. --//sql_id=2utgtqyrt3vag,我前面有计算.执行脚本源代码在最后. SYS@test> @ sql_id 2utgtqyrt3vag --SQL_ID = 2utgtqyrt3vag --//查询不到sql语句,我估计我同事在这里给搞昏了. --//如果当时查询到sql语句,我估计定位很快,不会走许多弯路.... SYS@test> @ sharepool/shp4x 2utgtqyrt3vag 0 TEXT KGLHDADR KGLHDPAR C40 KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0 KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16 N0_6_16 N20 KGLNAHSH KGLOBT03 KGLOBT09 --------------------- ---------------- ---------------- ---------------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- --------- ------- ---------- ------------- -------- .. child handle address 000007FF11A10938 000007FF000E6070 select count(1) from deptxxx 0 0 57 00 00 0 0 3561717 3561717 3561717 2945576271 2utgtqyrt3vag 4374 child handle address 000007FF11F45AD8 000007FF000E6070 select count(1) from deptxxx 0 0 3 00 00 0 0 3561717 3561717 3561717 2945576271 2utgtqyrt3vag 4375 parent handle address 000007FF000E6070 000007FF000E6070 select count(1) from deptxxx 1 0 275975 000007FF11F91680 00 3563000 0 0 3563000 3563000 2945576271 2utgtqyrt3vag 65535 4377 rows selected. --//我的测试产生大量子光标.估计 kksfbc child completion,cursor: pin S wait on X与这个因素有关 SYS@test> @ wcx sysdate-3/86400 sysdate -- Display ASH Wait Chain Signatures script v0.7 by Tanel Poder ( http://blog.tanelpoder.com ) %This SECONDS AAS WAIT_CHAIN FIRST_SEEN LAST_SEEN ------ ------- --- ------------------------------------------------------------------------------------------------------------------------------------------ ------------------- ------------------- 7% 2 .7 -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>181,3323,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>258,12330,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>175,11537,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>97,50538,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>18,48560,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>262,10796,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>98,33150,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 7% 2 .7 -> 179,16416,@1=>176,47654,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58 3% 1 .3 -> ,,@=>175,11537,@1=>kksfbc child completion 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>176,47654,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>98,33150,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>17,23902,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>258,12330,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>97,50538,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> ,,@=>181,3323,@1=>kksfbc child completion 2024-05-12 20:04:59 2024-05-12 20:04:59 3% 1 .3 -> 18,48560,@1=>179,16416,@1=>library cache lock -> ,,@=>18,48560,@1=> 2024-05-12 20:04:59 2024-05-12 20:04:59 19 rows selected. --//出现大量library cache lock阻塞. SYS@test> @ ashtop event 1=1 sysdate-5/86400 sysdate Total Distinct Distinct Distinct Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1 ------- ------- ------- ----------------------- ------------------- ------------------- ---------- -------- ----------- 38 7.6 79% | library cache lock 2024-05-12 20:05:52 2024-05-12 20:05:56 1 5 5 4 .8 8% | library cache: mutex X 2024-05-12 20:05:54 2024-05-12 20:05:55 1 2 2 3 .6 6% | kksfbc child completion 2024-05-12 20:05:52 2024-05-12 20:05:56 1 3 3 3 .6 6% | 2024-05-12 20:05:52 2024-05-12 20:05:56 1 3 3 --// kksfbc child completion 应该与产生大量子光标有关,为什么不是很清楚,生产系统我没有看到,也许不在前30条. SYS@test> @ ashtop sql_id,event,p1,p2,p3 "event like 'library cache%'" sysdate-5/86400 sysdate Total Distinct Distinct Distinct Seconds AAS %This SQL_ID EVENT P1 P2 P3 FIRST_SEEN LAST_SEEN Execs Seen Tstamps Execs Seen1 ------- --- ------- ------------- ---------------------- -------------- -------------- ------- ------------------- ------------------- ---------- -------- ----------- 3 .6 6% | 2utgtqyrt3vag library cache lock 8792101725488 8792090672984 5373954 2024-05-12 20:07:02 2024-05-12 20:07:06 1 3 3 2 .4 4% | 2utgtqyrt3vag library cache lock 8792101725488 8791800107672 5373954 2024-05-12 20:07:04 2024-05-12 20:07:06 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache lock 8792101725488 8791808899128 5373954 2024-05-12 20:07:03 2024-05-12 20:07:05 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache lock 8792101725488 8792090306696 5373954 2024-05-12 20:07:03 2024-05-12 20:07:04 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache lock 8792101725488 8792156248984 5373954 2024-05-12 20:07:04 2024-05-12 20:07:05 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache lock 8792101725488 8792156687152 5373954 2024-05-12 20:07:03 2024-05-12 20:07:05 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache: mutex X 3468241561 0 5374033 2024-05-12 20:07:04 2024-05-12 20:07:05 1 2 2 2 .4 4% | 2utgtqyrt3vag library cache: mutex X 3468241561 768799145984 5374033 2024-05-12 20:07:03 2024-05-12 20:07:03 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791798282960 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791798283256 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791798284016 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791800107376 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791800107376 5373955 2024-05-12 20:07:03 2024-05-12 20:07:03 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791800107968 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791808899720 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8791808900480 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792090307920 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792090308384 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792090671928 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792100505400 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792100505992 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792100506752 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792102509568 5373954 2024-05-12 20:07:03 2024-05-12 20:07:03 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792102509864 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792102510920 5373955 2024-05-12 20:07:05 2024-05-12 20:07:05 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792156249912 5373954 2024-05-12 20:07:03 2024-05-12 20:07:03 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792156250208 5373955 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792156250672 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792156687784 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02 1 1 1 1 .2 2% | 2utgtqyrt3vag library cache lock 8792101725488 8792156688376 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06 1 1 1 30 rows selected. --//记住library cache lock的P1=8792101725488,library cache: mutex X P1=3468241561 --//8792101725488 = 0x7ff1219a530 --//5373954 = /2^16 %2^16 (Type | Mode) = 82,2 = 0x520002 ,82 表示SQL AREA BUILD. --//前面计算的sql语句hash_value(10) = 2945576271,而等待事件的P1,P2,P3并没有与之相关. SYS@test> select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 order by 1; KGLHDNSP KGLHDNSD KGLOBTYD -------------- -------------- ------------- 82 SQL AREA BUILD CURSOR SYS@test> @ ev_name "library cache%" EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS DISPLAY_NAME ------ ---------- --------------------------------- -------------- ------------ ------------------ ------------- ----------- ----------- ---------------------------------- 342 2802704141 library cache pin handle address pin address 100*mode+namespace 3875070507 4 Concurrency library cache pin 343 916468430 library cache lock handle address lock address 100*mode+namespace 3875070507 4 Concurrency library cache lock 344 2952162927 library cache load lock object address lock address 100*mask+namespace 3875070507 4 Concurrency library cache load lock 345 1646780882 library cache: mutex X idn value where 3875070507 4 Concurrency library cache: mutex X 346 2601513493 library cache: bucket mutex X idn value where 3875070507 4 Concurrency library cache: bucket mutex X 347 2382371857 library cache: dependency mutex X idn value where 3875070507 4 Concurrency library cache: dependency mutex X 348 2446268751 library cache: mutex S idn value where 3875070507 4 Concurrency library cache: mutex S 1361 800384622 library cache revalidation 1893977003 0 Other library cache revalidation 1362 1578286316 library cache shutdown 1893977003 0 Other library cache shutdown 9 rows selected. --//library cache lock的P1表示handle address,看看是怎么对象. --//library cache: mutex X的P1表示idn,我以前理解对应hash_vale.可以是sql语句或者某个对象的hash_value. --//Tanel Poder有1篇文档,Oracle_Latch_And_Mutex_Contention_Troubleshooting.pdf library cache: mutex* wait events 1) library cache hash bucket number (if idn <= 131072) 2) idn = hash value of the library cache object under protection (if idn > 131072) --//一点疑问,如果hash value<=131072(2^17)呢? --//library cache lock的P1表示handle address,看看是怎么对象. SYS@test> @ fchaz 0x7ff1219a530 LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR KSMCHPTR_END --- ---------------- -------------- -------------- ---------------- -------------- -------- -------------- ---------------- ----------------- SGA 000007FF1219A500 1 1 KGLHD 856 recr 80 00 000007FF1219A857 --//000007FF1219A500+0x30 = 8792101725488 = 0x7ff1219a530,实际上正好等于@ fchaz 0x7ff1219a530的查询地址. --//如果熟悉以前学习可以知道开始的偏移0x30(48字节)就是handle address,顺便说一下我这里是测试环境, --//执行tpt fcha脚本会报错或者要么很慢,生产系统共享池很大,扫描会很慢.我使用我改写fchaz,取消执行前的警告提示. SYS@test> @ sharepool/shp4 7ff1219a530 0 SYS@test> @ prxx ============================== TEXT : parent handle address KGLHDADR : 000007FF1219A530 KGLHDPAR : 000007FF1219A530 C40 : 2d65f9b7af91ed4f KGLHDLMD : 2 KGLHDPMD : 0 KGLHDIVC : 0 KGLOBHD0 : 00 KGLOBHD6 : 00 KGLOBHS0 : 0 KGLOBHS6 : 0 KGLOBT16 : 0 N0_6_16 : 0 N20 : 0 KGLNAHSH : 3468241561 KGLOBT03 : KGLOBT09 : 0 PL/SQL procedure successfully completed. --//看到的对象是2d65f9b7af91ed4f,而不是DEPTXXX. --//注:如果生产系统执行以上语句可能报错或者很慢,实际上选择hash_value=3468241561,使用shp4x.sql脚本查询,就没有这个问题. --//2d65f9b7af91ed4f ,我检索发现正好对应sql语句的 full_hash_value(16)的后16位. --//取后8位计算 af91ed4f = 2945576271,正好对应sql语句的hash_value之. --//而且KGLNAHSH正好等于library cache: mutex X 的P1=3468241561 --//说明是library cache: mutex X导致大量library cache lock. $ ./sql_idz.sh 'select count(1) from deptxxx' 0 sql_text = select count(1) from deptxxx\0 full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F hash_value(10) = 2945576271 sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag SYS@test> @ lcl display library cache lock problem INST_ID SADDR HANDLE MOD REQ OBJECT SQL_ID HASH_VALUE KGLNAOBJ USER_NAME C50 ------- ---------------- ---------------- ---- ---- -------------------- ------------- -------------- ---------------- --------- -------------------------------------------------- 1 000007FF09EF26B8 000007FF1219A530 2 0 2d65f9b7af91ed4f 1sym80r7bkant 3468241561 2d65f9b7af91ed4f SCOTT alter system kill session '98,36220' immediate; display wait library cache lock SID SERIAL# SPID PID P_SERIAL# EVENT C50 --- -------------- ---- ------- -------------- ------------------ -------------------------------------------------- 264 53720 7040 27 34 library cache lock alter system kill session '264,53720' immediate; 18 58858 2812 64 14 library cache lock alter system kill session '18,58858' immediate; 97 45103 6488 65 9 library cache lock alter system kill session '97,45103' immediate; 176 47333 4044 66 12 library cache lock alter system kill session '176,47333' immediate; 262 33120 6588 67 10 library cache lock alter system kill session '262,33120' immediate; 17 50789 6304 68 9 library cache lock alter system kill session '17,50789' immediate; 98 36220 4768 69 8 library cache lock alter system kill session '98,36220' immediate; 258 12758 2828 71 23 library cache lock alter system kill session '258,12758' immediate; 8 rows selected. SYS@test> @ sql_id 1sym80r7bkant --SQL_ID = 1sym80r7bkant --//还是看不到sql语句.也许这是问题的难点,实际上对象是2d65f9b7af91ed4f. --//kill这些library cache lock进程没有任何用处,因为程序还在跑. --//实际上如果当时查询x$kglob视图,KGLOBT03=2utgtqyrt3vag,就可以获得sql语句,一看就明白问题在那里,重写建立表就ok了. SYS@test> select * from v$db_object_cache where name = '2d65f9b7af91ed4f' 2 @ prxx ============================== OWNER : $BUILD$ NAME : 2d65f9b7af91ed4f DB_LINK : NAMESPACE : SQL AREA BUILD TYPE : CURSOR SHARABLE_MEM : 0 LOADS : 0 EXECUTIONS : 0 LOCKS : 7 PINS : 0 KEPT : NO CHILD_LATCH : 76441 INVALIDATIONS : 0 HASH_VALUE : 3468241561 LOCK_MODE : SHARED PIN_MODE : NONE STATUS : UNKOWN TIMESTAMP : PREVIOUS_TIMESTAMP : LOCKED_TOTAL : 850974 PINNED_TOTAL : 0 PROPERTY : FULL_HASH_VALUE : 53270f034126313b1c7a6805ceb92a99 CON_ID : 3 CON_NAME : TEST01P ADDR : 000007FF1219A530 EDITION : PL/SQL procedure successfully completed. --//NAMESPACE=SQL AREA BUILD,select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob where KGLHDNSP=82 order by 1; --//kglhdnsp=82 , 对应8进制是\01222 --//3468241561 % 2^17 = 76441,对应CHILD_LATCH =76441. --//注意OWNER=$BUILD$,我使用SCOTT计算sql_id不对.使用我以前写sql_idx.sh脚本. $ echo -n -e '2d65f9b7af91ed4f.$BUILD$.TEST01P' > cc.txt $ ./sql_idz.sh cc.txt 1 '\0122\0\0\0' sql_text = 2d65f9b7af91ed4f.$BUILD$.TEST01P\0122\0\0\0 full_hash_value(16) = 53270F034126313B1C7A6805CEB92A99 hash_value(10) = 3468241561 sql_id(32) = 1sym80r7bkant sql_id(32) = 1sym80r7bkant sql_id(32) = 1sym80r7bkant --//基本都能对上前面select * from v$db_object_cache where name = '2d65f9b7af91ed4f'的的输出. --//注:我改写我原来写的脚本,因为cat 文件里面包含chr(0),cat的输出就忽略掉. --//82 ascii码 R,注意使用dbms_sql_translator包计算不用加chr(0),这样对于表少1个chr(0). SELECT dbms_sql_translator.sql_id ('2d65f9b7af91ed4f.$BUILD$.TEST01PR'||chr(0)||chr(0)) c20, dbms_sql_translator.sql_hash ('2d65f9b7af91ed4f.$BUILD$.TEST01PR'||chr(0)||chr(0)) N20 FROM dual; C20 N20 -------------------- ---------- 1sym80r7bkant 3468241561 --//也能对上. 4.总结: --//我已经基本展示生产系统出现的问题,还是不明白为什么这样会产生大量的子光标. --//在分析sql语句时或许与产生大量子光标有关,在对象2d65f9b7af91ed4f上存在争用,NAMESPACE = SQL AREA BUILD,导致 --//出现library cache: mutex X,library cache lock. --//另外我做了测试,如果打开几个回话,手工执行select count(1) from deptxxx;当然不是并发执行,并没有产生多个子光标的情况. --//也许问题就出在密集的执行产生大量子光标上面. --//在测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,甚至在shared pool latch上存在争用, --//因为错误的sql语句每次存在硬解析,实际的情况都不是,而在NAMESPACE = SQL AREA BUILD,建立的对象上存在争用. --//还有一些细节我需要验证,好好消化,写的有点乱... --//我估计同事看到sql_id后查询不到对应sql语句,而不断的kill进程并不会有助于问题解决,在这上面浪费大量时间,而且实际上 --//阻塞的时间都很短. --//实际上只要查询底层的x$kglob的,KGLOBT03=2utgtqyrt3vag,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立. --//实际上还可以查询v$db_object_cache的HASH_VALUE(它可以通过sql_id获得),获得对应sql语句,该视图的底层就是x$kglob. --//贴一段生产系统看到的情况. > @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'" Total Seconds AAS %This EVENT FIRST_SEEN LAST_SEEN --------- ------- ------- ------------------------------------------ ------------------- ------------------- 8164610 913.0 76% library cache lock 2024-04-26 08:23:22 2024-04-26 09:17:44 2587860 289.4 24% library cache: mutex X 2024-04-26 08:23:22 2024-04-26 09:17:44 10 .0 0% library cache pin 2024-04-26 10:00:40 2024-04-26 10:00:40 --//出现问题时间点2024-04-26 08:23:22,到2024-04-26 09:17:44问题解决.后面library cache pin仅仅出现1次,可以不用考虑. > @ dashtop event,sql_id "event='library cache lock'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 08:30:00'" Total Seconds AAS %This EVENT SQL_ID FIRST_SEEN LAST_SEEN --------- ------- ------- ------------------ ------------- ------------------- ------------------- 554750 318.3 96% library cache lock arw6k2rf5s29b 2024-04-26 08:23:22 2024-04-26 08:29:55 13880 8.0 2% library cache lock bakhcg5b1749p 2024-04-26 08:23:22 2024-04-26 08:29:55 2580 1.5 0% library cache lock ddd4xgabw2tct 2024-04-26 08:23:33 2024-04-26 08:29:55 2440 1.4 0% library cache lock 306sjy7n4m9bk 2024-04-26 08:23:22 2024-04-26 08:29:55 2300 1.3 0% library cache lock av7t0b1v7jn1m 2024-04-26 08:23:22 2024-04-26 08:29:55 440 .3 0% library cache lock 69zcqdq405z1h 2024-04-26 08:25:31 2024-04-26 08:29:55 420 .2 0% library cache lock dgvzbsr848vam 2024-04-26 08:25:09 2024-04-26 08:29:55 410 .2 0% library cache lock bg1t2pc5ym090 2024-04-26 08:24:15 2024-04-26 08:29:55 350 .2 0% library cache lock 0j7bs1g3fg66x 2024-04-26 08:23:33 2024-04-26 08:29:55 310 .2 0% library cache lock 7yhjdqcg8wzny 2024-04-26 08:23:22 2024-04-26 08:29:55 300 .2 0% library cache lock 4f03m8k9vgb6r 2024-04-26 08:23:43 2024-04-26 08:29:55 250 .1 0% library cache lock 9bsd8hfq3cf81 2024-04-26 08:23:54 2024-04-26 08:29:55 200 .1 0% library cache lock 5xw2vsz1868zh 2024-04-26 08:25:31 2024-04-26 08:29:55 20 .0 0% library cache lock 899c5c7ug4dp4 2024-04-26 08:23:22 2024-04-26 08:23:22 14 rows selected. > @ sqlhh arw6k2rf5s29b 100 time unit : millisecond BEGIN_INTERVAL_TIME INST_ID SQL_ID PLAN_HASH_VALUE EXECUTIONS ELA_MS_PER_EXEC CPU_MS_PER_EXEC ROWS_PER_EXEC LIOS_PER_EXEC BLKRD_PER_EXEC IOW_MS_PER_EXEC AVG_IOW_MS CLW_MS_PER_EXEC APW_MS_PER_EXEC CCW_MS_PER_EXEC ------------------- ---------- ------------- --------------- ---------- --------------- --------------- ------------- ------------- -------------- --------------- ----------- --------------- --------------- --------------- 2024-04-26 11:00:15 1 arw6k2rf5s29b 3673589541 135628 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-26 12:00:17 1 arw6k2rf5s29b 3673589541 125952 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-26 13:00:36 1 arw6k2rf5s29b 3673589541 117259 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-26 14:00:13 1 arw6k2rf5s29b 3673589541 121757 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-26 15:00:15 1 arw6k2rf5s29b 3673589541 131134 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-26 16:00:35 1 arw6k2rf5s29b 3673589541 135313 0 0 0.0 0 0 0 0.0 0 0 0 ... 2024-04-29 02:00:20 1 arw6k2rf5s29b 2199259672 54386 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 03:00:39 1 arw6k2rf5s29b 2199259672 52685 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 04:00:08 1 arw6k2rf5s29b 2199259672 52485 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 05:00:17 1 arw6k2rf5s29b 2199259672 52623 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 06:00:37 1 arw6k2rf5s29b 2199259672 51764 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 07:00:04 1 arw6k2rf5s29b 2199259672 56257 0 0 0.0 0 0 0 0.0 0 0 0 2024-04-29 09:00:35 1 arw6k2rf5s29b 2199259672 127381 0 0 0.0 0 0 0 0.0 0 0 0 47 rows selected. --//可以发现1个细节,我无法解析,9:17恢复系统,很奇怪的是9:17-11:00并没有大量执行.而之后有记录. --//要么就是9,10点执行次数很少,没有写入awr相关历史表中,要么就是这个过程中程序升级回退了. --//最奇怪的涉及查询返回0,我查看相关表现在的情况表中没有任何记录,似乎再2024-04-29 09:00:35之后某个时刻发现问题,不再执行. --//我估计取消了升级与改动,我们团队缺乏必要的交流与沟通,根本不知道开发做了什么. --//另外有1个小技巧可能便于辅助判断,就是执行计划的SQL_PLAN_HASH_VALUE是0. > @ dashtop sql_id,SQL_PLAN_HASH_VALUE "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 09:30:00'" Total Seconds AAS %This SQL_ID SQL_PLAN_HASH_VALUE FIRST_SEEN LAST_SEEN --------- ------- ------- ------------- ------------------- ------------------- ------------------- 9898240 1852.6 92% arw6k2rf5s29b 0 2024-04-26 08:23:22 2024-04-26 09:17:32 409370 76.6 4% bakhcg5b1749p 0 2024-04-26 08:23:22 2024-04-26 09:17:44 81520 15.3 1% ddd4xgabw2tct 0 2024-04-26 08:23:33 2024-04-26 09:17:44 59940 11.2 1% 0j7bs1g3fg66x 0 2024-04-26 08:23:33 2024-04-26 09:17:44 53760 10.1 0% 306sjy7n4m9bk 0 2024-04-26 08:23:22 2024-04-26 09:17:32 40870 7.6 0% dgvzbsr848vam 823207121 2024-04-26 08:24:58 2024-04-26 09:17:44 37100 6.9 0% 69zcqdq405z1h 0 2024-04-26 08:25:20 2024-04-26 09:17:44 36280 6.8 0% 7yhjdqcg8wzny 0 2024-04-26 08:23:22 2024-04-26 09:17:44 32860 6.2 0% bg1t2pc5ym090 3916753348 2024-04-26 08:24:15 2024-04-26 09:17:44 31400 5.9 0% av7t0b1v7jn1m 0 2024-04-26 08:23:22 2024-04-26 09:17:32 17990 3.4 0% fuspz01qnu8hd 0 2024-04-26 08:52:28 2024-04-26 09:17:32 8800 1.6 0% 9bsd8hfq3cf81 0 2024-04-26 08:23:54 2024-04-26 09:17:44 8300 1.6 0% gc3s9xsr2v4td 0 2024-04-26 08:44:24 2024-04-26 09:17:44 7060 1.3 0% c4jmjmy1hsp7t 0 2024-04-26 08:56:21 2024-04-26 09:17:44 4850 .9 0% cyy3tk3dqxpf7 0 2024-04-26 08:33:49 2024-04-26 09:17:32 ... 35 rows selected. --//实际上是SQL_PLAN_HASH_VALUE=0,基本都是表不存在的情况. --//当然我不知道存在一种特殊情况,sql_id记录的是匿名PL/SQL块语句.这样SQL_PLAN_HASH_VALUE=0.我没有测试. 5.相关脚本: --//lcl.sql PROMPT PROMPT display library cache lock problem PROMPT column object format a20 column user_name format a20 SELECT inst_id ,kgllkses saddr ,kgllkhdl handle ,kgllkmod MOD ,kgllkreq REQ ,kglnaobj object ,KGLLKSQLID sql_id ,kglnahsh hash_value ,KGLNAOBJ ,user_name , 'alter system kill session ''' || s.sid || ',' || s.serial# || '''' || ' immediate;' c50 --,lock_a.* FROM x$kgllk lock_a, v$session s WHERE s.saddr = lock_a.kgllkses and kgllkmod > 0 AND EXISTS (SELECT lock_b.kgllkhdl FROM x$kgllk lock_b WHERE kgllkses IN (SELECT saddr FROM v$session WHERE event like 'library cache lock') /* blocked session */ --WHERE event like 'library cache pin') /* blocked session */ AND lock_a.kgllkhdl = lock_b.kgllkhdl AND kgllkreq > 0); PROMPT PROMPT display wait library cache lock PROMPT SELECT s.sid ,s.serial# ,p.spid ,p.pid ,p.serial# p_serial# ,s.event , 'alter system kill session ''' || s.sid || ',' || s.serial# || '''' || ' immediate;' c50 FROM v$session s, v$process p WHERE s.paddr = p.addr AND s.event= 'library cache lock'; --AND s.event= 'library cache pin'; --//shp4.sql column N0_6_16 format 99999999 column fcura_addrlen new_value _fcura_addrlen set termout off select vsize(addr)*2 fcura_addrlen from x$dual; set termout on SELECT DECODE (kglhdadr, kglhdpar, 'parent handle address', 'child handle address') text, kglhdadr, kglhdpar, substr(kglnaobj,1,40) c40, KGLHDLMD, KGLHDPMD, kglhdivc, kglobhd0, kglobhd6, kglobhs0,kglobhs6,kglobt16, kglobhs0+kglobhs6+kglobt16 N0_6_16, kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20, kglnahsh, kglobt03 , kglobt09 FROM x$kglob WHERE KGLHDPAR = lpad(upper('&1'), &_fcura_addrlen, '0') or KGLHDADR = lpad(upper('&1'), &_fcura_addrlen, '0') or KGLOBHD0 = lpad(upper('&1'), &_fcura_addrlen, '0') --or KGLOBHD1 = lpad(upper('&1'), &_fcura_addrlen, '0') --or KGLOBHD2 = lpad(upper('&1'), &_fcura_addrlen, '0') --or KGLOBHD3 = lpad(upper('&1'), &_fcura_addrlen, '0') --or KGLOBHD4 = lpad(upper('&1'), &_fcura_addrlen, '0') --or KGLOBHD5 = lpad(upper('&1'), &_fcura_addrlen, '0') or KGLOBHD6 = lpad(upper('&1'), &_fcura_addrlen, '0') or KGLOBT03 = lower('&1') or KGLNAHSH= &2; --//shp4x.sql column N0_6_16 format 99999999 SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr, kglhdpar, 'parent handle address', 'child handle address') text, kglhdadr, kglhdpar, substr(kglnaobj,1,40) c40, KGLHDLMD, KGLHDPMD, kglhdivc, kglobhd0, kglobhd6, kglobhs0,kglobhs6,kglobt16, kglobhs0+kglobhs6+kglobt16 N0_6_16, kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20, kglnahsh, kglobt03, kglobt09 FROM x$kglob WHERE kglobt03 = lower('&1') or KGLNAHSH= &2; --//sql_idz.sh #! /bin/bash # calcucate sql_text of full_hash_value(16),hash_value(10),sql_id(32). # argv1 sql statement or sql of text file # argv2 flag: 0= sql statement 1=sql of text file for sqlplus 2=sql of text file for other 3=original # argv3 default = '\0' add tailstr odebug=${ODEBUG:-0} oflag=${2:-0} tailstr=${3:-'\0'} if [ $oflag -eq 0 ] then sql_text=${1}${tailstr} fi # sqlplus format sql_text if [ $oflag -eq 1 ] then sql_text="$( cat $1 | sed -e "s/ $//" -e "s/\s*$//" -e '$s/;$//')""${tailstr}" # sql_text="$( cat $1 | unix2dos | sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed -e "s/ $//" -e sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed '$s/;\s*$//')"'\0' fi # other format sql_text if [ $oflag -eq 2 ] then sql_text="$( cat $1 | sed '$s/;\s*$//')""${tailstr}" # sql_text="$( cat $1 | unix2dos | sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed -e "s/ $//" -e '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed '$s/;\s*//')"'\0' fi # exact_matching_signature, force_matching_signature if [ $oflag -eq 3 ] then sql_text=${1} fi v1=$(echo -e -n "$sql_text" | md5sum | sed 's/ -//' | xxd -r -p | od -t x4 | sed -n -e 's/^0\+ //' -e 's/ //gp' | tr 'a-z' 'A-Z') v2=${v1:(-16):16} v3=${v2:(-8):8} # v2=$(echo "obase=16;ibase=16; $v1 % 10000000000000000" | bc| tr -d '\\\r\n') # v3=$(echo "obase=10;ibase=16; $v1 % 100000000" | bc| tr -d '\\\r\n') if [ $odebug -eq 1 ] ; then echo v1=$v1 v2=$v2 v3=$v3 fi echo "sql_text = $sql_text" echo "full_hash_value(16) = $v1 or ${v1,,}" if [ $oflag -eq 3 ] ; then echo "xxxxx_matching_signature(10) = $(( 16#$v2 )) or " $(echo $(( 16#$v2 )) + 2^64|bc ) fi echo "hash_value(10) = $(( 16#$v3 )) " BASE32=($(echo {0..9} {a..z} | tr -d 'eilo')) res='' for i in $(echo "obase=32;ibase=16; $v2" | bc| tr -d '\\\r\n') do res=${res}${BASE32[$(( 10#$i ))]} done echo "sql_id(32) = $(printf "%13s" $res | tr ' ' '0')" echo "sql_id(32) = $(printf "%013s" $res)" res1=$(eval $(echo "obase=32;ibase=16; $v2" | bc| tr -d '\\\r\n' | awk 'BEGIN{RS=" +"; printf "echo " }/./{printf "${BASE32[$(( 10#%02d))]}", $1}' )) echo "sql_id(32) = $(printf "%013s" $res1)"
[20240512]没有建表引发的灾难.txt
来源:这里教程网
时间:2026-03-03 20:01:02
作者:
编辑推荐:
- [20240512]没有建表引发的灾难.txt03-03
- [20240513]ORA-38029 object statistics are locked.txt03-03
- [20240515]建立完善t2sh.sql脚本.txt03-03
- [20240514]建立完善vim bccalc_win.vim插件.txt03-03
- [20240516]建立任意进制转10进制脚本xto10.sql脚本.txt03-03
- [20240516]建立完善vim bccalc_linux.vim插件.txt03-03
- [20240516]关于v$db_object_cache视图.txt03-03
- [20240518]任意进制转换xtoy.sql脚本.txt03-03
下一篇:
相关推荐
-
雷神推出 MIX PRO II 迷你主机:基于 Ultra 200H,玻璃上盖 + ARGB 灯效
2 月 9 日消息,雷神 (THUNDEROBOT) 现已宣布推出基于英
-
制造商 Musnap 推出彩色墨水屏电纸书 Ocean C:支持手写笔、第三方安卓应用
2 月 10 日消息,制造商 Musnap 现已在海外推出一款 Oce
热文推荐
- 数据库管理-第186期 23ai:啥?我还能干掉Neo4j?(20240509)
- rac asm新增磁盘报0RA-15333或ORA-15075
rac asm新增磁盘报0RA-15333或ORA-15075
26-03-03 - Oracle RAC的排障案例一则
Oracle RAC的排障案例一则
26-03-03 - 测试开发新技能:Oracle到高斯数据库的无缝迁移
测试开发新技能:Oracle到高斯数据库的无缝迁移
26-03-03 - 因Oracle 23ai,甲骨文中国罕见的开了个会
因Oracle 23ai,甲骨文中国罕见的开了个会
26-03-03 - 数据库管理-第190期 备份堪比生死(20240515)
数据库管理-第190期 备份堪比生死(20240515)
26-03-03 - 数据库管理-第180期 23ai: Cloud/Container Plus AI(20240503)
- Oracle 23ai新特性—DBMS_DICTIONARY_CHECK
Oracle 23ai新特性—DBMS_DICTIONARY_CHECK
26-03-03 - oracle怎么处理json格式
oracle怎么处理json格式
26-03-03 - Oracle 23ai新特性—DB_DEVELOPER_ROLE
Oracle 23ai新特性—DB_DEVELOPER_ROLE
26-03-03
