[20231012]How to view Unicode-encoded content.txt

Source: 这里教程网  Time: 2026-03-03 19:00:28  Author:

--//Today I was looking at the execution plan of a statement:

SYS@192.168.100.235:1521/orcl> @ dpc dtt2yfpx8jhp4 '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  dtt2yfpx8jhp4, child number 0
-------------------------------------
select *from  lis_sample_operate where barcode=:Barcode and EXECUTE_STATE_TEXT='锁定报告'

Plan hash value: 2118786184

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                          | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                               |        |       |    12 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LIS_SAMPLE_OPERATE            |      1 |   127 |    12   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IX_LIS_SAMPLE_OPERATE_BARCODE |      8 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / LIS_SAMPLE_OPERATE@SEL$1
   2 - SEL$1 / LIS_SAMPLE_OPERATE@SEL$1

Peeked Binds (identified by position):
--------------------------------------
   1 - :1 (VARCHAR2(30), CSID=852): '1801158810'

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("EXECUTE_STATE_TEXT"=U'\9501\5B9A\62A5\544A')
   2 - access("BARCODE"=SYS_OP_C2C(:BARCODE))

--//Note that the filter condition is U'\9501\5B9A\62A5\544A'. Looking at the statement itself, it is easy to see that the
--//string is '锁定报告'.
--//Put another way: given only U'\9501\5B9A\62A5\544A', how do you find out what text it contains?
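One quick way to answer that question outside the database is sketched below in Python. It is only an illustration, not the method this note goes on to use: the function name decode_u_literal is hypothetical, and the sketch assumes that every \XXXX in the literal is one big-endian UTF-16 code unit and that nothing else in the literal needs unescaping. (Inside the database, Oracle's UNISTR function accepts the same backslash notation.)

```python
import re

# Either a \XXXX escape (four hex digits) or any single literal character.
_TOKEN = re.compile(r"\\([0-9A-Fa-f]{4})|(.)", re.DOTALL)


def decode_u_literal(literal: str) -> str:
    """Decode text such as U'\\9501\\5B9A' as shown in a plan's predicate section.

    Assumption: each \\XXXX escape is one UTF-16 code unit, written big-endian.
    """
    # Strip the optional U'...' wrapper if it is present.
    wrapped = re.fullmatch(r"[Uu]'(.*)'", literal, re.DOTALL)
    body = wrapped.group(1) if wrapped else literal

    raw = bytearray()
    for hex4, char in _TOKEN.findall(body):
        if hex4:
            # An escape: append its two bytes exactly as written.
            raw += bytes.fromhex(hex4)
        else:
            # A plain character: append its own UTF-16BE bytes.
            raw += char.encode("utf-16-be")
    # Decoding the whole buffer at once also handles surrogate pairs.
    return raw.decode("utf-16-be")


if __name__ == "__main__":
    print(decode_u_literal(r"U'\9501\5B9A\62A5\544A'"))  # prints 锁定报告
```

Building a byte buffer and decoding it in one step, instead of calling chr() on each escape, keeps characters outside the Basic Multilingual Plane intact when they appear as two consecutive escapes.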
https://smarttechways.com/2023/05/29/nls_nchar_characterset-and-nls_characterset-define-in-oracle/

National Character Set(NLS_NCHAR_CHARACTERSET) defines the encoding of NCHAR, NVARCHAR2, and NCLOB columns and is in 9i and
up consistently Unicode. eg. AL16UTF16

Character Set(NLS_CHARACTERSET) defines the encoding of CHAR, VARCHAR2, LONG, and CLOB columns, these can also be used for
storing Unicode. eg AL32UTF8 or UTF8
--//Note: this does not look quite right to me; the encoding of CLOB columns also seems to be AL16UTF16.

The national Character set (NLS_NCHAR_CHARACTERSET) is used for data stored in NCHAR, NVARCHAR2, and NCLOB datatypes and is
a character set that is defined in addition to the (standard) database character set (NLS_CHARACTERSET), which is used for
CHAR, VARCHAR2, LONG and CLOB datatypes.

SYS@192.168.100.235:1521/orcl> select parameter,value from NLS_DATABASE_PARAMETERS where parameter in ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET');
PARAMETER              VALUE
---------------------- ---------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_CHARACTERSET       ZHS16GBK

--//I went back to one of my earlier tests: [20221012]A brief look at how the nvarchar2 data type is stored.txt
BBED> x /rnxx *kdbr[1]
rowdata[44]                                 @8136
-----------
flag@8136: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@8137: 0x00
cols@8138:    3

col    0[2] @8139: 46
col    1[7] @8142:  0x49  0x5f  0x55  0x53  0x45  0x52  0x31
--//I_USER1
col   2[14] @8150:  0x00  0x49  0x00  0x5f  0x00  0x55  0x00  0x53  0x00  0x45  0x00  0x52  0x00  0x31

--//You can see that in the nvarchar2 encoding every English character gets an extra 0x00 in front of it, so the space used
--//doubles, and the 0x00 comes first.
--//Personally I do not recommend using the nvarchar2 type all over an application. I have mentioned before the likely
--//reason developers do it: once the types had been used inconsistently there was no way to unify them, so every varchar2
--//column ended up as nvarchar2.

--//Using Windows Notepad, write the following content:
a锁定报告a
--//Save it as a.txt and choose Unicode as the encoding.

D:\>xxd -c 20 a.txt
0000000: fffe 6100 0195 9a5b a562 4a54 6100 0d00 0a00       ..a....[.bJTa.....

--//U'\9501\5B9A\62A5\544A'
--//Surprisingly, a is encoded as 6100, the reverse of the earlier test, with 0x00 last. Earlier 锁 showed up as 9501, but
--//here it is 0195.
--//In other words, Notepad's "Unicode" is little endian, while Oracle actually uses the unicode big endian encoding.
--//Oracle's design really is odd!!
--//The leading fffe is a kind of marker (the byte order mark), presumably indicating that the file is unicode-encoded.
--//From that one can guess that a unicode big endian file starts with feff.
--//With this, working out what U'\9501\5B9A\62A5\544A' represents becomes simple.

D:\>c:\windows\system32\echo feff 9501 5B9A 62A5 544A | xxd -r -p > b.txt

--//Now open b.txt in Notepad and the text is displayed. To view it with vim, proceed as follows:
--//create an empty file, apply the gvim settings below, then read in b.txt and its content can be viewed:
set bomb
set fileencodings=ucs-bom,utf-8,cp936,big5,latin1
set encoding=utf-8
:r b.txt

SYS@192.168.100.235:1521/orcl> select * from V$TRANSPORTABLE_PLATFORM;
PLATFORM_ID PLATFORM_NAME                     ENDIAN_FORMAT      CON_ID
----------- --------------------------------- -------------- ----------
          1 Solaris[tm] OE (32-bit)           Big                     0
          2 Solaris[tm] OE (64-bit)           Big                     0
          7 Microsoft Windows IA (32-bit)     Little                  0
         10 Linux IA (32-bit)                 Little                  0
          6 AIX-Based Systems (64-bit)        Big                     0
          3 HP-UX (64-bit)                    Big                     0
          5 HP Tru64 UNIX                     Little                  0
          4 HP-UX IA (64-bit)                 Big                     0
         11 Linux IA (64-bit)                 Little                  0
         15 HP Open VMS                       Little                  0
          8 Microsoft Windows IA (64-bit)     Little                  0
          9 IBM zSeries Based Linux           Big                     0
         13 Linux x86 64-bit                  Little                  0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         16 Apple Mac OS                      Big                     0
         12 Microsoft Windows x86 64-bit      Little                  0
         17 Solaris Operating System (x86)    Little                  0
         18 IBM Power Based Linux             Big                     0
         19 HP IA Open VMS                    Little                  0
         20 Solaris Operating System (x86-64) Little                  0
         21 Apple Mac OS (x86-64)             Little                  0
         22 Linux OS (S64)                    Big                     0
21 rows selected.

--//While at it, I tested the encodings offered by Notepad on Windows XP and recorded the results:
--//ANSI                    no leading marker; the file holds the encoded content directly.
--//Unicode (Little)        leading marker fffe
--//Unicode Big endian      leading marker feff
--//UTF-8                   leading marker ef bb bf

--//Output when saved as UTF-8.
D:\>xxd -c 20 a.txt
0000000: efbb bf61 e994 81e5 ae9a e68a a5e5 918a 610d 0a      ...a............a..

--//After deleting 定报告, output when saved as UTF-8.
D:\>xxd -c 20 a.txt
0000000: efbb bf61 e994 8161 0d0a                             ...a...a..

--//This shows that UTF-8 is a variable-length encoding: a -> 61, 锁 -> e9 94 81
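The markers and byte orders recorded above can be reproduced without Notepad. The following Python sketch is an illustration only: the helper names encode_with_bom, decode_by_bom and hexstr are made up for this example, and it covers just the three marked encodings listed in the notes (UTF-8, UTF-16 big endian, UTF-16 little endian).

```python
import codecs

# Byte order marks for the three marked encodings listed in the notes.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # ef bb bf
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # fe ff
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # ff fe
]


def encode_with_bom(text: str, encoding: str) -> bytes:
    """Encode text and prepend the marker that belongs to the encoding."""
    for bom, name in BOMS:
        if name == encoding:
            return bom + text.encode(encoding)
    raise ValueError(f"unsupported encoding: {encoding}")


def decode_by_bom(data: bytes) -> str:
    """Pick the decoder from the leading marker, then decode the rest."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    raise ValueError("no recognised byte order mark")


def hexstr(data: bytes) -> str:
    """Format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    for _, name in BOMS:
        print(f"{name:10} {hexstr(encode_with_bom('a锁', name))}")
    # Same bytes as the echo | xxd -r -p command in the notes.
    print(decode_by_bom(bytes.fromhex("feff95015b9a62a5544a")))
```

For the two-character string a锁 this shows a taking one byte in UTF-8 and 锁 taking three, while both take two bytes in either UTF-16 byte order.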
