mmap maps file data that has been loaded into the kernel-space cache buffer (the page cache) directly into virtual addresses of a user-space process, so the application can read and write the data without it being copied into user space. It is generally considered to improve I/O performance.
Running the command
cat /proc/<pid>/maps
shows the memory mappings of a given process. In the example below, the third line shows that the contents of the file test2.csv are mapped into that process's virtual address range 43376000-76e17000.

00010000-00011000 r-xp 00000000 b3:07 393579     /home/pi/Downloads/mytest
00020000-00021000 rw-p 00000000 b3:07 393579     /home/pi/Downloads/mytest
43376000-76e17000 r--p 00000000 b3:07 393560     /home/pi/Downloads/test2.csv
76e17000-76f42000 r-xp 00000000 b3:07 402094     /lib/arm-linux-gnueabihf/libc-2.19.so
76f42000-76f52000 ---p 0012b000 b3:07 402094     /lib/arm-linux-gnueabihf/libc-2.19.so
76f52000-76f54000 r--p 0012b000 b3:07 402094     /lib/arm-linux-gnueabihf/libc-2.19.so
76f54000-76f55000 rw-p 0012d000 b3:07 402094     /lib/arm-linux-gnueabihf/libc-2.19.so
76f55000-76f58000 rw-p 00000000 00:00 0
76f6c000-76f71000 r-xp 00000000 b3:07 790330     /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f71000-76f80000 ---p 00005000 b3:07 790330     /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f80000-76f81000 rw-p 00004000 b3:07 790330     /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f81000-76fa1000 r-xp 00000000 b3:07 400332     /lib/arm-linux-gnueabihf/ld-2.19.so
76fab000-76fb0000 rw-p 00000000 00:00 0
76fb0000-76fb1000 r--p 0001f000 b3:07 400332     /lib/arm-linux-gnueabihf/ld-2.19.so
76fb1000-76fb2000 rw-p 00020000 b3:07 400332     /lib/arm-linux-gnueabihf/ld-2.19.so
7eb79000-7eb9a000 rwxp 00000000 00:00 0          [stack]
7ed8e000-7ed8f000 r-xp 00000000 00:00 0          [sigpage]
7ed8f000-7ed90000 r--p 00000000 00:00 0          [vvar]
7ed90000-7ed91000 r-xp 00000000 00:00 0          [vdso]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
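To see such an entry appear, one can map a file from Java and inspect the process's maps while it is still running. This is a minimal sketch of my own, not part of the test program below; the class name and the 60-second pause are arbitrary choices for illustration.

// MapsDemo.java - map a file read-only and pause, so /proc/<pid>/maps can be
// inspected from another terminal (e.g. find the pid with: pgrep -f MapsDemo)
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class MapsDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r");
             FileChannel fc = f.getChannel()) {
            // map the file read-only (a single map() call is capped at
            // Integer.MAX_VALUE bytes); a mapping of args[0] should now show
            // up among this process's entries in /proc/<pid>/maps
            MappedByteBuffer buf =
                    fc.map(MapMode.READ_ONLY, 0, Math.min(fc.size(), Integer.MAX_VALUE));
            System.out.println("mapped " + buf.limit() + " bytes of " + args[0]);
            Thread.sleep(60_000);   // keep the mapping alive for inspection
        }
    }
}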
System performance, however, is shaped by many interacting factors, and mmap does not always help; the system should be considered as a whole.
Here a simple Java program is used to test reading large files with mmap, and the results are compared across different system conditions.
Test program snippet:
// reading with memory mapped file...
private static int mapped(){
    int c = 0;
    log("mapped file processing...");
    try {
        RandomAccessFile f = new RandomAccessFile(filename, "r");
        long pos = 0, len = f.length();
        FileChannel fc = f.getChannel();
        while (len > 0){
            MappedByteBuffer buff = fc.map(MapMode.READ_ONLY, pos,
                    (len > Integer.MAX_VALUE) ? Integer.MAX_VALUE : len);
            log("%d, %d, %d", buff.remaining(), buff.position(), buff.limit());
            while (buff.hasRemaining()){
                byte b = buff.get();
                if (b == '\n') c++;
            }
            len -= buff.limit();
            pos += buff.limit();
        }
        fc.close();
        f.close();
    } catch (IOException e){
        e.printStackTrace();
    }
    return c;
}

// reading with file stream...
private static int normal(){
    int c = 0;
    log("standard io processing...");
    FileReader fr;
    try {
        fr = new FileReader(filename);
        BufferedReader br = new BufferedReader(fr);
        String r = br.readLine();
        while (r != null){
            ++c;
            r = br.readLine();
        }
        br.close();
        fr.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return c;
}
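The snippet refers to a filename field and a log(...) helper that are not shown. The harness below is my own sketch of how the snippet might be driven (the class name, argument handling and helper definitions are assumptions, not taken from the original program):

// Hypothetical harness for the two methods above (not from the original post);
// mapped() and normal() from the snippet are meant to be pasted into this class.
import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class BigFileReadTest {
    private static String filename;                  // path of the test file

    // printf-style logger used by both read methods
    private static void log(String fmt, Object... args) {
        System.out.printf(fmt + "%n", args);
    }

    public static void main(String[] args) {
        filename = args[0];
        boolean useMmap = args.length > 1 && "mapped".equals(args[1]);
        long start = System.currentTimeMillis();
        int lines = useMmap ? mapped() : normal();   // methods from the snippet above
        log("%d lines in %d ms", lines, System.currentTimeMillis() - start);
    }

    /* ... mapped() and normal() from the listing above go here ... */
}

Run as, for example, java BigFileReadTest test2.csv mapped for the memory mapped path, or without the second argument for the file stream path.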
Three test system environments were prepared:
Test environment A: Raspberry Pi 3B, Raspbian, Oracle JDK, 1GB RAM, 32GB microSD
Test environment B: Rock64, Armbian, OpenJDK, 4GB RAM, 64GB eMMC
Test environment C: Rock64, Armbian, OpenJDK, 4GB RAM, 32GB microSD
With three files of different sizes, the following test sets are defined:
T1 : test file size 866782000 bytes, read with file stream
T2 : test file size 1733564000 bytes, read with file stream
T3 : test file size 2783650000 bytes, read with file stream
T1': test file size 866782000 bytes, read with memory mapped file
T2': test file size 1733564000 bytes, read with memory mapped file
T3': test file size 2783650000 bytes, read with memory mapped file
While the test program runs, vmstat is used to collect data:
vmstat -t -n 1 | awk -Winteractive 'BEGIN {OFS=","} NR==2 {print $18,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16} NR>2 {print $19,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16}'
If vmstat does not support the -t option, the time column can be obtained with this command instead:
vmstat -n 1 | awk -Winteractive 'BEGIN {OFS=","} NR==2 {print "time",$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16} NR>2 {cmd="date +%H:%M:%S"; cmd | getline t; print t,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16; close(cmd)}'
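The avg_* rows in the table below are per-second averages of the corresponding vmstat columns over each run. This is a minimal sketch of that aggregation (my own; the post does not show how the averages were computed), assuming the 12-column CSV produced by the awk commands above, in which bi, us, sy and wa are the 8th, 10th, 11th and 12th columns:

// VmstatAvg.java - average the bi/us/sy/wa columns of the collected vmstat CSV.
// Assumed column order (from the awk above): time,swpd,free,buff,cache,si,so,bi,bo,us,sy,wa
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class VmstatAvg {
    public static void main(String[] args) throws IOException {
        double bi = 0, us = 0, sy = 0, wa = 0;
        int n = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line = br.readLine();              // skip the header row
            while ((line = br.readLine()) != null) {
                String[] f = line.split(",");
                bi += Double.parseDouble(f[7]);       // bi (vmstat $9)
                us += Double.parseDouble(f[9]);       // us (vmstat $13)
                sy += Double.parseDouble(f[10]);      // sy (vmstat $14)
                wa += Double.parseDouble(f[11]);      // wa (vmstat $16)
                n++;
            }
        }
        System.out.printf("samples=%d avg_bi=%.2f avg_us=%.2f avg_sy=%.2f avg_wa=%.2f%n",
                n, bi / n, us / n, sy / n, wa / n);
    }
}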
The data collected from the test runs are summarized below:
                T1          T2          T3          T1'         T2'         T3'
filesize        866782000   1733564000  2783650000  866782000   1733564000  2783650000
A(RPi)    --------- ---------- ---------- --------- ---------- ----------
sec             62          123         NA          62          122         NA
avg_bi          13714.90    13882.83    NA          13769.10    13959.90    NA
avg_us          20.21       20.32       NA          23.84       24.14       NA
avg_sy          2.47        2.41        NA          1.68        1.56        NA
avg_wa          3.19        3.65        NA          0.45        0.62        NA
B(Rock64) --------- ---------- ---------- --------- ---------- ----------
sec             13          24          54          9           17          27
avg_bi          60465.14    67717.44    51339.10    94052       99588.10    97086
avg_us          24.79       25.24       25.83       20.22       19.12       20.82
avg_sy          7.07        7.48        5.83        6.44        6.59        6.43
avg_wa          0.29        0.16        0.11        5.56        5.35        4.32
C(Rock64) --------- ---------- ---------- --------- ---------- ----------
sec             39          76          123         39          77          123
avg_bi          21566.97    22056.26    22081.50    21829.54    21882.49    22096.29
avg_us          7.13        6.63        10.68       3.46        3.34        3.91
avg_sy          1.85        1.95        1.90        1.05        1.00        1.03
avg_wa          17.31       17.71       12.97       20.97       20.77       20.31
========================================================================
                T1'-T1      T2'-T2      T3'-T3
A(RPi)    --------- --------- ---------
sec             0           -1(0.8%)    NA
us              +3.65       +3.82       NA
sy              -0.79       -0.85       NA
wa              -2.74       -3.03       NA
B(Rock64) --------- --------- ---------
sec             -4(30.7%)   -7(29.1%)   -27(50%)
us              -4.57       -6.12       -5.01
sy              -0.63       -0.89       +0.6
wa              +5.27       +5.19       +4.21
C(Rock64) --------- --------- ---------
sec             0           +1(-1.3%)   0
us              -3.67       -3.29       -6.77
sy              -0.85       -0.95       -0.87
wa              +3.66       +3.06       +7.34
Looking at the numbers, on [Test environment B] reading with a memory mapped file brings the expected overall performance gain: the average block-in rate increases and the CPU usage distribution shifts somewhat.
On [Test environment A] and [Test environment C], by contrast, overall performance and the average block-in rate are essentially unchanged, even though the CPU usage distribution does change.
Comparing [Test environment B] with [Test environment C], both Rock64 boards, the difference lies in the storage: eMMC vs. microSD. Running the command
sysbench --test=fileio --file-test-mode=seqrd run
on each of them shows that sequential file-read throughput tops out at 105.06Mb/sec and 21.799Mb/sec respectively.
----- fileio benchmark on system C: Rock64 with microSD -----
Operations performed:  131072 Read, 0 Write, 0 Other = 131072 Total
Read 2Gb  Written 0b  Total transferred 2Gb  (21.799Mb/sec)
 1395.16 Requests/sec executed

Test execution summary:
    total time:                          93.9476s
    total number of events:              131072
    total time taken by event execution: 93.8404
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.72ms
         max:                                 10.36ms
         approx.  95 percentile:               5.64ms

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   93.8404/0.00

----- fileio benchmark on system B: Rock64 with emmc -----
Operations performed:  131072 Read, 0 Write, 0 Other = 131072 Total
Read 2Gb  Written 0b  Total transferred 2Gb  (105.06Mb/sec)
 6723.53 Requests/sec executed

Test execution summary:
    total time:                          19.4945s
    total number of events:              131072
    total time taken by event execution: 19.3889
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.15ms
         max:                                 13.04ms
         approx.  95 percentile:               1.08ms

Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   19.3889/0.00
On [Test environment B], the file stream runs (T1, T2, T3) leave the block-in rate well below the system's read throughput limit, so the memory mapped file runs (T1', T2', T3') have room to improve performance.
On [Test environment C], the file stream runs (T1, T2, T3) already push the block-in rate close to the system's read throughput limit, so reading with a memory mapped file has no room left to improve; at that point there is probably no remedy other than faster storage hardware.
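A rough cross-check of that reading (assuming vmstat's bi column counts 1024-byte blocks per second, the usual procps convention): on C the runs average about 22,000 blocks/s, roughly 21.5 MB/s, essentially at the 21.799Mb/sec ceiling sysbench measured for the microSD, so no read method can pull data in any faster. On B the file stream runs average 51,000-68,000 blocks/s, roughly 50-66 MB/s, well under the 105.06Mb/sec eMMC ceiling, and the memory mapped runs raise avg_bi to about 94,000-100,000 blocks/s, roughly 92-97 MB/s, approaching that limit.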