With mmap, file data that has been loaded into the kernel-space page cache is mapped directly to virtual addresses in user space, so an application can read and write it without copying it into user-space buffers. This is generally believed to improve I/O performance.
Running the command
cat /proc/<pid>/maps
shows the memory mappings of a process. In the example below, the third line shows that the contents of the file test2.csv are mapped into that process's virtual address range 43376000-76e17000.
00010000-00011000 r-xp 00000000 b3:07 393579   /home/pi/Downloads/mytest
00020000-00021000 rw-p 00000000 b3:07 393579   /home/pi/Downloads/mytest
43376000-76e17000 r--p 00000000 b3:07 393560   /home/pi/Downloads/test2.csv
76e17000-76f42000 r-xp 00000000 b3:07 402094   /lib/arm-linux-gnueabihf/libc-2.19.so
76f42000-76f52000 ---p 0012b000 b3:07 402094   /lib/arm-linux-gnueabihf/libc-2.19.so
76f52000-76f54000 r--p 0012b000 b3:07 402094   /lib/arm-linux-gnueabihf/libc-2.19.so
76f54000-76f55000 rw-p 0012d000 b3:07 402094   /lib/arm-linux-gnueabihf/libc-2.19.so
76f55000-76f58000 rw-p 00000000 00:00 0
76f6c000-76f71000 r-xp 00000000 b3:07 790330   /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f71000-76f80000 ---p 00005000 b3:07 790330   /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f80000-76f81000 rw-p 00004000 b3:07 790330   /usr/lib/arm-linux-gnueabihf/libarmmem.so
76f81000-76fa1000 r-xp 00000000 b3:07 400332   /lib/arm-linux-gnueabihf/ld-2.19.so
76fab000-76fb0000 rw-p 00000000 00:00 0
76fb0000-76fb1000 r--p 0001f000 b3:07 400332   /lib/arm-linux-gnueabihf/ld-2.19.so
76fb1000-76fb2000 rw-p 00020000 b3:07 400332   /lib/arm-linux-gnueabihf/ld-2.19.so
7eb79000-7eb9a000 rwxp 00000000 00:00 0        [stack]
7ed8e000-7ed8f000 r-xp 00000000 00:00 0        [sigpage]
7ed8f000-7ed90000 r--p 00000000 00:00 0        [vvar]
7ed90000-7ed91000 r-xp 00000000 00:00 0        [vdso]
ffff0000-ffff1000 r-xp 00000000 00:00 0        [vectors]
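A quick way to see such a mapping yourself is to map a file and keep the process alive while you inspect its maps file. This is a minimal sketch, not part of the original test program; the class name, command-line argument and 60-second pause are assumptions, and the pid lookup needs Java 9 or later:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class MapAndWait {
    public static void main(String[] args) throws Exception {
        // Map the whole file read-only (assumes the file is under 2 GB,
        // since a single map() call is limited to Integer.MAX_VALUE bytes).
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r");
             FileChannel fc = f.getChannel()) {
            MappedByteBuffer buff = fc.map(MapMode.READ_ONLY, 0, fc.size());
            System.out.println("mapped " + buff.limit() + " bytes; pid = "
                    + ProcessHandle.current().pid());   // pid lookup: Java 9+
            // Keep the process (and the mapping) alive long enough to run:
            //   cat /proc/<pid>/maps
            Thread.sleep(60_000);
        }
    }
}

While it sleeps, running cat /proc/<pid>/maps with the printed pid should show an r--p mapping of the file, like the test2.csv line above.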
However, system performance is shaped by many interacting factors, so mmap does not always help; it should be evaluated in the context of the whole system.
Here a simple Java program is used to test reading large files through a memory mapped file, and the results are compared under different system conditions.
Test program fragment (a sketch of the surrounding harness follows the listing):
// reading with memory mapped file...
private static int mapped(){
    int c = 0;
    log("mapped file processing...");
    try{
        RandomAccessFile f = new RandomAccessFile(filename, "r");
        long pos = 0, len = f.length();
        FileChannel fc = f.getChannel();
        while (len > 0){
            // FileChannel.map() accepts at most Integer.MAX_VALUE bytes,
            // so a larger file is mapped and scanned chunk by chunk.
            MappedByteBuffer buff = fc.map(MapMode.READ_ONLY, pos,
                    (len > Integer.MAX_VALUE) ? Integer.MAX_VALUE : len);
            log("%d, %d, %d", buff.remaining(), buff.position(), buff.limit());
            while (buff.hasRemaining()){
                byte b = buff.get();
                if (b == '\n') c++;   // count lines by counting newlines
            }
            len -= buff.limit();
            pos += buff.limit();
        }
        fc.close();
        f.close();
    } catch(IOException e){
        e.printStackTrace();
    }
    return c;
}
// reading with file stream...
private static int normal(){
    int c = 0;
    log("standard io processing...");
    FileReader fr;
    try {
        fr = new FileReader(filename);
        BufferedReader br = new BufferedReader(fr);
        String r = br.readLine();
        while (r != null){
            ++c;
            r = br.readLine();
        }
        br.close();
        fr.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return c;
}
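The fragment above refers to a filename field and a log(...) helper that are not shown. The following is a minimal sketch of the surrounding harness, with the two methods repeated so the class compiles on its own; the class name, argument handling, timing, and the plain printf-based log() are my assumptions, not the original code:

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

public class ReadTest {
    private static String filename;

    // stand-in for the original log() helper, which is not shown in the article
    private static void log(String fmt, Object... args){
        System.out.printf(fmt + "%n", args);
    }

    // usage: java ReadTest <file> [mmap]
    public static void main(String[] args){
        filename = args[0];
        boolean useMmap = args.length > 1 && "mmap".equals(args[1]);
        long t0 = System.currentTimeMillis();
        int lines = useMmap ? mapped() : normal();
        log("%d lines read in %d ms", lines, System.currentTimeMillis() - t0);
    }

    // reading with memory mapped file (as in the listing above, lightly condensed)
    private static int mapped(){
        int c = 0;
        log("mapped file processing...");
        try{
            RandomAccessFile f = new RandomAccessFile(filename, "r");
            long pos = 0, len = f.length();
            FileChannel fc = f.getChannel();
            while (len > 0){
                MappedByteBuffer buff = fc.map(MapMode.READ_ONLY, pos,
                        (len > Integer.MAX_VALUE) ? Integer.MAX_VALUE : len);
                log("%d, %d, %d", buff.remaining(), buff.position(), buff.limit());
                while (buff.hasRemaining()){
                    if (buff.get() == '\n') c++;
                }
                len -= buff.limit();
                pos += buff.limit();
            }
            fc.close();
            f.close();
        } catch(IOException e){
            e.printStackTrace();
        }
        return c;
    }

    // reading with file stream (as in the listing above, lightly condensed)
    private static int normal(){
        int c = 0;
        log("standard io processing...");
        try {
            BufferedReader br = new BufferedReader(new FileReader(filename));
            String r = br.readLine();
            while (r != null){
                ++c;
                r = br.readLine();
            }
            br.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return c;
    }
}

Between runs it is worth making sure the page cache is in a comparable state, since a warm cache masks the storage differences discussed below.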
The following three test system environments were prepared:
Test environment A: Raspberry Pi 3B, Raspbian, Oracle JDK, 1 GB RAM, 32 GB microSD
Test environment B: Rock64, Armbian, OpenJDK, 4 GB RAM, 64 GB eMMC
Test environment C: Rock64, Armbian, OpenJDK, 4 GB RAM, 32 GB microSD
With three test files of different sizes, the following test sets are defined (a sketch for generating such files follows the list):
T1 : test file size 866782000 bytes, read with file stream
T2 : test file size 1733564000 bytes, read with file stream
T3 : test file size 2783650000 bytes, read with file stream
T1': test file size 866782000 bytes, read with memory mapped file
T2': test file size 1733564000 bytes, read with memory mapped file
T3': test file size 2783650000 bytes, read with memory mapped file
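The article does not show how the test files were produced. As a reproduction aid, here is a minimal sketch that writes newline-delimited records until roughly the requested size is reached; the record layout is an assumption, since any newline-terminated content works for the line-counting test:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class MakeTestFile {
    // usage: java MakeTestFile <path> <targetBytes>
    public static void main(String[] args) throws IOException {
        String path = args[0];
        long target = Long.parseLong(args[1]);
        long written = 0;
        long line = 0;
        try (BufferedWriter w = new BufferedWriter(new FileWriter(path))) {
            while (written < target) {
                // simple CSV-like record; content does not matter for the test
                String rec = line + ",field1,field2,field3\n";
                w.write(rec);
                written += rec.length();   // ASCII content: chars == bytes
                line++;
            }
        }
    }
}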
While the test program runs, vmstat is used to collect data:
vmstat -t -n 1 | awk -Winteractive 'BEGIN {OFS=","} NR==2 {print $18,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16} NR>2 {print $19,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16}'
If vmstat does not support the -t option, the time column can be obtained with the following command instead:
vmstat -n 1 | awk -Winteractive 'BEGIN {OFS=","} NR==2 {print "time",$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16} NR > 2 {cmd="date +%H:%M:%S"; cmd | getline t; print t,$3,$4,$5,$6,$7,$8,$9,$10,$13,$14,$16; close(cmd)}'
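The avg_* values in the table below are the per-second vmstat samples averaged over each run. The article does not show how that was computed; one possible way, assuming the CSV layout produced by the awk commands above (time,swpd,free,buff,cache,si,so,bi,bo,us,sy,wa), is:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class VmstatAvg {
    // usage: java VmstatAvg <vmstat-csv>
    // assumed columns: time,swpd,free,buff,cache,si,so,bi,bo,us,sy,wa
    public static void main(String[] args) throws IOException {
        double bi = 0, us = 0, sy = 0, wa = 0;
        int n = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line = br.readLine();            // skip the header row
            while ((line = br.readLine()) != null) {
                String[] f = line.split(",");
                bi += Double.parseDouble(f[7]);     // blocks read in per second
                us += Double.parseDouble(f[9]);     // CPU user %
                sy += Double.parseDouble(f[10]);    // CPU system %
                wa += Double.parseDouble(f[11]);    // CPU iowait %
                n++;
            }
        }
        if (n > 0) {
            System.out.printf("avg_bi=%.2f avg_us=%.2f avg_sy=%.2f avg_wa=%.2f (n=%d)%n",
                    bi / n, us / n, sy / n, wa / n, n);
        }
    }
}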
The data collected from the test runs is summarized below (sec: elapsed time in seconds; avg_bi: average of the vmstat bi column, blocks read in per second; avg_us/avg_sy/avg_wa: average CPU user/system/iowait percentages):
            T1          T2          T3          T1'         T2'         T3'
filesize    866782000   1733564000  2783650000  866782000   1733564000  2783650000
A(RPi)      ----------  ----------  ----------  ----------  ----------  ----------
sec         62          123         NA          62          122         NA
avg_bi      13714.90    13882.83    NA          13769.10    13959.90    NA
avg_us      20.21       20.32       NA          23.84       24.14       NA
avg_sy      2.47        2.41        NA          1.68        1.56        NA
avg_wa      3.19        3.65        NA          0.45        0.62        NA
B(Rock64)   ----------  ----------  ----------  ----------  ----------  ----------
sec         13          24          54          9           17          27
avg_bi      60465.14    67717.44    51339.10    94052       99588.10    97086
avg_us      24.79       25.24       25.83       20.22       19.12       20.82
avg_sy      7.07        7.48        5.83        6.44        6.59        6.43
avg_wa      0.29        0.16        0.11        5.56        5.35        4.32
C(Rock64)   ----------  ----------  ----------  ----------  ----------  ----------
sec         39          76          123         39          77          123
avg_bi      21566.97    22056.26    22081.50    21829.54    21882.49    22096.29
avg_us      7.13        6.63        10.68       3.46        3.34        3.91
avg_sy      1.85        1.95        1.90        1.05        1.00        1.03
avg_wa      17.31       17.71       12.97       20.97       20.77       20.31
========================================================================
            T1'-T1      T2'-T2      T3'-T3
A(RPi)      ----------  ----------  ----------
sec          0          -1 (0.8%)   NA
us          +3.65       +3.82       NA
sy          -0.79       -0.85       NA
wa          -2.74       -3.03       NA
B(Rock64)   ----------  ----------  ----------
sec         -4 (30.7%)  -7 (29.1%)  -27 (50%)
us          -4.57       -6.12       -5.01
sy          -0.63       -0.89       +0.6
wa          +5.27       +5.19       +4.21
C(Rock64)   ----------  ----------  ----------
sec          0          +1 (-1.3%)   0
us          -3.67       -3.29       -6.77
sy          -0.85       -0.95       -0.87
wa          +3.66       +3.06       +7.34
Looking at the collected data: in test environment B, reading with a memory mapped file shows the expected overall performance gain; the average block-in rate goes up and the CPU usage distribution shifts somewhat.
In test environments A and C, however, the overall elapsed time and the average block-in rate are essentially unchanged, even though the CPU usage distribution also changes.
Comparing test environment B with test environment C, both Rock64 boards, the difference lies in the storage: eMMC vs. microSD. Running the command
sysbench --test=fileio --file-test-mode=seqrd run
on each shows that the sequential file read limits are 105.06Mb/sec for the eMMC and 21.799Mb/sec for the microSD.
----- fileio benchmark on system C: Rock64 with microSD -----
Operations performed: 131072 Read, 0 Write, 0 Other = 131072 Total
Read 2Gb Written 0b Total transferred 2Gb (21.799Mb/sec)
 1395.16 Requests/sec executed
Test execution summary:
    total time:                          93.9476s
    total number of events:              131072
    total time taken by event execution: 93.8404
    per-request statistics:
         min:                            0.01ms
         avg:                            0.72ms
         max:                            10.36ms
         approx. 95 percentile:          5.64ms
Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   93.8404/0.00
----- fileio benchmark on system B: Rock64 with emmc -----
Operations performed: 131072 Read, 0 Write, 0 Other = 131072 Total
Read 2Gb Written 0b Total transferred 2Gb (105.06Mb/sec)
 6723.53 Requests/sec executed
Test execution summary:
    total time:                          19.4945s
    total number of events:              131072
    total time taken by event execution: 19.3889
    per-request statistics:
         min:                            0.01ms
         avg:                            0.15ms
         max:                            13.04ms
         approx. 95 percentile:          1.08ms
Threads fairness:
    events (avg/stddev):           131072.0000/0.00
    execution time (avg/stddev):   19.3889/0.00
In test environment B, the block-in rate during the file stream tests (T1, T2, T3) stays well below the storage read limit, so there is room for the memory mapped file tests (T1', T2', T3') to improve performance.
In test environment C, the block-in rate during the file stream tests (T1, T2, T3) is already close to the storage read limit, so reading with a memory mapped file has no room left to improve performance; at that point there is probably no option other than faster storage hardware.