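How Does JuiceFS Perform?

This post records how JuiceFS performed when benchmarked with vdbench, and the problems I ran into along the way. For installing and using vdbench, see Installing and Using vdbench. The test environment is described below; for comparison, JuiceFS also publishes its own official performance data: Single-Node Performance Benchmark.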

Test Environment

Client

Hardware

  • CPU: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz

  • Memory: 128GB

Software

  • fio 3.12

  • juicefs version 1.0-dev (2021-11-03 d98cd00f)

juicefs -V
juicefs version 1.0-dev (2021-11-03 d98cd00f)
root@xxx:/home/xxx/juicefs# ls /mnt/jfs/

Server

Redis

An 8GB Redis instance requested from the database team.

S3 Instance

Details of the data center's S3 instance are omitted.

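4k Small-Block Scenario

Throughout this post, vdbench is assumed to be installed in /home/xxx/software/vdbench.

Running vdbench on a single host

Configuration file for the single-host 4k scenario: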

#hd=default,vdbench=/home/xxx/software/vdbench/,user=root,shell=vdbench
#hd=hd1,system=xxx

fsd=fsd1,anchor=/mnt/jfs,depth=3,width=10,files=30,size=20M,openflags=o_direct

fwd=fwd1,fsd=fsd1,operation=write,xfersize=4k,fileio=random,fileselect=random,threads=10

fwd=fwd2,fsd=fsd1,operation=read,xfersize=4k,fileio=random,fileselect=random,threads=10

fwd=fwd3,fsd=fsd1,operation=write,xfersize=4k,fileio=sequential,fileselect=sequential,threads=10

fwd=fwd4,fsd=fsd1,operation=read,xfersize=4k,fileio=sequential,fileselect=sequential,threads=10

fwd=fwd5,fsd=fsd1,operation=read,rdpct=70,xfersize=4k,fileio=random,fileselect=random,threads=10

rd=rd1,fwd=fwd1,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd2,fwd=fwd2,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd3,fwd=fwd3,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd4,fwd=fwd4,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd5,fwd=fwd5,fwdrate=max,format=restart,elapsed=600,interval=2
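Before starting a long run it can be useful to check how much data an fsd will create. vdbench prints this as its "Anchor size" line (for this fsd: dirs: 1,110; files: 30,000; bytes: 585.938g), and the same figures can be derived by hand from depth, width, files and size. The Python sketch below assumes the default dist=bottom layout, where files are created only in the deepest directory level, and a single fixed file size; anchor_size is a hypothetical helper, not part of vdbench.

```python
# Sketch: derive vdbench's "Anchor size" figures for one fsd by hand.
# Assumptions: dist=bottom (files only in the deepest level) and one fixed file size.

MIB = 1024 ** 2
GIB = 1024 ** 3


def anchor_size(depth: int, width: int, files: int, size_bytes: int):
    """Return (dirs, total_files, total_bytes) for a single anchor."""
    # Level 1 has `width` dirs, level 2 has width**2, ..., level `depth` has width**depth.
    dirs = sum(width ** level for level in range(1, depth + 1))
    # With dist=bottom, only the deepest level holds files.
    total_files = (width ** depth) * files
    return dirs, total_files, total_files * size_bytes


if __name__ == "__main__":
    # fsd=fsd1,depth=3,width=10,files=30,size=20M
    dirs, nfiles, nbytes = anchor_size(depth=3, width=10, files=30, size_bytes=20 * MIB)
    print(f"dirs: {dirs:,}; files: {nfiles:,}; bytes: {nbytes / GIB:.3f}g ({nbytes:,})")
    # prints: dirs: 1,110; files: 30,000; bytes: 585.938g (629,145,600,000)
```

The same function also reproduces the anchors of the later multi-host run: anchor_size(4, 10, 40, 4 * MIB) gives 11,110 dirs, 400,000 files and 1,677,721,600,000 bytes, matching the 1.526t reported per anchor in that log.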

With the configuration file ready, start the test:

root@xxx:/home/xxxx/software/vdbench# nohup /home/xxx/software/vdbench/vdbench -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K &
[1] 2026642
nohup: ignoring input and appending output to 'nohup.out'
root@xxx:/home/xxx/software/vdbench# ps -ef | grep vdbench
root 2026642 2008208 0 16:48 pts/1 00:00:00 /bin/bash /home/xxx/software/vdbench/vdbench -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K
root 2026651 2026642 19 16:48 pts/1 00:00:01 java -client -Xmx512m -Xms64m -cp /home/xxx/software/vdbench/:/home/xxx/software/vdbench/classes:/home/xxx/software/vdbench/vdbench.jar Vdb.Vdbmain -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K
root 2026688 2026651 0 16:48 pts/1 00:00:00 /bin/bash /home/xxx/software/vdbench/vdbench SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2026692 2026688 99 16:48 pts/1 00:00:10 java -client -Xmx1024m -Xms128m -cp /home/xxx/software/vdbench/:/home/xxx/software/vdbench/classes:/home/xxx/software/vdbench/vdbench.jar Vdb.SlaveJvm SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2026809 2008208 0 16:48 pts/1 00:00:00 grep vdbench
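The process list shows the chain vdbench sets up in single-host mode: the vdbench script starts the master JVM (Vdb.Vdbmain), which launches a slave JVM (Vdb.SlaveJvm) that connects back to the master with -m localhost. A small Python sketch can pull the slave's master address and port out of ps -ef output, which is a quick way to confirm the run is purely local. It assumes the standard eight-column ps -ef layout (UID PID PPID C STIME TTY TIME CMD); vdbench_slaves is a hypothetical helper.

```python
# Sketch: find vdbench slave JVMs in `ps -ef` output and report where they connect.
# Assumes the usual ps -ef columns: UID PID PPID C STIME TTY TIME CMD.


def vdbench_slaves(ps_output: str):
    """Return one dict per `java ... Vdb.SlaveJvm` process with pid, ppid, master, label and port."""
    slaves = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        cmd = fields[7].split()
        if "Vdb.SlaveJvm" not in cmd:  # skips the bash wrapper and the grep line
            continue
        # Map the option flags passed to the slave onto their values.
        opts = {cmd[i]: cmd[i + 1] for i in range(len(cmd) - 1) if cmd[i] in ("-m", "-l", "-p")}
        slaves.append({
            "pid": int(fields[1]),
            "ppid": int(fields[2]),
            "master": opts.get("-m"),
            "label": opts.get("-l"),
            "port": int(opts["-p"]) if "-p" in opts else None,
        })
    return slaves


# Lines copied from the ps output above.
SAMPLE = """\
root 2026688 2026651 0 16:48 pts/1 00:00:00 /bin/bash /home/xxx/software/vdbench/vdbench SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2026692 2026688 99 16:48 pts/1 00:00:10 java -client -Xmx1024m -Xms128m -cp /home/xxx/software/vdbench/:/home/xxx/software/vdbench/classes:/home/xxx/software/vdbench/vdbench.jar Vdb.SlaveJvm SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2026809 2008208 0 16:48 pts/1 00:00:00 grep vdbench
"""

if __name__ == "__main__":
    for slave in vdbench_slaves(SAMPLE):
        print(slave)
```

On a live system the input can come from subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout. In the failing run shown next, the slave was instead started with -m 7.32.198.214, so it had to reach the master over the network.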

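Note that when running on a single host, the configuration file should not contain any host (hd=) definitions.
If it does, the run is likely to fail with the following error, typically because a firewall or similar restriction blocks the slave's connection back to the master: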

root@xxx:/home/xxx/software/vdbench# /home/xxx/software/vdbench/vdbench -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Vdbench distribution: vdbench50407 Tue June 05 9:49:29 MDT 2018
For documentation, see 'vdbench.pdf'.

16:37:06.842 input argument scanned: '-f/home/xxx/software/vdbench/conf/4k_test.cfg'
16:37:06.843 input argument scanned: '-o/home/xxx/software/vdbench/output_4K'
16:37:06.898 Anchor size: anchor=/mnt/jfs: dirs: 1,110; files: 30,000; bytes: 585.938g (629,145,600,000)
16:37:06.936 Starting slave: /home/xxx/software/vdbench/vdbench SlaveJvm -m 7.32.198.214 -n xxx-10-211109-16.37.06.805 -l hd1-0 -p 5570
16:37:10.076 hd1-0 : 16:37:10.076 common.failure(): System.exit(-99)
16:37:11.088
16:37:11.088 Slave hd1-0 prematurely terminated.
16:37:11.088 Look at file hd1-0.stdout.html for more information.
16:37:11.589
16:37:11.589 Slave hd1-0 prematurely terminated.
16:37:11.589
java.lang.RuntimeException: Slave hd1-0 prematurely terminated.
at Vdb.common.failure(common.java:350)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:188)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)
root@xxx:/home/xxx/software/vdbench#
root@xxx:/home/xxx/software/vdbench#
root@xxx:/home/xxx/software/vdbench# cat output_4K/hd1-0.stdout.html

<title>Vdbench output_4K/hd1-0.stdout.html</title><pre>
stdout/stderr for slave=hd1-0

16:37:07.010 16:37:07.009 SlaveJvm execution parameter: '-m 7.32.198.214'
16:37:07.010 16:37:07.009 SlaveJvm execution parameter: '-n xxx-10-211109-16.37.06.805'
16:37:07.010 16:37:07.009 SlaveJvm execution parameter: '-l hd1-0'
16:37:07.010 16:37:07.010 SlaveJvm execution parameter: '-p 5570'
16:37:07.010 16:37:07.010 SlaveJvm positional parameter: 'SlaveJvm'
16:37:10.067 16:37:10.067 java.net.ConnectException
16:37:10.067 java.net.ConnectException: Connection timed out (Connection timed out)
16:37:10.068 at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
16:37:10.068 at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
16:37:10.068 at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
16:37:10.068 at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
16:37:10.068 at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
16:37:10.068 at java.base/java.net.Socket.connect(Socket.java:609)
16:37:10.068 at java.base/java.net.Socket.connect(Socket.java:558)
16:37:10.068 at java.base/java.net.Socket.<init>(Socket.java:454)
16:37:10.068 at java.base/java.net.Socket.<init>(Socket.java:231)
16:37:10.069 at Vdb.SlaveSocket.<init>(SlaveSocket.java:61)
16:37:10.069 at Vdb.SlaveJvm.connectToMaster(SlaveJvm.java:98)
16:37:10.069 at Vdb.SlaveJvm.main(SlaveJvm.java:425)
16:37:10.069 java.net.ConnectException: Connection timed out (Connection timed out)
16:37:10.069 at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
16:37:10.069 at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
16:37:10.069 at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
16:37:10.069 at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
16:37:10.069 at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
16:37:10.069 at java.base/java.net.Socket.connect(Socket.java:609)
16:37:10.070 at java.base/java.net.Socket.connect(Socket.java:558)
16:37:10.070 at java.base/java.net.Socket.<init>(Socket.java:454)
16:37:10.070 at java.base/java.net.Socket.<init>(Socket.java:231)
16:37:10.070 at Vdb.SlaveSocket.<init>(SlaveSocket.java:61)
16:37:10.070 at Vdb.SlaveJvm.connectToMaster(SlaveJvm.java:98)
16:37:10.070 at Vdb.SlaveJvm.main(SlaveJvm.java:425)
16:37:10.070 16:37:10.068 Slave ConnectException.
16:37:10.070 16:37:10.069
16:37:10.070 16:37:10.069 It took at least 60 seconds to connect. SlaveJvm terminated
16:37:10.070 16:37:10.070
16:37:10.070 java.lang.RuntimeException: It took at least 60 seconds to connect. SlaveJvm terminated
16:37:10.070 at Vdb.common.failure(common.java:350)
16:37:10.070 at Vdb.SlaveJvm.connectToMaster(SlaveJvm.java:113)
16:37:10.071 at Vdb.SlaveJvm.main(SlaveJvm.java:425)
16:37:10.076 16:37:10.076 Trying to send message 'It took at least 60 seconds to connect. SlaveJvm terminated' to master, but we have no socket yet:
16:37:10.076 16:37:10.076 common.failure(): System.exit(-99)

Test Results

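| Workload | Write IOPS | Write latency (ms) | Write bandwidth (MB/s) | Read IOPS | Read latency (ms) | Read bandwidth (MB/s) |
| --- | --- | --- | --- | --- | --- | --- |
| rd1 random write | 2316.6 | 3.939 | 9.05 | - | - | - |
| rd2 random read | - | - | - | 26279 | 0.376 | 102.6 |
| rd3 sequential write | 211550 | 0.025 | 826.3 | - | - | - |
| rd4 sequential read | - | - | - | 57102 | 0.173 | 223.0 |
| rd5 random mixed (rdpct=70) | 1103.1 | 0.045 | 14.37 | 2576.0 | 3.855 | 4.31 |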
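The bandwidth columns follow directly from IOPS and the 4k transfer size, and the figures in the table match when vdbench's MB is taken as 2^20 bytes. Applying the same arithmetic to rd5 (rdpct=70) gives about 4.31 MB/s of writes and 10.06 MB/s of reads, 14.37 MB/s combined, which suggests that 14.37 in that row is the total bandwidth and 4.31 the write bandwidth. A minimal Python check; mb_per_s is a hypothetical helper:

```python
# Sketch: recompute bandwidth from IOPS and transfer size.
# The table's MB/s figures match MB = 1024 * 1024 bytes.

KIB = 1024
MIB = 1024 * 1024


def mb_per_s(iops: float, xfersize_bytes: int) -> float:
    """Throughput in MB/s (MB = 2**20 bytes) for a given IOPS and transfer size."""
    return iops * xfersize_bytes / MIB


# IOPS from the 4k table above (xfersize=4k).
single_op = {
    "rd1 random write": 2316.6,
    "rd2 random read": 26279,
    "rd3 sequential write": 211550,
    "rd4 sequential read": 57102,
}

if __name__ == "__main__":
    for name, iops in single_op.items():
        print(f"{name}: {mb_per_s(iops, 4 * KIB):.2f} MB/s")
    # rd5 mixes writes and reads, so compute each side separately.
    write_mb = mb_per_s(1103.1, 4 * KIB)
    read_mb = mb_per_s(2576.0, 4 * KIB)
    print(f"rd5: write {write_mb:.2f}, read {read_mb:.2f}, total {write_mb + read_mb:.2f} MB/s")
    # prints: rd5: write 4.31, read 10.06, total 14.37 MB/s
```

The same function applies to the 4M results with 4 * MIB as the transfer size, e.g. 224.1 IOPS gives about 896.4 MB/s against the 896.3 reported.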

4M Large-Block Scenario

Running vdbench on a single host

Configuration file:

fsd=fsd1,anchor=/mnt/jfs,depth=3,width=10,files=30,size=20M,openflags=o_direct

fwd=fwd1,fsd=fsd1,operation=write,xfersize=4M,fileio=random,fileselect=random,threads=10

fwd=fwd2,fsd=fsd1,operation=read,xfersize=4M,fileio=random,fileselect=random,threads=10

fwd=fwd3,fsd=fsd1,operation=write,xfersize=4M,fileio=sequential,fileselect=sequential,threads=10

fwd=fwd4,fsd=fsd1,operation=read,xfersize=4M,fileio=sequential,fileselect=sequential,threads=10

fwd=fwd5,fsd=fsd1,operation=read,rdpct=70,xfersize=4M,fileio=random,fileselect=random,threads=10

rd=rd1,fwd=fwd1,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd2,fwd=fwd2,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd3,fwd=fwd3,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd4,fwd=fwd4,fwdrate=max,format=restart,elapsed=600,interval=2
rd=rd5,fwd=fwd5,fwdrate=max,format=restart,elapsed=600,interval=2

Test results:

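| Workload | Write IOPS | Write latency (ms) | Write bandwidth (MB/s) | Read IOPS | Read latency (ms) | Read bandwidth (MB/s) |
| --- | --- | --- | --- | --- | --- | --- |
| rd1 random write | 224.1 | 2.615 | 896.3 | - | - | - |
| rd2 random read | - | - | - | 303.7 | 32.758 | 1214 |
| rd3 sequential write | 224.7 | 2.527 | 898.7 | - | - | - |
| rd4 sequential read | - | - | - | 361.6 | 27.534 | 1446 |
| rd5 random mixed (rdpct=70) | 59.6 | 3.022 | 238.2 | 137.4 | 61.330 | 549.4 |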
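Another cross-check between the IOPS and latency columns is Little's law: assuming each of the threads=10 workers keeps one operation in flight (fwdrate=max), IOPS × average response time should come out close to 10. For the 4M read rows it does (about 9.95 and 9.96), while the write rows give only about 0.6, which suggests the write threads spend most of their time outside the timed write calls, so the write latency column alone likely understates the per-operation cost. A Python sketch of the check; outstanding_ios is a hypothetical helper:

```python
# Sketch: Little's law check, L = throughput * response time.
# Assumption: with threads=10 and fwdrate=max, each thread keeps one operation in flight.

THREADS = 10


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average number of operations in flight implied by IOPS and mean latency."""
    return iops * latency_ms / 1000.0


# (workload, IOPS, latency in ms) from the 4M table above.
rows_4m = [
    ("rd1 random write", 224.1, 2.615),
    ("rd2 random read", 303.7, 32.758),
    ("rd3 sequential write", 224.7, 2.527),
    ("rd4 sequential read", 361.6, 27.534),
]

if __name__ == "__main__":
    for name, iops, lat in rows_4m:
        busy = outstanding_ios(iops, lat)
        print(f"{name}: {busy:.2f} in flight vs threads={THREADS} ({busy / THREADS:.0%} of threads busy)")
```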

Errors Encountered

java.lang.RuntimeException: Slave hd1-0 prematurely terminated.

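This error was already shown above. It is mainly a firewall problem: the slave cannot connect back to the master.
The fix is to leave out the hd= definitions entirely. vdbench then runs on the local host by default and starts the processes it needs on its own, which you can confirm by running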

ps -ef | grep vdbench

which shows:

root@xxx:/home/xxx/software/vdbench# ps -ef | grep vdbench
root 2026642 2008208 0 16:48 pts/1 00:00:00 /bin/bash /home/xxx/software/vdbench/vdbench -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K
root 2026651 2026642 1 16:48 pts/1 00:00:07 java -client -Xmx512m -Xms64m -cp /home/xxx/software/vdbench/:/home/xxx/software/vdbench/classes:/home/xxx/software/vdbench/vdbench.jar Vdb.Vdbmain -f /home/xxx/software/vdbench/conf/4k_test.cfg -o /home/xxx/software/vdbench/output_4K
root 2026688 2026651 0 16:48 pts/1 00:00:00 /bin/bash /home/xxx/software/vdbench/vdbench SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2026692 2026688 20 16:48 pts/1 00:02:15 java -client -Xmx1024m -Xms128m -cp /home/xxx/software/vdbench/:/home/xxx/software/vdbench/classes:/home/xxx/software/vdbench/vdbench.jar Vdb.SlaveJvm SlaveJvm -m localhost -n localhost-10-211109-16.48.06.714 -l localhost-0 -p 5570
root 2031420 2008208 0 16:58 pts/1 00:00:00 grep vdbench
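Because the failure above is a connect timeout from the slave back to the master, a quick reachability check can separate a firewall problem from a vdbench misconfiguration before adding hd= lines for a multi-host run. The logs show the slave connecting to the master on port 5570 (-p 5570) and the master reaching the vdbench rsh daemon on port 5560, so both directions are worth checking. A minimal Python sketch; port_reachable is a hypothetical helper, not part of vdbench:

```python
# Sketch: check TCP reachability of the vdbench ports before a multi-host run.
# Ports taken from the logs above: 5570 (slave -> master, "-p 5570") and
# 5560 (master -> vdbench rsh daemon on each slave host).
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False


if __name__ == "__main__":
    # Replace with the master's address and run this on each slave host.
    print(port_reachable("127.0.0.1", 5570))
```

Port 5570 only accepts connections while a vdbench master is running, so a False result for it is meaningful only during a run; a timeout (rather than an immediate refusal) usually points at a firewall dropping packets.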

Make sure you also specify ‘format=yes’ in the Run Definition (RD)

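When I ran the tests several times, the directory depth and the number of files had changed between runs, which led to the following error: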

root@xxx:/home/xxx/software/vdbench# /home/xxx/software/vdbench/vdbench -f /home/xxx/software/vdbench/conf/4k_test_multi_machine_4M.cfg -o /home/xxx/software/vdbench/output_4K_multi_machine_4M

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Vdbench distribution: vdbench50407 Tue June 05 9:49:29 MDT 2018
For documentation, see 'vdbench.pdf'.

16:44:32.484 input argument scanned: '-f/home/xxx/software/vdbench/conf/4k_test_multi_machine_4M.cfg'
16:44:32.485 input argument scanned: '-o/home/xxx/software/vdbench/output_4K_multi_machine_4M'
16:44:32.742 Anchor size: anchor=/mnt/jfs/test1: dirs: 11,110; files: 400,000; bytes: 1.526t (1,677,721,600,000)
16:44:32.748 Anchor size: anchor=/mnt/jfs/test2: dirs: 11,110; files: 400,000; bytes: 1.526t (1,677,721,600,000)
16:44:32.752 Anchor size: anchor=/mnt/jfs/test3: dirs: 11,110; files: 400,000; bytes: 1.526t (1,677,721,600,000)
16:44:32.805 Estimated totals for all 3 anchors: dirs: 33,330; files: 1,200,000; bytes: 4.578t
16:44:32.842 Starting slave: /home/xxx/software/vdbench/vdbench SlaveJvm -m 7.32.198.214 -n xxx-10-211110-16.44.32.445 -l hd1-0 -p 5570
16:44:32.863 Starting slave: /home/xxx/software/vdbench/vdbench SlaveJvm -m 7.32.198.214 -n xxx-11-211110-16.44.32.445 -l hd2-0 -p 5570
16:44:32.869 Successfully connected to the Vdbench rsh daemon on host xxx
16:44:32.869 RSH Connection to xxx using port 5560 successful
16:44:32.883 Starting slave: /home/xxx/software/vdbench/vdbench SlaveJvm -m 7.32.198.214 -n xxx-12-211110-16.44.32.445 -l hd3-0 -p 5570
16:44:32.883 Successfully connected to the Vdbench rsh daemon on host xxx
16:44:32.883 RSH Connection to xxx using port 5560 successful
16:44:33.098 All slaves are now connected
16:44:33.327 hd2-0:

fwd=format
when=end
old depth=3; new depth=4
old width=10; new width=10
old files=30; new files=40
old dist=bottom; new dist=bottom
also check the sizes=() parameters from previous and current execution.
The FWD parameters defined for 'fwd=format' do not
match the parameters used in the previous run.

- Correct the parameters, or
- use the 'format=' RD parameter, or
- Add '-c' execution parameter
Make sure you also specify 'format=yes' in the Run Definition (RD)
16:44:33.329 hd3-0:

fwd=format
when=end
old depth=3; new depth=4
old width=10; new width=10
old files=30; new files=40
old dist=bottom; new dist=bottom
also check the sizes=() parameters from previous and current execution.
The FWD parameters defined for 'fwd=format' do not
match the parameters used in the previous run.

- Correct the parameters, or
- use the 'format=' RD parameter, or
- Add '-c' execution parameter
Make sure you also specify 'format=yes' in the Run Definition (RD)
16:44:33.337 hd1-0:

fwd=format
when=end
old depth=3; new depth=4
old width=10; new width=10
old files=30; new files=40
old dist=bottom; new dist=bottom
also check the sizes=() parameters from previous and current execution.
The FWD parameters defined for 'fwd=format' do not
match the parameters used in the previous run.

- Correct the parameters, or
- use the 'format=' RD parameter, or
- Add '-c' execution parameter
Make sure you also specify 'format=yes' in the Run Definition (RD)
16:44:33.338 *
16:44:33.338 ****************************************************
16:44:33.338 * Slave hd2-0 aborting: Parameter definition error *
16:44:33.338 ****************************************************
16:44:33.338 *
16:44:33.338 Slave hd1-0 killed by master
16:44:33.338 Slave hd3-0 killed by master
16:44:35.657
16:44:35.657 Slave hd2-0 prematurely terminated.
16:44:35.657
16:44:35.657 Slave aborted. Abort message received:
16:44:35.657 Parameter definition error
16:44:35.657
16:44:35.657 Look at file hd2-0.stdout.html for more information.
16:44:36.161
16:44:36.161 Slave hd2-0 prematurely terminated.
16:44:36.161
java.lang.RuntimeException: Slave hd2-0 prematurely terminated.
at Vdb.common.failure(common.java:350)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:188)
at Vdb.SlaveStarter.run(SlaveStarter.java:47)

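The error message already spells out the fix: add format=yes to the rd (Run Definition) lines.

  • format= accepts yes, no, or restart and controls how the directory and file structure is prepared before the run:
    – yes: delete the existing directories and files, then recreate them
    – no: keep the existing directory and file structure
    – restart: create only the directories and files that do not exist yet, and expand files that have not reached their full size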

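A test expected to take one hour took several hours

When testing the multi-host, multi-threaded (4M large-file) scenario, the whole test took several hours to finish, although I had expected it to take one hour.
On investigation, the measurement phases themselves did take about one hour. However, because the runs used format=yes, vdbench prepared the test environment from scratch each time, including deleting the previous one. With a large number of small files in the environment, that deletion was very slow; when I tried deleting them by hand, it also took a long time to finish. This is why the whole test took several hours.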
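To see why cleanup can dominate, the entry counts from the multi-host run's log (1,200,000 files and 33,330 directories across the 3 anchors) can be turned into a time estimate for a given delete rate. The rates in this sketch are hypothetical placeholders, not measurements from this environment; substitute a rate measured on your own JuiceFS mount. cleanup_hours is a hypothetical helper.

```python
# Sketch: how long a format=yes cleanup might take at a given delete rate.
# The rates below are hypothetical placeholders, NOT measured on JuiceFS.


def cleanup_hours(files: int, dirs: int, deletes_per_sec: float) -> float:
    """Hours to remove `files` + `dirs` entries at a constant delete rate."""
    return (files + dirs) / deletes_per_sec / 3600


# From "Estimated totals for all 3 anchors: dirs: 33,330; files: 1,200,000" in the log above.
FILES, DIRS = 1_200_000, 33_330

if __name__ == "__main__":
    for rate in (100, 500, 2000):  # hypothetical entries deleted per second
        print(f"{rate:>5} deletes/s -> {cleanup_hours(FILES, DIRS, rate):.1f} h per cleanup")
```

Since format=yes repeats this cleanup (and the recreation) for every run, the cost adds up across rd entries; when the fsd parameters have not changed between runs, format=restart or format=no avoids it, at the cost of reusing the previous file set.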

References

Installing and Using vdbench
