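OLAP has ClickBench and vector databases have VectorDBBench, so why shouldn't data lakes have a DataLakeBench? As the saying goes, know yourself and know your enemy, and you will never be defeated. Over the National Day holiday I put together a TPC-H 100 GB test to measure how well several AP (analytical processing) systems currently in use in China query data on the lake.
This is only a first pass, meant to prepare for getting the full DataLakeBench workflow running later.
The test data comes in three forms: ORC, Parquet, and Iceberg + Parquet. For the Iceberg tables I set partition columns.
The systems benchmarked are Trino 427, StarRocks 3.1.3, and Apache Doris 2.0.1.1, each the latest release at the time of writing.
The setup is simple for now: a single machine with 104 cores and 384 GB of memory, running Ubuntu, with the data stored on HDFS.
All tests ran out of the box with no tuning, except that I adjusted Trino's memory-related parameters slightly; without that its queries kept failing and the test would have been pointless.
This round is fairly casual, so treat the results as a reference only; they are not authoritative. I will tidy up and open-source the scripts and datasets when I find the time.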
Trino 427
Query | ORC | Parquet | Iceberg-Parquet |
---|---|---|---|
Q01 | 4.67978644371033 | 3.77954769134522 | 2.79405236244202 |
Q02 | 5.96442723274231 | 6.17221164703369 | 6.3358519077301 |
Q03 | 19.5121872425079 | 19.79079246521 | 21.6914899349213 |
Q04 | 5.210289478302 | 5.69352793693543 | 8.11815810203552 |
Q05 | 28.8093211650848 | 27.22141289711 | 11.0894706249237 |
Q06 | 2.45991969108582 | 2.85007929801941 | 0.376362085342407 |
Q07 | 28.3537654876709 | 28.0265860557556 | 4.02484941482544 |
Q08 | FAIL | FAIL | 5.69447374343872 |
Q09 | FAIL | FAIL | 40.3660924434662 |
Q10 | 3.88121962547302 | 3.73709321022034 | 2.79789590835571 |
Q11 | 6.43776345252991 | 6.57292699813843 | 2.36083197593689 |
Q12 | 5.22342753410339 | 5.65483951568604 | 2.10238742828369 |
Q13 | 8.30173945426941 | 7.96515226364136 | 7.00725865364075 |
Q14 | 3.24394512176514 | 2.96157336235046 | 0.921835422515869 |
Q15 | 4.90188980102539 | 5.73980617523193 | 2.05904507637024 |
Q16 | 45.7503273487091 | 44.814661026001 | 94.0362498760223 |
Q17 | 31.6631224155426 | 33.7504358291626 | 18.1692168712616 |
Q18 | 44.1904747486115 | 44.1336300373077 | 93.3035883903503 |
Q19 | 3.52585244178772 | 3.74158716201782 | 3.91571068763733 |
Q20 | 8.33001446723938 | 7.89795851707459 | 4.7635703086853 |
Q21 | FAIL | FAIL | 21.2177674770355 |
Q22 | 2.70866227149963 | 2.6376895904541 | 1.9974513053894 |
StarRocks 3.1.3
Query | ORC | Parquet | Iceberg-Parquet |
---|---|---|---|
Q01 | 2.80447483062744 | 2.71516346931458 | 2.35490465164185 |
Q02 | 2.67755961418152 | 2.31625986099243 | 10.0161077976227 |
Q03 | 12.2691857814789 | 13.0840027332306 | 12.8242733478546 |
Q04 | 2.08644437789917 | 2.49946022033691 | 2.34274172782898 |
Q05 | 2.57925128936768 | 5.56448793411255 | 5.14619517326355 |
Q06 | 1.97757029533386 | 1.1486554145813 | 0.231107234954834 |
Q07 | 2.56635522842407 | 2.98881196975708 | 2.36176466941834 |
Q08 | 3.03237748146057 | 2.65049457550049 | 2.9362211227417 |
Q09 | 8.62035155296326 | 3.91000962257385 | 3.93072724342346 |
Q10 | 2.77616024017334 | 2.45526075363159 | 1.96043992042542 |
Q11 | 0.588205337524414 | 0.643933296203613 | 0.620319366455078 |
Q12 | 2.63902044296265 | 1.35685443878174 | 0.989479780197144 |
Q13 | 1.78126502037048 | 1.97567868232727 | 1.96064805984497 |
Q14 | 2.24990963935852 | 1.56478404998779 | 0.290832996368408 |
Q15 | 4.64493179321289 | 3.38019061088562 | 0.498672723770142 |
Q16 | 8.83686685562134 | 8.92917680740356 | 9.22841262817383 |
Q17 | 5.81133794784546 | 2.18907308578491 | 4.35989594459534 |
Q18 | 8.77055954933167 | 8.55220222473145 | 8.95299363136292 |
Q19 | 2.50460815429688 | 1.58997964859009 | 1.98817658424377 |
Q20 | 2.86747312545776 | 1.76498198509216 | 1.56262278556824 |
Q21 | 5.28 | 4.43561387062073 | 7.31192994117737 |
Q22 | 0.75 | 0.658215761184692 | 0.918938159942627 |
Apache Doris 2.0.1.1
Query | ORC | Parquet | Iceberg-Parquet |
---|---|---|---|
Q01 | 2.96022772789001 | 2.51620554924011 | 4.44727396965027 |
Q02 | 1.62103414535522 | 1.62782716751099 | 21.0157468318939 |
Q03 | 12.0039019584656 | 11.794282913208 | 0.2110915184021 (wrong result) |
Q04 | 1.48055839538574 | 1.44276428222656 | 0.675463199615479 (wrong result) |
Q05 | FAIL | FAIL | 5.6227822303772 (wrong result) |
Q06 | 1.4 | 2.53750324249268 | 0.00912237167358398 (wrong result) |
Q07 | 3.04926323890686 | 8.26076364517212 | 43.5791993141174 (wrong result) |
Q08 | 2.59419512748718 | 2.78373074531555 | 1.1178765296936 (wrong result) |
Q09 | 7.07540774345398 | 5.54762649536133 | FAIL |
Q10 | 3.95090389251709 | 2.47946429252625 | 5.64428234100342 |
Q11 | 0.481706380844116 | 0.55328106880188 | 0.709768772125244 (wrong result) |
Q12 | 1.55267477035522 | 1.3079845905304 | 0.300201892852783 (wrong result) |
Q13 | 1.72986364364624 | 1.88720107078552 | 2.21859788894653 |
Q14 | 2.09376096725464 | 1.79414987564087 | 0.407549142837524 (wrong result) |
Q15 | 2.98066115379334 | 2.90532398223877 | 0.438501834869385 (wrong result) |
Q16 | 21.2221658229828 | 21.3568460941315 | FAIL |
Q17 | 2.40807604789734 | 2.33370995521545 | 7.85690379142761 |
Q18 | 21.1572012901306 | 21.4497358798981 | FAIL |
Q19 | 1.92541122436523 | 2.22820687294006 | 3.76530265808105 |
Q20 | 3.543625831604 | 2.97740173339844 | 1.35076451301575 (wrong result) |
Q21 | 7.98912453651428 | 11.582316160202 | 14.4489989280701 |
Q22 | 0.651208162307739 | 0.479535102844238 | 2.10197758674622 |
Combined results
ORC
Query | Trino | StarRocks | Apache Doris |
---|---|---|---|
Q01 | 4.67978644371033 | 2.80447483062744 | 2.96022772789001 |
Q02 | 5.96442723274231 | 2.67755961418152 | 1.62103414535522 |
Q03 | 19.5121872425079 | 12.2691857814789 | 12.0039019584656 |
Q04 | 5.210289478302 | 2.08644437789917 | 1.48055839538574 |
Q05 | 28.8093211650848 | 2.57925128936768 | FAIL |
Q06 | 2.45991969108582 | 1.97757029533386 | 1.4 |
Q07 | 28.3537654876709 | 2.56635522842407 | 3.04926323890686 |
Q08 | FAIL | 3.03237748146057 | 2.59419512748718 |
Q09 | FAIL | 8.62035155296326 | 7.07540774345398 |
Q10 | 3.88121962547302 | 2.77616024017334 | 3.95090389251709 |
Q11 | 6.43776345252991 | 0.588205337524414 | 0.481706380844116 |
Q12 | 5.22342753410339 | 2.63902044296265 | 1.55267477035522 |
Q13 | 8.30173945426941 | 1.78126502037048 | 1.72986364364624 |
Q14 | 3.24394512176514 | 2.24990963935852 | 2.09376096725464 |
Q15 | 4.90188980102539 | 4.64493179321289 | 2.98066115379334 |
Q16 | 45.7503273487091 | 8.83686685562134 | 21.2221658229828 |
Q17 | 31.6631224155426 | 5.81133794784546 | 2.40807604789734 |
Q18 | 44.1904747486115 | 8.77055954933167 | 21.1572012901306 |
Q19 | 3.52585244178772 | 2.50460815429688 | 1.92541122436523 |
Q20 | 8.33001446723938 | 2.86747312545776 | 3.543625831604 |
Q21 | FAIL | 5.28 | 7.98912453651428 |
Q22 | 2.70866227149963 | 0.75 | 0.651208162307739 |
Parquet
Query | Trino | StarRocks | Apache Doris |
---|---|---|---|
Q01 | 3.77954769134522 | 2.71516346931458 | 2.51620554924011 |
Q02 | 6.17221164703369 | 2.31625986099243 | 1.62782716751099 |
Q03 | 19.79079246521 | 13.0840027332306 | 11.794282913208 |
Q04 | 5.69352793693543 | 2.49946022033691 | 1.44276428222656 |
Q05 | 27.22141289711 | 5.56448793411255 | FAIL |
Q06 | 2.85007929801941 | 1.1486554145813 | 2.53750324249268 |
Q07 | 28.0265860557556 | 2.98881196975708 | 8.26076364517212 |
Q08 | FAIL | 2.65049457550049 | 2.78373074531555 |
Q09 | FAIL | 3.91000962257385 | 5.54762649536133 |
Q10 | 3.73709321022034 | 2.45526075363159 | 2.47946429252625 |
Q11 | 6.57292699813843 | 0.643933296203613 | 0.55328106880188 |
Q12 | 5.65483951568604 | 1.35685443878174 | 1.3079845905304 |
Q13 | 7.96515226364136 | 1.97567868232727 | 1.88720107078552 |
Q14 | 2.96157336235046 | 1.56478404998779 | 1.79414987564087 |
Q15 | 5.73980617523193 | 3.38019061088562 | 2.90532398223877 |
Q16 | 44.814661026001 | 8.92917680740356 | 21.3568460941315 |
Q17 | 33.7504358291626 | 2.18907308578491 | 2.33370995521545 |
Q18 | 44.1336300373077 | 8.55220222473145 | 21.4497358798981 |
Q19 | 3.74158716201782 | 1.58997964859009 | 2.22820687294006 |
Q20 | 7.89795851707459 | 1.76498198509216 | 2.97740173339844 |
Q21 | FAIL | 4.43561387062073 | 11.582316160202 |
Q22 | 2.6376895904541 | 0.658215761184692 | 0.479535102844238 |
Iceberg + Parquet
Query | Trino | StarRocks | Apache Doris |
---|---|---|---|
Q01 | 2.79405236244202 | 2.35490465164185 | 4.44727396965027 |
Q02 | 6.3358519077301 | 10.0161077976227 | 21.0157468318939 |
Q03 | 21.6914899349213 | 12.8242733478546 | 0.2110915184021 (wrong result) |
Q04 | 8.11815810203552 | 2.34274172782898 | 0.675463199615479 (wrong result) |
Q05 | 11.0894706249237 | 5.14619517326355 | 5.6227822303772 (wrong result) |
Q06 | 0.376362085342407 | 0.231107234954834 | 0.00912237167358398 (wrong result) |
Q07 | 4.02484941482544 | 2.36176466941834 | 43.5791993141174 (wrong result) |
Q08 | 5.69447374343872 | 2.9362211227417 | 1.1178765296936 (wrong result) |
Q09 | 40.3660924434662 | 3.93072724342346 | FAIL |
Q10 | 2.79789590835571 | 1.96043992042542 | 5.64428234100342 |
Q11 | 2.36083197593689 | 0.620319366455078 | 0.709768772125244 (wrong result) |
Q12 | 2.10238742828369 | 0.989479780197144 | 0.300201892852783 (wrong result) |
Q13 | 7.00725865364075 | 1.96064805984497 | 2.21859788894653 |
Q14 | 0.921835422515869 | 0.290832996368408 | 0.407549142837524 (wrong result) |
Q15 | 2.05904507637024 | 0.498672723770142 | 0.438501834869385 (wrong result) |
Q16 | 94.0362498760223 | 9.22841262817383 | FAIL |
Q17 | 18.1692168712616 | 4.35989594459534 | 7.85690379142761 |
Q18 | 93.3035883903503 | 8.95299363136292 | FAIL |
Q19 | 3.91571068763733 | 1.98817658424377 | 3.76530265808105 |
Q20 | 4.7635703086853 | 1.56262278556824 | 1.35076451301575 (wrong result) |
Q21 | 21.2177674770355 | 7.31192994117737 | 14.4489989280701 |
Q22 | 1.9974513053894 | 0.918938159942627 | 2.10197758674622 |
Conclusions
Only StarRocks ran every query successfully, so the total query time of the three systems cannot be compared.
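One possible workaround is to total only the queries that every engine completed. The sketch below is a hypothetical helper, not part of the original test scripts. It uses a small excerpt of the ORC timings from the tables above, rounded to two decimals, with `None` standing in for FAIL; the names `orc_sample`, `common_queries`, and `totals_on_common` are made up for illustration.

```python
from typing import Dict, List, Optional

# Timings per engine, keyed by query name; None marks a FAIL.
# Values are an excerpt of the ORC results, rounded to two decimals.
Timings = Dict[str, Dict[str, Optional[float]]]

orc_sample: Timings = {
    "Trino":     {"Q01": 4.68, "Q05": 28.81, "Q08": None, "Q22": 2.71},
    "StarRocks": {"Q01": 2.80, "Q05": 2.58,  "Q08": 3.03, "Q22": 0.75},
    "Doris":     {"Q01": 2.96, "Q05": None,  "Q08": 2.59, "Q22": 0.65},
}


def common_queries(timings: Timings) -> List[str]:
    """Return the queries that every engine completed, in sorted order."""
    engines = list(timings.values())
    names = set(engines[0])
    for results in engines:
        # Keep only queries this engine finished without failing.
        names &= {q for q, t in results.items() if t is not None}
    return sorted(names)


def totals_on_common(timings: Timings) -> Dict[str, float]:
    """Sum each engine's time over the commonly completed queries only."""
    common = common_queries(timings)
    return {
        engine: sum(results[q] for q in common)
        for engine, results in timings.items()
    }


if __name__ == "__main__":
    print(common_queries(orc_sample))
    print(totals_on_common(orc_sample))
```

On this excerpt only Q01 and Q22 are common to all three engines. Applied to the full tables, the same function would give a like-for-like total, at the cost of ignoring exactly the queries that failed.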
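The machine is overpowered, so many queries finish very quickly, and it is hard to tell whether a timing gap is just run-to-run noise or a genuinely slower engine. I will improve this next time.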
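A common way to separate noise from real differences is to run each query several times and report the median together with the spread. The following is a minimal sketch under assumptions: `run_query` is a hypothetical zero-argument callable that submits one query to the engine under test and blocks until the result is fetched, and the warm-up run and repeat count are illustrative choices, not what this test actually did.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5,
               warmups: int = 1) -> Dict[str, float]:
    """Run a query several times and summarize the wall-clock timings."""
    for _ in range(warmups):
        run_query()  # warm-up run; its timing is discarded

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    median = statistics.median(samples)
    return {
        "median": median,
        "min": min(samples),
        "max": max(samples),
        # Range relative to the median: a rough indicator of noise.
        "spread": (max(samples) - min(samples)) / median if median > 0 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster.
    print(time_query(lambda: sum(range(200_000)), repeats=5))
```

If two engines differ by less than the spread observed for either of them, the gap is probably not meaningful.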
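Apache Doris's Iceberg support is clearly immature. My blind guess is that there is a fairly serious bug in partition pruning, because almost all of the queries with wrong results returned empty results.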
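Wrong results like these can be flagged automatically by comparing each engine's output with a reference result. The post does not say how the wrong results were detected, so the sketch below is only an assumption about how such a check might look. `rows_match` is a hypothetical helper, and it assumes both result sets are already sorted in the same order.

```python
import math
from typing import Sequence, Tuple

Row = Tuple[object, ...]


def _value_matches(expected: object, actual: object, rel_tol: float) -> bool:
    """Compare two cells, tolerating rounding only when a float is involved."""
    numeric = (int, float)
    either_float = isinstance(expected, float) or isinstance(actual, float)
    if either_float and isinstance(expected, numeric) and isinstance(actual, numeric):
        return math.isclose(expected, actual, rel_tol=rel_tol)
    return expected == actual


def rows_match(expected: Sequence[Row], actual: Sequence[Row],
               rel_tol: float = 1e-6) -> bool:
    """Return True if both result sets have the same rows in the same order."""
    if len(expected) != len(actual):
        return False  # also catches an unexpectedly empty result
    for exp_row, act_row in zip(expected, actual):
        if len(exp_row) != len(act_row):
            return False
        if not all(_value_matches(e, a, rel_tol)
                   for e, a in zip(exp_row, act_row)):
            return False
    return True


if __name__ == "__main__":
    reference = [("A", 1, 10.0), ("B", 2, 20.5)]
    print(rows_match(reference, [("A", 1, 10.0000001), ("B", 2, 20.5)]))
    print(rows_match(reference, []))
```

An empty result for a query whose reference has rows fails the length check immediately, which matches the failure pattern described above.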
Original article by Smith. If you repost it, please credit the source: https://www.inlighting.org/archives/2023-10-1-datalake-benchmark