Trino / StarRocks Kerberos Authentication Guide for Alibaba Cloud EMR

Kerberos authentication

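Kerberos is the most painful authentication setup to configure, but there is no way around it in the Hadoop ecosystem, so we just have to grit our teeth and get it done. Using Trino and StarRocks as examples, this post shows how to connect from a non-EMR node to a Kerberos-enabled Alibaba Cloud EMR cluster through a series of somewhat arcane settings. The two systems are configured differently. Trino exposes Kerberos-related settings in its catalog properties, so those can replace part of what would otherwise go into the xxx-site.xml files. StarRocks exposes no Kerberos settings at all, so everything has to be configured through xxx-site.xml. For that reason, the StarRocks approach should in theory work for any software that calls the Hadoop libraries.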

Purchase a new EMR cluster and a test ECS server

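First, make sure that when you create them, the EMR cluster and the ECS instance are in the same security group. If the two cannot even reach each other over the network, there is no point in going any further.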

Once the new EMR cluster is ready, run hive on it to create a test table. You will immediately hit the following error:

[root@master-1-1(172.26.95.71) ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/HIVE/hive-3.1.3-hadoop3.1-1.0.4/lib/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/HADOOP-COMMON/hadoop-3.2.1-1.2.7-alinux3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 1b337056-6f87-46e8-b3b7-52665e5622bf

Logging initialized using configuration in file:/etc/taihao-apps/hive-conf/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: DestHost:destPort master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:9000 , LocalHost:localPort master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com/172.26.95.71:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:651)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
......
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:770)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:733)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:827)
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:421)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1606)
        at org.apache.hadoop.ipc.Client.call(Client.java:1435)
        ... 34 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
        at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:627)
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:421)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:814)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:810)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:810)
        ... 37 more

This is because no kinit has been run on this machine yet. We need to pick a principal and kinit with it.

You can enter kadmin with kadmin.local and run list_principals to see the principals that EMR ships with:

HTTP/core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
HTTP/core-1-2.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
HTTP/core-1-3.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
HTTP/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
K/M@EMR.C-8120A41F6B0C443D.COM
emr-monitor/core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
emr-monitor/core-1-2.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
emr-monitor/core-1-3.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
emr-monitor/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
flink/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hadoop/core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hadoop/core-1-2.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hadoop/core-1-3.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hadoop/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hdfs/core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hdfs/core-1-2.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hdfs/core-1-3.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hdfs/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hive/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
host/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
kadmin/admin@EMR.C-8120A41F6B0C443D.COM
kadmin/changepw@EMR.C-8120A41F6B0C443D.COM
kadmin/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
kiprop/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
krbtgt/EMR.C-8120A41F6B0C443D.COM@EMR.C-8120A41F6B0C443D.COM
rangeradmin/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
rangerlookup/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
rangerusersync/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
root/admin@EMR.C-8120A41F6B0C443D.COM
spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
trino/core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
trino/core-1-2.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
trino/core-1-3.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
trino/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
zookeeper/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM

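I am not going to create my own principal here. I will simply reuse the existing spark principal, on the assumption that EMR has already granted spark the permissions it needs across the various components.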
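Most of the principals listed above follow the service/host@REALM pattern (a few, such as K/M or kadmin/admin, carry a non-host instance). As a minimal sketch, the hypothetical helper parse_principal below (my own illustration, not part of any Kerberos tooling) splits such a name into its three parts, which makes it easier to see which service and realm a principal belongs to. It does not handle escaped characters.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # service or user name, e.g. "spark"
    instance: Optional[str]  # usually the host FQDN; None if absent
    realm: str               # Kerberos realm


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' (illustrative helper, no escaping support)."""
    head, sep, realm = name.rpartition("@")
    if not sep or not head or not realm:
        raise ValueError(f"missing realm in principal: {name!r}")
    primary, _, instance = head.partition("/")
    return Principal(primary, instance or None, realm)


spark = parse_principal(
    "spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com"
    "@EMR.C-8120A41F6B0C443D.COM"
)
print(spark.primary, spark.realm)
```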

Run the following kinit command:

kinit -kt /etc/taihao-apps/spark-conf/keytab/spark.keytab spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM

Afterwards, use klist to confirm that kinit succeeded.

[root@master-1-1(172.26.95.71) ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM

Valid starting       Expires              Service principal
2023-08-20T22:53:12  2023-08-21T22:53:12  krbtgt/EMR.C-8120A41F6B0C443D.COM@EMR.C-8120A41F6B0C443D.COM
        renew until 2023-08-27T22:53:12

From this point on, you can create tables and run any other operation on the new EMR cluster.

ECS configuration

My machine runs CentOS 7. The ECS instance needs the Kerberos packages installed first, so run:

yum install -y krb5-server krb5-libs krb5-workstation

After installation, edit /etc/krb5.conf: add a [realms] section and change default_realm. To fill them in, just copy the values from /etc/krb5.conf on the EMR cluster. My trimmed-down version looks like this:

[libdefaults]
  default_realm = EMR.C-8120A41F6B0C443D.COM
[realms]
  EMR.C-8120A41F6B0C443D.COM = {
    kdc = master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:88
    admin_server = master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:749
  }

default_realm must be set, otherwise Trino fails on startup.
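Since a missing default_realm only surfaces when Trino starts, it can be handy to check the file up front. The following is a rough sketch, assuming a plain krb5.conf without include directives; find_default_realm is a hypothetical helper of mine, and it only looks at the [libdefaults] section.

```python
from typing import Optional


def find_default_realm(krb5_conf_text: str) -> Optional[str]:
    """Return default_realm from the [libdefaults] section, or None if unset."""
    section = None
    for raw in krb5_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # entering a new section
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "default_realm":
                return value.strip()
    return None


SAMPLE = """\
[libdefaults]
  default_realm = EMR.C-8120A41F6B0C443D.COM
[realms]
  EMR.C-8120A41F6B0C443D.COM = {
    kdc = master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:88
  }
"""
print(find_default_realm(SAMPLE))
```

To check the real file, pass in the contents of /etc/krb5.conf instead of SAMPLE.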

Next, copy spark.keytab from the EMR cluster to the ECS instance:

scp root@172.26.95.71:/etc/taihao-apps/spark-conf/keytab/spark.keytab .

Then run kinit:

kinit -kt spark.keytab spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM

And confirm with klist that kinit succeeded:

Ticket cache: FILE:/tmp/krb5cc_1025
Default principal: spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM

Valid starting       Expires              Service principal
08/20/2023 23:07:05  08/21/2023 23:07:04  krbtgt/EMR.C-8120A41F6B0C443D.COM@EMR.C-8120A41F6B0C443D.COM

That completes the configuration on the ECS side.

Trino configuration

The Hive catalog configuration for Trino is as follows:

connector.name=hive
hive.metastore.uri=thrift://master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:9083
hive.metastore.authentication.type=KERBEROS
# Note: this must be the hive principal, not the spark principal used for kinit on the ECS instance
hive.metastore.service.principal=hive/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hive.metastore.client.principal=spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
# Keytab for the spark principal
hive.metastore.client.keytab=/home/disk1/smith/kerberos/spark.keytab


hive.hdfs.authentication.type=KERBEROS
hive.hdfs.trino.principal=spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
hive.hdfs.trino.keytab=/home/disk1/smith/kerberos/spark.keytab

hive.config.resources=/home/disk1/smith/tools/trino-server-405/etc/catalog/kerberos/hdfs-site.xml

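Note that the hdfs-site.xml shipped with EMR has to be referenced here as well. Normally this would not be necessary, but Alibaba Cloud EMR appears to carry some additional non-default settings. If hive.config.resources is not set, HDFS access fails. For the exact error, see "org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block" in the Troubleshooting section below.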

Copy the EMR hdfs-site.xml from /etc/taihao-apps/hadoop-conf/hdfs-site.xml over unchanged.

StarRocks configuration

This uses a Hive catalog as the example. Place hdfs-site.xml, core-site.xml, and hive-site.xml in the conf directory of the FE and of the BE respectively.

FE

hive-site.xml contains the following:

Note that this uses the hive principal, not the spark principal used for kinit on the ECS instance.

<configuration> 
    <property>
        <name>hive.metastore.kerberos.principal</name>
        <value>hive/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM</value>
    </property>
    <property>
        <name>hive.metastore.sasl.enabled</name>
        <value>true</value>
    </property>
</configuration>  

core-site.xml contains the following:

<configuration> 
    <property>
        <name>hadoop.security.authentication</name>
        <value>KERBEROS</value>
    </property>
</configuration>

hdfs-site.xml contains the following:

Note that the principal configured here is hdfs, not spark.

<configuration> 
    <property>
        <name>dfs.datanode.kerberos.principal</name>
        <value>hdfs/_HOST@EMR.C-8120A41F6B0C443D.COM</value>
    </property>
    <property>
        <name>dfs.namenode.kerberos.principal</name>
        <value>hdfs/_HOST@EMR.C-8120A41F6B0C443D.COM</value>
    </property>
</configuration>
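In the values above, _HOST is a placeholder. As far as I understand, Hadoop substitutes it with the lowercased fully qualified hostname of the node in question, so a single entry covers hdfs/core-1-1..., hdfs/core-1-2..., and so on. The sketch below imitates that substitution in plain Python to show what the resolved principal looks like. expand_host is a hypothetical helper written for illustration, not Hadoop's actual implementation.

```python
HOST_PLACEHOLDER = "_HOST"


def expand_host(principal_pattern: str, fqdn: str) -> str:
    """Replace the _HOST instance in 'service/_HOST@REALM' with a lowercased FQDN."""
    head, sep, realm = principal_pattern.rpartition("@")
    if not sep:
        return principal_pattern  # no realm part; leave untouched
    primary, slash, instance = head.partition("/")
    if not slash or instance != HOST_PLACEHOLDER:
        return principal_pattern  # nothing to substitute
    return f"{primary}/{fqdn.lower()}@{realm}"


resolved = expand_host(
    "hdfs/_HOST@EMR.C-8120A41F6B0C443D.COM",
    "core-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com",
)
print(resolved)
```

The resolved name should correspond to one of the hdfs/... entries printed by list_principals on the EMR cluster.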

BE

The BE does not access Hive, so it does not need hive-site.xml.

core-site.xml contains the following:

<configuration> 
    <property>
        <name>hadoop.security.authentication</name>
        <value>KERBEROS</value>
    </property>
</configuration>

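The BE's hdfs-site.xml is more involved than the FE's, probably because of settings specific to Alibaba Cloud EMR. Simply copy the hdfs-site.xml from EMR. Otherwise you will hit the error described under "org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block" in the Troubleshooting section.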

Troubleshooting

How to get more Kerberos error details

Add -Dsun.security.krb5.debug=true to the JVM startup options, and the logs will show much more detail about Kerberos authentication.

Kerberos connection timeout: Receive timed out

You will see a log like this:

2023-08-20 23:18:21,489 ERROR (starrocks-mysql-nio-pool-0|163) [TSaslTransport.open():307] SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[jdk.security.jgss:?]
    at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:95) ~[libthrift-0.13.0.jar:0.13.0]
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:265) ~[libthrift-0.13.0.jar:0.13.0]
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:38) ~[libthrift-0.13.0.jar:0.13.0]
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48) ~[hive-apache-3.1.2-13.jar:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
    at javax.security.auth.Subject.doAs(Subject.java:423) ~[?:?]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
    at org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport.open(TUGIAssumingTransport.java:48) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:528) ~[starrocks-fe.jar:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:301) ~[starrocks-fe.jar:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
    at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119) ~[hive-apache-3.1.2-13.jar:?]
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:112) ~[hive-apache-3.1.2-13.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient$RecyclableClient.<init>(HiveMetaClient.java:94) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient$RecyclableClient.<init>(HiveMetaClient.java:83) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient.getClient(HiveMetaClient.java:138) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient.callRPC(HiveMetaClient.java:154) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient.callRPC(HiveMetaClient.java:146) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetaClient.getDb(HiveMetaClient.java:232) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetastore.getDb(HiveMetastore.java:85) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.loadDb(CachingHiveMetastore.java:281) ~[starrocks-fe.jar:?]
    at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:169) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:192) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3570) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2312) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.get(LocalCache.java:4011) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:5017) ~[spark-dpp-1.0.0.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.get(CachingHiveMetastore.java:522) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.getDb(CachingHiveMetastore.java:277) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.loadDb(CachingHiveMetastore.java:281) ~[starrocks-fe.jar:?]
    at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:169) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:192) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3570) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2312) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.get(LocalCache.java:4011) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4034) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) ~[spark-dpp-1.0.0.jar:?]
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:5017) ~[spark-dpp-1.0.0.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.get(CachingHiveMetastore.java:522) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.CachingHiveMetastore.getDb(CachingHiveMetastore.java:277) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetastoreOperations.getDb(HiveMetastoreOperations.java:142) ~[starrocks-fe.jar:?]
    at com.starrocks.connector.hive.HiveMetadata.getDb(HiveMetadata.java:100) ~[starrocks-fe.jar:?]
    at com.starrocks.server.MetadataMgr.lambda$getDb$1(MetadataMgr.java:149) ~[starrocks-fe.jar:?]
    at java.util.Optional.map(Optional.java:265) ~[?:?]
    at com.starrocks.server.MetadataMgr.getDb(MetadataMgr.java:149) ~[starrocks-fe.jar:?]
    at com.starrocks.server.GlobalStateMgr.changeCatalogDb(GlobalStateMgr.java:3597) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.MysqlProto.negotiate(MysqlProto.java:231) ~[starrocks-fe.jar:?]
    at com.starrocks.mysql.nio.AcceptListener.lambda$handleEvent$1(AcceptListener.java:86) ~[starrocks-fe.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Receive timed out)
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:776) ~[java.security.jgss:?]
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266) ~[java.security.jgss:?]
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196) ~[java.security.jgss:?]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) ~[jdk.security.jgss:?]
    ... 64 more
Caused by: java.net.SocketTimeoutException: Receive timed out
    at java.net.PlainDatagramSocketImpl.receive0(Native Method) ~[?:?]
    at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:181) ~[?:?]
    at java.net.DatagramSocket.receive(DatagramSocket.java:814) ~[?:?]
    at sun.security.krb5.internal.UDPClient.receive(NetClient.java:205) ~[java.security.jgss:?]
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:404) ~[java.security.jgss:?]
    at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364) ~[java.security.jgss:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
    at sun.security.krb5.KdcComm.send(KdcComm.java:348) ~[java.security.jgss:?]
    at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253) ~[java.security.jgss:?]
    at sun.security.krb5.KdcComm.send(KdcComm.java:229) ~[java.security.jgss:?]
    at sun.security.krb5.KdcComm.send(KdcComm.java:200) ~[java.security.jgss:?]
    at sun.security.krb5.KrbTgsReq.send(KrbTgsReq.java:246) ~[java.security.jgss:?]
    at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:261) ~[java.security.jgss:?]
    at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:308) ~[java.security.jgss:?]
    at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:126) ~[java.security.jgss:?]
    at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458) ~[java.security.jgss:?]
    at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695) ~[java.security.jgss:?]
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:266) ~[java.security.jgss:?]
    at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196) ~[java.security.jgss:?]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) ~[jdk.security.jgss:?]
    ... 64 more

The Kerberos debug log shows:

Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 20 19 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com UDP:88, timeout=30000, number of retries =3, #bytes=907
>>> KDCCommunication: kdc=master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com UDP:88, timeout=30000,Attempt =1, #bytes=907
SocketTimeOutException with attempt: 1
>>> KDCCommunication: kdc=master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com UDP:88, timeout=30000,Attempt =2, #bytes=907
SocketTimeOutException with attempt: 2
>>> KDCCommunication: kdc=master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com UDP:88, timeout=30000,Attempt =3, #bytes=907
SocketTimeOutException with attempt: 3
>>> KrbKdcReq send: error trying master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:88
java.net.SocketTimeoutException: Receive timed out
    at java.base/java.net.PlainDatagramSocketImpl.receive0(Native Method)
    at java.base/java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:181)
    at java.base/java.net.DatagramSocket.receive(DatagramSocket.java:814)
    at java.security.jgss/sun.security.krb5.internal.UDPClient.receive(NetClient.java:205)
    at java.security.jgss/sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:404)
    at java.security.jgss/sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.security.jgss/sun.security.krb5.KdcComm.send(KdcComm.java:348)

It looks like the request to port 88 over UDP timed out. After some digging, it turned out that my security group did not allow UDP.

Add the line udp_preference_limit = 1 under [libdefaults] in /etc/krb5.conf so that TCP is used instead.
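If several machines need the same change, the edit can be scripted. Below is a minimal sketch that works on the text of a krb5.conf and inserts the setting right after the [libdefaults] header. force_tcp is a hypothetical helper of mine, and it assumes a simple file without include directives.

```python
def force_tcp(krb5_conf_text: str) -> str:
    """Insert 'udp_preference_limit = 1' right after the [libdefaults] header."""
    lines = krb5_conf_text.splitlines()
    if any(line.strip().startswith("udp_preference_limit") for line in lines):
        return krb5_conf_text  # already configured; leave the text as is
    for i, line in enumerate(lines):
        if line.strip() == "[libdefaults]":
            lines.insert(i + 1, "  udp_preference_limit = 1")
            return "\n".join(lines) + "\n"
    # No [libdefaults] section yet: add one at the end of the file.
    lines += ["[libdefaults]", "  udp_preference_limit = 1"]
    return "\n".join(lines) + "\n"


BEFORE = """\
[libdefaults]
  default_realm = EMR.C-8120A41F6B0C443D.COM
"""
print(force_tcp(BEFORE))
```

The function only returns the new text. Writing it back to /etc/krb5.conf (and keeping a backup) is left to the caller.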

Trino Failed connecting to Hive metastore

Running a SQL query gives:

trino> select * from hive.smith.hello;
Query 20230820_002828_00000_7etqp failed: Failed connecting to Hive metastore: [master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:9083]

Yet the logs show that Kerberos authentication itself actually succeeded.

Client Principal = spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
Server Principal = spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
0000: 60 4E EE 46 9F 3E E1 24   B2 88 24 5A 43 34 49 A8  `N.F.>.$..$ZC4I.
0010: 8A 06 A9 C2 51 A9 DC EF   D9 46 AB A3 78 F9 86 4C  ....Q....F..x..L


Forwardable Ticket true
Forwarded Ticket false
Proxiable Ticket false
Proxy Ticket false
Postdated Ticket false
Renewable Ticket false
Initial Ticket false
Auth Time = Sun Aug 20 08:28:28 CST 2023
Start Time = Sun Aug 20 08:28:28 CST 2023
End Time = Mon Aug 21 08:28:28 CST 2023
Renew Till = null
Client Addresses  Null
2023-08-20T08:28:37.940+0800    INFO    Query-20230820_002828_00000_7etqp-147   stdout  >>> KrbApReq: APOptions are 00100000 00000000 00000000 00000000
2023-08-20T08:28:37.941+0800    INFO    Query-20230820_002828_00000_7etqp-147   stdout  >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
2023-08-20T08:28:37.942+0800    INFO    Query-20230820_002828_00000_7etqp-147   stdout  Krb5Context setting mySeqNumber to: 820609998
2023-08-20T08:28:37.942+0800    INFO    Query-20230820_002828_00000_7etqp-147   stdout  Created InitSecContextToken:
0000: 01 00 6E 82 03 11 30 82   03 0D A0 03 02 01 05 A1  ..n...0.........
0010: 03 02 01 0E A2 07 03 05   00 20 00 00 00 A3 82 01  ......... ......
0020: D3 61 82 01 CF 30 82 01   CB A0 03 02 01 05 A1 1C  .a...0..........
0030: 1B 1A 45 4D 52 2E 43 2D   44 31 44 31 44 36 36 42  ..EMR.C-D1D1D66B
0040: 35 31 37 32 32 45 37 43   2E 43 4F 4D A2 51 30 4F  51722E7C.COM.Q0O
0050: A0 03 02 01 00 A1 48 30   46 1B 05 73 70 61 72 6B  ......H0F..spark
0060: 1B 3D 6D 61 73 74 65 72   2D 31 2D 31 2E 63 2D 64  .=master-1-1.c-d
0070: 31 64 31 64 36 36 62 35   31 37 32 32 65 37 63 2E  1d1d66b51722e7c.
0080: 63 6E 2D 7A 68 61 6E 67   6A 69 61 6B 6F 75 2E 65  cn-zhangjiakou.e
0090: 6D 72 2E 61 6C 69 79 75   6E 63 73 2E 63 6F 6D A3  mr.aliyuncs.com.
00A0: 82 01 51 30 82 01 4D A0   03 02 01 12 A1 03 02 01  ..Q0..M.........
00B0: 02 A2 82 01 3F 04 82 01   3B 43 BD 58 AB 7C 93 A5  ....?...;C.X....
00C0: 15 E7 4C 36 F1 B0 10 DF   1E 3A 83 74 8B CB 7A ED  ..L6.....:.t..z.
00D0: C3 01 E3 63 CF ED B6 B6   F7 E4 C2 84 B4 EC 85 A0  ...c............
00E0: 2C E3 01 94 38 6C AB 86   43 A4 3B 90 4E BF DE 7E  ,...8l..C.;.N...
00F0: F5 02 86 A3 6A 96 E5 DC   0F 44 0A C4 B4 F1 E2 14  ....j....D......
0100: 03 B8 B5 85 57 17 FE D7   AC 45 82 10 FA 6E E2 A0  ....W....E...n..
0110: 4C FB 02 A0 2C 44 90 3B   9A 1A 3F F5 08 29 27 21  L...,D.;..?..)'!
0120: 26 8E 66 E4 A1 F5 89 CB   6C A9 9D A8 B9 9E F0 03  &.f.....l.......
0130: C1 22 5F 4A 08 77 F3 0E   63 4E DF 43 C6 5E 41 10  ."_J.w..cN.C.^A.
0140: 88 CC 11 0D 67 5B 4C CD   C4 24 59 00 64 F0 1A 38  ....g[L..$Y.d..8

This happens because hive.metastore.service.principal in Trino's Hive catalog configuration is wrong. It should not be spark/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM. Change it to the hive principal, hive/master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com@EMR.C-8120A41F6B0C443D.COM, and the query works.
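This mistake is easy to make because the client and service principals look so similar. As a rough sketch, the check below reads a catalog properties file and warns when the service principal is not a hive/... one or is identical to the client principal. Both functions are hypothetical helpers, the hive/ prefix rule is an assumption that matches this EMR setup rather than a general rule, and the hostnames in the sample are placeholders.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_metastore_principals(props: Dict[str, str]) -> List[str]:
    """Return warnings about the metastore principal settings (illustrative check)."""
    warnings = []
    service = props.get("hive.metastore.service.principal", "")
    client = props.get("hive.metastore.client.principal", "")
    if not service.startswith("hive/"):
        # Assumption: the metastore runs under a hive/... principal, as on this EMR.
        shown = service or "<unset>"
        warnings.append(f"service principal should be the hive one, got: {shown}")
    if service and service == client:
        warnings.append("service and client principals are identical")
    return warnings


# Placeholder values that reproduce the misconfiguration described above.
BROKEN = """\
hive.metastore.service.principal=spark/master-1-1.example@EXAMPLE.COM
hive.metastore.client.principal=spark/master-1-1.example@EXAMPLE.COM
"""
for warning in check_metastore_principals(parse_properties(BROKEN)):
    print(warning)
```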

org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block xxx

Both StarRocks and Trino run into this error. The stack trace looks roughly like this: Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1115803670-172.26.95.71-1691556308511:blk_1073742076_1253 file=/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc

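It does not look like a Kerberos error. All you need to do is copy hdfs-site.xml over from EMR and use it. Presumably Alibaba Cloud EMR configures HDFS with some unusual settings that cause this. The file has too many entries, and I did not have the energy to check them one by one.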

2023-08-21T11:07:12.876+0800    ERROR   stage-scheduler io.trino.execution.StageStateMachine    Stage 20230821_030658_00000_n6ytk.1 failed
io.trino.spi.TrinoException: Error opening Hive split hdfs://master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:9000/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc (offset=0, length=793): Could not obtain block: BP-1115803670-172.26.95.71-1691556308511:blk_1073742076_1253 file=/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:474)
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:197)
        at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:291)
        at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:196)
        at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
        at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:62)
        at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:308)
        at io.trino.operator.Driver.processInternal(Driver.java:411)
        at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
        at io.trino.operator.Driver.tryWithLock(Driver.java:706)
        at io.trino.operator.Driver.process(Driver.java:306)
        at io.trino.operator.Driver.processForDuration(Driver.java:277)
        at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:752)
        at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
        at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:519)
        at io.trino.$gen.Trino_405____20230821_030529_2.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1115803670-172.26.95.71-1691556308511:blk_1073742076_1253 file=/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc
        at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:879)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
        at io.trino.hdfs.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:59)
        at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:56)
        at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:46)
        at io.trino.plugin.hive.orc.HdfsOrcDataSource.readTailInternal(HdfsOrcDataSource.java:66)
        at io.trino.orc.AbstractOrcDataSource.readTail(AbstractOrcDataSource.java:93)
        at io.trino.orc.OrcReader.wrapWithCacheIfTiny(OrcReader.java:325)
        at io.trino.orc.OrcReader.createOrcReader(OrcReader.java:103)
        at io.trino.orc.OrcReader.createOrcReader(OrcReader.java:94)
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:274)
        ... 18 more

Later, an expert pointed out that it is actually enough to add the following to hdfs-site.xml:

<property>
    <name>dfs.data.transfer.protection</name>
    <value>integrity</value>
</property>

I no longer have the environment, so I have not verified this. Anyone who needs it can try it themselves.
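For anyone who wants to try it, here is a minimal sketch of adding that property to an existing hdfs-site.xml with Python's standard xml.etree.ElementTree module. set_hadoop_property is a hypothetical helper, the input here is an empty <configuration> for illustration, and the value integrity is the unverified suggestion from above. I would expect it to have to agree with whatever the cluster's own hdfs-site.xml uses.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(site_xml: str, name: str, value: str) -> str:
    """Set or add a <property> in a Hadoop *-site.xml document given as a string."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite its value.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


MINIMAL = "<configuration></configuration>"
updated = set_hadoop_property(MINIMAL, "dfs.data.transfer.protection", "integrity")
print(updated)
```

Note that serializing through ElementTree drops XML comments and the original indentation, so keep a copy of the original file.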

Why not just copy hdfs-site.xml, core-site.xml, and hive-site.xml from EMR wholesale?

2023-08-21T11:07:12.876+0800    ERROR   stage-scheduler io.trino.execution.StageStateMachine    Stage 20230821_030658_00000_n6ytk.1 failed
io.trino.spi.TrinoException: Error opening Hive split hdfs://master-1-1.c-8120a41f6b0c443d.cn-zhangjiakou.emr.aliyuncs.com:9000/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc (offset=0, length=793): Could not obtain block: BP-1115803670-172.26.95.71-1691556308511:blk_1073742076_1253 file=/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:474)
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createPageSource(OrcPageSourceFactory.java:197)
        at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:291)
        at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:196)
        at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
        at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:62)
        at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:308)
        at io.trino.operator.Driver.processInternal(Driver.java:411)
        at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
        at io.trino.operator.Driver.tryWithLock(Driver.java:706)
        at io.trino.operator.Driver.process(Driver.java:306)
        at io.trino.operator.Driver.processForDuration(Driver.java:277)
        at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:752)
        at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
        at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:519)
        at io.trino.$gen.Trino_405____20230821_030529_2.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1115803670-172.26.95.71-1691556308511:blk_1073742076_1253 file=/user/hive/warehouse/smith.db/orc_map_late_bug_table/part-00001-bc8f96b0-17f3-46ec-903e-0c04b8fb2687-c000.snappy.orc
        at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:879)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:151)
        at io.trino.hdfs.FSDataInputStreamTail.readTail(FSDataInputStreamTail.java:59)
        at io.trino.filesystem.hdfs.HdfsInput.readTail(HdfsInput.java:56)
        at io.trino.filesystem.TrinoInput.readTail(TrinoInput.java:46)
        at io.trino.plugin.hive.orc.HdfsOrcDataSource.readTailInternal(HdfsOrcDataSource.java:66)
        at io.trino.orc.AbstractOrcDataSource.readTail(AbstractOrcDataSource.java:93)
        at io.trino.orc.OrcReader.wrapWithCacheIfTiny(OrcReader.java:325)
        at io.trino.orc.OrcReader.createOrcReader(OrcReader.java:103)
        at io.trino.orc.OrcReader.createOrcReader(OrcReader.java:94)
        at io.trino.plugin.hive.orc.OrcPageSourceFactory.createOrcPageSource(OrcPageSourceFactory.java:274)
        ... 18 more

In most cases that works too, but those files contain so many settings that it is hard to say they won't cause some Unforeseen Consequences for your system.

Half-Life: Alyx Ending
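
If you would rather start from the EMR files without inheriting every setting in them, one option is to copy over only an explicit allowlist of properties. The sketch below is in Python and makes several assumptions: the helper name `filter_site_xml` is hypothetical, the sample XML is made up, and the `KEEP` set is only an illustration. Which keys you actually need depends on your own setup.

```python
import xml.etree.ElementTree as ET


def filter_site_xml(xml_text, keep):
    """Build a new <configuration> document holding only the properties in `keep`."""
    src = ET.fromstring(xml_text)
    dst = ET.Element("configuration")
    for prop in src.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in keep:
            out = ET.SubElement(dst, "property")
            ET.SubElement(out, "name").text = name
            ET.SubElement(out, "value").text = prop.findtext("value", default="").strip()
    return ET.tostring(dst, encoding="unicode")


# Hypothetical excerpt of a site file copied from the cluster.
FULL_XML = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>dfs.data.transfer.protection</name><value>integrity</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

# Illustrative allowlist only; adjust to the keys your deployment really needs.
KEEP = {"hadoop.security.authentication", "dfs.data.transfer.protection"}

minimal = filter_site_xml(FULL_XML, KEEP)
print(minimal)
```

The point of the allowlist is that every setting which reaches the client is one you chose deliberately, which makes surprises easier to trace.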

Original article by Smith. If you repost it, please credit the source: https://www.inlighting.org/archives/trino-starrocks-emr-kerberos-setup
