I'm trying to integrate Spark (3.1.1) with a local Hive metastore (3.1.2) in order to use spark-sql.

I configured spark-defaults.conf according to https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html, and the Hive jar files exist at the configured path.

However, the exception below occurs when I execute spark.sql("show tables").show.

Any pointers to mistakes, hints, or corrections would be appreciated.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("show tables").show
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
  at java.lang.Class.getDeclaredConstructors0(Native Method)
  at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
  at java.lang.Class.getConstructors(Class.java:1651)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:291)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:492)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:352)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:71)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:70)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
  at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)

spark-defaults.conf

spark.master                            yarn
spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs://192.168.5.130:9000/spark
spark.history.fs.logDirectory           hdfs://192.168.5.130:9000/spark
spark.history.provider                  org.apache.spark.deploy.history.FsHistoryProvider
spark.yarn.historyServer.address        http://192.168.5.130:8188
spark.yarn.historyServer.allowTracking  true

spark.sql.uris                          thrift://192.168.5.130:10000
spark.sql.warehouse.dir                 /user/hive/warehouse
spark.sql.hive.metastore.jars           path
spark.sql.hive.metastore.jars.path      file:///usr/local/hive/lib/*.jar
spark.sql.hive.metastore.version        3.1.2
spark.sql.hive.metastore.sharedPrefixes org.postgresql

ls /usr/local/hive/lib | grep hive

hive-accumulo-handler-3.1.2.jar
hive-beeline-3.1.2.jar
hive-classification-3.1.2.jar
hive-cli-3.1.2.jar
hive-common-3.1.2.jar
hive-contrib-3.1.2.jar
hive-druid-handler-3.1.2.jar
hive-exec-3.1.2.jar
hive-hbase-handler-3.1.2.jar
hive-hcatalog-core-3.1.2.jar
hive-hcatalog-server-extensions-3.1.2.jar
hive-hplsql-3.1.2.jar
hive-jdbc-3.1.2.jar
hive-jdbc-handler-3.1.2.jar
hive-kryo-registrator-3.1.2.jar
hive-llap-client-3.1.2.jar
hive-llap-common-3.1.2.jar
hive-llap-common-3.1.2-tests.jar
hive-llap-ext-client-3.1.2.jar
hive-llap-server-3.1.2.jar
hive-llap-tez-3.1.2.jar
hive-metastore-3.1.2.jar
hive-serde-3.1.2.jar
hive-service-3.1.2.jar
hive-service-rpc-3.1.2.jar
hive-shims-0.23-3.1.2.jar
hive-shims-3.1.2.jar
hive-shims-common-3.1.2.jar
hive-shims-scheduler-3.1.2.jar
hive-standalone-metastore-3.1.2.jar
hive-storage-api-2.7.0.jar
hive-streaming-3.1.2.jar
hive-testutils-3.1.2.jar
hive-upgrade-acid-3.1.2.jar
hive-vector-code-gen-3.1.2.jar

hive-site.xml

<configuration>
  <property>
     <name>javax.jdo.option.ConnectionURL</name>
     <value>jdbc:postgresql://192.168.5.130:5432/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
     <name>javax.jdo.option.ConnectionDriverName</name>
     <value>org.postgresql.Driver</value>
  </property>
  <property>
     <name>javax.jdo.option.ConnectionUserName</name>
     <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://192.168.5.130:9000/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
     <name>datanucleus.autoCreateSchema</name>
     <value>false</value>
  </property>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/local/hive/lib</value>
  </property>
</configuration>
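
As a quick sanity check on the metastore backend (a hypothetical command, not part of the setup above; it assumes the psql client is installed, with host, port, and credentials taken from this hive-site.xml):

# list the metastore tables to confirm the JDBC settings are usable
PGPASSWORD=hive psql -h 192.168.5.130 -p 5432 -U hive -d hive -c '\dt'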

After copying hive-site.xml to $SPARK_HOME/conf, a NoClassDefFoundError occurred for org/apache/commons/collections/CollectionUtils, as shown below.

scala> spark.sql("show tables").show
21/05/24 00:49:58 ERROR FileUtils: The jar file path file:///usr/local/hive/lib/*.jar doesn't exist
Hive Session ID = a6d63a41-e235-4d8c-a660-6f7b1a22996b
21/05/24 00:49:59 WARN ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:01 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
21/05/24 00:50:02 WARN Hive: Failed to register all functions.
java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:4299)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4367)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4347)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4603)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:291)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:274)
        at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:435)
        at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:375)
        at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:355)
        at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:331)
        at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:257)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
        at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:384)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:224)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:134)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:154)
        at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:152)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:946)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:932)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listTables(SessionCatalog.scala:924)
        at org.apache.spark.sql.execution.command.ShowTablesCommand.$anonfun$run$43(tables.scala:868)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.command.ShowTablesCommand.run(tables.scala:868)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
        at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
        at $line14.$read$$iw$$iw$$iw.<init>(<console>:36)
        at $line14.$read$$iw$$iw.<init>(<console>:38)
        at $line14.$read$$iw.<init>(<console>:40)
        at $line14.$read.<init>(<console>:42)
        at $line14.$read$.<init>(<console>:46)
        at $line14.$read$.<clinit>(<console>)
        at $line14.$eval$.$print$lzycompute(<console>:7)
        at $line14.$eval$.$print(<console>:6)
        at $line14.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
        at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
        at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
        at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
        at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
        at org.apache.spark.repl.Main$.doMain(Main.scala:78)
        at org.apache.spark.repl.Main$.main(Main.scala:58)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
        ... 101 more
Caused by: MetaException(message:org/apache/commons/collections/CollectionUtils)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:84)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:93)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:8667)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:169)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:94)
        ... 106 more
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/collections/CollectionUtils
        at org.apache.hadoop.hive.metastore.ObjectStore.grantPrivileges(ObjectStore.java:5709)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
        at com.sun.proxy.$Proxy39.grantPrivileges(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles_core(HiveMetaStore.java:828)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles(HiveMetaStore.java:794)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:539)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:80)
        ... 110 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.collections.CollectionUtils
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:247)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:236)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 127 more


repeated logs ...

This error can be fixed by copying $SPARK_HOME/jars/commons-collections-3.2.2.jar to $HIVE_HOME/lib; a sketch of that fix follows.
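
A minimal sketch of the copy (assuming, as in this install, that Spark 3.1.1 bundles commons-collections 3.2.2; adjust the file name if your version differs):

# copy Spark's bundled commons-collections jar into Hive's lib directory
cp $SPARK_HOME/jars/commons-collections-3.2.2.jar $HIVE_HOME/lib/
# confirm it now sits alongside the other Hive jars
ls $HIVE_HOME/lib | grep commons-collections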

1 Answer

It seems your Hive conf is missing. To connect to the Hive metastore, you need to copy the hive-site.xml file into the Spark conf directory.

Try

cp  /usr/lib/hive/conf/hive-site.xml    ${SPARK_HOME}/conf/
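
If the Hive conf lives elsewhere on your install (paths vary between distributions), it may help to verify the file actually landed in Spark's conf directory (a sketch, assuming the standard layout):

# confirm Spark will pick up the Hive configuration on startup
ls -l ${SPARK_HOME}/conf/hive-site.xml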

4 Comments

After copying the file, something changed, but now there are other exceptions. lol 21/05/21 07:41:27 WARN MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored 21/05/21 07:41:27 WARN Hive: Failed to register all functions. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Can you copy and paste the hive-site.xml content here? Also, the full stack trace is needed to diagnose this issue.
Sorry for the late comment. As requested, the full stack trace and hive-site.xml have been added.
The not-found exception for org.apache.commons.collections.CollectionUtils can be fixed by copying $SPARK_HOME/jars/commons-collections-3.2.2.jar to $HIVE_HOME/lib. spark-sql works now. Thanks very much!
