25

Running a Spark SQL (v2.1.0_2.11) program in Java immediately fails with the following exception as soon as the first action is called on a DataFrame:

java.lang.ClassNotFoundException: org.codehaus.commons.compiler.UncheckedCompileException

I ran it in Eclipse, outside of the spark-submit environment. I use the following Spark SQL Maven dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
</dependency>

11 Answers

37

The culprit is the commons-compiler library, which is being pulled in at conflicting versions (the original answer illustrated the conflict with a dependency-tree screenshot).

To work around this, add the following to your pom.xml:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>2.7.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>
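To confirm that the pinned version actually wins, you can filter Maven's dependency tree down to the Janino artifacts (the -Dincludes pattern is groupId:artifactId):

    mvn dependency:tree -Dincludes=org.codehaus.janino:commons-compiler

The output should then list only the version you pinned.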

4 Comments

In case somebody stumbles upon this: when I upgraded from Spark 2.1 to 2.3, the above workaround stopped working for me. I fixed the version of org.codehaus.janino:janino at 3.0.8, and that helped.
As of 2019-12-17 I still see the error after explicitly adding a dependency on org.codehaus.janino:commons-compiler:3.1.0. My Spark is now 2.4.3 (spark-sql_2.12). I have the full output of the Gradle dependency report here: gist.github.com/leeyuiwah-sl/cc9e2f36ebccea0d875a995551abe3ad. Can you help? Thanks!
Okay, I have found a solution: I must use Janino 3.0.8, not newer (such as 3.1.0). Also, I must have explicit dependencies on both org.codehaus.janino:janino and org.codehaus.janino:commons-compiler. Thanks!
@leeyuiwah After a long struggle, your comment resolved the problem in my case. There was a dependency conflict after adding Spring Boot to my Spark project. Thanks!
34

I had a similar issue when updating from spark-2.2.1 to spark-2.3.0.

In my case, I had to pin both commons-compiler and janino.

Spark 2.3 solution:

<dependencyManagement>
    <dependencies>
        <!--Spark java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException-->
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.0.8</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>janino</artifactId>
            <version>3.0.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
        <version>3.0.8</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
        <version>3.0.8</version>
    </dependency>
</dependencies>

3 Comments

Perfect, thank you! I actually got this after adding the Spring Boot plugin, which was conflicting with these Spark dependencies. This solution fixed that.
The accepted solution does not work in Spark 2.3, but this one works perfectly.
Adding the above two dependencies at version 3.0.16 fixed it for me. The Spark version is 3.1.2.
12

If you are using Spark 3.0.1 or higher, you have to select version 3.0.16 for the two Janino dependencies in @Maksym's solution; that works very well.


And since Spark 3.4.0, I have had to switch from 3.0.16 to 3.1.19 (the latest I've found).

Comments

5

My setup is Spring Boot + Scala + Spark 2.4.5.

For this issue, the solution is to exclude the janino and commons-compiler artifacts that come in with spark-sql_2.12 version 2.4.5, the reason being that an updated version (3.1.2) of both artifacts was being resolved.

After excluding them, add version 3.0.8 of both janino and commons-compiler as separate dependencies.

<dependencies>
     <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.5</version>
        <exclusions>
            <exclusion>
                <artifactId>janino</artifactId>
                <groupId>org.codehaus.janino</groupId>
            </exclusion>
            <exclusion>
                <artifactId>commons-compiler</artifactId>
                <groupId>org.codehaus.janino</groupId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <artifactId>janino</artifactId>
        <groupId>org.codehaus.janino</groupId>
        <version>3.0.8</version>
    </dependency>
    <dependency>
        <artifactId>commons-compiler</artifactId>
        <groupId>org.codehaus.janino</groupId>
        <version>3.0.8</version>
    </dependency>
    <!-- ... other dependencies ... -->
</dependencies>

5 Comments

Works for me, but I could not understand why it works. Could you explain it in a bit more detail, please?
@AnnaKlein It's all linked to the commons-compiler version, which should be aligned with the Scala version.
@abhijitcaps How do you know that? Where can you find the correlation between the Scala version and the commons-compiler version?
@abhijitcaps I don't think it has anything to do with Scala version. See issues.apache.org/jira/browse/… for the explanation.
@abhijitcaps If we add the above dependency, will there be any Open Source Software (OSS) licensing issues?
4

This error still arises with org.apache.spark:spark-sql_2.12:2.4.6, but the Janino version that has to be used is 3.0.16. With Gradle:

implementation 'org.codehaus.janino:commons-compiler:3.0.16'
implementation 'org.codehaus.janino:janino:3.0.16'
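If another dependency keeps pulling in a different Janino release, Gradle can also force the versions globally; a minimal sketch using the Groovy DSL:

    configurations.all {
        resolutionStrategy {
            // Pin both Janino artifacts to the version Spark expects
            force 'org.codehaus.janino:commons-compiler:3.0.16'
            force 'org.codehaus.janino:janino:3.0.16'
        }
    }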

Comments

2

In our migration from CDH parcel 2.2.0.cloudera1 to 2.3.0.cloudera4, we simply overrode the Maven property:

<janino.version>3.0.8</janino.version>
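Overriding the property only works if a parent pom (here, the CDH one) actually references it in its dependency management; if yours does not, an equivalent explicit pin would be (a sketch reusing the property above):

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.codehaus.janino</groupId>
                <artifactId>janino</artifactId>
                <version>${janino.version}</version>
            </dependency>
            <dependency>
                <groupId>org.codehaus.janino</groupId>
                <artifactId>commons-compiler</artifactId>
                <version>${janino.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>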

In addition, we defined the proper version of the Hive dependency in the dependencyManagement section:

<hive.version>1.1.0-cdh5.13.3</hive.version>

    <dependency>
         <groupId>org.apache.hive</groupId>
         <artifactId>hive-jdbc</artifactId>
         <version>${hive.version}</version>
         <scope>runtime</scope>
         <exclusions>
             <exclusion>
                 <groupId>org.eclipse.jetty.aggregate</groupId>
                 <artifactId>*</artifactId>
             </exclusion>
             <exclusion>
                 <artifactId>slf4j-log4j12</artifactId>
                 <groupId>org.slf4j</groupId>
             </exclusion>
             <exclusion>
                 <artifactId>parquet-hadoop-bundle</artifactId>
                 <groupId>com.twitter</groupId>
             </exclusion>
         </exclusions>
     </dependency>

The exclusions were necessary for the previous version; they might not be necessary anymore.

Comments

1

Apache spark-sql brings in the required versions of janino and commons-compiler. If you're encountering this error, something else in your pom (or a parent pom) is overriding the version. You can explicitly set the janino and commons-compiler versions in your pom to match what Spark brings, as suggested in other answers, but this makes long-term maintenance more difficult: maintainers will need to remember to update those explicit versions each time you update Spark. Instead, I recommend what worked well for me:

Figure out what is bringing in the wrong version of janino by running:

mvn dependency:tree  # -Dverbose may be helpful

Exclude janino and commons-compiler from the offending dependency. In my case it was an in-house hadoop testing framework:

        <dependency>
            <groupId>org.my.client.pkg</groupId>
            <artifactId>hadoop-testing-framework</artifactId>
            <version>${some.version}</version>
            <exclusions>
                <!-- We want only and exactly Spark's janino version -->
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>janino</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.janino</groupId>
                    <artifactId>commons-compiler</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

Re-run mvn dependency:tree and repeat the above process for any other dependencies that override Spark's janino version, until janino and commons-compiler are coming from your spark-sql dependency, as shown in my (abbreviated) mvn dependency:tree output below:

[INFO] +- org.apache.spark:spark-sql_2.11:jar:2.4.0.cloudera1:provided
[INFO] |  +- org.apache.spark:spark-catalyst_2.11:jar:2.4.0.cloudera1:provided
[INFO] |  |  +- org.codehaus.janino:janino:jar:3.0.9:compile
[INFO] |  |  +- org.codehaus.janino:commons-compiler:jar:3.0.9:compile

Note, if you see something like:

[INFO] +- org.apache.spark:spark-sql_2.11:jar:2.4.0.cloudera1:provided
[INFO] |  +- org.apache.spark:spark-catalyst_2.11:jar:2.4.0.cloudera1:provided
[INFO] |  |  +- org.codehaus.janino:janino:jar:2.6.1:compile  <-- Note old version
[INFO] |  |  +- org.codehaus.janino:commons-compiler:jar:3.0.9:compile

then something else is still overriding Spark's janino version. In my case, the parent pom was explicitly bringing in v2.6.1; removing that dependency block from the parent pom solved my problem. This is where the -Dverbose flag may help.

A final note: at least my version of Spark could not tolerate any change in the janino or commons-compiler versions. They had to be exactly what Spark brought with it, down to the patch version (assuming Codehaus follows semver).

Comments

1

I had to downgrade the Janino version provided by Spring Boot 2.7.3 and manually include:

<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
  <version>3.0.16</version>
</dependency>
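If the project inherits from spring-boot-starter-parent, an alternative is to override the version property that Spring Boot's dependency management uses for Janino (named janino.version in recent Spring Boot releases, to the best of my knowledge):

    <properties>
        <janino.version>3.0.16</janino.version>
    </properties>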

Comments

0

The selected answer didn't work for me; this did:

<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.0.8</version>
</dependency>

Comments

0

I am using Spark 3.2.1, which does not have this issue; upgrade if possible. In sbt:

"org.apache.spark" %% "spark-core" % "3.2.1",
"org.apache.spark" %% "spark-sql" % "3.2.1",

Comments

0

Java 17, Spring Boot 2.7.18, Spark 3.5.5. It works with the following dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.5</version>
    <exclusions>
        <exclusion>
            <artifactId>janino</artifactId>
            <groupId>org.codehaus.janino</groupId>
        </exclusion>
        <exclusion>
            <artifactId>commons-compiler</artifactId>
            <groupId>org.codehaus.janino</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>3.1.12</version>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>janino</artifactId>
    <version>3.1.12</version>
</dependency>

Comments
