When implementing DataFrame.withColumn(), the Spark developers seem to have forgotten to check whether the column name is already in use.

To begin with:

val res = sqlContext.sql("select * from tag.tablename where dt>20150501 limit 1").withColumnRenamed("tablename","tablename")
res.columns

shows:

res6: Array[String] = Array(user_id, service_type_id, tablename, dt)

Then:

val res1 = res.withColumn("tablename",res("tablename")+1)
res1.columns

shows:

res7: Array[String] = Array(user_id, service_type_id, tablename, dt, tablename)

Note that res1.show still works.

The bug shows up when selecting the duplicated column by name:

res1.select("tablename")

org.apache.spark.sql.AnalysisException: Ambiguous references to tablename: (tablename#48,List()),(tablename#53,List());
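
A quick way to confirm that the exception comes from a repeated name is to scan the column list for duplicates. This is only a minimal sketch in plain Scala; `duplicateColumns` is a hypothetical helper name, not part of Spark:

```scala
// Hypothetical helper (not a Spark API): returns every column name that
// appears more than once in the given list, sorted alphabetically.
def duplicateColumns(columns: Seq[String]): Seq[String] =
  columns
    .groupBy(identity)                                   // name -> all occurrences
    .collect { case (name, hits) if hits.size > 1 => name }
    .toSeq
    .sorted
```

In the shell it could be called as duplicateColumns(res1.columns.toSeq); for the column list shown above it should report only tablename.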
  • This should be reported in the Spark JIRA rather than on Stack Overflow. Commented May 6, 2015 at 14:16
  • They've already fixed it -- I've linked the relevant JIRA as an answer. However, this is good advice: I've found the Spark SQL team to be very responsive to tickets I've filed. Search also works quite well on the ASF JIRA site -- it was the second result when I searched on dataframe withcolumn. (Actually, it rang a bell because I noticed the checkin fixing it some time ago.) Commented May 6, 2015 at 15:25

1 Answer

This has already been reported as SPARK-6635. It's already been fixed, and seems set to be released in Spark 1.4.0.
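
Until you can move to a release that contains the fix, one possible workaround is to build the projection yourself, so that the existing column is replaced instead of a second one being appended. The following is a minimal sketch for the spark-shell, not the fix that went into Spark; `replaceColumn` is a hypothetical helper name, and it assumes the column names of the input DataFrame are unique to begin with:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper (not a Spark API): returns a DataFrame in which the
// column called `name` is replaced by `newCol`, keeping the column order.
// Assumes the column names of `df` are unique before the call.
def replaceColumn(df: DataFrame, name: String, newCol: Column): DataFrame = {
  val projected = df.columns.map { c =>
    if (c == name) newCol.as(name) // swap in the new expression under the old name
    else df(c)                     // keep every other column as it is
  }
  df.select(projected: _*)
}
```

With the DataFrame from the question, this would be used as val res1 = replaceColumn(res, "tablename", res("tablename") + 1), after which res1.select("tablename") should no longer be ambiguous because only one column carries that name.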

3 Comments

I found another one and submitted it to JIRA, but several days later there is still no response. What should I do now?
Which one is it? I haven't had this experience so I'm not sure. You could try to use the Spark developers mailing list to try to get someone interested. See here.
SPARK-7428. I'm not sure whether I filed this JIRA issue the right way.
