
The normal SQL query:

    SELECT DISTINCT(county_geoid), state_geoid, sum(PredResponse), sum(prop_count) FROM table_a GROUP BY county_geoid;

This gives me the expected output. However, the Spark SQL version of the same query, used in PySpark, gives me an error.

    result_county_performance_alpha = spark.sql("SELECT distinct(county_geoid), sum(PredResponse), sum(prop_count), state_geoid FROM table_a group by county_geoid")

This gives the following error:

    AnalysisException: u"expression 'tract_alpha.`state_geoid`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

How can I resolve this issue?

1 Answer


Your "normal" query should not work anywhere. The correct way to write the query is:

    SELECT county_geoid, state_geoid, sum(PredResponse), sum(prop_count)
    FROM table_a
    GROUP BY county_geoid, state_geoid;

This should work on any database (where the columns and tables are defined and of the right types).
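
In PySpark the same fix applies to the spark.sql call. A minimal sketch, assuming the spark session and the table_a temporary view from the question already exist (the column aliases are just placeholders):

    # Every non-aggregated column in the SELECT (county_geoid, state_geoid)
    # also appears in the GROUP BY, so the analyzer no longer complains.
    result_county_performance_alpha = spark.sql("""
        SELECT county_geoid,
               state_geoid,
               sum(PredResponse) AS sum_pred_response,
               sum(prop_count)   AS sum_prop_count
        FROM table_a
        GROUP BY county_geoid, state_geoid
    """)
    result_county_performance_alpha.show()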

Your version has state_geoid in the SELECT, but it is neither in the GROUP BY nor wrapped in an aggregate function. That is not valid SQL. It might happen to work in MySQL, but only because of a (mis)feature of that database: MySQL traditionally let you select non-aggregated columns that are not in the GROUP BY and returned an arbitrary value for them. That behavior is finally being fixed; the ONLY_FULL_GROUP_BY SQL mode, which rejects such queries, is enabled by default since MySQL 5.7.

Also, you almost never want to combine SELECT DISTINCT with GROUP BY. And the parentheses after DISTINCT make no difference: DISTINCT is not a function, the construct is simply SELECT DISTINCT.
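
If you prefer the DataFrame API to building SQL strings, the equivalent aggregation is a groupBy followed by agg. This is only a sketch under the same assumptions (table_a is available as a table or temporary view, column names as in the question):

    from pyspark.sql import functions as F

    # Group on both key columns and aggregate the two measures.
    df = spark.table("table_a")
    result = (
        df.groupBy("county_geoid", "state_geoid")
          .agg(
              F.sum("PredResponse").alias("sum_pred_response"),
              F.sum("prop_count").alias("sum_prop_count"),
          )
    )
    result.show()

Alternatively, if state_geoid is functionally dependent on county_geoid and you only want one row per county, you can keep GROUP BY county_geoid alone and wrap state_geoid in first(), as the error message itself suggests; just be aware that first() returns an arbitrary value whenever that dependency does not hold.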


2 Comments

Which feature? Could you explain more about why it works only in MySQL?
I understand that using DISTINCT is a mistake, but in MySQL the result varies depending on whether state_geoid is included in the GROUP BY or not.
