How can we collapse a boolean column to a single row per key with an OR operation using Scala?
Part 1:
A true
A false
B false
B false
C true
B false
A true
C true
Desired Output
B false
A true
C true
One solution I could think of is to group the rows by the first column, split the true and false rows into separate data frames, drop duplicates, and then add the rows of the false data frame to the true one, skipping any letter (e.g. A) that already exists in the true data frame.
This solution is quite messy, and I don't know whether it would work for all edge cases. Is there a smarter way to do this?
I'm an absolute beginner, so any help is appreciated.
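For reference, the Part 1 collapse might be sketched as follows. This is a minimal sketch, assuming the data lives in a Spark DataFrame whose columns are named col1 (string) and col2 (boolean). The object name CollapseBooleans, the method name collapse, and the local SparkSession settings are hypothetical and only there to make the example self-contained. It relies on Spark ordering false before true, so max over a boolean column behaves like a logical OR within each group.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

// Hypothetical helper object; the column names col1/col2 are assumptions.
object CollapseBooleans {

  // One row per col1 value. Since false < true, max(col2) is true
  // exactly when at least one row in the group is true (logical OR).
  def collapse(df: DataFrame): DataFrame =
    df.groupBy("col1").agg(max(col("col2")).as("col2"))

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collapse-booleans")
      .getOrCreate()
    import spark.implicits._

    // The Part 1 sample data.
    val df = Seq(
      ("A", true), ("A", false), ("B", false), ("B", false),
      ("C", true), ("B", false), ("A", true), ("C", true)
    ).toDF("col1", "col2")

    // Row order in the result is not guaranteed.
    collapse(df).show()

    spark.stop()
  }
}
```

This avoids splitting the data into separate true and false data frames: a single groupBy with one aggregate covers the case where a key has only false rows as well as the case where it has a mix.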
Edit: the given answers work for the scenario above but don't work for the following one. Is there any way to achieve the desired output?
Part 2:
A true "Apple"
A false ""
B false ""
B false ""
C true "Cat"
C true "Cotton"
C false ""
Desired Output
B false []
A true ["Apple"]
C true ["Cat","Cotton"]
I tried to achieve this by grouping by col1 and col2 and collapsing col3 with collect_set, and then:
- Group by the 1st column.
- Collect the 2nd column as a set of booleans.
- Check whether the set contains a single true; if it does, the OR expression always evaluates to true.
But this leads to the loss of col3_set altogether.
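One way to keep both results might be to compute them in a single aggregation rather than in two grouping passes. The following is a minimal sketch, assuming a Spark DataFrame with columns col1 (string), col2 (boolean) and col3 (string), where the rows without a value hold an empty string. The object name CollapseWithValues and the method name collapse are hypothetical. It relies on collect_set skipping nulls, so blank col3 values are turned into nulls first.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, max, when}

// Hypothetical helper object; the column names col1/col2/col3 are assumptions.
object CollapseWithValues {

  // Single groupBy on col1 with two aggregates:
  //  - max(col2) acts as a logical OR, since false < true;
  //  - when(...) without otherwise yields null for empty strings, and
  //    collect_set ignores nulls, so only real values are collected.
  def collapse(df: DataFrame): DataFrame =
    df.groupBy("col1").agg(
      max(col("col2")).as("col2"),
      collect_set(when(col("col3") =!= "", col("col3"))).as("col3_set")
    )

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collapse-with-values")
      .getOrCreate()
    import spark.implicits._

    // The Part 2 sample data.
    val df = Seq(
      ("A", true, "Apple"), ("A", false, ""),
      ("B", false, ""), ("B", false, ""),
      ("C", true, "Cat"), ("C", true, "Cotton"), ("C", false, "")
    ).toDF("col1", "col2", "col3")

    // Row order and the order inside each set are not guaranteed.
    collapse(df).show(truncate = false)

    spark.stop()
  }
}
```

Because col2 is aggregated instead of being part of the grouping key, each letter yields exactly one row, and a group with no non-empty col3 values (B in the sample) gets an empty array.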