I have two CSV files. The first file:

422174,XN,20.99,2020-09-01,2022-01-20 20:20:28.613+00
421348,SB,21.99,2021-01-26,2022-01-20 20:20:28.613+00
885176,XN,41.80,2021-11-17,2022-01-20 20:20:28.613+00
881751,SB,12.81,2020-09-01,2022-01-20 20:20:28.613+00
722483,XN,67.50,2020-09-01,2022-01-20 20:20:28.613+00

The second file:

422174,XN,25.99,2020-09-01,2022-01-21 20:20:28.613+00
667843,XN,22.99,2020-09-01,2022-01-20 20:20:28.613+00
421348,SB,21.99,2021-01-26,2022-01-20 20:20:28.613+00
885176,XN,41.80,2021-11-17,2022-01-20 20:20:28.613+00
881751,SB,12.81,2020-09-01,2022-01-20 20:20:28.613+00
156734,XN,34.50,2020-09-01,2022-01-20 20:20:28.613+00

The output should be:

667843,XN,22.99,2020-09-01,2022-01-20 20:20:28.613+00
156734,XN,34.50,2020-09-01,2022-01-20 20:20:28.613+00

However, I need to compare these two CSV files by column1 and column2 only.

For example;

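If the (column1, column2) pair of a row in file1.csv equals the (column1, column2) pair of a row in file2.csv, that row should not be considered a difference.

The other columns may differ between the two files; as long as column1 and column2 are the same, there is no difference. The row with 422174,XN, for instance, has a different price and timestamp in the second file but must not appear in the output.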

How can I achieve this?

  • Well, just to be sure, do you want the difference between both files, or do you want the lines that are in file2 but not in file1 (according to their columns, in this case)? Commented Dec 2, 2022 at 21:14
  • Yes @EdgarMagallon, I want the lines that are in file2 but not in file1 according to column1 and column2, not the whole row. Commented Dec 2, 2022 at 21:19
  • I think this is what you need: unix.stackexchange.com/a/727169/195582 Commented Dec 3, 2022 at 18:13

2 Answers

3

Using any awk:

$ awk -F, 'NR==FNR{a[$1,$2]; next} !(($1,$2) in a)' file1.csv file2.csv
667843,XN,22.99,2020-09-01,2022-01-20 20:20:28.613+00
156734,XN,34.50,2020-09-01,2022-01-20 20:20:28.613+00
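
The one-liner above reads file1.csv first and records each (column1, column2) pair as an array key, then prints only the lines of file2.csv whose pair was never recorded. Below is a commented sketch of the same approach wrapped in a shell function; the function name unmatched_by_key and the array name seen are illustrative choices, not part of the original answer.

```shell
# Hypothetical helper: print the rows of the second file whose
# (column1, column2) pair does not occur anywhere in the first file.
unmatched_by_key() {
    awk -F, '
        # While reading the first file (NR == FNR), remember each key pair.
        NR == FNR { seen[$1, $2]; next }
        # While reading the second file, print rows whose pair was not seen.
        !(($1, $2) in seen)
    ' "$1" "$2"
}
```

It would be called as `unmatched_by_key file1.csv file2.csv`. One caveat of the NR==FNR idiom: if the first file is empty, the condition also holds for every line of the second file, so nothing is printed.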
3

You can use Miller (mlr) and a classic join operation to get the unpaired rows from the right file (the second one):

Running:

mlr --csv -N join --np --ur -j 1,2 -f input_01.csv then unsparsify input_02.csv

you get:

667843,XN,22.99,2020-09-01,2022-01-20 20:20:28.613+00
156734,XN,34.50,2020-09-01,2022-01-20 20:20:28.613+00

Some notes:

  • -N to indicate that the CSV files have no header row
  • --np to suppress paired records
  • --ur to emit unpaired records from the right file
  • -j 1,2 to set the join fields (columns 1 and 2)