awk command in linux does not execute; always get the error (awk: line 1: syntax error at or near '}' )

Question

I am a beginner using Linux bash for bioinformatics, and I recently ran into an error with this awk command. ChatGPT's suggestions are not helping, even though the task is very basic. I have a large file of the human genome and I need to extract the CDS region. This is an example line from the file: "CDS 648..2924". The start position is the first number and the end position is the second number. My code:

awk '/CDS/ && /\.\./ { if (match($0, /([0-9]+)\.\.([0-9]+)/, arr)) { print arr[1], arr[2] } }' BRCA1.gb

Every suggestion is appreciated

Note: I know there are other ways to complete this task, but I need to do it specifically with awk and match. Thanks a lot! (Picture from the file below)

https://i.sstatic.net/TMqe80wJ.png

I don't get the error you mentioned using the awk command you pasted.
– Arkadiusz Drabczyk, Commented Jun 1 at 18:03
My best guesses are that you're either a) using a gawk-only extension (3rd argument to match()) but not using gawk or b) on Solaris using old, broken awk. FYI ChatGPT usually outputs wrong answers when given software questions.
– Ed Morton, Commented Jun 1 at 18:04
@EdMorton Thank you very much for your suggestion, I just tried with gawk and it is working. Wish you all the best.
– Luka Jašović, Commented Jun 1 at 18:08
@markp-fuso Thank you very much. I just fixed the issue by simply using gawk.
– Luka Jašović, Commented Jun 1 at 18:09
You're welcome. By the way, your script could be written more concisely and robustly as awk 'match($0, /CDS.*\<([0-9]+)\.\.([0-9]+)/, arr) { print arr[1], arr[2] }' BRCA1.gb
– Ed Morton, Commented Jun 1 at 18:10

Daweo · Accepted Answer · 2025-06-02 11:42:54Z

The 3-argument form of match is a GNU AWK (gawk) extension, so you cannot use it if your awk is mawk. It works if you run the script with gawk. Your code

awk '/CDS/ && /\.\./ { if (match($0, /([0-9]+)\.\.([0-9]+)/, arr)) { print arr[1], arr[2] } }' BRCA1.gb

is a valid gawk command, but it could be made more concise. Observe that your if acts on only one branch, so you can avoid it by combining its condition with the other conditions using logical AND, that is

awk '/CDS/ && /\.\./ && match($0, /([0-9]+)\.\.([0-9]+)/, arr) { print arr[1], arr[2] }' BRCA1.gb

Observe that the 2nd regular expression is an infix of the regular expression used in match, so you can drop it: if the line contains no .., match will not succeed anyway. That is, you could do

awk '/CDS/ && match($0, /([0-9]+)\.\.([0-9]+)/, arr) { print arr[1], arr[2] }' BRCA1.gb

to get the same effect.
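If only mawk or another POSIX awk is available, a similar result can be sketched with the two-argument match, which sets the built-in variables RSTART and RLENGTH instead of filling an array. This is a rough sketch and not part of the commands above; the sample file contents and the variable names (sample, result, range, pos) are made up for illustration and stand in for BRCA1.gb.

```shell
# Hypothetical sample input standing in for BRCA1.gb (assumption).
sample=$(mktemp)
printf '     gene            1..7088\n     CDS             648..2924\n' > "$sample"

# Two-argument match() records where the match starts (RSTART) and how
# long it is (RLENGTH). substr() extracts the matched "start..end" text
# and split() breaks it at the literal "..".
result=$(awk '/CDS/ && match($0, /[0-9]+\.\.[0-9]+/) {
    range = substr($0, RSTART, RLENGTH)
    split(range, pos, /\.\./)
    print pos[1], pos[2]
}' "$sample")

echo "$result"   # prints: 648 2924
rm -f "$sample"
```

The separator is given as the regular expression /\.\./ rather than the string "..", because a multi-character separator string is treated as a regular expression, where each unescaped dot would match any character.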

I acknowledge that you must use match AT ANY PRICE due to imposed requirements, but I want to note that this might be done a different way if the presence of CDS is a 100% sure way to identify the lines to extract, namely

awk 'BEGIN{FPAT="[0-9]+"}/CDS/{print $1,$2}' BRCA1.gb

Explanation: I inform GNU AWK that fields consist of one or more (+) digits, then for each line containing CDS I print the 1st and 2nd fields.
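FPAT is itself a gawk feature, so it is not available in mawk either. As a hedged sketch of a similar idea for other awks, the line can be split on runs of non-digit characters instead. The sample line and the names (sample2, fields, line, f) are assumptions for illustration, and like the FPAT version this only works when the first two numbers on a CDS line are the positions you want.

```shell
# Hypothetical one-line sample standing in for BRCA1.gb (assumption).
sample2=$(mktemp)
printf '     CDS             648..2924\n' > "$sample2"

# Strip everything before the first digit, then split the remainder on
# runs of non-digits so that f[1] and f[2] hold the two numbers.
fields=$(awk '/CDS/ {
    line = $0
    sub(/^[^0-9]+/, "", line)
    split(line, f, /[^0-9]+/)
    print f[1], f[2]
}' "$sample2")

echo "$fields"   # prints: 648 2924
rm -f "$sample2"
```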
