I'm trying to sort phrases such as the following:

a12_b7
a12_b11
a5_b3
a5_b30
a12_b10

using the numbers following the letters, lexicographically. For the example above, I expect the result to be:

a5_b3
a5_b30
a12_b7
a12_b10
a12_b11

Reading man sort, I thought I had this figured out:

  • Fields are delimited by underscores
  • Two search keys
  • First search key: first field, beginning with 2nd character in the field
  • Second search key: second field, beginning with 2nd character in the field.

But that does not work the way I thought it would:

$ cat | sort --debug --key=1.2 --key=2.2 --field-separator=_
sort: text ordering performed using ‘en_IL.UTF-8’ sorting rules
a12_b10
 ______
     __
_______
a12_b11
 ______
     __
_______
a12_b7
 _____
     _
______
a5_b3
 ____
    _
_____
a5_b30
 _____
    __
______

What have I gotten wrong? And what would be the appropriate sort command-line in this case?

  • What result are you expecting? Commented Aug 7 at 14:06
  • Look into the -n option. I don't think "lexicographically" means what you think it means. Commented Aug 7 at 15:58

1 Answer

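It looks like you want to sort numerically, whereas sort's default ordering is alphabetical (text comparison under your locale's rules). Adding -n makes sort compare the keys as numbers. You could do: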

$ sort -nt'_' -k1.2 -k2.2 file
a5_b3
a5_b30
a12_b7
a12_b10
a12_b11
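
For reference, the attempt in the question differs from this in two ways, both visible in its --debug output: a key written as -k1.2 with no end position runs to the end of the line, and without -n the keys are compared as text. A minimal sketch that also bounds each key to its own field, using the sample data from the question (inlined with printf only so the snippet is self-contained; the variable name result is just for illustration):

```shell
# Key 1: field 1, from its 2nd character to the end of field 1, numeric.
# Key 2: field 2, from its 2nd character to the end of field 2, numeric.
result=$(printf '%s\n' a12_b7 a12_b11 a5_b3 a5_b30 a12_b10 |
  sort -t_ -k1.2,1n -k2.2,2n)
printf '%s\n' "$result"
```

Bounding the keys (the ",1" and ",2" parts) is not strictly needed for this data, since -n stops reading at the first non-numeric character, but it keeps each comparison confined to the field it is meant for.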

But if the input were any more complicated than that (e.g. not always a single letter before each sort key), then I'd use the Decorate-Sort-Undecorate idiom, e.g.:

$ cat file
phc12_bob7
efg12_bk11
cfad5_xxxx3
df5_chekb30
a12_tg10

$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file
phc     12      _bob    7
efg     12      _bk     11
cfad    5       _xxxx   3
df      5       _chekb  30
a       12      _tg     10

$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file | sort -k2,2n -k4,4n
cfad    5       _xxxx   3
df      5       _chekb  30
phc     12      _bob    7
a       12      _tg     10
efg     12      _bk     11

$ sed -E 's/([^0-9]+)(.*)(_[^0-9]+)(.*)/\1\t\2\t\3\t\4/' file | sort -k2,2n -k4,4n | tr -d '\t'
cfad5_xxxx3
df5_chekb30
phc12_bob7
a12_tg10
efg12_bk11

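The sed command modifies (Decorates) the input by putting tabs around each number, so that sort can Sort numerically on fields 2 and 4; the tr then removes the tabs that sed added (Undecorates). Note that this relies on sed understanding \t in the replacement (GNU sed does) and on the input containing no tabs of its own, since tr -d '\t' would remove those too.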
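
A variant of the same idiom, as a rough sketch rather than a drop-in replacement: instead of splitting the line apart, it prepends the two numbers as extra tab-separated columns and leaves the original line untouched in the last column, so undecorating is just a cut. The function name dsu_sort and the variable tab are made up for this example, and it assumes every line has the shape text-number-underscore-text-number.

```shell
tab=$(printf '\t')

# Decorate:   prepend the two numeric keys as tab-separated columns,
#             keeping the original line intact as the last column.
# Sort:       numerically on column 1, then on column 2.
# Undecorate: cut the two key columns away again.
dsu_sort() {
  sed -E "s/^[^0-9]*([0-9]+)_[^0-9]*([0-9]+)\$/\\1${tab}\\2${tab}&/" |
    sort -t "$tab" -k1,1n -k2,2n |
    cut -f3-
}

result=$(printf '%s\n' phc12_bob7 efg12_bk11 cfad5_xxxx3 df5_chekb30 a12_tg10 | dsu_sort)
printf '%s\n' "$result"
```

Because the original text is carried along unmodified, the undecorate step only has to drop the first two columns of each decorated line.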
