I'm working on Linux, and the sort command does not return the order I expected.

Input text:

$ cat input.txt
rep1_1.fq
rep1_2.fq
rep12_1.fq
rep12_2.fq

Command and output:

$ sort input.txt
rep1_1.fq
rep12_1.fq
rep12_2.fq
rep1_2.fq


$ sort --version
sort (GNU coreutils) 8.28
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

After sorting, I expected rep1_2.fq to come right after rep1_1.fq, but it ends up last.

Solved

Following @Federico klez Culloca's advice, I used LC_ALL=C:

$ LC_ALL=C sort input.txt
rep12_1.fq
rep12_2.fq
rep1_1.fq
rep1_2.fq
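
A minimal sketch of why the C locale gives this order: it compares raw byte values, and the digit 2 has a smaller byte value than the underscore, so rep12 sorts before rep1_. The default locale's ordering is probably explained by the underscore being given little or no weight in the first comparison pass, but that is an assumption about the locale's collation rules, not something checked here. The helper name byte_sort is hypothetical.

```shell
# Byte values of the two characters that differ at position 5.
printf '%d %d\n' "'2" "'_"    # prints: 50 95

# Hypothetical helper: sort stdin by raw byte values, ignoring the session locale.
byte_sort() {
    LC_ALL=C sort
}

printf 'rep1_1.fq\nrep1_2.fq\nrep12_1.fq\nrep12_2.fq\n' | byte_sort
```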

Edited

Using LC_ALL=C also fixes the order in which files are listed in a directory.

For example, with these four files in the current directory:

$ LC_ALL= ls
rep1_1.fq  rep12_1.fq  rep12_2.fq  rep1_2.fq

$ LC_ALL=C ls
rep12_1.fq  rep12_2.fq  rep1_1.fq  rep1_2.fq
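
The listing can be reproduced in a throwaway directory, as sketched below. The variable names demo_dir and c_listing are hypothetical. Prefixing a single command with LC_ALL=C changes the locale for that command only, whereas exporting it from a startup file changes it for every program started from that shell.

```shell
# Create the four files in a temporary directory (hypothetical location).
demo_dir=$(mktemp -d)
touch "$demo_dir/rep1_1.fq" "$demo_dir/rep1_2.fq" \
      "$demo_dir/rep12_1.fq" "$demo_dir/rep12_2.fq"

# Byte-order listing for this one command; the session locale is untouched.
c_listing=$(cd "$demo_dir" && LC_ALL=C ls)
printf '%s\n' "$c_listing"

# Clean up the temporary directory.
rm -r "$demo_dir"
```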
  • Try this LC_ALL=C sort input.txt. See the manpage: "*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values." Commented Jun 10, 2019 at 8:32
  • Thanks so much, problem fixed. I found that my default LC_ALL was empty; after appending export LC_ALL=C to my ~/.bashrc file, the sort command works as expected. Commented Jun 10, 2019 at 9:02
  • I always chuckle a bit when I see a title like "The xyz command doesn't work correctly in Linux" (or in C, etc.). 99.99999% of the time, it isn't the command that is at fault. Commented Jun 11, 2019 at 1:53
  • Sure, I totally agree with you: the fault is not in the command but in how it is used. Still, this kind of title tells other users directly what happened. Commented Jun 11, 2019 at 2:31

1 Answer

Try with version-sort. From the manual:

       -V, --version-sort
          natural sort of (version) numbers within text

This is the output using your example:

$ sort -V input.txt 
rep1_1.fq
rep1_2.fq
rep12_1.fq
rep12_2.fq
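
Since -V is a GNU extension, here is a sketch of the same numeric ordering using only sort keys, for systems where -V may be unavailable. It assumes every name follows the rep<N>_<M>.fq pattern seen in the question; the helper name numeric_rep_sort is hypothetical.

```shell
# Hypothetical helper: order names of the form rep<N>_<M>.fq numerically.
numeric_rep_sort() {
    # -t_      : split each line into fields at the underscore
    # -k1.4,1n : first key is field 1 from its 4th character (the digits after "rep"), numeric
    # -k2,2n   : second key is the leading number of field 2, numeric
    LC_ALL=C sort -t_ -k1.4,1n -k2,2n
}

printf 'rep12_2.fq\nrep1_2.fq\nrep12_1.fq\nrep1_1.fq\n' | numeric_rep_sort
```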

1 Comment

Thanks so much. It's an alternative way to fix my problem, but I find that Federico klez Culloca's method of sorting "by native byte values" works better for me.
