3

I'm in the following situation:

> ls
H9/                             HG01109_chr1_hap1_contigs.list  HG01952_chr1_hap2_contigs.list  HG02572/                        HG03486_chr1_hap1_contigs.list
H9_chr1_hap1_contigs.list       HG01109_chr1_hap2_contigs.list  HG01978/                        HG02572_chr1_hap1_contigs.list  HG03486_chr1_hap2_contigs.list
H9_chr1_hap2_contigs.list       HG01123/                        HG01978_chr1_hap1_contigs.list  HG02572_chr1_hap2_contigs.list  HG03492/
HG00438/                        HG01123_chr1_hap1_contigs.list  HG01978_chr1_hap2_contigs.list  HG02622/                        HG03492_chr1_hap1_contigs.list
HG00438_chr1_hap1_contigs.list  HG01123_chr1_hap2_contigs.list  HG02055/                        HG02622_chr1_hap1_contigs.list  HG03492_chr1_hap2_contigs.list
HG00438_chr1_hap2_contigs.list  HG01175/                        HG02055_chr1_hap1_contigs.list  HG02622_chr1_hap2_contigs.list  HG03516/
HG00621/                        HG01175_chr1_hap1_contigs.list  HG02055_chr1_hap2_contigs.list  HG02630/                        HG03516_chr1_hap1_contigs.list
HG00621_chr1_hap1_contigs.list  HG01175_chr1_hap2_contigs.list  HG02080/                        HG02630_chr1_hap1_contigs.list  HG03516_chr1_hap2_contigs.list
HG00621_chr1_hap2_contigs.list  HG01243/                        HG02080_chr1_hap1_contigs.list  HG02630_chr1_hap2_contigs.list  HG03540/
HG00673/                        HG01243_chr1_hap1_contigs.list  HG02080_chr1_hap2_contigs.list  HG02717/                        HG03540_chr1_hap1_contigs.list
HG00673_chr1_hap1_contigs.list  HG01243_chr1_hap2_contigs.list  HG02109/                        HG02717_chr1_hap1_contigs.list  HG03540_chr1_hap2_contigs.list
HG00673_chr1_hap2_contigs.list  HG01258/                        HG02109_chr1_hap1_contigs.list  HG02717_chr1_hap2_contigs.list  HG03579/
HG00733/                        HG01258_chr1_hap1_contigs.list  HG02109_chr1_hap2_contigs.list  HG02723/                        HG03579_chr1_hap1_contigs.list
HG00733_chr1_hap1_contigs.list  HG01258_chr1_hap2_contigs.list  HG02145/                        HG02723_chr1_hap1_contigs.list  HG03579_chr1_hap2_contigs.list
HG00733_chr1_hap2_contigs.list  HG01358/                        HG02145_chr1_hap1_contigs.list  HG02723_chr1_hap2_contigs.list  NA18906/
HG00735/                        HG01358_chr1_hap1_contigs.list  HG02145_chr1_hap2_contigs.list  HG02818/                        NA18906_chr1_hap1_contigs.list
HG00735_chr1_hap1_contigs.list  HG01358_chr1_hap2_contigs.list  HG02148/                        HG02818_chr1_hap1_contigs.list  NA18906_chr1_hap2_contigs.list
HG00735_chr1_hap2_contigs.list  HG01361/                        HG02148_chr1_hap1_contigs.list  HG02818_chr1_hap2_contigs.list  NA20129/
HG00741/                        HG01361_chr1_hap1_contigs.list  HG02148_chr1_hap2_contigs.list  HG02886/                        NA20129_chr1_hap1_contigs.list
HG00741_chr1_hap1_contigs.list  HG01361_chr1_hap2_contigs.list  HG02257/                        HG02886_chr1_hap1_contigs.list  NA20129_chr1_hap2_contigs.list
HG00741_chr1_hap2_contigs.list  HG01891/                        HG02257_chr1_hap1_contigs.list  HG02886_chr1_hap2_contigs.list  NA21309/
HG01071/                        HG01891_chr1_hap1_contigs.list  HG02257_chr1_hap2_contigs.list  HG03098/                        NA21309_chr1_hap1_contigs.list
HG01071_chr1_hap1_contigs.list  HG01891_chr1_hap2_contigs.list  HG02486/                        HG03098_chr1_hap1_contigs.list  NA21309_chr1_hap2_contigs.list
HG01071_chr1_hap2_contigs.list  HG01928/                        HG02486_chr1_hap1_contigs.list  HG03098_chr1_hap2_contigs.list
HG01106/                        HG01928_chr1_hap1_contigs.list  HG02486_chr1_hap2_contigs.list  HG03453/
HG01106_chr1_hap1_contigs.list  HG01928_chr1_hap2_contigs.list  HG02559/                        HG03453_chr1_hap1_contigs.list
HG01106_chr1_hap2_contigs.list  HG01952/                        HG02559_chr1_hap1_contigs.list  HG03453_chr1_hap2_contigs.list
HG01109/                        HG01952_chr1_hap1_contigs.list  HG02559_chr1_hap2_contigs.list  HG03486/

For each prefix I have two .list files that I need to move into the matching directory, e.g. H9 should look like this at the end of the process:

[ 128]  H9
├── [   15]  H9_chr1_hap1_contigs.list
└── [   15]  H9_chr1_hap2_contigs.list

1 directory, 2 files

I was testing some of my old code to do this, but it requires a lot of tweaking. After searching, I found a one-liner that seemed intuitive and edited it; however, I'm missing something, since it isn't working...

This is the original code:

find . -name "*.txt" -exec bash -c 'folder=$(basename "{}" .txt); mv "{}" "./$folder/"' \;

and that is how I edited it:

find . -name "*.list" -exec bash -c 'folder=$(basename "{}" _chr[A-Z,0-9,a-z]\+_hap[1-2]_contigs.list); mv "{}" "./$folder/"' \;

I kept the regex because I have several chr folders besides chr1, with numbers up to 22 or followed by letters in any combination of uppercase and lowercase. Any help is appreciated, thanks!

1
  • On a side note, placing the replacement string {} inside the shell command string can allow some file names to be executed as arbitrary commands, and AFAIK quoting it doesn't help or make any difference. Pass it as an argument instead, e.g. sh -c 'file="$1"; ...' sh {} Commented 3 hours ago
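
The pattern from that comment could be sketched as follows. The scratch directory and file names are made up for the demonstration, and taking the prefix with ${name%%_*} is an assumption about the naming scheme rather than part of the comment.

```shell
# Sketch: pass each file name to the inline script as "$1" instead of
# embedding {} in the script text. Scratch names below are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/H9"
touch "$demo/H9_chr1_hap1_contigs.list" "$demo/H9_chr1_hap2_contigs.list"

# The trailing "sh" fills $0; find passes each file name as a separate argument.
find "$demo" -maxdepth 1 -type f -name '*.list' -exec sh -c '
    file=$1
    name=${file##*/}          # strip the leading directory part
    folder=${name%%_*}        # keep everything before the first underscore
    mv -- "$file" "${file%/*}/$folder/"
' sh {} \;

ls "$demo/H9"
```

Because the file name only ever appears as data in "$1", a name containing quotes or $(...) is not interpreted as shell code.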

3 Answers

6

Assuming the directories exist already and that you simply want to remove the first underscore and everything following it from the file name, the shell can do all that without find and basename:

for file in *_*.list; do mv -- "$file" "${file%%_*}/"; done
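
To see what ${file%%_*} produces, and why the basename attempt in the question left the name untouched, here is a small sketch. basename treats its second argument as a literal suffix, not a pattern, whereas %% removes the longest suffix matching a shell glob. The sample names (including chrX) are only illustrative.

```shell
# Illustrative names only; no files are touched.
for file in H9_chr1_hap1_contigs.list HG00438_chrX_hap2_contigs.list; do
    printf '%s -> %s\n' "$file" "${file%%_*}"
done

# basename only strips a literal suffix, so a regex-like suffix changes nothing:
literal=$(basename H9_chr1_hap1_contigs.list _chr1_hap1_contigs.list)
pattern=$(basename H9_chr1_hap1_contigs.list '_chr[0-9]\+_hap[1-2]_contigs.list')
printf 'literal suffix: %s\npattern suffix: %s\n' "$literal" "$pattern"
```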
1
  • Thanks for the heads-up. This works as intended for my specific case! Commented 8 hours ago
4

You should do this with the perl rename utility, not with find and mv.

For example, creating the prefix directories as needed:

$ rename -n 'if (m/^(.*?)_/) {
       my $prefix = $1;
       mkdir $prefix;
       s=^=$prefix/=
     }' *.list
rename(H9_chr1_hap1_contigs.list, H9/H9_chr1_hap1_contigs.list)
rename(H9_chr1_hap2_contigs.list, H9/H9_chr1_hap2_contigs.list)
rename(HG00438_chr1_hap1_contigs.list, HG00438/HG00438_chr1_hap1_contigs.list)
rename(HG00438_chr1_hap2_contigs.list, HG00438/HG00438_chr1_hap2_contigs.list)
rename(HG00621_chr1_hap1_contigs.list, HG00621/HG00621_chr1_hap1_contigs.list)
rename(HG00621_chr1_hap2_contigs.list, HG00621/HG00621_chr1_hap2_contigs.list)

... many output lines deleted
rename(NA18906_chr1_hap1_contigs.list, NA18906/NA18906_chr1_hap1_contigs.list)
rename(NA18906_chr1_hap2_contigs.list, NA18906/NA18906_chr1_hap2_contigs.list)
rename(NA20129_chr1_hap1_contigs.list, NA20129/NA20129_chr1_hap1_contigs.list)
rename(NA20129_chr1_hap2_contigs.list, NA20129/NA20129_chr1_hap2_contigs.list)
rename(NA21309_chr1_hap1_contigs.list, NA21309/NA21309_chr1_hap1_contigs.list)
rename(NA21309_chr1_hap2_contigs.list, NA21309/NA21309_chr1_hap2_contigs.list)

This first performs a non-greedy match to get the prefix. If the match succeeds, it creates the prefix directory and renames the file into it.

BTW, perl rename will not overwrite an existing file unless you tell it to with the -f, --force option.

Note: the -n option makes this a dry-run, only showing what it would move without actually moving it. When you're sure it does what you want, either remove the -n option or change it to -v for verbose output.

If you don't want it to create the prefix directories, but only move the files if the prefix dir exists:

$ rename -n 'if (m/^(.*?)_/) {my $prefix = $1; -d $prefix && s=^=$prefix/=}' *.list

Alternatively, you can run rename with find's -exec:

find . -maxdepth 1 -type f -name '*.list' \
    -exec rename -n 'if (m/^(.*?)_/) {my $prefix = $1;  mkdir $prefix ;  s=^=$prefix/=}' {} +

The find command is run with -maxdepth 1 to avoid trying to rename files that are already in their appropriate prefix directory.

Or pipe a NUL-separated list of filenames from find:

find . -maxdepth 1 -type f -name '*.list' -print0 |
    rename -n -0 'if (m/^(.*?)_/) {my $prefix = $1;  mkdir $prefix ;  s=^=$prefix/=}'

Finally: the perl rename utility is NOT the same as the rename utility in the util-linux package - that is a completely different program with incompatible command-line options.

You can check which version you have installed by running rename -V - if it mentions perl or File::Rename, it's the perl version. If it mentions util-linux, it's the util-linux version. e.g.

$ rename -V
/usr/bin/rename using File::Rename version 2.02, File::Rename::Options version 2.01
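
If you want to script that check, something like the following might do. The function name is hypothetical, and the second sample string is only an illustration of output that mentions util-linux, not a quote from a real system.

```shell
# Sketch: classify the text printed by `rename -V`.
# rename_flavour is a made-up helper; the checks follow the rule of thumb above.
rename_flavour() {
    case $1 in
        *File::Rename*|*perl*) echo perl ;;
        *util-linux*)          echo util-linux ;;
        *)                     echo unknown ;;
    esac
}

# First sample is the output shown above; the second is an illustrative string.
perl_kind=$(rename_flavour '/usr/bin/rename using File::Rename version 2.02, File::Rename::Options version 2.01')
util_kind=$(rename_flavour 'rename from util-linux')
printf '%s\n%s\n' "$perl_kind" "$util_kind"
```

On a real system you would call it as rename_flavour "$(rename -V 2>&1)".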

Some distros have perl rename installed as prename, perl-rename, or file-rename, so check if one of those is in /usr/bin and adjust the command name accordingly. On Debian and related distros, perl rename is easily installed with apt - it's in the rename package. Other distros have similarly-named packages.


One more thing:

If you really want to do it with find, bash, and mv do it like this:

$ find . -maxdepth 1 -name "*.list" -exec \
    bash -c 'for f; do
        folder=${f/_*}
        mkdir -p "$folder"
        echo mv -- "$f" "./$folder/"
      done' find-sh {} +

This runs bash only once per ARG_MAX (about 2M characters on Linux systems) worth of filenames instead of once per filename, so it will be significantly faster because it avoids the overhead of forking bash so many times.

It uses bash's parameter expansion to extract the prefix from each filename ($f) into the variable $folder, then creates the directory and renames the file. Again, this is a dry run: remove the echo, and optionally change mv to mv -v for verbose output. Note that mkdir -p is not prefixed with echo, so the directories are created even during the dry run.

The bash -c script can be squished into one line if needed, but you'll need to add semi-colons between each line of code.
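
For instance, the one-line form might look like this. The scratch directory and sample file names are made up for the demonstration, and the echo is kept so that mv is still only printed.

```shell
# Hypothetical scratch area so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch H9_chr1_hap1_contigs.list HG00438_chr1_hap2_contigs.list

# Same loop on one line, with semicolons between the statements.
# echo keeps the mv a dry run, but mkdir -p still creates the directories.
out=$(find . -maxdepth 1 -name "*.list" -exec bash -c 'for f; do folder=${f/_*}; mkdir -p "$folder"; echo mv -- "$f" "./$folder/"; done' find-sh {} + | sort)
printf '%s\n' "$out"
```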

2
  • Thank you so much for the perl alternative and also for the suggestion using find. Also, thanks for explaining the various details; I will definitely make use of these more versatile options in the future! Commented 8 hours ago
  • 1
    Technically speaking, it's not exactly a perl alternative, it's a rename alternative (and rename happens to be written in perl and allows any arbitrary perl code to be executed as part of the rename script). It's an excellent utility and should be the first, "go-to" option to consider whenever anyone needs to bulk rename a batch of files. Commented 7 hours ago
3

I agree with DonHolgo's answer!

But "pivoting the problem" is a fairly common technique for tasks like sorting files into directories, categorizing gene sequences by subsequences, or sorting your favorite buttons into little boxes, so I thought I'd write a quick answer based on that.

It seems you already have the directories things need to be moved into.

So, my intuition here is: instead of looking at each file name to see which directory it belongs to, go through the directories and move all the files that belong in each one.

for dir in */; do mv  -t ${dir} -- ${dir%%/}*.list ; done

(That works in both bash and zsh. If you only need zsh, you can write

for dir in *(/); do mv -t ${dir} -- ${dir}*.list; done

because the (/) glob qualifier in zsh restricts the match to directories without appending a / to the enumerated names.)
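
A more defensive variant of the same idea, as a sketch: it quotes the expansions so it is safe in bash, uses plain mv without the GNU-only -t, and skips directories that have no matching files. The scratch directory and sample names are made up, and matching on "${prefix}_*.list" (with the underscore) is my assumption, so that a directory whose name is a prefix of another's does not grab the other's files.

```shell
# Hypothetical scratch area so nothing real is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir H9 HG00438 HG00621
touch H9_chr1_hap1_contigs.list H9_chr1_hap2_contigs.list HG00438_chr1_hap1_contigs.list

for dir in */; do
    prefix=${dir%/}
    # Collect the matching files; if none match, the pattern stays literal.
    set -- "$prefix"_*.list
    [ -e "$1" ] || continue
    mv -- "$@" "$dir"
done

ls H9 HG00438
```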

3
  • Thanks a lot for the insight; I do work in bash, but it's useful to have alternatives as I often run small test cases locally on my Mac. Commented 5 hours ago
  • @Matteo I'm just using zsh on my Linux system (as I know a lot of my peers do) Commented 5 hours ago
  • In bash, you'd need to quote all those parameter expansions. Beware -t is a GNU extension and is not really needed here. Note that */ expansion also includes symlinks to directories, while zsh's *(/) is for directories only. Change to *(-/) for the file type to be checked after symlink resolution. Commented 2 hours ago
