How to merge duplicate folders with “name (1)”, “name (1) (1)” etc. structure

deduplication, file management, google-sync

Syncing between my Google Filestream, Google Drive, and Synology CloudSync got all messed up, and I'm left with hundreds of duplicate folders: the folder name followed by "(1)" or "(2)" etc., going up to "(1) (1) (1)".

Do you know of a program or script that can merge these folders?

Example top-level folder structure:

1100 Beetledwarf - Happy ATE
1100 Beetledwarf - Happy ATE (1)
1100 Beetledwarf - Happy ATE (2)
1100 Beetledwarf - Happy ATE (3)
1100 Beetledwarf - Happy ATE (3) (1)
1100 Beetledwarf - Happy ATE (3) (1) (1)
1100 Beetledwarf - Happy ATE (4)
1100 Beetledwarf - Happy ATE (5)
1100 Beetledwarf - Happy ATE (6)

Because subfolders sometimes have the same problem, the program or script would need to merge folders that follow that naming pattern at every level of the tree.

Example 2nd level folders:

1100 Beetledwarf - Happy ATE (6)
    Analysis
    Analysis (1)
    Smirckle_HL
    Smirckle_HL (2)
    Pending Reports
    Photos & Logos

The best solution would also allow me to move files instead of copying them since it takes a long time to copy files but moving is almost instantaneous.

Things I've already tried. None of them can deal with the "name (1)" folder structure (as far as I can tell), and all of them copy files instead of moving them:

  • WinMerge for Windows 10 <- chokes when trying to copy google drive files (returns something like "DOS command not supported" for them)
  • Meld for MacOS. <- slow.
  • Terminal with "ditto" command in OS X <- Best option so far.

Thanks for your help!

Best Answer

This is the approach I would try in Linux. I have no experience with Google Filestream, Google Drive or Synology CloudSync, so I cannot tell whether the solution can be applied at all. Still, I hope this will at least give you some ideas.


Assumptions

  • you can mount the share in your directory tree, so mv, cp and other sane tools can work with directories as if they were local;
  • files (or directories) with paths that become identical after you remove all " (N)" strings are in fact instances of the same file (directory);
  • instances of the same file should leave just one file;
  • instances of the same directory should merge their content in a single directory;
  • you can use all the tools I use here.

Procedure

Please read the entire answer before attempting to do anything.

I think some steps could be written as a script, but since the solution is highly experimental, it's better to do it by hand, step by step, paying attention to what happens.

  1. In a shell, cd to the mountpoint and invoke find . | vidir -. You can use a text editor of your choice, e.g. kate, like this:

    find . | EDITOR=kate vidir -
    

    This will open the editor with a list of all objects, each one with its own number in front. When you alter the content, save the (temporary) file and close the editor, all the changes are applied. In general this is what you can do:

    • change paths to move (rename) files or directories;
    • delete lines to remove files or directories;
    • swap two or more numbers to swap files (you won't need it).

    Don't save the file unless you're sure the new content describes the directory tree you want to get.

  2. Copy the content from the editor to another file. The point is to work with it and paste the result back (and save it) only when you're sure you got it right. Next steps refer to the new file unless explicitly stated otherwise.

  3. Use sed or any other tool to get rid of all " (N)" strings (note the leading space). At this point you should get "clean" paths; many of them will occur more than once (with different numbers given by vidir).

  4. Use sort -s -k 2 to sort according to these paths. Thanks to -s (stable sort) the former Analysis should still precede the former Analysis (1), as long as it preceded it in the list before sorting.

  5. Use uniq -f 1 to drop duplicated paths. Now any path should occur just once.

  6. Double check the sanity of the directory structure encoded in the result.

  7. Paste the result into the original editor, save the file and exit the editor. vidir will remove objects associated with missing numbers and move objects associated with numbers that are left.
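
Steps 3 to 5 could be sketched as a single filter. This is only an illustration under my assumptions: the listing has the form "number, TAB, path", no path contains a TAB, and every " (N)" really is sync debris rather than part of a legitimate name. The function name clean_list is made up for this example.

```shell
#!/bin/sh
# Reads a vidir-style listing (number, TAB, path) on stdin and
# prints one line per "clean" path.
tab=$(printf '\t')

clean_list() {
  sed -E 's/ \([0-9]+\)//g' |           # step 3: drop every " (N)"
    LC_ALL=C sort -s -t "$tab" -k 2 |   # step 4: stable sort on the path
    uniq -f 1                           # step 5: keep the first line per path
}
```

With the editor content copied to a file (say copy.txt, a hypothetical name), clean_list < copy.txt > result.txt would produce the text to review in step 6. LC_ALL=C is there only to make the sort order predictable; it is not required by the procedure.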


Testing

I would first use the following command (it needs GNU cp) to replicate the directory structure:

cp -a --attributes-only /mountpoint/ /guinea_pig_dir/

and test the procedure on the resulting empty files. This should reveal problems (if any) and hopefully allow you to improve the method.
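
For a first dry run, an even smaller sandbox can be built by hand. The sketch below recreates a fragment of the tree from the question in a temporary directory; the file names report.txt and logo.png are invented for illustration.

```shell
#!/bin/sh
# Build a tiny throwaway tree that mimics the duplicated folders.
sandbox=$(mktemp -d)
base="$sandbox/1100 Beetledwarf - Happy ATE"

mkdir -p "$base/Analysis" \
         "$base (1)/Analysis (1)" \
         "$base (3) (1) (1)/Photos & Logos"

# Empty placeholder files (hypothetical names).
: > "$base/Analysis/report.txt"
: > "$base (1)/Analysis (1)/report.txt"
: > "$base (3) (1) (1)/Photos & Logos/logo.png"

# Show what was created, relative to the sandbox root.
(cd "$sandbox" && find . | LC_ALL=C sort)
```

Running the procedure inside "$sandbox" costs nothing if it goes wrong, and the directory can simply be deleted afterwards.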


Possible problems

  1. vidir refuses to work with some non-standard characters.

  2. In general the order of objects is important. There are a few pitfalls that generate objects like foo~ or foo~1, foo~2 when there's a collision with foo. You will "contract" your directory tree in a way that should generate no collisions, but I haven't investigated all possible scenarios. I really think you should experiment with /guinea_pig_dir/ and see what you get. In case of trouble, maybe a clever sort between find and vidir will help.
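
One possible reading of "a clever sort" is plain byte-order sorting, which lists foo before foo (1) because a shorter name sorts before any longer name that starts with it. This is a guess at what might help, not something I have tested against vidir; the function name list_sorted is made up.

```shell
#!/bin/sh
# Print every object under the given directory, relative to it,
# in byte order so that "foo" comes before "foo (1)".
list_sorted() {
  (cd "$1" && find . | LC_ALL=C sort)
}
```

Since the paths are relative, the equivalent one-liner to run from the mountpoint would be find . | LC_ALL=C sort | EDITOR=kate vidir -.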
