Ubuntu – How to remove duplicate imported photos in Shotwell

duplicate photo-management shotwell tag

I've noticed that Shotwell has imported many images twice (e.g. from my camera SD card). Apparently the duplicate detection is buggy once a photo is imported, tagged and then re-imported.

I have "write meta data tags" enabled in the settings. If I import a photo test-images.jpg and add tags to it, the photo will not be picked up by the duplicate detection when the same file is imported again.
The second time the file is imported, it is named test-images-1.jpg and placed in the library folder according to the active rules (not necessarily in the same folder).

test-images.jpg and test-images-1.jpg have the same image data, but because of the added tag/metadata the files are no longer identical and won't be found by searching for duplicates (e.g. by md5 hash).

My usage scenario that caused multiple duplicates is as follows:

  1. I take pictures with my phone
  2. I import the photos from my phone, add tags but leave the images on the phone as I want to keep them for sharing etc.
  3. I add further tags to the imported photos
  4. After some weeks I repeat the import step from the phone and old photos that I have already imported will be imported again (with '-1.jpg' or '-2.jpg' added)

How to clean up the duplicates?
A search based on file names would be possible, but I can't rule out that I have imported a file ending in -1 that is not actually a duplicate.
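To make the file-name idea concrete, here is a minimal Python sketch that lists files whose name looks like a numbered re-import (for example test-images-1.jpg next to test-images.jpg somewhere in the tree). The function name numbered_copy_candidates and the pattern are illustrative assumptions, not anything Shotwell provides, and the result is only a list of candidates: two different photos called trip.jpg and trip-1.jpg would be paired as well, which is exactly the weakness described above.

```python
import os
import re
from collections import defaultdict

# Hypothetical pattern: a stem that ends in "-<number>", e.g. "test-images-1".
NUMBERED = re.compile(r"^(?P<base>.+)-(?P<n>\d+)$")


def numbered_copy_candidates(root):
    """Return sorted (original, suspected_copy) path pairs, judged by name only."""
    by_name = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # Compare names case-insensitively; keep the real path for reporting.
            by_name[name.lower()].append(os.path.join(dirpath, name))

    pairs = []
    for name, paths in by_name.items():
        stem, ext = os.path.splitext(name)
        match = NUMBERED.match(stem)
        if not match:
            continue
        # A file with the un-numbered name may live in any folder of the library.
        for original in by_name.get(match.group("base") + ext, []):
            pairs.extend((original, copy) for copy in paths)
    return sorted(pairs)


if __name__ == "__main__":
    for original, copy in numbered_copy_candidates(os.path.expanduser("~/Pictures")):
        print(original, "<->", copy)
```

The ~/Pictures path is a placeholder. Every pair still has to be confirmed by comparing the actual image content before anything is deleted.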

How can I clean up my photo library? I tried to use the search function in Shotwell, but with more than 1000 photos there must be a better, more reliable, less error-prone and simpler way.

I'm not too worried about tags getting lost; typically the second import (the duplicate) has no tags applied.

Best Answer

Sort of spammish, but I found myself with the same problem a few months ago, and I wrote a small utility that does just that:

https://github.com/jesjimher/imgdupes

It's a Python script that scans a directory tree looking for duplicates. Its syntax is intentionally similar to fdupes, with the difference that imgdupes ignores all metadata and analyzes only the image data chunk of a JPEG file. This means that two different versions of the same image, with different tags, rotation flags, dates, etc., will be reported as duplicates even if the physical files are different (and thus not detected as duplicates by fdupes/shotwell).
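The idea of comparing only the image data can be sketched in a few lines of standard-library Python. This is not the code imgdupes itself uses, just an illustration under stated assumptions: the metadata is assumed to live in the APPn segments (Exif, XMP, IPTC, ICC) and COM comments, and everything else, including the compressed scan data, is hashed unchanged. The names jpeg_content_hash and find_duplicate_groups are made up for this example.

```python
import hashlib
import os
import struct
from collections import defaultdict


def jpeg_content_hash(path):
    """MD5 of a JPEG file with APPn and COM (metadata) segments left out."""
    with open(path, "rb") as fh:
        data = fh.read()
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError(f"{path} is not a JPEG file")

    digest = hashlib.md5()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"{path}: expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # SOS: the compressed image data follows
            digest.update(data[pos:])  # hash the rest of the file as-is
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APPn or COM
        if not is_metadata:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Group the JPEG files under root that share the same content hash."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".jpg", ".jpeg")):
                path = os.path.join(dirpath, name)
                try:
                    groups[jpeg_content_hash(path)].append(path)
                except (OSError, ValueError):
                    continue  # unreadable or malformed file, leave it alone
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicate_groups(os.path.expanduser("~/Pictures")):
        print("\n".join(group), end="\n\n")
```

The ~/Pictures path is a placeholder. Files that were re-encoded, resized, or that carry extra data after the end-of-image marker will not match with this approach, so it only catches copies whose compressed image data is byte-for-byte identical.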

It was recently renamed to jpegdupes and is now on PyPI, so scanning a tree for duplicated images might be done like this:

    sudo pip install jpegdupes
    jpegdupes -d ~/Photos/

(replace ~/Photos/ with whatever your path is)

It looks for JPEGs that are actually the same picture (differing only in metadata), interactively shows the differences and asks which version to keep.

Hope it helps.
