Ubuntu – Check for redundancy before downloading any file

checksums download-manager files md5sum mp3

Is there a way to check whether you have already downloaded a file before actually downloading it again?

I know that:

  • wget can do that, but only if the file being fetched has the same filename as the one retrieved before.
  • You can use a checksum or MD5 hash to find and remove redundant files, but only AFTER you have downloaded the file.

Please suggest a way to check whether a file has the same content before actually downloading it FULLY again (independent of the filename it is going to be saved under).

To be more precise: I am interested in downloading ONLY mp3 files, but from different sources like Jamendo, Soundcloud etc., which may have the same content (song) under different filenames.

Best Answer

Read the first 500 bytes of the first file:

head -c 500 file1.mp3 > fragment1

Use curl -r 0-499 -o fragment2 http://... to retrieve the first 500 bytes of the second file. Then run diff fragment1 fragment2 (or cmp fragment1 fragment2) to see whether they are equal.

curl is a tool like wget, only with more options. The -r flag lets you specify a byte range, which results in a partial download, provided the server supports range requests. wget has a quota option, but it will not let you do a partial download.
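
The steps above could be wrapped in two small shell functions, roughly as sketched below. The names fetch_prefix and same_prefix are made up for illustration, and cmp -s is used in place of diff only because it compares bytes quietly and reports the result through its exit status. Keep in mind that a match on the first 500 bytes only suggests the files are the same, and that two copies of the same song can still differ at the start if their tags differ.

```shell
# Hypothetical helpers; neither name comes from curl or any standard tool.

# Download only the first N bytes (default 500) of URL into OUT.
# Assumes the server honors HTTP range requests.
fetch_prefix() {
    url=$1
    out=$2
    n=${3:-500}
    curl -s -r "0-$((n - 1))" -o "$out" "$url"
}

# Exit status 0 if the first N bytes (default 500) of FILE are
# identical to the contents of FRAGMENT, non-zero otherwise.
same_prefix() {
    file=$1
    fragment=$2
    n=${3:-500}
    head -c "$n" "$file" | cmp -s - "$fragment"
}
```

It might be used like this, with the URL left as a placeholder:

fetch_prefix "http://..." fragment2 && same_prefix file1.mp3 fragment2 && echo "probably already downloaded"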