How to find duplicate files in GNU/Linux

Thanks to this site, I found out how to track down duplicate files on my GNU/Linux system. I adapted the proposed solution to my own needs. In short, the command collects the size of every file and keeps only the sizes shared by more than one file. For those candidates, an MD5 hash is computed to confirm that the files really have the same content.



We set the SEARCH variable, which contains the path where we want to look for duplicate files:

root@host:~# SEARCH=/data; find $SEARCH -not -empty -type f -printf %s\\n | sort -rn | uniq -d | xargs -I{} -n1 find $SEARCH -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate


find $SEARCH -not -empty -type f -printf %s\\n
sort -rn
uniq -d
xargs -I{} -n1 find $SEARCH -type f -size {}c -print0
xargs -0 md5sum
sort
uniq -w32 --all-repeated=separate
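
For readability, the same pipeline can be sketched as a small shell function with one comment per stage. This is only an illustrative rewrite, not the original one-liner: the function name find_dupes is made up for this example, the search path is quoted so that directories containing spaces work, and the -r flag on the second xargs (a GNU extension) is added so that md5sum is not run when no candidates are found. It assumes GNU find, xargs, sort, uniq and md5sum.

```shell
#!/usr/bin/env bash
# find_dupes DIR: print groups of files under DIR that share the same MD5 hash.
# "find_dupes" is a hypothetical helper name used only for this sketch.
find_dupes() {
    local search="${1:-.}"

    # 1. Print the size in bytes of every non-empty regular file.
    find "$search" -not -empty -type f -printf '%s\n' |
        # 2. Sort the sizes numerically so that equal sizes are adjacent.
        sort -rn |
        # 3. Keep one copy of each size that occurs more than once.
        uniq -d |
        # 4. For each such size, list the files of exactly that size,
        #    separated by NUL bytes to survive unusual file names.
        xargs -I{} find "$search" -type f -size {}c -print0 |
        # 5. Hash every candidate file (-r: do nothing on empty input).
        xargs -0 -r md5sum |
        # 6. Sort by hash so that identical hashes are adjacent.
        sort |
        # 7. Compare only the first 32 characters (the MD5 hash) and print
        #    every repeated line, with a blank line between groups.
        uniq -w32 --all-repeated=separate
}

# Example call (the path is the one used above):
#   find_dupes /data
```

Each group in the output lists files whose hashes match; files that merely share a size but differ in content are dropped at the last stage.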
