
How to find duplicate files in GNU/Linux

Thanks to this site I found out how to track down duplicate files on my GNU/Linux system. I adapted the proposed solution to my own needs. In short, the command collects the size of every file and looks for sizes that occur more than once. Files that share a size are then hashed with MD5 to confirm that their contents are identical.

Configuration

Command

We set the SEARCH variable to the path in which we want to look for duplicate files:

root@host:~# SEARCH=/data
root@host:~# find $SEARCH -not -empty -type f -printf %s\\n | sort -rn | uniq -d | xargs -I{} -n1 find $SEARCH -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
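
The same pipeline can be wrapped in a small shell function for reuse. This is only a sketch: the name find_dupes is hypothetical, the variable is quoted so that a path containing spaces survives, and the -r option of GNU xargs is added so that nothing runs when no candidate is found. The redundant -n1 is left out because -I already runs one command per input line.

```shell
#!/bin/sh
# find_dupes DIR: hypothetical wrapper around the one-liner above.
# Assumes GNU find, xargs, md5sum and uniq are available.
find_dupes() {
    dir=${1:-.}    # directory to scan, defaults to the current one
    # 1. print the size in bytes of every non-empty regular file
    find "$dir" -not -empty -type f -printf '%s\n' |
        # 2. keep one copy of each size that occurs more than once
        sort -rn |
        uniq -d |
        # 3. list the files that have exactly one of those sizes
        xargs -r -I{} find "$dir" -type f -size {}c -print0 |
        # 4. hash the candidates and group identical checksums
        xargs -r -0 md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}
```

Calling find_dupes /data should then give the same result as the command above with SEARCH=/data.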

Explanations

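find $SEARCH -not -empty -type f -printf %s\\n
Lists every non-empty regular file under $SEARCH and prints its size in bytes, one size per line.

sort -rn
Sorts the sizes numerically, largest first, so that equal sizes end up on adjacent lines.

uniq -d
Keeps one copy of each size that appears more than once. Only files with one of these sizes can be duplicates.

xargs -I{} -n1 find $SEARCH -type f -size {}c -print0
For each remaining size, runs find again to list the files that are exactly that many bytes long (the c suffix means bytes). The names are separated by NUL characters so that spaces and newlines in file names are handled safely. Since -I already runs one command per input line, -n1 is redundant here.

xargs -0 md5sum
Reads the NUL-separated file names and computes the MD5 checksum of each file.

sort
Sorts the checksum lines so that files with the same hash are adjacent.

uniq -w32 --all-repeated=separate
Compares only the first 32 characters of each line (the MD5 hash) and prints every line whose hash occurs more than once, with a blank line between groups of duplicates.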
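
To see what the last step does on its own, it can be fed a few hand-written checksum lines. The hashes and paths below are made up for illustration, and demo_groups is a hypothetical name. Only GNU uniq is assumed.

```shell
#!/bin/sh
# demo_groups: feed made-up, already sorted "hash  path" lines to uniq.
demo_groups() {
    printf '%s\n' \
        'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  /data/one' \
        'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  /data/two' \
        'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb  /data/three' \
        'cccccccccccccccccccccccccccccccc  /data/four' \
        'cccccccccccccccccccccccccccccccc  /data/five' |
        # compare the first 32 characters only, print all repeated lines
        uniq -w32 --all-repeated=separate
}

demo_groups
```

The line for /data/three is dropped because its hash is unique. The two remaining groups are printed with an empty line between them.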
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
