
Duplicate finding & removing

Posted: 01 Apr 2013 06:20
by Maulz
Hi everybody! How can I find duplicate files among more than 7500 *.epub files using a batch file?
There are no exactly matching names (such as

Never Go Back - Charles DeVet.epub
Never Go Back - Charles DeVet.epub)

But there are a lot of

1984.epub
J.Orwell - 1984.epub
Jeorge Orwell - 1984.epub

(These three are all the same book, of course, though the date and size of the files may differ.)
I couldn't find a "duplicate remover" tool for this, because those only find 100% matches. Please help!

Re: Duplicate finding & removing

Posted: 01 Apr 2013 08:00
by foxidrive
A batch file relies on rules that are derived from the makeup of the source files/text.
Your source file names are quite random in nature, so they aren't a good candidate for a simple batch file.

The best you could hope for is matching the title inside the epub files and producing a list of those that match - and then you can manually figure out which copy is the one you want to keep from a set of files.
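
To illustrate the title-matching idea, here is a rough sketch in Python rather than batch, since reading inside an epub (a zip archive with XML metadata) is awkward in a plain batch file. It assumes each epub follows the usual layout, where META-INF/container.xml points to a package file containing a dc:title element. The function names (epub_title, normalize, find_duplicates) are made up for this example, and files without readable metadata are simply skipped.

```python
import re
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# XML namespaces used by the EPUB container file and Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def epub_title(path):
    """Return the dc:title stored inside an epub, or None if unreadable."""
    try:
        with zipfile.ZipFile(path) as zf:
            container = ET.fromstring(zf.read("META-INF/container.xml"))
            rootfile = container.find(".//c:rootfile", CONTAINER_NS)
            if rootfile is None or rootfile.get("full-path") is None:
                return None
            opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    except (zipfile.BadZipFile, KeyError, ET.ParseError, OSError):
        return None
    node = opf.find(f".//{DC_TITLE}")
    if node is None or not node.text:
        return None
    return node.text.strip()


def normalize(title):
    """Lower-case the title and collapse punctuation/whitespace runs."""
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


def find_duplicates(folder):
    """Map each normalized title to the files sharing it (2 or more)."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*.epub")):
        title = epub_title(path)
        key = normalize(title) if title else ""
        if key:
            groups[key].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Only runs when a folder is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for key, paths in find_duplicates(sys.argv[1]).items():
        print(key)
        for p in paths:
            print("    " + str(p))
```

Saved as, say, finddupes.py (a hypothetical name), it would be run as "python finddupes.py C:\Books" and only prints groups of files with the same embedded title. It deletes nothing, so you still choose which copy to keep. Titles that are spelled differently inside the files (for example "1984" versus "Nineteen Eighty-Four") will not be grouped.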