dup.cmd list duplicated files

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

dup.cmd list duplicated files

#1 Post by carlos » 07 May 2015 14:53

Hello. I downloaded some PDF files with similar names. To avoid keeping duplicate files, I was going to download a program to find them, but because the task is not very complicated, I wrote a script to check for them instead.
It is very basic: currently it only lists duplicate files (of any extension) in the current directory.
Note: hidden files are not considered.

If duplicate files are found, the original file is printed in yellow and its duplicates in red.
If no duplicate files are found, a message in green says so.
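For readers who want to see the duplicate criterion in isolation (same CRC-32 checksum and same file size, per the script's header comment), here is a rough Python sketch. It is not part of dup.cmd: it assumes a non-recursive scan, uses Python's `zlib.crc32` instead of makecab, takes the first name in (case-sensitive) sorted order as the origin, and, unlike dup.cmd, does not skip hidden files. The function names are hypothetical.

```python
import os
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in chunks so large files are fine."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(directory="."):
    """Group regular files in `directory` (non-recursive) by (CRC-32, size).

    Returns a list of (origin, [duplicates]) tuples. The origin is the first
    name of each group in sorted order; the rest are reported as duplicates.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            key = (crc32_of_file(path), os.path.getsize(path))
            groups[key].append(name)
    # Only keys shared by two or more files are duplicates.
    return [(names[0], names[1:]) for names in groups.values() if len(names) > 1]
```

Calling `find_duplicates("some_folder")` returns one tuple per group of identical files, for example `("report.pdf", ["report_copy.pdf"])`, which corresponds to the yellow origin line and red duplicate lines that dup.cmd prints.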

Example: [screenshot]

Edit: updated to version 0.35, which has the latest fixes.

dup.cmd


@echo off

::dup
::list duplicate files by content in the current directory.
::a file is considered a duplicate if it has
::the same crc32 checksum and file size as another file.
::version: 0.35:
:: fixed a problem with filenames containing special characters.
:: fixed a problem with short filenames in windows xp.
:: fixed a problem when running under cmd /u.
:: fixed a problem with filenames containing the exclamation character.
::author: cmontiers
setlocal enableextensions disabledelayedexpansion
call :color 70 "working ..."
for %%. in (.) do set "curr_drv=%%~d."
subst +: /d >nul & subst +: "." & +:
set "aecho=cmd /a /d /c echo"
for %%# in ("%temp%") do set "temp=%%~s#"
set "inffile=%temp%\dup.inf"
set "cnffile=%temp%\dup.cfg"
if exist "%inffile%" del "%inffile%"
if exist "%cnffile%" del "%cnffile%"
(
%aecho% .set destinationdir="%temp%"
%aecho% .set diskdirectorytemplate=""
%aecho% .set inffooter=""
%aecho% .set infheader=""
%aecho% .set infdisklineformat=""
%aecho% .set donotcopyfiles=on
%aecho% .set destinationdir=""
%aecho% .set rptfilename=nul
%aecho% .set checksumwidth=8
%aecho% .set cabinet=off
%aecho% .set compress=off
%aecho% .set generateinf=on
%aecho% .set infdiskheader=""
%aecho% .set inffileheader=""
%aecho% .set infcabinetheader=""
%aecho% .set inffilelineformat="*csum*:*size* *file*"
for %%# in (*) do if not "%inffile%"=="%%~s#" (
if not "%cnffile%"=="%%~s#" %aecho% "%%~s#"
)
) > "%cnffile%"
makecab /d "inffilename=%inffile%" /f "%cnffile%" >nul
set "dup=0"
set "last_hash="
set "last_origin_name="
for /f "tokens=1,*" %%a in ('sort "%inffile%"') do (
if not "%%~a"=="" (
   for /f "tokens=*" %%n in ('dir /a /b "%%~b"') do (
   setlocal enabledelayedexpansion
      if not "%%~a"=="!last_hash!" (
      endlocal
      set "last_origin_name=%%~n"
      ) else (
      if 0 equ !dup! call :del_characters 11
      endlocal
      set "dup=1"

      if defined last_origin_name (
      setlocal enabledelayedexpansion
      call :color 0e "!last_origin_name!" \n
      endlocal
      set "last_origin_name="
      )
      set "fullname=%%~n"
      setlocal enabledelayedexpansion
      call :color ce "[*]" 00 " " 0c "!fullname!" \n
      endlocal
   )
   set "last_hash=%%~a"
   )
)
)

if 0 equ %dup% (
call :del_characters 11
call :color 0a "no duplicate files found" \n
)

del "%inffile%" "%cnffile%"
%curr_drv%
goto :eof

:del_characters
for /f %%# in (
'"prompt $h &for %%_ in (_) do rem"'
) do for /l %%_ in (1,1,%~1) do set /p "=%%# %%#" <nul
goto :eof

:color
:: v23c
:: arguments: hexcolor text [\n] ...
:: \n -> newline ... -> repeat
:: supported in windows xp, 7, 8.
:: this version works using cmd /u
:: in xp extended ascii characters are printed as dots.
:: to print a quote character, pass an empty text argument.
setlocal enableextensions enabledelayedexpansion
subst `: "!temp!" >nul &`: &cd \
setlocal disabledelayedexpansion
echo(|(pause >nul &findstr "^" >`)
cmd /a /d /c set /p "=." >>` <nul
for /f %%# in (
'"prompt $h &for %%_ in (_) do rem"') do (
cmd /a /d /c set /p "=%%# %%#" <nul >`.1
copy /y `.1 /b + `.1 /b + `.1 /b `.3 /b >nul
copy /y `.1 /b + `.1 /b + `.3 /b `.5 /b >nul
copy /y `.1 /b + `.1 /b + `.5 /b `.7 /b >nul
)
:__color
set "text=%~2"
if not defined text (set text=^")
setlocal enabledelayedexpansion
for %%_ in ("&" "|" ">" "<"
) do set "text=!text:%%~_=^%%~_!"
set /p "lf=" <` &set "lf=!lf:~0,1!"
for %%# in ("!lf!") do for %%_ in (
\ / :) do set "text=!text:%%_=%%~#%%_%%~#!"
for /f delims^=^ eol^= %%# in ("!text!") do (
if #==#! endlocal
if \==%%# (findstr /a:%~1 . \` nul
type `.3) else if /==%%# (findstr /a:%~1 . /.\` nul
type `.5) else (cmd /a /d /c echo %%#\..\`>`.dat
findstr /f:`.dat /a:%~1 .
type `.7))
if "\n"=="%~3" (shift
echo()
shift
shift
if ""=="%~1" del ` `.1 `.3 `.5 `.7 `.dat &goto :eof
goto :__color

Last edited by carlos on 27 Jun 2015 10:56, edited 14 times in total.
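The main loop of dup.cmd (the `for /f "tokens=1,*"` over the sorted makecab listing) can be mirrored in Python roughly as below. This assumes each line looks like `checksum:size filename`, following the `inffilelineformat="*csum*:*size* *file*"` setting; the sample lines in the usage note are invented, and Python's case-sensitive `sorted` stands in for the Windows `sort` command. The function name is hypothetical.

```python
def report_from_inf_lines(lines):
    """Walk "checksum:size filename" lines the way dup.cmd walks makecab's output.

    Sorting makes identical checksum:size keys adjacent. The first line of each
    run is the origin; every following line with the same key is a duplicate.
    Returns a list of (origin, [duplicates]) for runs longer than one line.
    """
    groups = []       # (origin, [duplicates]) for every distinct key
    last_key = None   # plays the role of last_hash in dup.cmd
    for line in sorted(l.strip() for l in lines if l.strip()):
        # Like tokens=1,*: the first space-separated token is the key,
        # the rest of the line (which may contain spaces) is the filename.
        key, _, name = line.partition(" ")
        name = name.strip().strip('"')
        if key != last_key:
            groups.append((name, []))      # new key: potential origin
            last_key = key
        else:
            groups[-1][1].append(name)     # same key as previous line: duplicate
    return [g for g in groups if g[1]]
```

For instance, `report_from_inf_lines(["1a2b3c4d:5 B.TXT", "0000ffff:3 C.TXT", "1a2b3c4d:5 A.TXT"])` returns `[("A.TXT", ["B.TXT"])]`.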

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#2 Post by foxidrive » 08 May 2015 07:25

I like that, carlos. After some brief testing here it looks good.

As I recall, the short filename routine was fixed in a later version of Windows, but some versions return a flawed short filename and path in some circumstances, and this may be an issue.

Does this have a filesize limitation?

It would be good to filter out files with unique sizes first, since very large files could otherwise be checksummed for no reason and take a lot of extra time.
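One way to apply this size filter, sketched in Python (hypothetical function names, not dup.cmd code, assuming a plain non-recursive directory scan): group files by size first, and only read and checksum files whose size is shared with at least one other file. The second return value lists the files that actually had to be read, so the saving is visible.

```python
import os
import zlib
from collections import defaultdict


def crc32_of_file(path, chunk_size=1 << 20):
    """CRC-32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates_size_first(directory="."):
    """Return ([(origin, [duplicates]), ...], [names that were checksummed])."""
    # Pass 1: sizes only, which needs no file reads.
    by_size = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_size[os.path.getsize(path)].append(name)

    # Pass 2: checksum only the files whose size is not unique.
    hashed = []
    result = []
    for names in by_size.values():
        if len(names) < 2:
            continue  # a unique size cannot have an identical twin
        by_crc = defaultdict(list)
        for name in names:
            hashed.append(name)
            by_crc[crc32_of_file(os.path.join(directory, name))].append(name)
        result.extend((g[0], g[1:]) for g in by_crc.values() if len(g) > 1)
    return result, hashed
```

A file with a unique size can never be byte-identical to another file, so skipping it is safe; only size groups with two or more members cost any reading.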

It also fails in a path with a ! in it. Maybe the file creation can be done before enabling delayed expansion.

I hope you find my criticism constructive, as I don't mean to be annoying.

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: dup.cmd list duplicated files

#3 Post by carlos » 08 May 2015 09:08

foxidrive, thanks very much for the ideas.
The idea of only computing the checksum of files that share the same file size is very good. I will try to implement it to save time.

About the short filename routine, I don't understand the problem very well.

About the problem with filenames containing the ! character: I fixed it in version 0.21. I also added an animated text message that says "working ...".

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#4 Post by foxidrive » 08 May 2015 10:26

carlos wrote: About short filename routine, I not understand very well the problem.


Thanks carlos.

I wrote this script a while back to show the issue in XP; it seems to have been fixed in Vista.
There are threads about it, but at the moment I'm off to dreamland.


@echo off
md "tempfolder with a long name"
cd "tempfolder with a long name"
type nul>"0=3=biz.jpg"
:: type nul>"0 test.jpg"
echo Short name (with bug) and longname
call :test
echo.
echo.running command.com
command /c rem
echo.exiting from command.com
echo.
echo Short name corrected and longname (with path bug)
call :test
cd ..
echo.
pause
rd /s /q "tempfolder with a long name"
goto :EOF
:test
for %%a in (0*.*) do echo %%~sa &echo %%~fa
pause

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#5 Post by foxidrive » 08 May 2015 22:24

The returned short path and filename contain extra pieces of the long path, and the result is unpredictable.

Long name:
C:\cdroms\tempfolder with a long name\0=3=biz.jpg

Short name (with the bug that appends extra text to the end):
C:\cdroms\TEMPFO~1\0_3_BI~1.JPGg name\0=3=biz.jpg


You can work around it by using command.com to get the short path\filename, although, if I recall correctly, you are then unable to get the long path\filename. That may still be fine for dup.cmd.

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: dup.cmd list duplicated files

#6 Post by carlos » 09 May 2015 09:32

Thanks foxidrive.
In this Stack Overflow question:

http://stackoverflow.com/questions/8354305/batch-parameter-s1-gives-incorrect-8-3-short-name

jeb explains when the bug occurs in XP.
In summary, I found that it happens when the name of the file's last folder is longer than 10 characters and the filename contains one of the following special characters:


];,+=<space>
.
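The condition summarized above can be written as a small predicate. This is only a hypothetical Python heuristic that encodes carlos's empirical summary (last folder name longer than 10 characters, plus one of `] ; , + =` or a space in the filename); it does not reproduce the bug itself, and it leaves out `.` because it is unclear whether that belongs in the list. `ntpath` is used so that Windows-style paths are split the same way on any OS.

```python
import ntpath

# Characters from the summary above (the trailing "." is deliberately excluded).
TRIGGER_CHARS = set("];,+= ")


def may_hit_xp_short_name_bug(path):
    """Heuristic: True if the last folder name is longer than 10 characters
    and the filename contains one of the trigger characters."""
    folder, name = ntpath.split(path)
    parent = ntpath.basename(folder)  # the last folder in the path
    return len(parent) > 10 and any(c in TRIGGER_CHARS for c in name)
```

With this heuristic, `longpathzzz\a[1].abc` from the proof example below and `C:\cdroms\tempfolder with a long name\0=3=biz.jpg` from foxidrive's test are both flagged, while `short\a[1].abc` is not.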

Proof example:


@echo off
if not exist "longpathzzz" (
   md "longpathzzz"
)
set "file=longpathzzz\a[1].abc"
echo h>"%file%"
for %%a in ("%file%") do echo %%~sa


It prints:


LONGPA~1\A_1_~1.ABCc


Because the main trigger is the length of the last folder name, I solved the problem in dup.cmd by mounting the current directory as a single drive letter with subst. That way each file is accessed without the last folder in its path.

I updated the code to version 0.31, which fixes the XP short filename bug.

Thanks foxidrive for all the suggestions and help.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#7 Post by foxidrive » 09 May 2015 13:34

Thanks for that info and the link to jeb's work, which fills in the details wonderfully,
and for your workaround.

If only I'd known that several years ago, when I discovered the bug on my XP machine. :)
https://groups.google.com/d/msg/alt.msd ... 5vC7_b0mgJ

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: dup.cmd list duplicated files

#8 Post by carlos » 20 May 2015 08:26

I made minor changes and updated to version 0.35, which fixes a problem with filenames containing special characters.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#9 Post by foxidrive » 09 Jun 2015 04:11

Hi carlos, I've only just been able to get back to your code, because I needed to sort some of my files and the method I wrote many years ago fails. Sorry for the lack of feedback; my attention isn't what it used to be.

I noticed that this thread is very popular, with over 500 views and only 7 posts!

What I noticed just now is a problem on my system: short filename generation was turned off for some time, so not all of my files have short filenames, and those files aren't processed.

What is really curious is that it breaks CMD as well.

There are a lot of lines below, but they illustrate how cmd ignores a whole swag of files.
I turned short filename generation back on yesterday (Windows 8.1 Pro 32-bit),
and I can't explain why that one file has a short filename.

> dir /b

a.txt
Dup.cmd
reminderfox.ics.2015-01-14_02.11
rf-tb.js


> dir /b /a-d

a.txt
Dup.cmd
reminderfox.ics.2014-01-21_22.33
reminderfox.ics.2014-01-22_22.13
reminderfox.ics.2014-01-26_18.33
reminderfox.ics.2014-01-26_18.44
reminderfox.ics.2014-01-28_03.00
reminderfox.ics.2014-02-02_22.53
reminderfox.ics.2014-02-03_22.36
reminderfox.ics.2014-02-06_15.50
reminderfox.ics.2014-02-06_17.28
reminderfox.ics.2014-02-09_15.10
reminderfox.ics.2014-02-10_06.53
reminderfox.ics.2014-02-18_18.10
reminderfox.ics.2014-02-22_23.59
reminderfox.ics.2014-02-23_00.35
reminderfox.ics.2014-02-23_00.43
reminderfox.ics.2014-02-23_19.48
reminderfox.ics.2014-02-25_11.08
reminderfox.ics.2014-02-26_16.07
reminderfox.ics.2014-02-27_01.15
reminderfox.ics.2014-03-05_10.47
reminderfox.ics.2014-03-08_20.06
reminderfox.ics.2014-03-11_20.30
reminderfox.ics.2014-03-11_23.38
reminderfox.ics.2014-03-15_01.31
reminderfox.ics.2014-03-22_12.33
reminderfox.ics.2014-03-24_11.16
reminderfox.ics.2014-03-24_19.43
reminderfox.ics.2014-03-25_01.36
reminderfox.ics.2014-03-25_16.33
reminderfox.ics.2014-03-29_17.50
reminderfox.ics.2014-03-29_18.14
reminderfox.ics.2014-03-29_18.31
reminderfox.ics.2014-03-29_18.48
reminderfox.ics.2014-03-29_19.17
reminderfox.ics.2014-03-31_02.08
reminderfox.ics.2014-03-31_15.07
reminderfox.ics.2014-03-31_15.14
reminderfox.ics.2014-03-31_15.15
reminderfox.ics.2014-04-05_23.37
reminderfox.ics.2014-04-06_01.31
reminderfox.ics.2014-04-06_01.52
reminderfox.ics.2014-04-06_14.47
reminderfox.ics.2014-04-06_16.06
reminderfox.ics.2014-04-06_18.54
reminderfox.ics.2014-04-07_16.55
reminderfox.ics.2014-04-10_20.24
reminderfox.ics.2014-04-13_21.28
reminderfox.ics.2014-04-13_23.28
reminderfox.ics.2014-04-17_19.29
reminderfox.ics.2014-04-19_22.45
reminderfox.ics.2014-04-20_12.56
reminderfox.ics.2014-04-21_15.51
reminderfox.ics.2014-04-26_20.46
reminderfox.ics.2014-04-26_20.48
reminderfox.ics.2014-04-27_04.51
reminderfox.ics.2014-04-27_21.29
reminderfox.ics.2014-05-02_11.56
reminderfox.ics.2014-05-03_02.32
reminderfox.ics.2014-05-03_21.49
reminderfox.ics.2014-05-04_16.37
reminderfox.ics.2014-05-10_04.12
reminderfox.ics.2014-05-11_02.06
reminderfox.ics.2014-05-11_04.40
reminderfox.ics.2014-05-14_02.22
reminderfox.ics.2014-05-14_02.24
reminderfox.ics.2014-05-16_01.21
reminderfox.ics.2014-05-17_00.52
reminderfox.ics.2014-05-17_15.25
reminderfox.ics.2014-05-17_23.24
reminderfox.ics.2014-05-18_02.46
reminderfox.ics.2014-05-18_02.52
reminderfox.ics.2014-05-18_02.55
reminderfox.ics.2014-05-30_16.30
reminderfox.ics.2014-06-05_06.31
reminderfox.ics.2014-06-07_15.11
reminderfox.ics.2014-06-08_02.16
reminderfox.ics.2014-06-13_02.29
reminderfox.ics.2014-06-15_15.43
reminderfox.ics.2014-06-21_01.55
reminderfox.ics.2014-06-23_04.06
reminderfox.ics.2014-06-27_20.07
reminderfox.ics.2014-07-02_20.37
reminderfox.ics.2014-07-10_00.24
reminderfox.ics.2014-07-10_00.26
reminderfox.ics.2014-07-12_03.09
reminderfox.ics.2014-07-14_20.39
reminderfox.ics.2014-07-23_03.25
reminderfox.ics.2014-07-23_03.33
reminderfox.ics.2014-07-23_03.41
reminderfox.ics.2014-07-23_19.56
reminderfox.ics.2014-07-24_18.57
reminderfox.ics.2014-07-28_16.00
reminderfox.ics.2014-07-30_19.21
reminderfox.ics.2014-08-02_09.34
reminderfox.ics.2014-08-03_21.12
reminderfox.ics.2014-08-03_22.12
reminderfox.ics.2014-08-04_21.33
reminderfox.ics.2014-08-07_19.20
reminderfox.ics.2014-08-09_00.54
reminderfox.ics.2014-08-10_03.29
reminderfox.ics.2014-08-11_04.24
reminderfox.ics.2014-08-14_15.19
reminderfox.ics.2014-08-15_11.01
reminderfox.ics.2014-08-19_00.12
reminderfox.ics.2014-08-20_22.13
reminderfox.ics.2014-08-22_15.01
reminderfox.ics.2014-08-24_17.16
reminderfox.ics.2014-08-26_17.19
reminderfox.ics.2014-08-26_17.20
reminderfox.ics.2014-08-28_15.10
reminderfox.ics.2014-08-29_00.44
reminderfox.ics.2014-09-01_15.53
reminderfox.ics.2014-09-10_04.33
reminderfox.ics.2014-09-10_04.36
reminderfox.ics.2014-09-11_00.21
reminderfox.ics.2014-09-17_18.42
reminderfox.ics.2014-09-18_02.07
reminderfox.ics.2014-09-21_17.19
reminderfox.ics.2014-09-22_06.50
reminderfox.ics.2014-09-23_10.50
reminderfox.ics.2014-09-23_10.53
reminderfox.ics.2014-09-25_22.43
reminderfox.ics.2014-09-27_22.07
reminderfox.ics.2014-09-28_19.01
reminderfox.ics.2014-09-30_02.26
reminderfox.ics.2014-10-09_17.48
reminderfox.ics.2014-10-10_09.46
reminderfox.ics.2014-10-11_04.04
reminderfox.ics.2014-10-11_04.17
reminderfox.ics.2014-10-11_04.35
reminderfox.ics.2014-10-12_15.03
reminderfox.ics.2014-10-12_16.13
reminderfox.ics.2014-10-14_01.19
reminderfox.ics.2014-10-15_16.55
reminderfox.ics.2014-10-16_10.07
reminderfox.ics.2014-10-17_13.20
reminderfox.ics.2014-10-17_23.52
reminderfox.ics.2014-10-18_00.18
reminderfox.ics.2014-10-19_18.55
reminderfox.ics.2014-10-21_13.32
reminderfox.ics.2014-10-23_00.31
reminderfox.ics.2014-10-24_14.22
reminderfox.ics.2014-10-24_14.23
reminderfox.ics.2014-10-24_23.18
reminderfox.ics.2014-10-25_03.23
reminderfox.ics.2014-10-28_10.25
reminderfox.ics.2014-11-01_00.53
reminderfox.ics.2014-11-02_00.15
reminderfox.ics.2014-11-02_00.32
reminderfox.ics.2014-11-02_15.20
reminderfox.ics.2014-11-03_12.37
reminderfox.ics.2014-11-10_18.48
reminderfox.ics.2014-11-10_19.16
reminderfox.ics.2014-11-12_01.57
reminderfox.ics.2014-11-14_19.05
reminderfox.ics.2014-11-19_19.50
reminderfox.ics.2014-11-21_08.52
reminderfox.ics.2014-11-22_20.48
reminderfox.ics.2014-11-28_15.08
reminderfox.ics.2014-12-05_17.29
reminderfox.ics.2014-12-06_19.05
reminderfox.ics.2014-12-07_12.18
reminderfox.ics.2014-12-10_16.10
reminderfox.ics.2014-12-11_06.13
reminderfox.ics.2014-12-13_03.54
reminderfox.ics.2014-12-13_04.19
reminderfox.ics.2014-12-15_10.44
reminderfox.ics.2014-12-18_09.15
reminderfox.ics.2014-12-20_07.45
reminderfox.ics.2014-12-20_09.05
reminderfox.ics.2014-12-21_03.31
reminderfox.ics.2014-12-21_03.49
reminderfox.ics.2014-12-31_16.56
reminderfox.ics.2015-01-01_17.23
reminderfox.ics.2015-01-05_16.38
reminderfox.ics.2015-01-05_16.45
reminderfox.ics.2015-01-05_16.48
reminderfox.ics.2015-01-05_16.49
reminderfox.ics.2015-01-05_16.50
reminderfox.ics.2015-01-05_16.52
reminderfox.ics.2015-01-05_16.53
reminderfox.ics.2015-01-05_16.54
reminderfox.ics.2015-01-05_16.55
reminderfox.ics.2015-01-05_16.57
reminderfox.ics.2015-01-05_16.59
reminderfox.ics.2015-01-06_01.26
reminderfox.ics.2015-01-11_03.56
reminderfox.ics.2015-01-14_02.11
rf-tb.js

> dir /x

Volume in drive D is Drive D
Volume Serial Number is FFFF-FFFF

Directory of d:\abc\zzz

09/06/2015 20:03 <DIR> .
09/06/2015 20:03 <DIR> ..
09/06/2015 20:05 6,516 a.txt
19/05/2015 22:56 3,801 Dup.cmd
13/01/2015 23:03 14,066 REMIND~2.11 reminderfox.ics.2015-01-14_02.11
13/02/2015 09:51 4,433 rf-tb.js
4 File(s) 28,816 bytes
2 Dir(s) 664,620,654,592 bytes free

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: dup.cmd list duplicated files

#10 Post by carlos » 09 Jun 2015 17:01

foxidrive, could you please post the output of


dir /x /a
Last edited by carlos on 12 Jun 2015 12:37, edited 1 time in total.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: dup.cmd list duplicated files

#11 Post by foxidrive » 10 Jun 2015 10:51

carlos, you gave me the clue to sort out what was happening.

All the files not displayed had a HIDDEN attribute, and all had short names.
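For reference, the same check can be made from Python. This hypothetical sketch reads the Windows HIDDEN attribute through `os.stat(...).st_file_attributes`, which only exists on Windows (elsewhere the code falls back to 0), and splits a folder into visible and hidden files. That roughly matches why `dir /b` and `for %%# in (*)` skipped files that `dir /b /a-d` listed.

```python
import os
import stat


def is_hidden(path):
    """True if the Windows HIDDEN attribute is set on `path`.

    st_file_attributes is only present on Windows; on other systems
    the attributes default to 0 and this returns False.
    """
    attrs = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)


def split_visible_hidden(directory="."):
    """Split the regular files in `directory` into (visible, hidden) name lists."""
    visible, hidden = [], []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            (hidden if is_hidden(path) else visible).append(name)
    return visible, hidden
```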

Thanks for helping me see what I couldn't! :oops:
