test.txt contains these records (and there are no empty lines):
Code:
1
6
5
4
3
2
1
2
0
I wanted to remove duplicate entries without sorting the file (it was very important to keep the current order).
One way of achieving this is the following (assume the data is such that FINDSTR cannot match one record as a substring of another - a record can only match as a whole; in my case the records are 8-digit serial numbers):
Code:
@echo off
echo %time%
type nul>test.out_dir_3
REM append the record to the output file only when FINDSTR does not already find it there
for /f "tokens=1 delims=" %%f in (test.txt) do findstr /B /C:"%%f" test.out_dir_3>nul||>>test.out_dir_3 (echo.%%f)
echo %time%
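Outside batch, the logic of this loop can be sketched in Python (illustrative only, not part of the batch solution): for each record, scan everything written to the output so far and append only when the record is absent - which is why the cost grows with the size of the output on every iteration.

```python
def dedup_scan(records):
    """Order-preserving dedup that mimics the FINDSTR loop:
    re-scan the whole output written so far for every record."""
    out = []  # plays the role of test.out_dir_3
    for rec in records:
        # 'rec not in out' compares whole records only, matching the
        # assumption that one record can never match inside another
        if rec not in out:
            out.append(rec)
    return out

print(dedup_scan(["1", "6", "5", "4", "3", "2", "1", "2", "0"]))
# -> ['1', '6', '5', '4', '3', '2', '0']
```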
Another way is using temporary files:
Code:
@echo off
set tmpdir=TMPDIR%random%_%random%_%random%
md %tmpdir%
echo %time%
REM for each record write a temporary file
for /f "tokens=1 delims=" %%f in (test.txt) do >%tmpdir%\%%f (echo X)
REM if the temporary file still exists, write the record to the output file
REM and delete the file; once the file is gone the record has already been
REM written, so every later duplicate is skipped
for /f "tokens=1 delims=" %%f in (test.txt) do if exist %tmpdir%\%%f del %tmpdir%\%%f&>>test.out_dir_1 (echo %%f)
echo %time%
rd /s /q %tmpdir%
Surprisingly, the second way is much faster (I have 24000+ records). FINDSTR has to search an ever-growing output file for every record, so the total work grows with every unique record written (plus any antivirus program will delay each of those file reads - I am sure the IF EXIST and DEL commands do not cause the antivirus to check for any threats).
Of course, this approach depends on the data you have. I just thought someone might find it useful.
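The temporary directory is effectively acting as a set: IF EXIST is a cheap membership test per record instead of a re-read of a growing file. The same idea in Python, as an illustrative sketch only (not part of the batch solution):

```python
def dedup_set(records):
    """Order-preserving dedup using a set, the in-memory analogue of
    the temporary-file directory: one membership test per record."""
    seen = set()  # stands in for the files in %tmpdir%
    out = []
    for rec in records:
        if rec not in seen:  # analogue of: if exist %tmpdir%\rec
            seen.add(rec)
            out.append(rec)
    return out

print(dedup_set(["1", "6", "5", "4", "3", "2", "1", "2", "0"]))
# -> ['1', '6', '5', '4', '3', '2', '0']
```

Each record is tested exactly once against the set, which is why the temp-file version scales so much better than re-scanning the output file.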
Saso