Extract UNIQUE rows ONLY from .txt file.

PAB
Posts: 139
Joined: 12 Aug 2019 13:57

#1 Post by PAB » 07 Jun 2020 10:08

Good afternoon,

I hope you are all keeping well and safe!

I have a .txt file that gets produced with many lines of text, which could include many duplicates.
Here is the part of the code that creates the .txt file . . .

Code: Select all

type "%tmp%" | findstr /I /G:"%Filter%" >> "%Output_File%"

[1] I want to exclude duplicates.
[2] I want the original order kept [excluding the duplicates, of course!].

I already had a bit of code in my collection, which I have adapted to do the above, and it works. Is there any way to incorporate it into my existing code above, please, instead of running it separately?

Code: Select all

@echo off

set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"

set "PSScript=%Temp%\~tmpRemoveDupe.ps1"
if exist "%PSScript%" del /q /f "%PSScript%"
echo Get-Content "%InputFile%" ^| Get-Unique ^> "%OutputFile%" >> "%PSScript%"
set "PowerShellDir=C:\Windows\System32\WindowsPowerShell\v1.0"
cd /D "%PowerShellDir%"
Powershell -ExecutionPolicy Bypass -Command "& '%PSScript%'"
del "%PSScript%"
pause
goto :EOF
EOF

Thanks in advance.
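
One way the temporary .ps1 file could be avoided is to pass the same pipeline to PowerShell inline with -Command. The following is only a sketch under assumptions: the file names are hypothetical, the paths are assumed to contain no single quotes, and the duplicates are assumed to be adjacent, because Get-Unique only compares each line with the one directly before it.

```batch
@echo off
setlocal
rem Hypothetical file names, used only for illustration.
set "InputFile=%TEMP%\Errors_demo.txt"
set "OutputFile=%TEMP%\Errors_demo_unique.txt"

rem Build a small sample input with adjacent duplicate lines.
(
  echo 2020-06-07 10:00:01 Error A
  echo 2020-06-07 10:00:01 Error A
  echo 2020-06-07 10:00:05 Error B
) > "%InputFile%"

rem Run the pipeline inline; no temporary .ps1 file is created.
rem The pipe sits inside double quotes, so cmd.exe passes it through to PowerShell.
powershell -NoProfile -Command "Get-Content -LiteralPath '%InputFile%' | Get-Unique | Out-File -FilePath '%OutputFile%'"

type "%OutputFile%"
```

Because the command is given inline rather than as a script file, the execution policy does not come into play, so -ExecutionPolicy Bypass should not be needed here.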

Hackoo
Posts: 103
Joined: 15 Apr 2014 17:59

#2 Post by Hackoo » 07 Jun 2020 10:41

Hi :)
You can try it like this:

Code: Select all

@echo off
set "InputFile=C:\Users\System-Admin\Desktop\Errors.txt"
set "OutputFile=C:\Users\System-Admin\Desktop\DISM_Errors2.txt"
Call :RemoveDuplicateEntry %InputFile% %OutputFile%
Pause & Exit
::----------------------------------------------------
:RemoveDuplicateEntry <InputFile> <OutPutFile>
Powershell  ^
$Contents=Get-Content '%1';  ^
$LowerContents=$Contents.ToLower(^);  ^
$LowerContents ^| select -unique ^| Out-File '%2'
Exit /b
::----------------------------------------------------

PAB
Posts: 139
Joined: 12 Aug 2019 13:57

#3 Post by PAB » 07 Jun 2020 11:51

Thanks for the reply, it is appreciated.

I have tried several ways of getting this to work, but each time I get at least one error on the PowerShell side. One of them is:
Method invocation failed because [System.Object[]] doesn't contain a
method named 'ToLower'.
At line:1 char:97
+ $Contents=Get-Content 'C:\Users\System-Admin\Desktop\Dups.txt'; $Lo
werContents=$Contents.ToLower <<<< (); $LowerContents | select -uniqu
e | Out-File 'C:\Users\System-Admin\Desktop\DISM_Errors.txt'
+ CategoryInfo : InvalidOperation: (ToLower:String) [],
RuntimeException
+ FullyQualifiedErrorId : MethodNotFound
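
The error says that ToLower was called on the array that Get-Content returns (System.Object[]) rather than on each line. Calling a string method on a whole array only works from PowerShell 3.0 onwards, so this looks like an older PowerShell version. A possible workaround, sketched here with hypothetical file names and assuming the paths contain no single quotes, is to lowercase one line at a time with ForEach-Object:

```batch
@echo off
setlocal
rem Hypothetical file names, used only for illustration.
set "InputFile=%TEMP%\Dups_demo.txt"
set "OutputFile=%TEMP%\Dups_demo_unique.txt"

rem Sample input: the same line in two different casings, plus one other line.
(
  echo Error A
  echo ERROR A
  echo Error B
) > "%InputFile%"

rem ForEach-Object lowercases one line at a time, so no method is ever
rem called on the array itself. Select-Object -Unique then keeps the first
rem occurrence of each line without sorting.
powershell -NoProfile -Command "Get-Content -LiteralPath '%InputFile%' | ForEach-Object { $_.ToLower() } | Select-Object -Unique | Out-File -FilePath '%OutputFile%'"

type "%OutputFile%"
```

Note that this writes the lowercased text to the output file, which may not be what is wanted for a log. Dropping the ForEach-Object stage keeps the original casing, but the comparison then becomes case-sensitive.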

Code: Select all

) else (
  type "%tmp%" | findstr /I /G:"%Filter%" >> "%InputFile%"
  Call :RemoveDuplicateEntry %InputFile% %OutputFile%
  echo. & echo ^>Press ANY key to EXIT . . . & pause >nul
  goto :Exit
)
:Exit

:RemoveDuplicateEntry <InputFile> <OutputFile>
Powershell  ^
$Contents=Get-Content '%1';  ^
$LowerContents=$Contents.ToLower(^);  ^
$LowerContents ^| select -unique ^| Out-File '%2'
Exit /b

UPDATE:

It is important that the file is NOT sorted.
The .txt file is a log file that is already in timestamp order (yyyy-mm-dd hh-mm-ss), so any duplicate rows sit next to each other, much as if the file had been sorted.
The code I posted previously works great, except that I need this done within the same batch file rather than having a separate file do it!

Thanks in advance.
Last edited by PAB on 07 Jun 2020 17:39, edited 2 times in total.

PAB
Posts: 139
Joined: 12 Aug 2019 13:57

#4 Post by PAB » 07 Jun 2020 15:02

OK, this actually works . . .

Code: Select all

) else (
  cls
  del "%Input_File%"
  type "%tmp%" | findstr /I /G:"%Filter%" >> "%Input_File%"
  del "%tmp%"
  echo. > "%Output_File%" & echo ERRORS FOUND . . . >> "%Output_File%" & echo. >> "%Output_File%"
  for /f "tokens=* delims= " %%a in (%Input_File%) do (
  find "%%a" < "%Output_File%" >nul || >> "%Output_File%" echo.%%a
  )
  del "%Input_File%"
  echo. & echo ^>Press ANY key to EXIT . . . & pause >nul
  goto :Exit
)
:Exit

One question please.

Throughout my code I have added speech marks around my path variables, as I always do.
For some reason, however, if I write in ("%Input_File%") do with speech marks around the variable, the loop returns a single line containing the path and the file name.
Without the speech marks it returns the list of unique rows as expected!

Thanks in advance.

Squashman
Expert
Posts: 4471
Joined: 23 Dec 2011 13:59

#5 Post by Squashman » 30 Jul 2020 16:43

Read the help file for the FOR command.

Code: Select all

usebackq        - specifies that the new semantics are in force,
                  where a back quoted string is executed as a
                  command and a single quoted string is a
                  literal string command and allows the use of
                  double quotes to quote file names in
                  file-set.

If you don't use this option, the FOR command treats a quoted IN clause as a literal string rather than a file name.
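
A minimal sketch of the difference, using a hypothetical file name with a space in it (which is when the quotes are really needed):

```batch
@echo off
setlocal
rem Hypothetical path containing a space, used only for illustration.
set "Input_File=%TEMP%\my errors.txt"
(
  echo first line
  echo second line
) > "%Input_File%"

rem Without usebackq, the quoted text is parsed as a literal string,
rem so the loop body runs once and %%a holds the path itself.
for /f "delims=" %%a in ("%Input_File%") do echo string: %%a

rem With usebackq, the double-quoted text is treated as a file name,
rem so the loop body runs once per line of the file.
for /f "usebackq delims=" %%a in ("%Input_File%") do echo line:   %%a

del "%Input_File%"
```

The first loop should print the path once; the second should print each line of the file. The "delims=" option here keeps each line whole, whereas the "tokens=* delims= " used earlier in the thread also strips leading spaces.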
