Question about Removing duplicate lines [SOLVED]

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Question about Removing duplicate lines [SOLVED]

#1 Post by abc0502 » 26 Aug 2012 19:22

Hi,
I'm trying to remove duplicate lines from a file.
I found this code by !k:


for /f "tokens=2 delims=/" %%a in ('sort infile.txt') do (
  findstr /ixc:"%%a" exclude.txt only_new.txt ||echo,%%a>> only_new.txt
)


But I don't understand it. What do exclude.txt and only_new.txt do?

My file looks like this:
test1
test2
test1
test3
test2
test1
test3

The output should look like this:
test1
test2
test3

The code works fine after changing the tokens to * and removing the delims, but it generates errors because the two files do not exist.

Can anyone explain it?
Thanks
Last edited by abc0502 on 27 Aug 2012 02:33, edited 1 time in total.
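To show what that loop does, here is a rough Python model of its logic. It is a sketch, not a faithful translation of batch semantics. It assumes the tokens=* variant the poster ended up using. Each line of the sorted input is written to only_new.txt only if findstr /i /x /c: finds no exact, case-insensitive, whole-line match in exclude.txt or in the lines already written to only_new.txt. So exclude.txt acts as a list of lines to suppress. The function name `only_new` and its `exclude` parameter are hypothetical. A missing file is treated as empty here, whereas in batch the missing files make findstr print errors.

```python
def only_new(lines, exclude=()):
    """Rough model of the !k loop (tokens=* variant).

    A line is written only if it matches neither a line in `exclude`
    (exclude.txt) nor a line already written (only_new.txt).
    Matching is exact and whole-line but case-insensitive, like findstr /i /x /c:.
    """
    excluded = {s.casefold() for s in exclude}
    written = set()  # case-folded copies of lines already in only_new.txt
    out = []
    # The batch loop reads `sort infile.txt`; Windows sort ignores case by default.
    for line in sorted(lines, key=str.casefold):
        if not line:  # FOR /F skips empty lines
            continue
        key = line.casefold()
        if key in excluded or key in written:
            continue  # findstr found a match, so the echo after || is skipped
        written.add(key)
        out.append(line)
    return out


data = ["test1", "test2", "test1", "test3", "test2", "test1", "test3"]
print(only_new(data))                     # ['test1', 'test2', 'test3']
print(only_new(data, exclude=["TEST2"]))  # ['test1', 'test3']
```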

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Question about Removing duplicate lines

#2 Post by Squashman » 26 Aug 2012 19:52

Hi abc0502,
I think I understand what !k's code is trying to do. Could you post a link to the original thread you found the code in?
Thanks.

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Re: Question about Removing duplicate lines

#3 Post by abc0502 » 26 Aug 2012 19:57

Here is the link: viewtopic.php?f=3&t=1957

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Question about Removing duplicate lines

#4 Post by Squashman » 26 Aug 2012 19:58

Dave answered a similar question over on Stack Overflow just last month. It is a bit more involved than the code you have now.
http://stackoverflow.com/questions/1168 ... -text-file

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Re: Question about Removing duplicate lines

#5 Post by abc0502 » 26 Aug 2012 20:15

Wow, as dbenham said, "it is not pretty" :o
I'll go with !k's code.
I don't need the output to be sorted. I just need to remove the duplicates, because I use this list as file names,
so there must be no repeated lines in the final files.

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Question about Removing duplicate lines

#6 Post by Squashman » 26 Aug 2012 20:23

I think this works, or do I have my logic backwards?


@echo off
type nul>only_new.txt
for /f "tokens=* delims=" %%a in (infile.txt) do (
  findstr /ixc:"%%a" only_new.txt || >>only_new.txt echo.%%a
)
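The same idea without the sort and without the exclude file can be modelled in Python. This is a hedged sketch and the function name is hypothetical. The `||` in the batch line runs the echo only when findstr fails, that is, when the line is not yet in only_new.txt. Because each line is checked only against what has already been written, the first occurrence of every line survives and the original order is kept.

```python
def dedupe_first_seen(lines):
    """Model of the loop above: keep the first occurrence of each line, in order."""
    written = set()  # case-folded lines already appended to only_new.txt
    out = []
    for line in lines:
        if not line:  # FOR /F skips empty lines
            continue
        key = line.casefold()  # /i: case-insensitive comparison
        if key in written:  # findstr succeeds, so the || branch does not run
            continue
        written.add(key)  # findstr fails, so echo appends the line
        out.append(line)
    return out


print(dedupe_first_seen(["test3", "test1", "test3", "Test1", "test2"]))
# ['test3', 'test1', 'test2']
```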

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Re: Question about Removing duplicate lines

#7 Post by abc0502 » 26 Aug 2012 20:34

Squashman wrote:I think this works, or do I have my logic backwards?

This is great, it works without errors.
Thanks :)

EDIT
I just wanted to add:
when it runs, findstr prints every line it finds in only_new.txt to the console; adding >nul after only_new.txt suppresses that:
findstr /ixc:"%%a" only_new.txt >nul || >>only_new.txt echo.%%a

Thanks again, Squashman :)

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Question about Removing duplicate lines

#8 Post by Squashman » 26 Aug 2012 20:41

abc0502 wrote:
I just wanted to add:
when it runs, findstr prints every line it finds in only_new.txt to the console; adding >nul after only_new.txt suppresses that:
findstr /ixc:"%%a" only_new.txt >nul || >>only_new.txt echo.%%a

Yep, I forgot to put that back in. I had it in my first attempt, but then got an error on something else, so I removed it. I also had my logic backwards on that first attempt: I had the /V switch in there.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Question about Removing duplicate lines

#9 Post by dbenham » 26 Aug 2012 22:19

That solution works fine as long as the lines do not contain backslash \ or quote ". But those characters must be escaped if they appear in a literal search string. Unfortunately the escape rules for command line literal search strings are ridiculous, especially if the string contains both characters.

"hello" --> \"hello\"

abc\def --> abc\\def

"c:\folder\file" --> \"c:\\folder\\\\file\"

The escape rules are much simpler for a file based string literal - Only the backslash need be escaped and it is escaped consistently as \\.
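Given the file-based rule described above (only the backslash needs escaping, always as \\), preparing a line for use in a findstr search file, such as one passed with /g:, comes down to a single replacement. Below is a minimal Python sketch of that one step. The function name is hypothetical, and the sketch deliberately does not attempt the command-line rules.

```python
def escape_for_search_file(line):
    """Double every backslash so findstr treats it literally in a search file.

    Quotes need no escaping in a file-based search string.
    """
    return line.replace("\\", "\\\\")


print(escape_for_search_file(r'"c:\folder\file"'))  # "c:\\folder\\file"
```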

That is why my 2nd solution posted at Batch to remove duplicate rows from text file is as complicated as it is.

See What are the undocumented features and limitations of the Windows FINDSTR command for more info on FINDSTR escape rules.

---------------------------

Note: Squashman's solution is case-insensitive because of the /I switch, but it can be made case-sensitive by removing that switch. My solution at the link is case-sensitive.
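The effect of the /I switch can be shown by making the comparison key a parameter. This is a hedged Python sketch with hypothetical names: `ignore_case=True` behaves like the loop with /I, and `ignore_case=False` behaves like the loop with /I removed.

```python
def dedupe(lines, ignore_case=True):
    """Keep the first occurrence of each non-empty line, preserving order.

    ignore_case=True mimics findstr /i; False mimics dropping the /I switch.
    """
    seen = set()
    out = []
    for line in lines:
        if not line:
            continue
        key = line.casefold() if ignore_case else line
        if key not in seen:
            seen.add(key)
            out.append(line)
    return out


names = ["Report", "report", "REPORT", "summary"]
print(dedupe(names))                     # ['Report', 'summary']
print(dedupe(names, ignore_case=False))  # ['Report', 'report', 'REPORT', 'summary']
```

Windows file names are case-insensitive by default, so for a list of file names the /I behaviour is probably the one you want.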


Dave Benham

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Re: Question about Removing duplicate lines

#10 Post by abc0502 » 27 Aug 2012 00:00

Hi dbenham,
Thanks for your notes. I don't have to worry about \ and " because I'm using this list for file and folder names,
and those can't contain either character. But if I ever need to process files whose lines contain \ or ",
I guess I will have to use your code then.

Thanks again :)

joeshoe
Posts: 1
Joined: 30 Apr 2014 09:35

Re: Question about Removing duplicate lines [SOLVED]

#11 Post by joeshoe » 30 Apr 2014 09:47

for /f "tokens=2 delims=/" %%a in (text.txt) do echo,%%a>> text_done.txt


Probably a stupid question, but is there something that makes it keep only the first instance, instead of the output being a blank text file?

I checked out the Stack Overflow thread, but it's way over my head :)
