Help please - sort one url per line with batch file

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

philip123
Posts: 4
Joined: 19 Jun 2015 07:24

Help please - sort one url per line with batch file

#1 Post by philip123 » 19 Jun 2015 08:28

Hi,

I'm new to this forum.

I have a text file containing hundreds of website URLs, all separated by single spaces.
I need to split the URLs so that I end up with a text file (containing all the URLs) with only one URL per line, in no particular order.

Change this:
google.com youtube.com yahoo.com

To this:
google.com
youtube.com
yahoo.com

Could this be done with a batch file?

Thanks

Ben Mar
Posts: 22
Joined: 03 May 2015 10:51

Re: Help please - sort one url per line with batch file

#2 Post by Ben Mar » 19 Jun 2015 12:06

Try this:

Code:

@echo off
setlocal EnableDelayedExpansion
(for /f "tokens=1,*" %%A in (url.txt) do (
   echo %%A
   if "%%B"=="" goto :eof
   set nextURL=%%B
   :Begin
   for /f "tokens=1,*" %%F in ("!nextURL!") do (
      echo %%F
      if "%%F"=="" goto :eof
      set nextURL=%%G
      goto Begin
   )
) )>url.lst
type url.lst
Last edited by Ben Mar on 19 Jun 2015 20:31, edited 1 time in total.

philip123
Posts: 4
Joined: 19 Jun 2015 07:24

Re: Help please - sort one url per line with batch file

#3 Post by philip123 » 19 Jun 2015 13:38

Hi Ben,

Thanks for your help.
This is what I need, but only the first two URLs from "url.txt" end up in the "url.lst" file.
All the other URLs in "url.txt" are ignored.

Any suggestions on what I should do?

Thanks
Philip

Aacini
Expert
Posts: 1615
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Help please - sort one url per line with batch file

#4 Post by Aacini » 19 Jun 2015 15:31

If I understood you correctly, you have a series of URLs in a file with no line separators, so the file contains just one "long line". However, you have not told us the maximum size of this long line or other details that may affect the method.

If the total size is less than 8192 characters and the URLs do not contain special characters, then the solution is relatively simple:

Code:

@echo off

(for /F "delims=" %%a in (input.txt) do (
   for %%b in (%%a) do echo %%b
)) > output.txt

However, if the file is larger than 8192 characters, then it must be processed in "chunks" of a smaller size:

Code:

@echo off
setlocal EnableDelayedExpansion

call :SplitLongLine < input.txt > output.txt
goto :EOF


:SplitLongLine

set "lastPart="
:nextPart
   set "part="
   set /P part=
   if not defined part goto endFile
   if "!part:~1022!" equ " " (
      for %%a in (!lastPart!!part!) do echo %%a
      set "lastPart="
   ) else (
      set "lastUrl="
      for %%a in (!lastPart!!part!) do (
         if defined lastUrl echo !lastUrl!
         set "lastUrl=%%a"
      )
      set "lastPart=!lastUrl!"
   )
goto nextPart

:endFile
if defined lastPart echo %lastPart%
exit /B

This code fails if a URL contains a question mark or an asterisk, because a plain FOR treats those characters as file-name wildcards.

Antonio

philip123
Posts: 4
Joined: 19 Jun 2015 07:24

Re: Help please - sort one url per line with batch file

#5 Post by philip123 » 19 Jun 2015 16:19

Hi Antonio,

The file is larger than 8192 characters, so I tried the code for that case, and it works perfectly!

Thank you very, very much!!!

Regards,
Philip

Ben Mar
Posts: 22
Joined: 03 May 2015 10:51

Re: Help please - sort one url per line with batch file

#6 Post by Ben Mar » 19 Jun 2015 23:45

philip123 wrote:Hi Ben,

Thanks for your help.
This is what I need, but I only get the first two url's in "url.txt' sorted in the "url.lst'' file.
All the other url's in the "url.txt' file is ignored.

Any suggestions, what I should do.

Thanks
Philip

Could you post a sample of your "url.txt" file?

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Help please - sort one url per line with batch file

#7 Post by foxidrive » 19 Jun 2015 23:49

Ben Mar wrote:Try this: [code from post #2]


AFAIK you can't have a goto loop within a for construct, Ben.
I didn't test your code but unless you've found a way to do it, it's going to fall out of the for loop.

Ben Mar
Posts: 22
Joined: 03 May 2015 10:51

Re: Help please - sort one url per line with batch file

#8 Post by Ben Mar » 20 Jun 2015 00:43

foxidrive wrote:AFAIK you can't have a goto loop within a for construct, Ben.
I didn't test your code but unless you've found a way to do it, it's going to fall out of the for loop.


This is the first time I've heard of that. Anyway, in my tests of this code so far I haven't seen any wrong results.
This is my made-up test case:

Code:

cnn0.com cnn1.com cnn2.com cnn3.com cnn4.com cnn5.com cnn6.com cnn7.com cnn8.com cnn9.com cnn10.com cnn11.com cnn12.com cnn13.com cnn14.com cnn15.com cnn16.com cnn17.com cnn18.com cnn19.com cnn20.com cnn21.com cnn22.com cnn23.com cnn24.com cnn25.com cnn26.com cnn27.com cnn28.com cnn29.com cnn30.com cnn31.com cnn32.com cnn33.com cnn34.com cnn35.com cnn36.com cnn37.com cnn38.com cnn39.com cnn40.com cnn41.com cnn42.com cnn43.com cnn44.com cnn45.com cnn46.com cnn47.com cnn48.com cnn49.com cnn50.com cnn51.com cnn52.com cnn53.com cnn54.com cnn55.com cnn56.com cnn57.com cnn58.com cnn59.com cnn60.com cnn61.com cnn62.com cnn63.com cnn64.com cnn65.com cnn66.com cnn67.com cnn68.com cnn69.com cnn70.com cnn71.com cnn72.com cnn73.com cnn74.com cnn75.com cnn76.com cnn77.com cnn78.com cnn79.com cnn80.com cnn81.com cnn82.com cnn83.com cnn84.com cnn85.com cnn86.com cnn87.com cnn88.com cnn89.com cnn90.com cnn91.com cnn92.com cnn93.com cnn94.com cnn95.com cnn96.com cnn97.com cnn98.com cnn99.com cnn100.com

and I got this result:

Code:

cnn2.com
cnn3.com
cnn4.com
cnn5.com
cnn6.com
cnn7.com
cnn8.com
cnn9.com
cnn10.com
cnn11.com
cnn12.com
cnn13.com
cnn14.com
cnn15.com
cnn16.com
cnn17.com
cnn18.com
cnn19.com
cnn20.com
cnn21.com
cnn22.com
cnn23.com
cnn24.com
cnn25.com
cnn26.com
cnn27.com
cnn28.com
cnn29.com
cnn30.com
cnn31.com
cnn32.com
cnn33.com
cnn34.com
cnn35.com
cnn36.com
cnn37.com
cnn38.com
cnn39.com
cnn40.com
cnn41.com
cnn42.com
cnn43.com
cnn44.com
cnn45.com
cnn46.com
cnn47.com
cnn48.com
cnn49.com
cnn50.com
cnn51.com
cnn52.com
cnn53.com
cnn54.com
cnn55.com
cnn56.com
cnn57.com
cnn58.com
cnn59.com
cnn60.com
cnn61.com
cnn62.com
cnn63.com
cnn64.com
cnn65.com
cnn66.com
cnn67.com
cnn68.com
cnn69.com
cnn70.com
cnn71.com
cnn72.com
cnn73.com
cnn74.com
cnn75.com
cnn76.com
cnn77.com
cnn78.com
cnn79.com
cnn80.com
cnn81.com
cnn82.com
cnn83.com
cnn84.com
cnn85.com
cnn86.com
cnn87.com
cnn88.com
cnn89.com
cnn90.com
cnn91.com
cnn92.com
cnn93.com
cnn94.com
cnn95.com
cnn96.com
cnn97.com
cnn98.com
cnn99.com
cnn100.com
cnn0.com
cnn1.com

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Help please - sort one url per line with batch file

#9 Post by foxidrive » 20 Jun 2015 01:24

I did test immediately after posting and it only showed me two lines.

I tried your sample case and got this too: a 20-byte URL.LST file.

Code:

cnn0.com
cnn1.com

philip123
Posts: 4
Joined: 19 Jun 2015 07:24

Re: Help please - sort one url per line with batch file

#10 Post by philip123 » 20 Jun 2015 02:13

Aacini wrote:However, if the file size is longer than 8192, then it must be processed in "chunks" of a lesser size: [code from post #4]


Here is the link to my file: https://www.dropbox.com/s/s7n3jw705ab69sj/url.txt?dl=0
It's an ad/malware block list I want to use for my network's hosts file.

After my last post I ran the batch file again and took another look at the output.
I noticed that some lines had two URLs merged, for example:

Code:

youtube.comgoogle.com

Ben Mar
Posts: 22
Joined: 03 May 2015 10:51

Re: Help please - sort one url per line with batch file

#11 Post by Ben Mar » 20 Jun 2015 07:30

foxidrive wrote:I did test immediately after posting and it only showed me two lines.

I tried your sample case and got this, too. A 20 byte URL.LST file.

Code:

cnn0.com
cnn1.com

OK, you're right: for some reason the output is not correct, so I've changed my code to write it correctly.
I don't know why the output did not work properly in the original post. Here is the fixed version:

Code:

@echo off
setlocal EnableDelayedExpansion
for /f "tokens=1,*" %%A in (long_url.txt) do (
   echo %%A>url.lst
   if "%%B"=="" goto :eof
   set nextURL=%%B
   :Begin
   for /f "tokens=1,*" %%F in ("!nextURL!") do (
      echo %%F>>url.lst
      if "%%F"=="" goto :eof
      set nextURL=%%G
      goto Begin
   )
)
type url.lst

Aacini
Expert
Posts: 1615
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Help please - sort one url per line with batch file

#12 Post by Aacini » 20 Jun 2015 08:08

philip123 wrote:After my last post I ran the batch file again and had a look at the output again.
I noticed that some lines had two url's merged, for example:

Code:

youtube.comgoogle.com

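Yes, this happens when a chunk starts with a space: the original code only checked whether the chunk ends with a space, so the partial URL carried over from the previous chunk was glued to the first URL of the next one. This is the fixed code: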

Code:

@echo off
setlocal EnableDelayedExpansion

call :SplitLongLine < input.txt > output.txt
goto :EOF


:SplitLongLine

set "lastPart="
:nextPart
   set "part="
   set /P part=
   if not defined part goto endFile
   if "!part:~0,1!" equ " " if defined lastPart (
      echo %lastPart%
      set "lastPart="
   )
   if "!part:~1022!" equ " " (
      for %%a in (!lastPart!!part!) do echo %%a
      set "lastPart="
   ) else (
      set "lastUrl="
      for %%a in (!lastPart!!part!) do (
         if defined lastUrl echo !lastUrl!
         set "lastUrl=%%a"
      )
      set "lastPart=!lastUrl!"
   )
goto nextPart

:endFile
if defined lastPart echo %lastPart%
exit /B

Antonio

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Help please - sort one url per line with batch file

#13 Post by foxidrive » 20 Jun 2015 09:13

Ben Mar wrote:I don't know why the output is not working properly in the original post. Here is the new fixed: [code from post #11]


To use your technique, this is all that's needed.

Code:

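@echo off
rem Start from an empty output file, since url.lst is appended to below.
rem Note: each pass rewrites long_url.txt with the URLs still to be split,
rem so run this on a copy of the original file.
del url.lst 2>nul
:begin
for /f "tokens=1,*" %%A in (long_url.txt) do (
   echo %%A>>url.lst
   if not "%%B"=="" echo %%B>long_url.txt & goto :begin
)
type url.lst
pause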

Aacini
Expert
Posts: 1615
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Help please - sort one url per line with batch file

#14 Post by Aacini » 20 Jun 2015 09:14

Ben Mar wrote:
foxidrive wrote:AFAIK you can't have a goto loop within a for construct, Ben.
I didn't test your code but unless you've found a way to do it, it's going to fall out of the for loop.


First time to hear about it. But anyway in this code I've tested so far I didn't see any wrong result yet.


When a GOTO is executed inside a ( code block ), all pending commands of that block are cancelled; this includes any pending FOR iterations, the ELSE part of any nested IF, and any redirection applied to the block. Execution jumps to the label and continues from there as if the label were outside any block, so any closing parentheses encountered afterwards are simply ignored.
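A minimal demonstration of that behaviour, as a sketch (the label `afterLoop` and the variable `count` are illustrative names, not from the thread): the GOTO in the first iteration cancels the two pending FOR iterations, so the script prints "iteration 1" and then "loop body ran 1 time(s)".

```batch
@echo off
rem Count how many times the FOR body runs before GOTO cancels the block.
set "count=0"
for %%i in (1 2 3) do (
   set /a count+=1
   echo iteration %%i
   rem This GOTO cancels the rest of the block, including iterations 2 and 3.
   goto :afterLoop
)
:afterLoop
echo loop body ran %count% time(s)
```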

In your code this does not affect the splitting itself, because the GOTO is placed at the end of a FOR /F over a single string, which has no further iterations to cancel; however, the redirection to the output file is also cancelled, so every result after the second one (that is, after the GOTO) is not written to the file but shown on the screen. That is why your test output lists cnn2.com through cnn100.com first: those lines went to the screen, and TYPE then displayed the file, which holds only cnn0.com and cnn1.com. Of course, one way to solve this is to append each line to the output file, but the code could also be rewritten so the :Begin label is placed outside the first FOR.
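One possible version of that rewrite, sketched under stated assumptions: url.txt holds a single line of space-separated URLs, shorter than 8 KB and without exclamation marks. The line is read once, and the `:Begin` loop runs inside a CALLed subroutine whose output is redirected as a whole, the same CALL-with-redirection pattern used in post #4. The subroutine name `:splitAll` and the variables `rest` and `line` are hypothetical.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sketch: assumes url.txt holds ONE line of space-separated URLs,
rem shorter than 8 KB and without exclamation marks.
set "rest="
for /f "usebackq delims=" %%L in ("url.txt") do set "rest=%%L"
rem Redirect the whole CALL, so no GOTO can cancel the redirection.
call :splitAll > url.lst
type url.lst
goto :eof

:splitAll
rem The loop label sits outside every ( code block ), so GOTO is safe here.
:Begin
if not defined rest exit /b
set "line=!rest!"
set "rest="
rem Peel off the first URL; the second token holds everything after it.
for /f "tokens=1,*" %%F in ("!line!") do (
   echo %%F
   set "rest=%%G"
)
goto Begin
```

Clearing `rest` before each FOR /F guarantees the loop ends even if the remaining text is only spaces, since FOR /F then does not iterate at all. This version still has the 8 KB line limit discussed next.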

Anyway, your code fails to address the core point of this problem: a FOR /F command cannot read a line longer than 8 KB. You can test this easily: just copy the data in your test case 10 times, so the resulting file is about 9900 characters long, and run your program with it...
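A quick way to build that larger test file, as a sketch assuming the goal is simply to repeat the cnn0.com ... cnn100.com sample 10 times on one line (the file name long_url.txt matches the fixed code in post #11; the variable `line` is illustrative). SET /P with input from NUL writes its prompt without a trailing line break, which keeps everything on a single line.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Rebuild the cnn0.com ... cnn100.com sample as one space-separated string.
set "line="
for /L %%i in (0,1,100) do set "line=!line!cnn%%i.com "
rem Write the string 10 times. SET /P with input from NUL prints its prompt
rem without a trailing line break, so the file stays a single long line.
(for /L %%n in (1,1,10) do <nul set /p "=!line!") > long_url.txt
for %%F in (long_url.txt) do echo long_url.txt is %%~zF bytes
```

The resulting file is roughly 10 KB, so it should be over the limit that FOR /F can read in one line.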

Antonio

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Help please - sort one url per line with batch file

#15 Post by foxidrive » 20 Jun 2015 09:26

Aacini wrote:Anyway, your code fail to fulfill the core point of this problem: a FOR command can not read a line if it is longer than 8 KB. You may test this point in an easy way: just copy the data in your test case 10 times, so the resulting file is 9900 characters long, and run your program with it...

Antonio


That's an issue there, for sure.

I'd use your tool, or Dave's tool to do this.

This uses findrepl.bat, a native Windows batch script by Aacini.
It can be found here: viewtopic.php?f=3&t=4697

Code:

@echo off
type inputfile.txt|findrepl " " "\r\n" /a >outputfile.txt



This uses Jrepl.bat, a native Windows batch script by dbenham.
It can be found here: viewtopic.php?f=3&t=6044

Code:

@echo off
call jrepl " " "\r\n" /x /f inputfile.txt /o outputfile.txt
