Split string to characters

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Split string to characters

#16 Post by foxidrive » 29 Mar 2012 00:28

jeb wrote:No, CALL does not run an extra CMD processor, it only restarts a new parser loop (and stops after the special character phase).


Many years ago that was the explanation.

CALL also has some side effects: it is very slow and it doubles all carets. For more, you could read
"CALL me, or better avoid call"
jeb
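
The caret doubling mentioned above can be seen with a small test. This is only a minimal sketch, assuming a plain cmd.exe batch file with delayed expansion disabled; the variable name s is made up for the example.

```batch
@echo off
setlocal disableDelayedExpansion
rem Hypothetical test value holding a single caret.
set "s=a^b"

rem Plain ECHO keeps the quoted caret as it is.
echo "%s%"

rem CALL sends the line through the parser again and doubles carets on the
rem way, so the quoted caret shows up twice in the output.
call echo "%s%"
endlocal
```

Run from a batch file, the first ECHO should show "a^b" and the CALL variant "a^^b".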


Thanks.

Sponge Belly
Posts: 196
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: Split string to characters

#17 Post by Sponge Belly » 28 Dec 2017 10:53

Hello All,

Sorry I’m late to the party, but here’s my proffered solution to this hoary old chestnut of a problem:

Code:

@echo off & setLocal enableExtensions disableDelayedExpansion
(call;) %= sets errorLevel to 0 =%

set "testStr=uncopyrightable"
call :splitStr testStr
if errorLevel 1 (
    >&2 echo(string is %errorLevel% char(s^) in length
) else (
    >&2 echo(empty string
    goto die
) %= if =%
goto end

:die
(call) %= sets errorLevel to 1 =%
:end
endLocal & goto :EOF

:splitStr string=
:: outputs string one character per line
setLocal disableDelayedExpansion
set "var=%1"

set "chrCount=0" & if defined var for /f "delims=" %%A in ('
    cmd /v:on /q /c for /l %%I in (0 1 8190^) do ^
    if "!%var%:~%%I,1!" neq "" (^
    echo(:^!%var%:~%%I^,1^!^) else exit 0
') do (
    set /a chrCount+=1
    echo%%A
) %= for /f =%

endLocal & exit /b %chrCount%

The subroutine is expansion-insensitive, has no goto loop, and prints every character, including CR and LF.

Happy 2018! :)

- SB

penpen
Expert
Posts: 1726
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Split string to characters

#18 Post by penpen » 28 Dec 2017 19:47

If I see it right, you could reduce your key idea to this:

Code:

@echo off
setlocal enableExtensions disableDelayedExpansion
set "testStr=some text"

%ComSpec% /d /q /e:on /v:on /c if defined testStr for /L %%a in (0, 1, 8190) do if "!testStr:~%%~a,1!" == "" (exit) else echo(!testStr:~%%~a,1!

endlocal
goto :eof

There is also a solution (based on jeb's strLen algorithm idea) that uses neither CALL nor cmd.exe:

Code:

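@echo off
setlocal enableExtensions enableDelayedExpansion
set "s=some text"

if not defined s goto :eof
rem Binary search for the index of the last character, after jeb's strLen idea.
rem Each line probes the character at the offset currently held in i; the
rem percent form of i is expanded before set /a runs on that line.
rem Empty probe: the second term reads i-=N, a valid step back down.
rem Non-empty probe: the second term is no valid assignment, so set /a reports
rem an error (hidden by the stderr redirection); the method relies on the
rem preceding i+= step still taking effect.
set "i=4096"
2>nul set /a "i+=2048, i-!s:~%i%,1!=4096"
2>nul set /a "i+=1024, i-!s:~%i%,1!=2048"
2>nul set /a "i+=512, i-!s:~%i%,1!=1024"
2>nul set /a "i+=256, i-!s:~%i%,1!=512"
2>nul set /a "i+=128, i-!s:~%i%,1!=256"
2>nul set /a "i+=64, i-!s:~%i%,1!=128"
2>nul set /a "i+=32, i-!s:~%i%,1!=64"
2>nul set /a "i+=16, i-!s:~%i%,1!=32"
2>nul set /a "i+=8, i-!s:~%i%,1!=16"
2>nul set /a "i+=4, i-!s:~%i%,1!=8"
2>nul set /a "i+=2, i-!s:~%i%,1!=4"
2>nul set /a "i+=1, i-!s:~%i%,1!=2"
2>nul set /a "i-!s:~%i%,1!=1"

rem i should now hold the index of the last character; print one per line.
for /l %%a in (0, 1, %i%) do echo(!s:~%%~a,1!

endlocal
goto :eof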

penpen

Sponge Belly
Posts: 196
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: Split string to characters

#19 Post by Sponge Belly » 01 Jan 2018 04:43

Hello Penpen and Happy 2018! :)

Your first solution is very reductionist… I love it! 8)

My only criticism is that it prints 2 newlines for a single LF.

As for your second solution, I’m still trying to figure it out. :lol:

Thanks for your feedback.

- SB

BatchGuy
Posts: 3
Joined: 13 Nov 2019 08:35

Re: Split string to characters

#20 Post by BatchGuy » 13 Nov 2019 13:59

Aacini wrote:
28 Mar 2012 12:01
This is another, simpler method:

Code:

@echo off
set mystring=example
:loop
if defined mystring (
   echo(%mystring:~0,1%
   set "mystring=%mystring:~1%"
   goto loop
)

Hi,

I may be late for even the afterparty, but maybe it's worth reviving.
I tried to adapt Aacini's most elegant technique to browse through a text string character by character, as it seemed apt to remove "=" signs from a URL (text string) like this one:

https://www.some-site.com/index.php?p=a&select=59=1=&2=3&&=4?=55321=abcd

Substring replacement cannot be used for this removal, because the equals sign is also the separator between search and replacement text in that syntax.
(In the above URL I'll first replace the ampersands "&" with "¿", i.e. some character which I'll never run into in normal situations and which does not hinder my batch either, as it is accepted as ANSI. But this is not the issue here.)
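
To illustrate the limitation and one way around it, here is a minimal sketch that walks the string by index and copies every character except the equals sign. It assumes delayed expansion is acceptable and that the string contains no exclamation marks, which delayed expansion would swallow. The names url, out, and ch and the sample value are made up for the example.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion
rem Hypothetical sample value; any string without exclamation marks will do.
set "url=p=a&select=59=1"
set "out="

rem Read one character per index; leave the loop at the first empty slot.
for /l %%I in (0,1,8190) do (
    set "ch=!url:~%%I,1!"
    if not defined ch goto :done
    if not "!ch!" == "=" set "out=!out!!ch!"
)
:done
echo(!out!
endlocal
```

For the sample value this should print pa&select591. Because the characters are only ever handled through delayed expansion, the ampersand needs no substitute character in this variant.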

But, when next I try Aacini's approach like this

Code:

@echo off
setlocal EnableDelayedExpansion
set "name1=https://www.some-site.com/index.php?p=a¿select=59=1=¿2=3¿¿=4?=55321=abcd"

:loop
IF DEFINED name1 (
	IF "%name1:~0,1%" EQU "=" (
		set str=!str!¡
	) ELSE (
		set str=!str!%name1:~0,1%
	)
	set "name1=%name1:~1%"
	GOTO loop
)
echo Modified string: !str!
pause

the batch crashes after the last time the IF DEFINED part is executed, with the error message "( was not expected at this time" flashing by. (In fact it's waaaay too quick to read, but with a camera one can do miracles. :D )
This error message seems to hint that something goes wrong with the ending of the loop, and that it still tries to enter the code block after the string has been reduced to empty, instead of continuing with the "pause" command. But if that's the case, I can't explain it. :(

I'll add to this, that I'm not sure why this technique works with "mystring" (in the original code), i.e. without percent symbols around the variable, as I'd rather expect %mystring%.
Interestingly, the latter (i.e. %name1%) seems to work as well for my code, until it runs into another kind of problem (but no crash this time). The output string "str" is correct until it reaches the last "=" sign; after that, the remainder of the string is no longer appended to "str". As if something else happens because of the "=" sign. But I have no clue...
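
On the IF DEFINED question, a small sketch may help. IF DEFINED expects a variable name rather than a value, so the percent signs change what is being tested. The example assumes that no environment variable called example exists on the machine; mystring and its value are taken from Aacini's code above.

```batch
@echo off
setlocal
set "mystring=example"

rem IF DEFINED takes a variable name, not a value.
if defined mystring echo mystring is defined

rem With percent signs the value is substituted first, so this line asks
rem whether a variable called example exists.
if defined %mystring% (echo example is defined) else echo example is not defined
endlocal
```

Under that assumption the script reports that mystring is defined and that example is not.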

NB:
expected output (variable "str") would be:

https://www.some-site.com/index.php?p¡a¿select¡59¡1¡¿2¡3¿¿¡4?¡55321¡abcd

Note that this may not be displayed correctly in your output screen (depending on system settings), but the preview of this forum's interface indicates that here they're displayed correctly.

Any suggestions to what goes wrong in the above code, are welcome, of course.


BatchGuy
(Win7 Ultimate, x64)

aGerman
Expert
Posts: 3761
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Split string to characters

#21 Post by aGerman » 14 Nov 2019 11:33

I can't reproduce your problem.
Your script example outputs ...
Modified string: https://www.some-site.com/index.php?pía┐selectí59í1í┐2í3┐┐í4?í55321íabcd
... which is absolutely expected since neither ¿ nor ¡ belongs to ASCII. I saved the script in Windows-1252 (which is my default ANSI code page) but cmd interprets it in CP850 (which is my default OEM code page). As soon as I incorporate a ...
>nul chcp 1252
... at the beginning of the script I get ...
Modified string: https://www.some-site.com/index.php?p¡a¿select¡59¡1¡¿2¡3¿¿¡4?¡55321¡abcd
... which exactly matches your expectations.

If you have characters with a special meaning in your test string (like &) you have to use quotation marks around the assignments like that:

Code:

@echo off
setlocal EnableDelayedExpansion
>nul chcp 1252
set "name1=https://www.some-site.com/index.php?p=a&select=59=1=&2=3&&=4?=55321=abcd"

:loop
IF DEFINED name1 (
	IF "%name1:~0,1%" EQU "=" (
		set "str=!str!¡"
	) ELSE (
		set "str=!str!%name1:~0,1%"
	)
	set "name1=%name1:~1%"
	GOTO loop
)
echo Modified string: !str!
pause

Output:
Modified string: https://www.some-site.com/index.php?p¡a&select¡59¡1¡&2¡3&&¡4?¡55321¡abcd

Whether or not 1252 is the right code page for the chcp command in your case depends on the code page that you used to save the script.

Steffen

BatchGuy
Posts: 3
Joined: 13 Nov 2019 08:35

Re: Split string to characters

#22 Post by BatchGuy » 15 Nov 2019 11:18

Hi Steffen,

I'm still not sure if my issue is Code Page related.
Granted, my on-screen display is blurred by the garbled rendering of both "exotic" characters I use, but whatever I try, I cannot get this test string (URL) to show correctly on my screen.
My machine's CP is 850, but none of the regular CP values I tried (437, 850, 858, 1250, 1252) results in a correct on-screen display.

However, with all these CP values the string is redirected correctly to a txt file (verified by redirecting the last echo to a txt file), albeit never the complete string: the "¡abcd" bit is always missing! :(
Since the redirected values correctly render what the code actually processed, I assume that the issue is not about the code page.
Either way, on screen or redirected, I never get the full string.

Nevertheless, I may be wrong, as I cannot explain why you get the full string and I don't, since you say that my code is in fact correct and works fine on your side. Is it possible that this anomaly is caused by some difference in DOS versions?

BatchGuy

aGerman
Expert
Posts: 3761
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Split string to characters

#23 Post by aGerman » 15 Nov 2019 11:42

It might be helpful if you tell us about your Windows version, text editor, and encoding you used to save the code.
BatchGuy wrote: Is it possible that this anomaly is caused by some difference in DOS versions?
Not sure what version you're talking about, but I'm pretty sure it isn't DOS since DOS is an operating system. I admit that the name of this website is pretty misleading. Most of the time we are talking about Windows Batch scripting. Topics about DOS Batch scripting are very rare.

Steffen

BatchGuy
Posts: 3
Joined: 13 Nov 2019 08:35

Re: Split string to characters

#24 Post by BatchGuy » 15 Nov 2019 11:54

aGerman wrote:
15 Nov 2019 11:42
It might be helpful if you tell us about your Windows version, text editor, and encoding you used to save the code.

Windows version was mentioned already (Win7 Ultimate, x64).
Batch is written in Notepad, and is a proper ANSI txt file. (as required for batches, if I'm not mistaken)

BatchGuy


EDIT:
After reading the SO topic "How to use unicode characters in Windows command line?",
I once more tried CP 65001, toggled the fonts (changing away from Lucida Console and back), changed the CP back to 1252 again, and all of a sudden the on-screen display is correct. Beats me. :roll:
But the missing string bit remains missing...

aGerman
Expert
Posts: 3761
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Split string to characters

#25 Post by aGerman » 15 Nov 2019 13:01

Using CHCP 65001 in your code doesn't help unless your script is UTF-8-encoded (without a byte order mark).

Use a HEX editor and see if there is an invisible character somewhere before the missing substring.

Steffen
