split string into substrings based on delimiter

Re: split string into substrings based on delimiter

#31 Post by thefeduke » 23 Dec 2017 13:33

I have modified my last post that contained code with comments to correct the negative return code and to point out a restriction in using overlapping search strings.
John A.

#32 Post by Aacini » 19 Feb 2018 15:25

I developed a new method that uses this technique to split two variables in the same replacement line. It relies on a trick to avoid the REM command that is usually used for this purpose, because REM cannot work in this case. The trick can also be used to split more than two variables.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "str=1.0.2.25"
set "vars=Major Minor Revision Subrev"

set "p=%%"
set "v=%vars: =" & set "s=!str:*.=!" & call set "!v!=!p!str:.!s!=!p!" & set "str=!s!" & set "v=%" & set "!v!=!s!"

echo Major: %Major%, Minor: %Minor%, Revision: %Revision%, Subrev: %Subrev%
Antonio

#33 Post by jeb » 19 Feb 2018 16:33

Hi Aacini,

really nice :!: :D
It takes a minute to understand your code :idea:

Once upon a time, someone told me that it's nice to explain a bit more, and ever since I have tried hard to do so. :D

I suppose it would be helpful for some readers if you explained your idea.

jeb

#34 Post by IcarusLives » 25 Feb 2018 20:50

Aacini,

I'm really interested in how this works, but I'm struggling to understand it. Could you please explain this method in some detail?

#35 Post by Aacini » 26 Feb 2018 09:21

The purpose of the method is to split two strings into their parts. This can easily be done in two lines, with a third line that combines the parts:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "str=1.0.2.25"
set "vars=Major Minor Revision Subrev"

set "i=1" & set "s!i!=%str:.=" & set /A i+=1 & set "s!i!=%"
set "i=1" & set "v!i!=%vars: =" & set /A i+=1 & set "v!i!=%"
for /L %%i in (1,1,%i%) do set "!v%%i!=!s%%i!"

echo Major: %Major%, Minor: %Minor%, Revision: %Revision%, Subrev: %Subrev%
The goal is to complete the same split in just one line. To do that, in each division of the first string (vars) we must "split" the second string (str), that is, perform the equivalent process: take the value before the first dot, assign it to the variable, and ignore the rest of the values. A first attempt to ignore the rest of the values (after the first dot) is to insert a REM command in the usual way:

Code: Select all

set "p=%%"
set "v=%vars: =" & call set "!v!=!p!str:.=& rem !p!" & set "str=!str:*.=!" & set "v=%" & set "!v!=!s!"
                   ^^^^          \________________/
The purpose of the marked part is to insert a REM command in place of the dots in the value of the STR variable. However, this method has two problems. First, several special characters in this part must be escaped in order to execute the REM command correctly; otherwise the text is just assigned to the !V! variable. Second, even when the REM command is successfully executed, it causes all the commands after it to be ignored, including the commands that should split the remaining parts of the first string.
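For readers who have not seen it, the usual REM form mentioned above can be sketched on a single string as follows. This is only a minimal illustration, not part of the new method; the variable name first is an assumption chosen for this example:

```bat
@echo off
setlocal

set "str=1.0.2.25"

rem Each dot is replaced by a closing quote followed by an ampersand and REM
rem so only the assignment of the first part runs and the rest is ignored
set "first=%str:.=" & rem %"

echo First part: %first%
```

With the sample value this should display "First part: 1". It also shows why REM is a dead end in the one-line split: everything placed after the first REM on the line is never executed.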

A workaround that does the same process without a REM command is simple. First, take the rest of the values after the first dot: set "s=!str:*.=!". Then, remove that part from the variable, which leaves just the first part: call set "!v!=!p!str:.!s!=!p!". Finally, assign the rest of the values to the same variable, in preparation for the next part: set "str=!s!".

Code: Select all

set "v=%vars: =" & set "s=!str:*.=!" & call set "!v!=!p!str:.!s!=!p!" & set "str=!s!" & set "v=%" & set "!v!=!s!"
Note that the three previous steps are performed in each part of the split of the first variable, that is, they are placed between the only pair of percent signs in the line. At the end, the last values remain in their respective variables but have not been processed, so an additional set "!v!=!s!" is appended to the line.
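For comparison, the same three steps can be written as an explicit loop. This is only a sketch to make the order of operations visible, not the one-line method itself: it assumes a plain FOR instead of the percent-expansion split of vars, and it uses a FOR /F parameter (%%s, a name chosen for this example) instead of call set to take the first part:

```bat
@echo off
setlocal EnableDelayedExpansion

set "str=1.0.2.25"
set "vars=Major Minor Revision Subrev"

for %%v in (%vars%) do (
    rem Step 1 - take the rest of the values after the first dot
    set "s=!str:*.=!"
    if "!s!"=="!str!" (
        rem No dot is left so the remaining value is the last part
        set "%%v=!str!"
    ) else (
        rem Step 2 - remove the dot and the rest to get the first part
        for /F "delims=" %%s in ("!s!") do set "%%v=!str:.%%s=!"
        rem Step 3 - keep the rest for the next variable name
        set "str=!s!"
    )
)

echo Major: %Major%, Minor: %Minor%, Revision: %Revision%, Subrev: %Subrev%
```

With the sample data this should display Major: 1, Minor: 0, Revision: 2, Subrev: 25. The branch for the last value plays the role of the additional set "!v!=!s!" at the end of the one-line version.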

As jeb would say: it's obvious! 8)

Antonio

#36 Post by IcarusLives » 28 Feb 2018 11:49

OOOOHHH Okay now I can see it. Thank you so much for the great explanation!

#37 Post by Aacini » 19 Aug 2018 10:25

I have extended the last method I posted here in order to easily split a string into several parts, each specified by its length:

Code: Select all

@echo off
setlocal EnableDelayedExpansion


call :Split "10225" "Major:1 Minor:1 Revision:1 Subrev:2"
rem The result should be Major:1, Minor:0, Revision:2, Subrev:25
echo Major: %Major%, Minor: %Minor%, Revision: %Revision%, Subrev: %Subrev%


for /F "tokens=2 delims==" %%t in ('wmic os get localdatetime /value') do set "dateTime=%%t"
echo/
echo DateTime: %dateTime%
call :Split "%dateTime%" "Year:4 Month:2 Day:2 Hour:2 Minute:2 Second:2 _:7 Offset:4"
echo Year:%Year%  Month:%Month%  Day:%Day%  Hour:%Hour%  Minute:%Minute%  Second:%Second%  Offset:%Offset%

goto :EOF



:Split string "var1:len1 var2:len2 ..."
set "str=%~1" & set "vars=%~2 " & set "p=%%"
set "v=%vars: =" & set "len=!v:*:=!" & call set "name=!p!v::!len!=!p!" & call set "!name!=!p!str:~0,!len!!p!" & call set "str=!p!str:~!len!!p!" & set "v=%"
exit /B
Output:

Code: Select all

Major: 1, Minor: 0, Revision: 2, Subrev: 25

DateTime: 20180819111956.885000-300
Year:2018  Month:08  Day:19  Hour:11  Minute:19  Second:56  Offset:-300
The method is exactly the same as the one explained last, so no further explanations here... :roll:
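For readers who prefer to see the individual steps, the fixed-length split can also be sketched with explicit loops. This is an assumed equivalent written for illustration, not the :Split subroutine itself; the FOR parameters %%v, %%n and %%o are names chosen for this example:

```bat
@echo off
setlocal EnableDelayedExpansion

set "str=10225"

rem Each item is name:length as in the second argument of the subroutine
for %%v in (Major:1 Minor:1 Revision:1 Subrev:2) do (
    for /F "tokens=1,2 delims=:" %%n in ("%%v") do (
        rem Take the first characters, then drop them from the string
        set "%%n=!str:~0,%%o!"
        set "str=!str:~%%o!"
    )
)

echo Major: %Major%, Minor: %Minor%, Revision: %Revision%, Subrev: %Subrev%
```

With the sample string this should display Major: 1, Minor: 0, Revision: 2, Subrev: 25. The one-line version performs the same two assignments per element, but obtains the name and the length through substring replacements on !v!.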

Antonio

#38 Post by aGerman » 19 Aug 2018 13:07

That's neat. Thanks for sharing, Antonio!

Steffen

#39 Post by Aacini » 27 Sep 2021 21:47

I developed a couple of new methods to simultaneously process two or more variables in a more efficient way. Let's start with this example:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem Standard method to assign each value to each variable name
set "p=%%"
set "x=%names: =" & set "y=!values:* =!" & call set "var!x!=!p!values: !y!=!p!" & set "values=!y!" & set "x=%"

set var
The problem here is that the values variable is very long, and the frequent reassignment of its parts is not efficient. I wondered whether I could separate the values variable into parts without using the slow !values:* =! and call !p!values: !y!=!p! delayed-expansion substring operations. The only method I could think of is a FOR /F command nested inside each expansion of the other variable. This is the first attempt:

Code: Select all

set "x=%names: =" & for /F "tokens=1*" %%a in ("!values!") do (set "var!x!=%%a" & set "values=%%b") & set "x=%"
However, this method fails because the percent character in the %%a FOR parameter breaks the original expansion of the x=%names variable.

I thought of replacing the %% part with !p!, that is:

Code: Select all

set "p=%%"
set "x=%names: =" & for /F "tokens=1*" !p!a in ("!values!") do (set "var!x!=!p!a" & set "values=!p!b") & set "x=%"
But the parsing of the FOR command happens before !delayed! expansion, so the !p! part is not valid syntax in a FOR command.

The problem is this: although a combination of !delayed! expansion and other tricks could successfully generate a series of valid commands, such commands are not complete until !delayed! expansion is executed, and the parsing of the FOR command happens before that...

If we carefully read the previous paragraph, we find a simple and obvious solution: first use !delayed! expansion to generate the series of commands, but do not execute them! Just store them in a variable. Then, in the next line, execute the contents of that variable.

Of course, because in this case the commands are not executed but stored, it is necessary to caret-escape certain special characters:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem New method replicating FOR /F commands

rem Assemble the command line by including FOR /F commands
set "p=%%"
SET LINE=set "x=%names: =" ^& for /F "tokens=1*" !p!a in ("^!values^!") do (set "var^!x^!=!p!a" ^& set "values=!p!b") ^& set "x=%"

rem ... and run it
%LINE%

set var
A simple repeated timing test shows that test1.bat takes 28 seconds, but test2.bat takes just 12 seconds. Less than half! :)
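The store-then-execute idea can also be tried on its own with a much smaller command. The following is a minimal sketch under assumed sample data (the string "one two three" is made up for this example); it stores a single FOR /F command in LINE, displays the stored text, and then runs it:

```bat
@echo off
setlocal EnableDelayedExpansion

set "p=%%"

rem Store a FOR /F command in a variable instead of executing it
set LINE=for /F "tokens=1*" !p!a in ("one two three") do echo first=!p!a ^& echo rest=!p!b

rem Display the stored text, then execute it
echo !LINE!
%LINE%
```

Displaying !LINE! before running it is a convenient way to check that the escaped characters ended up in the stored command as intended. When %LINE% is expanded, the FOR parameters and the ampersand are already in place, so the FOR command is parsed normally.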


This second method led me to a new idea: if we can use a FOR /F command, then we could use it to access several elements at once (perhaps all of them), instead of just the "first" and the "rest" in each "iteration". If we can access all the elements at once, we could assemble a long line of assignments that would be processed in a single long SET /A command!

Although this method is more complicated than the former ones, the number of executed commands is much smaller, so the execution should be faster, especially if large amounts of data are processed. This is the implementation of the last method:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem New method using just a couple FOR /F commands and a long SET /A command
rem Antonio Perez Ayala

rem Initialization part: can be performed just once
set  "tokensNames=? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] "
set "tokensValues=_ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } "

rem Use the previous new method to process tokensNames and tokensValues string pairs
rem to create a LINE variable that will generate assignment pairs
set "p=%%"
SET LINE=set "SETS=" ^& set "x=%tokensNames: =" ^& for /F "tokens=1*" !p!a in ("^!TokensValues^!") do (set "SETS=^!SETS^!var!p!^!x^!=!p!!p!a," ^& set "TokensValues=!p!b") ^& set "x=%"

rem Run created LINE variable to generate SETS variable containing %%A=%%a,%%B=%%b,... FOR /F tokens assignments pairs
%LINE%

rem You can use SETS variable from now on

rem =====================================

rem Process data part

rem "names" variable must have an additional "| " element at end
set "names=A B C D E F G H I J K L M N O P Q R S T U V W X Y Z | "
set "values=1851174 310228712 23097674 165929174 945915997 23855547 206119680 110018783 280218208 23864878 963715423 33967149 271921049 72951533 171526681 188716352 893510509 189019180 26048663 214410110 233512550 909623440 83224965 145018318 24411586 214715718 "

rem Match names vs. values and assemble these assignments pairs: varA=1851174,varB=310228712,...,varZ=214715718,var|=,
for /F "tokens=1-31" %%? in ("%names%") do for /F "tokens=1-31" %%_ in ("%values%") do set "VALS=%SETS%"

rem Execute the assignments
set /A "%VALS:,var|=" & REM "%"

set var
Note that this method works on a maximum of 30 elements (FOR /F tokens) only. The previous methods work on an unlimited number of elements, as long as the resulting command line is no longer than 8191 characters. If necessary, this method could be extended to an unlimited number of tokens.

When the whole test3.bat was repeated, the timing test took 18 seconds. However, when the initialization part was performed just once and only the process data part was repeated, the timing test took just 5.35 seconds! :shock: Less than 1/5 of the original time! :D
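The final SET /A step of this method can be tried in isolation. The following is a minimal sketch that assumes a short, hand-written VALS list (the values 10, 20 and 30 are made up for this example) instead of the one generated by the FOR /F commands:

```bat
@echo off
setlocal

rem A short hand-written list in the same format as the VALS variable
set "VALS=varA=10,varB=20,varC=30,var|=,"

rem The replacement cuts the list at the last element and turns the tail into a REM
set /A "%VALS:,var|=" & REM "%"

set var
```

After the percent expansion, the line that runs is a single SET /A with three comma-separated assignments, followed by a REM that discards the leftover text. This is why one command is enough to assign all the values.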

Antonio
