While working on a "universal" %DATE% parser it became necessary to sort tokens within a string. A few techniques were briefly bandied about that all relied on a fixed small number of tokens within the string. I thought it might be useful to have a generic function that can efficiently handle any number of tokens.
The first function I developed relies on a pipe to the SORT command, and is therefore case insensitive. I'm very happy that it does not require any explicit temporary files. (The SORT command can internally create a temporary file, but I doubt tokens within a single string could ever cause that to happen.) The performance is good, and it is virtually unaffected by the length of the string or the number of tokens.
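The core of the pipe technique can be tried on its own before reading the full function. The following is only a minimal sketch with a hard-coded, illustrative token list: it writes each token on its own line and pipes those lines through SORT.

```batch
@echo off
:: Emit each token on its own line, then let SORT order the lines.
:: The token list is an illustrative literal, not taken from the function.
(for %%t in (red blue yellow) do @echo %%t) | sort
```

This should list the tokens one per line in the order blue, red, yellow. Appending /r to the sort command reverses the order.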
Code:
@echo off
setlocal
set "str=red blue yellow white black grey green purple orange"
echo: unsorted str = %str%
call :sortStrTokensI str
echo: ascending str = %str%
call :sortStrTokensI str /r
echo:descending str = %str%
echo:
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
echo: unsorted = %str%
call :sortStrTokensI str
echo: native sort = %str%
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
call :sortStrTokensI str "/l C"
echo:^>=128 binary sort = %str%
exit /b
:sortStrTokensI StrVar ["sort options"]
::
:: Perform a case insensitive sort of tokens within the string contained
:: by variable StrVar.
::
:: By default the tokens are sorted using the local collating sequence
:: in ascending order. All sorts are case insensitive.
::
:: The following sort options can over-ride default behaviour
::
:: /R Specifies a descending sort.
::
:: "/L C" Characters greater than ASCII 127 are sorted according to
:: their binary encoding.
::
:: Multiple options should be enclosed by a single pair of quotes
::
:: This function does not properly handle tokens containing * or ?
::
  setlocal enableDelayedExpansion
  set "str=!%~1!"
  set "sorted="
  for /f %%a in ('^(for %%t in ^(!str!^) do @echo %%t^)^|sort %~2') do set "sorted=!sorted! %%a"
  (endlocal
    set "%~1=%sorted:~1%"
  )
exit /b
Here are the sortStrTokensI test results:
Code:
unsorted str = red blue yellow white black grey green purple orange
ascending str = black blue green grey orange purple red white yellow
descending str = yellow white red purple orange grey green blue black
unsorted = I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
native sort = ï Ä Í É À È Ã Æ Ì Â Ë Á Ï Ê Å Î æ ì a A E e I i á à â ë î é ã ä å è í ê
>=128 binary sort = A a E e i I À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
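The limitation noted in the function comments for * and ? arises because a plain FOR treats any token containing those characters as a file mask rather than as text. A minimal sketch of the effect, using an illustrative token list:

```batch
@echo off
:: b* is treated as a file mask, not as a literal token.
:: It is replaced by the names of matching files in the current
:: directory, or silently dropped if nothing matches.
for %%t in (red b* green) do echo %%t
```

Either way, the literal text b* never reaches the loop body, so such a token cannot survive the sort.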
My next attempt is a case sensitive version that does not rely on SORT or pipes. It is roughly three times faster for a string with only 3 tokens, but performance degrades sharply as the number of tokens grows, because each new token is placed by rescanning and rebuilding the list sorted so far. The function could be extended to support a case insensitive option, but I prefer the performance profile of the first function.
Code:
@echo off
setlocal
set "str=red blue yellow white black grey green purple orange"
echo: unsorted str = %str%
call :sortStrTokens str
echo: ascending str = %str%
call :sortStrTokens str /r
echo:descending str = %str%
echo:
set str=I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
echo: unsorted str = %str%
call :sortStrTokens str
echo: ascending str = %str%
call :sortStrTokens str /r
echo:descending str = %str%
exit /b
:sortStrTokens StrVar [/R]
::
:: Perform a case sensitive sort of tokens within the string contained
:: by variable StrVar.
::
:: By default the tokens are sorted using the local collating sequence
:: in ascending order.
::
:: The case insensitive /R option specifies a descending sort
::
:: This function does not properly handle tokens containing * or ?
::
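  setlocal enableDelayedExpansion
  set "str=!%~1!"
  set "sorted="
  rem LEQ yields an ascending sort, GEQ a descending sort
  if /i "%~2"=="/R" (set comp=geq) else set comp=leq
  rem Insertion sort: place each token into the list sorted so far
  for %%t in (!str!) do (
    if not defined sorted (set "sorted=%%t") else (
      set "sorted2="
      set placed=
      for %%a in (!sorted!) do (
        rem Insert the new token ahead of the first token it does not follow
        if not defined placed if %%t %comp% %%a (
          set "sorted2=!sorted2! %%t"
          set placed=true
        )
        set "sorted2=!sorted2! %%a"
      )
      rem No insertion point found, so the new token goes last
      if not defined placed set "sorted2=!sorted2! %%t"
      set "sorted=!sorted2:~1!"
    )
  )
  (endlocal
    set "%~1=%sorted%"
  )
exit /b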
Here are the sortStrTokens test results:
Code:
unsorted str = red blue yellow white black grey green purple orange
ascending str = black blue green grey orange purple red white yellow
descending str = yellow white red purple orange grey green blue black
unsorted str = I i e E A a À Á Â Ã Ä Å Æ È É Ê Ë Ì Í Î Ï à á â ã ä å æ è é ê ë ì í î ï
ascending str = ï Ä Í É À È Ã Æ Ì Â Ë Á Ï Ê Å Î æ ì a A e E i I á à â ë î é ã å ä í è ê
descending str = ê è í ä å ã é î ë â à á I i E e A a ì æ Î Å Ê Ï Á Ë Â Ì Æ Ã È À É Í Ä ï
Here is a comparison of the performance profiles of the two functions:
Code:
Token   Seconds to Perform 100 Iterations
Count   sortStrTokensI   sortStrTokens
-----   --------------   -------------
    3        2.7              0.8
   10        2.9              1.4
   20        2.7              2.4
   30        2.8              4.1
   50        2.9              9.6
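The table does not say how the timings were taken. One simple way to get comparable numbers, shown here only as a sketch, is to print %TIME% before and after 100 calls. The :work routine is a hypothetical stand-in so the script runs on its own; swap in the function being measured.

```batch
@echo off
setlocal
set "str=red blue yellow"
echo start: %time%
for /l %%n in (1 1 100) do call :work str
echo  stop: %time%
exit /b

:work StrVar
:: Hypothetical stand-in. Replace this body with the function under test.
exit /b
```

The elapsed time is the difference between the two printed values. The format of %TIME% depends on regional settings, so this sketch leaves the subtraction to the reader.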
As always, I'm interested in hearing about any problems, optimizations, or entirely new solutions.
Dave Benham