Length of a string - my first contribution

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Jan Antonin
Posts: 9
Joined: 20 May 2018 13:53

Length of a string - my first contribution

#1 Post by Jan Antonin » 20 May 2018 16:10

Hi guys!
First, I would like to say a big thank you! Thank you dbenham, jeb, Aacini, aGerman and others.
Although I don't use your amazing results at work or for making any profit, I really appreciate them. It is not only about your results; it is about your attitude.
It may sound a little theatrical: thank you for existing! :-) Our world is full of idiots we have to deal with, and it is much easier to live my life knowing that somewhere on the planet somebody thinks very much the way I do.

Many years ago I was writing my first cmd script and needed to measure the length of a string. Because this forum didn't exist at that time, I had to write my own routine.
At the same time I was trying to figure out what the big boys at Microsoft meant when they talked about "Unicode". I was working with Excel back then.

My whole idea was this:

@set "myVar=qwertyuiop"
@cmd /U /S /D /c set myVar|find /V /N ""
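To make the mechanism concrete without a Windows box, here is a small Python simulation (not the batch method itself). It assumes that `cmd /U` writes the output of `set` to the pipe as UTF-16 LE and that find.exe ends a line at every NUL byte, as penpen explains further down; the CR/LF at the end of the real output is ignored. The helper name `utf16_segments` is made up for this sketch.

```python
# Simulation only: models the UTF-16 LE bytes that `cmd /U ... set myVar`
# is assumed to send into the pipe, split the way find.exe is assumed to
# split them (a NUL byte ends a line). Trailing CR/LF is not modelled.

def utf16_segments(text):
    """Split the UTF-16 LE encoding of `text` at every NUL byte."""
    raw = text.encode("utf-16-le")
    parts = raw.split(b"\x00")
    # The final character's NUL high byte leaves one empty piece at the end.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


line = "myVar=qwertyuiop"  # what `set myVar` prints, minus CR/LF
segments = utf16_segments(line)
for number, seg in enumerate(segments, start=1):
    print(f"[{number}]{seg.decode('latin-1')}")  # [1]m ... [16]p

# Pure ASCII gives exactly one segment per character, so the value length
# is the segment count minus the length of "myVar=".
print(len(segments) - len("myVar="))  # 10
```

Each ASCII character occupies the bytes "xx 00", so every character ends up on its own numbered line.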

I am sure it is not necessary to explain the details on this forum.
Another obvious option is the /C switch instead of /N.
(By the way, my biggest problem back then was accepting that I had to use an empty string with the /V switch. I had no idea that this kind of crippled thinking could exist in the IT business at all. By pure logic it is obvious to me that the empty string is a substring of every string, but apparently not to MS. My script without /V worked well: as I made the search string shorter and shorter, the number of found lines went up and up, and at the very last step (the empty string) the script stopped working completely, with no results at all. I had no idea why; it was very frustrating.)

@cmd /U /S /D /c set myVar|find /V /C ""
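The empty-string behaviour from the parenthetical above can be modelled in a few lines of Python. This is a hypothetical model of what was observed, not find.exe's actual implementation: an empty search string is treated as matching no line, so `find ""` returns nothing while `find /V ""` returns every line, and /C simply counts those lines. The function `find_matches` is invented for illustration.

```python
# Hypothetical model of the observed find.exe behaviour (not its real code).

def find_matches(lines, needle, invert=False):
    """Return the lines find would print; invert=True models /V."""
    def matches(line):
        # Assumed quirk: an empty needle never matches, even though
        # `"" in line` is True in Python (the empty string is a substring
        # of every string).
        return needle != "" and needle in line
    return [ln for ln in lines if matches(ln) != invert]


lines = ["m", "y", "V", "a", "r", "="]           # segments of "myVar="
print("" in "abc")                                # True: plain logic
print(len(find_matches(lines, "")))               # 0: find ""
print(len(find_matches(lines, "", invert=True)))  # 6: find /V /C ""
```

Because the inverted empty search selects every line, `find /V /C ""` ends up counting all lines of the piped output.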

I am almost sure that this method is totally "toxic-character-proof". Judging by some scripts I have seen on this board, I don't dare state my usual humble estimate of my own certainty, I mean 100% :-)

My second contribution, in the next few days, will be a small one: how to avoid disaster when a toxic character appears in the path of your data and scripts, and how to call them safely with the CMD command. Yes, I was so naive that I created my own naming system for backups and incremental backups of all my files, and in this system the most frequent character is the ^ caret :-). Lucky me. I simply read the MS documentation, which said this was allowed and caused no problems, so I did it. I was young, naive and happy.

My biggest contribution (not from a pragmatic point of view, of course), which I am really proud of, will be an ultimate method of obtaining all command-line arguments of a script from inside the script itself. No compromises, no temp file. And the arguments will be stored in a normal named variable like %myVar%, not %1.
I don't care about speed (well, I do, but there is no fundamental difference in speed: the upper bound on running time is polynomial in the length of the input :-) )
I took this goal as my personal challenge. I have to admit I almost gave up, but finally it is done.
I would have had absolutely no chance of creating such a script without the knowledge from this forum, especially from jeb, dbenham and Aacini.

To be honest, I find it hard to believe that nobody has written such a script before. I have the same feeling about the idea of this post, measuring the length of a given string. But I searched the Web really hard, without success.
If anybody thinks this post belongs somewhere else, I will be pleased if it is moved there.

Jan Antonin

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Length of a string - my first contribution

#2 Post by penpen » 21 May 2018 18:54

Jan Antonin wrote (20 May 2018 16:10):
My whole idea was this:

@set "myVar=qwertyuiop"
@cmd /U /S /D /c set myVar|find /V /N ""

(...)

I am almost sure that this method is totally "toxic-character-proof".
Hello Jan Antonin and welcome to this forum.

This approach is not bad; I used (and still use) a similar script for a long time, and it is indeed bulletproof against the usual toxic characters ("()!&|...").
But sadly you can't use it in every situation: it fails completely for most possible characters.

The command shell internally encodes characters as UCS-2 (very similar to the default Unicode encoding UTF-16 LE; the difference doesn't matter here).
The Unicode characters U+0000 to U+00FF are encoded (in hex) as "00 00" to "FF 00".
Your approach splits every 2-byte character into two 1-byte characters.
The application "find.exe" interprets a NUL byte ("00") as a string terminator (a convention in use since ANSI C was developed, and still in use today), and therefore treats it as a line terminator.

So your approach only works for sequences of characters that each contain a "00" byte (and these NUL bytes must be at the same position in every character to be counted).
But unfortunately UCS-2 also supports characters that contain no NUL byte at all, for example '∑' (U+2211, bytes "11 22", shown in the console as ◄") and '∫' (U+222B, bytes "2B 22", shown as +"):


D:\>@set "myVar=abc∑∫xyz"

D:\>@cmd /U /S /D /c set myVar|find /V /N ""
[1]m
[2]y
[3]V
[4]a
[5]r
[6]=
[7]a
[8]b
[9]c
[10]◄"+"x
[11]y
[12]z
[13]
[14]
[15]
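penpen's listing can be reproduced with a short Python simulation, under the same assumption that find.exe splits the UTF-16 LE stream at NUL bytes. The segments are printed with `repr` because Python's cp437 codec maps 0x11 to a control character rather than the ◄ glyph the console shows; the CR/LF handling behind lines [13] to [15] is not modelled. The helper `nul_segments` is hypothetical.

```python
# Simulation of the split shown above; not find.exe itself.

def nul_segments(text):
    """Split the UTF-16 LE encoding of `text` at NUL bytes (no CR/LF)."""
    parts = text.encode("utf-16-le").split(b"\x00")
    if parts and parts[-1] == b"":
        parts.pop()  # empty piece left by the last character's 00 byte
    return parts


value = "abc\u2211\u222bxyz"  # abc∑∫xyz
segs = nul_segments("myVar=" + value)
for n, seg in enumerate(segs, start=1):
    print(f"[{n}]{seg!r}")  # [10] is b'\x11"+"x': ∑, ∫ and x fused

# The naive count undercounts: "11 22 2B 22 78 00" becomes one line.
print(len(segs) - len("myVar="), "counted vs.", len(value), "actual")
# 6 counted vs. 8 actual
```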
penpen

sst
Posts: 93
Joined: 12 Apr 2018 23:45

Re: Length of a string - my first contribution

#3 Post by sst » 21 May 2018 21:30

Besides the problems arising from assumptions about Unicode encoding, which penpen has already addressed, I can think of some other problems that prevent this method from being used as a general-purpose string length routine.
This method will fail if:
1. A line feed character is present in the string.
2. There is another environment variable whose name begins with the target variable's name, e.g. myVar and myVar1.
Of course, that can be resolved by using echo together with delayed expansion, which I think you were trying to avoid by using set in the first place.
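sst's second point follows from the documented behaviour of `set` with an argument: it displays every variable whose name starts with that text. A minimal Python sketch of that prefix listing (case-insensitive, since cmd variable names are; the sort only makes the output deterministic) shows why myVar1 would be counted too. `set_listing` and the sample environment are hypothetical.

```python
# Sketch of `set <prefix>` listing every variable whose name starts with
# <prefix>; matching is case-insensitive as in cmd. Not cmd's actual code.

def set_listing(env, prefix):
    items = sorted(env.items(), key=lambda kv: kv[0].lower())
    return [f"{name}={value}" for name, value in items
            if name.lower().startswith(prefix.lower())]


env = {"myVar": "qwertyuiop", "myVar1": "x", "PATH": r"C:\Windows"}
print(set_listing(env, "myVar"))   # ['myVar=qwertyuiop', 'myVar1=x']
print(set_listing(env, "MYVAR1"))  # ['myVar1=x']
```

Piping the first listing into a character count would add the characters of `myVar1=x` to the result for myVar.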

Another problem is that one has to know the length of the environment variable's name in order to obtain the length of the string it contains. So it cannot be used as is for arbitrary variable names; extra steps are required.
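The name-length issue is simple arithmetic: the counted segments cover the name, the `=` and the value. A Python sketch, assuming pure ASCII and ignoring the trailing CR/LF lines of the real output; `segment_count` and `value_length` are hypothetical helpers.

```python
# Assumes pure ASCII and that find.exe splits the UTF-16 LE output at NUL
# bytes; trailing CR/LF lines of the real output are ignored here.

def segment_count(name, value):
    """NUL-separated pieces in the UTF-16 LE encoding of name=value."""
    pieces = f"{name}={value}".encode("utf-16-le").split(b"\x00")
    if pieces and pieces[-1] == b"":
        pieces.pop()
    return len(pieces)


def value_length(name, count):
    # The count includes the name and "=", so the name length must be known.
    return count - len(name) - 1


for name in ("myVar", "x", "someLongerName"):
    count = segment_count(name, "qwertyuiop")
    print(name, count, value_length(name, count))
# myVar 16 10
# x 12 10
# someLongerName 25 10
```

The raw count changes with the variable name while the value stays the same, which is why the name length has to be subtracted separately.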

But I liked the idea, even if it cannot be used for this particular purpose.

About obtaining all command-line arguments, you said:
Jan Antonin wrote (20 May 2018 16:10):
My biggest contribution (not from a pragmatic point of view, of course), which I am really proud of, will be an ultimate method of obtaining all command-line arguments of a script from inside the script itself. No compromises, no temp file. And the arguments will be stored in a normal named variable like %myVar%, not %1.

Is this something different from what %* offers? Maybe it's too soon and I have to wait to see your method.

Anyway, welcome to DosTips, Jan.

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Length of a string - my first contribution

#4 Post by jeb » 22 May 2018 03:16

Hi Jan,

like sst, I'm interested in this part.
Jan Antonin wrote (20 May 2018 16:10):
My biggest contribution (not from a pragmatic point of view, of course), which I am really proud of, will be an ultimate method of obtaining all command-line arguments of a script from inside the script itself. No compromises, no temp file. And the arguments will be stored in a normal named variable like %myVar%, not %1.
I have thought about this very often and made many attempts, but I have not been able to find a solution that avoids the temporary file, nor one that fetches arguments with embedded line feeds.
And I'm still sure that it's impossible to fetch carriage returns at all, as they are removed directly after the expansion of %1.

I'm waiting impatiently for your solution :D

jeb

Jan Antonin
Posts: 9
Joined: 20 May 2018 13:53

Re: Length of a string - my first contribution

#5 Post by Jan Antonin » 22 May 2018 03:42

Yeah, sure, I have only ever used it for pure ASCII, meaning
00100000 to 01111110 (BTW, I always wonder what the hell the purpose of 01111111 was ...)
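The range given here is 0x20 to 0x7E in hex, i.e. the printable ASCII characters; 0x7F, the code just above it, is the ASCII DEL control character. A small Python check (the function name is illustrative) tells whether a string stays inside that range, which is the situation in which the method has been used:

```python
# Printable ASCII range as given in the post, written in binary.
PRINTABLE_MIN, PRINTABLE_MAX = 0b00100000, 0b01111110  # 0x20 .. 0x7E


def is_pure_printable_ascii(s):
    """True if every character lies in 0x20..0x7E."""
    return all(PRINTABLE_MIN <= ord(ch) <= PRINTABLE_MAX for ch in s)


print(is_pure_printable_ascii("qwertyuiop"))          # True
print(is_pure_printable_ascii("abc\u2211\u222bxyz"))  # False
print(hex(0b01111111))                                # 0x7f (DEL)
```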

I am definitely not a developer; I am just interested.

About arguments:
Maybe I am totally dumb (it has happened to me before that I couldn't see something obvious and spent a huge amount of time building a very complex and totally useless solution; it happens sometimes).

But how can I obtain arguments in case they contain toxic-characters?

I remember reading a post by jeb somewhere, something like "advanced REM technique".
If I remember correctly, jeb used a temp file.
I didn't like using a temp file then, and I still don't.
Not because of performance but because of beauty :-)

And I don't like, and don't want to know, "the iron" underneath.
For example, the trick of defining LF is something I can do inside the script, but not in its output or input.
[ I cannot ... I am not interested :-) ]

That's why I love the thread here named "parsing rules" so much.
It is totally OK to build a theory in which objects like <LF> or <CR> exist. But the theory can only be tested by experiments with ASCII input and output.
In connection with this attitude, I wonder how difficult it would be to take CMD.exe, reverse-engineer it and extract THE THEORY directly from the material.
Or to find out who the author was, kidnap him, waterboard him and finally post the result here :-)

I have formulated a theoretical problem for myself:
Let's assume that the concatenation of all arguments (= %*) may contain any characters of the ASCII table in the range 00100000 to 01111110 (I am using this form only for brevity; I could simply list all these printable characters).

The thing is that some of them act as metacharacters in CMD syntax.

The goal is to define a variable myVar such that the screen output of the following two commands is identical:
REM %*
REM %myVar%

without using a temp file.
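For experimenting with the problem above it can help to generate the full test alphabet and separate out the characters that are special to cmd.exe. The set of special characters below is an assumption (the commonly cited metacharacters, with `!` only mattering under delayed expansion); delimiters such as space, `;`, `,` and `=` are deliberately left out. This is a Python helper for preparing test strings, not a solution to the problem.

```python
# Every printable ASCII character, 0x20..0x7E, as in the problem statement.
ALPHABET = "".join(chr(c) for c in range(0x20, 0x7F))

# Assumption: commonly cited cmd.exe metacharacters (not an official or
# complete list); '!' only matters when delayed expansion is enabled.
CMD_SPECIAL = set('&|<>^()"%!')

special = [ch for ch in ALPHABET if ch in CMD_SPECIAL]
plain = [ch for ch in ALPHABET if ch not in CMD_SPECIAL]

print(len(ALPHABET), len(special), len(plain))  # 95 10 85
print("".join(special))                         # !"%&()<>^|
```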

It is almost done, only some last details remain ... :-)
