
Re: Closing parenthesis prevents escape of subsequent special character operator

Posted: 13 Sep 2019 18:24
by sst
penpen wrote:
09 Sep 2019 04:10
I mean... what is unclear to me is, that i don't get why MS seems to have built a parser, that behaves differently depending on the command token, when splitting the command and argument string:
If the expected output of "REM One^\r\n Two^ Three^\r\n This is the comment\r\n" is the (command, argument) pair ("REM", " This is the comment\r\n"),
then i would expect any other command to behave the same, but "for One^\r\n Two^ Three^\r\n %%a in (1 2 3) do echo(%%~a" seems to work differently.
The parser only cares about the arguments of IF, FOR and REM. To the parser they are not commands but keywords. Each of them has its own parser: ParseIf, ParseFor and ParseRem. (It only recognizes "REM", "REM/?", "FOR", "FOR/?", "IF", "IF/?" as keywords, so something like "REM/ comment" is an ordinary command.)

On the other hand, any other internal or external command is parsed by a single function, ParseCmd, which doesn't care about the contents or format of the arguments or of the command itself. The arguments are processed at execution phase by the corresponding internal command's function or by the external process. ParseCmd doesn't know or care whether the command is internal or external; the type of command is determined at execution phase. (Unlike FOR and IF, REM is also recognized at execution phase as an internal NOP command, so something like "REM/ comment" works, but it is slower than "REM comment".)
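To make the keyword rule concrete, here is a minimal sketch of how the first token could select a parse branch. This is not the actual cmd.exe code: the enum, the function selectBranch and the helper upper are hypothetical names, and the case-insensitive comparison is an assumption. Only the six keyword spellings and the four function names come from the description above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical labels for the four parse functions named in the post.
enum class Branch { ParseIf, ParseFor, ParseRem, ParseCmd };

// Upper-case copy of a token (assumption: keywords are matched
// without regard to case).
static std::string upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Select the parse branch from the first token alone.
// Only the six exact spellings count as keywords; everything else,
// including "REM/", falls through to the generic ParseCmd branch.
Branch selectBranch(const std::string& firstToken) {
    const std::string t = upper(firstToken);
    if (t == "IF" || t == "IF/?") return Branch::ParseIf;
    if (t == "FOR" || t == "FOR/?") return Branch::ParseFor;
    if (t == "REM" || t == "REM/?") return Branch::ParseRem;
    return Branch::ParseCmd;
}
```

With this sketch, selectBranch("REM") picks the REM branch, while selectBranch("REM/") is treated like any other command name.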

The ParseFor function checks the first token to see if it contains "/?". If so, it sets the help flag. Otherwise it checks that same token to see if it is one of the /L, /D, /F or /R switches. Otherwise it checks whether the token begins with %, and if not, it calls PSError.
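The order of those checks can be sketched as follows. Again this is only an illustration: the enum ForFirstToken and the function classifyForToken are hypothetical names, and accepting lower-case switch letters is an assumption. The sequence of tests is the one described above.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type for the first-token check in ParseFor.
enum class ForFirstToken { Help, Switch, Variable, SyntaxError };

ForFirstToken classifyForToken(const std::string& token) {
    // 1. Does the token contain "/?" anywhere? Then set the help flag.
    if (token.find("/?") != std::string::npos) return ForFirstToken::Help;

    // 2. Is it one of the /L, /D, /F or /R switches?
    //    (Assumption: the letter is compared case-insensitively.)
    if (token.size() == 2 && token[0] == '/') {
        switch (std::toupper(static_cast<unsigned char>(token[1]))) {
            case 'L': case 'D': case 'F': case 'R':
                return ForFirstToken::Switch;
        }
    }

    // 3. Does it begin with '%' (the loop variable)?
    if (!token.empty() && token[0] == '%') return ForFirstToken::Variable;

    // 4. Anything else: this is where the post says PSError is called.
    return ForFirstToken::SyntaxError;
}
```

A token such as "One" from the FOR example earlier in the thread would end up in the last case.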
penpen wrote:
09 Sep 2019 04:10
Sidenote:
If i had to guess i would have said that they used setjmp and longjmp and lost track of what they are doing exactly..., but
it now seems more they did that the classic way (function calls only, which i would always prefer, except for try/catch) -
with some unexpected decisions (at least from my viewpoint).
Because you know (parts of) the code i would like to know if they used setjmp and longjmp outside any try/catch block?
None of the parser functions that I've studied use exception handling, as no SEH prolog/epilog is present in their code. As you've guessed, they used setjmp/longjmp: setjmp once, in the parser's root function Parser, and longjmp in the PSError and PError functions. But I don't see the relevance.
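For readers unfamiliar with the pattern, here is a minimal sketch of setjmp in a root function with longjmp in an error routine. All names (g_parserRecovery, g_lastError, reportSyntaxError, parseStatement, parseRoot) and the error condition are hypothetical stand-ins, not the real Parser or PSError. Only trivially destructible locals are used, since longjmp must not skip C++ destructors.

```cpp
#include <csetjmp>

// Hypothetical recovery point, set up once by the root function.
static std::jmp_buf g_parserRecovery;
static int g_lastError = 0;

// Stand-in for an error routine such as PSError: it never returns
// to its caller but jumps straight back to the root function.
[[noreturn]] static void reportSyntaxError(int code) {
    g_lastError = code;
    std::longjmp(g_parserRecovery, 1);
}

// Stand-in for a nested parse function, possibly many calls deep.
static void parseStatement(const char* text) {
    if (text[0] == ')') {          // made-up error condition
        reportSyntaxError(1);
    }
    // ... normal parsing would continue here ...
}

// Stand-in for the root function: the only place setjmp is called.
int parseRoot(const char* text) {
    if (setjmp(g_parserRecovery) != 0) {
        // Every error, from any depth, arrives here.
        return g_lastError;
    }
    parseStatement(text);
    return 0;                      // success
}
```

The nested functions need no error return values of their own, which is presumably the appeal of the approach.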

Re: Closing parenthesis prevents escape of subsequent special character operator

Posted: 15 Sep 2019 07:20
by penpen
sst wrote:
13 Sep 2019 18:24
The parser only cares about the arguments of IF, FOR and REM. To the parser they are not commands but keywords. Each of them has its own parser: ParseIf, ParseFor and ParseRem. (It only recognizes "REM", "REM/?", "FOR", "FOR/?", "IF", "IF/?" as keywords, so something like "REM/ comment" is an ordinary command.)
(...)
I'm not sure whether you misunderstood what I had written and are using a different definition of "different parsers", or whether there really are multiple different parsers. What you describe could be either, depending on how you mean it.
In my usage of "different parsers", the interpreter would, for example, have to build multiple independent parse trees (or similar), probably using different lexers etc., working one after another.
What I suspect is that there is only one parser, calling different functions depending on the detected internal command token (whose string should always be a keyword). To me those are just different branches of the same parser, because any reaction to a detected keyword while building the parse tree is (by definition) done by the parser.
Maybe MS didn't encapsulate the top level of the parser in its own function (or it was inlined by an optimization step, ...), so it might look like multiple different parsers (as opposed to one parser using different parsing rules depending on keywords).

sst wrote:
13 Sep 2019 18:24
They used setjmp once in the parser's root function Parser, and used longjmp in PSError and PError functions but I don't see the relevance.
My guess was that MS uses it as a kind of fallback, not only for errors but for any reason you might want one, to control the execution of the parser; for example, code like this:

Code:

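#include <csetjmp>   // setjmp, jmp_buf
#include <cstdint>   // std::int8_t, std::int16_t, std::int32_t

// The int returned by setjmp, reinterpreted as packed fields.
union returnValue
{
	std::int32_t raw;
	struct detailedReturnValue {
		std::int8_t error;
		std::int8_t token;
		std::int16_t state;
	} detailed;
};

/* ... */
{
	/* ... */
	returnValue value;
	value.raw = setjmp(buf);   // control also returns here after every longjmp(buf, ...)

	// The value passed to longjmp selects the parser state to resume in.
	switch (value.detailed.state) {
		case READ_IF: /* ... */
		case READ_NAME: /* ... */
		case 1: /* ... */
		/* ... */
		default:
			/* ... */
			break;
	}
	/* ... */
}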
Some parsers I have seen do such (from my viewpoint) crazy things. They all suffer from a hard-to-follow program flow (which I had to debug: they introduced unwanted side effects, because the authors lost track of what their code does...).
I suspected MS had done the same (mostly because of the number of undocumented features), but it seems I was wrong on that, else you most probably would have seen it.


penpen

Re: Closing parenthesis prevents escape of subsequent special character operator

Posted: 15 Sep 2019 18:14
by sst
penpen wrote:
15 Sep 2019 07:20
I'm not sure whether you misunderstood what I had written and are using a different definition of "different parsers", or whether there really are multiple different parsers. What you describe could be either, depending on how you mean it.
In my usage of "different parsers", the interpreter would, for example, have to build multiple independent parse trees (or similar), probably using different lexers etc., working one after another.
Maybe I used bad wording. By "their own parser" I didn't mean different parsers with independent parse trees or anything like that. What I meant was, as you've pointed out, different branches of the same parser. It was an attempt to address your question about why the parser behaves differently depending on the command token and why it does not behave the same for e.g. REM and FOR.

You may think of it this way: IF, FOR and REM are different from all other commands. They are part of the Batch/CMD language, with their own syntax and parsing rules, which are handled by the parser. All other internal commands are not part of the language; they are just utilities or programs which CMD serves internally. The language parser does not care about their command line syntax, just as it does not care about the command line syntax of e.g. findstr.exe or other external programs.
So the behavior of the parser is the same for all other internal/external commands.

The internal commands/programs, just like external commands, will eventually parse their own command line string at execution time, but each internal program has its own independent command line parser which is really different from the Batch/CMD language parser.
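As a rough illustration of that split, the sketch below looks a command up only at execution time and hands the untouched argument string to whichever handler is found. Everything in it is hypothetical: the table, the handler names runEcho and runRem, the function execute and the returned strings are made up for the example and say nothing about how CMD actually stores its internal commands.

```cpp
#include <map>
#include <string>

// Hypothetical handler type: each internal command receives the raw
// argument string and parses it by itself.
using Handler = std::string (*)(const std::string& args);

static std::string runEcho(const std::string& args) { return "echo:" + args; }
static std::string runRem(const std::string&) { return ""; }  // NOP, ignores its arguments

// Execution phase: only here is it decided whether the command is
// internal or external. The language parser never looked inside args.
std::string execute(const std::string& name, const std::string& args) {
    static const std::map<std::string, Handler> internalCommands = {
        {"ECHO", runEcho},
        {"REM", runRem},
    };
    const auto it = internalCommands.find(name);
    if (it != internalCommands.end()) {
        return it->second(args);       // internal: call its own function
    }
    return "spawn:" + name + args;     // external: pass the line to a process
}
```

In this sketch execute("findstr", " /?") takes the external path, just as an unknown name would.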
penpen wrote:
15 Sep 2019 07:20
Some parsers i saw are doing such (from my viewpoint) crazy things. They all suffer from a hard to follow program flow ...
I got your point. Fortunately they used longjmp just for errors, with a single point of return. The Lexer has its own setjmp, though.

Re: Closing parenthesis prevents escape of subsequent special character operator

Posted: 16 Sep 2019 19:42
by penpen
sst wrote:
15 Sep 2019 18:14
It was an attempt to address your question about why the parser behaves differently depending on the command token and why it does not behave the same for e.g. REM and FOR.
I probably used bad wording myself (maybe because English isn't my native language and I posted pretty late).
Sorry if I confused you with my post.

I didn't want to know why the parser behaves differently depending on the command token.
I wanted to know the idea behind such a design (they could have used the ParseCmd routine for all commands, including IF, REM and FOR).

penpen