Are <LF> and <CR><LF> differences ignored by "FC /W x y"

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#1 Post by alan_b » 29 Oct 2011 14:51

I have a Unix style of internet garbage in U.txt with <LF> Unix E.O.L. (End Of Line terminators)
and this is manipulated into a similar format in W.txt with <CR><LF> Windows E.O.L.

I find that "FC U.txt W.txt" has a 99.6% chance of ignoring the E.O.L. discrepancy of each line of text.
There is exactly a 0% chance of it being ignored if the E.O.L. is exactly 255 characters after the previous E.O.L.

"FC /W U.txt W.txt" has a 100% certainty of ignoring this discrepancy.
I wish to believe this is due to /W not only ignoring Tab/Space discrepancies as Whitespace,
but also viewing <LF> and <CR><LF> as being equivalent White Space.

I am concerned by my recent encounter with Total Chaos when CMD.EXE commands misfire,
due to non-human "text" that included % and | and = characters which have "special significance".

I would appreciate assurance that "FC /W U.txt W.txt" has indeed fixed the problem by disregarding all E.O.L. as White space,
and not simply changed the special error length from 255 to some other number I will encounter next week :shock:
Alternatively I could use MORE U.txt > X.txt followed by "FC X.txt W.txt"
but then I fear MORE could also be subject to the same problems with %,|, and =.
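
For anyone who wants a cross-check that does not depend on FC or MORE at all, here is a minimal Python sketch of the comparison I am after: treat <CR><LF> and <LF> as the same terminator and compare everything else byte for byte. The function names and the sample bytes are made up for illustration; this is not how FC works internally.

```python
def normalize_eol(data: bytes) -> bytes:
    """Rewrite every CRLF pair as a single LF; all other bytes are untouched."""
    return data.replace(b"\r\n", b"\n")


def same_text(a: bytes, b: bytes) -> bool:
    """True when the two byte strings differ only in CRLF versus LF."""
    return normalize_eol(a) == normalize_eol(b)


# Hypothetical sample data standing in for U.txt and W.txt.
unix_text = b"A123456789\nB123456789\n"
windows_text = b"A123456789\r\nB123456789\r\n"
print(same_text(unix_text, windows_text))
```

Because the data is handled as raw bytes, characters such as %, | and = have no special meaning here.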

For the context of my application / purpose please see
viewtopic.php?f=3&t=2368

Regards
Alan

aGerman
Expert
Posts: 4705
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#2 Post by aGerman » 30 Oct 2011 05:26

Well Alan, because of "non-human text" you should use FC /B instead :P
Seriously, /W is made to compress tabs and spaces for comparisons. If it can solve problems with Lf and CrLf it will be some kind of odd undefined behavior. In my opinion not a good base if you need to make sure ...

Regards
aGerman

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#3 Post by alan_b » 30 Oct 2011 11:46

Actually my first approach was to try ALL the options including that one.
For my simple test with 4 lines I had to set the screen buffer to 1000 lines so I could see that the binary dump commenced at

Code:

Comparing files 0-moz_down.csv and 0-MOZ_DOWNX.CSV
000000FE: 0A 0D
000000FF: 32 0A
00000100: 35 32
00000101: 2C 35
00000102: 22 2C
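
The dump above is what a byte-for-byte comparison is expected to show: the files agree up to the first line terminator, and from there on every byte of the Windows file is shifted by one. A small Python sketch of the same idea follows. The 254-byte placeholder line is an assumption inferred from the 000000FE offset, and `first_difference` is a hypothetical helper, not part of FC.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other; they diverge where it ends.
        return min(len(a), len(b))
    return None


# Placeholder first line of 254 bytes (assumed length), then the terminator.
first_line = b"x" * 254
unix_bytes = first_line + b"\n" + b'25,"'
windows_bytes = first_line + b"\r\n" + b'25,"'

offset = first_difference(unix_bytes, windows_bytes)
print(f"{offset:08X}")  # prints 000000FE
```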

I accept your opinion that
If it can solve problems with Lf and CrLf it will be some kind of odd undefined behavior.

and consider this to be a "second strike" against a tool whose behaviour varies with the length of line.

The tool MORE has zero strikes against it so I will export from SQLiteManager the file U-moz_down.csv and then run
MORE U-moz_down.csv > 0-moz_down.csv
after which my manipulation algorithm and manipulation validation test will only have Windows <CR><LF> lines to process and compare.

Thanks.
You have helped clarify my thinking on the safest route.
I still have fears that CMD.EXE might inflict strangeness on MORE when it encounters any '%' characters,
but if CMD.EXE should interpret/alter the flow of a text string through MORE then I guess no tool is safe :roll:

Regards

Alan

aGerman
Expert
Posts: 4705
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#4 Post by aGerman » 30 Oct 2011 15:07

When I trifled with "/B" I tried to point you in the right direction (even if I knew it's not applicable in your case). Of course there is a difference between Lf and CrLf. Without changing the line break there is only one safe way to compare those files...
Since Lf is no Windows line break it's clearly your fault that you tried to apply text comparison to such files :twisted: (Just kidding :wink: That's how the Redmond guys would argue though.) In dead earnest I think FC is right if it distinguishes between a line terminated by Lf or CrLf. But I'm with you that it's absolutely beyond the pale if the behaviour varies with the length of a line.

However, after you "normalized" the line breaks via MORE you should give FC a new chance :wink:

Regards
aGerman

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#5 Post by alan_b » 30 Oct 2011 15:40

Yes, I am happy to take FC as the decision maker now that Unix/Windows confusion is removed.

Regards
Alan

aGerman
Expert
Posts: 4705
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#6 Post by aGerman » 30 Oct 2011 19:46

I'm not sure whether I should mention that but there is one issue left over:
Why the heck does MORE change the line breaks and when will M$ change this behaviour?

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#7 Post by alan_b » 31 Oct 2011 06:30

I have just tested "SORT Unix.csv > Windows.csv"
and as I suspected the Windows.csv output preceded every <LF> with <CR>

Then I tried "TYPE Unix.csv > Windows.csv"
and found no <CR> in the output.

I think TYPE is the odd one out.
I guess all the other internal commands which affect the input/output text flow will also deliberately damage Unix compatibility.
Nice one Microsoft :twisted:
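
To check which commands add <CR>, it helps to count the terminators in the output instead of eyeballing it in an editor. A rough Python sketch is below; `count_line_endings` is a hypothetical helper and the sample bytes are invented.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs and LF bytes that are not preceded by CR."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf_only": lf_only}


# Invented sample: two Unix lines followed by one Windows line.
sample = b"a\nb\nc\r\n"
print(count_line_endings(sample))
```

A file that was fully converted should report zero for `lf_only`; a file passed through unchanged should report zero for `crlf`.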

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#8 Post by dbenham » 31 Oct 2011 07:53

Well, at least the glass is half full - MORE, SORT, FC etc. could have failed entirely to recognize Unix style <LF> newlines :wink: (I suppose one could argue that would be the preferred behavior)

Dave Benham

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#9 Post by alan_b » 31 Oct 2011 09:43

True.

My original fear, when I first looked in NotePad at the SQLiteManager output to both *.SQL and *.CSV, was that by processing a never-ending line of UNIX <LF> text with

Code:

for /f "tokens=1* delims=," %%d in ('type ~%FILE%') do (

the output would result in %%e holding not only the remnant of the first "line" after ',' but also would continue with all subsequent text including <LF>.
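
The fear can be illustrated outside CMD.EXE with a short Python sketch. It only models the scenario described here, a reader that does not treat <LF> as a line break, and makes no claim about what FOR /F actually does. `split_first_comma` and the sample text are hypothetical.

```python
def split_first_comma(record: str):
    """Split at the first comma: (text before it, everything after it)."""
    head, _, rest = record.partition(",")
    return head, rest


# Invented two-line CSV with Unix terminators.
whole_file = "a,b\nc,d\n"

# If the whole file were seen as ONE line, the remainder keeps the <LF>s.
print(split_first_comma(whole_file))

# If each <LF> ends a line, every record is split on its own.
print([split_first_comma(line) for line in whole_file.splitlines()])
```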

I suspect Microsoft know how to deliver text in UNIX <LF> format because they know how to "steal" from such documents,
but prefer that any output documents they process for the user be kept hostage to Windows :roll:

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Are <LF> and <CR><LF> differences ignored by "FC /W x y"

#10 Post by alan_b » 01 Nov 2011 12:07

I am now happy that "FC /W" is working perfectly.
In the 'C' programming language White Space is not restricted to TABS and SPACE, but also specifically includes blank lines.
I think this is probably true for most programming languages.

I am using the Portable Freeware METAPAD which allows File Formats of either DOS or UNIX type.
This composed -moz3.csv and -moz4.csv in UNIX mode.
-moz3.csv has 4 lines of text, each terminated by only <LF>.
-moz4.csv has the same 4 lines, but with an extra <LF> immediately before the third line.
I grabbed this from a DOS screen

Code:

E:\Test\BAT\New\New>TYPE -MOZ3.CSV
A123456789
B123456789
C123456789
D123456789

E:\Test\BAT\New\New>TYPE -MOZ4.CSV
A123456789
B123456789

C123456789
D123456789

E:\Test\BAT\New\New>FC -MOZ3.CSV -MOZ4.CSV
Comparing files -moz3.csv and -MOZ4.CSV
***** -moz3.csv
B123456789
C123456789
***** -MOZ4.CSV
B123456789

C123456789
*****

E:\Test\BAT\New\New>FC /W -MOZ3.CSV -MOZ4.CSV
Comparing files -moz3.csv and -MOZ4.CSV
FC: no differences encountered

As can be seen, FC considers the blank line due to a pair of <LF> in succession as White Space,
and correctly ignores white space if /W is applied.
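
The behaviour observed above can be modelled with a few lines of Python. This is only a sketch of the observed result (blank lines and terminator style ignored); it does not compress tabs and spaces inside a line, and it is not a statement of how FC /W is implemented. `significant_lines` is a hypothetical helper.

```python
def significant_lines(data: bytes) -> list:
    """Split on LF, CR or CRLF and drop lines that are empty or all whitespace."""
    return [line for line in data.splitlines() if line.strip()]


# The two test files from the screen grab, both with <LF> terminators.
moz3 = b"A123456789\nB123456789\nC123456789\nD123456789\n"
moz4 = b"A123456789\nB123456789\n\nC123456789\nD123456789\n"

print(significant_lines(moz3) == significant_lines(moz4))
```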

I am now happy to use MORE to convert RAW UNIX <LF> text into RAW WINDOWS <CR><LF> text
and then use FC /W to confirm that the MORE conversion has not made any changes other than blank lines.
After that I can manipulate raw Windows text and there will be no further problems,
because any blank lines will have been dealt with before manipulation.

Regards
Alan
