JREPL.BAT and UTF auto detect?

Discussion forum for all Windows batch related topics.

Outbreaker
Posts: 10
Joined: 08 Aug 2023 15:16

#1 Post by Outbreaker » 22 Aug 2023 07:49

Hi,
Does the "JREPL.BAT" tool have an option to auto-detect whether a text file is encoded in UTF-8 or UTF-16 :?:

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#2 Post by dbenham » 27 Aug 2023 05:47

No, there is no automatic detection of encoding.

Use the /UTF option if all input and output is UTF-16LE.

For more encoding flexibility you must use the ADO charset options available to /F and /O.

Use JREPL /?/F and JREPL /?/O to get help on using ADO character sets.

Outbreaker

#3 Post by Outbreaker » 27 Aug 2023 12:44

The problem I have is that the text file is sometimes encoded in UTF-8 and sometimes in UTF-16LE, depending on the language a user has downloaded. :|

The rxrepl.exe tool has an /auto switch, but it corrupts text files containing non-Latin Unicode characters that are not encoded in UTF-8. :(

Would it be possible to add such an /auto (UTF-16LE auto-detect) function to the JREPL.BAT tool :?:
Last edited by Outbreaker on 16 Sep 2023 03:13, edited 1 time in total.

Outbreaker

#4 Post by Outbreaker » 14 Sep 2023 03:43

I found a half-automated solution for this, using the FindStr command, which (because of its limitations) can only search 8-bit text such as UTF-8.
If FindStr reads a UTF-16 file, it sees a null (00) byte between each ASCII character (these show up like spaces on screen), so it fails to find the text.
The batch script searches for a known text in the file; if it cannot find the text, it assumes the file is encoded in UTF-16 and sets the /UTF switch.
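To see why the FindStr trick works, here is a small sketch of the same heuristic. It is written in Python only because that makes the raw bytes easy to show; it is not part of JREPL or FindStr, and the function names (marker_found_bytewise, guess_is_utf16) are my own. It assumes the search text is plain ASCII, like "[Version]".

```python
# Sketch of the FindStr heuristic: search for an ASCII marker as raw bytes.
# In UTF-16LE every ASCII character is stored as two bytes, the second being
# 0x00, so the marker's plain bytes never appear contiguously in the file.


def marker_found_bytewise(data: bytes, marker: str) -> bool:
    """Return True if the 8-bit (ASCII/UTF-8) bytes of `marker` occur in `data`."""
    return marker.encode("utf-8") in data


def guess_is_utf16(data: bytes, marker: str) -> bool:
    """Mimic the batch logic: marker not found as 8-bit text -> assume UTF-16."""
    return not marker_found_bytewise(data, marker)


if __name__ == "__main__":
    text = "[Version]\r\n[AddReg]\r\n"  # illustrative sample content
    utf8_bytes = text.encode("utf-8")
    utf16_bytes = text.encode("utf-16-le")

    print(utf16_bytes[:6].hex(" "))                  # 5b 00 56 00 65 00
    print(guess_is_utf16(utf8_bytes, "[Version]"))   # False
    print(guess_is_utf16(utf16_bytes, "[Version]"))  # True
```

Note the weak spot of this approach: if a UTF-8 file simply does not contain the marker text, it is also reported as UTF-16, so the marker must be something every file is guaranteed to contain.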

The best way would be a hybrid Batch/JScript that searches for the hex codes below, but sadly my JScript coding skills suck. :(
UTF-16LE BOM, Header hex code = FF FE
UTF-16LE no-BOM, Windows Break Line hex code = 00 0D 00 0A 00 (no-BOM files don't have any header code)
UTF-16LE no-BOM, Linux Break Line hex code = 00 0A 00 (no-BOM files don't have any header code)
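The byte checks listed above can be sketched as follows. Again this is Python used purely for illustration (a real JREPL integration would need JScript), and looks_like_utf16le is a hypothetical name. Two design choices differ slightly from the list: the line-break patterns are searched without the leading 00, because the byte before a line break is not 00 when the previous character is non-Latin (for example Cyrillic), and a match only counts at an even offset, i.e. on a UTF-16 code-unit boundary.

```python
BOM_UTF16LE = b"\xff\xfe"        # FF FE at the start of the file
CRLF_UTF16LE = b"\r\x00\n\x00"   # 0D 00 0A 00 (Windows line break)
LF_UTF16LE = b"\n\x00"           # 0A 00 (Linux line break; also inside CRLF)


def _unit_at_even_offset(data: bytes, pattern: bytes) -> bool:
    """True if `pattern` occurs starting at an even offset (code-unit boundary)."""
    i = data.find(pattern)
    while i != -1:
        if i % 2 == 0:
            return True
        i = data.find(pattern, i + 1)
    return False


def looks_like_utf16le(data: bytes) -> bool:
    """Heuristic UTF-16LE check: BOM first, then UTF-16LE line breaks.

    Normal UTF-8 text never contains 00 bytes, so the line-break checks
    do not fire on UTF-8 files. A no-BOM file without any line break
    cannot be detected this way.
    """
    if data.startswith(BOM_UTF16LE):
        return True
    return (_unit_at_even_offset(data, CRLF_UTF16LE)
            or _unit_at_even_offset(data, LF_UTF16LE))


if __name__ == "__main__":
    sample = "[Version]\r\n[AddReg]\r\n"  # illustrative sample content
    print(looks_like_utf16le(sample.encode("utf-8")))                     # False
    print(looks_like_utf16le(sample.encode("utf-16-le")))                 # True
    print(looks_like_utf16le(b"\xff\xfe" + sample.encode("utf-16-le")))   # True
    print(looks_like_utf16le("[Version]\n[AddReg]\n".encode("utf-16-be")))  # False
```

The even-offset rule is what keeps UTF-16BE text from matching: there the bytes 0A 00 only occur across two code units, at an odd offset.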

Half-automated batch solution ->
In JREPL.bat, replace /UTF: with /UTF: /UTFA:""
Then add the following code above the ":: Validate options" line in JREPL.bat.

Code:

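:: UTF-16 detection (runs only when both /F and /UTFA are given)
If Not Defined /F GoTo SKIP
If Not Defined /UTFA GoTo SKIP
:: Strip any "|charset" suffix from the /F value to get the bare file name
For /f "tokens=1 delims=|" %%i in ("%/F%") do (Set "inFile=%%i")
:: /c: searches for the whole string literally, even if it contains spaces.
:: Text not found means FindStr could not read it, so assume UTF-16LE.
FindStr /l /c:"%/UTFA%" "%inFile%" >NUL 2>&1 || Set "/UTF=1"
:SKIP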

Now you can use the /UTFA command switch:

Code:

CALL ".\JREPL.bat" "\[AddReg\]" "[AddReg]\r\nMy Little Pony." /XSEQ /UTFA "[Version]" /F "text.txt" /O "text1.txt"
If the [Version] text cannot be found in the file, the script assumes it is a UTF-16 file and sets the /UTF switch.