Split text file at start marker and blank line or just start marker into multiple files

Posted: 05 Jan 2019 06:44
by Blazer
I am trying to decide the best way of splitting the following single .rc file into multiple files.

Each split starts at a line whose first column is a number and whose second column begins with the word DIALOG.
Each split ends at either a blank line or the next start line (in case the blank line is missing).
Each filename should be the first column (the number) of its start line, e.g. 1003.rc.

This is the code I have so far, but I don't think I need to store the line numbers. Also, the output files should still contain the original blank lines.

Code: Select all

set i=0
for /f "tokens=1,2 delims=: " %%A in ('^(type "Dialog.rc" ^| "%SystemRoot%\System32\findstr.exe" /b /n /r "^[1-9][0-9]* DIALOG"^) 2^>nul') do (
  set /a i+=1
  set array_line[!i!]=%%A
  set array_name[!i!]=%%B
)
The following example should create four files named 1003.rc, 1004.rc, 1005.rc and 1006.rc.

Any help would be greatly appreciated :)

Code: Select all

1003 DIALOGEX 0, 0, 227, 93
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | WS_POPUP | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_APPWINDOW
CAPTION "Run"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 11, 21, 20 
   CONTROL "Type the name of a program, folder, document, or Internet resource, and Windows will open it for you.", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 36, 11, 182, 22 
   CONTROL "&Open:", 12305, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 7, 39, 24, 10 
   CONTROL "", 12298, COMBOBOX, CBS_DROPDOWN | CBS_AUTOHSCROLL | CBS_DISABLENOSCROLL | WS_CHILD | WS_VISIBLE | WS_VSCROLL | WS_TABSTOP, 36, 37, 183, 200 
   CONTROL "Run in separate &memory space", 12306, BUTTON, BS_AUTOCHECKBOX | WS_CHILD | WS_VISIBLE | WS_DISABLED | WS_TABSTOP, 40, 50, 183, 10 
   CONTROL "OK", 1, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 62, 70, 50, 14 
   CONTROL "Cancel", 2, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 116, 70, 50, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 170, 70, 50, 14 
}

1004 DIALOGEX 20, 20, 227, 69
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | DS_CENTER | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "Missing Shortcut"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 8, "MS Shell Dlg"
{
   CONTROL 134, -1, STATIC, SS_ICON | SS_REALSIZECONTROL | WS_CHILD | WS_VISIBLE, 7, 7, 21, 20 
   CONTROL "Windows is searching for %s. To locate the file yourself, click Browse.", 102, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 35, 7, 187, 30 
   CONTROL "Cancel", 2, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 169, 47, 50, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 115, 47, 50, 14 
}
1005 DIALOGEX 0, 0, 259, 75
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | WS_POPUP | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_APPWINDOW
CAPTION "Run"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 3, 16, 16 
   CONTROL "Type the name of a program, folder, document, or Internet resource", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 40, 7, 212, 11 
   CONTROL "&Open:", 12305, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 7, 25, 32, 8 
   CONTROL "", 12298, COMBOBOX, CBS_DROPDOWN | CBS_AUTOHSCROLL | CBS_DISABLENOSCROLL | WS_CHILD | WS_VISIBLE | WS_VSCROLL | WS_TABSTOP, 40, 22, 210, 200 
   CONTROL "Run in separate &memory space", 12306, BUTTON, BS_AUTOCHECKBOX | WS_CHILD | WS_VISIBLE | WS_DISABLED | WS_TABSTOP, 7, 55, 97, 10 
   CONTROL "", 12326, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 40, 37, 16, 16 
   CONTROL "This task will be created with administrative privileges.", 12327, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 54, 38, 200, 11 
   CONTROL "OK", 1, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 108, 54, 45, 14 
   CONTROL "Cancel", 2, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 157, 54, 45, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 205, 54, 45, 14 
}

1006 DIALOG 0, 0, 240, 55
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 3, 16, 16 
   CONTROL "Type the name of a program, folder, document, or Internet resource", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 40, 7, 212, 11 
}

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 05 Jan 2019 09:21
by aGerman
That might work for you:

Code: Select all

@echo off &setlocal
set "ressource=Dialog.rc"

setlocal EnableDelayedExpansion

rem Count all lines of the file, including empty ones
for /f %%A in ('type "!ressource!"^|find /c /v ""') do set /a "line_cnt=%%A"

rem Store the line number and the name of every start line
set "i=-1"
for /f "tokens=1,2 delims=: " %%A in ('type "!ressource!"^|findstr /nrbc:"[0-9][0-9]* DIALOG"') do (
  set /a "i+=1"
  set "array_begin[!i!]=%%A"
  set "array_name[!i!]=%%B"
)

rem Each section ends one line before the next section begins
for /l %%i in (1 1 %i%) do (
  set /a "idx=%%i-1"
  for /f %%j in ("!idx!") do set /a "array_end[%%j]=!array_begin[%%i]!-1"
)
rem The last section ends at the last line of the file
set /a "array_end[%i%]=line_cnt"


rem Read the file line by line and write every section to its own file
<"!ressource!" (
  rem Skip any lines that precede the first start line
  for /l %%i in (2 1 %array_begin[0]%) do set /p "="
  for /l %%i in (0 1 %i%) do (
    >"!array_name[%%i]!.rc" (
      for /l %%j in (!array_begin[%%i]! 1 !array_end[%%i]!) do (
        set "line=" &set /p "line="
        echo(!line!
      )
    )
  )
)
Steffen

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 05 Jan 2019 12:32
by Aacini
Simpler...

Code: Select all

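@echo off
setlocal EnableDelayedExpansion

set "last=1"
< Dialog.rc (
   rem The appended 0 DIALOG line gives findstr a final match that marks the end of the file
   for /F "tokens=1,2 delims=: " %%a in ('(type Dialog.rc ^& echo 0 DIALOG^) ^| findstr /N /R /C:"^[0-9][0-9]* DIALOG"') do (
      rem Number of lines between the previous start line and this one
      set /A "lines=%%a-last, last=%%a"
      if !lines! gtr 0 (
         rem Copy those lines from the redirected input to the file of the previous section
         set /P "line="
         for /L %%i in (2,1,!lines!) do (
            echo(!line!
            set "line=" & set /P "line="
         )
         if defined line echo(!line!
      ) > "!file!.rc"
      set "file=%%b"
   )
)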
Antonio

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 06 Jan 2019 06:47
by Blazer
@aGerman
@Aacini

Thank you, both solutions create the individual files but they contain garbage characters :(

I assume this is because the input file is unicode, Notepad++ reports it has "UCS-2 LE BOM" encoding

I tried changing the following section of code

Code: Select all

< Dialog.rc (
to

Code: Select all

type Dialog.rc ^|(
but that did not work

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 06 Jan 2019 09:36
by aGerman
Blazer wrote:
06 Jan 2019 06:47
I assume this is because the input file is unicode, Notepad++ reports it has "UCS-2 LE BOM" encoding
Don't rely on NP++. It can't handle UTF-16 (it doesn't even know of UTF-16). This was reported as a bug way more than 10 years ago but the developers ignore it. Thus, they ignore the existence of languages like Japanese, Chinese and Korean on Windows. That's weird and the main reason why I once stopped using NP++. Your file is most likely UTF-16 LE encoded.

I don't think you'll ever have anything other than ASCII-compliant content. You could use TYPE and redirect the output to a temporary file. Then process the content of the temporary file to generate the new files.

Steffen

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 06 Jan 2019 23:37
by dbenham
This is very simple to perform with JREPL.BAT :D

Assuming the input file is indeed UTF-16 LE, and you want your output to be the same, then:

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f dialog.rc
The /jq option specifies that the 2nd argument is JScript code.

The /utf option treats input as UTF-16 LE.

The /f option followed by the file specifies the input file.

The first argument is the regular expression search that matches an integer at the beginning of a line, followed by a space and DIALOG. The integer is captured in $1.

The 2nd argument is JScript that is executed for each match. First $txt=$0 preserves the matching text without change. Then openOutput opens an output file with the correct name. The false argument means do not append, and the true argument specifies UTF-16 LE output.
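
For readers who do not have JREPL at hand, the same logic can be sketched in Python. This is only an illustration under a few assumptions, not part of JREPL: the function name split_dialogs and its out_dir parameter are made up for this example, the input is assumed to be UTF-16 with a BOM, and lines before the first start line are simply dropped, which may differ from what JREPL does with them.

```python
import re
from pathlib import Path

# Start marker: a number at the start of a line, one space, then DIALOG.
# Like the JREPL search, this also matches DIALOGEX.
START = re.compile(r"^(\d+) DIALOG")


def split_dialogs(source, out_dir):
    """Write each DIALOG section of `source` to <number>.rc inside `out_dir`.

    Returns the list of file names that were created, in input order.
    """
    out_dir = Path(out_dir)
    created = []
    out = None
    try:
        # The "utf-16" codec honours the BOM when reading; newline="" keeps
        # the original line endings instead of translating them.
        with open(source, encoding="utf-16", newline="") as src:
            for line in src:
                match = START.match(line)
                if match:
                    # A new start line closes the previous section.
                    if out is not None:
                        out.close()
                    path = out_dir / (match.group(1) + ".rc")
                    out = open(path, "w", encoding="utf-16-le", newline="")
                    out.write("\ufeff")  # write the UTF-16 LE BOM explicitly
                    created.append(path.name)
                if out is not None:
                    # Blank separator lines stay with the preceding section.
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return created
```

Calling split_dialogs("Dialog.rc", ".") on the example from the first post should produce 1003.rc, 1004.rc, 1005.rc and 1006.rc, with each blank separator line kept at the end of the preceding file.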

If the input file is in some other encoding, then the command can easily be changed to work with most any encoding. Just tell me what encoding you are using.


Dave Benham

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 07 Jan 2019 03:20
by Blazer
aGerman wrote:
06 Jan 2019 09:36

Don't rely on NP++. It can't handle UTF-16 (it doesn't even know of UTF-16). This was reported as a bug way more than 10 years ago but the developers ignore it. Thus, they ignore the existence of languages like Japanese, Chinese and Korean on Windows. That's weird and the main reason why I once stopped using NP++. Your file is most likely UTF-16 LE encoded.
@aGerman
How can I find the real encoding?
Which editor would you recommend to use?

@dbenham
Thank you, I will try JREPL.BAT today. :)

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 07 Jan 2019 11:37
by aGerman
Blazer wrote:
07 Jan 2019 03:20
How can I find the real encoding?
This is difficult. Usually text editors use the Byte Order Mark (BOM), or they read a few thousand characters and apply statistical or heuristic methods to guess the encoding. In your case, just open the file in a hex editor. If the first two bytes are FF FE, it is UTF-16 LE (because that's its BOM).
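
If no hex editor is at hand, the BOM check can also be scripted. A minimal Python sketch, assuming the file starts with a BOM at all; the function name sniff_bom is invented for this example, and a file without a BOM simply yields None, in which case the encoding still has to be guessed as described above.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def sniff_bom(path):
    """Return the encoding named by the file's BOM, or None if it has none."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM in the table is four bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None
```

For a file like the one discussed here, sniff_bom("Dialog.rc") would be expected to return "UTF-16 LE".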
Which editor would you recommend to use?
Any that supports highlighting for the languages you write code in. I'm used to PSPad, but that's certainly not the only alternative you have.

JREPL is really a powerful tool for that kind of task. The advantage is that you're able to keep UTF-16 which would be a pain using on-board utilities in Batch.

Steffen

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 08 Jan 2019 03:07
by Blazer
Thank you to everyone who helped with this solution :D

Re: Split text file at start marker and blank line or just start marker into multiple files

Posted: 15 Jan 2019 03:56
by Blazer
After testing I decided the best option was to use JREPL.BAT

Why is JREPL unable to find the input file which is in the current directory?

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f dialog.rc
JScript runtime error opening input file: File not found
If I specify the full path to the input file, then JREPL puts the output files into the parent folder instead of the folder containing the input file.

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f C:\Temp\Work\dialog.rc
dbenham wrote:
06 Jan 2019 23:37
The first argument is the regular expression search that matches an integer at the beginning of a line, followed by a space and DIALOG. The integer is captured in $1.
I want to change the match string to the following regular expression and capture the first non-whitespace token in $1 to use as the filename.

Code: Select all

^[%space%%tab%]*[^%space%%tab%][^%space%%tab%]*[%space%%tab%][%space%%tab%]*DIALOG
EDIT: the correct command line and regex string is as follows

Code: Select all

jrepl "^[ \t]*([^ \t][^ \t]*)[ \t][ \t]*DIALOG" "$txt=$0; openOutput('C:\\Temp\\Work\\'+$1+'.rc', false, true);" /jq /utf /f "C:\Temp\Work\dialog.rc"
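
To check a pattern of this shape before running it over the real file, it can be tried on a few sample lines. A small Python sketch, assuming Python's re module treats this particular pattern the same way JScript does (it uses only character classes, quantifiers and an anchor); the helper name dialog_name and the identifier IDD_ABOUT are made up for the example.

```python
import re

# Optional leading spaces or tabs, then the first non-whitespace token
# (captured), then at least one space or tab, then the word DIALOG.
START = re.compile(r"^[ \t]*([^ \t]+)[ \t]+DIALOG")


def dialog_name(line):
    """Return the token that would become the file name, or None."""
    match = START.match(line)
    return match.group(1) if match else None


# Numeric ID, as in the example file from the first post.
print(dialog_name("1003 DIALOGEX 0, 0, 227, 93"))       # 1003
# Hypothetical symbolic ID with leading spaces and a tab separator.
print(dialog_name("  IDD_ABOUT\tDIALOG 0, 0, 1, 1"))    # IDD_ABOUT
# Ordinary lines inside a section are not start lines.
print(dialog_name('CAPTION "Run"'))                     # None
```

Because the token is captured without the leading whitespace, the resulting file name stays clean even when the start line is indented.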
Thank you :)