Split text file at start marker and blank line or just start marker into multiple files

Blazer
Posts: 14
Joined: 20 Dec 2018 09:47

#1 Post by Blazer » 05 Jan 2019 06:44

I am trying to decide on the best way to split the following single .rc file into multiple files.

Each split starts at a line whose first column is a number and whose second column begins with the word DIALOG.
Each split ends at either a blank line or the next start line (in case the blank line is missing).
Each file should be named after the first column (the number) of its start line, e.g. 1003.rc

This is the code I have so far, but I don't think I need to store the line numbers. Also, the output files should still contain the original blank lines.

Code: Select all

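set i=0
rem /c: keeps the pattern as ONE search string; without it findstr treats a space as a separator between alternative search strings
for /f "tokens=1,2 delims=: " %%A in ('^(type "Dialog.rc" ^| "%SystemRoot%\System32\findstr.exe" /b /n /r /c:"^[1-9][0-9]* DIALOG"^) 2^>nul') do (
  set /a i+=1
  rem %%A is the line number added by /n, %%B is the dialog ID
  set array_line[!i!]=%%A
  set array_name[!i!]=%%B
)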
The following example should create four files named 1003.rc, 1004.rc, 1005.rc and 1006.rc.

Any help would be greatly appreciated :)

Code: Select all

1003 DIALOGEX 0, 0, 227, 93
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | WS_POPUP | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_APPWINDOW
CAPTION "Run"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 11, 21, 20 
   CONTROL "Type the name of a program, folder, document, or Internet resource, and Windows will open it for you.", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 36, 11, 182, 22 
   CONTROL "&Open:", 12305, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 7, 39, 24, 10 
   CONTROL "", 12298, COMBOBOX, CBS_DROPDOWN | CBS_AUTOHSCROLL | CBS_DISABLENOSCROLL | WS_CHILD | WS_VISIBLE | WS_VSCROLL | WS_TABSTOP, 36, 37, 183, 200 
   CONTROL "Run in separate &memory space", 12306, BUTTON, BS_AUTOCHECKBOX | WS_CHILD | WS_VISIBLE | WS_DISABLED | WS_TABSTOP, 40, 50, 183, 10 
   CONTROL "OK", 1, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 62, 70, 50, 14 
   CONTROL "Cancel", 2, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 116, 70, 50, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 170, 70, 50, 14 
}

1004 DIALOGEX 20, 20, 227, 69
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | DS_CENTER | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "Missing Shortcut"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 8, "MS Shell Dlg"
{
   CONTROL 134, -1, STATIC, SS_ICON | SS_REALSIZECONTROL | WS_CHILD | WS_VISIBLE, 7, 7, 21, 20 
   CONTROL "Windows is searching for %s. To locate the file yourself, click Browse.", 102, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 35, 7, 187, 30 
   CONTROL "Cancel", 2, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 169, 47, 50, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 115, 47, 50, 14 
}
1005 DIALOGEX 0, 0, 259, 75
STYLE DS_SHELLFONT | DS_MODALFRAME | DS_NOIDLEMSG | WS_POPUP | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_APPWINDOW
CAPTION "Run"
LANGUAGE LANG_ENGLISH, SUBLANG_ENGLISH_US
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 3, 16, 16 
   CONTROL "Type the name of a program, folder, document, or Internet resource", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 40, 7, 212, 11 
   CONTROL "&Open:", 12305, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 7, 25, 32, 8 
   CONTROL "", 12298, COMBOBOX, CBS_DROPDOWN | CBS_AUTOHSCROLL | CBS_DISABLENOSCROLL | WS_CHILD | WS_VISIBLE | WS_VSCROLL | WS_TABSTOP, 40, 22, 210, 200 
   CONTROL "Run in separate &memory space", 12306, BUTTON, BS_AUTOCHECKBOX | WS_CHILD | WS_VISIBLE | WS_DISABLED | WS_TABSTOP, 7, 55, 97, 10 
   CONTROL "", 12326, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 40, 37, 16, 16 
   CONTROL "This task will be created with administrative privileges.", 12327, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 54, 38, 200, 11 
   CONTROL "OK", 1, BUTTON, BS_DEFPUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 108, 54, 45, 14 
   CONTROL "Cancel", 2, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 157, 54, 45, 14 
   CONTROL "&Browse...", 12288, BUTTON, BS_PUSHBUTTON | WS_CHILD | WS_VISIBLE | WS_TABSTOP, 205, 54, 45, 14 
}

1006 DIALOG 0, 0, 240, 55
FONT 9, "Segoe UI"
{
   CONTROL 160, 12297, STATIC, SS_ICON | WS_CHILD | WS_VISIBLE, 7, 3, 16, 16 
   CONTROL "Type the name of a program, folder, document, or Internet resource", 12289, STATIC, SS_LEFT | WS_CHILD | WS_VISIBLE | WS_GROUP, 40, 7, 212, 11 
}
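
As a reference for the expected result, the splitting rule described in this post can be sketched in Python (this is not the batch solution asked for). The names `START` and `split_dialogs` are made up for this illustration, and the sketch assumes the file has already been decoded into a list of lines without line terminators.

```python
import re

# Start line: a number in the first column, then a word beginning with DIALOG.
START = re.compile(r"^(\d+) DIALOG")


def split_dialogs(lines):
    """Map each output filename to the lines that belong to it."""
    blocks = {}
    current = None  # list that receives lines of the open block, or None
    for line in lines:
        match = START.match(line)
        if match:
            # A start line always opens a new block, even when the blank
            # line that should have closed the previous block is missing.
            current = [line]
            blocks[match.group(1) + ".rc"] = current
        elif current is not None:
            # The terminating blank line is kept in the output file.
            current.append(line)
            if line.strip() == "":
                current = None
    return blocks
```

Lines that appear before the first start line, or between a blank line and the next start line, are dropped in this sketch.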

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

#2 Post by aGerman » 05 Jan 2019 09:21

That might work for you:

Code: Select all

@echo off &setlocal
set "ressource=Dialog.rc"

setlocal EnableDelayedExpansion

for /f %%A in ('type "!ressource!"^|find /c /v ""') do set /a "line_cnt=%%A"

set "i=-1"
for /f "tokens=1,2 delims=: " %%A in ('type "!ressource!"^|findstr /nrbc:"[0-9][0-9]* DIALOG"') do (
  set /a "i+=1"
  set "array_begin[!i!]=%%A"
  set "array_name[!i!]=%%B"
)

for /l %%i in (1 1 %i%) do (
  set /a "idx=%%i-1"
  for /f %%j in ("!idx!") do set /a "array_end[%%j]=!array_begin[%%i]!-1"
)
set /a "array_end[%i%]=line_cnt"


<"!ressource!" (
  for /l %%i in (2 1 %array_begin[0]%) do set /p "="
  for /l %%i in (0 1 %i%) do (
    >"!array_name[%%i]!.rc" (
      for /l %%j in (!array_begin[%%i]! 1 !array_end[%%i]!) do (
        set "line=" &set /p "line="
        echo(!line!
      )
    )
  )
)
Steffen

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México

#3 Post by Aacini » 05 Jan 2019 12:32

Simpler...

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "last=1"
< Dialog.rc (
   for /F "tokens=1,2 delims=: " %%a in ('(type Dialog.rc ^& echo 0 DIALOG^) ^| findstr /N /R /C:"^[0-9][0-9]* DIALOG"') do (
      set /A "lines=%%a-last, last=%%a"
      if !lines! gtr 0 (
         set /P "line="
         for /L %%i in (2,1,!lines!) do (
            echo(!line!
            set "line=" & set /P "line=" 
         )
         if defined line echo(!line!
      ) > "!file!.rc"
      set "file=%%b"
   )
)
Antonio

Blazer
Posts: 14
Joined: 20 Dec 2018 09:47

#4 Post by Blazer » 06 Jan 2019 06:47

@aGerman
@Aacini

Thank you. Both solutions create the individual files, but the files contain garbage characters :(

I assume this is because the input file is Unicode. Notepad++ reports that it has "UCS-2 LE BOM" encoding.

I tried changing the following section of code

Code: Select all

< Dialog.rc (
to

Code: Select all

type Dialog.rc ^|(
but that did not work

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

#5 Post by aGerman » 06 Jan 2019 09:36

Blazer wrote:
06 Jan 2019 06:47
I assume this is because the input file is unicode, Notepad++ reports it has "UCS-2 LE BOM" encoding
Don't rely on NP++. It can't handle UTF-16 (it doesn't even know of UTF-16). This was reported as a bug well over 10 years ago, but the developers ignore it. Thus, they ignore the existence of languages like Japanese, Chinese and Korean on Windows. That's weird, and it is the main reason why I stopped using NP++. Your file is most likely UTF-16 LE encoded.

I don't think you'll ever have anything other than ASCII-compliant content. You could use TYPE and redirect the output to a temporary file. Then process the content of the temporary file to generate the new files.
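
The same "convert first, then split" idea can be sketched outside of batch. The following Python snippet is only an illustration under the assumption stated above (the content is pure ASCII stored as UTF-16 with a BOM); the function name `utf16_to_ascii` and the file names in the usage note are hypothetical.

```python
def utf16_to_ascii(src_path, dst_path):
    """Copy a UTF-16 text file to a single-byte ASCII temporary file."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    # newline="" keeps the original CR/LF line endings untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    # Raises UnicodeEncodeError if the ASCII assumption does not hold,
    # instead of silently writing damaged characters.
    data = text.encode("ascii")
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

A call such as `utf16_to_ascii("Dialog.rc", "Dialog.tmp")` would produce a file that line-oriented tools can read byte by byte.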

Steffen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#6 Post by dbenham » 06 Jan 2019 23:37

This is very simple to perform with JREPL.BAT :D

Assuming the input file is indeed UTF-16 LE, and you want your output to be the same, then:

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f dialog.rc
The /jq option specifies that the 2nd argument is JScript code.

The /utf option treats input as UTF-16 LE.

The /f option followed by the file specifies the input file.

The first argument is the regular expression search that matches an integer at the beginning of a line, followed by a space and DIALOG. The integer is captured in $1.

The 2nd argument is JScript that is executed for each match. First $txt=$0 preserves the matching text without change. Then openOutput opens an output file with the correct name. The false argument means do not append, and the true argument specifies UTF-16 LE output.
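
For readers without JREPL, the "switch the output file at every match" idea can be sketched in Python. This is not how JREPL is implemented; `split_utf16` and its `out_dir` parameter are hypothetical names, and writing a BOM to every output file is a choice made for this sketch.

```python
import os
import re

# An integer at the beginning of a line, followed by a space and DIALOG.
START = re.compile(r"^(\d+) DIALOG")


def split_utf16(src_path, out_dir):
    """Split a UTF-16 file into <number>.rc files, also UTF-16 LE."""
    out = None
    written = []
    # newline="" preserves the original line terminators on read and write.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        try:
            for line in src:
                match = START.match(line)
                if match:
                    # Each match closes the previous file and opens the next.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, match.group(1) + ".rc")
                    out = open(path, "w", encoding="utf-16-le", newline="")
                    out.write("\ufeff")  # explicit little-endian BOM
                    written.append(path)
                if out is not None:
                    # Blank lines go to whichever file is currently open.
                    out.write(line)
        finally:
            if out is not None:
                out.close()
    return written
```

Lines that precede the first match are simply skipped here. Passing the output folder explicitly avoids any dependence on the current directory.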

If the input file is in some other encoding, then the command can easily be changed to work with most any encoding. Just tell me what encoding you are using.


Dave Benham

Blazer
Posts: 14
Joined: 20 Dec 2018 09:47

#7 Post by Blazer » 07 Jan 2019 03:20

aGerman wrote:
06 Jan 2019 09:36

Don't rely on NP++. It can't handle UTF-16 (it doesn't even know of UTF-16). This was reported as a bug well over 10 years ago, but the developers ignore it. Thus, they ignore the existence of languages like Japanese, Chinese and Korean on Windows. That's weird, and it is the main reason why I stopped using NP++. Your file is most likely UTF-16 LE encoded.
@aGerman
How can I find the real encoding?
Which editor would you recommend to use?

@dbenham
Thank you, I will try JREPL.BAT today. :)

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

#8 Post by aGerman » 07 Jan 2019 11:37

Blazer wrote:
07 Jan 2019 03:20
How can I find the real encoding?
This is difficult. Usually text editors use the Byte Order Mark (BOM), or they read a few thousand characters and apply statistical or heuristic methods to guess the encoding. In your case just open the file in a HEX editor. If the first two bytes are FF FE, it will be UTF-16 LE (because that's the BOM for it).
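
The BOM check can also be done without a HEX editor. The following Python sketch only looks at the first bytes; `bom_name` and `sniff_file` are hypothetical helper names, and a file without a BOM yields `None` because guessing would require the heuristics mentioned above.

```python
# Longer signatures come first: the UTF-32 LE BOM starts with the
# same two bytes (FF FE) as the UTF-16 LE BOM.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32 LE"),
    (b"\x00\x00\xfe\xff", "UTF-32 BE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16 LE"),
    (b"\xfe\xff", "UTF-16 BE"),
]


def bom_name(head):
    """Return the encoding announced by a leading BOM, or None."""
    for signature, name in BOMS:
        if head.startswith(signature):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as handle:
        return bom_name(handle.read(4))
```

One caveat of this sketch: a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE, which should not occur in a normal text file.
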
Which editor would you recommend to use?
Any that supports highlighting for the languages in which you're writing code. I'm used to PSPad, but that's certainly not the only alternative you have.

JREPL is really a powerful tool for that kind of task. The advantage is that you're able to keep UTF-16, which would be a pain with the on-board utilities in Batch.

Steffen

Blazer
Posts: 14
Joined: 20 Dec 2018 09:47

#9 Post by Blazer » 08 Jan 2019 03:07

Thank you to everyone who helped with this solution :D

Blazer
Posts: 14
Joined: 20 Dec 2018 09:47

#10 Post by Blazer » 15 Jan 2019 03:56

After testing, I decided the best option was to use JREPL.BAT.

Why is JREPL unable to find the input file, which is in the current directory?

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f dialog.rc
JScript runtime error opening input file: File not found
If I specify the full path to the input file, then JREPL puts the output files into the parent folder instead of the folder containing the input file.

Code: Select all

jrepl "^(\d+) DIALOG" "$txt=$0; openOutput($1+'.rc',false,true);" /jq /utf /f C:\Temp\Work\dialog.rc
dbenham wrote:
06 Jan 2019 23:37
The first argument is the regular expression search that matches an integer at the beginning of a line, followed by a space and DIALOG. The integer is captured in $1.
I want to change the match string to the following regular expression and capture the first non-whitespace token in $1 to use as the filename.

Code: Select all

^[%space%%tab%]*[^%space%%tab%][^%space%%tab%]*[%space%%tab%][%space%%tab%]*DIALOG
EDIT: the correct command line and regex string are as follows:

Code: Select all

jrepl "^[ \t]*([^ \t][^ \t]*)[ \t][ \t]*DIALOG" "$txt=$0; openOutput('C:\\Temp\\Work\\'+$1+'.rc', false, true);" /jq /utf /f "C:\Temp\Work\dialog.rc"
Thank you :)
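
The intent of that regular expression (optional leading blanks, then the first non-whitespace token, then blanks, then DIALOG) can be checked in isolation. The Python sketch below uses an equivalent pattern with `+` in place of the repeated character classes; `dialog_name` is a hypothetical helper and `IDD_ABOUT` is an invented sample ID.

```python
import re

# Optional spaces/tabs, one token without spaces/tabs (captured),
# at least one space/tab, then the word DIALOG (DIALOGEX also matches).
START = re.compile(r"^[ \t]*([^ \t]+)[ \t]+DIALOG")


def dialog_name(line):
    """Return the token to use as the filename, or None for other lines."""
    match = START.match(line)
    return match.group(1) if match else None


print(dialog_name("1003 DIALOGEX 0, 0, 227, 93"))
print(dialog_name("\t IDD_ABOUT \t DIALOG 0, 0, 240, 55"))
print(dialog_name('CAPTION "Run"'))
```

Because the token is no longer restricted to digits, symbolic dialog IDs are accepted as well as numeric ones.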