Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Posted: 23 Jul 2019 12:10
by mwaychoff
Thanks for the continued support. How can I handle the percent (%) character? The following does not seem to work as expected...

::SyntaxDescr MediaWiki pmWiki
::NumberAlpha ## "## %alpha%"
call jrepl_v8_3.bat "##" "## %alpha%" /f C:\Temp\jrepl\output.rtf /o - /l

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Posted: 25 Jul 2019 15:22
by mwaychoff
I have a few thousand files from MediaWiki that are part of a translation to pmWiki I am working on. Many of these files have underscores in the filename, but links in MediaWiki use spaces, as follows...

Filenames have underscores:
ACI_318_08.pdf

MediaWiki links have spaces:
[[Media:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this does NOT work in pmWiki:
[[Attach:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this DOES work in pmWiki:
[[Attach:ACI_318_08.pdf|ACI 318-08 - Imperial]]

Please advise how to replace each space with an underscore between '[[Attach:' and '.pdf|'.
I have many thousands of broken links of this nature to attend to.

Much appreciated.

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Posted: 15 Aug 2019 23:32
by fwmartins
I apologize for my bad English (Google Translate).
I want to thank everyone, and Dave Benham in particular, for this fantastic tool. I am still crawling in the JREPL world; I have studied the examples and adapted them to my needs, and I'm happy with the results. I would like to share two of them.

Example:
Search for the lines of text containing the word Joaçaba

jrepl "^(.+?)Joaçaba.*$" "if ($1!=prev) {$1;$0} " /jmatch /jbeg "prev=''" /f input.txt >output.txt

Cleaning HTML

type input.html | jrepl "=?\r?\n" "" /m | jrepl "<tr>(.*?)</tr>" "$1" /jmatch /m >output.html


Maybe someone can improve them.

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Posted: 12 Sep 2019 21:51
by MarzSyndrome
Hi there! Thought I'd finally register, a year and a half after first discovering JREPL. :)

It's always come in handy when I need to adjust files on-the-fly within batch scripts, and one of the things I liked was how you could choose a specific line, or range of lines, to remove from a text file altogether. For example, this always worked for me:

Code:

jrepl "^" "" /k 0 /exc 30:31 /u /f main.c /o main2.c
... that is, until v8.0 (and beyond). For some reason, when the new file is written it remains identical to the original, and no lines are removed. It's as if something got changed along the way and this method doesn't work anymore. I consulted the v8.0 changelog to see what had changed and can only conclude that it must have something to do with the /K parameter. Unfortunately I'm not smart enough to determine whether I'm missing another setting now, or whether there is in fact a bug in the current JREPL build. The latter may be the case: after further tinkering I discovered that I could remove line 1, and only line 1. Referencing any line other than the first, whether stand-alone or as part of a range, results in no changes being made.

I sort-of found a way around it by using hex-code references and the /X and /M parameters but this seems very long-winded to remove just two lines. So I thought I'd mention the issue just in case I have indeed discovered a bug in the code.

Feel free to let me know if I'm missing anything important here. Thanks! :)

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Posted: 13 Sep 2019 05:42
by dbenham
That is most definitely a serious bug!

Not sure how long it will take, but I will definitely fix that.

Thanks for reporting - please don't hesitate to report any suspect behavior in the future.


Dave Benham

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 13 Sep 2019 11:20
by dbenham
Well, that was an easy fix. I replaced version 8.3 with version 8.4.

I also updated the main release to version 8.4 at the original post in this thread.

Thanks again MarzSyndrome - you were a big help.


Dave Benham

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 19 Sep 2019 02:36
by tecnictto
Hi, this script is very useful and functional. Congratulations.
I want to understand how to search and replace characters in a file, for example:
& = &amp;
< = &lt;
> = &gt;
© = &copy;
® = &reg;
´ = &acute;
« = &laquo;
» = &raquo;
¡ = &iexcl;
¿ = &iquest;
À = &Agrave;
à = &agrave;
Á = &Aacute;
á = &aacute;
Â = &Acirc;
â = &acirc;
Ã = &Atilde;
ã = &atilde;
Ä = &Auml;
ä = &auml;
Å = &Aring;
å = &aring;
Æ = &AElig;
æ = &aelig;
Ç = &Ccedil;
ç = &ccedil;
Ð = &ETH;
ð = &eth;
È = &Egrave;
è = &egrave;
É = &Eacute;
é = &eacute;
Ê = &Ecirc;
ê = &ecirc;
Ë = &Euml;
ë = &euml;
Ì = &Igrave;
ì = &igrave;
Í = &Iacute;
í = &iacute;
Î = &Icirc;
î = &icirc;
Ï = &Iuml;
ï = &iuml;
Ñ = &Ntilde;
ñ = &ntilde;
Ò = &Ograve;
ò = &ograve;
Ó = &Oacute;
ó = &oacute;
Ô = &Ocirc;
ô = &ocirc;
Õ = &Otilde;
õ = &otilde;
Ö = &Ouml;
ö = &ouml;
Ø = &Oslash;
ø = &oslash;
Ù = &Ugrave;
ù = &ugrave;
Ú = &Uacute;
ú = &uacute;
Û = &Ucirc;
û = &ucirc;
Ü = &Uuml;
ü = &uuml;
Ý = &Yacute;
ý = &yacute;
ÿ = &yuml;
Þ = &THORN;
þ = &thorn;
ß = &szlig;
§ = &sect;
¶ = &para;
µ = &micro;
¦ = &brvbar;
± = &plusmn;
· = &middot;
¨ = &uml;
¸ = &cedil;
ª = &ordf;
º = &ordm;
¬ = &not;
_ = &shy;
¯ = &macr;
° = &deg;
¹ = &sup1;
² = &sup2;
³ = &sup3;
¼ = &frac14;
½ = &frac12;
¾ = &frac34;
× = &times;
÷ = &divide;
¢ = &cent;
£ = &pound;
¤ = &curren;

I have a txt file in which I would like to replace these characters, because the XML reader does not recognize the accented characters.
Many thanks in advance

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 21 Sep 2019 22:29
by dbenham
That is pretty simple with the /T FILE option, and ADO.

First modify your translations into two txt files representing your find and replace terms, one per line.

find.txt (Must be encoded as Unicode - probably UTF-16LE or UTF-8)

Code:

&
<
>
©
®
´
«
»
¡
¿
À
à
Á
á
Â
â
Ã
ã
Ä
ä
Å
å
Æ
æ
Ç
ç
Ð
ð
È
è
É
é
Ê
ê
Ë
ë
Ì
ì
Í
í
Î
î
Ï
ï
Ñ
ñ
Ò
ò
Ó
ó
Ô
ô
Õ
õ
Ö
ö
Ø
ø
Ù
ù
Ú
ú
Û
û
Ü
ü
Ý
ý
ÿ
Þ
þ
ß
§
¶
µ
¦
±
·
¨
¸
ª
º
¬
_
¯
°
¹
²
³
¼
½
¾
×
÷
¢
£
¤
repl.txt (ASCII encoding is fine)

Code:

&amp;
&lt;
&gt;
&copy;
&reg;
&acute;
&laquo;
&raquo;
&iexcl;
&iquest;
&Agrave;
&agrave;
&Aacute;
&aacute;
&Acirc;
&acirc;
&Atilde;
&atilde;
&Auml;
&auml;
&Aring;
&aring;
&AElig;
&aelig;
&Ccedil;
&ccedil;
&ETH;
&eth;
&Egrave;
&egrave;
&Eacute;
&eacute;
&Ecirc;
&ecirc;
&Euml;
&euml;
&Igrave;
&igrave;
&Iacute;
&iacute;
&Icirc;
&icirc;
&Iuml;
&iuml;
&Ntilde;
&ntilde;
&Ograve;
&ograve;
&Oacute;
&oacute;
&Ocirc;
&ocirc;
&Otilde;
&otilde;
&Ouml;
&ouml;
&Oslash;
&oslash;
&Ugrave;
&ugrave;
&Uacute;
&uacute;
&Ucirc;
&ucirc;
&Uuml;
&uuml;
&Yacute;
&yacute;
&yuml;
&THORN;
&thorn;
&szlig;
&sect;
&para;
&micro;
&brvbar;
&plusmn;
&middot;
&uml;
&cedil;
&ordf;
&ordm;
&not;
&shy;
&macr;
&deg;
&sup1;
&sup2;
&sup3;
&frac14;
&frac12;
&frac34;
&times;
&divide;
&cent;
&pound;
&curren;
Let's assume both your find.txt and your source.txt are encoded as UTF-8. Then the following will convert the source into ASCII (assuming all needed translations are accounted for):

Code:

jrepl "find.txt|utf-8" repl.txt /t file /f "source.txt|utf-8" /o output.txt
If the files are UTF-16LE, then

Code:

jrepl "find.txt|unicode" repl.txt /t file /f "source.txt|unicode" /o output.txt
To learn more about using ADO with JREPL, use JREPL /?/I and JREPL /?/O

All the character sets supported by ADO can be listed by using JREPL /?CHARSET/

To learn more about the /T option, use JREPL /??/T
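
As a rough illustration of what a line-paired find/replace table does, here is a small JavaScript sketch. It is not JREPL's implementation; the function name `translate` and the three-entry sample tables are made up for the example. Each find term is paired with the replacement on the same line number, and all pairs are applied in one pass over the text.

```javascript
// Hypothetical sketch of a line-paired find/replace table; not JREPL source code.
// findLines[i] is replaced by replLines[i], mirroring find.txt and repl.txt above.
function translate(text, findLines, replLines) {
  const table = new Map();
  findLines.forEach((f, i) => table.set(f, replLines[i]));

  // Escape regex metacharacters so every find term is matched literally.
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(findLines.map(escape).join('|'), 'g');

  // One pass over the text: a freshly inserted "&" (as in "&lt;") is never translated again.
  return text.replace(pattern, (match) => table.get(match));
}

const find = ['&', '<', 'é'];
const repl = ['&amp;', '&lt;', '&eacute;'];
console.log(translate('café & <b>', find, repl)); // caf&eacute; &amp; &lt;b>
```

The single pass is the point of the sketch: running one replacement after another would turn the "&" of an already inserted entity into "&amp;" a second time.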


Dave Benham

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 22 Sep 2019 08:07
by dbenham
An alternative to using named character references is to use numeric character references of the form &#nnnn; so that you don't need any special table of translations - you can use very simple JScript logic with the /JQ option.

The following will translate any UTF-8 document into pure ASCII by transforming all Unicode code points >= 128, as well as & < > and _:

Code:

jrepl "[\x80-\u{FFFFFF}&<>_]" "$txt='&#'+$0.charCodeAt(0)+';'" /xseq /jq /f "source.txt|utf-8" /o "output.txt"
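
For anyone who wants to try the replacement logic outside of JREPL, the same idea can be sketched in plain JavaScript. The function name `toNumericRefs` is made up for this example, and the character class is written with ordinary JavaScript escapes instead of the /XSEQ syntax used above.

```javascript
// Sketch only: encode &, <, >, _ and every UTF-16 code unit >= 0x80 as &#nnnn;
function toNumericRefs(text) {
  return text.replace(/[\u0080-\uFFFF&<>_]/g, (ch) => '&#' + ch.charCodeAt(0) + ';');
}

console.log(toNumericRefs('Joaçaba <é>')); // Joa&#231;aba &#60;&#233;&#62;
```

In this sketch charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two references (one per surrogate) rather than one.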

Dave Benham

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 28 Oct 2019 06:16
by zimxavier
Hi!
I would like to extract all strings between curly brackets after FUNCTION

input.txt

Code:

				FUNCTION = {}
				FUNCTION = { value1
					value2
					value3
				}
				FUNCTION = { 
				value4
					value5
				}
WRONG1 = {wrong1}

				FUNCTION = { value6 value7 value8}  WRONG2 = {wrong2}

				FUNCTION ={value9
					value10
					value11} 				FUNCTION ={value12
					value13
}
What I need in this case:

Code:

value1
value2
value3
value4
value5
value6
value7
value8
value9
value10
value11
value12
value13
My latest script:
call JREPL "\w+" "$txt=$0" /jmatchq /INC "/\\bFUNCTION\\s*=\\s*\\{/:/\\}/" /f "input.txt" > "output.txt"

output.txt

Code:

FUNCTION
FUNCTION
value1
value2
value3
FUNCTION
value6
value7
value8
WRONG2
wrong2
FUNCTION
value9
value10
value11
FUNCTION
value12
I tried hard to understand how the /INC parameter works, but to no avail. Maybe it shouldn't be used for what I need.

Thanks for any help.

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

Posted: 28 Oct 2019 09:02
by dbenham
The /INC option does not help because it is line based - all text within the included lines is searched. But you want to search only the text on the line(s) that is between the braces.

You need the /P and /PREPL options to specify the regions of the file to search. You need /M because the braced text may span multiple lines.

Note that /XSEQ encodings are implicitly available for /P regexes. You need \c for ^ in case you use CALL, because CALL will double all quoted ^ characters.

Code:

call jrepl "\w+" "" /match /m /p "FUNCTION\s*=\s*\{([\c}]+)}" /prepl "{$1}" /f "input.txt" /o "output.txt"
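
The two-stage idea (first restrict the search to regions, then match inside each region) can be sketched in plain JavaScript as follows. This is only an illustration of the technique, not how JREPL implements /P; the function name `extractValues` and the shortened sample input are made up for the example.

```javascript
// Sketch of the two-stage search: stage 1 isolates the braced regions that follow
// FUNCTION, stage 2 collects every \w+ token inside each region.
function extractValues(text) {
  // Same region pattern as the /P argument, with ^ written directly instead of \c.
  const region = /FUNCTION\s*=\s*\{([^}]+)\}/g;
  const values = [];
  for (const m of text.matchAll(region)) {
    // m[1] is the text between the braces; it may span several lines.
    values.push(...(m[1].match(/\w+/g) || []));
  }
  return values;
}

const sample = [
  'FUNCTION = {}',
  'FUNCTION = { value1',
  '  value2',
  '}',
  'WRONG1 = {wrong1}',
  'FUNCTION = { value3 value4}  WRONG2 = {wrong2}',
].join('\n');

console.log(extractValues(sample).join('\n'));
```

Because the region pattern requires at least one character between the braces, the empty `FUNCTION = {}` contributes nothing, and the WRONG1/WRONG2 blocks are never searched at all.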

Dave Benham

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 29 Feb 2020 21:35
by dbenham
Here is JREPL version 8.5
JREPL8.5.zip
Downloaded 1701 times in 5 months from the main release page while v8.5 was the current release

Summary of Changes
JREPL /?HISTORY wrote:
2020-02-29 v8.5: Added /EOL option to set the end of line terminator.
Added the eol global jscript variable.
Doc fix - No EOL if /RTN option specifies a :LineNumber.
. . .
New /EOL option
JREPL /?/EOL wrote:
/EOL EndOfLineString

Write lines using EndOfLineString as the line terminator.
Standard JScript escape sequences may be used.
The default is "\r\n" (CarriageReturn LineFeed).
The value may be set to an empty string to eliminate linefeeds
from the output.

/EOL has no effect if the /M option is used unless /MATCH,
/JMATCH, or /JMATCHQ is also used.

Note that /EOL does not affect input.ReadLine or output.WriteLine
methods in user supplied JScript. ReadLine always accepts both
\r\n and \n as line terminators. And WriteLine always terminates
lines with \r\n.
The /U option remains unchanged:
JREPL /?/U wrote:
/U - Write lines using a Unix line terminator \n instead of Windows
terminator of \r\n. This is the same as using /EOL "\n".
See /EOL help for more info.
New eol JScript variable for user supplied JScript
JREPL /?JSCRIPT wrote: . . .
eol - The line terminator used when writing output lines. This is the
same value set by the /EOL option.
. . .
The most obvious use for this variable is to remove line feeds without using the /M option. For example, the following will remove the newline after any line that ends with a dash:

Code:

jrepl "-$" "$txt=$0;eol=''" /jq /jbegln "eol='\r\n'" /f input.txt /o -
The reason this is important is that it allows manipulating end of line while reading one line at a time, so there are no file size restrictions. The only other way to manipulate line terminators on a line by line basis is through the /M option, but that requires the entire file to fit in memory, which limits the size of the file that can be edited.
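
The line-at-a-time idea can be pictured with a small JavaScript sketch. It is an illustration only, not JREPL code: `writeLines` and its `write` callback are made-up names, and the per-line reset stands in for what /JBEGLN does in the command above.

```javascript
// Sketch of per-line terminator selection. Each line is written as soon as it is
// processed, so nothing here depends on holding the whole input in memory.
function writeLines(lines, write) {
  for (const line of lines) {
    let eol = '\r\n';        // reset before every line, like /JBEGLN "eol='\r\n'"
    if (/-$/.test(line)) {
      eol = '';              // a trailing dash suppresses this line's terminator
    }
    write(line + eol);
  }
}

const chunks = [];
writeLines(['multi-', 'line', 'text'], (s) => chunks.push(s));
console.log(JSON.stringify(chunks.join(''))); // "multi-line\r\ntext\r\n"
```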

Updated /RTN documentation
JREPL /?/RTN wrote:
/RTN ReturnVar[:[-]LineNumber]

Write the result to variable ReturnVar.

If the optional LineNumber is present, then only that specified
line within the result set is returned. A LineNumber of 1 is the
first line. A negative LineNumber is measured from the end of the
result set, so -1 is the last line. /RTN always breaks lines at
\r\n and \n - the /EOL value is ignored.

All byte codes except NULL (0x00) are preserved, regardless
whether delayed expansion is enabled or not. An error is thrown
and no value stored if the result contains NULL.

An error is thrown and no value stored if the value does not fit
within a variable. The maximum returned length varies depending
on the variable name and result content. The longest possible
returned length is 8179 bytes.

The line terminator of the last match is suppressed if /MATCH,
/JMATCH, or /JMATCHQ is used. There is also no line terminator
if LineNumber is specified.

/RTN uses a temporary output file to transfer the result to the
environment variable. By default the temporary file is written
as UTF-8. But the file is written using the CSCRIPT default code
page if the /XFILE option is used - the action may fail if the
result contains a character that cannot be mapped to the CSCRIPT
default code page.

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 11 Mar 2020 16:25
by dlawrence2060
Hi, I'm using JREPL inside Webpack with the Webpack ShellPlugin. My call works fine when used inside cmd, but when using the same script inside Webpack it gives weird results.

My command is

Code:

'call "./framework/config/JREPL.BAT" "(Error)\(([^()]*|\(([^()]*|\([^()]*\))*\))*\)" "Error(\q\q)" /xseq /f ./dist/index.html /o ./dist/indexFinal.html'
when used on this string

Code:

Error( skjdksjdskd() + "" + )
I get the results

Code:

Error()( skjdksjdskd() + "" + )
But when using it in cmd I get

Code:

Error("")
which is the expected result. Any ideas why this behaves differently?

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 13 Mar 2020 08:08
by dbenham
The above post is a follow-up question to my StackOverflow answer to this question.

Based on the OP's last comment on 2020-03-11, the solution is to escape the quotes and backslashes as \" and \\ when using Webpack ShellPlugin:

Code:

call "./framework/config/JREPL.BAT" \"(Error)\\(([\\c()]*|\\(([\\c()]*|\\([\\c()]*\\))*\\))*\\)\" \"Error(\\q\\q)\" /xseq /f ./dist/index.html /o ./dist/indexFinal.html
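
If there are several arguments to convert, the escaping itself is mechanical and could be scripted. The helper below is a hypothetical sketch (the name `escapeForShellPlugin` is made up); it only applies the two substitutions described above to one quoted argument. Swapping ^ for \c is a separate manual step that it does not attempt.

```javascript
// Hypothetical helper: apply the \" and \\ escaping to one quoted argument.
// Backslashes are doubled first so the backslashes added in front of the
// quotes are not doubled again.
function escapeForShellPlugin(arg) {
  return arg.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// String.raw keeps the backslashes in the sample exactly as typed.
console.log(escapeForShellPlugin(String.raw`"Error(\q\q)"`));
// \"Error(\\q\\q)\"
```

Applied to the replacement argument from the cmd version, this yields the same text as in the escaped command line above.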

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 11 Apr 2020 17:48
by ctm
Thanks for the excellent tool!

I need to parse an mxf file within a batch file; the mxf contains several special characters which pose a problem for batch: "!", '"' (double quote), ":", ...
I want to replace them with harmless characters for subsequent parsing.
As a first step I tried to replace "!" with "_" from within a batch file using this line:

call jrepl.bat "\b^!\b" "_" /f !Lineup! /M /o !dr!___lineup.mx

'Lineup' is the path to the source file; '!dr!' is the path to the target file "___lineup.mx"

What I get is this (example line):

original :
<Lineup uid="!MCLineup!DVB-S-DE" primaryProvider="!MCLineup!MainLineup" guid="d78688cca4aa41e788f6092c0beadcb0" />
output :
<Lineup uid="!MCLineup_DVB-S-DE" primaryProvider="!MCLineup_MainLineup" guid="d78688cca4aa41e788f6092c0beadcb0" />

So the "!" after the "=" is not replaced...
Do I need to specify something more on the command line?
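
The output shown is consistent with how \b behaves in a JScript/JavaScript regex, which can be checked with the sketch below. It assumes that the pattern reaching the regex engine is \b!\b (how the "!" has to be escaped in the batch file is a separate matter), and the sample line is a shortened copy of the one above.

```javascript
// Sketch assuming the pattern that reaches the regex engine is \b!\b.
// \b only matches between a word character (letter, digit, underscore)
// and a non-word character.
const line = '<Lineup uid="!MCLineup!DVB-S-DE" primaryProvider="!MCLineup!MainLineup" />';

// A "!" that follows a double quote has non-word characters on both sides of the
// position in front of it, so the leading \b cannot match there.
const withBoundaries = line.replace(/\b!\b/g, '_');

// Without \b every "!" is replaced.
const withoutBoundaries = line.replace(/!/g, '_');

console.log(withBoundaries);
console.log(withoutBoundaries);
```

If every "!" should be replaced, dropping the \b anchors from the search pattern would be the first thing to try.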