JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

mwaychoff
Posts: 4
Joined: 18 Jul 2019 11:06

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#436 Post by mwaychoff » 23 Jul 2019 12:10

Thanks for the continued support. How can I handle the percent (%) character? The following does not seem to work as expected...

::SyntaxDescr MediaWiki pmWiki
::NumberAlpha ## "## %alpha%"
call jrepl_v8_3.bat "##" "## %alpha%" /f C:\Temp\jrepl\output.rtf /o - /l

mwaychoff
Posts: 4
Joined: 18 Jul 2019 11:06

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#437 Post by mwaychoff » 25 Jul 2019 15:22

I have a few thousand files from MediaWiki that are part of a migration to pmWiki I am working on. Many of these files have underscores in the filename, but the links in MediaWiki use spaces, as follows...

Filenames have underscores:
ACI_318_08.pdf

MediaWiki links have spaces:
[[Media:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this does NOT work in pmWiki:
[[Attach:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this DOES work in pmWiki:
[[Attach:ACI_318_08.pdf|ACI 318-08 - Imperial]]

Please advise how to replace each space with an underscore between '[[Attach:' and '.pdf|'.
I have many thousands of broken links of this nature to attend to.

Much appreciated.
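
For reference, the transformation asked for here can be sketched outside JREPL. The following is a minimal Python illustration, not a JREPL command: the pattern and the helper name underscore_attach_names are assumptions made for this example. It rewrites only the file-name part between '[[Attach:' and '.pdf|' and leaves the link label untouched.

```python
import re

# Matches "[[Attach:<name>.pdf|" and captures the file name without its extension.
# The pattern and the helper name are illustrative assumptions, not JREPL syntax.
ATTACH_LINK = re.compile(r"(\[\[Attach:)([^\]|]*?)(\.pdf\|)")

def underscore_attach_names(text):
    """Replace spaces with underscores inside the file-name part of Attach: links."""
    return ATTACH_LINK.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_") + m.group(3),
        text,
    )

line = "[[Attach:ACI 318 08.pdf|ACI 318-08 - Imperial]]"
print(underscore_attach_names(line))
# prints: [[Attach:ACI_318_08.pdf|ACI 318-08 - Imperial]]
```

The replacement is computed by a function so that only the captured file name is modified; the text after the '|' keeps its spaces.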

fwmartins
Posts: 1
Joined: 15 Aug 2019 22:35

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#438 Post by fwmartins » 15 Aug 2019 23:32

I apologize for my bad English (Google Translate).
I want to thank everyone, and Dave Benham in particular, for this great, fantastic tool. I am still crawling in the JREPL world; I have studied the examples and adapted them to my needs. I'm happy with the results, and I would like to share two of them.

Example:
Search for the lines of text containing the word Joaçaba

jrepl "^(.+?)Joaçaba.*$" "if ($1!=prev) {$1;$0} " /jmatch /jbeg "prev=''" /f input.txt >output.txt

cleaning html

type input.html | jrepl "=?\r?\n" "" /m | jrepl "<tr>(.*?)</tr>" "$1" /jmatch /m >output.html


Maybe someone can improve them.

MarzSyndrome
Posts: 1
Joined: 12 Sep 2019 20:06

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#439 Post by MarzSyndrome » 12 Sep 2019 21:51

Hi there! Thought I'd register for the first time a year and a half after discovering JREPL for the first time. :)

It's always come in handy when I need to adjust files on-the-fly within batch scripts, and one of the things I liked was how you could choose a specific line, or range of lines, to remove from a text file altogether. For example, this always worked for me:

Code:

jrepl "^" "" /k 0 /exc 30:31 /u /f main.c /o main2.c

... that is, until v8.0 (and beyond). For some reason when the new file is written, it remains identical to the original, and no lines are removed. It's as if something got changed along the way and this method doesn't work anymore. I did consult the v8.0 changelog to see what had changed and can only conclude that it must be something to do with the /K parameter. But unfortunately I'm not smart enough to determine whether I'm missing another setting now, or if there is in fact a bug in the current JREPL build. Which may be the case, as after further tinkering I discovered that I could remove line 1..... and only line 1. Referencing any line other than the first - whether stand-alone or a range - results in no changes being made.

I sort-of found a way around it by using hex-code references and the /X and /M parameters but this seems very long-winded to remove just two lines. So I thought I'd mention the issue just in case I have indeed discovered a bug in the code.

Feel free to let me know if I'm missing anything important here. Thanks! :)

dbenham
Expert
Posts: 2289
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#440 Post by dbenham » 13 Sep 2019 05:42

That is most definitely a serious bug :!: :cry:

Not sure how long it will take, but I will definitely fix that.

Thanks for reporting - please don't hesitate to report any suspect behavior in the future.


Dave Benham

dbenham
Expert
Posts: 2289
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#441 Post by dbenham » 13 Sep 2019 11:20

Well, that was an easy fix. I replaced version 8.3 with version 8.4.

I also updated the main release to version 8.4 at the original post in this thread.

Thanks again MarzSyndrome - you were a big help.


Dave Benham

tecnictto
Posts: 1
Joined: 19 Sep 2019 02:32

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#442 Post by tecnictto » 19 Sep 2019 02:36

Hi, this script is very useful and functional. Congratulations.
I want to understand how to search and replace characters in a file, for example:
& = &amp;
< = &lt;
> = &gt;
© = &copy;
® = &reg;
´ = &acute;
« = &laquo;
» = &raquo;
¡ = &iexcl;
¿ = &iquest;
À = &Agrave;
à = &agrave;
Á = &Aacute;
á = &aacute;
Â = &Acirc;
â = &acirc;
Ã = &Atilde;
ã = &atilde;
Ä = &Auml;
ä = &auml;
Å = &Aring;
å = &aring;
Æ = &AElig;
æ = &aelig;
Ç = &Ccedil;
ç = &ccedil;
Ð = &ETH;
ð = &eth;
È = &Egrave;
è = &egrave;
É = &Eacute;
é = &eacute;
Ê = &Ecirc;
ê = &ecirc;
Ë = &Euml;
ë = &euml;
Ì = &Igrave;
ì = &igrave;
Í = &Iacute;
í = &iacute;
Î = &Icirc;
î = &icirc;
Ï = &Iuml;
ï = &iuml;
Ñ = &Ntilde;
ñ = &ntilde;
Ò = &Ograve;
ò = &ograve;
Ó = &Oacute;
ó = &oacute;
Ô = &Ocirc;
ô = &ocirc;
Õ = &Otilde;
õ = &otilde;
Ö = &Ouml;
ö = &ouml;
Ø = &Oslash;
ø = &oslash;
Ù = &Ugrave;
ù = &ugrave;
Ú = &Uacute;
ú = &uacute;
Û = &Ucirc;
û = &ucirc;
Ü = &Uuml;
ü = &uuml;
Ý = &Yacute;
ý = &yacute;
ÿ = &yuml;
Þ = &THORN;
þ = &thorn;
ß = &szlig;
§ = &sect;
¶ = &para;
µ = &micro;
¦ = &brvbar;
± = &plusmn;
· = &middot;
¨ = &uml;
¸ = &cedil;
ª = &ordf;
º = &ordm;
¬ = &not;
_ = &shy;
¯ = &macr;
° = &deg;
¹ = &sup1;
² = &sup2;
³ = &sup3;
¼ = &frac14;
½ = &frac12;
¾ = &frac34;
× = &times;
÷ = &divide;
¢ = &cent;
£ = &pound;
¤ = &curren;

I have a txt file in which I would like to replace these characters, because the XML reader does not recognize the accented characters.
Many thanks in advance

dbenham
Expert
Posts: 2289
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#443 Post by dbenham » 21 Sep 2019 22:29

That is pretty simple with the /T FILE option, and ADO.

First split your translations into two txt files, one holding the find terms and one holding the replace terms, with one term per line.

find.txt (Must be encoded as Unicode - probably UTF-16LE or UTF-8)

Code:

&
<
>
©
®
´
«
»
¡
¿
À
à
Á
á
Â
â
Ã
ã
Ä
ä
Å
å
Æ
æ
Ç
ç
Ð
ð
È
è
É
é
Ê
ê
Ë
ë
Ì
ì
Í
í
Î
î
Ï
ï
Ñ
ñ
Ò
ò
Ó
ó
Ô
ô
Õ
õ
Ö
ö
Ø
ø
Ù
ù
Ú
ú
Û
û
Ü
ü
Ý
ý
ÿ
Þ
þ
ß
§
¶
µ
¦
±
·
¨
¸
ª
º
¬
_
¯
°
¹
²
³
¼
½
¾
×
÷
¢
£
¤

repl.txt (ASCII encoding is fine)

Code:

&amp;
&lt;
&gt;
&copy;
&reg;
&acute;
&laquo;
&raquo;
&iexcl;
&iquest;
&Agrave;
&agrave;
&Aacute;
&aacute;
&Acirc;
&acirc;
&Atilde;
&atilde;
&Auml;
&auml;
&Aring;
&aring;
&AElig;
&aelig;
&Ccedil;
&ccedil;
&ETH;
&eth;
&Egrave;
&egrave;
&Eacute;
&eacute;
&Ecirc;
&ecirc;
&Euml;
&euml;
&Igrave;
&igrave;
&Iacute;
&iacute;
&Icirc;
&icirc;
&Iuml;
&iuml;
&Ntilde;
&ntilde;
&Ograve;
&ograve;
&Oacute;
&oacute;
&Ocirc;
&ocirc;
&Otilde;
&otilde;
&Ouml;
&ouml;
&Oslash;
&oslash;
&Ugrave;
&ugrave;
&Uacute;
&uacute;
&Ucirc;
&ucirc;
&Uuml;
&uuml;
&Yacute;
&yacute;
&yuml;
&THORN;
&thorn;
&szlig;
&sect;
&para;
&micro;
&brvbar;
&plusmn;
&middot;
&uml;
&cedil;
&ordf;
&ordm;
&not;
&shy;
&macr;
&deg;
&sup1;
&sup2;
&sup3;
&frac14;
&frac12;
&frac34;
&times;
&divide;
&cent;
&pound;
&curren;

Let's assume both your find.txt and your source.txt are encoded as UTF-8. Then the following will convert the source into ASCII (assuming all needed translations are accounted for):

Code:

jrepl "find.txt|utf-8" repl.txt /t file /f "source.txt|utf-8" /o output.txt

If the files are UTF-16LE, then:

Code:

jrepl "find.txt|unicode" repl.txt /t file /f "source.txt|unicode" /o output.txt

To learn more about using ADO with JREPL, use JREPL /?/I and JREPL /?/O

All the character sets supported by ADO can be listed by using JREPL /?CHARSET/

To learn more about the /T option, use JREPL /??/T
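
The idea of two parallel term lists can be illustrated outside JREPL. This is a minimal Python sketch, not a description of how /T FILE is implemented: the helper name build_translator is hypothetical, and the sketch assumes every find term is a literal string rather than a regex. It pairs the n-th find term with the n-th replacement and applies all of them in a single pass, so the & inside an inserted entity is not translated a second time.

```python
import re

def build_translator(find_terms, repl_terms):
    """Pair the n-th find term with the n-th replacement, as two parallel lists."""
    if len(find_terms) != len(repl_terms):
        raise ValueError("find and replace lists must have the same length")
    table = dict(zip(find_terms, repl_terms))
    # Literal search terms, longest first, so no term is shadowed by a shorter prefix.
    pattern = re.compile("|".join(
        re.escape(term) for term in sorted(table, key=len, reverse=True)))
    # One pass over the text: replacement output is never searched again.
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)

# A three-entry excerpt of the lists above; the real files hold one term per line.
translate = build_translator(["&", "<", "é"], ["&amp;", "&lt;", "&eacute;"])
print(translate("café & <tr>"))
# prints: caf&eacute; &amp; &lt;tr>
```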


Dave Benham

dbenham
Expert
Posts: 2289
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#444 Post by dbenham » 22 Sep 2019 08:07

An alternative to named character references is to use numeric character references of the form &#nnnn; so that you don't need any special table of translations. You can use very simple JScript logic with the /JQ option.

The following will translate any UTF-8 document into pure ASCII by transforming all Unicode code points >= 128, as well as & < > and _

Code:

jrepl "[\x80-\u{FFFFFF}&<>_]" "$txt='&#'+$0.charCodeAt(0)+';'" /xseq /jq /f "source.txt|utf-8" /o "output.txt"
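
The same transformation can be sketched in Python to show what the output looks like. This is an illustrative equivalent under stated assumptions, not part of JREPL: the names NEEDS_ESCAPE and to_numeric_refs are made up for this example, and the input is assumed to be already decoded text.

```python
import re

# Any character at or above U+0080, plus the four ASCII characters named in the post.
NEEDS_ESCAPE = re.compile(r"[\u0080-\U0010FFFF&<>_]")

def to_numeric_refs(text):
    """Replace each matching character with a decimal reference such as &#231;."""
    return NEEDS_ESCAPE.sub(lambda m: "&#%d;" % ord(m.group(0)), text)

print(to_numeric_refs("Joaçaba <b>"))
# prints: Joa&#231;aba &#60;b&#62;
```

Because every non-ASCII character is replaced, the result can be written out as plain ASCII no matter which accented characters the source contains.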

Dave Benham

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#445 Post by zimxavier » 28 Oct 2019 06:16

Hi!
I would like to extract all strings between curly brackets after FUNCTION.

input.txt

Code:

				FUNCTION = {}
				FUNCTION = { value1
					value2
					value3
				}
				FUNCTION = { 
				value4
					value5
				}
WRONG1 = {wrong1}

				FUNCTION = { value6 value7 value8}  WRONG2 = {wrong2}

				FUNCTION ={value9
					value10
					value11} 				FUNCTION ={value12
					value13
}

What I need in this case:

Code:

value1
value2
value3
value4
value5
value6
value7
value8
value9
value10
value11
value12
value13

My latest script:

call JREPL "\w+" "$txt=$0" /jmatchq /INC "/\\bFUNCTION\\s*=\\s*\\{/:/\\}/" /f "input.txt" > "output.txt"

output.txt

Code:

FUNCTION
FUNCTION
value1
value2
value3
FUNCTION
value6
value7
value8
WRONG2
wrong2
FUNCTION
value9
value10
value11
FUNCTION
value12

I tried hard to understand how the /INC parameter works, but to no avail. Maybe it shouldn't be used for what I need.

Thanks for any help.

dbenham
Expert
Posts: 2289
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

#446 Post by dbenham » 28 Oct 2019 09:02

The /INC option does not help because it is line based - all text within the included lines is searched. But you want to search only the text on the line(s) that is between the braces.

You need the /P and /PREPL options to specify the regions of the file to search. You need /M because the braced text may span multiple lines.

Note that /XSEQ escape sequences are implicitly available in /P regexes. Use \c in place of ^ when you invoke JREPL with CALL, because CALL doubles every quoted ^.

Code:

call jrepl "\w+" "" /match /m /p "FUNCTION\s*=\s*\{([\c}]+)}" /prepl "{$1}" /f "input.txt" /o "output.txt"
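
The two-stage approach (first isolate the braced regions, then search only inside them) can be sketched in Python. This is an illustration of the technique under stated assumptions, not of JREPL internals: the names REGION, WORD and function_values are made up for this example, and the sample text is a shortened version of the input above.

```python
import re

# Stage 1: find each braced region that follows "FUNCTION =" (it may span lines).
REGION = re.compile(r"FUNCTION\s*=\s*\{([^}]+)\}")
# Stage 2: pull the individual words out of the captured region only.
WORD = re.compile(r"\w+")

def function_values(text):
    """Return every word found between the braces of a FUNCTION = {...} block."""
    values = []
    for region in REGION.finditer(text):
        values.extend(WORD.findall(region.group(1)))
    return values

sample = (
    "FUNCTION = {}\n"
    "FUNCTION = { value1\n\tvalue2\n}\n"
    "WRONG1 = {wrong1}\n"
    "FUNCTION ={value3} WRONG2 = {wrong2}\n"
)
print("\n".join(function_values(sample)))
# prints value1, value2 and value3, one per line
```

Text outside the captured regions, such as the FUNCTION keyword itself and the WRONG blocks, never reaches the word search, which is why it does not show up in the result.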

Dave Benham
