JREPL.BAT v8.4 - regex text processor with support for text highlighting and alternate character sets

mwaychoff
Posts: 4
Joined: 18 Jul 2019 11:06

#436 Post by mwaychoff » 23 Jul 2019 12:10

Thanks for the continued support. How can I handle the percent (%) character? The following does not seem to work as expected...

::SyntaxDescr MediaWiki pmWiki
::NumberAlpha ## "## %alpha%"
call jrepl_v8_3.bat "##" "## %alpha%" /f C:\Temp\jrepl\output.rtf /o - /l

mwaychoff
Posts: 4
Joined: 18 Jul 2019 11:06

#437 Post by mwaychoff » 25 Jul 2019 15:22

I have a few thousand files from MediaWiki that are part of a translation to pmWiki I am working on. Many of these files have underscores in their filenames, but the links in MediaWiki use spaces, as follows:

Filenames have underscores:
ACI_318_08.pdf

MediaWiki links have spaces:
[[Media:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this does NOT work in pmWiki:
[[Attach:ACI 318 08.pdf|ACI 318-08 - Imperial]]

Right now this DOES work in pmWiki:
[[Attach:ACI_318_08.pdf|ACI 318-08 - Imperial]]

Please advise how to replace each space with an underscore between '[[Attach:' and '.pdf|'.
I have many thousands of broken links of this nature to attend to.

Much appreciated.
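
The substitution asked about here can be sketched outside JREPL. The following is a minimal Python sketch, assuming every link has the form `[[Attach:name.pdf|label]]`; the names `ATTACH_LINK` and `fix_attach_links` are invented for illustration and are not part of JREPL or pmWiki.

```python
import re

# Hypothetical pattern: captures the prefix, the file name, and the ".pdf|"
# suffix separately, so only the file name is modified.
ATTACH_LINK = re.compile(r"(\[\[Attach:)([^\]|]*?)(\.pdf\|)")

def fix_attach_links(text: str) -> str:
    """Replace spaces with underscores in Attach: file names only."""
    return ATTACH_LINK.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_") + m.group(3),
        text,
    )

line = "[[Attach:ACI 318 08.pdf|ACI 318-08 - Imperial]]"
print(fix_attach_links(line))
# prints: [[Attach:ACI_318_08.pdf|ACI 318-08 - Imperial]]
```

The link label after the `|` is left untouched, so the spaces in "ACI 318-08 - Imperial" survive.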

fwmartins
Posts: 1
Joined: 15 Aug 2019 22:35

#438 Post by fwmartins » 15 Aug 2019 23:32

I apologize for my bad English (Google Translate).
I want to thank everyone, and Dave Benham in particular, for this great tool. It is fantastic. I am still crawling in the JREPL world; I have studied the examples and adapted them to my needs. I'm happy with the results and would like to share.

Example: search for the lines of text containing the word Joaçaba:

jrepl "^(.+?)Joaçaba.*$" "if ($1!=prev) {$1;$0} " /jmatch /jbeg "prev=''" /f input.txt >output.txt

Cleaning HTML:

type input.html | jrepl "=?\r?\n" "" /m | jrepl "<tr>(.*?)</tr>" "$1" /jmatch /m >output.html


Maybe someone can improve them.

MarzSyndrome
Posts: 1
Joined: 12 Sep 2019 20:06

#439 Post by MarzSyndrome » 12 Sep 2019 21:51

Hi there! Thought I'd finally register, a year and a half after first discovering JREPL. :)

It's always come in handy when I need to adjust files on-the-fly within batch scripts, and one of the things I liked was how you could choose a specific line, or range of lines, to remove from a text file altogether. For example, this always worked for me:

jrepl "^" "" /k 0 /exc 30:31 /u /f main.c /o main2.c

... that is, until v8.0 (and beyond). For some reason, when the new file is written it remains identical to the original, and no lines are removed. It's as if something changed along the way and this method no longer works. I consulted the v8.0 changelog to see what had changed and can only conclude that it must have something to do with the /K parameter. Unfortunately, I'm not smart enough to determine whether I'm now missing another setting or whether there is in fact a bug in the current JREPL build.

The latter may be the case: after further tinkering I discovered that I could remove line 1, and only line 1. Referencing any line other than the first, whether stand-alone or as part of a range, results in no changes being made.

I sort-of found a way around it by using hex-code references and the /X and /M parameters but this seems very long-winded to remove just two lines. So I thought I'd mention the issue just in case I have indeed discovered a bug in the code.

Feel free to let me know if I'm missing anything important here. Thanks! :)
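
For reference, the intended effect of the command above (write every line except lines 30 and 31) can be sketched independently of JREPL. This is a minimal Python illustration for checking expected output, not a description of how JREPL implements /K or /EXC; `drop_lines` is a hypothetical helper name.

```python
def drop_lines(lines, first, last):
    """Return the lines with 1-based positions first..last (inclusive) removed."""
    return [ln for n, ln in enumerate(lines, start=1) if not first <= n <= last]

# A stand-in for main.c: 35 numbered lines.
source = [f"line {n}\n" for n in range(1, 36)]
result = drop_lines(source, 30, 31)

print(len(result))   # prints: 33
print(result[28], end="")  # prints: line 29
print(result[29], end="")  # prints: line 32
```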

dbenham
Expert
Posts: 2283
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#440 Post by dbenham » 13 Sep 2019 05:42

That is most definitely a serious bug!

Not sure how long it will take, but I will definitely fix that.

Thanks for reporting - please don't hesitate to report any suspect behavior in the future.


Dave Benham

dbenham
Expert
Posts: 2283
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#441 Post by dbenham » 13 Sep 2019 11:20

Well, that was an easy fix. I replaced version 8.3 with version 8.4.

I also updated the main release to version 8.4 at the original post in this thread.

Thanks again MarzSyndrome - you were a big help.


Dave Benham

tecnictto
Posts: 1
Joined: 19 Sep 2019 02:32

#442 Post by tecnictto » 19 Sep 2019 02:36

Hi, this script is very useful and functional. Congratulations.
I want to understand how to search and replace characters in a file, for example:
& = &amp;
< = &lt;
> = &gt;
© = &copy;
® = &reg;
´ = &acute;
« = &laquo;
» = &raquo;
¡ = &iexcl;
¿ = &iquest;
À = &Agrave;
à = &agrave;
Á = &Aacute;
á = &aacute;
Â = &Acirc;
â = &acirc;
Ã = &Atilde;
ã = &atilde;
Ä = &Auml;
ä = &auml;
Å = &Aring;
å = &aring;
Æ = &AElig;
æ = &aelig;
Ç = &Ccedil;
ç = &ccedil;
Ð = &ETH;
ð = &eth;
È = &Egrave;
è = &egrave;
É = &Eacute;
é = &eacute;
Ê = &Ecirc;
ê = &ecirc;
Ë = &Euml;
ë = &euml;
Ì = &Igrave;
ì = &igrave;
Í = &Iacute;
í = &iacute;
Î = &Icirc;
î = &icirc;
Ï = &Iuml;
ï = &iuml;
Ñ = &Ntilde;
ñ = &ntilde;
Ò = &Ograve;
ò = &ograve;
Ó = &Oacute;
ó = &oacute;
Ô = &Ocirc;
ô = &ocirc;
Õ = &Otilde;
õ = &otilde;
Ö = &Ouml;
ö = &ouml;
Ø = &Oslash;
ø = &oslash;
Ù = &Ugrave;
ù = &ugrave;
Ú = &Uacute;
ú = &uacute;
Û = &Ucirc;
û = &ucirc;
Ü = &Uuml;
ü = &uuml;
Ý = &Yacute;
ý = &yacute;
ÿ = &yuml;
Þ = &THORN;
þ = &thorn;
ß = &szlig;
§ = &sect;
¶ = &para;
µ = &micro;
¦ = &brvbar;
± = &plusmn;
· = &middot;
¨ = &uml;
¸ = &cedil;
ª = &ordf;
º = &ordm;
¬ = &not;
_ = &shy;
¯ = &macr;
° = &deg;
¹ = &sup1;
² = &sup2;
³ = &sup3;
¼ = &frac14;
½ = &frac12;
¾ = &frac34;
× = &times;
÷ = &divide;
¢ = &cent;
£ = &pound;
¤ = &curren;

I have a txt file in which I would like to replace these characters, because the XML reader does not recognize the accented characters.
Many thanks in advance

dbenham
Expert
Posts: 2283
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#443 Post by dbenham » 21 Sep 2019 22:29

That is pretty simple with the /T FILE option, and ADO.

First modify your translations into two txt files representing your find and replace terms, one per line.

find.txt (Must be encoded as Unicode - probably UTF-16LE or UTF-8)

&
<
>
©
®
´
«
»
¡
¿
À
à
Á
á
Â
â
Ã
ã
Ä
ä
Å
å
Æ
æ
Ç
ç
Ð
ð
È
è
É
é
Ê
ê
Ë
ë
Ì
ì
Í
í
Î
î
Ï
ï
Ñ
ñ
Ò
ò
Ó
ó
Ô
ô
Õ
õ
Ö
ö
Ø
ø
Ù
ù
Ú
ú
Û
û
Ü
ü
Ý
ý
ÿ
Þ
þ
ß
§
¶
µ
¦
±
·
¨
¸
ª
º
¬
_
¯
°
¹
²
³
¼
½
¾
×
÷
¢
£
¤

repl.txt (ASCII encoding is fine)

&amp;
&lt;
&gt;
&copy;
&reg;
&acute;
&laquo;
&raquo;
&iexcl;
&iquest;
&Agrave;
&agrave;
&Aacute;
&aacute;
&Acirc;
&acirc;
&Atilde;
&atilde;
&Auml;
&auml;
&Aring;
&aring;
&AElig;
&aelig;
&Ccedil;
&ccedil;
&ETH;
&eth;
&Egrave;
&egrave;
&Eacute;
&eacute;
&Ecirc;
&ecirc;
&Euml;
&euml;
&Igrave;
&igrave;
&Iacute;
&iacute;
&Icirc;
&icirc;
&Iuml;
&iuml;
&Ntilde;
&ntilde;
&Ograve;
&ograve;
&Oacute;
&oacute;
&Ocirc;
&ocirc;
&Otilde;
&otilde;
&Ouml;
&ouml;
&Oslash;
&oslash;
&Ugrave;
&ugrave;
&Uacute;
&uacute;
&Ucirc;
&ucirc;
&Uuml;
&uuml;
&Yacute;
&yacute;
&yuml;
&THORN;
&thorn;
&szlig;
&sect;
&para;
&micro;
&brvbar;
&plusmn;
&middot;
&uml;
&cedil;
&ordf;
&ordm;
&not;
&shy;
&macr;
&deg;
&sup1;
&sup2;
&sup3;
&frac14;
&frac12;
&frac34;
&times;
&divide;
&cent;
&pound;
&curren;

Let's assume both your find.txt and your source.txt are encoded as UTF-8. Then the following will convert the source into ASCII (assuming all needed translations are accounted for):

jrepl "find.txt|utf-8" repl.txt /t file /f "source.txt|utf-8" /o output.txt

If the files are UTF-16LE, then:

jrepl "find.txt|unicode" repl.txt /t file /f "source.txt|unicode" /o output.txt

To learn more about using ADO with JREPL, use JREPL /?/I and JREPL /?/O

All the character sets supported by ADO can be listed by using JREPL /?CHARSET/

To learn more about the /T option, use JREPL /??/T
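
The idea of two parallel lists, one find term per line paired with one replacement per line, can be illustrated outside JREPL. The following is a minimal Python sketch of that table-driven approach, not JREPL's implementation: it assumes each find term is literal text, and `build_translator` is a hypothetical helper name. Only a handful of the entries listed above are used.

```python
import re

def build_translator(find_terms, repl_terms):
    """Pair the n-th find term with the n-th replacement and apply all in one pass."""
    if len(find_terms) != len(repl_terms):
        raise ValueError("find and replace lists must have the same length")
    table = dict(zip(find_terms, repl_terms))
    # One alternation of escaped literals; longer terms first so they win.
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(table, key=len, reverse=True))
    )
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)

find_terms = ["&", "<", ">", "©", "é"]
repl_terms = ["&amp;", "&lt;", "&gt;", "&copy;", "&eacute;"]
translate = build_translator(find_terms, repl_terms)

print(translate("Café & <b>©</b>"))
# prints: Caf&eacute; &amp; &lt;b&gt;&copy;&lt;/b&gt;
```

Doing everything in a single pass matters here: if the terms were applied one after another, the & inside an already inserted entity such as &eacute; could be escaped a second time.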


Dave Benham

dbenham
Expert
Posts: 2283
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#444 Post by dbenham » 22 Sep 2019 08:07

An alternative to using named character references is to use numeric character references of the form &#nnnn;. Then you don't need any special table of translations; you can use very simple JScript logic with the /JQ option.

The following will translate any UTF-8 document into pure ASCII by transforming all Unicode code points >= 128, as well as & < > and _:

jrepl "[\x80-\u{FFFFFF}&<>_]" "$txt='&#'+$0.charCodeAt(0)+';'" /xseq /jq /f "source.txt|utf-8" /o "output.txt"
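
The same transformation can be sketched in Python to show what output to expect. This is an illustration under assumptions, not the JREPL code path: `to_numeric_refs` is a hypothetical name, and Python's `ord` works on whole code points, whereas JScript's `charCodeAt` works on UTF-16 code units, so results may differ for characters outside the Basic Multilingual Plane.

```python
import re

# Any non-ASCII character, plus the four ASCII characters named in the post.
NON_ASCII_OR_MARKUP = re.compile(r"[^\x00-\x7F]|[&<>_]")

def to_numeric_refs(text: str) -> str:
    """Replace each matched character with a decimal reference of the form &#nnnn;"""
    return NON_ASCII_OR_MARKUP.sub(lambda m: f"&#{ord(m.group(0))};", text)

converted = to_numeric_refs("a<b & é_")
print(converted)
# prints: a&#60;b &#38; &#233;&#95;
```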

Dave Benham
