FindRepl.bat:New regex utility to search and replace strings

dbenham
Expert
Posts: 2189
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: FindRepl.bat:New regex utility to search and replace str

#76 Post by dbenham » 26 Jan 2015 11:11

Interesting problem.

I imagine FindRepl.bat could handle this well, but I don't really know it well.

JREPL.BAT can definitely handle this.

There are some complicating factors to consider.

1) Apostrophes should not be removed within contractions or possessives: "can't" should not become "can t"

2) Hyphenated words should remain hyphenated: "mother-in-law" should not become "mother in law"

3) Words split across multiple lines via hyphen should be collapsed into a single word. The text may be double (or more) spaced as in your example. "book-\n\nkeeper" should become "bookkeeper".

4) All words should be converted to lower case, unless they are proper nouns. But I don't know how to detect proper nouns.

JREPL can solve the problem in 3 steps:

1) Use JREPL to remove all unwanted punctuation and white space, and put each word on a separate line.

The /M option is needed because the search spans multiple lines. The /I option ignores case. The /X option enables the use of \n in the replacement expression. The /T option processes a list of find/replace pairs. The capture group numbering looks odd because each alternate gets an implicit group number of its own, which is why the first pair uses $2 and $3 rather than $1 and $2.

- collapse a hyphenated word across multiple lines into a single word on one line:
"([a-z])-(?:\r?\n)+([a-z])" --> "$2$3"

- replace consecutive white space, optionally with punctuation before or after, with a single new line:
"[^a-z0-9]*\s+[^a-z0-9]*" --> "\n"

- remove leading punctuation from the beginning of the first line:
"^[^a-z0-9]+" --> ""

- remove trailing punctuation at the end in case last line is missing \n:
"[^a-z0-9]+$" --> "\n"

2) sort the result with SORT

3) Use JREPL to remove duplicates and convert everything to lower case. The /J option allows use of toLowerCase() method in replacement value.

Code: Select all

jrepl "([a-z])-(?:\r?\n)+([a-z])/[^a-z0-9]*\s+[^a-z0-9]*/^[^a-z0-9]+/[^a-z0-9]+$" ^
      "$2$3/\n//\n" /i /m /x /t "/" /f test.txt | ^
sort | ^
jrepl "(.*\n)\1*" "$1.toLowerCase()" /i /j /m

The above has the following limitations:

1) I cannot detect proper nouns, so they lose their capital letters.

2) I cannot detect when a naturally hyphenated word like "mother-in-law" is split across multiple lines. So "mother-\n\nin-law" incorrectly becomes "motherin-law"

3) I assume only the 26 English letters are used. Non-English letters in the extended ASCII range are treated as punctuation, and will be stripped if they appear before or after white space.

There are probably other issues that I am not aware of - language parsing is complicated.


Here is the test text that I used:

Code: Select all

("First line, with comma character.")

Second line with punctuation mark!

Third line with question mark?

Fourth line with two "double quotes".

This 5th line has an ordinal number.

Hyphenated words are not a high-minded concept.

A word may be split across two sep-

arate lines using a hyphen.

Contractions mustn't be altered.

("Last line without newline at end.")

And here is my result:

Code: Select all

5th
a
across
altered
an
are
at
be
character
comma
concept
contractions
double
end
first
fourth
has
high-minded
hyphen
hyphenated
last
line
lines
mark
may
mustn't
newline
not
number
ordinal
punctuation
question
quotes
second
separate
split
third
this
two
using
with
without
word
words
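The three steps described in this post can also be sketched outside of batch. The following is a minimal Python illustration, not the JREPL implementation: the function name `unique_words` is made up for this example, the four substitutions are applied one after another rather than in a single /T pass, and lower-casing happens before sorting instead of after it.

```python
import re

def unique_words(text):
    """Return the sorted, lower-cased, de-duplicated words of text."""
    # Rejoin a word that was split by a hyphen at the end of a line.
    text = re.sub(r"([a-z])-(?:\r?\n)+([a-z])", r"\1\2", text, flags=re.I)
    # Turn each run of white space, plus any adjacent punctuation,
    # into a single newline so that every word sits on its own line.
    text = re.sub(r"[^a-z0-9]*\s+[^a-z0-9]*", "\n", text, flags=re.I)
    # Strip punctuation left at the very start and very end of the text.
    text = re.sub(r"^[^a-z0-9]+", "", text, flags=re.I)
    text = re.sub(r"[^a-z0-9]+$", "", text, flags=re.I)
    # Lower-case, drop duplicates, and sort.
    return sorted({word.lower() for word in text.split("\n") if word})
```

Apostrophes and hyphens inside a word survive because they are only removed when they touch white space. The same limitations listed above apply to this sketch as well.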


Dave Benham

Aacini
Expert
Posts: 1578
Joined: 06 Dec 2011 22:15
Location: México City, México

#77 Post by Aacini » 26 Jan 2015 12:26

Hi Bars! The first step is to replace special characters with spaces:

Code: Select all

C:\> FindRepl "[^a-z\r\n]" " " < test.txt 
first line  with comma character

second line with punctuation mark

third line with question mark

fourth line with two  double quote

fifth line had dot in the end

The next step would be to separate the words onto individual lines, which can easily be done in the same execution of FindRepl, and then to eliminate duplicate words, which can NOT be done in FindRepl, so auxiliary Batch code is needed. However, in this case it is simpler for the Batch code to process the lines above directly, that is:

Code: Select all

@echo off
setlocal

for /F "delims=" %%a in ('FindRepl "[^a-z\r\n]" " " ^< test.txt') do (
   for %%b in (%%a) do (
      if not defined word[%%b] (
         echo %%b
         set "word[%%b]=1"
      )
   )
)

Output example:

Code: Select all

first
line
with
comma
character
second
punctuation
mark
third
question
fourth
two
double
quote
fifth
had
dot
in
the
end
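The logic of the batch loop above (replace non-letters with spaces, then print each word only the first time it appears) can be illustrated with a short Python sketch. This is only an illustration under assumptions: the function name `words_in_first_seen_order` is invented here, a set plays the role of the `word[...]` variables, and the input is assumed to be lower case already, as in the sample output.

```python
import re

def words_in_first_seen_order(text):
    """Return each word once, in the order of its first appearance."""
    seen = set()      # plays the role of the word[...] variables
    ordered = []
    for line in text.splitlines():
        # Replace every character that is not a lower-case letter with a space.
        cleaned = re.sub(r"[^a-z]", " ", line)
        for word in cleaned.split():
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered
```

Unlike the SORT-based approach, this keeps the words in document order, which matches the output example above.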

Antonio

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

#78 Post by bars143 » 26 Jan 2015 20:04

@Dbenham and @ Aacini,

Thank you for replying to my problem. I will post my feedback on all your suggestions once I have tested them on the actual DVD subtitle files I saved.

@Dbenham, I had already made this script before your reply:

Code: Select all

for /f "delims=" %%A in (' type "input.txt" ^| jrepl "(\w+)" "$1" /jmatch ') do (
         ECHO %%A
        ) >>output.txt


Regarding words like the ones you mention:

mother-in-law
can't
i'm

these really are a problem for me when converting to my native dialect.

---------------------------------------

@Aacini,

Do you have a FindRepl script that can translate output.txt into my native dialect?

for example,

output.txt content:

Code: Select all

my
username
is
Bars



english-cebuano.txt content:

Code: Select all

my , ang akong
username , username
is , mao ang
Bars , Bars


If a word in output.txt matches the first field of a line in english-cebuano.txt, it should automatically be replaced with the second field of that line; for example, "my" in output.txt is replaced with "ang akong" from english-cebuano.txt (with a comma as the delimiter).

So the translated output.txt content (or a new file named translated.txt) will be:

Code: Select all

ang akong
username
mao ang
Bars

#79 Post by Aacini » 26 Jan 2015 21:01

You may use the replacement file directly in the /G: switch, as described in the FindRepl documentation:

FindRepl documentation wrote:6. Loading alternations from a file (/G switch)

The new /G switch allows the search/replace alternations to be placed in a file. This feature is simple to use because all the previously described facilities work in the same way; the only difference is that the search/replacement alternations are read from a file instead of being given as parameters.

As described for the || feature in the previous point, the /J switch allows regexps/JScript expressions to be used instead of literal strings, and the /A switch selects a find-replace operation instead of a find-only one. The same 4 combinations of /A and /J switches are used in the same way when the /G switch is added, as shown below:

Code: Select all

Command:                                File lines have:       Operation performed:

FindRepl /G:filename.txt                literal                Find literals
FindRepl /G:filename.txt  /A            literalS|literalR      Find literalS, replace by literalR
FindRepl /G:filename.txt      /J        regexp                 Find regexps
FindRepl /G:filename.txt  /A  /J        regexp||JScript        Find regexp, replace by JScript expression

To define search and replace alternations, include the /A switch and separate each search-replace pair of literals with a single pipe | character. For example, this search-replace alternation:

Code: Select all

FindRepl "Sun|Mon|Tue|Wed|Thu|Fri|Sat" /A "Dom|Lun|Mar|Mié|Jue|Vie|Sab"

... may be placed in SearchRepl.txt file this way:

Code: Select all

Sun|Dom
Mon|Lun
Tue|Mar
Wed|Mié
Thu|Jue
Fri|Vie
Sat|Sab

... and execute FindRepl this way:

Code: Select all

FindRepl /G:SearchRepl.txt /A



In your case you just need to change the separator of each pair of strings in the english-cebuano.txt file from " , " to "|", and use the file in the /G: option. For example:

Code: Select all

C:\> type output.txt
my
username
is
Bars

C:\> type english-cebuano.txt
my|ang akong
username|username
is|mao ang
Bars|Bars

C:\> FindRepl /G:english-cebuano.txt /A < output.txt
ang akong
username
mao ang
Bars
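The idea of a "search|replacement" file can be illustrated with a small Python sketch. It is only a model of the behaviour shown in the session above, not FindRepl's code: the names `load_pairs` and `translate` are hypothetical, and the pairs are assumed to be tried as alternatives in file order.

```python
import re

def load_pairs(lines):
    """Parse 'search|replacement' lines into a list of pairs, in file order."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if "|" in line:
            search, replacement = line.split("|", 1)
            pairs.append((search, replacement))
    return pairs

def translate(text, pairs):
    """Replace every literal search string with its replacement."""
    if not pairs:
        return text
    table = dict(pairs)
    # One alternation of escaped literals, tried left to right.
    pattern = re.compile("|".join(re.escape(search) for search, _ in pairs))
    return pattern.sub(lambda match: table[match.group(0)], text)
```

With the four pairs from english-cebuano.txt above, `translate` turns the content of output.txt into the same four translated lines.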


Antonio

#80 Post by bars143 » 27 Jan 2015 01:49


@Aacini,

Thanks for the quick reply. The script seems very simple to use with the suggested "|" as the delimiter (as recommended for english-cebuano.txt).

Can you give us some more help with the cases below?

Or I will try to use a sentence as my home assignment:

from :

Code: Select all

my username is Bars


to:

Code: Select all

ang akong username mao ang Bars



If that works, the next step is multiple lines:

from :

Code: Select all

my username is Bars

another username is Aacini



to:

Code: Select all

ang akong username mao ang Bars

lain nga username mao ang Aacini



The output above has two empty lines, as required.

thanks again.

Bars

#81 Post by Aacini » 27 Jan 2015 09:48

Code: Select all

C:\> type output.txt
my username is Bars

another username is Aacini


C:\> type english-cebuano.txt
my|ang akong
is|mao ang
another|lain nga

C:\> FindRepl /G:english-cebuano.txt /A < output.txt
ang akong username mao ang Bars

lain nga username mao ang Aacini


C:\>

Antonio

#82 Post by dbenham » 27 Jan 2015 09:59

I suspect that code is dependent on the order of the translation table. I think it would require that larger words be listed before smaller words that may be contained within a larger word.

I think this would cause a problem:

Code: Select all

to|xxxx
toward|yyyy

but this would be OK

Code: Select all

toward|yyyy
to|xxxx
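The ordering concern can be reproduced with any regex engine that tries alternatives left to right. The Python sketch below is an illustration only: `replace_first_match` is a made-up helper, and Python's `re` module stands in for the JScript regex engine.

```python
import re

def replace_first_match(text, pairs):
    """Replace literals using an alternation built in the given order."""
    table = dict(pairs)
    pattern = re.compile("|".join(re.escape(search) for search, _ in pairs))
    return pattern.sub(lambda match: table[match.group(0)], text)

# Short word listed first: "to" wins inside "toward".
short_first = [("to", "xxxx"), ("toward", "yyyy")]
# Long word listed first: "toward" is matched as a whole.
long_first = [("toward", "yyyy"), ("to", "xxxx")]

print(replace_first_match("go toward it", short_first))  # go xxxxward it
print(replace_first_match("go toward it", long_first))   # go yyyy it
```

Sorting the pairs by descending length of the search string before building the alternation is one simple way to get the safe order automatically.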


Dave Benham
Last edited by dbenham on 27 Jan 2015 15:24, edited 1 time in total.

#83 Post by Aacini » 27 Jan 2015 13:41

You are right, Dave. This point is explained in the documentation:

FindRepl documentation wrote:The alternatives are evaluated in left-to-right order and evaluation stops at the first one that matches. This means that if narrow regexps are placed first and wider ones after, the later regexps will get just the data that was not "captured" by any of the previous regexps. For example:

Code: Select all

FindRepl "\d||\w||." /A "'Digits'||'Just letters \(not digits\)'||'The rest of characters \(not digits nor letters\)'" /J

I think I should elaborate on this information with something like this:

Conversely, if the alternatives are literal strings, they must be placed in longest-to-shortest order so that a short word appearing inside a longer one is not wrongly replaced.

... and add a complete description of this point in the first part of the documentation. Thanks for noting this!

Antonio

#84 Post by dbenham » 27 Jan 2015 15:23

Have a look at how I solved the problem using JREPL. I build a translation dictionary using environment variables, and then have one regular expression that generically finds words in the source, and the replacement attempts to look up the value using JScript. This avoids the entire ordering issue. I also think it may be more efficient. I suspect you could do something similar.

JScript has access to the Dictionary Object in case you don't want to use environment variables.
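The lookup approach described in this post can be sketched as follows. This is a Python illustration, not the JREPL solution itself: a plain `dict` stands in for the environment variables or the Dictionary Object, and the name `translate_words` and the word pattern `[A-Za-z']+` are assumptions made for the example.

```python
import re

# A "word" here is a run of letters and apostrophes (an assumption).
WORD = re.compile(r"[A-Za-z']+")

def translate_words(text, table):
    """Look up each whole word in table; leave unknown words untouched."""
    return WORD.sub(lambda match: table.get(match.group(0), match.group(0)), text)

table = {"to": "xxxx", "toward": "yyyy"}
print(translate_words("go toward it to", table))  # go yyyy it xxxx
```

Because the regex always captures a whole word before the lookup happens, the order of the entries in the table no longer matters.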


Dave Benham

#85 Post by bars143 » 27 Jan 2015 23:12

@Aacini, many thanks for another good script. It works on multiple lines too, with one empty line between sentences, including the .srt format, which requires a timestamp before each sentence, as in this example:

1
00:00:24,827 --> 00:00:29,827
My user-name can't be Bars!

2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!



Here are the files I used for my assignment:

translation source file --> english-cebuano2.txt

Code: Select all

my|ang akong
can't|'ili
be|mao ang
other|uban
is|ay


input subtitle file --> english2.srt

Code: Select all

1
00:00:24,827 --> 00:00:29,827
My user-name can't be Bars!

2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!


output subtitle file -->cebuano2.srt

Code: Select all

1
00:00:24,827 --> 00:00:29,827
My user-name 'ili mao ang Bars!

2
00:00:59,587 --> 00:01:04,587
My uban user-name ay Bars143!




The problem above is "My", which has a capital "M", while my english-cebuano2.txt entry is "my|ang akong" with a lowercase "m", which results in no match.

I decided to change "my|ang akong" to "My|ang akong", as shown below:

Code: Select all

My|ang akong
can't|'ili
be|mao ang
other|uban
is|ay


This matches, as the output shows after changing the lowercase letter to uppercase:

Code: Select all

1
00:00:24,827 --> 00:00:29,827
ang akong user-name 'ili mao ang Bars!

2
00:00:59,587 --> 00:01:04,587
ang akong uban user-name ay Bars143!



@Aacini, do you have a script for case-insensitive matching, so that "my" matches "My"?

Here is my command, based on your script, with an output file added at the end:

Code: Select all

FindRepl /G:english-cebuano2.txt /A < english2.srt >cebuano2.srt


@Aacini, your script is smaller, but does it need some additional code for case-insensitive matching?

thanks again,

Bars
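The case mismatch described in post #85 ("my" in the table versus "My" in the subtitle) can be handled by folding both sides to lower case before the lookup. The sketch below shows that idea in Python only; it does not claim that FindRepl offers an equivalent switch, and the names `translate_ignoring_case` and `WORD` are hypothetical.

```python
import re

# A "word" here is a run of letters and apostrophes (an assumption).
WORD = re.compile(r"[A-Za-z']+")

def translate_ignoring_case(text, table):
    """Translate whole words, comparing them to the table without regard to case."""
    folded = {key.lower(): value for key, value in table.items()}
    return WORD.sub(
        lambda match: folded.get(match.group(0).lower(), match.group(0)), text
    )

table = {
    "my": "ang akong",
    "can't": "'ili",
    "be": "mao ang",
    "other": "uban",
    "is": "ay",
}
print(translate_ignoring_case("My user-name can't be Bars!", table))
```

Timestamp lines of an .srt file contain no letters, so they pass through unchanged, and words missing from the table (such as "user", "name" and "Bars") are left as they are.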

#86 Post by bars143 » 02 Feb 2015 00:03

Hi, Antonio

Can you give me a faster alternative to this script, which I have been working on for two days now:

Code: Select all

@echo off

setlocal enabledelayedexpansion

set "count=1"
set /a "count+=1"

echo."https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=">list_of_url.txt

for /f "delims=" %%C in (wordlist.txt) do (
   if !count! lss 30 echo.%%C |FindRepl " \r\n" "+"
   if !count!==30 echo.%%C |FindRepl " \r\n" "+" &&echo. &&echo."https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
      if !count! geq 31 if !count! lss 60 echo.%%C |FindRepl " \r\n" "+"
      if !count!==60 echo.%%C |FindRepl " \r\n" "+" &&echo. &&echo."https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="&& set "count=0"
      set /a "count+=1"
   ) >>list_of_url.txt
   
    < list_of_url.txt FindRepl "\x22" "" > list_of_url2.txt && < list_of_url2.txt FindRepl  "m&q=\x0D\x0A" "m&q=" > list_of_url3.txt && < list_of_url3.txt FindRepl  "\+\ " "" /v > list_of_url4.txt

echo ended....
pause


Content of wordlist.txt (cut down to 181 lines from the original total of 58111 lines):

Code: Select all

aardvark
aardwolf
aaron
aback
abacus
abaft
abalone
abandon
abandoned
abandonment
abandons
abase
abased
abasement
abash
abashed
abate
abated
abatement
abates
abattoir
abattoirs
abbe
abbess
abbey
abbeys
abbot
abbots
abbreviate
abbreviated
abbreviates
abbreviating
abbreviation
abbreviations
abdicate
abdicated
abdicates
abdicating
abdication
abdomen
abdomens
abdominal
abduct
abducted
abducting
abduction
abductions
abductor
abductors
abducts
abe
abeam
abel
abele
aberdeen
aberrant
aberration
aberrations
abet
abets
abetted
abetting
abeyance
abhor
abhorred
abhorrence
abhorrent
abhors
abide
abided
abides
abiding
abidjan
abies
abilities
ability
abject
abjectly
abjure
abjured
ablate
ablates
ablating
ablation
ablative
ablaze
able
ablebodied
abler
ablest
abloom
ablution
ablutions
ably
abnegation
abnormal
abnormalities
abnormality
abnormally
aboard
abode
abodes
abolish
abolished
abolishes
abolishing
abolition
abolitionist
abolitionists
abomb
abominable
abominably
abominate
abominated
abomination
abominations
aboriginal
aborigines
abort
aborted
aborting
abortion
abortionist
abortionists
abortions
abortive
aborts
abound
abounded
abounding
abounds
about
above
abraded
abraham
abrasion
abrasions
abrasive
abrasively
abrasiveness
abrasives
abreast
abridge
abridged
abridgement
abridging
abroad
abrogate
abrogated
abrogating
abrogation
abrogations
abrupt
abruptly
abruptness
abscess
abscesses
abscissa
abscissae
abscissas
abscond
absconded
absconder
absconding
absconds
abseil
abseiled
abseiler
abseiling
abseils
absence
absences
absent
absented
absentee
absenteeism
absentees
absenting
absently
absentminded



output result of list_of_url.txt:

Code: Select all

"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
aardvark+aardwolf+aaron+aback+abacus+abaft+abalone+abandon+abandoned+abandonment+abandons+abase+abased+abasement+abash+abashed+abate+abated+abatement+abates+abattoir+abattoirs+abbe+abbess+abbey+abbeys+abbot+abbots+abbreviate+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
abbreviated+abbreviates+abbreviating+abbreviation+abbreviations+abdicate+abdicated+abdicates+abdicating+abdication+abdomen+abdomens+abdominal+abduct+abducted+abducting+abduction+abductions+abductor+abductors+abducts+abe+abeam+abel+abele+aberdeen+aberrant+aberration+aberrations+abet+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
abets+abetted+abetting+abeyance+abhor+abhorred+abhorrence+abhorrent+abhors+abide+abided+abides+abiding+abidjan+abies+abilities+ability+abject+abjectly+abjure+abjured+ablate+ablates+ablating+ablation+ablative+ablaze+able+ablebodied+abler+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
ablest+abloom+ablution+ablutions+ably+abnegation+abnormal+abnormalities+abnormality+abnormally+aboard+abode+abodes+abolish+abolished+abolishes+abolishing+abolition+abolitionist+abolitionists+abomb+abominable+abominably+abominate+abominated+abomination+abominations+aboriginal+aborigines+abort+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
aborted+aborting+abortion+abortionist+abortionists+abortions+abortive+aborts+abound+abounded+abounding+abounds+about+above+abraded+abraham+abrasion+abrasions+abrasive+abrasively+abrasiveness+abrasives+abreast+abridge+abridged+abridgement+abridging+abroad+abrogate+abrogated+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
abrogating+abrogation+abrogations+abrupt+abruptly+abruptness+abscess+abscesses+abscissa+abscissae+abscissas+abscond+absconded+absconder+absconding+absconds+abseil+abseiled+abseiler+abseiling+abseils+absence+absences+absent+absented+absentee+absenteeism+absentees+absenting+absently+
"https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
absentminded+


output result of list_of_url2.txt:

Code: Select all

https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
aardvark+aardwolf+aaron+aback+abacus+abaft+abalone+abandon+abandoned+abandonment+abandons+abase+abased+abasement+abash+abashed+abate+abated+abatement+abates+abattoir+abattoirs+abbe+abbess+abbey+abbeys+abbot+abbots+abbreviate+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
abbreviated+abbreviates+abbreviating+abbreviation+abbreviations+abdicate+abdicated+abdicates+abdicating+abdication+abdomen+abdomens+abdominal+abduct+abducted+abducting+abduction+abductions+abductor+abductors+abducts+abe+abeam+abel+abele+aberdeen+aberrant+aberration+aberrations+abet+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
abets+abetted+abetting+abeyance+abhor+abhorred+abhorrence+abhorrent+abhors+abide+abided+abides+abiding+abidjan+abies+abilities+ability+abject+abjectly+abjure+abjured+ablate+ablates+ablating+ablation+ablative+ablaze+able+ablebodied+abler+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
ablest+abloom+ablution+ablutions+ably+abnegation+abnormal+abnormalities+abnormality+abnormally+aboard+abode+abodes+abolish+abolished+abolishes+abolishing+abolition+abolitionist+abolitionists+abomb+abominable+abominably+abominate+abominated+abomination+abominations+aboriginal+aborigines+abort+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
aborted+aborting+abortion+abortionist+abortionists+abortions+abortive+aborts+abound+abounded+abounding+abounds+about+above+abraded+abraham+abrasion+abrasions+abrasive+abrasively+abrasiveness+abrasives+abreast+abridge+abridged+abridgement+abridging+abroad+abrogate+abrogated+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
abrogating+abrogation+abrogations+abrupt+abruptly+abruptness+abscess+abscesses+abscissa+abscissae+abscissas+abscond+absconded+absconder+absconding+absconds+abseil+abseiled+abseiler+abseiling+abseils+absence+absences+absent+absented+absentee+absenteeism+absentees+absenting+absently+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=
absentminded+



output result of list_of_url3.txt:

Code: Select all

https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=aardvark+aardwolf+aaron+aback+abacus+abaft+abalone+abandon+abandoned+abandonment+abandons+abase+abased+abasement+abash+abashed+abate+abated+abatement+abates+abattoir+abattoirs+abbe+abbess+abbey+abbeys+abbot+abbots+abbreviate+ 
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abbreviated+abbreviates+abbreviating+abbreviation+abbreviations+abdicate+abdicated+abdicates+abdicating+abdication+abdomen+abdomens+abdominal+abduct+abducted+abducting+abduction+abductions+abductor+abductors+abducts+abe+abeam+abel+abele+aberdeen+aberrant+aberration+aberrations+abet+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abets+abetted+abetting+abeyance+abhor+abhorred+abhorrence+abhorrent+abhors+abide+abided+abides+abiding+abidjan+abies+abilities+ability+abject+abjectly+abjure+abjured+ablate+ablates+ablating+ablation+ablative+ablaze+able+ablebodied+abler+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=ablest+abloom+ablution+ablutions+ably+abnegation+abnormal+abnormalities+abnormality+abnormally+aboard+abode+abodes+abolish+abolished+abolishes+abolishing+abolition+abolitionist+abolitionists+abomb+abominable+abominably+abominate+abominated+abomination+abominations+aboriginal+aborigines+abort+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=aborted+aborting+abortion+abortionist+abortionists+abortions+abortive+aborts+abound+abounded+abounding+abounds+about+above+abraded+abraham+abrasion+abrasions+abrasive+abrasively+abrasiveness+abrasives+abreast+abridge+abridged+abridgement+abridging+abroad+abrogate+abrogated+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abrogating+abrogation+abrogations+abrupt+abruptly+abruptness+abscess+abscesses+abscissa+abscissae+abscissas+abscond+absconded+absconder+absconding+absconds+abseil+abseiled+abseiler+abseiling+abseils+absence+absences+absent+absented+absentee+absenteeism+absentees+absenting+absently+
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=absentminded+



output result of list_of_url4.txt:

Code: Select all

https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=aardvark+aardwolf+aaron+aback+abacus+abaft+abalone+abandon+abandoned+abandonment+abandons+abase+abased+abasement+abash+abashed+abate+abated+abatement+abates+abattoir+abattoirs+abbe+abbess+abbey+abbeys+abbot+abbots+abbreviate
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abbreviated+abbreviates+abbreviating+abbreviation+abbreviations+abdicate+abdicated+abdicates+abdicating+abdication+abdomen+abdomens+abdominal+abduct+abducted+abducting+abduction+abductions+abductor+abductors+abducts+abe+abeam+abel+abele+aberdeen+aberrant+aberration+aberrations+abet
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abets+abetted+abetting+abeyance+abhor+abhorred+abhorrence+abhorrent+abhors+abide+abided+abides+abiding+abidjan+abies+abilities+ability+abject+abjectly+abjure+abjured+ablate+ablates+ablating+ablation+ablative+ablaze+able+ablebodied+abler
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=ablest+abloom+ablution+ablutions+ably+abnegation+abnormal+abnormalities+abnormality+abnormally+aboard+abode+abodes+abolish+abolished+abolishes+abolishing+abolition+abolitionist+abolitionists+abomb+abominable+abominably+abominate+abominated+abomination+abominations+aboriginal+aborigines+abort
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=aborted+aborting+abortion+abortionist+abortionists+abortions+abortive+aborts+abound+abounded+abounding+abounds+about+above+abraded+abraham+abrasion+abrasions+abrasive+abrasively+abrasiveness+abrasives+abreast+abridge+abridged+abridgement+abridging+abroad+abrogate+abrogated
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=abrogating+abrogation+abrogations+abrupt+abruptly+abruptness+abscess+abscesses+abscissa+abscissae+abscissas+abscond+absconded+absconder+absconding+absconds+abseil+abseiled+abseiler+abseiling+abseils+absence+absences+absent+absented+absentee+absenteeism+absentees+absenting+absently
https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=absentminded+



As you can see in the final output file, list_of_url4.txt, the Google Translate URL part for English to Cebuano is:



and the words, up to the 30-word limit that Google allows per translation, joined with "+" between words:

aardvark+aardwolf+aaron+aback+abacus+abaft+abalone+abandon+abandoned+abandonment+abandons+abase+abased+abasement+abash+abashed+abate+abated+abatement+abates+abattoir+abattoirs+abbe+abbess+abbey+abbeys+abbot+abbots+abbreviate


The complete URL to be used for batch download of the web page has this format:



But my own script is slow and takes a long time to produce the list of 1938 URLs (58111 words divided by 30 words per URL, rounded up, = 1938 URLs).

So I need the experts to help convert my slow script into a faster one.

The JREPL script below, which I made as an equivalent to the FindRepl one, is just as slow:

Code: Select all

@echo off

setlocal enabledelayedexpansion

md dbenhamTEMP

set "count=1"

echo."https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=" >list_of_url.txt

for /f "delims=" %%C in (wordlist.txt) do (
   set /a "count+=1"
   if !count! lss 30 echo.%%C |jrepl " \r\n" "+" /m
   if !count!==30 echo.&& echo."https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=" && set "count=1"
   ) >>list_of_url.txt
   
   jrepl "\q" "" /x /m /f list_of_url.txt /o dbenhamTEMP\list_of_url2.txt && jrepl " \r\n" "" /x /m /f dbenhamTEMP\list_of_url2.txt /o dbenhamTEMP\list_of_url3.txt && jrepl "\+\r\n" "\r\n" /x /m /f dbenhamTEMP\list_of_url3.txt /o dbenhamTEMP\list_of_url4.txt

echo ended....
pause


Help from the experts is always fruitful.

Maybe your new splitfile.bat could be used to split a file of 58111 lines into a file of 1938 lines with 30 words per line?
I think you can do that too.

Thanks for your helpful responses last time.

Bars

#87 Post by Aacini » 02 Feb 2015 12:21

Excuse me. You will get better help if you describe your problem in a concise text description. Don't use code segments as base examples. Assume we know nothing about your problem. Use small segments of data files (4 or 5 lines) as examples.

EDIT: Code added

Check if this is what you want:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set wordsPerUrl=30
set urlsPerFile=7

del list_of_url*.txt 2> NUL
set /A wordCount=0, urlCount=0, fileCount=1001
set "url="
echo File: 001
for /F %%a in (wordlist.txt) do (
   set "url=!url!+%%a"
   set /A wordCount+=1
   if !wordCount! equ %wordsPerUrl% (
      set wordCount=0
      echo "https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=" !url:~1! >> list_of_url!fileCount:~1!.txt
      set "url="
      set /A urlCount+=1
      if !urlCount! equ %urlsPerFile% set /A urlCount=0, fileCount+=1 & echo File: !fileCount:~1!
   )
)
if defined url echo "https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q=" !url:~1! >> list_of_url!fileCount:~1!.txt
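The grouping logic of the batch file above (collect 30 words, join them with "+", append them to the URL prefix) can be illustrated with a short Python sketch. It is a simplified illustration only: `build_urls` is a hypothetical name, the result is returned as one list instead of being written to numbered files, and no quotes or space are placed between the prefix and the words.

```python
BASE_URL = "https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="

def build_urls(words, words_per_url=30):
    """Join the words in groups of words_per_url and append each group to BASE_URL."""
    urls = []
    for start in range(0, len(words), words_per_url):
        group = words[start:start + words_per_url]
        urls.append(BASE_URL + "+".join(group))
    return urls
```

The last group may be shorter than 30 words, which corresponds to the final `if defined url` line in the batch version. For a list of 58111 words this yields 1938 URLs.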


Antonio

#88 Post by bars143 » 03 Feb 2015 06:04


Thanks, Antonio

Your script does not need FindRepl at all, yet it processed my 58111-line word list in just 30 seconds, far better than the 15 minutes taken by my own combination of splitfile.bat, FindRepl.bat, and merge.bat.

I adjusted your batch script to suit my needs: it removes the surrounding double quotes from the URL and writes all output to a single file:

Code: Select all

    @echo off
    setlocal EnableDelayedExpansion
   md "%~dp0Aacini"
   set "xx=%~dp0Aacini"

    set wordsPerUrl=30

    del list_of_url*.txt 2> NUL
    set /A wordCount=0, urlCount=0, fileCount=1001
    set "url="
    echo File: 001
    for /F %%a in (wordlist.txt) do (
      set "ww=https://translate.google.com/m?hl=en&sl=auto&tl=ceb&ie=UTF-8&prev=_m&q="
       set "url=!url!+%%a"
       set /A wordCount+=1
       if !wordCount! equ %wordsPerUrl% (
          set wordCount=0
          echo.!ww!!url:~1! >> !xx!\list_of_url!fileCount:~1!.txt
          set "url="
          set /A urlCount+=1
         echo File: !fileCount:~1!
       )
    )
    if defined url echo.!ww!!url:~1! >> !xx!\list_of_url!fileCount:~1!.txt



It's great to have an expert like you, Antonio, to fix my slow script.
Next time I will follow your suggestion and post examples of up to 5 lines as needed. :D

Bars

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

#89 Post by foxidrive » 01 Apr 2015 17:14

Antonio, when using the /help switch, the top of the help options scrolls off the screen.
I only see from option 3 and I have a larger than normal console window.

Does it do that on your screen too?

#90 Post by Aacini » 02 Apr 2015 13:12

foxidrive wrote:Antonio, when using the /help switch, the top of the help options scrolls off the screen.
I only see from option 3 and I have a larger than normal console window.

Does it do that on your screen too?


I have two machines, a very old Windows XP computer and a recent one with Windows 8.1. On both of them the cmd.exe text window is 43 lines high (a standard height since the VGA video card days, along with 25 and 50 lines), so I lose only the header line of the /help screen. If the two blank lines at the bottom were replaced by just one, the /help output would fit the screen exactly. Anyway, you may review the top part of the /help screen with the side scroll bar...

Antonio