Script to Remove Unknown Number Character from Text String in a file.
Posted: 14 Jun 2016 10:46
by Phaedrus T. Wolfe
I have a DOS batch file that needs to remove certain text from an HTML file when it is run. The offending text in the HTML file is: id="Picture 9"
The problem is that the number, shown here as 9, can be any number from 0 to 100, and I never know in advance what it will be.
I tried the following code in my DOS batch file without success:
Code:
powershell -Command "(gc *.htm) -replace 'id="Picture ?"', '' | Out-File *.htm"
where the two single quote marks ('') at the end are the replacement string, meant to replace the matched text with nothing.
My faulty logic was that the question mark would act as a wildcard for that character position.
Can someone please help me with this issue?
Thank you
Phaedrus T. Wolfe
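PowerShell's -replace operator takes a .NET regular expression, not a wildcard pattern, and in a regular expression "?" is a quantifier meaning "zero or one of the preceding character". The short Python sketch below (Python's re module is used only for illustration; it treats "?" the same way here) shows that the pattern Picture ? matches just "Picture " and leaves the digit behind, which is consistent with the id="9" result reported later in the thread. The sample line is modeled on the HTML fragment posted below.

```python
import re

# Sample line modeled on the HTML fragment posted later in the thread.
line = '<p class=End><img width=77 height=24 id="Picture 9"'

# In a regular expression "?" is a quantifier: it makes the preceding
# character (here the space) optional. It is not a one-character wildcard.
print(re.findall(r'Picture ?', line))   # ['Picture ']

# Replacing that match with '' strips only "Picture ", so the digit survives.
print(re.sub(r'Picture ?', '', line))   # <p class=End><img width=77 height=24 id="9"
```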
Re: Script to Remove Unknown Number Character from Text String in a file.
Posted: 14 Jun 2016 16:15
by sambul35
Can you post an accurate example of your html file?
What error or result did you get when running the PS command?
Re: Script to Remove Unknown Number Character from Text String in a file.
Posted: 14 Jun 2016 17:01
by Phaedrus T. Wolfe
sambul35 wrote: Can you post an accurate example of your html file?
Sure, thank you. Here is a relevant fragment of the HTML file, extracted from an ebook in HTML format.
Code:
<p class=MsoNormal>The thud of the door shutting made the last glimpse of
Britney repeat in his mind. </p>
<p class=End><img width=77 height=24 id="Picture 9"
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>
<b><span style='font-size:18.0pt;font-family:"Verdana","sans-serif";color:#0D0D0D'><br
style='page-break-before:always'>
</span></b>
<h2><a name=Chap02></a><a href="#AaContents">Chapter 2</a></h2>
Thank you for your help.
sambul35 wrote: What error did you get when running the PS command?
The system reports no error or warning; the command simply does not do what I want. Notice how id="Picture 9" was changed to id="9". What I want is for that entire id attribute to be deleted.
Code:
<p class=End><img width=77 height=24 id="9"
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>
This is the output I want to achieve. Notice that the complete id attribute is removed:
Code:
<p class=End><img width=77 height=24
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>
Re: Script to Remove Unknown Number Character from Text String in a file.
Posted: 14 Jun 2016 22:36
by foxidrive
Test this:
Code:
call jrepl.bat "id=\qPicture [0-9]*\q" "" /x /f "file.html" /o -
This uses a native Windows batch script called Jrepl.bat, written by Dave Benham.
Put it in the same folder, or in a folder that is on the system path.
Download it from viewtopic.php?f=3&t=6044 or from Dropbox (unblock it after downloading):
https://www.dropbox.com/s/4otci4d4s8x5ni4/Jrepl.bat
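To see what foxidrive's search pattern matches, here is a rough Python equivalent, offered only as an illustration (it is not part of the jrepl solution). It assumes that \q in the jrepl search string stands for a double quote (enabled by /x) and that [0-9]* behaves the same in both regex engines: it matches zero or more digits, so any number from 0 to 100 is covered. The function name strip_picture_ids is hypothetical.

```python
import re

# Same pattern as the jrepl search string, with \q written as a literal '"'.
# [0-9]* matches zero or more digits, so "Picture 0" ... "Picture 100" all match.
PICTURE_ID = re.compile(r'id="Picture [0-9]*"')


def strip_picture_ids(text: str) -> str:
    """Delete every id="Picture N" attribute, whatever the number N is."""
    return PICTURE_ID.sub('', text)


if __name__ == '__main__':
    for n in (0, 9, 100):
        tag = f'<img width=77 height=24 id="Picture {n}">'
        print(strip_picture_ids(tag))  # prints: <img width=77 height=24 >
```

As in the jrepl command, only the attribute itself is removed; the space in front of id= stays in place, which HTML ignores.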
Re: Script to Remove Unknown Number Character from Text String in a file.
Posted: 15 Jun 2016 06:53
by sambul35
Phaedrus T. Wolfe wrote: the id="Picture 9" changed to id="9"
Define the initial and resulting strings as variables, try '*' instead of '?', and use the -like operator with wildcards together with -replace in your PowerShell code block. See usage examples on the web. Let us know what you come up with.
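The suggestion above involves two different kinds of pattern. PowerShell's -like uses wildcard patterns, where "?" stands for exactly one character and "*" for any run of characters; -replace uses regular expressions, where the same symbols are quantifiers on the preceding character. The Python sketch below uses fnmatch as a stand-in for wildcard matching and re for regex replacement. This is an analogy only: -like is case-insensitive by default, while fnmatchcase is case-sensitive. The helper names are hypothetical.

```python
import fnmatch
import re

SAMPLES = ['Picture 9', 'Picture 42', 'Picture 100']


def wildcard_match(text: str, pattern: str) -> bool:
    """Glob-style match: '?' is exactly one character, '*' is any run of characters."""
    return fnmatch.fnmatchcase(text, pattern)


def regex_strip(text: str, pattern: str) -> str:
    """Regex replace with '': '?' and '*' quantify the preceding character."""
    return re.sub(pattern, '', text)


for s in SAMPLES:
    print(s, wildcard_match(s, 'Picture ?'), wildcard_match(s, 'Picture *'))
# Picture 9 True True
# Picture 42 False True
# Picture 100 False True

# As a regex, 'Picture *' means "Picture" plus zero or more spaces,
# so the digits survive; a digit class removes the whole token.
print(repr(regex_strip('Picture 42', r'Picture *')))       # '42'
print(repr(regex_strip('Picture 42', r'Picture [0-9]+')))  # ''
```

With wildcards, "Picture ?" would miss the two- and three-digit numbers (10 to 100), while "Picture *" covers them all.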