Script to Remove Unknown Number Character from Text String in a file.



Phaedrus T. Wolfe
Posts: 2
Joined: 14 Jun 2016 10:26

Script to Remove Unknown Number Character from Text String in a file.

#1 Post by Phaedrus T. Wolfe » 14 Jun 2016 10:46

I have a DOS batch file that needs to remove certain text from an HTML file when it runs. The offending text in the HTML file is:

Code:

id="Picture 9"

The problem is that the number, shown here as 9, can be any number from 0 to 100. I never know what the number will be.

I tried the following code in my DOS batch file, without success:

Code:

powershell -Command "(gc *.htm) -replace 'id="Picture ?"', '' | Out-File *.htm"

where the final '' (two single quote marks) is an empty replacement string, so the matched text is removed entirely.

My faulty logic was that the question mark would act as a wild card for that character position.

Can someone please help me on this issue?
Thank You
Phaedrus T. Wolfe
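A note on the question mark, with a short Python sketch (Python is used here only because it is easy to run; for these simple patterns its `re` syntax behaves like the .NET regular expressions behind PowerShell's `-replace`). `-replace` treats its first argument as a regular expression, not a wildcard pattern. In a regular expression, `?` makes the preceding character optional rather than standing for one unknown character, and a single character position could not cover 100 anyway. `\d+` (one or more digits) matches every number from 0 to 100:

```python
import re

line = 'id="Picture 9"'

# "?" makes the preceding space optional; it is not a one-character
# wildcard. "Picture ?" therefore matches just "Picture ", and
# removing that leaves the number and the quotes behind.
print(re.sub(r'Picture ?', '', line))  # id="9"

# "\d+" means "one or more digits", so this pattern matches the whole
# attribute whatever the number is.
pattern = r'id="Picture \d+"'
print(all(re.fullmatch(pattern, f'id="Picture {n}"') for n in range(101)))  # True
print(repr(re.sub(pattern, '', line)))  # ''
```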

sambul35
Posts: 192
Joined: 18 Jan 2012 10:13

Re: Script to Remove Unknown Number Character from Text String in a file.

#2 Post by sambul35 » 14 Jun 2016 16:15

Can you post an accurate example of your html file?

What error or result did you get when running the PS command?

Phaedrus T. Wolfe
Posts: 2
Joined: 14 Jun 2016 10:26

Re: Script to Remove Unknown Number Character from Text String in a file.

#3 Post by Phaedrus T. Wolfe » 14 Jun 2016 17:01

sambul35 wrote: Can you post an accurate example of your html file?


Sure, thank you. Here is a relevant fragment of the html file. It's extracted from an ebook file in HTML format.

Code:

<p class=MsoNormal>The thud of the door shutting made the last glimpse of
Britney repeat in his mind. </p>

<p class=End><img width=77 height=24 id="Picture 9"
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>

<b><span style='font-size:18.0pt;font-family:"Verdana","sans-serif";color:#0D0D0D'><br
style='page-break-before:always'>
</span></b>

<h2><a name=Chap02></a><a href="#AaContents">Chapter 2</a></h2>


Thank you for your help


sambul35 wrote: What error did you get when running the PS command?


No warning or error message appears. The problem is that the command does not do what I want: as shown below, id="Picture 9" changed to id="9". What I want is for the entire id attribute to be deleted.

Code:

<p class=End><img width=77 height=24 id="9"
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>


This is the result I want to achieve. Notice that the complete id attribute is removed:

Code:

<p class=End><img width=77 height=24
src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20Beautifully%20Done_files/image003.jpg"></p>
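The transformation requested above can be sketched in Python as follows. This is only an illustration, not the method used in this thread: the pattern `[ \t]*id="Picture \d+"` is an assumption that also removes the spaces or tabs in front of the attribute, so `height=24` is not left with a trailing space.

```python
import re

# Assumed pattern: optional spaces/tabs, then id="Picture <one or more digits>".
ID_PICTURE = re.compile(r'[ \t]*id="Picture \d+"')

src_line = ('src="Henry%20Davis%20%20-%20Dark%20Secrets%2002%20-%20'
            'Beautifully%20Done_files/image003.jpg"></p>')
before = '<p class=End><img width=77 height=24 id="Picture 9"\n' + src_line

# Removing the match leaves the two lines of the desired result.
after = ID_PICTURE.sub('', before)
print(after)
```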

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Script to Remove Unknown Number Character from Text String in a file.

#4 Post by foxidrive » 14 Jun 2016 22:36

Test this:

Code:

call jrepl.bat "id=\qPicture [0-9]*\q"  "" /x /f "file.html" /o -



This uses a native Windows batch script called Jrepl.bat, written by Dave Benham.
Put it in the same folder as your batch file, or in a folder that is on the system path.

viewtopic.php?f=3&t=6044
or download it from Dropbox (unblock it after downloading): https://www.dropbox.com/s/4otci4d4s8x5ni4/Jrepl.bat
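With `/X`, jrepl reads `\q` as a double quote, so the search string above is the regular expression `id="Picture [0-9]*"`, and `/O -` writes the result back to the input file. For anyone who cannot use jrepl, a rough Python equivalent that applies the same pattern in place to every `*.htm` file in a folder is sketched below. The function name `strip_picture_ids` is hypothetical, and reading the files as latin-1 is a deliberate assumption: it maps each byte to one character, so the files round-trip byte for byte whatever their real encoding (the pattern itself is plain ASCII). Try it on a copy of the files first.

```python
import re
from pathlib import Path

# Same regex as the jrepl command: id="Picture " followed by zero or more digits.
PATTERN = re.compile(r'id="Picture [0-9]*"')


def strip_picture_ids(folder: Path, glob: str = "*.htm") -> int:
    """Remove id="Picture N" from every matching file in place; return the count."""
    total = 0
    for path in folder.glob(glob):
        # latin-1 maps every byte to one character, so untouched bytes are
        # written back unchanged; newline="" keeps CRLF/LF line endings as-is.
        with open(path, encoding="latin-1", newline="") as f:
            text = f.read()
        new_text, count = PATTERN.subn("", text)
        if count:  # only rewrite files that actually changed
            with open(path, "w", encoding="latin-1", newline="") as f:
                f.write(new_text)
        total += count
    return total
```

For example, `strip_picture_ids(Path(r"C:\ebook"))` would process a hypothetical `C:\ebook` folder and return the number of attributes removed; pass `"*.html"` as the second argument for files with that extension.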

sambul35
Posts: 192
Joined: 18 Jan 2012 10:13

Re: Script to Remove Unknown Number Character from Text String in a file.

#5 Post by sambul35 » 15 Jun 2016 06:53

Phaedrus T. Wolfe wrote: the id="Picture 9" changed to id="9"

Define the initial and resulting strings as variables, try '*' instead of '?', and use the -like operator with wildcards together with -replace in your PowerShell code block. See usage examples on the web. Let us know what you come up with.
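Before switching `?` to `*`, note that the two operators read `*` differently: `-like` uses wildcard patterns, where `*` means "any run of characters", while `-replace` uses regular expressions, where `*` means "zero or more of the preceding item". A small Python sketch, with `fnmatch` standing in for wildcard matching and `re` for regex replacement (the case sensitivity of these Python calls differs from PowerShell's defaults, which does not matter for this text), shows the difference:

```python
import fnmatch
import re

text = 'id="Picture 100"'

# Wildcard matching (the -like style): "*" stands for any characters.
print(fnmatch.fnmatchcase(text, 'id="Picture *"'))  # True

# Regex replacement (the -replace style): "Picture *" means "Picture"
# followed by zero or more spaces, so the digits survive.
print(re.sub(r'Picture *', '', text))  # id="100"

# A regex that names the digits explicitly removes the whole attribute.
print(repr(re.sub(r'id="Picture \d+"', '', text)))  # ''
```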
