delete greater/lesser-than symbols including strings inside

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

delete greater/lesser-than symbols including strings inside

#1 Post by bars143 » 17 Oct 2013 04:46

Below is a portion of the HTML source code:

Code: Select all

<html>
<p>hello world</>
<a href=.......>website</a>


From the above, I only want to remove the following:

[code]
<html>
<p>
</>
<a href=.......>
</a>
[/code]

leaving everything else intact.
I looked at findrepl.bat, but it is difficult for me to learn without the help of an expert.

Any answer is highly appreciated.
I am looking for a script that removes the strings inside the greater-than/less-than symbols and also removes the symbols themselves.

thanks,
bars

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: delete greater/lesser-than symbols including strings ins

#2 Post by foxidrive » 17 Oct 2013 08:38

Can you save the page as plain text in your web browser?

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: delete greater/lesser-than symbols including strings ins

#3 Post by bars143 » 17 Oct 2013 16:37

foxidrive wrote:Can you save the page as plain text in your web browser?


Sorry for the late reply.

Yes, I can save a page as plain text in a web browser, and I can open the HTML with Notepad, Notepad2 and Notepad++.

My purpose in deleting "<--any string-->" is to convert saved HTML (source code) into a readable text file offline, without using a web browser.

In Notepad I don't know how to remove "<>" together with an enclosed string of varying length. I only thought of a pattern like " <*> ", with " * " standing for any number of characters, but that does not work in Notepad. I only know how to replace an exact matching string; I don't know a pattern that matches strings of varying length.

Anyway, thanks for the reply.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: delete greater/lesser-than symbols including strings ins

#4 Post by foxidrive » 17 Oct 2013 17:29

Using repl.bat you can easily remove specific HTML tags from a file, but there are command line tools that will do the whole job for you too.

HTMSTRIP.EXE is an MSDOS tool - no long filename support but it works well.

HTMLAsText - Web Site: http://www.nirsoft.net


There are probably many more tools that you can google for too.

The helper batch file called `repl.bat` is here - viewtopic.php?f=3&t=3855

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: delete greater/lesser-than symbols including strings ins

#5 Post by bars143 » 17 Oct 2013 17:39

Thanks, foxi, for the links.

I will try them and test later.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: delete greater/lesser-than symbols including strings ins

#6 Post by dbenham » 17 Oct 2013 21:37

Using REPL.BAT:

Code: Select all

type file.txt|repl "<.*?>" "" >file.txt.new

If you want to replace the original, follow with:

Code: Select all

move /y file.txt.new file.txt >nul
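
As an illustration of what the `<.*?>` search pattern does, here is a minimal Python sketch (not part of repl.bat; the function names `strip_tags` and `strip_tags_greedy` are made up for this example). It assumes the regex is applied one line at a time and shows why the `?` matters: the non-greedy form stops at the first `>` after each `<`, while the greedy form runs to the last `>` on the line and takes the text in between with it.

```python
import re

# Non-greedy: each match ends at the first '>' after a '<'.
TAG_LAZY = re.compile(r"<.*?>")
# Greedy: a match runs from the first '<' to the last '>' on the line.
TAG_GREEDY = re.compile(r"<.*>")


def strip_tags(line: str) -> str:
    """Remove every <...> group from one line, keeping the text between groups."""
    return TAG_LAZY.sub("", line)


def strip_tags_greedy(line: str) -> str:
    """Same idea with a greedy pattern; shown only for comparison."""
    return TAG_GREEDY.sub("", line)


if __name__ == "__main__":
    # Sample lines taken from the first post of this thread.
    for line in ["<html>", "<p>hello world</>", "<a href=.......>website</a>"]:
        print(repr(strip_tags(line)), repr(strip_tags_greedy(line)))
    # '' ''
    # 'hello world' ''
    # 'website' ''
```

The greedy variant deletes "hello world" and "website" as well, which is why the one-liner uses `.*?` rather than `.*`.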


Dave Benham

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: delete greater/lesser-than symbols including strings ins

#7 Post by bars143 » 18 Oct 2013 00:13

@foxi, HTMLAsText does work.

@dbenham, I got your repl.bat, but there is one thing that is not deleted: a comment tag, as shown below (11 lines in total):

Code: Select all

<!--
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-11051788-9']);
  _gaq.push(['_setDomainName', 'pinoyden.com.ph']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })(); 
    //-->


I also tried "^<.*?>$", but it does not remove all the lines from the first line "<!--" through the last line "//-->".

It is difficult for me to code this without an example from you.
Thanks for the reply; it helps me learn batch coding in addition to using HTMLAsText.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: delete greater/lesser-than symbols including strings ins

#8 Post by dbenham » 18 Oct 2013 04:52

Comment tags are easily solved with an extra REPL. Comments can be multi-line, so the REPL needs the M option. There may be < and > within the comment, so the comments should be removed first.

Code: Select all

type file.txt|repl "<!--[\w\W]*?-->" "" m|repl "<.*?>" "" >file.txt.new
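
The two-pass idea can be sketched in Python as follows. This is only an illustration of the regex logic under stated assumptions, not how repl.bat is implemented; the names `strip_html` and `strip_anchored_per_line` and the sample text are hypothetical. Pass 1 removes comments over the whole text, so a match may span lines. Pass 2 removes the remaining tags. The second function mimics an anchored `^<.*?>$` applied line by line, to show why that attempt leaves a multi-line comment untouched.

```python
import re

# [\w\W] matches any character, newlines included, so a comment may span lines.
COMMENT = re.compile(r"<!--[\w\W]*?-->")
# '.' does not match a newline here, so each tag match stays within one line.
TAG = re.compile(r"<.*?>")
# Anchored form: only matches a line that is one <...> group from start to end.
ANCHORED = re.compile(r"^<.*?>$")


def strip_html(text: str) -> str:
    """Remove comments first (multi-line), then the remaining tags."""
    without_comments = COMMENT.sub("", text)
    return TAG.sub("", without_comments)


def strip_anchored_per_line(text: str) -> str:
    """Apply the anchored pattern to each line separately, for comparison."""
    return "\n".join(ANCHORED.sub("", line) for line in text.split("\n"))


if __name__ == "__main__":
    sample = (
        "<p>before</p>\n"
        "<!--\n"
        "  var s = '<b>';\n"
        "  if (a > b) { go(); }\n"
        "//-->\n"
        "<p>after</p>\n"
    )
    print(repr(strip_html(sample)))
    # 'before\n\nafter\n'
    print("<!--" in strip_anchored_per_line(sample))
    # True
```

No single line of the comment both starts with `<` and ends with `>`, so the anchored per-line pattern never matches inside it. Removing comments first also keeps any `<` or `>` inside the comment away from the tag pattern.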


Dave Benham

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: delete greater/lesser-than symbols including strings ins

#9 Post by bars143 » 19 Oct 2013 21:30

dbenham wrote:Comment tags are easily solved with an extra REPL. Comments can be multi-line, so the REPL needs the M option. There may be < and > within the comment, so the comments should be removed first.

Code: Select all

type file.txt|repl "<!--[\w\W]*?-->" "" m|repl "<.*?>" "" >file.txt.new


Dave Benham


Big thanks, @dbenham, for the working one-liner above.
Later I will study the syntax through the link given in your repl thread. :D

cheers,
bars
