String extract question

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

renzlo
Posts: 116
Joined: 03 May 2011 19:06

String extract question

#1 Post by renzlo » 21 May 2011 22:49

Experts,

How do you write this in batch?

Source file: source.txt

Code:

C:\randomdir\randomdir\randomdir\randomdir:note.txt  thisfile_C1.jpg 10458791 with handwrittenC:\randomdir\randomdir\randomdir\randomdir\randomdir:note  thisfile_C1.jpg 10458791 with handwritten
C:\randomdir\randomdir\andomdir\randomdir:notes.txt  thisfile_C2.jpg 50459491 with handwritten
C:\randomdir\randomdir:note.txt  thisfile_C2.jpg 78458791 with handwritten
C:\randomdir\randomdir\randomdir\randomdir\randomdir\randomdir\randomdir:note.txt  thisfile_C1.jpg 90458791 with handwritten


The part of each record that looks like "thisfile_C2.jpg 50459491 with handwritten" should be extracted and written to newfile.txt. The result should look like this:
Output of newfile.txt
Img-------------Serial-------Remarks

thisfile_C1.jpg 10458791 with handwritten
thisfile_C1.jpg 10458791 with handwritten
thisfile_C2.jpg 50459491 with handwritten
thisfile_C2.jpg 78458791 with handwritten
thisfile_C1.jpg 90458791 with handwritten


Thanks in advance.

renzlo

!k
Expert
Posts: 378
Joined: 17 Oct 2009 08:30
Location: Russia

Re: String extract question

#2 Post by !k » 22 May 2011 03:31

From the command line (this needs sed for Windows; the -r option and the \s and \n escapes are GNU sed features):

Code:

type source.txt | sed -n -r -e "s/(\swith\shandwritten)/\1\n/gp" | sed -n -r -e "s/.*\s(thisfile_C[0-9]\.jpg\s[0-9]*\swith\shandwritten)/\1/p" > newfile.txt
Last edited by !k on 22 May 2011 03:35, edited 1 time in total.
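For readers without sed, here is a rough Python sketch of what the two sed passes do (an illustration only, not part of !k's answer; the sample lines and the names SOURCE, SPLIT, KEEP and run are made up). Pass one breaks the line after every "with handwritten", so records that were run together land on separate lines; pass two keeps just the image/serial/remarks part.

```python
import re

# Hypothetical sample lines shaped like source.txt; the first one holds two
# records run together, as in the question.
SOURCE = [
    r"C:\a\b:note.txt  thisfile_C1.jpg 10458791 with handwrittenC:\a\b\c:note  thisfile_C1.jpg 10458791 with handwritten",
    r"C:\a\b:notes.txt  thisfile_C2.jpg 50459491 with handwritten",
]

# Pass 1: insert a line break after every "with handwritten".
SPLIT = re.compile(r"(\swith\shandwritten)")
# Pass 2: keep only "thisfile_C<digit>.jpg <serial> with handwritten".
KEEP = re.compile(r".*\s(thisfile_C[0-9]\.jpg\s[0-9]*\swith\shandwritten)")


def run(lines):
    out = []
    for line in lines:
        # re.sub replaces every occurrence, like sed's s///g
        for record in SPLIT.sub(r"\1\n", line).split("\n"):
            m = KEEP.search(record)
            if m:  # like sed -n ... p: only records that matched are printed
                out.append(m.group(1))
    return out


if __name__ == "__main__":
    for value in run(SOURCE):
        print(value)
```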

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#3 Post by renzlo » 22 May 2011 03:34

Thanks for the reply.

I got this:

Code:

'sed' is not recognized as an internal or external command,
operable program or batch file.


renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#5 Post by renzlo » 22 May 2011 03:45

Thanks, !k.

By the way, is this possible in pure DOS batch, without any other program like sed?

!k
Expert
Posts: 378
Joined: 17 Oct 2009 08:30
Location: Russia

Re: String extract question

#6 Post by !k » 22 May 2011 04:19

Code:

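@echo off
rem Hand every line of source.txt to the :p subroutine
for /f "delims=" %%a in (source.txt) do call :p "%%a"
goto :eof

:p
rem Walk the words of the line: a word that looks like thisfile_C#.jpg
rem restarts str, and every later word is appended to it
set "str="
for %%b in (%~1) do (
echo %%b |findstr /r /c:"thisfile_C[0-9]\.jpg" >nul &&set "str=%%b" ||if defined str call set "str=%%str%% %%b"
)
echo,%str%>>newfile.txt
goto :eof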

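Very dirty, and it fails on lines like "... handwrittenC:\..." where two records are run together: each image name restarts the collected string, so only the last record of such a line is written.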
Last edited by !k on 22 May 2011 05:20, edited 1 time in total.
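The word-by-word logic of the :p subroutine can be modelled in Python like this (a hypothetical model, not a drop-in replacement; last_record and IMAGE are invented names). It shows why a line holding two records yields only the last one.

```python
import re

# Same test as the findstr call: does this word look like the image name?
IMAGE = re.compile(r"thisfile_C[0-9]\.jpg")


def last_record(line):
    # Mirrors the :p subroutine. Assumes words are separated by spaces/tabs
    # only; the batch FOR loop would also split on commas, semicolons and "=".
    collected = None
    for word in line.split():
        if IMAGE.search(word):
            collected = word            # set "str=%%b": start over
        elif collected is not None:
            collected += " " + word     # call set "str=%%str%% %%b"
    return collected
```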

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#7 Post by renzlo » 22 May 2011 04:53

Thanks for the script. Will it still work if "thisfile" changes? The real JPG file names look like "8character"_c[0-9].jpg, i.e. eight characters followed by _c and a digit.

Thanks for your time.

!k
Expert
Posts: 378
Joined: 17 Oct 2009 08:30
Location: Russia

Re: String extract question

#8 Post by !k » 22 May 2011 05:06

findstr /? wrote:
Regular expression quick reference:
. Wildcard: any character
* Repeat: zero or more occurrences of previous character or class
^ Line position: beginning of line
$ Line position: end of line
[class] Character class: any one character in set
[^class] Inverse class: any one character not in set
[x-y] Range: any characters within the specified range
\x Escape: literal use of metacharacter x
\<xyz Word position: beginning of word
xyz\> Word position: end of word


So, for an eight-character name, use findstr /r /c:"\<........_C[0-9]\.jpg" (each . stands for exactly one character).
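To try the eight-character pattern outside findstr, a rough Python equivalent might look like this (an approximate translation; EIGHT_CHAR_IMAGE and is_image_word are made-up names, and Python's \b only approximates findstr's \<).

```python
import re

# findstr /r /c:"\<........_C[0-9]\.jpg" translated to Python. findstr's \<
# (beginning of word) has no exact twin in Python's re; \b is a close stand-in.
# Each "." matches any one character, so "........" is exactly 8 characters.
EIGHT_CHAR_IMAGE = re.compile(r"\b.{8}_C[0-9]\.jpg")


def is_image_word(word):
    return EIGHT_CHAR_IMAGE.search(word) is not None
```

findstr is case-sensitive by default, so a name ending in _c1.jpg would not match; findstr /i (or re.IGNORECASE in Python) changes that.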

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#9 Post by renzlo » 22 May 2011 05:15

Thanks !k, it is working great. The only problem is that my source is dirty, and I can't do anything about it. Thanks again.

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: String extract question

#10 Post by aGerman » 22 May 2011 05:28

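Well, FINDSTR will find the lines that contain the pattern you're looking for, but it always outputs the whole line, so extracting just the matching part is a problem in batch.
In this case (if you don't want to install a 3rd-party app) I suggest using e.g. VBScript, whose built-in RegExp object can return each match.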

Save as *.vbs:

Code:

' Source and destination files, and the pattern to extract
strSourceFile = "source.txt"
strDestFile = "newfile.txt"
strPattern = "thisfile_C[0-9][0-9]*\.jpg\s[0-9][0-9]*\swith\shandwritten"

' Global = True returns every match on a line, not only the first one,
' so records that are run together on one line are all found
Set objRegEx = New RegExp
With objRegEx
  .Global = True
  .IgnoreCase = True
  .Pattern = strPattern
End With

' 1 = ForReading; 2 = ForWriting, True = create the file if it does not exist
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objSourceFile = objFSO.OpenTextFile(strSourceFile, 1)
Set objDestFile = objFSO.OpenTextFile(strDestFile, 2, True)

' Write each match on its own line
While Not objSourceFile.AtEndOfStream
  Set colMatches = objRegEx.Execute(objSourceFile.ReadLine)
  For Each objMatch In colMatches
    objDestFile.WriteLine objMatch.Value
  Next
Wend

objDestFile.Close
objSourceFile.Close

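The same RegExp logic expressed in Python, for comparison (a sketch, not aGerman's code; extract_all and convert are hypothetical names, and convert assumes the platform's default text encoding). re.finditer plays the role of .Global = True, and re.IGNORECASE that of .IgnoreCase = True.

```python
import re

# Same pattern as strPattern above; IgnoreCase = True -> re.IGNORECASE.
PATTERN = re.compile(
    r"thisfile_C[0-9][0-9]*\.jpg\s[0-9][0-9]*\swith\shandwritten",
    re.IGNORECASE,
)


def extract_all(lines):
    # Global = True: collect every match on each line, not just the first,
    # so records that are run together on one line are all found.
    out = []
    for line in lines:
        out.extend(m.group(0) for m in PATTERN.finditer(line))
    return out


def convert(src="source.txt", dest="newfile.txt"):
    # Hypothetical file-based wrapper mirroring OpenTextFile/WriteLine.
    with open(src) as f_in, open(dest, "w") as f_out:
        for value in extract_all(f_in):
            f_out.write(value + "\n")
```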

Regards
aGerman

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#11 Post by renzlo » 22 May 2011 05:50

Thanks aGerman, it is working great. But what if there's a line in the source text that looks like this:

Code:

C:\randomdir\randomdir:note.txt      thisfile_C2.jpg       78458791  with  handwritten
C:\randomdir\randomdir:note.txtthisfile_C2.jpg<tab>78458791<tab>with  handwritten


And can I also use a wildcard for "thisfile"?

Thanks.

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: String extract question

#12 Post by aGerman » 22 May 2011 06:10

Test it with the following line:

Code:

strPattern = "\w*_C[0-9][0-9]*\.jpg\t[0-9][0-9]*\twith\shandwritten"

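A quick way to see what this pattern accepts, using Python's re module, whose syntax for these constructs (\w, \s, \t, [0-9], *) is the same as VBScript's (the find helper and the sample lines are made up):

```python
import re

# aGerman's second pattern: \w* accepts any word characters in front of
# _C<n>, but the serial must be surrounded by exactly one tab on each side.
TAB_PATTERN = re.compile(r"\w*_C[0-9][0-9]*\.jpg\t[0-9][0-9]*\twith\shandwritten")


def find(line):
    m = TAB_PATTERN.search(line)
    return m.group(0) if m else None
```

Because each separator must be a single tab, a line that uses runs of spaces is not matched, and with\shandwritten still allows exactly one white-space character between the two words.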

Here you can find a reference.

Regards
aGerman

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#13 Post by renzlo » 22 May 2011 06:29

Thanks aGerman. It seems that I need to play with regular expressions. Currently, only the lines with many white-space characters are not extracted.

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: String extract question

#14 Post by aGerman » 22 May 2011 07:23

Haha, one more try:

Code:

strPattern = "\w*_C[0-9][0-9]*\.jpg\s\s*[0-9][0-9]*\s\s*with\shandwritten"

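Checked in Python (the helper and the sample lines are hypothetical), the \s\s* form behaves exactly like \s+:

```python
import re

# aGerman's third pattern: \s\s* means "one or more white-space characters"
# (the same as \s+), and \s matches spaces and tabs alike.
PATTERN = re.compile(r"\w*_C[0-9][0-9]*\.jpg\s\s*[0-9][0-9]*\s\s*with\shandwritten")
# The same pattern written with + instead of "x x*".
SAME = re.compile(r"\w*_C[0-9]+\.jpg\s+[0-9]+\s+with\shandwritten")


def first_match(pattern, line):
    m = pattern.search(line)
    return m.group(0) if m else None
```

Note that with\shandwritten still requires exactly one white-space character, so "with  handwritten" (two spaces) is not matched.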

Regards
aGerman

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: String extract question

#15 Post by renzlo » 22 May 2011 08:38

Got it working with this:

Code:

strPattern = "\w*_C[0-9][0-9]*\.jpg[\s\t]*[0-9][0-9]*[\s\t]*with[\s\t]*handwritten"

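Some notes on this final pattern, tried in Python with made-up sample lines (first_match, SIMPLER and EIGHT are hypothetical names, not from the thread). Since \s already includes \t, the pattern can be written more simply without changing what it matches.

```python
import re

# renzlo's final pattern. \s already includes \t, so [\s\t]* is simply \s*
# (zero or more spaces/tabs), and [0-9][0-9]* is [0-9]+.
FINAL = re.compile(r"\w*_C[0-9][0-9]*\.jpg[\s\t]*[0-9][0-9]*[\s\t]*with[\s\t]*handwritten")
SIMPLER = re.compile(r"\w*_C[0-9]+\.jpg\s*[0-9]+\s*with\s*handwritten")
# Hypothetical variant for names that are always 8 word characters long.
EIGHT = re.compile(r"\w{8}_C[0-9]+\.jpg\s*[0-9]+\s*with\s*handwritten")


def first_match(pattern, line):
    m = pattern.search(line)
    return m.group(0) if m else None
```

Because \w* starts at the first word character it can, a line where the image name is glued to the text before it (note.txtthisfile_C2.jpg) yields a match beginning with txtthisfile_C2.jpg. If the names really are always eight characters, a pattern along the lines of EIGHT may avoid that; VBScript's RegExp also supports {n}, but this variant is untested there.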

Thanks, aGerman.
