DosTips.com

A Forum all about DOS Batch

All times are UTC-06:00




 Post subject: String extract question
PostPosted: 21 May 2011 22:49 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Experts,

How do you write this in batch?

Source File: source.txt

Source.txt
Code:
C:\randomdir\randomdir\randomdir\randomdir:note.txt  thisfile_C1.jpg 10458791 with handwrittenC:\randomdir\randomdir\randomdir\randomdir\randomdir:note  thisfile_C1.jpg 10458791 with handwritten
C:\randomdir\randomdir\andomdir\randomdir:notes.txt  thisfile_C2.jpg 50459491 with handwritten
C:\randomdir\randomdir:note.txt  thisfile_C2.jpg 78458791 with handwritten
C:\randomdir\randomdir\randomdir\randomdir\randomdir\randomdir\randomdir:note.txt  thisfile_C1.jpg 90458791 with handwritten


The "thisfile_C1.jpg 10458791 with handwritten" part of each record should be extracted and written to newfile.txt. The result should look like this:
Output of newfile.txt
Quote:
Img-------------Serial-------Remarks

thisfile_C1.jpg 10458791 with handwritten
thisfile_C1.jpg 10458791 with handwritten
thisfile_C2.jpg 50459491 with handwritten
thisfile_C2.jpg 78458791 with handwritten
thisfile_C1.jpg 90458791 with handwritten


Thanks in advance.

renzlo


PostPosted: 22 May 2011 03:31 
Offline
Expert

Joined: 17 Oct 2009 08:30
Posts: 378
Location: Russia
On the command line:
Code:
type source.txt |sed -n -r -e s/(\swith\shandwritten)/\1\n/p |sed -n -r -e s/.*\s(thisfile_C[0-9].jpg\s[0-9]*\swith\shandwritten)/\1/p > newfile.txt
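For readers who want to see what the two sed calls do step by step, here is a rough sketch of the same idea in Python. It is only an illustration, not part of the answer above: the function names `split_records` and `extract` are made up, the sample lines are copied from the first post, and the dot before `jpg` is escaped here although the sed command leaves it unescaped.

```python
import re

# Two sample lines from source.txt in the first post.
# The first entry is ONE line that holds two fused records (adjacent literals are joined).
SOURCE = [
    r"C:\randomdir\randomdir\randomdir\randomdir:note.txt  thisfile_C1.jpg 10458791 with handwritten"
    r"C:\randomdir\randomdir\randomdir\randomdir\randomdir:note  thisfile_C1.jpg 10458791 with handwritten",
    r"C:\randomdir\randomdir:note.txt  thisfile_C2.jpg 78458791 with handwritten",
]

def split_records(line):
    # Step 1 (first sed call): put a line break after the first "with handwritten".
    return re.sub(r"(\swith\shandwritten)", r"\1\n", line, count=1).split("\n")

def extract(piece):
    # Step 2 (second sed call): keep only the "name serial remarks" part.
    m = re.search(r".*\s(thisfile_C[0-9]\.jpg\s[0-9]*\swith\shandwritten)", piece)
    return m.group(1) if m else None

records = [rec for line in SOURCE for rec in map(extract, split_records(line)) if rec]
print("\n".join(records))
```

The first step is what rescues the fused line: once a line break sits between the two records, the second step can treat each piece on its own.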


Last edited by !k on 22 May 2011 03:35, edited 1 time in total.

PostPosted: 22 May 2011 03:34 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks for the reply.

I got this:

Code:
'sed' is not recognized as an internal or external command,
operable program or batch file.


PostPosted: 22 May 2011 03:42 
Offline
Expert

Joined: 17 Oct 2009 08:30
Posts: 378
Location: Russia
http://sourceforge.net/projects/gnuwin32/files/sed/4.2.1/sed-4.2.1-bin.zip/download (317.9 KB)
homepage http://sourceforge.net/projects/gnuwin32/files/sed/


PostPosted: 22 May 2011 03:45 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks, !k.

By the way, is this possible in pure DOS batch, without using another program like sed?


PostPosted: 22 May 2011 04:19 
Offline
Expert

Joined: 17 Oct 2009 08:30
Posts: 378
Location: Russia
Code:
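@echo off
rem Pass every line of source.txt to :p as one quoted argument.
for /f "delims=" %%a in (source.txt) do call :p "%%a"
goto :eof

:p
rem Walk the space-separated tokens: the .jpg name starts the result,
rem every later token is appended to it.
set "str="
for %%b in (%~1) do (
echo %%b |findstr /r /c:"thisfile_C[0-9]\.jpg" >nul &&set "str=%%b" ||if defined str call set "str=%%str%% %%b"
)
echo,%str%>>newfile.txt
goto :eof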

Very dirty, and it fails on lines like "... handwrittenC:\..." where two records run together.
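To make the failure concrete, here is a small Python sketch that imitates the token loop of the :p subroutine. It is an approximation for illustration only: the function name `collect` is made up, and the paths in the sample line are shortened from the first post.

```python
import re

NAME = re.compile(r"thisfile_C[0-9]\.jpg")

def collect(line):
    # Same logic as the :p subroutine: a token containing the file name
    # (re)starts the result, every later token is appended to it.
    result = None
    for token in line.split():
        if NAME.search(token):
            result = token
        elif result is not None:
            result += " " + token
    return result

# One line holding two fused records (paths shortened for readability).
fused = (r"C:\randomdir:note.txt  thisfile_C1.jpg 10458791 with handwritten"
         r"C:\randomdir:note  thisfile_C1.jpg 10458791 with handwritten")

print(collect(fused))             # a single record comes out
print(len(NAME.findall(fused)))   # although the line holds two file names
```

Because the second file name resets the result, the first record on a fused line is silently dropped. Splitting on whitespace cannot separate "handwrittenC:\...", which is why the approaches later in the thread match the record itself rather than the tokens.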


Last edited by !k on 22 May 2011 05:20, edited 1 time in total.

PostPosted: 22 May 2011 04:53 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks for the script. Will this still work if "thisfile" changes? The real JPG file name is eight characters followed by _C[0-9].jpg.

Thanks for your time.


PostPosted: 22 May 2011 05:06 
Offline
Expert

Joined: 17 Oct 2009 08:30
Posts: 378
Location: Russia
findstr/? wrote:
Regular expression quick reference:
. Wildcard: any character
* Repeat: zero or more occurances of previous character or class
^ Line position: beginning of line
$ Line position: end of line
[class] Character class: any one character in set
[^class] Inverse class: any one character not in set
[x-y] Range: any characters within the specified range
\x Escape: literal use of metacharacter x
\<xyz Word position: beginning of word
xyz\> Word position: end of word


so findstr /r /c:"\<........_C[0-9]\.jpg"
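A quick way to see what the eight dots do is to try the same idea in Python. This is a sketch under assumptions: Python's `\b` only approximates findstr's `\<`, the function name `is_image_token` is made up, and the file names other than thisfile_C1.jpg are invented test values. Tokens are checked one at a time, as the batch script does.

```python
import re

# \b stands in for findstr's \< here; .{8} is the eight dots written as a count.
PATTERN = re.compile(r"\b.{8}_C[0-9]\.jpg")

def is_image_token(token):
    return PATTERN.search(token) is not None

for token in ["thisfile_C1.jpg", "IMG00042_C7.jpg", "short_C1.jpg", "with"]:
    print(token, is_image_token(token))
```

Only tokens with eight characters in front of _C and a single digit after it are accepted, so a name like short_C1.jpg is rejected.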


PostPosted: 22 May 2011 05:15 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks, !k. It is working great. The only problem is that my source is dirty, and I can't do anything about it. Thanks again.


PostPosted: 22 May 2011 05:28 
Offline
Expert

Joined: 22 Jan 2010 18:01
Posts: 1934
Location: Germany
Well, FINDSTR will find the line that contains the pattern you're looking for, but extracting the matched text from that line is always a problem.
In this case (if you don't want to install a 3rd party app) I suggest using VBScript, for example.

*.vbs
Code:
' Input file, output file, and the pattern to extract
strSourceFile = "source.txt"
strDestFile = "newfile.txt"
strPattern = "thisfile_C[0-9][0-9]*\.jpg\s[0-9][0-9]*\swith\shandwritten"

Set objRegEx = New RegExp
With objRegEx
  .Global = True      ' find every match on a line, not only the first
  .IgnoreCase = True
  .Pattern = strPattern
End With

Set objFSO = CreateObject("Scripting.FileSystemObject")
' 1 = ForReading; 2 = ForWriting, True = create the file if it does not exist
Set objSourceFile = objFSO.OpenTextFile(strSourceFile, 1)
Set objDestFile = objFSO.OpenTextFile(strDestFile, 2, True)

' Write each match on its own line
While Not objSourceFile.AtEndOfStream
  Set colMatches = objRegEx.Execute(objSourceFile.ReadLine)
  For Each objMatch In colMatches
    objDestFile.WriteLine objMatch.Value
  Next
Wend

objDestFile.Close
objSourceFile.Close


Regards
aGerman
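The script above gets around the fused line because it collects every match on a line instead of treating the line as one record. A rough Python counterpart, offered only as an illustration: the function name `extract_all` is made up, the sample paths are shortened from the first post, and the data stays in memory instead of being read from source.txt.

```python
import re

PATTERN = re.compile(
    r"thisfile_C[0-9][0-9]*\.jpg\s[0-9][0-9]*\swith\shandwritten",
    re.IGNORECASE,  # counterpart of .IgnoreCase = True
)

def extract_all(lines):
    # finditer yields every match on a line, like Execute with .Global = True.
    return [m.group(0) for line in lines for m in PATTERN.finditer(line)]

lines = [
    # One line holding two fused records (paths shortened for readability).
    r"C:\randomdir:note.txt  thisfile_C1.jpg 10458791 with handwritten"
    r"C:\randomdir:note  thisfile_C1.jpg 10458791 with handwritten",
    r"C:\randomdir:notes.txt  thisfile_C2.jpg 50459491 with handwritten",
]

for record in extract_all(lines):
    print(record)
```

Two input lines produce three output records, which is the behaviour the original question asks for.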


PostPosted: 22 May 2011 05:50 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks, aGerman. It is working great, but what if there's a line in the source text that looks like this:

Code:
C:\randomdir\randomdir:note.txt      thisfile_C2.jpg       78458791  with  handwritten
C:\randomdir\randomdir:note.txtthisfile_C2.jpg<tab>78458791<tab>with  handwritten


And can I also use a wildcard for "thisfile"?

Thanks.


PostPosted: 22 May 2011 06:10 
Offline
Expert

Joined: 22 Jan 2010 18:01
Posts: 1934
Location: Germany
Test it with the following line:
Code:
strPattern = "\w*_C[0-9][0-9]*\.jpg\t[0-9][0-9]*\twith\shandwritten"


Here you can find a reference.

Regards
aGerman
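One side effect of `\w*` is worth checking when the file name is glued to the text before it, as in the second sample line of the previous post. The Python sketch below tests only the file name part of the pattern. The `\w{8}` variant is a suggestion that assumes the name is always eight characters, as stated earlier in the thread; it is not something proposed in the posts themselves.

```python
import re

# The glued part of the second sample line from the previous post.
line = r"C:\randomdir\randomdir:note.txtthisfile_C2.jpg"

loose = re.search(r"\w*_C[0-9][0-9]*\.jpg", line)    # any number of word characters
exact = re.search(r"\w{8}_C[0-9][0-9]*\.jpg", line)  # exactly eight word characters

print(loose.group(0))
print(exact.group(0))
```

With `\w*` the match also swallows the "txt" that precedes the name, while the fixed count stops at the eight characters in front of _C.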


PostPosted: 22 May 2011 06:29 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Thanks, aGerman. It seems that I need to play with regular expressions. Currently, only the lines with several whitespace characters in a row are not extracted.


PostPosted: 22 May 2011 07:23 
Offline
Expert

Joined: 22 Jan 2010 18:01
Posts: 1934
Location: Germany
Haha, one more try:
Code:
strPattern = "\w*_C[0-9][0-9]*\.jpg\s\s*[0-9][0-9]*\s\s*with\shandwritten"


Regards
aGerman


PostPosted: 22 May 2011 08:38 
Offline

Joined: 03 May 2011 19:06
Posts: 116
Got it with this:

Code:
strPattern = "\w*_C[0-9][0-9]*\.jpg[\s\t]*[0-9][0-9]*[\s\t]*with[\s\t]*handwritten"


thanks aGerman.
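The three patterns tried in this thread differ only in how they treat the gaps between fields. The Python sketch below compares them side by side. The sample strings are simplified stand-ins for the dirty lines (no path, made-up labels), and the dictionary keys and the helper `matches` are invented for this illustration.

```python
import re

# The three patterns from the thread, in the order they were posted.
PATTERNS = {
    "tabs only": r"\w*_C[0-9][0-9]*\.jpg\t[0-9][0-9]*\twith\shandwritten",
    "one or more": r"\w*_C[0-9][0-9]*\.jpg\s\s*[0-9][0-9]*\s\s*with\shandwritten",
    "final": r"\w*_C[0-9][0-9]*\.jpg[\s\t]*[0-9][0-9]*[\s\t]*with[\s\t]*handwritten",
}

# Simplified test records; only the whitespace between fields varies.
SAMPLES = {
    "tabs": "thisfile_C2.jpg\t78458791\twith handwritten",
    "many spaces": "thisfile_C2.jpg       78458791  with handwritten",
    "wide gap": "thisfile_C2.jpg\t78458791\twith  handwritten",
}

def matches(pattern_name, sample_name):
    return re.search(PATTERNS[pattern_name], SAMPLES[sample_name]) is not None

for p in PATTERNS:
    print(p, [s for s in SAMPLES if matches(p, s)])
```

Only the final pattern accepts the record with two spaces before "handwritten". Two details may be worth knowing: `\s` already includes the tab, so `[\s\t]*` behaves like `\s*`, and because `*` also allows zero characters, the final pattern would accept fields with no gap at all. Using `\s+` instead would require at least one whitespace character.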

