Clean text

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

Clean text

#1 Post by darioit » 16 Mar 2018 02:13

Hello,
what is the best way to clean this text, from:

2016-04-09 13:16:45 ....A 105907 aa_bb_cc_dd.pdf
2016-04-09 13:16:45 ....A 105907 directory\ee_ff_gg_h.pdf

to:
aa_bb_cc_dd.pdf
ee_ff_gg_h.pdf

Thank you in advance

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

Re: Clean text

#2 Post by darioit » 16 Mar 2018 04:00

Maybe this could work; it's just a regular expression!

Code:

(?:[^ ]+ ){4}(?:.*\\)?(.*)

ShadowThief
Expert
Posts: 1159
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Clean text

#3 Post by ShadowThief » 16 Mar 2018 08:47

darioit wrote:
16 Mar 2018 04:00
Maybe this could work; it's just a regular expression!

Code:

(?:[^ ]+ ){4}(?:.*\\)?(.*)

Batch, by way of findstr, can handle only the most rudimentary regular expressions. What you've posted will not work, since findstr has no idea what {4} means, and it does not understand (?:...) groups or the ? operator either.
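
To make that limitation concrete, here is a minimal sketch of what the first half of the posted expression looks like once it is reduced to what findstr supports. It assumes the listing lines are in a hypothetical file named test.txt and are separated by single spaces, as in the first post.

```batch
@echo off
rem findstr has no {n} repeat, so "four space-separated fields" has to be
rem written out field by field: [^ ][^ ]* stands in for [^ ]+ each time.
rem test.txt is a hypothetical input file, not something from the thread so far.
findstr /r /c:"^[^ ][^ ]* [^ ][^ ]* [^ ][^ ]* [^ ][^ ]* " "test.txt"
```

Even then findstr only prints the matching lines in full. It has no capture groups, so it cannot hand back the file name on its own; something else, such as a FOR /F loop, has to do the extraction.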

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Clean text

#4 Post by aGerman » 16 Mar 2018 12:45

What is "....A" for? Does it contain any spaces? And if so, is it always the same number of spaces? Because in that case you could just use a FOR /F loop. Your two example lines work with this code:

Code:

@echo off &setlocal
for /f "tokens=4*" %%i in ('findstr /rc:"20[0-9][0-9]-[01][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]" "test.txt"') do echo %%~nxj
pause
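
It keeps only the lines that contain a date and time (the pattern has no leading ^, so it is not anchored to the start of the line). The 4 in "tokens=4*" is the number of substrings, separated by spaces or tabs, that come before the file name or path; the * puts the rest of the line into %%j, and %%~nxj reduces that to name and extension.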
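
To see how the tokens are assigned, the following sketch splits one of the sample lines from the first post. The line is held in a variable, so no input file is needed; the variable name and the tokens=1-4* layout are only for illustration.

```batch
@echo off
setlocal
rem Sample line from the first post, used as a literal string.
set "line=2016-04-09 13:16:45 ....A 105907 directory\ee_ff_gg_h.pdf"
rem tokens=1-4* puts the first four fields in %%a..%%d and the rest in %%e.
for /f "tokens=1-4*" %%a in ("%line%") do (
    echo date = %%a
    echo time = %%b
    echo attr = %%c
    echo size = %%d
    echo rest = %%e
    rem ~nx keeps only the name and extension of the path in %%e.
    echo name = %%~nxe
)
```

With "tokens=4*" only the fourth field and the rest are kept, which is why the original loop uses %%i for the size and %%j for the path.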

Steffen

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Clean text

#5 Post by Samir » 16 Mar 2018 23:22

This looks like you're just trying to get the filenames from a directory listing. And judging by the \ character, it's a DOS/Windows system.

So instead of using just dir, you can use dir /b.

For finding all the filenames in not only the current directory but also subdirectories, the for command is a bit more useful:

Code:

for /r %f in (*) do echo %~nxf
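
That is the form for the command line; inside a batch file, double the percent signs (%%f and %%~nxf). There may be much better ways to do this as I'm no expert. 8)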

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

Re: Clean text

#6 Post by darioit » 17 Mar 2018 00:22

Sorry, it's not the output of a dir command, but of "7za l file.zip".

I'll check your solution, thank you for the help!

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

Re: Clean text

#7 Post by darioit » 17 Mar 2018 01:44

I tried aGerman's solution and it works fine, except for the first row, which always looks like this:

2017-01-17 10:50:41 ..... 160045 1485635 aa_bb_cc_dd.pdf

where 1485635 shows the compressed size of the archive.

We could parse the name starting from the "pdf" extension, which is constant, and read backwards until we find a space or a backslash.

Thank you
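
Batch has no simple way to scan a string backwards, so the sketch below swaps in a related technique: it takes the last space-separated token of the line, which does not depend on how many columns come before the name. It assumes file names contain no spaces or wildcard characters; the label :lastName and the variable name are made up for this example.

```batch
@echo off
setlocal
rem One row with four leading columns and one with five, both from the thread.
call :lastName "2016-04-09 13:16:45 ....A 105907 directory\ee_ff_gg_h.pdf"
call :lastName "2017-01-17 10:50:41 ..... 160045 1485635 aa_bb_cc_dd.pdf"
exit /b

:lastName
rem A plain FOR splits the unquoted line on spaces; after the loop the
rem variable holds whatever token came last, whatever the column count.
set "last="
for %%t in (%~1) do set "last=%%t"
rem ~nx drops any directory part, leaving name and extension.
for %%f in ("%last%") do echo %%~nxf
exit /b
```

This prints ee_ff_gg_h.pdf and then aa_bb_cc_dd.pdf. A name with a space in it would be cut at the last space, so this only fits listings like the ones shown here.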

darioit
Posts: 230
Joined: 02 Aug 2010 05:25

Re: Clean text

#8 Post by darioit » 17 Mar 2018 03:47

Another hint: filtering on the dots "...." is enough.
Thank you

Compo
Posts: 599
Joined: 21 Mar 2014 08:50

Re: Clean text

#9 Post by Compo » 17 Mar 2018 06:18

It would have helped had you provided a more accurate example of the output data from the outset.

Code:

7-Zip (a) 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28

Scanning the drive for archives:
1 file, iiiiiii bytes (iiii KiB)

Listing archive: file.zip

--
Path = file.zip
Type = zip
Physical Size = iiiiiii

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2017-01-17 10:50:41 ....A       160045      1485635  aa_bb_cc_dd.pdf
iiii-ii-ii ii:ii:ii ....A       iiiiii      iiiiiii  directory\ee_ff_gg_h.pdf
------------------- ----- ------------ ------------  ------------------------
iiii-ii-ii ii:ii:ii           iiiiiiii      iiiiiii  i files, i folders

Here's one way of doing it, with the command you've now provided and a relevant data example:

Code:

@Echo Off
SetLocal EnableDelayedExpansion
Set "_=0"
(For /F "Tokens=5*" %%A In ('7za l "file.zip"') Do (Set "T5=%%A"
	If "!T5:~,3!"=="---" Set /A _+=1
	If !_! Equ 1 If Not "%%B"=="" Echo %%B))>"output.txt"

The output file, output.txt, should contain the 'clean' data you require.
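
For readers who want to follow the logic step by step, here is a commented variant of the same idea. It is a sketch rather than a replacement: it assumes 7za is on the PATH and file.zip exists, as in the thread, and it differs in one respect, echoing %%~nxB so that the directory part is dropped as asked in the first post. The variable name seps is made up for this example.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Assumes 7za is on the PATH and file.zip is in the current directory.
set "seps=0"
for /f "tokens=5*" %%A in ('7za l "file.zip"') do (
    rem %%A is the fifth column, %%B is everything after it: the Name column.
    set "t5=%%A"
    rem The file rows sit between two dashed separator lines; count them.
    if "!t5:~0,3!"=="---" set /a seps+=1
    rem After the first separator and before the second, print name.ext only.
    if !seps! equ 1 if not "%%B"=="" echo %%~nxB
)
```

Rows with fewer than five columns in front of the name are skipped, because "tokens=5*" then leaves %%B empty. Since delayed expansion is enabled, a file name containing an exclamation mark would not come through intact.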

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Clean text

#10 Post by Samir » 17 Mar 2018 15:55

darioit wrote:
17 Mar 2018 00:22
Sorry, it's not the output of a dir command, but of "7za l file.zip".

I'll check your solution, thank you for the help!

Ahh, makes sense. And I just checked the options for 7za and it doesn't seem to have any options for the display.

I have some ideas on how to do this since the column width seems always the same, but I'll wait to see if someone already solves it for you. 8)
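
A minimal sketch of that fixed-width idea, assuming the Name column always starts at zero-based offset 53 as it does in the listing in post #9. That offset is read off the sample, not taken from any 7-Zip documentation, and the second row is made up to show a blank Compressed column.

```batch
@echo off
setlocal
rem row1 is the sample row from post #9; row2 is a made-up row with a blank
rem Compressed column, padded so the name still starts at offset 53.
set "row1=2017-01-17 10:50:41 ....A       160045      1485635  aa_bb_cc_dd.pdf"
set "row2=2017-01-17 10:50:41 ....A       105907               directory\ee_ff_gg_h.pdf"
rem Cut each row at offset 53, then let ~nx strip any directory part.
for %%f in ("%row1:~53%") do echo %%~nxf
for %%f in ("%row2:~53%") do echo %%~nxf
```

Because nothing is split on spaces, this approach would also cope with file names that contain spaces, as long as the column layout really is fixed.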

Compo
Posts: 599
Joined: 21 Mar 2014 08:50

Re: Clean text

#11 Post by Compo » 17 Mar 2018 17:22

Samir, look at the post right above yours :!:

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Clean text

#12 Post by Samir » 17 Mar 2018 18:40

Compo wrote:
17 Mar 2018 17:22
Samir, look at the post right above yours :!:

Yeah, I'm sure that will solve the OP's issue--I just couldn't wrap my head around exactly what was going on there. :oops:
