search for previous item

Posted: 29 Jan 2022 02:20
by darioit
Hello, I have this problem:
I have a very large text file listing the contents of zip files created with WinRAR.

The file filearchive.txt contains:
#Archivio E:\pippo.zip
2021-01-01 01:01 12345 doc1.pdf
2021-01-02 01:02 23451 doc2.pdf
#Archivio E:\pluto.zip
2021-01-03 01:01 34556 doc3.pdf
2021-01-04 01:02 78123 doc4.pdf
etc.

Using "find" or "findstr" it is easy to search for a document, for example "doc2.pdf", but how can I get the name of the zip ("E:\pippo.zip") from the preceding #Archivio row?

Thank you in advance.

Re: search for previous item

Posted: 29 Jan 2022 05:52
by aGerman
You have to compare line numbers (notice option /n).

Code: Select all

@echo off &setlocal
set "file=filearchive.txt"
set "search=doc2.pdf"

:: find the line number of the line that ends with <space>%search%
set "row="
for /f "delims=:" %%i in ('findstr /nec:" %search%" "%file%"') do set "row=%%i"
if not defined row exit /b

:: find the last line that begins with # and has a line number less than the previously found line number
for /f "tokens=1,2* delims=: " %%i in ('findstr /nbc:"#" "%file%"') do if %%i lss %row% (set "archive=%%k") else goto escape
:escape

echo "%search%" found in "%archive%"
pause
Steffen

Re: search for previous item

Posted: 29 Jan 2022 11:47
by Squashman
Another option. Probably not as fast as aGerman's solution. I don't have a big file to test with.

Code: Select all

@echo off &setlocal
set "file=filearchive.txt"
set "search=doc3.pdf"

for /f "usebackq tokens=1,* delims= " %%G in ("%file%") do (
	if /I "%%G"=="#Archivio" (
		set "archive=%%H"
	) ELSE (
		FOR /F "tokens=1,2,* delims= " %%I in ("%%H") DO (
			IF /I "%%K"=="%search%" GOTO FOR_DONE
		)
	)
)
:FOR_DONE
Echo Archive is: %archive%

Re: search for previous item

Posted: 29 Jan 2022 11:59
by darioit
Many thanks Steffen. I changed the first search from "findstr /nec:" to "findstr /nc:" so that it also matches part of a document name.
Just one small problem: if the document is present in more than one zip file, it only works for the last record found.

@Squashman: yours is slow and can't find a zip match, sorry.

Regards

Re: search for previous item

Posted: 29 Jan 2022 12:05
by Squashman
darioit wrote:
29 Jan 2022 11:59
@Squashman slowly but can't find a zip match, sorry
I will disagree. It finds the matches based on your provided data. If it did not find a match then your provided data is not what you gave as an example.

Re: search for previous item

Posted: 29 Jan 2022 12:23
by darioit
Sorry, you are right, it works. With my example it works fine, but I tried it with real-world data and it doesn't work.

Re: search for previous item

Posted: 29 Jan 2022 12:26
by darioit
This is exactly the real-world data:

2015-01-03 22:31 4490416 4490416 d:\zipfile1.7z
# Archivio d:\zipfile1.7z
2015-01-01 22:51 56091 56369 file1.pdf
2015-01-01 22:51 14975 14479 file2.pdf
# Fine dell'archivio
2015-01-03 22:31 4490416 4490416 d:\zipfile2.7z
# Archivio d:\zipfile2.7z
2015-01-02 23:21 14576321 14773827 file3.pdf
2015-01-02 23:21 41092 40119 file4.pdf
etc....

Re: search for previous item

Posted: 29 Jan 2022 13:09
by Squashman
darioit wrote:
29 Jan 2022 12:26
This is exactly the real-world data:
And you should see exactly why my code does not work with that. You have been around long enough to know how the FOR /F command works with TOKENS.
:x :x
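
As a rough illustration of the TOKENS point, the sketch below feeds one header line and one file line from each posted format through FOR /F. The variable names (old, new, oldfile, newfile) are made up for this example, and the sample lines are copied from the two listings in this thread.

```batch
@echo off
setlocal
rem Header lines copied from the two formats posted in this thread.
set "old=#Archivio E:\pippo.zip"
set "new=# Archivio d:\zipfile1.7z"

rem tokens=1,* : %%G is the first word, %%H is the rest of the line.
for /f "tokens=1,* delims= " %%G in ("%old%") do echo [%%G] [%%H]
for /f "tokens=1,* delims= " %%G in ("%new%") do echo [%%G] [%%H]

rem File lines: the real-world format has one extra numeric column.
set "oldfile=2021-01-01 01:01 12345 doc1.pdf"
set "newfile=2015-01-01 22:51 56091 56369 file1.pdf"

rem tokens=1-3,* : %%J is everything after the third word.
for /f "tokens=1-3,* delims= " %%G in ("%oldfile%") do echo [%%J]
for /f "tokens=1-3,* delims= " %%G in ("%newfile%") do echo [%%J]
```

The first pair prints [#Archivio] [E:\pippo.zip] and [#] [Archivio d:\zipfile1.7z]; the second pair prints [doc1.pdf] and [56369 file1.pdf]. With the extra space and the extra column, neither the "#Archivio" comparison nor the file name comparison can match any more.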

Re: search for previous item

Posted: 29 Jan 2022 13:30
by Squashman
aGerman's code doesn't technically provide the intended output either, again because the real-world example does not have the same number of TOKENS per line: your actual data has a space after the HASH symbol!

So if I search for file2.pdf with aGerman's code, it will output Archivio d:\zipfile1.7z. You clearly said you wanted the file path only.
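
One possible way to adapt the line-number approach to the real-world format is sketched below. It is not aGerman's posted code. It assumes every archive header starts with "# Archivio " exactly as in the sample, and it takes one more token so that only the path remains.

```batch
@echo off &setlocal
set "file=filearchive.txt"
set "search=file2.pdf"

rem Line number of the last line that ends with a space followed by the search term.
set "row="
for /f "delims=:" %%i in ('findstr /nec:" %search%" "%file%"') do set "row=%%i"
if not defined row exit /b

rem findstr /n prefixes each header like this:  2:# Archivio d:\zipfile1.7z
rem With colon and space as delimiters, tokens 1-3 are the number, the hash
rem and the word Archivio. The remainder (%%l) is the archive path.
rem Searching for "# Archivio " also skips the "# Fine dell'archivio" lines.
set "archive="
for /f "tokens=1-3,* delims=: " %%i in ('findstr /nbc:"# Archivio " "%file%"') do if %%i lss %row% (set "archive=%%l") else goto escape
:escape

echo "%search%" found in "%archive%"
```

Run against the real-world sample above, a search for file2.pdf should report d:\zipfile1.7z without the leading "Archivio".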

Re: search for previous item

Posted: 29 Jan 2022 14:06
by Squashman
New code.

Code: Select all

@echo off &setlocal
set "file=filearchive2.txt"
set "search=doc190049.pdf"

for /f "usebackq tokens=1,2,* delims= " %%G in (`findstr /IC:"#" /IC:"%search%" "%file%"`) do (
	if /I "%%G"=="#" (
		set "archive=%%I"
	) ELSE (
		FOR /F "tokens=1,2,* delims= " %%J in ("%%I") DO (
			IF /I "%%L"=="%search%" GOTO FOR_DONE
		)
	)
)
:FOR_DONE
Echo Archive is: %archive%
I created a file with 90,000 archives and 10 files in each archive. So the file has 990,000 rows.
I searched for a file that was on row 99,045

Here are the timings using my code and aGerman's code.

Code: Select all

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Squashman.bat
Archive is: E:\19004.zip

TimeThis :  Command Line :  Squashman.bat
TimeThis :    Start Time :  Sat Jan 29 13:57:03 2022
TimeThis :      End Time :  Sat Jan 29 13:57:31 2022
TimeThis :  Elapsed Time :  00:00:28.397

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE aGerman.bat
Archive is: Archivio E:\19004.zip

TimeThis :  Command Line :  aGerman.bat
TimeThis :    Start Time :  Sat Jan 29 13:57:54 2022
TimeThis :      End Time :  Sat Jan 29 13:58:30 2022
TimeThis :  Elapsed Time :  00:00:36.192
Here is the batch file I used to create the test file.

Code: Select all

@echo off
(FOR /L %%G IN (10000,1,99999) DO (
	ECHO # Archivio E:\%%G.zip
	FOR /L %%H IN (0,1,9) DO (
		ECHO 2015-01-01 22:51 56091 56369 doc%%G%%H.pdf
	)
)
)>FileArchive2.txt

Re: search for previous item

Posted: 29 Jan 2022 14:34
by darioit
Good job, but what if I need to search for only a part of the name, such as "19004"?

thank you in advance

Re: search for previous item

Posted: 29 Jan 2022 20:46
by Aacini
May I enter the timing test contest? 8)

Code: Select all

@echo off
setlocal

set "file=test.txt"
set "search=file4.pdf"

for /F "tokens=1,2*" %%a in ('findstr "^# %search%" "%file%"') do (
   if "%%a" equ "#" (
      set "archive=%%c"
   ) else (
      goto break
   )
)
:break
echo %archive%
darioit wrote:
29 Jan 2022 14:34
Good job, but what if I need to search for only a part of the name, such as "19004"?

thank you in advance
You may search for any part you want, as long as the part uniquely identifies the file you want. If the part you specify matches two or more files, then just the first one is returned...

Antonio
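
If a partial search term may match files in several archives, a variation along these lines could list every archive once instead of stopping at the first hit. This is only a sketch under assumptions: the "# Archivio <path>" header format from the real-world sample, no "!" characters in the paths (delayed expansion is enabled), and the /I switch borrowed from Squashman's code, which makes the search case-insensitive.

```batch
@echo off
setlocal EnableDelayedExpansion

set "file=test.txt"
set "search=19004"

rem findstr returns the archive headers plus every line containing the
rem search term, in file order. Each matching file line therefore follows
rem the header of the archive it belongs to.
set "archive="
for /F "tokens=1,2*" %%a in ('findstr /I /C:"# Archivio " /C:"%search%" "%file%"') do (
   if "%%a"=="#" (
      rem Header line: remember the path. Other # lines are ignored.
      if /I "%%b"=="Archivio" set "archive=%%c"
   ) else if defined archive (
      rem First match inside this archive: print it once, then clear it
      rem so that further matches in the same archive are not repeated.
      echo !archive!
      set "archive="
   )
)
```

Because the loop no longer exits early, it always reads the whole findstr output, so it cannot be faster than the first-match version.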

Re: search for previous item

Posted: 30 Jan 2022 12:41
by Squashman
Aacini wrote:
29 Jan 2022 20:46
May I enter the timing test contest? 8)
Timing to find the file at row 99045:

Code: Select all

C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Antonio.bat
E:\19004.zip

TimeThis :  Command Line :  Antonio.bat
TimeThis :    Start Time :  Sun Jan 30 12:39:45 2022
TimeThis :      End Time :  Sun Jan 30 12:40:14 2022
TimeThis :  Elapsed Time :  00:00:28.519
And now I realize that I over programmed my code!

Re: search for previous item

Posted: 30 Jan 2022 16:46
by Aacini
I was expecting a bigger decrease in execution time... :(

I was looking for a different method that allows a faster execution. I used the "Searching across line breaks" (undocumented) feature of findstr command. Here it is:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "file=test.txt"
set "search=file4.pdf"

for /F %%a in ('copy /Z "%~F0" NUL') do set ^"CRLF=%%a^
%empty line%
^"

set "searchCRLF=#.*%search%"

:nextSearch
for %%a in ("!CRLF!") do set "searchCRLF=!searchCRLF:#=#.*%%~a!"
findstr "!searchCRLF!" "%file%" > result.tmp
if errorlevel 1 goto nextSearch
for /F "tokens=3" %%a in (result.tmp) do echo %%a
I am pretty sure that this method will be faster (perhaps much faster) than the previous one when the file searched for is the first one in its archive. However, the method gets slower the further down the archive's listing the file is placed...

I wonder after how many lines the timing of this method becomes similar to the previous one.

Antonio

Re: search for previous item

Posted: 30 Jan 2022 19:07
by Eureka!
Probably not the fastest one (didn't check), but for a different perspective.
It *does* return all archives that contain the %search% pdf.

Code: Select all

@echo off

set "file=filearchive2.txt"
set "search=doc190049.pdf"


	findstr.exe /n /I /C:"#" /I /C:"%search%"  "%file%" | sort.exe /r /o temp1.txt

	for /f "usebackq tokens=1 delims=:" %%I in (`findstr.exe /n /i /c:"%search%" temp1.txt`) do call :GetArchive %%I

	del temp1.txt
goto :EOF


:GetArchive
	for /f "usebackq  skip=%1  tokens=2 delims=#" %%X in (temp1.txt) do (
		echo %%X
		goto :EOF
	)	

It should be possible to skip creating a temp file, but I expected to run into memory constraints.