search for previous item
Moderator: DosItHelp
Hello, I have this problem:
I have a very large text listing of the contents of zip files, created using WinRAR.
The file filearchive.txt contains:
#Archivio E:\pippo.zip
2021-01-01 01:01 12345 doc1.pdf
2021-01-02 01:02 23451 doc2.pdf
#Archivio E:\pluto.zip
2021-01-03 01:01 34556 doc3.pdf
2021-01-04 01:02 78123 doc4.pdf
etc.
Using "find" or "findstr" it is easy to search for a document, for example "doc2.pdf", but how can I get the name of the zip file ("E:\pippo.zip") from the preceding #Archivio row?
Thank you in advance.
Re: search for previous item
You have to compare line numbers (notice option /n).
Steffen
Code: Select all
@echo off &setlocal
set "file=filearchive.txt"
set "search=doc2.pdf"
:: find the line number of the line that ends with <space>%search%
set "row="
for /f "delims=:" %%i in ('findstr /nec:" %search%" "%file%"') do set "row=%%i"
if not defined row exit /b
:: find the last line that begins with # and has a line number less than the previously found line number
for /f "tokens=1,2* delims=: " %%i in ('findstr /nbc:"#" "%file%"') do if %%i lss %row% (set "archive=%%k") else goto escape
:escape
echo "%search%" found in "%archive%"
pause
Re: search for previous item
Another option. Probably not as fast as aGerman's solution. I don't have a big file to test with.
Code: Select all
@echo off &setlocal
set "file=filearchive.txt"
set "search=doc3.pdf"
for /f "usebackq tokens=1,* delims= " %%G in ("%file%") do (
if /I "%%G"=="#Archivio" (
set "archive=%%H"
) ELSE (
FOR /F "tokens=1,2,* delims= " %%I in ("%%H") DO (
IF /I "%%K"=="%search%" GOTO FOR_DONE
)
)
)
:FOR_DONE
Echo Archive is: %archive%
Re: search for previous item
Many thanks Steffen. I changed the first search from "findstr /nec:" to "findstr /nc:" so that it also matches part of a document name.
Just one small problem: if the document is present in more than one zip file, only the last match found is reported.
@Squashman: your code runs slowly and can't find a zip match, sorry.
Regards
Re: search for previous item
Sorry, you are right, it works. With my example it works fine, but I tried it on real-world data and it doesn't work.
Last edited by darioit on 29 Jan 2022 12:30, edited 1 time in total.
Re: search for previous item
This is exactly the real-world data:
2015-01-03 22:31 4490416 4490416 d:\zipfile1.7z
# Archivio d:\zipfile1.7z
2015-01-01 22:51 56091 56369 file1.pdf
2015-01-01 22:51 14975 14479 file2.pdf
# Fine dell'archivio
2015-01-03 22:31 4490416 4490416 d:\zipfile2.7z
# Archivio d:\zipfile2.7z
2015-01-02 23:21 14576321 14773827 file3.pdf
2015-01-02 23:21 41092 40119 file4.pdf
etc....
Re: search for previous item
And you should see exactly why my code does not work with that. You have been around long enough to know how the FOR /F command works with TOKENS.
darioit wrote: ↑29 Jan 2022 12:26
This is exactly the real-world data:
2015-01-03 22:31 4490416 4490416 d:\zipfile1.7z
# Archivio d:\zipfile1.7z
2015-01-01 22:51 56091 56369 file1.pdf
2015-01-01 22:51 14975 14479 file2.pdf
# Fine dell'archivio
2015-01-03 22:31 4490416 4490416 d:\zipfile2.7z
# Archivio d:\zipfile2.7z
2015-01-02 23:21 14576321 14773827 file3.pdf
2015-01-02 23:21 41092 40119 file4.pdf
etc....
Re: search for previous item
aGerman's code doesn't technically provide the intended output either, again because the real-world example does not have the number of TOKENS per line that the code expects: your actual data has a space after the HASH symbol!
darioit wrote: ↑29 Jan 2022 12:26
This is exactly the real-world data:
2015-01-03 22:31 4490416 4490416 d:\zipfile1.7z
# Archivio d:\zipfile1.7z
2015-01-01 22:51 56091 56369 file1.pdf
2015-01-01 22:51 14975 14479 file2.pdf
# Fine dell'archivio
2015-01-03 22:31 4490416 4490416 d:\zipfile2.7z
# Archivio d:\zipfile2.7z
2015-01-02 23:21 14576321 14773827 file3.pdf
2015-01-02 23:21 41092 40119 file4.pdf
etc....
So if I search for file2.pdf with aGerman's code, it will output "Archivio d:\zipfile1.7z". You clearly said you wanted the file path only.
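A minimal sketch of the token difference described above, assuming the default FOR /F delimiters (space and tab). The two sample strings are the header rows quoted in this thread; the bracketed echo format is only for illustration.

```batch
@echo off
setlocal
rem Header style from the first example, with no space after the hash
for /f "tokens=1,2*" %%a in ("#Archivio E:\pippo.zip") do echo [%%a] [%%b] [%%c]
rem Header style from the real-world data, with a space after the hash
for /f "tokens=1,2*" %%a in ("# Archivio d:\zipfile1.7z") do echo [%%a] [%%b] [%%c]
rem Expected output
rem [#Archivio] [E:\pippo.zip] []
rem [#] [Archivio] [d:\zipfile1.7z]
```

With the extra space the path moves from the second token to the third, which is why code written for the first layout returns the wrong token on the real data.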
Re: search for previous item
New code.
I created a file with 90,000 archives and 10 files in each archive, so the file has 990,000 rows. The batch file I used to create the test file is the last code block in this post.
Code: Select all
@echo off &setlocal
set "file=filearchive2.txt"
set "search=doc190049.pdf"
for /f "usebackq tokens=1,2,* delims= " %%G in (`findstr /IC:"#" /IC:"%search%" "%file%"`) do (
if /I "%%G"=="#" (
set "archive=%%I"
) ELSE (
FOR /F "tokens=1,2,* delims= " %%J in ("%%I") DO (
IF /I "%%L"=="%search%" GOTO FOR_DONE
)
)
)
:FOR_DONE
Echo Archive is: %archive%
I searched for a file that was on row 99,045
Here are the timings using my code and aGerman's code.
Code: Select all
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Squashman.bat
Archive is: E:\19004.zip
TimeThis : Command Line : Squashman.bat
TimeThis : Start Time : Sat Jan 29 13:57:03 2022
TimeThis : End Time : Sat Jan 29 13:57:31 2022
TimeThis : Elapsed Time : 00:00:28.397
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE aGerman.bat
Archive is: Archivio E:\19004.zip
TimeThis : Command Line : aGerman.bat
TimeThis : Start Time : Sat Jan 29 13:57:54 2022
TimeThis : End Time : Sat Jan 29 13:58:30 2022
TimeThis : Elapsed Time : 00:00:36.192
Code: Select all
@echo off
(FOR /L %%G IN (10000,1,99999) DO (
ECHO # Archivio E:\%%G.zip
FOR /L %%H IN (0,1,9) DO (
ECHO 2015-01-01 22:51 56091 56369 doc%%G%%H.pdf
)
)
)>FileArchive2.txt
Re: search for previous item
Good job, but what if I need to search for only a part of the name, such as "19004"?
Thank you in advance.
Re: search for previous item
May I enter the timing test contest?
Code: Select all
@echo off
setlocal
set "file=test.txt"
set "search=file4.pdf"
for /F "tokens=1,2*" %%a in ('findstr "^# %search%" "%file%"') do (
if "%%a" equ "#" (
set "archive=%%c"
) else (
goto break
)
)
:break
echo %archive%
You may search for any part you want, as long as the part uniquely identifies the file you want. If the part you specify matches two or more files, then just the first one is returned...
Antonio
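For the case where the part matches files in more than one archive, here is a minimal sketch that prints every archive containing a match. It assumes the real-world layout quoted earlier in this thread (header rows of the form "# Archivio" followed by the path), that the search term targets file names inside archives rather than the archive names themselves, and that paths contain no "!" characters, since delayed expansion is enabled. The file name and search term are placeholders.

```batch
@echo off
setlocal EnableDelayedExpansion
set "file=test.txt"
set "search=file"
set "archive="
rem Keep only header rows and rows containing the search term, in file order
for /F "tokens=1,2*" %%a in ('findstr /I /C:"# Archivio" /C:"%search%" "%file%"') do (
   if "%%a" equ "#" (
      rem Header row - remember the archive path for the rows that follow
      if /I "%%b" equ "Archivio" set "archive=%%c"
   ) else if defined archive (
      rem First match inside this archive - report the archive once
      echo(!archive!
      set "archive="
   )
)
```

Clearing the variable after each report means an archive is listed once even when several of its files match. Unlike the version above, the loop does not stop at the first match, so it always reads the full findstr output.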
Re: search for previous item
find the file at row 99045
Code: Select all
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Antonio.bat
E:\19004.zip
TimeThis : Command Line : Antonio.bat
TimeThis : Start Time : Sun Jan 30 12:39:45 2022
TimeThis : End Time : Sun Jan 30 12:40:14 2022
TimeThis : Elapsed Time : 00:00:28.519
Re: search for previous item
Squashman wrote: ↑29 Jan 2022 14:06
Here are the timings using my code and aGerman's code.
Code: Select all
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Squashman.bat
Archive is: E:\19004.zip
TimeThis : Command Line : Squashman.bat
TimeThis : Start Time : Sat Jan 29 13:57:03 2022
TimeThis : End Time : Sat Jan 29 13:57:31 2022
TimeThis : Elapsed Time : 00:00:28.397
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE aGerman.bat
Archive is: Archivio E:\19004.zip
TimeThis : Command Line : aGerman.bat
TimeThis : Start Time : Sat Jan 29 13:57:54 2022
TimeThis : End Time : Sat Jan 29 13:58:30 2022
TimeThis : Elapsed Time : 00:00:36.192
I was expecting a bigger decrease in execution time...
Squashman wrote: ↑30 Jan 2022 12:41
find the file at row 99045
And now I realize that I over programmed my code!
Code: Select all
C:\BatchFiles\DOSTIPS\10362>TIMETHIS.EXE Antonio.bat
E:\19004.zip
TimeThis : Command Line : Antonio.bat
TimeThis : Start Time : Sun Jan 30 12:39:45 2022
TimeThis : End Time : Sun Jan 30 12:40:14 2022
TimeThis : Elapsed Time : 00:00:28.519
I was looking for a different method that allows faster execution. I used the (undocumented) "Searching across line breaks" feature of the findstr command. Here it is:
Code: Select all
@echo off
setlocal EnableDelayedExpansion
set "file=test.txt"
set "search=file4.pdf"
for /F %%a in ('copy /Z "%~F0" NUL') do set ^"CRLF=%%a^
%empty line%
^"
set "searchCRLF=#.*%search%"
:nextSearch
for %%a in ("!CRLF!") do set "searchCRLF=!searchCRLF:#=#.*%%~a!"
findstr "!searchCRLF!" "%file%" > result.tmp
if errorlevel 1 goto nextSearch
for /F "tokens=3" %%a in (result.tmp) do echo %%a
I wonder after how many lines the timing of this method becomes similar to that of the previous one.
Antonio
Re: search for previous item
Probably not the fastest one (didn't check), but for a different perspective.
It *does* return all archives that contain the %search% pdf.
Should be possible to skip creating a temp-file, but I expected to run into memory constraints
Code: Select all
@echo off
set "file=filearchive2.txt"
set "search=doc190049.pdf"
findstr.exe /n /I /C:"#" /I /C:"%search%" "%file%" | sort.exe /r /o temp1.txt
for /f "usebackq tokens=1 delims=:" %%I in (`findstr.exe /n /i /c:"%search%" temp1.txt`) do call :GetArchive %%I
del temp1.txt
goto :EOF
:GetArchive
for /f "usebackq skip=%1 tokens=2 delims=#" %%X in (temp1.txt) do (
echo %%X
goto :EOF
)