JREPL - Combine Data From Two Files By Matching Strings


#1 Post by houstontech » 25 Jun 2020 10:48

Hi,

I've used JREPL for a few different formatting tasks but haven't been able to figure out how to do the following:

File 1:
3498=ABC Company

File 2:
3498-0112=General Expenses

FinalOutput:
3498-0112=General Expenses=ABC Company

Can JREPL be used to check data from both files or do I need to get them in the same file first and then try pattern matching?

Thanks!


#2 Post by Aacini » 26 Jun 2020 14:01

Your question is rather incomplete; many details are not well defined. However, you don't need JREPL to perform this simple substitution:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Load first-token values from File 1:
for /F "tokens=1* delims==" %%a in (File1.txt) do set "company[%%a]=%%b"

rem Show File 2 with value replacements
for /F "tokens=1* delims=-" %%a in (File2.txt) do echo %%a-%%b=!company[%%a]!
Antonio

PS - As far as I know, JREPL cannot be used to solve this problem...


#3 Post by houstontech » 26 Jun 2020 15:32

Thanks, Antonio. I'll provide more information. The files will actually contain thousands of lines each:
File 1:
3498=ABC Company
73=First Hospital
78=Best Organization

File 2:
3498-0112=General Expenses
73-0001=Bills
73-0292=Documents
78-0003=Human Resources

FinalOutput:
3498-0112=General Expenses=ABC Company
73-0001=Bills=First Hospital
73-0292=Documents=First Hospital
78-0003=Human Resources=Best Organization

I've come up with the following script:

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
( 

  for /f "tokens=1-2 delims==" %%A in (File2.txt) do (
	set clientraw=%%~A
	set client=!clientraw:~0,-5!
	for /f "tokens=1-2 delims==" %%i in (File1.txt) do (
       	      findstr ^!client!  File1.txt >nul 2>&1 && (
           if !client!==%%i (Echo:%%~A=%%~B=%%~j)  

          )
	 ) 
  
  )

) >FinalOutput.txt
The code above works but takes FOREVER to run. Any help would be appreciated.

Thanks,

James


#4 Post by houstontech » 26 Jun 2020 15:49

Another update. I just sped it up by removing the findstr line, as it was serving no real purpose ;P:

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
(
  for /f "tokens=1-2 delims==" %%A in (File2.txt) do (
	set clientraw=%%~A
	set client=!clientraw:~0,-5!
	for /f "tokens=1-2 delims==" %%i in (File1.txt) do (
       	       if !client!==%%i (Echo:%%~A=%%~B=%%~j)  
          
	 ) 
  
  )

) >FinalOutput.txt
Do you see any other areas where I can improve the speed of execution?

Thanks,

James


#5 Post by siberia-man » 27 Jun 2020 08:30

Let's assume FILE1 is as follows:

Code: Select all

3498=ABC Company
73=First Hospital
78=Best Organization
and FILE2 is:

Code: Select all

3498-0112=General Expenses
73-0001=Bills
73-0292=Documents
78-0003=Human Resources
and the JScript file that combines both files the way you want is called FILECOMBINER.js:

Code: Select all

var STDIN = WScript.StdIn;
var STDOUT = WScript.StdOut;

var line;
var m;
var match_n;
var match_s;

while ( ! STDIN.AtEndOfStream ) {
	line = STDIN.ReadLine();

	m = line.match(/^(\d+)=(.*)/);
	if ( m ) {
		match_n = m[1];
		match_s = m[2];
		continue;
	}

	m = line.match(/^(\d+)-\d+=.*/);
	if ( m && m[1] == match_n ) {
		STDOUT.WriteLine(line + '=' + match_s);
	}
}
Now we can produce the expected result as follows:

Code: Select all

( type FILE1 & type FILE2 ) | sort /r | cscript //nologo FILECOMBINER.js
I have not done any performance testing. I leave that to the topic starter :)


#6 Post by Aacini » 27 Jun 2020 08:46

How many lines are "thousands of lines"? What is the size of File 1 in bytes?

Did you test my code? I am pretty sure that it will run much faster than yours!!! (Unless File 1 is several megabytes in size.) Tip to speed up my code: replace "company[%%a]" with "c%%a" in both FOR lines.
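
To show why a lookup table scales better than a nested loop, here is a minimal sketch of the same idea in plain JavaScript rather than batch. It is only an illustration, not a tested replacement for the batch script: the function names buildCompanyTable and combine are made up for this example, and it assumes every File 1 line has the form key=name and every File 2 line the form key-suffix=description. File 1 is read once into a table, so each File 2 line costs one lookup instead of a full pass over File 1.

```javascript
// Hypothetical helper: build a key -> company name table from File 1 lines.
// Assumes each line looks like "3498=ABC Company".
function buildCompanyTable(file1Lines) {
  var table = {};
  for (var i = 0; i < file1Lines.length; i++) {
    var line = file1Lines[i];
    var eq = line.indexOf("=");
    if (eq < 0) continue; // skip lines without a key
    // The "c" prefix mirrors the c%%a variable naming from the tip above.
    table["c" + line.substring(0, eq)] = line.substring(eq + 1);
  }
  return table;
}

// Hypothetical helper: append the company name to each File 2 line.
// Assumes each line looks like "3498-0112=General Expenses".
function combine(file1Lines, file2Lines) {
  var table = buildCompanyTable(file1Lines);
  var out = [];
  for (var i = 0; i < file2Lines.length; i++) {
    var line = file2Lines[i];
    var dash = line.indexOf("-");
    if (dash < 0) continue; // skip lines without a key-suffix part
    var key = "c" + line.substring(0, dash);
    // Lines with no matching key are skipped here; the batch version
    // would print them with an empty company name instead.
    if (table.hasOwnProperty(key)) {
      out.push(line + "=" + table[key]);
    }
  }
  return out;
}
```

Called with the three File 1 lines and four File 2 lines from the example data, combine returns the four FinalOutput lines in File 2 order.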


Can both files be sorted? If so, the file merge solution below could be faster, unless the sort itself takes a very long time:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Sort both files
sort File1.txt /O File1Sorted.txt
sort File2.txt /O File2Sorted.txt

set "companyKey="

rem File 1 will be read from redirected input
< File1Sorted.txt (

   rem File 2 will be read via FOR /F
   for /F "tokens=1* delims=-" %%a in (File2Sorted.txt) do (
      if "%%a" neq "!companyKey!" call :seekCompany "%%a"
      echo %%a-%%b=!companyName!
   )

) > output.txt
goto :EOF


:seekCompany key

rem Read next line from File 1
set /P "line1="
for /F "tokens=1* delims==" %%a in ("%line1%") do (
   if "%%a" neq "%~1" goto seekCompany
   set "companyKey=%%a"
   set "companyName=%%b"
)
exit /B
This method will fail if a record in File2 has no matching company in File1, but you may insert additional code for this case. Of course, if the files are already sorted (you did NOT specify this point), just remove the two sort commands and read File1.txt and File2.txt directly.
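
As an illustration of what handling the missing-company case could look like, here is a minimal sketch of the merge logic in plain JavaScript rather than batch. The names keyOf and mergeSorted are made up for this example. It assumes both inputs are already sorted by key under the same ordering (here, JavaScript string comparison, which may differ from the ordering the Windows sort command produces). A File 2 line with no partner in File 1 gets an empty company name instead of causing an endless search.

```javascript
// Hypothetical helper: text before the first delimiter, or the whole line.
function keyOf(line, delim) {
  var p = line.indexOf(delim);
  return p < 0 ? line : line.substring(0, p);
}

// Hypothetical merge of two key-sorted arrays of lines.
// file1Sorted lines look like "73=First Hospital",
// file2Sorted lines look like "73-0001=Bills".
function mergeSorted(file1Sorted, file2Sorted) {
  var out = [];
  var i = 0; // current position in File 1; it only ever moves forward
  for (var j = 0; j < file2Sorted.length; j++) {
    var line2 = file2Sorted[j];
    var key2 = keyOf(line2, "-");
    // Skip File 1 records whose key sorts before the File 2 key.
    while (i < file1Sorted.length && keyOf(file1Sorted[i], "=") < key2) {
      i++;
    }
    if (i < file1Sorted.length && keyOf(file1Sorted[i], "=") === key2) {
      var line1 = file1Sorted[i];
      out.push(line2 + "=" + line1.substring(line1.indexOf("=") + 1));
    } else {
      // No matching company: emit an empty name and keep going.
      out.push(line2 + "=");
    }
  }
  return out;
}
```

Because the position in File 1 never moves backwards, each file is traversed once, which is the same property the batch merge relies on.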

Please report the timings of these two methods.

Antonio
