Dynamically merging 2 files on header values

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
kushagra
Posts: 2
Joined: 27 Jun 2013 02:02

Dynamically merging 2 files on header values

#1 Post by kushagra » 27 Jun 2013 02:05

Hi All,

I have 2 files which i need to merge together based on the column names provided in the file. The first line in both files are header records.
The first file has fixed columns but second file can have subset of the columns from file 1

File 1:

Column1,column2,column3,column4,column5,column6
Abc,123,zyz,456,asd,098

File 2:

column2,column3,column6
656,wzw,898

Desired output:

Column1,column2,column3,column4,column5,column6
Abc,123,zyz,456,asd,098
,656,wzw,,,898


Please help on how to do this.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Dynamically merging 2 files on header values

#2 Post by dbenham » 27 Jun 2013 05:58

This is a nasty problem for batch, but it can be done fairly efficiently with some limitations.

- Line lengths must be <= 1021 (or there abouts, I'm going from memory)
- Line terminators must use Windows style (<CR><LF>), not Unix style (<LF>)
- Trailing control characters will be stripped from each line

If the above limitations do not meet your needs, then I seriously suggest using something other than batch, perhaps VBScript, JScript, or PowerShell. It is probably a good idea to use something else regardless :wink:

The code below makes the following assumptions:

1 - Every line in file1 has all values (or at least comma place holders). Missing commas could be added with a little extra work.
2 - File1 and File2 both exist, and both have an appropriate header record. Some error checking could be added if needed.
3 - No values contain a comma. CSV supports commas in quoted values, but this code does not support it. Support could be added, but it would slow the script significantly.
4 - The number of lines in file2 is <= the number in file1. Extra lines in file2 are ignored. Extra lines could easily be processed if assumption 1 is solved and a bit more code added

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Define file names
set "file1=file1.txt"
set "file2=file2.txt"
set "file3=file3.txt"

:: Define LF to contain a line feed character
set LF=^


:: Above two empty lines are critical - DO NOT REMOVE

:: Determine number of lines in file 1
for /f %%N in ('^<"%file1%" find /c /v ""') do set /a lines=%%N

:: Cleare existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Define a line feed FOR variable and set up input and output streams
for %%L in ("!LF!") do 3<"%file2%" 0<"%file1%" 1>"%file3%" (

  %= Determine file 1 column names and position =%
  set /p "ln="
  set /a colCnt=0
  for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
    set /a colCnt+=1
    set "$f1Header_%%~A=!colCnt!"
  )

  %= Determine matching file 2 column names =%
  <&3 set /p "ln="
  set /a cnt=0
  for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
    set /a cnt+=1
    if defined $f1Header_%%~A set "$f2Header_!cnt!=!$f1Header_%%~A!"
  )

  %= process remaining lines from file1, and same number from file2 =%
  for /l %%N in (2 1 %lines%) do (

    %= Read and write next file1 line =%
    set "ln="
    set /p "ln="
    (echo(!ln!)

    %= Read next file2 line =%
    set "ln="
    <&3 set /p "ln="

    %= Clear column values =%
    for /l %%N in (1 1 !colCnt!) do set "$col%%N="

    %= Load column values =%
    set /a cnt=0
    if defined ln for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
      set /a cnt+=1
      for %%N in (!cnt!) do if defined $f2Header_%%N set "$col!$f2Header_%%N!=%%A"
    )

    %= Build and write file2 output line =%
    set "ln="
    for /l %%N in (1 1 !colCnt!) do set "ln=!ln!,!$col%%N!"
    (echo(!ln:~1!)
  )
)

type "%file3%"


Dave Benham

kushagra
Posts: 2
Joined: 27 Jun 2013 02:02

Re: Dynamically merging 2 files on header values

#3 Post by kushagra » 27 Jun 2013 23:02

dbenham wrote:This is a nasty problem for batch, but it can be done fairly efficiently with some limitations.

- Line lengths must be <= 1021 (or there abouts, I'm going from memory)
- Line terminators must use Windows style (<CR><LF>), not Unix style (<LF>)
- Trailing control characters will be stripped from each line

If the above limitations do not meet your needs, then I seriously suggest using something other than batch, perhaps VBScript, JScript, or PowerShell. It is probably a good idea to use something else regardless :wink:

The code below makes the following assumptions:

1 - Every line in file1 has all values (or at least comma place holders). Missing commas could be added with a little extra work.
2 - File1 and File2 both exist, and both have an appropriate header record. Some error checking could be added if needed.
3 - No values contain a comma. CSV supports commas in quoted values, but this code does not support it. Support could be added, but it would slow the script significantly.
4 - The number of lines in file2 is <= the number in file1. Extra lines in file2 are ignored. Extra lines could easily be processed if assumption 1 is solved and a bit more code added

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Define file names
set "file1=file1.txt"
set "file2=file2.txt"
set "file3=file3.txt"

:: Define LF to contain a line feed character
set LF=^


:: Above two empty lines are critical - DO NOT REMOVE

:: Determine number of lines in file 1
for /f %%N in ('^<"%file1%" find /c /v ""') do set /a lines=%%N

:: Cleare existing $ variables
for /f "delims==" %%V in ('set $ 2^>nul') do set "%%V="

:: Define a line feed FOR variable and set up input and output streams
for %%L in ("!LF!") do 3<"%file2%" 0<"%file1%" 1>"%file3%" (

  %= Determine file 1 column names and position =%
  set /p "ln="
  set /a colCnt=0
  for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
    set /a colCnt+=1
    set "$f1Header_%%~A=!colCnt!"
  )

  %= Determine matching file 2 column names =%
  <&3 set /p "ln="
  set /a cnt=0
  for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
    set /a cnt+=1
    if defined $f1Header_%%~A set "$f2Header_!cnt!=!$f1Header_%%~A!"
  )

  %= process remaining lines from file1, and same number from file2 =%
  for /l %%N in (2 1 %lines%) do (

    %= Read and write next file1 line =%
    set "ln="
    set /p "ln="
    (echo(!ln!)

    %= Read next file2 line =%
    set "ln="
    <&3 set /p "ln="

    %= Clear column values =%
    for /l %%N in (1 1 !colCnt!) do set "$col%%N="

    %= Load column values =%
    set /a cnt=0
    if defined ln for /f "eol=, delims=" %%A in ("!ln:,=%%~L!") do (
      set /a cnt+=1
      for %%N in (!cnt!) do if defined $f2Header_%%N set "$col!$f2Header_%%N!=%%A"
    )

    %= Build and write file2 output line =%
    set "ln="
    for /l %%N in (1 1 !colCnt!) do set "ln=!ln!,!$col%%N!"
    (echo(!ln:~1!)
  )
)

type "%file3%"


Dave Benham



Thanks Dave for your help. This code solves a lot of problem.
Regarding the assumptions.
1 - Every line in file1 has all values (or at least comma place holders). Missing commas could be added with a little extra work.
2 - File1 and File2 both exist, and both have an appropriate header record. Some error checking could be added if needed.
3 - No values contain a comma. CSV supports commas in quoted values, but this code does not support it. Support could be added, but it would slow the script significantly.
4 - The number of lines in file2 is <= the number in file1. Extra lines in file2 are ignored. Extra lines could easily be processed if assumption 1 is solved and a bit more code added


The points 1 and 2 hold true.
For point 3. there may be comma in the source value for description but that case will have the value in the Quote.
For point 4, the data in file2 > file1. File one will have just the header record whereas actual data will be in file2.

The scenario is that we have a fixed table structure in target in DB2. we use csv file as source that is generated by client. Requirement is that they can send us any subset of columns that are defined in target as source file.

To solve that we want to implement the above logic, where we have the main file with the target structure and compare that with source file and construct another file with all the missing columns.

Post Reply