How to determine the number of difference
Posted: 03 Oct 2011 03:22
by linconnue55
Hey
I need a batch script which compares two text files and determines the number of different lines.
For example: file1.txt contains
aaaaaa
bbbbbb
cccccc
And file2.txt contains
aaaaaa
eeeeee
bbbbbb
cccccc
The script must display the number of differences, which is 1 here.
Thanks in advance.
Re: How to determine the number of difference
Posted: 03 Oct 2011 05:36
by Ed Dyreen
'
It's easy: I use the 'fc' command. There is also a 'comp' command, although I don't really understand the difference:
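As a rough illustration of the 'fc' approach (a minimal sketch, not necessarily the code that was posted here), the call below assumes the file names file1.txt and file2.txt from the first post. 'fc' compares text files line by line, while 'comp' compares them byte by byte. Neither prints a count of differing lines; 'fc' only reports through its exit code whether any difference exists.

```batch
@echo off
rem Sketch only: file1.txt and file2.txt are the names used in the first post.
rem /N prefixes each displayed line with its line number.
fc /N file1.txt file2.txt
rem fc sets errorlevel 0 when the files match and 1 or higher otherwise.
if errorlevel 1 (
    echo The files differ or could not be compared.
) else (
    echo The files are identical.
)
```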
Re: How to determine the number of difference
Posted: 03 Oct 2011 07:20
by linconnue55
I know that is the purpose of comp or fc, but I need the number of lines,
like the example that I mentioned previously.

Re: How to determine the number of difference
Posted: 03 Oct 2011 08:02
by dbenham
That simple question is devilishly difficult to answer.
How many lines are different in the below example?
A
B
C
---------
B
C
A
Obviously the lines are all identical, but simply in a different order. Depending on your requirements, the answer could be 0, 1, 2 or 3.
I don't think you will find a ready built solution. If you really want to do this you will have to precisely define your rules and build it yourself. I doubt you want to attempt this with batch.
Dave Benham
Re: How to determine the number of difference
Posted: 03 Oct 2011 08:26
by linconnue55
Thanks,
The idea is to read each line of file1.txt into a variable, then search file2.txt for that value.
How can this be done?
Re: How to determine the number of difference
Posted: 03 Oct 2011 09:10
by dbenham
So what would you expect the result to be in my A|B|C example?
Re: How to determine the number of difference
Posted: 03 Oct 2011 13:56
by linconnue55
I came up with a solution. Here is the script:
Code:
for /F "tokens=*" %%a in ('type file1.txt') do ( find /c "%%a" file2.txt )
Its role is to check whether each line of the first file exists in the second. It shows the number of occurrences of each line of file1 in file2.
I still need to add code that increments "compt" every time a line from file1 is not found in file2,
but I cannot work out how to put it inside the DO block.
I need an idea, please.
Re: How to determine the number of difference
Posted: 03 Oct 2011 16:14
by linconnue55
Finally I have the solution:
Code:
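@echo off
rem Start from zero so a total is printed even when nothing differs
set /a Compt=0
rem Count the lines of file1.txt that are not found in file2.txt
for /F "tokens=*" %%a in ('type file1.txt') do (
find /c "%%a" file2.txt
if errorlevel 1 set /a Compt+=1
)
rem Count the lines of file2.txt that are not found in file1.txt
for /F "tokens=*" %%b in ('type file2.txt') do (
find /c "%%b" file1.txt
if errorlevel 1 set /a Compt+=1
)
echo The number of differences is %Compt%
PAUSE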
Enjoy IT !!!

Re: How to determine the number of difference
Posted: 03 Oct 2011 16:48
by dbenham
Good - you figured out how to incorporate the failure test. There is a simpler way to do the same thing using the cmd && (success commands go here) || (failure commands go here) construct. You only need the failure portion, plus you don't care about the output so you can redirect it to nul, plus you don't need the /C option:
Code:
for /F "tokens=*" %%a in ('type file1.txt') do (
find "%%a" file2.txt >nul || set /a Compt+=1
)
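For readers who have not seen the full construct, here is a minimal sketch of both branches. The search string "needle" is a placeholder chosen for illustration; file2.txt is the file name from the earlier posts.

```batch
@echo off
rem Sketch only: "needle" is a hypothetical search string.
rem find returns errorlevel 0 when at least one line contains the string.
find "needle" file2.txt >nul && (
    echo At least one line of file2.txt contains the string.
) || (
    echo No line of file2.txt contains the string.
)
```

One caveat with this form: the failure branch also runs if the last command in the success branch itself fails, so it is not a strict if/else.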
You still have problems.
1) Your code searches to see if a file1 line is anywhere within a line in file2. This means file1 line could be a substring of file2 line. But you want an exact match. This can easily be solved by switching to FINDSTR with /B /E /C:"%%a" options.
2) Your FOR loop currently skips blank lines as well as lines that begin with ; (implicit FOR "EOL=;" option). There are ways to get around this but it's not worth it until you solve more critical problems.
3) What if identical lines appear multiple times within either or both files? The number of appearances should be the same for both, yes?
4) Does line order really not matter? These two files seem very different to me, yet your algorithm will treat them as having no differences:
File 1
Name=George
Age=32
Name=Fred
Age=21

File 2
Name=George
Age=21
Name=Fred
Age=32
5) Even if you solve the above issues - you still have a nasty judgement call to make:
Example 1
File 1
Name=George
Age=32
File 2
Name=George
Age=15
Most people would say there is one difference between the files.

Example 2
File 1
Name=George
Age=32
File 2
Name=George
Hobby=soccer
I think most people would expect the difference count to be two in this case (file 1 is missing hobby, file 2 is missing age).
What algorithm will give the "correct" answer to both Example 1 and Example 2? This kind of problem is difficult enough, but to try to tackle it using batch programming seems like a bad idea.
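Point 1 on its own can be sketched as follows. This is only an illustration under assumptions: it reuses the file names and the Compt counter from the earlier posts, and it does nothing about points 2 to 5. Lines that contain double quotes or backslashes would likely need extra escaping before being passed to FINDSTR.

```batch
@echo off
setlocal
rem Sketch only: file1.txt, file2.txt and Compt are taken from the earlier posts.
set /a Compt=0
rem /B and /E anchor the literal /C: string at both ends of the line,
rem so a line only counts as found when the whole line matches.
for /F "tokens=*" %%a in ('type file1.txt') do (
    findstr /B /E /C:"%%a" file2.txt >nul || set /a Compt+=1
)
for /F "tokens=*" %%b in ('type file2.txt') do (
    findstr /B /E /C:"%%b" file1.txt >nul || set /a Compt+=1
)
rem Blank lines and lines starting with ; are still skipped by FOR /F (point 2).
echo The number of differences is %Compt%
```

With the two sample files from the first post, only the eeeeee line has no exact match in the other file, so the counter ends at 1.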
Dave Benham
Re: How to determine the number of difference
Posted: 03 Oct 2011 17:35
by aGerman
Well, Dave, your first example shows 4 different lines, because that can't be the same George in both files.
I guess the problem can be solved using batch as well as any other language, but the rules have to be clearly defined. Without these rules we're groping in the dark.
Regards
aGerman