File concatenation having encoding issues
Posted: 26 Oct 2011 09:32
We are growing our development staff and decided to automate a few processes that were manual. The quick win, or at least what I thought would be a quick win, was to automate the concatenation of all our .sql files so that we have one runnable file for every build. The old/current process is to manually open newly checked-in stored procs in Notepad and copy/paste each file into the larger script for the build, but that doesn't help when there are new database rebuilds or multiple branches of code.
Well, as simple as this sounds, we are having issues getting the files to concatenate properly because of the extra characters that the different encodings of the original files put at the start of each file, specifically the UTF-8 files.
For example, we are seeing ∩╗┐, sometimes ÿþ, and sometimes ■, depending on how we try to concatenate these files. We also already went into our editor and changed the encoding of these UTF-8 files to Windows-1252, but I guess the characters at the top are still there, because it made no difference.
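For anyone wondering where those characters come from: they look like byte-order marks (BOMs) being displayed through a single-byte code page. The sketch below is only an illustration and assumes Python is available; `render` is a made-up helper name, and the BOM constants come from Python's standard `codecs` module. It decodes the UTF-8 and UTF-16 LE BOM bytes as code page 437 (the usual US console code page) and as Windows-1252.

```python
import codecs

# Byte-order marks as defined in Python's standard codecs module.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,          # bytes EF BB BF
    "UTF-16 LE": codecs.BOM_UTF16_LE,  # bytes FF FE
}


def render(raw: bytes, code_page: str) -> str:
    """Return how raw bytes look when misread as a single-byte code page.

    `render` is a hypothetical helper for this illustration only.
    """
    return raw.decode(code_page)


for name, bom in BOMS.items():
    for code_page in ("cp437", "cp1252"):
        # repr() makes invisible characters such as a no-break space visible.
        print(name, code_page, repr(render(bom, code_page)))
```

Under these assumptions, the UTF-8 BOM read as cp437 gives ∩╗┐, the UTF-16 LE BOM read as cp1252 gives ÿþ, and the same UTF-16 LE BOM read as cp437 gives a no-break space followed by ■, which matches the three variants we are seeing.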
The things we have tried on concatenating these files together are simple scripts like:
for %%f in (*.sql) do type %%f>>ALL_SCRIPTS.sql
or
copy *.sql all.sql
or
chcp 1252
CMD /A /c TYPE *.sql >> all.sql
I have even tried copying and typing them from Unicode to ASCII and binary and back to see if that removed the characters, and it doesn't. I also tried the BatchSubstitute script I got from this site to replace the characters, and that didn't work either. It almost seems as though the characters are visible in the new file, but unseen by anything I try to run against them.
Now, I could go back and remove all the UTF-8 stored procs and then recreate them with Windows-1252 encoding, but that won't stop a new developer, or anyone in the future, from checking in a file with the wrong encoding.
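One direction we have not tried yet is to do the concatenation outside of cmd, reading each file as bytes and dropping any BOM before appending it. The following is a minimal sketch of that idea, not a tested build step. It assumes Python is available on the build machine, that files without a BOM are Windows-1252, and that the combined script should be written as Windows-1252; `read_sql` and `concatenate` are hypothetical names.

```python
import codecs
import glob
import os

# (BOM bytes, codec used to decode the remainder of the file).
# Assumption: only UTF-8 and UTF-16 BOMs occur; UTF-32 is not handled.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_sql(path, fallback="cp1252"):
    """Read one script as text, stripping a leading BOM if present."""
    with open(path, "rb") as f:
        raw = f.read()
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(encoding)
    # No BOM: assume the legacy Windows-1252 encoding (an assumption).
    return raw.decode(fallback)


def concatenate(pattern, out_path, out_encoding="cp1252"):
    """Append every file matching pattern to out_path, in name order."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it matches the pattern.
    paths = sorted(p for p in glob.glob(pattern)
                   if os.path.abspath(p) != out_abs)
    # newline="" keeps the line endings of the source files unchanged.
    with open(out_path, "w", encoding=out_encoding, newline="") as out:
        for path in paths:
            text = read_sql(path)
            out.write(text)
            if not text.endswith("\n"):
                out.write("\r\n")  # keep scripts from running together
    return paths
```

Calling `concatenate("*.sql", "ALL_SCRIPTS.sql")` from the scripts folder would mirror the batch attempts above. Because the encoding is detected per file, it would not matter which encoding a developer used at check-in. One caveat: a script containing characters that Windows-1252 cannot represent would raise a `UnicodeEncodeError` when written, so the output encoding might need to change in that case.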