File concatenation having encoding issues

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Johnpsz
Posts: 1
Joined: 25 Oct 2011 17:19

File concatenation having encoding issues

#1 Post by Johnpsz » 26 Oct 2011 09:32

We are growing our development staff and decided to automate a few processes that were manual. The quick win, or at least what I thought would be a quick win, was to automate the concatenation of all our .sql files so that we have one runnable file for every build. The current process is to manually open newly checked-in stored procs in Notepad and copy/paste each file into the larger build script, but that doesn't help when there are new database rebuilds or multiple branches of code.

Well, as simple as this sounds, we are having issues getting the files to concatenate properly because of the extra characters that come from the different encodings the original files were created with, specifically the UTF-8 files.

For example, we are seeing ∩╗┐, sometimes ÿþ, and sometimes ■, depending on how we try to concatenate these files. We also went into our editor and changed the encoding of these UTF-8 files to Windows-1252, but I guess the characters at the top are still there, because it made no difference.
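The three character sequences above are consistent with byte order marks (BOMs) being displayed in a legacy code page: the UTF-8 BOM (EF BB BF) shown in code page 437, and the UTF-16 little-endian BOM (FF FE) shown in Windows-1252 and in code page 437. A minimal sketch for checking which BOM a file starts with, written in Python rather than batch because it works on raw bytes; `detect_bom` and `BOMS` are hypothetical names introduced for illustration, not something from this thread:

```python
import codecs

# Known byte order marks, longest first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def bom_as_displayed(bom: bytes, codepage: str) -> str:
    """Show how raw BOM bytes look when misread in a legacy code page."""
    return bom.decode(codepage)
```

Passing the first few bytes of each .sql file to `detect_bom` should show which files carry a mark. `bom_as_displayed(codecs.BOM_UTF8, "cp437")` reproduces the first sequence quoted above, which is why re-saving the text in a different encoding does not help if the editor leaves the mark in place.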

The things we have tried on concatenating these files together are simple scripts like:

for %%f in (*.sql) do type %%f>>ALL_SCRIPTS.sql

or

copy *.sql all.sql

or

chcp 1252
CMD /A /c TYPE *.sql >> all.sql

I have even tried copying and typing them from Unicode to ASCII and binary and back to see if that removed the characters, and it doesn't. I even tried the BatchSubstitute script I got from this site to replace the characters, and that didn't work either. It almost seems that although the characters are visible in the new file, they are unseen when I try to run anything against them.


Now I could go back and remove all the UTF-8 stored procs and then recreate them with Windows-1252 encoding, but that won't guarantee that a new developer, or anyone in the future, makes sure the encoding is correct before checking a file in.
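One way to avoid depending on every developer saving files correctly is to normalize at build time: decode each file according to its own BOM, drop the mark, and write everything out in a single encoding. A minimal sketch of that idea in Python, under several assumptions: the names `read_sql_text`, `concatenate_dir`, and the `out_encoding` parameter are hypothetical, files without a BOM are tried as UTF-8 and then as Windows-1252, and UTF-32 input is not handled.

```python
import codecs
from pathlib import Path


def read_sql_text(path):
    """Decode one file, using its BOM if present, and return BOM-free text."""
    data = Path(path).read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    try:
        # Assumption: BOM-less files are UTF-8 (plain ASCII also passes here).
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is Windows-1252.
        return data.decode("cp1252", errors="replace")


def concatenate_dir(folder, target_name="ALL_SCRIPTS.sql", out_encoding="utf-8"):
    """Join every *.sql file in folder into one file without any BOM."""
    folder = Path(folder)
    parts = []
    # Sort for a stable order and skip the output file, which also matches *.sql.
    for src in sorted(folder.glob("*.sql")):
        if src.name == target_name:
            continue
        text = read_sql_text(src)
        if not text.endswith("\n"):
            text += "\r\n"  # keep statements from different files on separate lines
        parts.append(text)
    (folder / target_name).write_bytes("".join(parts).encode(out_encoding))
```

Calling `concatenate_dir(".")` from the build script would produce `ALL_SCRIPTS.sql` in the current folder. Passing `out_encoding="cp1252"` should give Windows-1252 output instead, though encoding will fail if a script contains characters that code page cannot represent.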

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: File concatenation having encoding issues

#2 Post by alan_b » 26 Oct 2011 15:45

Look at the very first reply to my topic at
viewtopic.php?f=3&t=2368

Your SQL needs may differ from mine,
but the solution I received works perfectly on CSV files
and should also be valid for SQL, except that the text parsing has to split a line into three parts instead of two,
because the SQL index is preceded by
INSERT INTO "moz_downloads" VALUES(


I posted my final reply to my post before I saw your query.
You have answered one question: apparently SQL does deliver UTF-8.
Does it also deliver Little Endian or Big Endian Unicode?

Regards
Alan
