DosTips.com

A Forum all about DOS Batch
It is currently 29 May 2017 09:03

All times are UTC-06:00




Posted: 09 Dec 2016 04:50
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Please have a look at the list of code pages:
https://msdn.microsoft.com/en-us/library/dd317756.aspx
The list already includes EBCDIC code pages such as 037, 500, 1026, 1047 and 1140-1149. If you still have some EBCDIC data, you could run some tests.

Steffen


Posted: 09 Dec 2016 17:44
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Largely by chance, I found a serious bug that could occur while reading UTF-8. It is fixed in v1.3.2.

Steffen


Posted: 28 Dec 2016 09:04
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Version 1.4.0 supports UTF-16 big endian (something batch can't handle natively). Use code page ID 1201.
You can also specify the source and destination files directly with the options /i and /o. Of course, redirections still work.

Steffen


Posted: 23 Jan 2017 09:05
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
I did a little code profiling over the weekend. The outcome is that threading the conversion isn't as important as I expected. It makes more sense to separate reading from and writing to the file system, because these are the slow operations. I changed the behavior so that writing is done in a parallel thread while the next chunk of data is being read. Surprisingly, I got the best performance when converting and writing run together in the same thread.
To cut a long story short: the performance gain is small but real. Thus, I'd like to share it as version 1.4.1.

Steffen


Posted: 26 Jan 2017 11:33

Joined: 28 Jun 2010 03:46
Posts: 272
Thanks for the update.

Saso


Posted: 29 Jan 2017 02:37

Joined: 20 Aug 2010 13:57
Posts: 430
Location: Chile
Great tool. I reduced the executable size to 8 KB. PM sent.


Posted: 29 Jan 2017 03:30
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Thank you Carlos!

I will definitely try some of the compiler options to reduce the size of the tool. Unfortunately, the tool you sent me was immediately removed by Avira (free antivirus) :( There are some good reasons why my tool has a few extra KB. I'll explain them via PM.

Steffen


Posted: 29 Jan 2017 08:26
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
I managed to incorporate Carlos' size improvements; see the comments on the DECREASE_SIZE_GCC macro in the source code. That way the size of the utility was halved (without a noticeable performance gain, though).
To preserve cross-compiler support, I added a few preprocessor directives for retrieving the arguments UTF-16 encoded.

Since I don't have any experience with this kind of size optimization yet, please report whether the new version triggers false positives in your antivirus software.

Steffen


Posted: 30 Jan 2017 11:52
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
According to VirusTotal, the executables uploaded with version 1.4.2 do not cause any detections. I hope this holds true in the real world, too.

Steffen

https://www.virustotal.com/en/file/a8d6 ... 485797283/
https://www.virustotal.com/en/file/7562 ... 485797365/


Posted: 02 Feb 2017 12:21
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Version 1.4.3 adds option /a, which appends to an existing file. See the initial post.
Again I checked the executables on VirusTotal. No false positives were detected.
https://www.virustotal.com/en/file/53c0 ... 486055380/
https://www.virustotal.com/en/file/f552 ... 486055433/

As always - the updated file can be found in the initial post of this thread.

Now I'm out of ideas (and tired of reading the source code over and over). I'll archive it and leave it alone ;-)

Steffen


Posted: 23 Feb 2017 11:58
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Quoted from there:
viewtopic.php?f=3&t=7703&p=51312#p51310
penpen wrote:
I have tested your CONVERTCP utility and read the source code.
I saw no error, but I noticed that your tool does more than just convert between code pages: it also approximates characters that are not in the target code page (which is not that bad, because cmd.exe does the same, but I would mention it somewhere).
For example, I created a file "string.txt" with this content, encoded as UTF-8 (I hope it is not corrupted):
Code:
ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩ

If you convert it to code page 850, you get:
Code:
AaAaAaCcCcCcCcDdDdEeEeEeEeEeGgGgGgGgHhHhIi

The recommended behaviour I know of for such cases is to use the REPLACEMENT CHARACTER, a question mark, a square, or a question mark in a square.

This is by design and actually wanted behavior.

1) https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx
Quote:
Code:
int WideCharToMultiByte(
  _In_      UINT    CodePage,
  _In_      DWORD   dwFlags,
  _In_      LPCWSTR lpWideCharStr,
  _In_      int     cchWideChar,
  _Out_opt_ LPSTR   lpMultiByteStr,
  _In_      int     cbMultiByte,
  _In_opt_  LPCSTR  lpDefaultChar,
  _Out_opt_ LPBOOL  lpUsedDefaultChar
);

...
lpDefaultChar [in, optional]
...
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to NULL. Otherwise, the function fails with ERROR_INVALID_PARAMETER.

lpUsedDefaultChar [out, optional]
...
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to NULL. Otherwise, the function fails with ERROR_INVALID_PARAMETER.
...

That means that, at least for UTF-7 and UTF-8, I can't even define a default character.
I noted this behavior in my first reply to Dave:
viewtopic.php?f=3&t=7570#p50285

2) The reason I don't even want to work around it is that the utility was requested by miskox. He told me by email:
Quote:
I 'patched' the original .exe to make another .exe version, NOCSZ (that is, NOČŠŽ), which replaces the characters ČŠŽĐĆ with ordinary CZSDC, depending on the input code page.

That's why I called it "wanted behavior".

Steffen


Posted: 18 Mar 2017 08:15
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
I was asked to add another option in order to automatically replace the original file content with the converted content. I won't do so.

The utility was designed to convert big files. That means it doesn't read the whole content into memory before it begins the conversion, both to avoid running out of RAM and to be able to read and convert data in parallel threads. Reading from and writing to the same file concurrently could therefore cause data loss, especially if the converted data is bigger than the data read, because the writer would overwrite bytes that haven't been read yet.
Of course, I could let the tool write to a temporary file automatically and replace the original file once the conversion is finished. But as soon as the temporary file and the original file reside on different volumes, this would require physically copying the data, which wastes time and resources.

Thus, I'd rather leave it in your hands. Moving a file to another name on the same logical drive only changes the file system entry; the data itself is not copied. Example:
Code:
convertcp 1 65001 /b /i "test.txt" /o "test.txt.temp~"
if not errorlevel 1 move /y "test.txt.temp~" "test.txt"


Steffen


Posted: 27 May 2017 09:44
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
I didn't like having only a link to the list of Code Page Identifiers in the help message. That's why I added option /l, which displays a list of the code pages installed on your computer together with their descriptions and information on how they can be used as input code page (see section "additional information" of the initial post).

VirusTotal didn't find any false positives for version 1.4.4.
x86: https://www.virustotal.com/en/file/33108943bf6f8575a49873c44d0eef7ce30ffdd4af7f8564f6c2f8339171581c/analysis/
x64: https://www.virustotal.com/en/file/961bf49a7e624709742cde83ae5739f8e1f949a6e08e0e1a9f29e1f075afa9a4/analysis/

Steffen


Powered by phpBB® Forum Software © phpBB Limited