truncate files - a (pure?) batch solution

Sponge Belly
Posts: 200
Joined: 01 Oct 2012 13:32
Location: Ireland

truncate files - a (pure?) batch solution

#1 Post by Sponge Belly » 25 May 2013 11:50

Hello All!

Sometimes I’d like to snip off the newline from the end of a file. But there’s no efficient way to do it using pure Batch commands. Recently, I unearthed a Super User question on how to truncate a set number of bytes from a file. The second answer gave a PowerShell solution. I know nothing of PowerShell (except that it’s even uglier and more obtuse than VBScript :twisted:), but I managed to glue the following snippet together:

Code:

@echo off & setlocal enableextensions

powershell -noprofile -command ^"^& {$file='%~dpf1'; $BYTES_TO_TRIM=%2; ^
$byteEncodedContent = [System.IO.File]::ReadAllBytes($file); ^
$truncatedByteEncodedContent = $byteEncodedContent[0..($byteEncodedContent.Length - ($BYTES_TO_TRIM + 1))]; ^
Set-Content -value $truncatedByteEncodedContent -encoding byte -path "$($file)"}"

endlocal & exit /b 0


Awful, isn’t it? But it has one redeeming quality… it works! The snippet accepts two parameters: a filename, and the number of bytes to be snipped off the end of the file. And the best bit is that the file is edited in place. Sweet! :-) The last modified time is updated and creation time is preserved.

How does it work? Well, I think it slurps the whole file into a byte array, takes the slice that stops short of the end by the number of bytes to be truncated, and writes that slice back into the original file. Works fine with smallish files. I tried it on a 6 MB file and there was a delay of a few seconds.
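
For larger files, a possible alternative (not the snippet above, and not the Super User answer) is to skip the array altogether and ask .NET to shorten the file in place with FileStream.SetLength. What follows is only a minimal sketch under that assumption: the script name trunc.cmd is hypothetical, and a path containing a single quote would break the PowerShell string.

```batch
@echo off
setlocal enableextensions
rem Usage: trunc.cmd <file> <bytes-to-trim>   (trunc.cmd is a hypothetical name)
if "%~2"=="" (
  >&2 echo Usage: %~nx0 file bytes
  exit /b 1
)
rem Open the file read/write and shorten it in place with FileStream.SetLength,
rem so the contents are never read into memory. Math.Max keeps the new length
rem from going negative when the trim count exceeds the file size.
powershell -noprofile -command "$fs = [System.IO.File]::Open('%~f1', 'Open', 'ReadWrite'); try { $fs.SetLength([Math]::Max([long]0, $fs.Length - %~2)) } finally { $fs.Close() }"
exit /b %errorlevel%
```

Called as trunc.cmd myfile.txt 2, it should drop the last two bytes. A trim count larger than the file size leaves an empty file.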

Of course, you could just use the freeware and tiny (~6 KB) trunc command line utility. But many people find themselves in environments where they’re not allowed to install third-party software. Plus, the snippet above is more flexible. It could easily be modified to truncate from the beginning of a file, or even the middle. With judicious use of findstr /o, a wily Batcher might even be able to delete a range of lines from a file without having to go through the rigmarole of looping through the file line by line, storing the head in a temporary file, deleting the range of lines, and gluing the head and tail back together. Tedious! :cry:

Anyways, please feel free to improve the functionality and efficiency of the code above. It would be interesting to see how far you gurus could run with this particular ball.

- SB
Last edited by Sponge Belly on 20 Nov 2013 11:09, edited 1 time in total.

Squashman
Expert
Posts: 4198
Joined: 23 Dec 2011 13:59

Re: truncate files

#2 Post by Squashman » 25 May 2013 15:31

So you could potentially split in the middle of a line?
Are you including the CRLF in your byte count?

Well, I suppose if you wanted to do it with native batch, you could read the file line by line, get the length of each line, and keep adding it to a running total of bytes counted. Once you are over your target, you could substring the last line to take only the number of bytes you still need from it.
In theory, you should be able to tell it whether you want the first 1000 bytes or the last 1000 bytes of the file truncated, because getting the file size is easy enough and you could add or subtract from there.
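
The "add or subtract from there" part can be sketched in a few lines. This covers only the size arithmetic, not the line-by-line copy, and the script name keepsize.cmd is hypothetical. Note that set /a works with 32-bit signed integers, so this sketch is only good for files under 2 GB.

```batch
@echo off
setlocal enableextensions
rem Usage: keepsize.cmd <file> <bytes-to-trim>   (hypothetical script name)
if "%~2"=="" exit /b 1
if not exist "%~1" exit /b 1
rem %~z1 expands to the size of the file in bytes.
set "size=%~z1"
rem Number of bytes that should survive the truncation.
set /a "keep=size - %~2"
if %keep% lss 0 set "keep=0"
echo size=%size% trim=%~2 keep=%keep%
```

Trimming from the start of the file would keep the same number of bytes, just counted from offset %~2 instead of from offset 0.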

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: truncate files

#3 Post by foxidrive » 25 May 2013 23:01

Sponge Belly wrote:Awful, isn’t it? But it has one redeeming quality… it works!


Here's a batch solution for 32-bit machines (it relies on debug, which 64-bit Windows doesn't ship). Probably only works up to 64 KB.

Code:

:: based upon a batch by Tom Lavedas
@echo off
if "%1"=="" echo Syntax: %0 Sourcefile
if "%1"=="" echo Purpose:   Removes trailing two bytes (CR/LF usually)
if "%1"=="" goto end
type nul>temp.txt
for %%v in (E100''83''E9''02 L103 P N%1 W103 Q) do echo %%v>>temp.txt
if exist %1 debug %1 <temp.txt >nul
del temp.txt
:end



Sponge Belly wrote:The last modified time is updated and creation time is preserved.


I am not sure that happened here. It seems to change the timestamp in Win8.

Sponge Belly
Posts: 200
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: truncate files - a (pure?) batch solution

#4 Post by Sponge Belly » 20 Nov 2013 10:24

Dear DosTippers, :-)

While participating in a long, meandering thread about something else entirely, I prepended a BOM to a plain ASCII file and redirected it to a new file as below:

Code:

cmd /d /u /c type bomfile.txt > newfile.txt


To my surprise, the BOM was missing from the new file and the new file was 1 byte shorter than the original. Penpen explained what was going on:

A UTF-16 character consists of 2 or 4 bytes. If an unfinished UTF-16 character is read at the end, the InputStream is treated as broken and the last 1 byte (or 3 bytes) is dropped.
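
To watch this happen, something like the following should do. It is only a sketch: the script name oddtype.cmd and the temporary file name are hypothetical, and the input file is assumed to start with the ff fe BOM and to have an odd size.

```batch
@echo off
setlocal enableextensions
rem Usage: oddtype.cmd <file>   (hypothetical script name)
rem <file> is assumed to start with the ff fe BOM and to have an odd size.
set "out=%tmp%\oddtype.tmp"
rem Same redirection as above, with the output going to a scratch file.
cmd /d /u /c type "%~f1" > "%out%"
rem Report both sizes so the difference can be compared by eye.
for %%a in ("%~f1") do echo original: %%~za bytes
for %%a in ("%out%") do echo via type: %%~za bytes
del "%out%"
```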


This curious behaviour gave me the idea to write a program that would chomp the newline from the end of a plain ASCII text file. It ain’t pretty, but it works! ;-)

Code:

@echo off & setlocal enableextensions & (call;)
if exist "%~1\" (goto die) else if not exist "%~1" (goto die
) else echo("%~1" | findstr "* ?" >nul && goto die
if "%~z1"=="0" goto die
if "%~2"=="" (set chomp=2) else set "chomp=%~2"
if %chomp% lss 1 (goto die) else if %chomp% gtr 2 goto die
if %chomp% geq %~z1 goto die

set "bom=%tmp%\bom.tmp"
if not exist "%bom%" (>"%bom%" echo(ff fe
certutil -f -decodehex "%bom%" "%bom%" >nul)
set "bs=%tmp%\bs.tmp"
if not exist "%bs%" for /f %%b in ('
"prompt $h & for %%a in (1) do rem"
') do >"%bs%" echo(%%b
set "ff=%tmp%\ff.tmp"
if not exist "%ff%" for /f %%f in ('cls') do >"%ff%" echo(%%f
set "sub=%tmp%\sub.tmp"
if not exist "%sub%" (copy /a nul "%sub%" >nul
for /f usebackq %%z in ("%sub%") do >"%sub%" echo(%%z)
set "esc=%tmp%\esc.tmp"
if not exist "%esc%" for /f %%e in ('
"prompt $e & for %%a in (1) do rem"
') do >"%esc%" echo(%%e
for %%f in ("%bs%" "%ff%" "%sub%" "%esc%") ^
do findstr /lmxg:"%%~f" "%~1" >nul || (
set "pre=%%~f" & goto break)
:break
if not defined pre goto die

pushd "%~dp1"
set "orig=%~nx1"
set "dup=%tmp%\%~nx1.tmp"
set "dup2=%tmp%\%~nx1-2.tmp"
set /a "parity=%~z1 & 1"
if %parity% equ 0 (copy /b "%pre%" + "%orig%" "%dup%" >nul
) else copy /b "%orig%" "%dup%" >nul
for /l %%i in (1 1 %chomp%) do (
copy /b /y "%bom%" + "%dup%" "%dup%.bom" >nul
cmd /d /u /c type "%dup%.bom" > "%dup2%"
copy /b /y "%pre%" + "%dup2%" "%dup%" >nul)
findstr /lvxg:"%pre%" "%dup%" >"%orig%"
del "%dup%" "%dup2%" "%dup%.bom"
popd
goto end

:die
>&2 echo(program "%~nx0" ended unexpectedly& (call)
:end
endlocal & goto :eof


The program accepts two arguments: a filename, and the number of bytes to be chomped from the end of the file (2 by default; only 1 or 2 is accepted). First, the source file is searched to make sure it doesn’t contain a line consisting solely of the unique string that will be prepended to the temporary copy. Four candidates are tried in turn (backspace, form feed, SUB and escape), and the first one not found in the file is used. Next, the source file is tested for parity (thanks to Aacini). The dreadful kludge works only when the file being chomped has an odd number of bytes, so if the size is even, the unique string is prepended to the temporary copy; the string and its newline come to 3 bytes, which makes the total odd.

Then we go through the for /l loop once or twice. On each pass, the BOM is prepended to the temporary copy, which is then redirected into a second temporary file using the type command shown earlier, dropping the BOM and the last byte. The unique string is then prepended to the second temporary file and the result is copied over the first temporary copy, so the size is odd again for the next pass. Finally, findstr removes every line that consists of the unique string from the temporary copy and redirects its output to the original file. The original file should be exactly as before, except that it has no terminating newline and its last modified time is updated (creation time is preserved).

Successfully tested on a 5.84 MB file and on a file containing a line 10,001 characters long. Works only with 8-bit characters. Use entirely at your own risk! :!:

Any suggestions for improvement gratefully received.

- SB

PS: Forgot to say that since the program above can chomp the newline from EOF, it’s possible to remove the last line from a chomped file, even if that line is extremely long, using something like: findstr "!lf!" longfile.txt > longfile-1.txt. Alternatively, the last line can be isolated with: findstr /v "!lf!" longfile.txt > lastline.txt. Both commands assume delayed expansion is enabled and that the variable lf holds a single linefeed character.
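
For anyone who hasn’t met !lf! before, here is a minimal sketch of how the variable is usually defined before running the two findstr commands from the PS. The file names are the hypothetical ones used above, and the findstr behaviour is taken from the PS as given rather than verified separately.

```batch
@echo off
setlocal enableextensions enabledelayedexpansion
rem Define lf as a single linefeed character. The two blank lines
rem that follow the caret are required; do not remove them.
set lf=^


rem Keep only the lines that end in a linefeed, dropping the unterminated last line.
findstr "!lf!" longfile.txt > longfile-1.txt
rem Keep only the line with no linefeed, i.e. the unterminated last line.
findstr /v "!lf!" longfile.txt > lastline.txt
endlocal
```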
