Pulling out sets of fixed multiple lines onto single set

Discussion forum for all Windows batch related topics.

plasma33
Posts: 22
Joined: 26 Jul 2017 21:18

Pulling out sets of fixed multiple lines onto single set

#1 Post by plasma33 » 29 Jul 2017 21:38

Hello everyone,

I would like to pull out sets of a fixed number of lines (three per set, with a blank line between sets) from a file of around 500,000+ lines and join them into a single set. A simple demonstration of what I would like to achieve is shown below:
Input:

Code:

RGRGRKRGRHRGRGRGIGRMKHIGRMRRIGKMKMIGRHRLIGRIRNIGRL
|||||||||||||||||||||||||.|:||:||.|||.:.|||:::|||.
RGRGRKRGRHRGRGRGIGRMKHIGRGRKIGRMKHIGRLKHIGRMKHIGRH

RGIGRKRGIGRGRGIGRQRHIGKLKHIGRGRIIGRGRGIGRGRGIGRGRG
:.||:.:.|||.|.|||.|.||:.:.|||.|.|||||||||.:.|||.:.
KLIGKMKMIGRHRLIGRGRGIGRQRGIGRKRNIGRGRGIGRMKHIGRNKM

IGRRRRIGKKKKGDGARGRGRKRGRHRGRHRGIGRMKHIGRGRGIGKMKM
|||.:.||::::|||||||||||||||||||||||||||||.:.||||||
IGRMKHIGRRRQGDGARGRGRKRGRHRGRHRGIGRMKHIGRRKMIGKMKM

IGRHRLIGRIRMIGRLRGIGRKRGIGRGRGIGRGRRIGKMKLIGRGRRIG
|||||||||.|.|||.|||||||.|||||||||.:.||:.:.|||.:.||
IGRHRLIGRGRKIGRQRGIGRKRNIGRGRGIGRMKHIGRHRRIGRMKHIG

KKKLIGRGRRIGKMRHIGRMRQIGRNRNGDGARGRGRKRGRHRGRIRGIG
:.|.|||.:.||:.:.||:|:.|||:|.||||||||||||||||||||||
RIKHIGRMKHIGRRKMIGKMKMIGRHRLGDGARGRGRKRGRHRGRIRGIG


Output:

Code:

RGRGRKRGRHRGRGRGIGRMKHIGRMRRIGKMKMIGRHRLIGRIRNIGRLRGIGRKRGIGRGRGIGRQRHIGKLKHIGRGRIIGRGRGIGRGRGIGRGRGIGRRRRIGKKKKGDGARGRGRKRGRHRGRHRGIGRMKHIGRGRGIGKMKMIGRHRLIGRIRMIGRLRGIGRKRGIGRGRGIGRGRRIGKMKLIGRGRRIGKKKLIGRGRRIGKMRHIGRMRQIGRNRNGDGARGRGRKRGRHRGRIRGIG
|||||||||||||||||||||||||.|:||:||.|||.:.|||:::|||.:.||:.:.|||.|.|||.|.||:.:.|||.|.|||||||||.:.|||.:.|||.:.||::::|||||||||||||||||||||||||||||.:.|||||||||||||||.|.|||.|||||||.|||||||||.:.||:.:.|||.:.||:.|.|||.:.||:.:.||:|:.|||:|.||||||||||||||||||||||
RGRGRKRGRHRGRGRGIGRMKHIGRGRKIGRMKHIGRLKHIGRMKHIGRHKLIGKMKMIGRHRLIGRGRGIGRQRGIGRKRNIGRGRGIGRMKHIGRNKMIGRMKHIGRRRQGDGARGRGRKRGRHRGRHRGIGRMKHIGRRKMIGKMKMIGRHRLIGRGRKIGRQRGIGRKRNIGRGRGIGRMKHIGRHRRIGRMKHIGRIKHIGRMKHIGRRKMIGKMKMIGRHRLGDGARGRGRKRGRHRGRIRGIG
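
For anyone who wants to check the expected result outside of batch, here is a minimal Python sketch of the same transformation. It assumes each set is exactly three non-empty lines (sequence, match line, sequence) and that empty lines only separate sets, so the k-th non-empty line belongs to output row k % 3. The function name merge_sets and the toy input are hypothetical illustrations, not part of the original question.

```python
def merge_sets(lines, rows_per_set=3):
    """Concatenate every rows_per_set-th non-empty line into one long row.

    Assumption: empty lines only separate sets, so the k-th non-empty
    line is appended to output row k % rows_per_set.
    """
    rows = [[] for _ in range(rows_per_set)]
    k = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            continue  # skip the empty separator lines between sets
        rows[k % rows_per_set].append(line)
        k += 1
    return ["".join(parts) for parts in rows]


# Toy input: two short sets instead of the real 50-character lines.
sample = """RGRG
|.|:
RGKG

IGRM
||||
IGRM
""".splitlines()

for row in merge_sets(sample):
    print(row)
# prints:
# RGRGIGRM
# |.|:||||
# RGKGIGRM
```

Collecting the rows in memory is fine in Python for a file of this size; the batch answers below use temporary files instead.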


Thanks.

Plasma33

ShadowThief
Expert
Posts: 921
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Pulling out sets of fixed multiple lines onto single set

#2 Post by ShadowThief » 29 Jul 2017 22:45

This is the third question (including the one StackOverflow question I saw) I've seen from you about data in this format. I'm really curious about what it could possibly be used for.

plasma33
Posts: 22
Joined: 26 Jul 2017 21:18

Re: Pulling out sets of fixed multiple lines onto single set

#3 Post by plasma33 » 29 Jul 2017 23:27

ShadowThief wrote: This is the third question (including the one StackOverflow question I saw) I've seen from you about data in this format. I'm really curious about what it could possibly be used for.


Hi there,

These are biological sequences, and I am trying to extract the conserved regions (common substrings) from the aligned sequences for research purposes.
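
A possible next step, sketched under an assumption: if `|` in the middle line marks an identical residue (this agrees with the sample data, where every `|` column has the same letter above and below, as in EMBOSS-style pairwise alignment output), then the conserved regions are simply the runs of `|`. The function conserved_regions, its min_len threshold and the toy sequences are hypothetical illustrations, not something proposed in the thread.

```python
import re


def conserved_regions(top, match, bottom, min_len=1):
    """Return (start, end, substring) for each run of '|' of at least min_len.

    Assumes '|' in the match line marks an identical position, so the
    substring is the same in both sequences. start/end are 0-based, end exclusive.
    """
    regions = []
    for m in re.finditer(r"\|+", match):
        start, end = m.span()
        if end - start < min_len:
            continue
        if top[start:end] != bottom[start:end]:
            # The match line disagrees with the sequences: input is not as assumed.
            raise ValueError(f"'|' run at {start}-{end} is not identical")
        regions.append((start, end, top[start:end]))
    return regions


# Toy alignment (not taken from the real data).
top = "RGRGRKAAGRMK"
match = "||||||.:||||"
bottom = "RGRGRKGKGRMK"

print(conserved_regions(top, match, bottom, min_len=4))
# prints: [(0, 6, 'RGRGRK'), (8, 12, 'GRMK')]
```

Applied to the merged three-line output, this reports conserved regions that cross the original 50-character line breaks, which is one reason to join the sets first.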

Thanks.

Plasma33

Aacini
Expert
Posts: 1623
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Pulling out sets of fixed multiple lines onto single set

#4 Post by Aacini » 30 Jul 2017 11:52

Code:

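@echo off
setlocal EnableDelayedExpansion

echo Processing file, please wait...
rem Capture a bare carriage return in CR, used below to redraw the progress counter on one line
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
rem Delete any temporary files left over from a previous run
for /L %%i in (1,1,3) do del output%%i.txt 2> nul

set /A "out=0, lineNum=0"
rem Send each non-empty line to output1, output2, output3 in turn.
rem SET /P writes the text without a newline, so each temporary file grows into one long row.
rem FOR /F skips empty lines, so the blank separator lines are ignored.
< nul (for /F "delims=" %%a in (input.txt) do (
   set "line=%%a"
   set /A "out=out%%3+1, lineNum+=1"
   set /P "=!line!" >> output!out!.txt
   set /P "=Line: !lineNum!!CR!"
))
rem Write the three rows, each followed by a newline, into output.txt and delete the temporary files
(for /L %%i in (1,1,3) do type output%%i.txt & del output%%i.txt & echo/) > output.txt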

plasma33
Posts: 22
Joined: 26 Jul 2017 21:18

Re: Pulling out sets of fixed multiple lines onto single set

#5 Post by plasma33 » 30 Jul 2017 20:16

@Aacini, thanks for your code. It works as I wanted. I love how it shows the number of lines it has processed. I also like how your code writes the lines into three separate text files, one per output row. On top of that, your code is much faster than mine: it processed a 17 MB file in under 5 minutes. Hats off, and thanks again.
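
The batch script never keeps a whole output row in a variable. Instead, it appends each non-empty line to one of three temporary files in turn and concatenates them at the end. This is probably because a cmd variable cannot hold a row this long (the command-line limit is 8191 characters). Below is a rough Python sketch of the same streaming design. The function name merge_sets_streaming is hypothetical, and Python would not actually need temporary files for a 17 MB input. The sketch only illustrates the technique.

```python
import shutil
import tempfile


def merge_sets_streaming(in_path, out_path, rows_per_set=3):
    """Append the k-th non-empty line of in_path to temp file k % rows_per_set,
    then write each temp file as one row of out_path.

    Mirrors the three-temporary-file approach of the batch script: only one
    input line is held in memory at a time.
    """
    temps = [tempfile.TemporaryFile("w+") for _ in range(rows_per_set)]
    try:
        k = 0
        with open(in_path) as src:
            for line in src:
                line = line.rstrip("\r\n")
                if not line:
                    continue  # FOR /F in the batch script also skips empty lines
                temps[k % rows_per_set].write(line)  # no newline, like SET /P
                k += 1
        with open(out_path, "w") as dst:
            for t in temps:
                t.seek(0)
                shutil.copyfileobj(t, dst)  # like TYPE outputN.txt
                dst.write("\n")  # like ECHO/ after each row
    finally:
        for t in temps:
            t.close()  # temporary files are deleted on close
```

Usage, with placeholder file names: `merge_sets_streaming("input.txt", "output.txt")`.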

Plasma33

Aacini
Expert
Posts: 1623
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Pulling out sets of fixed multiple lines onto single set

#6 Post by Aacini » 30 Jul 2017 20:56

Oops! I just realized that the program should run slightly faster if it writes %%a directly instead of first copying it into the line variable, like this:

Code:

@echo off
setlocal EnableDelayedExpansion

echo Processing file, please wait...
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
for /L %%i in (1,1,3) do del output%%i.txt 2> nul

set /A "out=0, lineNum=0"
< nul (for /F "delims=" %%a in (input.txt) do (
   set /A "out=out%%3+1, lineNum+=1"
   set /P "=%%a" >> output!out!.txt
   set /P "=Line: !lineNum!!CR!"
))
(for /L %%i in (1,1,3) do type output%%i.txt & del output%%i.txt & echo/) > output.txt

Antonio

plasma33
Posts: 22
Joined: 26 Jul 2017 21:18

Re: Pulling out sets of fixed multiple lines onto single set

#7 Post by plasma33 » 01 Aug 2017 20:46

Hello Aacini,

Yes, it does. Thanks for the modified code. You are a life saver!!

Plasma33
