Quick way to extract data from multiple lines and many files using jrepl?


zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France


#1 Post by zimxavier » 07 Aug 2017 03:11

I would like to extract every value of "which" from hundreds of files, but only when it is a parameter of set_variable or change_variable. These functions can span one line or several. Commented-out functions are excluded.
Example:

Code: Select all

ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}
change_variable = { which = "laws" value = 1 }

change_variable = { value = -1 which = debate_score }

bad_function = { which = not_a_value }
#change_variable = { value = -1 which = not_a_value }


Correct values:
current_potion_quality
laws
debate_score



My script works fine, but it is excruciatingly slow (at least 10 minutes):

Code: Select all

rem Each loop calls JREPL once per file and appends the matches to one output file.
for %%F in ("C:\game\decisions\*.txt") do (
  call BATCH_JREPL "(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*\q?)([A-Za-z0-9_]+)" "$txt=$6" /jmatchq /x /m /f "%%F" >> "TEMP\ztemp0_all_variables.txt"
)
for %%F in ("C:\game\events\*.txt") do (
  call BATCH_JREPL "(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*\q?)([A-Za-z0-9_]+)" "$txt=$6" /jmatchq /x /m /f "%%F" >> "TEMP\ztemp0_all_variables.txt"
)


The /m (multiline) option seems to be the cause; the rest of the script is fast.
Any ideas?
Thanks.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)


#2 Post by dbenham » 07 Aug 2017 06:41

zimxavier wrote: My script works fine...

I don't think so. Your code will mistakenly extract "current_potion_quality" from the following:

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}

Also, you are capturing 6 groups, when you only need 1.

Here is a fixed JREPL call with only one capturing group:

Code: Select all

call jrepl ^
  "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
  "$txt=$1" /jmatchq /x /m /f "%%F" >>"TEMP\ztemp0_all_variables.txt"

But I don't think this will speed anything up. In fact, it may slow things down because the bug fix can lead to additional regex backtracking.
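
Note: the difference between the original pattern and the fixed one can be checked without running JREPL. The following is a minimal Python sketch, not JREPL itself. It assumes that Python's re module with re.MULTILINE treats these particular patterns the same way JREPL does with /m, and it writes the JREPL escape \q (a double quote under /x) as a literal double quote. The names ORIGINAL, FIXED, and SAMPLE are illustrative only.

```python
import re

# Original pattern from post #1 (six capturing groups).
# JREPL's \q is written here as a literal double quote.
ORIGINAL = re.compile(
    r'(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*"?)([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# Fixed pattern from post #2 (one capturing group, plus a lookahead that
# rejects line starts followed by whitespace).
FIXED = re.compile(
    r'^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# The sample from post #2: only the first line is commented out.
SAMPLE = """#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}
"""

# The original pattern can start a match on the indented second line.
original_hits = [m.group(6) for m in ORIGINAL.finditer(SAMPLE)]
# The fixed pattern cannot start on a line that begins with whitespace.
fixed_hits = [m.group(1) for m in FIXED.finditer(SAMPLE)]

print(original_hits)
print(fixed_hits)
```

Under these assumptions the original pattern reports current_potion_quality for the sample, while the fixed pattern reports nothing.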

Each call to JREPL requires significant startup time. The simplest way I can think of to speed things up is to minimize the number of times JREPL is called. All your results go to a single output file, so it should be possible to get by with only one JREPL call (provided the combined size of all source text files is less than ~1 GB).

If you can guarantee that the last character of every source file is a newline character, then you can use the following:

Code: Select all

type "C:\game\decisions\*.txt" "C:\game\events\*.txt" 2>nul |^
jrepl "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
      "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"

But if some files are missing the final newline, then the first line of a file may be appended to the last line of the prior file.
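
Note: to make that failure mode concrete, here is a small Python sketch of what plain concatenation does to the match results. The two file contents are invented for illustration, string concatenation stands in for the piped TYPE output, and the pattern is the fixed one from above with \q written as a literal double quote.

```python
import re

# Fixed pattern from this post; JREPL's \q is written as a literal double quote.
PATTERN = re.compile(
    r'^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# Hypothetical file contents; the first file has no final newline.
first_file = "# end of decisions"
second_file = "change_variable = { which = laws value = 1 }\n"

# Plain concatenation glues the two lines together, so the function
# now sits on a line that starts with a comment marker.
glued = first_file + second_file

# Emitting a line break after each file keeps the lines apart.
separated = first_file + "\n" + second_file + "\n"

glued_hits = PATTERN.findall(glued)
separated_hits = PATTERN.findall(separated)

print(glued.splitlines()[0])
print(glued_hits, separated_hits)
```

In this sketch the glued input loses the value laws because its line appears to be commented out, while the separated input keeps it.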

A FOR loop can solve the problem:

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|jrepl "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"


Dave Benham



#3 Post by zimxavier » 07 Aug 2017 08:50

Thank you Dave :)

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}


Actually, in that case I want to extract current_potion_quality, because it is read by the game (not in root scope, but the syntax is correct). Only the extra closing curly bracket is wrong, and it breaks the syntax highlighting of the rest of the file. For a function to be excluded, the # must appear earlier on the same line as change_variable/set_variable or as which (I have never seen that last case, though). Sorry that wasn't clear.

Three examples with the expected result:

current_potion_quality is not valid:

Code: Select all

#ROOT = {
#    change_variable = {
#        which = current_potion_quality
#        value = 1
#    }
#}


current_potion_quality is valid:

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
#}


current_potion_quality is not valid and a_value is valid:

Code: Select all

#ROOT = {
    change_variable = {
        #which = current_potion_quality
        which = a_value
        value = 1
    }
#}


Anyways, I tested your latest script. It took... 2 seconds. It doesn't work as expected though. It found 94 occurrences instead of 508.

I replaced ^(?![\s])[^#]*? with ^[^#]*? (as in my initial script). This code works but takes 2 min 24 s (!):

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|BATCH_JREPL "^[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"


I don't understand why the result is not the same.
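
Note: one plausible explanation, which is not confirmed anywhere in this thread, is that the lookahead (?![\s]) only allows a match to begin on a line whose first character is not whitespace. After one match ends inside an indented block, the following indented functions in that block can no longer start a match. The Python sketch below illustrates the effect on an invented sample; it assumes re.MULTILINE mirrors JREPL's /m for these patterns and writes \q as a literal double quote.

```python
import re

# Shared tail of both patterns; JREPL's \q is written as a literal double quote.
TAIL = r'(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)'

# Pattern from post #2: a match must begin on a line that starts with non-whitespace.
LOOKAHEAD = re.compile(r'^(?![\s])[^#]*?' + TAIL, re.MULTILINE)
# Pattern from post #3: a match may begin on any line.
NO_LOOKAHEAD = re.compile(r'^[^#]*?' + TAIL, re.MULTILINE)

# Hypothetical game script: two indented functions inside one top-level block.
SAMPLE = """country_event = {
    change_variable = { which = a_value value = 1 }
    set_variable = { which = b_value value = 2 }
}
"""

lookahead_hits = LOOKAHEAD.findall(SAMPLE)
no_lookahead_hits = NO_LOOKAHEAD.findall(SAMPLE)

print(lookahead_hits)
print(no_lookahead_hits)
```

Here the lookahead version finds only the first value, while the version without the lookahead finds both, which would be consistent with the lower count.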



#4 Post by zimxavier » 08 Aug 2017 01:00

After a good night's sleep, I believe I found it:

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
    "C:\game\common\scripted_effects\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|BATCH_JREPL "^[^#\n]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"


I added \n to the excluded characters. This script takes 1 or 2 seconds. Amazing! My third example is not supported, but that is not a big deal.
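
Note: the limitation with the third example can be reproduced with a short Python sketch. It runs the final pattern against the three examples from post #3, assuming re.MULTILINE mirrors JREPL's /m here and writing \q as a literal double quote. The dictionary keys are labels made up for this sketch.

```python
import re

# Final pattern from this post; JREPL's \q is written as a literal double quote.
FINAL = re.compile(
    r'^[^#\n]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# The three examples from post #3.
EXAMPLES = {
    "fully commented": (
        "#ROOT = {\n"
        "#    change_variable = {\n"
        "#        which = current_potion_quality\n"
        "#        value = 1\n"
        "#    }\n"
        "#}\n"
    ),
    "partly commented": (
        "#ROOT = {\n"
        "    change_variable = {\n"
        "        which = current_potion_quality\n"
        "        value = 1\n"
        "    }\n"
        "#}\n"
    ),
    "commented which": (
        "#ROOT = {\n"
        "    change_variable = {\n"
        "        #which = current_potion_quality\n"
        "        which = a_value\n"
        "        value = 1\n"
        "    }\n"
        "#}\n"
    ),
}

# Collect the captured values for each example.
results = {name: FINAL.findall(text) for name, text in EXAMPLES.items()}
for name, hits in results.items():
    print(name, hits)
```

The first two examples give the expected results. In the third, the lazy [\s\S]*? stops at the first "which" it meets, which is the commented one, so current_potion_quality is reported instead of a_value.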
Thank you again Dave. Your changes are very promising. All durations are drastically reduced.
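
Note: a plausible reason for the speedup, offered as an assumption since JREPL was not profiled in this thread, is that [^#] also matches line breaks. From every line start the leading [^#]*? may then grow across all following lines until it meets a # or the end of the input, and that scan is repeated for each line start that fails. With \n excluded, the leading part is confined to a single line. The Python sketch below only measures how far each character class can reach on an invented input; it does not time anything.

```python
import re

# Hypothetical input: many lines with no '#' and no set_variable/change_variable.
text = "value = 1\n" * 1000

# How far can the leading part of each pattern reach from the first line start?
reach_any_line = re.match(r'[^#]*', text).end()     # also consumes newlines
reach_same_line = re.match(r'[^#\n]*', text).end()  # stops at the first newline

print(reach_any_line, reach_same_line)
```

On this input the first class can reach the end of the whole text, while the second stops at the end of the first line.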
Last edited by zimxavier on 13 Aug 2017 03:15, edited 1 time in total.
