Example 5 – Nested GROUP
Nested GROUP commands become very handy once you understand how to use GROUP IF. By nesting GROUP IF commands, you can control the flow of the script’s executing logic more efficiently and transparently than would otherwise be possible. I use GROUP IF exclusively as logic controllers and treat them as IF blocks in my code (a feature I wish very much was available outside of GROUP).
To better understand my previous statement, let’s look at the script from example 3:
This script is functional – we know this because we walked through it in detail – but there is something about this script that bothers me at a very deep level, and that is code duplication. In this GROUP block we typed out the exact same EXTRACT command and v_Cat assignment twice. Now, if we need to make updates (say, add another field to EXTRACT), we have to do it in 2 different places. If we forget to update one of the commands, the output table will be garbled and useless. Any time we duplicate code we are creating potential maintenance nightmares in the future. Not to mention that it requires mindless effort. I hate repetitive tasks – if I didn’t, I would not spend so much time learning how to automate my job into oblivion so I can focus on new and exciting things.
So, there has to be a better way, and that way is to nest GROUP. By nesting GROUP, we can separately identify when the category changes and reset our running total in its own little block of code. Take a look at the following script:
Notice that we have removed the IF statement from the initial GROUP command – that command will now run in its entirety on each record. We have moved the IF command to a nested GROUP block, which will be the first thing to execute on each record. Now we can get rid of the duplicated EXTRACT command and v_Cat assignment. Let’s see how this command executes. We start with the primed variables.
Now the main GROUP block begins executing.
The first command is the nested GROUP IF command, which determines if v_Cat is a different value from the current record’s Category field. In this first run it is, so IF evaluates to True and executes the code in the nested GROUP block.
The nested GROUP does only 1 thing – it resets the running total to 0.
The nested GROUP then exits and the remainder of the main GROUP executes by adding the current record’s value to the running total, extracting that value, and resetting the v_Cat variable.
We have reached the end of the main GROUP block and there are more records to process, so GROUP begins again by immediately evaluating the nested GROUP IF command.
GROUP IF evaluates False, so its code is skipped and the remainder of the main GROUP executes.
The next record is iterated and GROUP IF is evaluated…
…which returns False. The remainder of the main GROUP block is executed.
ACL iterates to the next record. On this iteration we have come to a new category, so GROUP IF evaluates True and executes its code…
…which resets the running total to 0.
The remainder of the main GROUP command finishes executing.
The remaining records execute as we have already seen. When these nested GROUP commands have finished executing, we will be left with a table that is identical to the end result in example 3.
Example 6 – LOOP
Now we have come to the final example. LOOP is a command that can only be used inside GROUP and which allows us to execute the same set of commands multiple times for each record. Some of you may have familiarity with other scripting or programming languages which have looping commands. The idea is the same, but the usage of ACL’s LOOP command is restricted to a very specific scope – it can only be used in the context of a table that is being processed by a GROUP command. You cannot use this command to create looping code elsewhere in your scripts. To do that, you need to use DO…WHILE. Do not spend your time, like I did when first learning the command, trying to use LOOP to replicate the looping commands found in other languages.
In this example we are going to extract each individual tag contained in our sample transaction table into its own record. The script which will accomplish this, and our sample data, is below.
The GROUP command starts by calculating how many tags are in the current record’s tag field. In our sample table, the tag field can contain multiple tags which are separated by commas. The OCCURS function will detect how many commas are in the field. We add 1 to this result because there will always be one more tag than there are commas.
Next, we initialize a counter which we will use to loop through the number of tags.
Now LOOP begins, and it begins by determining whether our counter is at a valid tag. We have 1 tag, and our counter is set to tag 1, so LOOP can execute.
LOOP starts by calculating the value of the tag in the first position using the SPLIT function.
The calculated tag is then extracted along with other relevant data from the record.
The final command increments the counter by a value of 1. This is a very crucial line of code – without it LOOP becomes an infinite loop. If you are like me, you will spend so much time focusing on the rest of the GROUP/LOOP logic that you will occasionally forget to add this to the end of the block. And when you finally get around to cancelling the script because it is taking too long, you will have a table with hundreds of thousands (or millions, if you got up to get a cup of coffee) of records all containing the same data. If you leave the script unattended, it will create records until your hard drive is full (assuming you have disabled ACL’s loop limit protections, which I do). It is probably a good idea to stay at your computer when testing LOOP.
The end of the LOOP is reached, which causes LOOP to start again by evaluating its WHILE condition.
Now, the current tag as represented by the counter is invalid – according to v_Tags there is only 1 tag, not 2. The WHILE condition returns False and LOOP stops executing. This takes us to the end of the GROUP command, which iterates the next record.
Side note: Unlike other languages which can allow the loop condition to be evaluated either before or after the loop code, the LOOP command always evaluates the WHILE condition before the LOOP code executes.
Now that we are back at the beginning of GROUP, the number of tags is recalculated and the counter is re-initialized to 1.
LOOP executes again. This record only has 1 tag, so its execution is identical to the first record.
We are finished processing this record, so it is time to move to the next record.
This record is also processed in a similar manner because it only has one tag. We will skip the details and move along.
Now we have iterated to a record with multiple tags.
This record begins, as with all of the prior records, by calculating the number of tags and re-initializing the counter to 1.
The LOOP block executes by calculating the first tag, which is ‘A’, extracting this tag and other record information to the result table, and incrementing the counter.
LOOP begins again by evaluating the WHILE condition, which is still true because our counter has not exceeded the number of tags in the record.
The code in the LOOP block now executes by calculating the second tag (‘B’), extracting it to the result table, and incrementing the counter.
LOOP begins again by evaluating the WHILE condition, which is still true.
The LOOP code now executes for the third tag (‘E’).
LOOP is finished for this record – the counter has finally exceeded the number of tags. GROUP begins again on the next record.
The combination of GROUP and LOOP continue executing in this manner for the remaining 2 records. By the time they have finished processing this table the result looks like this:
LOOP can be a powerful tool in your analytical toolbox to perform tasks which are otherwise difficult to perform. Some examples of how I have used LOOP include:
- Extracting regular expression pattern matches out of fields of free-form text.
- Extracting comments from a table imported from ACL GRC.
- Associating formal names and nicknames to each other from a file containing delimited name associations (if you were at ACL Connections 2015 and attended the session on regular expressions, you may remember me talking about this briefly).
- I have a fairly complex normalization script that, among other things, will optionally sort the individual characters in a field by alphabetical order. I do this using nested LOOP statements to iterate through each character of the source string and plop it into the proper position of the output string. The result is pretty cool, and after innumerable tests and failures, I was quite excited when I finally got it to work.
This brings our discussion of GROUP to a close. It is my sincere hope that it has provided clarity about the command and gives you the knowledge and confidence to use it in your own scripts. As a parting gift, you can download a non-Unicode project with the sample data and example scripts here.