In the last post we discussed two basic examples of GROUP to give you an idea of how the command works. In this post we are going to look at two examples that utilize GROUP IF. We will be using the same example table that you have already seen, but extending our usage of the GROUP command to perform more sophisticated analytical tasks.
Example 3 – Running Total
In this example we are going to create a running total of each category in our sample table. Below is the script that will accomplish this and the sample data that we will use. Note that while the sample data is the same as in examples 1 and 2, in this example it has been sorted on Category and EnteredOn. This has to happen before GROUP is executed; data cannot be presorted in the GROUP command.
Notice that we have a few commands to execute before GROUP. These commands prepare the variables that we will use while we process the table. While GROUP is executing, the type and length of any variables used cannot be changed. Because of this behavior, it is especially important to ‘prime’ our variables with data from the table we are working with. For example, if we create a Character variable with value of ‘A’ (1 character long), then try to assign it a value of ‘XYZ’ (3 characters long) during the execution of GROUP, the data will be truncated to ‘X’. ACL does not warn you when this happens; the only indicator you will receive is when your output data looks weird.
Because of the importance of priming variables, I consider this step to be intricately tied to the GROUP command. That is why we are including these steps in this example.
The script starts its execution by calculating the length of the category field.
We then use this length to prime a variable that will hold the value of the prior record’s category. Unfortunately, the BLANKS() function cannot contain the LEN() calculation directly – ACL throws a fit if you try to use anything other than a variable or hard-coded value in BLANKS(). The reason we are initializing v_Cat with blanks is because we want the IF condition in the GROUP command to fail on the first record. You will understand why soon.
The next line simply primes a variable that will store our running total.
Now that our variables are primed, we arrive at the GROUP IF command.
We want the GROUP command to aggregate a running total in the v_RunTotal variable while we are processing records of the same category. When we reach a new category, we want to reset the running total. The IF statement in the group command is designed to identify when we are in the same category. When we reach a new category, the IF statement returns False, which causes the code in the ELSE block to execute. The ELSE block is what resets our running total.
We want the first record to return False and thereby reset the running total. If it returned True, v_RunTotal would double the first record’s value because we primed v_RunTotal with that value. We could have primed v_RunTotal with a value of 0, but then we would have to perform a complicated procedure to calculate the number of decimal places in our field or else risk truncating our numbers. I find it much easier to prime a Character variable with blanks than to prime a numeric variable with 0.
Moving forward with this methodology, the IF statement returns false. This causes the code in the ELSE block to start executing, starting with the resetting of v_RunTotal.
Now that our variable is reset, we extract the data with our current running total value.
The next step is to set the value of v_Cat to the category field. We do this at the very end because we want v_Cat to hold the value of the prior record. When the GROUP command iterates to the next record and calculates the IF statement, v_Cat will have the prior value and the IF statement will correctly compare it to the new value.
We have reached the end of the block, but there are more records to process, so GROUP begins again. The IF statement compares v_Cat, which has a value of ‘A01’, to the Category field, which has a value of ‘A01’, and returns True.
The Amount field is added to the running total.
The current record and running total are then extracted to the Result table.
v_Cat is set to the current record’s value…
…and GROUP iterates on the next record, where the IF statement again returns a value of True.
The next record processes…
…and GROUP iterates to the next record. The IF statement returns False because v_Cat has a value of ‘A01’ and the current record has a value of ‘Z90’.
The running total now resets for the new category.
The first record of the new category with its reset running total is extracted to the Result table…
…v_Cat is now set to ‘Z90’…
…which causes the IF statement for the next record to return True.
From here on out the GROUP command executes as we have already seen.
And that is it. GROUP has finished executing and we have a new table with a running total of our transactions, by category.
Example 4 – Min and Max Dates
A fairly common analytical task is to identify the min and max values, by category, in a table. In SQL this is easy to do using the min() and max() functions in a SELECT…GROUP BY statement. ACL can do this for numeric fields using the SUMMARIZE command, but unfortunately, there is currently no way to easily determine the min and max values of fields containing dates or text. The traditional (non-GROUP) approach is to use a series of SORTs, SUMMARIZEs, and JOINs to calculate and merge the min and max values by category. The process is a bit clunky and can create lots of unnecessary tables, especially if you are trying to calculate min and max over multiple columns. With GROUP, it can all be done in 1 pass of the table and will create, at most, 1 temporary table.
To start, the data you are working with needs to be sorted on the key field(s) (this is the temporary table I mentioned). It does NOT need to be sorted on any of the fields we are min’ing or max’ing. Here is the basic script we will use and our sorted sample table.
The script starts, as in the prior example, by priming some variables we will be using.
The GROUP block starts with an IF statement checking if v_Cat is equal to the current record’s value of Category. We want this to be true to prevent the EXTRACT command in the ELSE block from running, so we primed v_Cat with the value of the Category field rather than blanks.
This next block of code is where the magic happens, but because our date variables were primed with the value of the current record, this is not a good place to explain how it works. We will walk through this logic in detail during the next iteration.
We have reached the end of the block and there are more records to process, so GROUP begins again by evaluation v_Cat against the next record’s Category field.
It evaluates true, so now we can talk about the magic of the first block of code. ACL has a MIN() and MAX() function, which is nice. But it only works with numbers, not dates or characters, which is not nice. Fortunately, it is easy to replicate these functions using a simple IF statement. The first line calculates the minimum value. We start by assuming that v_MinDate IS the minimum value. If the current record’s EnteredOn date is less than the variable value, we know this assumption is false and we re-assign v_MinDate to the current EnteredOn value. In this example, the EnteredOn value of March 2, 2015 is less than v_MinDate’s value of November 1, 2015, so we assign v_MinDate a value of March 2, 2015.
The next line calculates the max value, which is the same logic as the min value but with ‘greater than’ equality rather than ‘less than’ equality.
And now we reset the v_Cat variable to the current record’s Category value.
We have reached the end of the magic block, so GROUP reiterates to the next record.
We are still working with A01 category transactions, so the magic block recalculates the min and max values again.
Now we arrive at the Z90 category of transactions, which causes GROUP to execute the ELSE block.
Up to this point we have not actually extracted any data; we have simply been calculating and recalculating the min and max values. The first step in the ELSE block is to extract these values for the category we have been working with (in this case, A01). Notice that the EXTRACT command references v_Cat rather than Category. We cannot use Category in the extract because we have iterated to a Z90 record. Extracting the Category field directly would associate A01’s min and max values with Z90.
The next block of code resets the min and max variables and updates the category variable.
We have again come to the end of the GROUP command, so we move to the next record and begin again.
The min and max values are recalculated as before…
…and we again move to the next record.
The final recalculation of min and max is performed…
…and we are done with the GROUP command.
But wait, we can’t be done! We have not extracted the Z90 record – our Result table only contains a record for A01. How do we make this GROUP command give us the last record?
The answer lies in the EXTRACT command. You may have already noticed it, but look at the ‘EOF’ parameter at the end of EXTRACT. This parameter tells ACL to run EXTRACT one last time after the final record is processed. There are 3 commands that take the EOF parameter – EXTRACT, LIST, and REPORT – and all of them will run one last time after GROUP is finished if the parameter is specified.
Now GROUP is completely finished and the remainder of the script executes.
Notice that EOF does not cause the entire block of code to re-run. v_MinDate, v_MaxDate, and v_Cat were not recalculated – only the EXTRACT command ran. This means if there are commands that run before EXTRACT, which EXTRACT depends on, they will not re-calculate on the final execution and the last row of data that is extracted could be erroneous.
That wraps up our review of GROUP IF. You will find it is a powerful way to control the execution of the GROUP command and ensure appropriate commands are executed on relevant records. The next, and final, post on GROUP will review nested GROUP commands and introduce you to the basics of LOOP.