GROUP Part 2 – GROUP IF

In the last post we discussed two basic examples of GROUP to give you an idea of how the command works. In this post we are going to look at two examples that utilize GROUP IF. We will be using the same example table that you have already seen, but extending our usage of the GROUP command to perform more sophisticated analytical tasks.

Example 3 – Running Total

In this example we are going to create a running total of each category in our sample table. Below is the script that will accomplish this and the sample data that we will use. Note that while the sample data is the same as in examples 1 and 2, in this example it has been sorted on Category and EnteredOn. This has to happen before GROUP is executed; data cannot be presorted in the GROUP command.

004 ex 3_01

Notice that we have a few commands to execute before GROUP. These commands prepare the variables that we will use while we process the table. While GROUP is executing, the type and length of any variables used cannot be changed. Because of this behavior, it is especially important to ‘prime’ our variables with data from the table we are working with. For example, if we create a Character variable with value of ‘A’ (1 character long), then try to assign it a value of ‘XYZ’ (3 characters long) during the execution of GROUP, the data will be truncated to ‘X’. ACL does not warn you when this happens; the only indicator you will receive is when your output data looks weird.

Because of the importance of priming variables, I consider this step to be intricately tied to the GROUP command. That is why we are including these steps in this example.

The script starts its execution by calculating the length of the category field.

004 ex 3_02.png

We then use this length to prime a variable that will hold the value of the prior record’s category. Unfortunately, the BLANKS() function cannot contain the LEN() calculation directly – ACL throws a fit if you try to use anything other than a variable or hard-coded value in BLANKS(). The reason we are initializing v_Cat with blanks is because we want the IF condition in the GROUP command to fail on the first record. You will understand why soon.

004 ex 3_03.png

The next line simply primes a variable that will store our running total.

004 ex 3_04.png

Now that our variables are primed, we arrive at the GROUP IF command.

004 ex 3_05.png

We want the GROUP command to aggregate a running total in the v_RunTotal variable while we are processing records of the same category. When we reach a new category, we want to reset the running total. The IF statement in the group command is designed to identify when we are in the same category. When we reach a new category, the IF statement returns False, which causes the code in the ELSE block to execute. The ELSE block is what resets our running total.

We want the first record to return False and thereby reset the running total. If it returned True, v_RunTotal would double the first record’s value because we primed v_RunTotal with that value. We could have primed v_RunTotal with a value of 0, but then we would have to perform a complicated procedure to calculate the number of decimal places in our field or else risk truncating our numbers. I find it much easier to prime a Character variable with blanks than to prime a numeric variable with 0.

Moving forward with this methodology, the IF statement returns false. This causes the code in the ELSE block to start executing, starting with the resetting of v_RunTotal.

004 ex 3_06

Now that our variable is reset, we extract the data with our current running total value.

004 ex 3_07

The next step is to set the value of v_Cat to the category field. We do this at the very end because we want v_Cat to hold the value of the prior record. When the GROUP command iterates to the next record and calculates the IF statement, v_Cat will have the prior value and the IF statement will correctly compare it to the new value.

004 ex 3_08.png

We have reached the end of the block, but there are more records to process, so GROUP begins again. The IF statement compares v_Cat, which has a value of ‘A01’, to the Category field, which has a value of ‘A01’, and returns True.

004 ex 3_09

The Amount field is added to the running total.

004 ex 3_10.png

The current record and running total are then extracted to the Result table.

004 ex 3_11.png

v_Cat is set to the current record’s value…

004 ex 3_12.png

…and GROUP iterates on the next record, where the IF statement again returns a value of True.

004 ex 3_13.png

The next record processes…

004 ex 3_14.png

…and GROUP iterates to the next record. The IF statement returns False because v_Cat has a value of ‘A01’ and the current record has a value of ‘Z90’.

004 ex 3_15

The running total now resets for the new category.

004 ex 3_16

The first record of the new category with its reset running total is extracted to the Result table…

004 ex 3_17

…v_Cat is now set to ‘Z90’…

004 ex 3_18.png

…which causes the IF statement for the next record to return True.

004 ex 3_19.png

From here on out the GROUP command executes as we have already seen.

004 ex 3_20

004 ex 3_21.png

004 ex 3_22.png

004 ex 3_23

And that is it. GROUP has finished executing and we have a new table with a running total of our transactions, by category.

Example 4 – Min and Max Dates

A fairly common analytical task is to identify the min and max values, by category, in a table. In SQL this is easy to do using the min() and max() functions in a SELECT…GROUP BY statement. ACL can do this for numeric fields using the SUMMARIZE command, but unfortunately, there is currently no way to easily determine the min and max values of fields containing dates or text. The traditional (non-GROUP) approach is to use a series of SORTs, SUMMARIZEs, and JOINs to calculate and merge the min and max values by category. The process is a bit clunky and can create lots of unnecessary tables, especially if you are trying to calculate min and max over multiple columns. With GROUP, it can all be done in 1 pass of the table and will create, at most, 1 temporary table.

To start, the data you are working with needs to be sorted on the key field(s) (this is the temporary table I mentioned). It does NOT need to be sorted on any of the fields we are min’ing or max’ing. Here is the basic script we will use and our sorted sample table.

004 ex 4_01.png

The script starts, as in the prior example, by priming some variables we will be using.

004 ex 4_02.png

The GROUP block starts with an IF statement checking if v_Cat is equal to the current record’s value of Category. We want this to be true to prevent the EXTRACT command in the ELSE block from running, so we primed v_Cat with the value of the Category field rather than blanks.

004 ex 4_03.png

This next block of code is where the magic happens, but because our date variables were primed with the value of the current record, this is not a good place to explain how it works. We will walk through this logic in detail during the next iteration.

004 ex 4_04

We have reached the end of the block and there are more records to process, so GROUP begins again by evaluation v_Cat against the next record’s Category field.

004 ex 4_05.png

It evaluates true, so now we can talk about the magic of the first block of code. ACL has a MIN() and MAX() function, which is nice. But it only works with numbers, not dates or characters, which is not nice. Fortunately, it is easy to replicate these functions using a simple IF statement. The first line calculates the minimum value. We start by assuming that v_MinDate IS the minimum value. If the current record’s EnteredOn date is less than the variable value, we know this assumption is false and we re-assign v_MinDate to the current EnteredOn value. In this example, the EnteredOn value of March 2, 2015 is less than v_MinDate’s value of November 1, 2015, so we assign v_MinDate a value of March 2, 2015.

004 ex 4_06.png

The next line calculates the max value, which is the same logic as the min value but with ‘greater than’ equality rather than ‘less than’ equality.

004 ex 4_07.png

And now we reset the v_Cat variable to the current record’s Category value.

004 ex 4_08.png

We have reached the end of the magic block, so GROUP reiterates to the next record.

004 ex 4_09.png

We are still working with A01 category transactions, so the magic block recalculates the min and max values again.

004 ex 4_10

Now we arrive at the Z90 category of transactions, which causes GROUP to execute the ELSE block.

004 ex 4_11.png

Up to this point we have not actually extracted any data; we have simply been calculating and recalculating the min and max values. The first step in the ELSE block is to extract these values for the category we have been working with (in this case, A01). Notice that the EXTRACT command references v_Cat rather than Category. We cannot use Category in the extract because we have iterated to a Z90 record. Extracting the Category field directly would associate A01’s min and max values with Z90.

004 ex 4_12.png

The next block of code resets the min and max variables and updates the category variable.

004 ex 4_13.png

We have again come to the end of the GROUP command, so we move to the next record and begin again.

004 ex 4_14.png

The min and max values are recalculated as before…

004 ex 4_15.png

…and we again move to the next record.

004 ex 4_16.png

The final recalculation of min and max is performed…

004 ex 4_17

…and we are done with the GROUP command.

004 ex 4_18

But wait, we can’t be done! We have not extracted the Z90 record – our Result table only contains a record for A01. How do we make this GROUP command give us the last record?

The answer lies in the EXTRACT command. You may have already noticed it, but look at the ‘EOF’ parameter at the end of EXTRACT. This parameter tells ACL to run EXTRACT one last time after the final record is processed. There are 3 commands that take the EOF parameter – EXTRACT, LIST, and REPORT – and all of them will run one last time after GROUP is finished if the parameter is specified.

004 ex 4_19

Now GROUP is completely finished and the remainder of the script executes.

004 ex 4_20

Notice that EOF does not cause the entire block of code to re-run. v_MinDate, v_MaxDate, and v_Cat were not recalculated – only the EXTRACT command ran. This means if there are commands that run before EXTRACT, which EXTRACT depends on, they will not re-calculate on the final execution and the last row of data that is extracted could be erroneous.

That wraps up our review of GROUP IF.  You will find it is a powerful way to control the execution of the GROUP command and ensure appropriate commands are executed on relevant records.  The next, and final, post on GROUP will review nested GROUP commands and introduce you to the basics of LOOP.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s