Modularization, Part 1

ACLScript is a great domain-specific language for analyzing large sets of data. It is a very comfortable language – like SQL it is fairly verbose and descriptive but even easier to read and manipulate. It fits very nicely with traditional data analysis and I have not yet found a scripting or programming language that is as enjoyable to use while being as powerful as ACLScript. My favorite days at the office are those where I am sitting at my desk with a hot cup of coffee; wearing a warm, comfortable sweater; and writing ACLScript code for one of my long list of projects. Add a fireplace, a window, and some light snow falling gently outside and you have the makings of a warm and fuzzy holiday greeting card.

But this does not mean there is no room for improvement in the language. One shortcoming of ACLScript is the behavior of variables. Every variable in ACL is global; it is accessible from any script and can be manipulated by any script. Admittedly, global variables are easier to use for those who are new to ACLScript or to scripting in general. But for intermediate and advanced ACL scripters, or those with prior programming experience, the lack of scoping becomes a hindrance to writing good code.

Why talk about this in an article titled ‘Modularization’? Because the lack of variable scope is the biggest roadblock to creating properly modular scripts in ACL. It is the global-ness of variables that allows, indeed encourages, us to create awful spaghetti code that weaves in and out, changes state at unexpected times, and generally makes maintenance and enhancements a nightmare. I would know – I have written a lot of spaghetti code in my time, both in .NET languages and in ACLScript. If we artificially enforce variable scoping, we can go a long way toward creating modular scripts, free ourselves from the tangles of bad code, and ultimately make our code base more stable and easier to maintain.

What is modular ACL scripting?

Modular ACL scripting is, in its most basic form, creating scripts that do exactly one thing. Spaghetti code begins to rear its ugly head when we start to add functionality and new tasks to existing scripts rather than create a new script. Some examples I have found (and created) include: making analytical scripts create reports, making data import scripts do analytics, making a single script that imports data, analyzes data, creates report files, sends notifications, etc. Even making an analytic script do an unrelated, or somewhat related, analysis can be considered spaghettification. If you want to modularize your scripts, you need to force yourself to write scripts that do exactly one thing.

I have found it helps to think about scripts as layers of an analytic object or application. The diagram below is my current high-level approach to layering scripts. Some scripters may like additional levels and others may prefer fewer levels; the actual layers you define are not overly important, but having an architecture in mind and forcing scripts to be single-purpose within that architecture is.

[Figure: high-level script architecture layers]

So, what does it mean to separate scripts into these layers? Let’s take an example of an analytic that identifies duplicate invoices. We will have separate scripts to import vendor master data and invoice transactions. If we want to perform currency conversions to a single reporting currency, we may also have a third script to import exchange rates. Notice that I do not embed currency conversions into the other data imports – the invoice script does not touch exchange rates. I consider currency conversion to be an analytic function, not a data import function. By keeping the exchange rate script separate, I can call it exactly when I need to and re-use the data for other scripts.
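
To make this concrete, here is a minimal sketch of what the currency-conversion step might look like inside the analytic layer rather than the import layer. Everything here is an assumption for illustration: the table names (Invoices, Exchange_Rates), the field names (currency_code, rate, invoice_amount), and the output table are placeholders, not a prescribed layout.

    COMMENT A hedged sketch: join invoices to exchange rates inside the analytic layer,
    COMMENT leaving the imported source tables untouched. All names are placeholders.
    OPEN Invoices
    OPEN Exchange_Rates SECONDARY
    JOIN PKEY currency_code FIELDS vendor_no invoice_no invoice_date invoice_amount currency_code SKEY currency_code WITH rate TO "Invoices_With_Rates" OPEN PRESORT SECSORT
    COMMENT Add a converted amount on the new working table, not on the source imports.
    DEFINE FIELD amount_reporting COMPUTED invoice_amount * rate

Because the conversion happens on a new working table, the imported invoice and exchange rate tables stay exactly as their import scripts left them.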

The analytic script uses the data from the import scripts to identify duplicate invoices, but it does not directly call the imports. During the course of the analysis, it might perform some data normalization on invoice references or vendor names. This might be done in the script itself, or it might call utility scripts to perform the normalization. The utility scripts are designed to require certain inputs, such as field and table names, so that they can be easily re-used in other scripts. Any actions performed by utility or analytic scripts, such as normalization, are not performed on the source tables; instead, temporary tables are created from the sources and the normalization is applied to those copies. I pay very close attention to ensuring that scripts do not change their input data in any way. If input data is manipulated by any script other than the one that originally created it, other analytics that use the source data can no longer have confidence in its integrity. If you want scripts to be plug and play, they cannot change their inputs.
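
As an illustration, a utility script along these lines might look like the sketch below. The script name, the variables it expects (v_src_table, v_src_field, v_out_table), and the normalization rule itself are all assumptions chosen for this example, not a fixed convention.

    COMMENT Utility_Normalize: the caller is expected to set v_src_table, v_src_field,
    COMMENT and v_out_table before running DO SCRIPT Utility_Normalize.
    OPEN %v_src_table%
    COMMENT Work on an extracted copy so the imported source table is never modified.
    COMMENT In practice you would also carry along whatever key fields you need downstream.
    EXTRACT FIELDS %v_src_field% UPPER(EXCLUDE(%v_src_field%, " .,-&")) AS "normalized_value" TO %v_out_table% OPEN

A caller would then look something like this:

    v_src_table = "Invoices"
    v_src_field = "vendor_name"
    v_out_table = "Invoices_Work"
    DO SCRIPT Utility_Normalize

Because the utility only ever writes to the table named in v_out_table, it can be dropped into any analytic without risking the source data.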

The analytic scripts go only as far as identifying the exception or item you are looking for. No output files are generated, unless you save the results of your analytics to a database; in that case, I have found it best for the analytic script to handle the transfer. Separate scripts should generate the reports for distribution. De-coupling the analytic and reporting scripts gives you great flexibility in how you present the data to your customers. When creating the analytic you can think about ‘exceptions’, and when designing the report you can think about ‘presentation’. Your presentation no longer has to influence what your analytic results look like. It also allows you to easily generate multiple reports for the same analytic, if you are unlucky enough to be in that situation.
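
For instance, a report script might do nothing more than open the analytic’s output table and export it for distribution. The table name, field list, output file, and the XLSX export are all assumptions made for this sketch; the point is only that presentation lives in its own script.

    COMMENT Report_Duplicate_Invoices: presentation only. Reads the analytic's output
    COMMENT table and exports it; names and fields are placeholders for this sketch.
    OPEN Duplicate_Invoices
    EXPORT FIELDS vendor_no invoice_no invoice_date invoice_amount XLSX TO "Duplicate_Invoices_Report" WORKSHEET Exceptions

If the business later wants a different layout, only this script changes; the analytic and its results are untouched.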

A real-life example of the benefits of de-coupling analysis and reporting is an analytic I run to identify vendor maintenance and invoice processing SOD (segregation of duties) violations. Each exception generated by the analytic is a record containing details about vendor master data changes that were followed by invoices created by the same user. The report I generate manipulates the data to present the exceptions in a transaction history format, where each action (changing vendor data, creating an invoice) is on a separate line. See below:

[Figure: example SOD exception report in transaction history format]

As you can see, in this case the actual exception and the report view are not very compatible. Storing the results in a transaction history format muddles the issue and makes it unclear where the exception lies if you want to perform historical trend analysis or implement an exception feedback loop. But the transaction history presentation is clearer to the business users. De-coupling the analytic and the report allows me to meet everyone’s needs.

Once you have completely split the data imports, analytics, reports, and utilities, the question may arise: ‘how do I run an analytic application?’ The answer is a category of script I like to call ‘controllers’. Controllers take our desire to run an analysis and call the necessary scripts to make it happen – they are essentially the layer between the UI and our working code. For AX, the controllers have analytic headers and are initiated through the AX client. For AN, the controllers may contain dialog boxes asking for inputs. Once the input has been obtained and validated, the controllers call the necessary imports, analytics, and reports. No matter where they live, they are the glue that ties everything together into an analytic application.
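
A controller for the duplicate invoice analytic might therefore be little more than a short list of DO SCRIPT calls, something like the sketch below. The script names and variables are hypothetical; in AN the inputs might come from a DIALOG, but they are hard-coded here to keep the sketch simple.

    COMMENT Controller_Duplicate_Invoices: gather inputs, then call the
    COMMENT single-purpose scripts in order. All names are placeholders.
    v_period_start = `20240101`
    v_period_end = `20241231`
    DO SCRIPT Import_Vendor_Master
    DO SCRIPT Import_Invoices
    DO SCRIPT Import_Exchange_Rates
    DO SCRIPT Analytic_Duplicate_Invoices
    DO SCRIPT Report_Duplicate_Invoices

Nothing in the controller does any real work; it only decides which working scripts run, and in what order.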

If you store your scripts as a centralized script library in a shared folder using .aclscript files, controllers become even more intriguing. They can be created and destroyed at will without ever touching or affecting the underlying code base. I can add analytics to and remove them from AX, the Applications Toolbar, or Analysis Apps without ever changing the working code. When auditors run data or analytic scripts, they can be sure they are using the same code that supports Continuous Controls Monitoring (CCM). If I need to update logic or fix bugs, I make the change in a single code repository and am guaranteed that everyone has the correct, up-to-date version. If I develop a really cool analytic for CCM that I want to share with the audit team, it is a simple matter of creating a controller in the Applications Toolbar that calls the necessary scripts. I never have to touch the analytic logic, so I can rest assured that I will not inadvertently break my production CCM code. The portability and maintainability of the system become a thing of beauty.

Modularizing scripts typically means you will have a larger number of script files in your script library because it forces you to break up your analytics into small, self-contained pieces.  If we return to the duplicate invoice analytic, the entire analytic might look something like this:

[Figure: example script set for the duplicate invoice analytic]

Every script can be unplugged and replaced as needed. None of the scripts below the controller depends on the others. The analytic script in particular does not depend directly on the import scripts; it depends only on the data those scripts generate. This means we can change or outright replace these scripts, as long as the output data remains compatible with the analytic’s requirements. Want to run your duplicate invoice analytic across multiple systems spanning your company’s business units or geographical locations? You only have to spend your energy on designing the data import scripts. The analytic itself can be run on as many, or as few, source data files as are needed to cover your systems.
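
As a sketch of what that might look like, a controller could run the same analytic once per source system, pointing it at a different import’s output each time. The system names, table names, and the v_invoice_table convention are assumptions for this example.

    COMMENT Run the same analytic against data imported from two different systems.
    COMMENT Only the import scripts differ; the analytic reads whichever table it is told to.
    DO SCRIPT Import_Invoices_SystemA
    v_invoice_table = "Invoices_SystemA"
    DO SCRIPT Analytic_Duplicate_Invoices

    DO SCRIPT Import_Invoices_SystemB
    v_invoice_table = "Invoices_SystemB"
    DO SCRIPT Analytic_Duplicate_Invoices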

This first post has been a theoretical discussion of the benefits of modularizing ACL scripts. While helpful, it has not provided the ‘how’. Now that you have an idea of the vision for modular scripts, we will discuss how to actually code them in part 2.
