Where in the world?

Like most of you, in my career up to this point I have focused on analytics and analytic techniques common to the audit world. Analysis of financial and operational controls is the mainstay of our work and where we spend most of our time. We look at data elements such as journal entries, purchase orders, sales orders, pricing data, T&E entries, and the like, pulling data from our source systems, munging it together, and identifying potential control weaknesses or operational inefficiencies. It is a good place to be, with lots of opportunities. But recently I was involved in some analysis outside of our traditional scope which was a lot of fun and which I believe can furnish our users with useful information. It also highlights the flexibility of ACL Analytics and the wisdom of ACL’s decision to integrate Analytics with other platforms and tools in the data analytics sphere.

The analysis in question involved the supply chain function; in particular, I wanted to get a feel for the supply chain network and take some first steps towards identifying potential inefficiencies. At the very least, I wanted to obtain a better understanding of a complex area of the business which we, as a department, do not always understand. If I can also help our audit teams use the analysis to identify areas of improvement not visible to the business, that would be the cherry on top. One element of the analysis, then, was to look at the source and destination of shipments. Not only did I want to understand the current state, I also wanted to identify cases where product was shipped from one facility when a different facility might have been the better option. To do this, I needed to geocode our data. And that is how I found myself taking a first step into the world of geospatial intelligence.

For those of you who do not know, geocoding is the process of taking address data (such as ‘1 Main Street’) and converting it to latitude and longitude. Geocoding is what Google Maps, Bing, and OpenStreetMap do every time you search for an address and they return a map with that address plotted. In essence, geocoding adds another aspect to our data: the where. In addition to plotting geocoded data on a map, we can calculate distances between our data elements and determine relationships such as ‘to the north of’ or ‘in the vicinity of’.
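
To make that concrete, here is a minimal R sketch using the geosphere package (the coordinates are illustrative, not from my data) showing how a pair of geocoded points immediately supports both kinds of questions:

library(geosphere)

# Two geocoded points as c(longitude, latitude) pairs -- illustrative values
warehouse <- c(-87.6298, 41.8781)   # roughly Chicago
customer  <- c(-83.0458, 42.3314)   # roughly Detroit

# Great-circle distance in meters (divide by 1000 for kilometers)
distHaversine(warehouse, customer) / 1000

# Initial bearing in degrees from north; small absolute values mean
# the customer lies roughly to the north of the warehouse
bearing(warehouse, customer)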

The biggest difficulty I found in getting started with geospatial analysis was the geocoding of the address data itself. If you are lucky enough to have systems that geocode their data and store it for you, then most of the hard work is already done. SAP, for example, has geocoding capabilities, but they are not always turned on, which is the situation I found myself in. So it was up to me to find a way to geocode the data. Here is what I found out:

  • Solutions like Google Maps and Bing typically have licensing restrictions and do not want automated scripts connecting to their services. You would have to use their web interface and manually look up each address. Not ideal if you are dealing with tens of thousands of addresses on a monthly basis.
  • OpenStreetMap is very... open... but again, they do not like automated scripts connecting to their services and will block connections that consume a lot of resources.
  • For the prototype analysis, I ended up using a file of global post codes from Geonames. For my purposes, post codes were accurate enough to get a view of the situation. It is not perfect, but it was a good start for our analysis, which focused on North America, where post code data is reliable. (A sketch of loading and joining that file follows this list.)
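
As promised, here is a minimal sketch of putting the Geonames file to work in R. It assumes the standard tab-delimited, header-less layout documented in the Geonames readme, and ‘shipments’ with its ‘shipto_postcode’ column is a hypothetical data frame standing in for your own data:

# Geonames postal-code dumps (e.g., US.txt from download.geonames.org/export/zip/)
# are tab-delimited with no header; the column layout below follows the
# Geonames readme
cols <- c("country", "postcode", "place", "admin1", "admin1_code",
          "admin2", "admin2_code", "admin3", "admin3_code",
          "lat", "lon", "accuracy")

postcodes <- read.delim("US.txt", header = FALSE, col.names = cols,
                        colClasses = "character", quote = "")
postcodes$lat <- as.numeric(postcodes$lat)
postcodes$lon <- as.numeric(postcodes$lon)

# Join shipment data to the post code centroids ('shipments' and its
# 'shipto_postcode' column are hypothetical placeholders)
shipments <- merge(shipments, postcodes[, c("postcode", "lat", "lon")],
                   by.x = "shipto_postcode", by.y = "postcode", all.x = TRUE)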

I am working with an internal group to devise a more permanent and elegant solution. My end goal is a database of geocoded address identifiers (SAP address numbers, specifically) that is updated daily when new addresses are entered into SAP. Depending on the conclusion of the project, I may share more in a future post.

If my partnership with the internal group does not yield results, I have a plan B. OpenStreetMap data is open source and available for download. They also have a search engine, Nominatim, that is open source and runs on Linux. I virtualized an instance of Ubuntu on my personal laptop and tried installing Nominatim and the North America data set. After seven long days of watching my computer import the data, it was up and running. Aside from the wait, the process was much easier than I was expecting: Postgres and the Apache web server install readily on Ubuntu, so standing up a behind-the-firewall copy of OpenStreetMap is well within reach. My plan B is to have IT set up a Linux server and help me install OpenStreetMap there. I could then perform geocoding searches on that internal server to my heart’s content using the web API provided by Nominatim.
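
If that plan comes together, querying the internal server from R could look something like the sketch below. The hostname is a placeholder for whatever IT assigns, and the response handling follows Nominatim’s public /search API, which returns a JSON array with lat and lon fields:

library(jsonlite)

# Query a self-hosted Nominatim instance; "geocode.internal" is a placeholder
# hostname for the behind-the-firewall server described above
geocode_address <- function(address,
                            base_url = "http://geocode.internal/search") {
  url  <- paste0(base_url, "?format=json&limit=1&q=",
                 URLencode(address, reserved = TRUE))
  hits <- fromJSON(url)
  if (length(hits) == 0) return(c(lat = NA, lon = NA))
  c(lat = as.numeric(hits$lat[1]), lon = as.numeric(hits$lon[1]))
}

geocode_address("1 Main Street, Springfield")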

No matter your solution, once the data is geocoded the rest is pretty easy. Add the coordinates to your data sets using joins, relations, and so on. If you have R installed (and really, why wouldn’t you?), you can install the geosphere package and use it to calculate distances between points. The following ACL expression will do that for you, where StartLongitude, StartLatitude, EndLongitude, and EndLatitude are field names containing the longitude and latitude of your points:

RNUMERIC("geosphere::distm(c(value1,value2),c(value3,value4))", 1, StartLongitude, StartLatitude, EndLongitude, EndLatitude)/1000

The function returns distance in meters, so I divide by 1000 to convert to kilometers.
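
If you would rather do the whole calculation in R, or want to sanity-check the ACL expression, the same call looks like this; distm() expects points as c(longitude, latitude) and returns a matrix of pairwise distances in meters, which is why the expression above divides by 1000 as well:

library(geosphere)

# Equivalent of the embedded snippet above -- illustrative coordinates
start <- c(-87.6298, 41.8781)
end   <- c(-83.0458, 42.3314)

distm(start, end)[1, 1] / 1000   # distance in kilometers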

Alternatively, you can simply store the coordinates in your data tables. With AN12, other analytical tools can connect to your ACL projects and retrieve the data stored therein. My visualization tool of choice is currently Power BI, and geospatial analysis is one area where visualizations can really shine.

I would like to take a quick trip down memory lane. Ever since I was young, one of the first things I would do after sitting down in an airplane was to reach for the airline’s magazine and turn to the back. There you would find route maps showing all of the destinations and routes the airline and its partners flew, like a spider web of adventure and endless possibilities. I would spend lots of time poring over those maps, wondering about those places and imagining myself going there someday. I loved those maps, and still do. So naturally, I wanted to visualize my analysis results with those kinds of maps.

Unfortunately, while Power BI has native capabilities to plot single points on a map, it does not currently have a native capability to plot lines on a map. Fortunately, Power BI also talks to R, which gives us capabilities limited only by our imaginations and our competency with R. Here is a sample map from my analysis of non-optimal shipping points, focusing on one location. The red lines indicate actual shipments and the green lines indicate more optimal alternatives.

[Figure: map of one location’s shipments; red lines show actual shipments, green lines show the more optimal alternatives]

The great thing about that visualization is how easy it is to understand the extent of the issue, which is something not immediately apparent in a table of raw data. Many of the non-optimal shipments aren’t even close, such as the shipments to Florida.

Here are four more sample maps that might show total shipments to end customers from three different distribution centers. The first map is the entire landscape and the following maps are filtered to each DC:

[Figures: four maps of total shipments to end customers; the first shows the entire network, the remaining three are each filtered to a single distribution center]

Even these maps, drawn from raw shipment data without any analysis, raise two important sets of questions:

  • All three distribution centers are shipping nationally. Why don’t they focus on shipping only to their own regions? What is the purpose of having regional distribution centers if they all ship nationally? Is this explained by certain product lines being carried by only a single DC? If so, why? Is the volume of those product lines sufficient to stock them at other DCs? Has the business performed that cost-benefit analysis?
  • Why is there no DC on the west coast? Has that option been explored by the business? Has a cost-benefit analysis been performed to justify the decision?

The maps make the current situation plain and lead the audit team to inquire about patterns in the data. This allows us to start assessing the business’s operational efficiency, rather than focusing on the same tired issues of aged POs, late deliveries, and approval forms. Now we can begin to understand the big picture and narrow down to issues which truly impact the business. Most importantly, these types of visualizations can begin to guide the conversation toward understanding ‘why’, which allows us to arrive at solutions that fix the issue rather than the symptom.

Creating those charts in Power BI requires a little familiarity with R. This is the base code I used to draw those maps; note that a Power BI R visual passes the (filtered) data to your script as a data frame named dataset, which is why the code refers to that name. Disclaimer: it is probably not optimized and probably not good R code, but it works and is quick enough on the re-draw to handle tens of thousands of data points within a few seconds.

# maps provides the US outlines; geosphere supports the distance calculations
library(maps)
library(geosphere)

# Draw the US outline, then overlay the state boundaries in gray
map("usa")
map("state", boundary = FALSE, col = "gray", add = TRUE)

# Sort by plant and distance (descending) so the longest routes are drawn
# first and the arrowheads of shorter routes stay visible on top
dataset <- dataset[with(dataset, order(plant, -DistanceKm)), ]

# For each shipment row, draw a blue arrowhead at the destination, then a pink
# segment for the route (apply() coerces rows to character, hence as.numeric)
invisible(apply(dataset, 1, function(x) {
  arrows(as.numeric(x["shipfromlon"]), as.numeric(x["shipfromlat"]),
         as.numeric(x["shiptolon"]), as.numeric(x["shiptolat"]),
         angle = 45, col = "blue", length = .15)
  segments(as.numeric(x["shipfromlon"]), as.numeric(x["shipfromlat"]),
           as.numeric(x["shiptolon"]), as.numeric(x["shiptolat"]),
           col = "pink")
}))

# Draw the points last so the lines do not obscure them:
# ship-to destinations in red, ship-from origins in yellow
points(dataset$shiptolon, dataset$shiptolat, col = "red")
points(dataset$shipfromlon, dataset$shipfromlat, col = "yellow")

Very simply, the code loads a couple of R packages (maps for the drawing, geosphere for the distance work), then sorts the data by plant and distance so the shipping points furthest away are drawn first. This keeps the arrowheads from being ‘erased’ when later lines are drawn over them. The apply function draws the actual lines and arrows connecting each pair of points. The final step draws the points themselves; again, these are left for last so the lines do not obscure them.

And with that, we have a simple network map we can use to visualize our data. These maps are not interactive, but they do a great job of displaying the data in an easily digestible format. They also respond to slicers and highlights: if you select an item in a slicer or click on a value in a bar chart, the maps will re-draw with only the filtered data. This provides a powerful, yet relatively simple, platform from which to explore your data geographically. For the price of free, you get a lot of capability to add to the end of your ACL analysis process.
