Tuesday, July 3, 2012

Difference Cube for Data Validation

On occasion throughout my career with Essbase, I have heard of difference cubes.  I understood the concept behind them, but until recently didn't see how they could be useful.  I don't know that I am using a difference cube in the sense that others have spoken of them, but I did find a great way to test large amounts of data using Essbase and the concept is still largely the same.


Recently we decided to upgrade one of our cubes by adding some new dimensions.  To do this we needed to update our history to include these new fields via a set of SQL update queries to our source table.  We had the the correct data in a separate set of PeopleSoft source tables.  We needed to ensure that our table updates populated the correct data.  We are talking more than 5 million rows of data so testing in Excel is not exactly a viable solution.  So what I did instead was load my data from my PeopleSoft tables into a copy of my cube into a scenario called "Actual."  Next I took my SQL updated table and and loaded it into the cube into a scenario called "Test."  I then had a dynamic scenario called "Difference" which did a simple calculation of "Test" - "Actual."  Once each of the data sets was loaded, I was able to set up a high level retrieve sheet for each of the drilled out new dimensions for the "Difference" scenario.  Anywhere, where I saw something other than zero was something that needed to be researched, as it was a variance between my two data sets.  I was also able to further drill down on my other dimensions to determine exactly where the variance was occurring.  I was able to completely validate my data in hours rather than the days or weeks it may have taken me if I had been forced to use a set of queries and Excel.


This was just the immediate use I found for an Essbase difference cube.  Taking just a few minutes, I can think of various other situations where this type of cube might come in handy.  Just a couple of examples would be that it might also be useful in:

  • Verifying a new calculation for a cube
  • Comparing a back up of an initial cube load to a new cube load
  • Validating a new cube to an existing similar cube
A difference cube isn't always going to be something that makes sense, but it is definitely a tool that should be included in your tool box that can be drawn upon should an ideal scenario arise.  I don't see a difference cube as a daily use, especially for a functional user.  But for someone who develops cubes on a regular basis or works with large amounts of source data, there is definitely a functionality of this type of cube that is appealing.

As a note, if it were so necessary, you can also keep a copy of your difference cube as documentation of your validation for audit or SOX requests.