Data In a Split Second: 2019

Wednesday 4 September 2019

Using Level of Detail (LOD) to Determine Completeness

Recently a colleague and myself found ourselves in a situation where we needed to produce a report on whether something was 'complete' or not. But oh wait, there's a bit more to that than a simple TRUE/FALSE flag.

Take phone specs for example, I found a data source that lists approximately 30 different phone specification categories. Imagine if this information needs to be 'complete before we can show it to people, we'll need to know where the gap is in order to know where to put our efforts.In plain language: 'what's complete and what still missing'?

Ok, so let's say I go ask my leader, out of the 30 various specs that each phone has, which ones must be complete? The "mandatory fields" so to speak. And this is his/her requirements:

Year must not be null/blank
Out of the 2G, 3G, and 4G fields, we need at least 1 of them to be filled in
Weight must not be null/blank
OS must not be null/blank
RAM must not be null/blank
Primary camera must not be null/blank

Also, in addition to the specific phones that are incomplete, they also wants to know the % of "brands" and "models" that's complete. A sort of status report so to speak. In essence, they wants a report that gives the following information:

Which brands/models are complete vs incomplete? (if one phone within a brand is incomplete, then mark the entire brand incomplete.)
Click on brand or model to see the phone within each grouping
Which phones are complete vs incomplete? (if one field within a phone is incomplete, then mark the entire phone incomplete.)
Click on phone to see which mandatory fields are complete vs incomplete

First, for all the fields that cannot be null/blank, I quickly made calculated fields for each of them:

IF ISNULL([Field])

THEN "Incomplete"

ELSE "Complete"

END

But for the bandwidth, only 1 out of 3 fields need to be filled in. So the calculated field is slightly different:

IF ISNULL([2G bands])

THEN

(IF ISNULL([3G bands])

THEN

(IF ISNULL([4G bands])

THEN "Incomplete"

ELSE "Complete"

END)

ELSE "Complete"

END)

ELSE "Complete"

END

Now I need to report on this completeness based on the following 3 dimensions:

Per phone
Per brand
Per model

For phone, that's easy enough, all I have to do is create a calculated field based on each row:

IF [C. Bandwidth] = "Complete"

THEN

(IF [C. OS] = "Complete"

THEN

(IF [C. PrimeCam] = "Complete"

THEN

(IF [C. RAM] = "Complete"

THEN

(IF [C. Weight] = "Complete"

THEN

(IF [C. Year] = "Complete"

THEN "Complete"

ELSE "Incomplete"

END)

ELSE "Incomplete"

END)

ELSE "Incomplete"

END)

ELSE "Incomplete"

END)

ELSE "Incomplete"

END)

ELSE "Incomplete"

END

This essentially looks at all the columns and if one of the column is 'incomplete' then mark the entire record as incomplete. To me, on a visual level, it looks like this:

I'm using the information between Category 1 - 4 to determine what the overall flag should be for each row/phone.

But when it comes to Model or Brand, we have to look at a number of rows, and if out of that grouping of rows, one of them is 'incomplete', then mark that entire grouping of rows incomplete. Basically like this:

In this instance, I'm looking at each of the categories as well as the brand. as long as one of the overall flag for a brand is 'incomplete', flag all the rows in that particular brand 'incomplete'.

To do that, you could use the LOD function in Tableau. However, that only works on measures. How we did it, was to flag each row's complete/incomplete indication using numbers, then convert those numbers back into a string.

First, create a calculated field that turns the result per phone into a measurable number:

IF [C. Bandwidth] = "Complete"

THEN

(IF [C. OS] = "Complete"

THEN

(IF [C. PrimeCam] = "Complete"

THEN

(IF [C. RAM] = "Complete"

THEN

(IF [C. Weight] = "Complete"

THEN

(IF [C. Year] = "Complete"

THEN 0

ELSE 1

END)

ELSE 1

END)

ELSE 1

END)

ELSE 1

END)

ELSE 1

END)

ELSE 1

END

Now we can add them up. If all the records in the given brand are complete, the sum of all of them should still be 0. But if any of the rows are incomplete, then the sum would be greater than 0.

Create an LOD calculated field based on that measure:

IF
{FIXED [Brand]:SUM([C. Phone (measure)])} = 0
THEN "Complete"
ELSE "Incomplete"
END

Apply this to Model, and now we can now look at this set of data in the way that was required (dashboard embedded). Hopefully this was helpful. Let me know what other ways have you used LODs before in the comments below.

Monday 17 June 2019

Alphabet Browsing (Filter) Hack

Recently I've been requested to created a dimension-heavy dashboard and provide various ways for users to search for information. Other than the obvious wildcard search boxes and single/multi select filters, I created an alphabet that allowed users to click on and quickly navigate to a specific alphabet letter.

Using the WWF endangered species data as an example, at first, I simply used the LEFT function, created a calculated field with the first letter of each 'common name' and set them as the filter, but what happens is that the letters that does not correspond with a 'common name' (in this case: K, Q, U, X, and Z) simply didn't show up.

There are a few downsides to this, one, it is simply a filter and not an alphabet browser, two, if more data gets added in the future, the width of the alphabets will increase, hence making it hard for me to determine the final width I should aim for. In my real life scenario, I didn't have enough space to put all the alphabets in a single row, and had to find a way to cut them into two 13 letter rows. The goal was to create a filter with a full set of alphabet letters, and color the letters that has data a certain color, and the ones that don't another color.

After some thought, the solution came to me. What I ended up doing was creating a separate data source (using Excel) that contains all the letters in the alphabet, then blended the data source together to determine which letters has a 'common name' (teal) and which doesn't (red).

Details:

Drag the Alphabets pill from the List of Alphabets data source to Columns and Text
Drag the Alphabets pill from the Main data source to Color
Color null red, color all other letters teal

If you are working with your final data source, this is pretty much all you have to do. However, in my case, I had to develop the dashboard using test data from a test server, then another team actually moves my dashboard from the test server to UAT, then finally connecting to production. The problem with this, is that when a new letter appears (from null), Tableau will automatically pick a colour from the chosen colour palette, and 9 out of 10 times it won't be teal.

Not knowing which letters will be Null and which letter will be populated, this is what I did:

In test server, using existing data, colour null values one colour (red) and non-null values another colour (teal)
Use the user interface and manually created dummy data for each of the letters that were null. In this data set, it would be K, Q, U, X and Z.
Go back to tableau. You now see no nulls as every letter has a value. You should see that K, Q, U, X, and Z are all various colours. Colour them all teal.
Connect to final data source, and the colours updated accordingly. Since we coloured all the letters teal, whenever a letter has a value, it automatically colours it teal. And since we coloured null values red, whenever there isn't a null value, it colours it red!

The last step is to set the proper dashboard action to enable the alphabets to filter the results, and you're done!

Hopefully this helps you. It's just a cute little hack that make dashboard that much more user friendly.