Road Rules for Tableau and Redshift

Welcome! Did you just get your shiny new Redshift cluster up and running, and now you want to know what to do before you point Tableau at it?

…Or maybe you’re the guy/gal who has been trying to make the combination of Redshift and Tableau sing, and things just aren’t quite clicking…

You’re in the right place. Over the next few posts I’m going to attempt to impart as much practical wisdom around “Tableau and Redshift” as possible.

Some of what you’ll read here will fall into the “No duh” category, and that’s OK. I’m going to cover this stuff anyway, because a fair number of people don’t know (and/or choose to ignore!) the basic rules of engagement, and breaking them can cause lots of pain later.

In this (pretty brief) lead-off post I’ll mainly do level-setting. We’ll get into the weeds later. Even though this post is high-level, it’s arguably the most important one of the bunch.

Let’s start at the beginning, shall we?

Redshift is not magic.

The Tableau Data Engine (TDE), Tableau’s Hyper engine and AWS’s SPICE (the in-memory engine behind QuickSight) are magic. They can take a stinky, worn-out data model and make it pretty damn fast, and as a result they can often cover up a multitude of dashboard design sins. A complete novice can use Tableau and the TDE to create a wicked-fast dashboard, even if the dashboard design and original data model are poor.

Redshift will not. It is not magic in the same sense. To make Tableau and Redshift “team well” at scale, you will generally need to do things right on both sides of the line.

The Tableau Side

Viz and dashboard design are important. I can’t stress this enough. If you don’t get this part right, there is very little you can do on the Redshift side to make up for it. While I tend towards the dramatic (ex-theater kid turned wanna-be-technologist), I’m not being histrionic here. Simply adding more nodes to Redshift and attempting to brute-force your way through a bad dashboard design will probably fail.

Unless you have low volumes of data in Redshift, you must follow best practices around dashboard design. Period.

So, where do you find these best practices? Start with Alan Eldridge’s opus on performance, Designing Efficient Workbooks. Is it a light read? (I hear Alan laughing in my head right now…)

Nope!

However, if you aspire to be really good with Tableau, his words of wisdom need to become second nature. You’ll want to ask yourself “What would Alan do?” each time you’re tempted to add that “Only Relevant Values” quick filter, or a 7th and 8th chart, to your dashboard.

To be clear, the whitepaper is not Redshift-centric, but nearly everything it says to do will make Redshift happier.

If I had to summarize Alan’s work, here’s what I’d start with:

Keep it simple
Less is more
Pre-filter your vizzes: Don’t show “all” data just because you can
Being too clever is often counter-productive

The Redshift Side

Concurrency.

Redshift is not a high-concurrency database. Therefore, a successful Tableau-Redshift implementation should be designed to keep the number of concurrent queries executing on Redshift to a reasonably low level. It should also provide high throughput on Redshift to get in and out fast.

Let’s rephrase:

Don’t execute too many queries on Redshift at the same time: Tableau’s influence on this part of the equation is paramount
Make sure the queries that do run on Redshift complete quickly
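To make those two rules concrete, here’s a toy Python model. It’s my own illustration and not how Redshift actually schedules work. It assumes a fixed number of query slots, a dashboard that fires all of its queries at once, and first-come, first-served queuing. The slot count and the query durations are made-up numbers. Real Redshift workload management (WLM) has more moving parts, such as multiple queues, short query acceleration and concurrency scaling.

```python
import heapq


def simulate_fifo_slots(durations, slots):
    """Return each query's finish time (seconds) in a toy slot model.

    Assumptions: every query arrives at t=0, at most `slots` queries run at
    once, and waiting queries start first-come, first-served. This is an
    illustration, not Redshift's actual scheduler.
    """
    if slots < 1:
        raise ValueError("need at least one slot")
    free_at = [0.0] * slots  # time at which each slot next becomes free
    heapq.heapify(free_at)
    finishes = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest slot to open up
        end = start + duration
        finishes.append(end)
        heapq.heappush(free_at, end)  # that slot stays busy until `end`
    return finishes


# Hypothetical dashboard: 10 worksheet queries, 4 s each, 5 slots.
print(max(simulate_fifo_slots([4.0] * 10, slots=5)))  # 8.0
# Tableau side: half as many queries.
print(max(simulate_fifo_slots([4.0] * 5, slots=5)))   # 4.0
# Redshift side: same number of queries, tuned to 1 s each.
print(max(simulate_fifo_slots([1.0] * 10, slots=5)))  # 2.0
```

Even in this simplified model, both levers shrink the time until the last viz renders. One is sending fewer queries, which is the Tableau side. The other is making each query finish faster, which is the Redshift side.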

How do you get throughput? It’s not magic. You need to do things like define sort keys and distribution keys and use column compression. Don’t take that super-complex schema you ran in SQL Server or Oracle and deploy it in Redshift as-is. Try to simplify it, removing the need for lots of joins. Some of the work and good behaviors that may be considered “optional” while using the TDE are mandatory with Redshift.

Like I said, “No duh!”
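Here’s a rough sketch of what that looks like at the table level. Because it’s Python, it only assembles the DDL text, which you would then run in your SQL client. The sales table, its columns and the encoding choices are all hypothetical:

- customer_id as the distribution key assumes you frequently join or aggregate on it.
- sale_date as the sort key assumes most dashboards filter on date.
- region lives directly on the table rather than in a joined lookup table, in the spirit of simplifying the schema.
- The leading sort key column is left RAW (uncompressed), a commonly recommended choice.

```python
def redshift_ddl(table, columns, distkey, sortkeys):
    """Build a Redshift CREATE TABLE statement (hypothetical helper).

    columns:  list of (name, type, encoding) tuples, one per column.
    distkey:  column used to distribute rows across slices (DISTSTYLE KEY).
    sortkeys: columns for a compound sort key, in order.
    """
    names = {name for name, _, _ in columns}
    # Fail early if the keys don't refer to real columns.
    if distkey not in names:
        raise ValueError(f"distkey {distkey!r} is not a column")
    missing = [key for key in sortkeys if key not in names]
    if missing:
        raise ValueError(f"sort key columns not found: {missing}")
    # One line per column, each with an explicit compression encoding.
    col_lines = ",\n".join(
        f"    {name} {col_type} ENCODE {encoding}"
        for name, col_type, encoding in columns
    )
    return (
        f"CREATE TABLE {table} (\n{col_lines}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({distkey})\n"
        f"COMPOUND SORTKEY ({', '.join(sortkeys)});"
    )


# Hypothetical, denormalized sales table.
sales_columns = [
    ("sale_id", "BIGINT", "AZ64"),
    ("customer_id", "INTEGER", "AZ64"),
    ("sale_date", "DATE", "RAW"),          # leading sort key left uncompressed
    ("region", "VARCHAR(32)", "ZSTD"),     # stored inline instead of joined
    ("amount", "DECIMAL(12,2)", "AZ64"),
]
print(redshift_ddl("sales", sales_columns,
                   distkey="customer_id", sortkeys=["sale_date"]))
```

The right keys and encodings depend on your data and on the queries your dashboards actually send, so treat these choices as placeholders rather than recommendations.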

Just planting the flag. In future posts, we’ll dive into exactly what you can do on either side of the Tableau – Redshift relationship to keep this beautiful marriage humming along.