I’ve downloaded, massaged, and loaded the full FAA On-Time dataset from http://www.transtats.bts.gov/ into a test Redshift cluster. It’s about 150M rows and covers 1989 through July of 2016.

I’m currently running a 5-node cluster and will add and remove nodes at random, sometimes running as few as 1-2 and sometimes as many as 32. The cluster is located in Singapore, so there may be a tiny bit of lag between you and me.

Here is a sample v10 workbook which includes a live data source pointing to the cluster in question. The username for this cluster is iloveredshift and the password is abcD1234. Do as you wish.

Why in the world am I doing this? I need (you?) to generate “real world” queries against a cluster that I can monitor with a little toy I’m working on. I figure you heathens can be much more creative than I 🙂

FYI, the database itself isn’t very optimized at this point, so don’t take it as an indication of the performance you can expect from your well-designed database on Redshift. The main dashboard will take at least 30 seconds to render when the cluster is running 1-2 nodes. We’ll see how fast it runs when I add more juice. I am also bumping up the concurrency on this puppy to 15 concurrent queries, so if 2-3 of you happen to start banging on it at the same time, you’ll be able to tell.
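For the curious: with Redshift’s manual workload management (WLM), the number of concurrent query slots lives in the `wlm_json_configuration` cluster parameter. Here is a minimal sketch, assuming a single default queue and a hypothetical helper named `wlm_parameter`, that builds the parameter entry for 15 slots. It only builds the payload; it doesn’t talk to AWS.

```python
import json


def wlm_parameter(concurrency: int) -> dict:
    """Build a cluster-parameter entry setting the slot count of one default WLM queue.

    Assumption: manual WLM with a single queue. 'query_concurrency' is the
    slot-count key in Redshift's wlm_json_configuration JSON.
    """
    # Manual WLM caps total concurrency across all queues at 50.
    if not 1 <= concurrency <= 50:
        raise ValueError("manual WLM allows 1-50 concurrent query slots")
    queues = [{"query_concurrency": concurrency}]
    # Keys match the dicts expected in boto3's Parameters list.
    return {
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(queues),
    }


if __name__ == "__main__":
    print(wlm_parameter(15))
```

You would then pass `[wlm_parameter(15)]` as the `Parameters` argument of boto3’s Redshift client method `modify_cluster_parameter_group`, along with the name of your own parameter group.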

I’ll leave this sucker up and running for a week or two. If you create something cool or discover something awesome in this dataset, let me know, or post it on Tableau public!

BTW – be aware that this data source contains a customization that causes no cursors to be used on Redshift. What this means to you is that you don’t want to create an extract using the “embedded” data source (unless you have a HECK of a lot of memory on your machine). If you want to use this cluster to grab an extract, that’s cool, but I’d advise you to create a NEW data source without the customization first. Don’t know what the hell I’m talking about? Here you go!

I’d also recommend you do SOME aggregation in the extract, or you could inadvertently blow out the “temp space” that this cluster allows for cursors. If you do that, your extract will fail.
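To see why a little aggregation helps, here is a toy sketch in Python. The records and column layout are hypothetical, not the actual table schema. Collapsing flight-level rows to one row per carrier and month is the same idea as putting a `GROUP BY` in the query that feeds the extract, and it shrinks how much has to come back through the cursor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical flight-level rows: (carrier, year, month, departure delay in minutes).
flights = [
    ("AA", 2016, 7, 12.0),
    ("AA", 2016, 7, -3.0),
    ("AA", 2016, 7, 45.0),
    ("DL", 2016, 7, 0.0),
    ("DL", 2016, 7, 8.0),
    ("UA", 2016, 6, 30.0),
]


def aggregate(rows):
    """Collapse rows to {(carrier, year, month): (flight_count, avg_delay)},
    roughly what SELECT ... GROUP BY carrier, year, month would return."""
    groups = defaultdict(list)
    for carrier, year, month, delay in rows:
        groups[(carrier, year, month)].append(delay)
    return {key: (len(delays), mean(delays)) for key, delays in groups.items()}


summary = aggregate(flights)
print(f"{len(flights)} rows -> {len(summary)} rows")  # → 6 rows -> 3 rows
for key, (count, avg_delay) in sorted(summary.items()):
    print(key, count, round(avg_delay, 1))
```

On 150M rows the same grouping turns a per-flight extract into a per-carrier, per-month one, which is far kinder to the cursor temp space.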

BTW #2: unless you have 35+ minutes and about 15 GB of RAM (for Tableau) to spare, don’t run the worksheet that says “Don’t run me”. For real. It takes every airplane in the US on each day it flies, groups it by the airline it flies for, and then clusters the result based on arrival and departure delay. I just wanted to see what happens... and I don’t know what happens yet. I’m pretty excited. [Edit: It worked. 17M+ marks]
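Tableau’s clustering feature is documented as using k-means. As a rough illustration of what that sheet asks for, here is plain Lloyd’s-algorithm k-means in Python on a handful of hypothetical (departure delay, arrival delay) pairs in minutes. This is not Tableau’s implementation, which can also choose the number of clusters for you. It just shows the assign-then-average loop that has to chew through every one of those 17M+ marks.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


# Hypothetical (departure delay, arrival delay) pairs: four roughly on time, three about an hour late.
delays = [(0, -2), (2, 1), (-1, 0), (1, -1), (60, 55), (62, 58), (58, 61)]
centroids, labels = kmeans(delays, k=2)
print(sorted(centroids))  # → [(0.5, -0.5), (60.0, 58.0)]
```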

Russell is CURRENTLY running this many nodes:

Five

Two

Six

One

SEVEN

Three

TEN

Can you make these numbers go higher?

2 Comments

Robert Rouse, 10 years ago:

You must understand that your “Don’t run me” sheet is like telling someone “Whatever you do, DO NOT press the red button.” Now that’s the only button I want to press. It calls to me.

  Russell Christopher, 10 years ago:

  Well, yes.


Want to point Tableau at pre-loaded FAA data on Redshift? Cost = $0
