Once again, file this one under “Probably won’t matter as much after we get into 9.x features”. In the meantime, this bit of trickiness can improve performance when using extracts.
While I couldn’t attend Pawel Terlecki’s presentation “Speedier Queries with the Tableau Query Cache” last week, one of the ideas from it really stuck with me.
By adding all columns (measures and dimensions) in your extract to the Level of Detail of a blank sheet and then filtering the viz to return a single row, you can force all columns of the extract to be loaded into RAM (even if you don’t need them).
Note that this isn’t Tableau cache warming per se – it’s simply getting data from your disk into RAM early so that Tableau can access it more quickly when the time comes.
Tableau normally doesn’t want to do this, by the way – our goal is to let customers analyze as much data as they want regardless of how much RAM is on their box. We purposefully don’t load data until it is needed.
However, there are times when forcing this behavior could be a good thing, assuming you have enough RAM.
Here’s a simple example:
- Using a 50M row Super Store extract (~ 1.2 GB, 25+ columns), I created a simple viz
- Said viz used 1 measure and 4 dimensions out of the 25+ columns in the extract
- I also created a “Load All Columns” (LOC) worksheet where ALL columns in the extract were added to Level of Detail and filtered by Order ID so that a single row was returned
- I then ran several tests using Performance Recorder to see how quickly my viz loaded based on whether or not I first touched the LOC sheet before running the viz
- Between each test I bounced my machine to clear my RAM out entirely
I saw a 25%+ increase in performance by running the LOC sheet before hitting the real viz. The same viz ran about 1 second faster if the extract was already pre-loaded in RAM.
You should be able to take this behavior and apply it to Server, which is where the fun really starts.
You’d run an “LOC” viz on the server now and then to make sure the extract is loaded into RAM. Then, any other report which utilizes the same data source would benefit from having it already resident in RAM when the time came to do work.
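One way to run the LOC viz “now and then” is to script it with tabcmd and schedule it. Here’s a minimal sketch that renders the LOC view as an image, which forces the Server to execute it; the server URL and view path are placeholders for your own environment, and it assumes you’ve already done a `tabcmd login`:

```python
import subprocess

# Hypothetical placeholders -- substitute your own server and the path
# to your published "Load All Columns" worksheet.
SERVER = "https://tableau.example.com"
LOC_VIEW = "views/CacheWarming/LoadAllColumns"

def warm_extract_command(server, view_path, out_file="loc.png"):
    """Build a tabcmd invocation that renders the LOC view as a PNG,
    which makes Tableau Server execute the viz and pull the extract's
    columns into RAM."""
    return [
        "tabcmd", "get",
        "/{}.png".format(view_path),  # appending .png asks for an image render
        "-f", out_file,               # where to save the rendered image
        "-s", server,                 # which Tableau Server to hit
    ]

if __name__ == "__main__":
    # Run this on a schedule (cron, Task Scheduler) so the extract
    # stays resident between user requests.
    subprocess.run(warm_extract_command(SERVER, LOC_VIEW), check=True)
```

The rendered image itself is throwaway – the point is simply that requesting it makes the Server do the work of touching every column in the extract.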
I suspect you really wouldn’t have to load ALL your columns in the LOC viz, either – only the columns which “cover” the fields used by the viz that you want to accelerate.
Someone try this and report back what you find? I bet you’ll see more of a difference when your extract is larger and/or the viz you execute uses more fields than the 5 in my little test.