Earlier last week, a colleague asked why we generally recommend that you use “Multiple Tables” in a Tableau Data Source instead of custom SQL. The short answer is that your custom SQL statement becomes a sub-query inside the SQL that Tableau generates, and that can be a game-changer.
While your super-duper-well-tuned custom SQL may run really well in a vacuum, what happens when we start using it with additional GROUP BY, ORDER BY and WHERE clauses that Tableau needs? What if we use it to feed CASE statements? What if we don’t need all the tables in the custom SQL statement to answer a question being asked in Tableau?
I thought it would be fun to actually work up a case study on this, and I’m including my 10,000 foot level findings here. They may be enough for you. In future posts we’ll dig deeper if you want to get into the nitty-gritty.
Data
The data environment I’m using is SQL Server 2012 with the Contoso retail sample database. You can find it here if you want to sing along:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=18279
I like ContosoDW because the schema is fairly real-world: It is a snowflake schema with multiple fact tables leveraging the same dimensions. I chose to use a subset of the fact tables: Sales and Sales Quota – about 12M rows. I used an additional 9 dimension tables, for a grand total of 11. Here’s my schema, in case you’re interested:
Approach:
Using the model above, I created a Multiple Table driven Data Source in Tableau. I represented all of the relationships described above in the data source, and I used INNER JOINs exclusively.
I added a basic hierarchy for “Product Category / Sub Category / Product”, and did some simple grouping. I added a few calculations, too – nothing fancy.
I created 5 quick-and-dirty visualizations and dropped them in a dashboard. They’re ugly things and if Stephen Few sees them, he’ll cry. Out loud.
I included some basic quick filters and purposefully overused “Only Relevant Values” on the Product Category / Product Subcategory / Product Name quick filters – this is something I see customers do fairly regularly, so I did, too.
After clearing SQL Server’s cache buffer with a DBCC FreeProcCache command, I ran the dashboard.
Next, I made a single, simple change – I edited my connection and changed Multiple Tables to Custom SQL. Want to see the resulting SQL? Fine:
SELECT [FactOnlineSales].[OnlineSalesKey] AS [OnlineSalesKey],
[FactOnlineSales].[DateKey] AS [DateKey],
[FactOnlineSales].[StoreKey] AS [StoreKey],
[FactOnlineSales].[ProductKey] AS [ProductKey],
[FactOnlineSales].[PromotionKey] AS [PromotionKey],
[FactOnlineSales].[CurrencyKey] AS [CurrencyKey],
[FactOnlineSales].[CustomerKey] AS [CustomerKey],
[FactOnlineSales].[SalesOrderNumber] AS [SalesOrderNumber],
[FactOnlineSales].[SalesOrderLineNumber] AS [SalesOrderLineNumber],
[FactOnlineSales].[SalesQuantity] AS [SalesQuantity],
[FactOnlineSales].[SalesAmount] AS [SalesAmount],
[FactOnlineSales].[ReturnQuantity] AS [ReturnQuantity],
[FactOnlineSales].[ReturnAmount] AS [ReturnAmount],
[FactOnlineSales].[DiscountQuantity] AS [DiscountQuantity],
[FactOnlineSales].[DiscountAmount] AS [DiscountAmount],
[FactOnlineSales].[TotalCost] AS [TotalCost],
[FactOnlineSales].[UnitCost] AS [UnitCost],
[FactOnlineSales].[UnitPrice] AS [UnitPrice],
[FactOnlineSales].[ETLLoadID] AS [ETLLoadID],
[FactOnlineSales].[LoadDate] AS [LoadDate],
[FactOnlineSales].[UpdateDate] AS [UpdateDate],
[DimCurrency].[CurrencyKey] AS [DimCurrency_CurrencyKey],
[DimCurrency].[CurrencyLabel] AS [CurrencyLabel],
[DimCurrency].[CurrencyName] AS [CurrencyName],
[DimCurrency].[CurrencyDescription] AS [CurrencyDescription],
[DimCurrency].[ETLLoadID] AS [DimCurrency_ETLLoadID],
[DimCurrency].[LoadDate] AS [DimCurrency_LoadDate],
[DimCurrency].[UpdateDate] AS [DimCurrency_UpdateDate],
[DimCustomer].[CustomerKey] AS [DimCustomer_CustomerKey],
[DimCustomer].[GeographyKey] AS [GeographyKey],
[DimCustomer].[CustomerLabel] AS [CustomerLabel],
[DimCustomer].[Title] AS [Title],
[DimCustomer].[FirstName] AS [FirstName],
[DimCustomer].[MiddleName] AS [MiddleName],
[DimCustomer].[LastName] AS [LastName],
[DimCustomer].[NameStyle] AS [NameStyle],
[DimCustomer].[BirthDate] AS [BirthDate],
[DimCustomer].[MaritalStatus] AS [MaritalStatus],
[DimCustomer].[Suffix] AS [Suffix],
[DimCustomer].[Gender] AS [Gender],
[DimCustomer].[EmailAddress] AS [EmailAddress],
[DimCustomer].[YearlyIncome] AS [YearlyIncome],
[DimCustomer].[TotalChildren] AS [TotalChildren],
[DimCustomer].[NumberChildrenAtHome] AS [NumberChildrenAtHome],
[DimCustomer].[Education] AS [Education],
[DimCustomer].[Occupation] AS [Occupation],
[DimCustomer].[HouseOwnerFlag] AS [HouseOwnerFlag],
[DimCustomer].[NumberCarsOwned] AS [NumberCarsOwned],
[DimCustomer].[AddressLine1] AS [AddressLine1],
[DimCustomer].[AddressLine2] AS [AddressLine2],
[DimCustomer].[Phone] AS [Phone],
[DimCustomer].[DateFirstPurchase] AS [DateFirstPurchase],
[DimCustomer].[CustomerType] AS [CustomerType],
[DimCustomer].[CompanyName] AS [CompanyName],
[DimCustomer].[ETLLoadID] AS [DimCustomer_ETLLoadID],
[DimCustomer].[LoadDate] AS [DimCustomer_LoadDate],
[DimCustomer].[UpdateDate] AS [DimCustomer_UpdateDate],
[DimGeography].[GeographyKey] AS [DimGeography_GeographyKey],
[DimGeography].[GeographyType] AS [GeographyType],
[DimGeography].[ContinentName] AS [ContinentName],
[DimGeography].[CityName] AS [CityName],
[DimGeography].[StateProvinceName] AS [StateProvinceName],
[DimGeography].[RegionCountryName] AS [RegionCountryName],
[DimGeography].[Geometry] AS [Geometry],
[DimGeography].[ETLLoadID] AS [DimGeography_ETLLoadID],
[DimGeography].[LoadDate] AS [DimGeography_LoadDate],
[DimGeography].[UpdateDate] AS [DimGeography_UpdateDate],
[DimPromotion].[PromotionKey] AS [DimPromotion_PromotionKey],
[DimPromotion].[PromotionLabel] AS [PromotionLabel],
[DimPromotion].[PromotionName] AS [PromotionName],
[DimPromotion].[PromotionDescription] AS [PromotionDescription],
[DimPromotion].[DiscountPercent] AS [DiscountPercent],
[DimPromotion].[PromotionType] AS [PromotionType],
[DimPromotion].[PromotionCategory] AS [PromotionCategory],
[DimPromotion].[StartDate] AS [StartDate],
[DimPromotion].[EndDate] AS [EndDate],
[DimPromotion].[MinQuantity] AS [MinQuantity],
[DimPromotion].[MaxQuantity] AS [MaxQuantity],
[DimPromotion].[ETLLoadID] AS [DimPromotion_ETLLoadID],
[DimPromotion].[LoadDate] AS [DimPromotion_LoadDate],
[DimPromotion].[UpdateDate] AS [DimPromotion_UpdateDate],
[DimStore].[StoreKey] AS [DimStore_StoreKey],
[DimStore].[GeographyKey] AS [DimStore_GeographyKey],
[DimStore].[StoreManager] AS [StoreManager],
[DimStore].[StoreType] AS [StoreType],
[DimStore].[StoreName] AS [StoreName],
[DimStore].[StoreDescription] AS [StoreDescription],
[DimStore].[Status] AS [Status],
[DimStore].[OpenDate] AS [OpenDate],
[DimStore].[CloseDate] AS [CloseDate],
[DimStore].[EntityKey] AS [EntityKey],
[DimStore].[ZipCode] AS [ZipCode],
[DimStore].[ZipCodeExtension] AS [ZipCodeExtension],
[DimStore].[StorePhone] AS [StorePhone],
[DimStore].[StoreFax] AS [StoreFax],
[DimStore].[AddressLine1] AS [DimStore_AddressLine1],
[DimStore].[AddressLine2] AS [DimStore_AddressLine2],
[DimStore].[CloseReason] AS [CloseReason],
[DimStore].[EmployeeCount] AS [EmployeeCount],
[DimStore].[SellingAreaSize] AS [SellingAreaSize],
[DimStore].[LastRemodelDate] AS [LastRemodelDate],
[DimStore].[GeoLocation] AS [GeoLocation],
[DimStore].[Geometry] AS [DimStore_Geometry],
[DimStore].[ETLLoadID] AS [DimStore_ETLLoadID],
[DimStore].[LoadDate] AS [DimStore_LoadDate],
[DimStore].[UpdateDate] AS [DimStore_UpdateDate],
[DimDate].[Datekey] AS [Dateke1],
[DimDate].[FullDateLabel] AS [FullDateLabel],
[DimDate].[DateDescription] AS [DateDescription],
[DimDate].[CalendarYear] AS [CalendarYear],
[DimDate].[CalendarYearLabel] AS [CalendarYearLabel],
[DimDate].[CalendarHalfYear] AS [CalendarHalfYear],
[DimDate].[CalendarHalfYearLabel] AS [CalendarHalfYearLabel],
[DimDate].[CalendarQuarter] AS [CalendarQuarter],
[DimDate].[CalendarQuarterLabel] AS [CalendarQuarterLabel],
[DimDate].[CalendarMonth] AS [CalendarMonth],
[DimDate].[CalendarMonthLabel] AS [CalendarMonthLabel],
[DimDate].[CalendarWeek] AS [CalendarWeek],
[DimDate].[CalendarWeekLabel] AS [CalendarWeekLabel],
[DimDate].[CalendarDayOfWeek] AS [CalendarDayOfWeek],
[DimDate].[CalendarDayOfWeekLabel] AS [CalendarDayOfWeekLabel],
[DimDate].[FiscalYear] AS [FiscalYear],
[DimDate].[FiscalYearLabel] AS [FiscalYearLabel],
[DimDate].[FiscalHalfYear] AS [FiscalHalfYear],
[DimDate].[FiscalHalfYearLabel] AS [FiscalHalfYearLabel],
[DimDate].[FiscalQuarter] AS [FiscalQuarter],
[DimDate].[FiscalQuarterLabel] AS [FiscalQuarterLabel],
[DimDate].[FiscalMonth] AS [FiscalMonth],
[DimDate].[FiscalMonthLabel] AS [FiscalMonthLabel],
[DimDate].[IsWorkDay] AS [IsWorkDay],
[DimDate].[IsHoliday] AS [IsHoliday],
[DimDate].[HolidayName] AS [HolidayName],
[DimDate].[EuropeSeason] AS [EuropeSeason],
[DimDate].[NorthAmericaSeason] AS [NorthAmericaSeason],
[DimDate].[AsiaSeason] AS [AsiaSeason],
[DimProduct].[ProductKey] AS [DimProduct_ProductKey],
[DimProduct].[ProductLabel] AS [ProductLabel],
[DimProduct].[ProductName] AS [ProductName],
[DimProduct].[ProductDescription] AS [ProductDescription],
[DimProduct].[ProductSubcategoryKey] AS [ProductSubcategoryKey],
[DimProduct].[Manufacturer] AS [Manufacturer],
[DimProduct].[BrandName] AS [BrandName],
[DimProduct].[ClassID] AS [ClassID],
[DimProduct].[ClassName] AS [ClassName],
[DimProduct].[StyleID] AS [StyleID],
[DimProduct].[StyleName] AS [StyleName],
[DimProduct].[ColorID] AS [ColorID],
[DimProduct].[ColorName] AS [ColorName],
[DimProduct].[Size] AS [Size],
[DimProduct].[SizeRange] AS [SizeRange],
[DimProduct].[SizeUnitMeasureID] AS [SizeUnitMeasureID],
[DimProduct].[Weight] AS [Weight],
[DimProduct].[WeightUnitMeasureID] AS [WeightUnitMeasureID],
[DimProduct].[UnitOfMeasureID] AS [UnitOfMeasureID],
[DimProduct].[UnitOfMeasureName] AS [UnitOfMeasureName],
[DimProduct].[StockTypeID] AS [StockTypeID],
[DimProduct].[StockTypeName] AS [StockTypeName],
[DimProduct].[UnitCost] AS [DimProduct_UnitCost],
[DimProduct].[UnitPrice] AS [DimProduct_UnitPrice],
[DimProduct].[AvailableForSaleDate] AS [AvailableForSaleDate],
[DimProduct].[StopSaleDate] AS [StopSaleDate],
[DimProduct].[Status] AS [DimProduct_Status],
[DimProduct].[ImageURL] AS [ImageURL],
[DimProduct].[ProductURL] AS [ProductURL],
[DimProduct].[ETLLoadID] AS [DimProduct_ETLLoadID],
[DimProduct].[LoadDate] AS [DimProduct_LoadDate],
[DimProduct].[UpdateDate] AS [DimProduct_UpdateDate],
[DimProductSubcategory].[ProductSubcategoryKey] AS [DimProductSubcategory_ProductSubcategoryKey],
[DimProductSubcategory].[ProductSubcategoryLabel] AS [ProductSubcategoryLabel],
[DimProductSubcategory].[ProductSubcategoryName] AS [ProductSubcategoryName],
[DimProductSubcategory].[ProductSubcategoryDescription] AS [ProductSubcategoryDescription],
[DimProductSubcategory].[ProductCategoryKey] AS [ProductCategoryKey],
[DimProductSubcategory].[ETLLoadID] AS [DimProductSubcategory_ETLLoadID],
[DimProductSubcategory].[LoadDate] AS [DimProductSubcategory_LoadDate],
[DimProductSubcategory].[UpdateDate] AS [DimProductSubcategory_UpdateDate],
[DimProductCategory].[ProductCategoryKey] AS [DimProductCategory_ProductCategoryKey],
[DimProductCategory].[ProductCategoryLabel] AS [ProductCategoryLabel],
[DimProductCategory].[ProductCategoryName] AS [ProductCategoryName],
[DimProductCategory].[ProductCategoryDescription] AS [ProductCategoryDescription],
[DimProductCategory].[ETLLoadID] AS [DimProductCategory_ETLLoadID],
[DimProductCategory].[LoadDate] AS [DimProductCategory_LoadDate],
[DimProductCategory].[UpdateDate] AS [DimProductCategory_UpdateDate],
[FactSalesQuota].[SalesQuotaKey] AS [SalesQuotaKey],
[FactSalesQuota].[ChannelKey] AS [ChannelKey],
[FactSalesQuota].[StoreKey] AS [FactSalesQuota_StoreKey],
[FactSalesQuota].[ProductKey] AS [FactSalesQuota_ProductKey],
[FactSalesQuota].[DateKey] AS [FactSalesQuota_DateKey],
[FactSalesQuota].[CurrencyKey] AS [FactSalesQuota_CurrencyKey],
[FactSalesQuota].[ScenarioKey] AS [ScenarioKey],
[FactSalesQuota].[SalesQuantityQuota] AS [SalesQuantityQuota],
[FactSalesQuota].[SalesAmountQuota] AS [SalesAmountQuota],
[FactSalesQuota].[GrossMarginQuota] AS [GrossMarginQuota],
[FactSalesQuota].[ETLLoadID] AS [FactSalesQuota_ETLLoadID],
[FactSalesQuota].[LoadDate] AS [FactSalesQuota_LoadDate],
[FactSalesQuota].[UpdateDate] AS [FactSalesQuota_UpdateDate]
FROM [dbo].[FactOnlineSales] [FactOnlineSales]
INNER JOIN [dbo].[DimCurrency] [DimCurrency] ON ([FactOnlineSales].[CurrencyKey] = [DimCurrency].[CurrencyKey])
INNER JOIN [dbo].[DimCustomer] [DimCustomer] ON ([FactOnlineSales].[CustomerKey] = [DimCustomer].[CustomerKey])
INNER JOIN [dbo].[DimGeography] [DimGeography] ON ([DimCustomer].[GeographyKey] = [DimGeography].[GeographyKey])
INNER JOIN [dbo].[DimPromotion] [DimPromotion] ON ([FactOnlineSales].[PromotionKey] = [DimPromotion].[PromotionKey])
INNER JOIN [dbo].[DimStore] [DimStore] ON (([DimGeography].[GeographyKey] = [DimStore].[GeographyKey]) AND ([FactOnlineSales].[StoreKey] = [DimStore].[StoreKey]))
INNER JOIN [dbo].[DimDate] [DimDate] ON ([FactOnlineSales].[DateKey] = [DimDate].[Datekey])
INNER JOIN [dbo].[DimProduct] [DimProduct] ON ([FactOnlineSales].[ProductKey] = [DimProduct].[ProductKey])
INNER JOIN [dbo].[DimProductSubcategory] [DimProductSubcategory] ON ([DimProduct].[ProductSubcategoryKey] = [DimProductSubcategory].[ProductSubcategoryKey])
INNER JOIN [dbo].[DimProductCategory] [DimProductCategory] ON ([DimProductSubcategory].[ProductCategoryKey] = [DimProductCategory].[ProductCategoryKey])
INNER JOIN [dbo].[FactSalesQuota] [FactSalesQuota] ON (([DimCurrency].[CurrencyKey] = [FactSalesQuota].[CurrencyKey]) AND ([DimDate].[Datekey] = [FactSalesQuota].[DateKey]) AND ([DimProduct].[ProductKey] = [FactSalesQuota].[ProductKey]) AND ([DimStore].[StoreKey] = [FactSalesQuota].[StoreKey]))
Got that?
I cleared SQL Server’s buffer cache again, then ran the same exact dashboard. All of the particulars of both dashboard runs were stored in my log, just waiting to be read….
I leaned heavily on Interworks’s Tableau Performance Analyzer to do my log parsing. It’s a great tool, and if you don’t have it, you should add it to your toolbox immediately. Best of all, it’s free.
http://www.interworks.com/services/business-intelligence/tableau-performance-analyzer
Final results
I managed to increase the run time of my dashboard over 30% by using custom SQL instead of multiple tables. Zoinks!
Yes, You’re reading the results correctly, folks. Because of all the joins, my SQL box took a while to return results. By comparison using a Tableau Data Extract against the same data source allowed the dashboard to render in ~4 seconds – a 97% difference.
So, the question remains, why did custom SQL slow the dashboard rendering time by over 30%? Why? WHY?!
We’ll get into that in the next blog post.
EDIT: As you can see, I never did another post on this…but I’m going to decoreate this guy with interesting threads on the same subject culled from our community forums.
A reason not to use Custom SQL : http://community.tableausoftware.com/message/231860?et=watches.email.thread#231860