I’m a die-hard Tableau guy, but after ~15 years at
Microsoft, I will always have a place in my heart for that bucket-of-happiness
that is SQL Server.
I was therefore really excited to see that SQL Server 2016
will offer built-in row level security (RLS). The current solution (SQL Server Label
Security Toolkit) has always felt like a kludge to me.
This morning I stood up SQL Server 2016 CTP2 on an 8 GB / 4 vCPU VM and gave
RLS a spin. My goal was to see if I could simplify implementing RLS inside
Tableau.
(Tl; dr: SQL Server RLS did simplify things substantially, and it’s faster.)
I
already had a ready-made database in SQL Server which implements RLS via a “security
junction table” as described in option 2 of this KB article from Tableau.
I used the MSDN help topics to create a TVF which returns true/false based on whether the person executing
it is the same as the person in my “security junction table” username column:
CREATE FUNCTION Security.fn_securitypredicate(@user AS sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_securitypredicate_result
WHERE @user = USER_NAME();
…and then I added a security policy which bound the function
to my security junction table:
CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE Security.fn_securitypredicate(username)
ON dbo.[Security Junction Table]
WITH (STATE = ON);
The result of doing so causes the security junction
table to returns rows only for the logged in user. Before, it would have returned ALL rows by default.
For example, here I am logged in as russch and I have access to a bunch
of “Clients” whom I have permission to see:
…and here I am as a different user who only has access to
two clients:
Take
a look at my data source in Tableau:
As you can see, my demo workbook already joins the security
junction table against my fact table. In the past, I filtered the security
junction table with a combination of an inline
data source filter and a user filter.
By filtering the security junction table, I got fewer rows, and therefore
less rows were returned from the fact table. Simple enough.
Since SQL Server is now doing the heavy lifting for me in
terms of limiting data, I removed the data source filter and all expressions
which leveraged username() from my workbook.
I executed a dashboard using my original “Tableau Only” row
level security mechanism. The dashboard
returned in about 25 seconds, and as you can see, most of that time is spent in
SQL Server:
The longest query took about 20 seconds to return from the
database.
I then cleared SQL Server’s cache with DBCC DROPCLEANBUFFERS
and FREEPROCCACHE and tried again with my modified approach. Big difference:
This time, my dashboard rendered in about 15 seconds – a 40%
decrease in execution time The same 20 second query executed in about 10 seconds. Good stuff!
Now, before you get all excited, let’s be clear – I purposefully
did everything I could to prevent Tableau from using any sort of caching (on
the Tableau OR SQL Server side) to create the “maximum delta” possible. Your
mileage will definitely vary and could be much, much lower – especially if you
have well-turned indexes in the database, which I don’t.
Looking at the logs for both executions, it’s
pretty clear where the difference is. Here’s the long running query (one of four which were fired by Tableau):
Before:
query-compiled: SELECT [DimClient].[AgencyTypes] AS [AgencyTypes],
[DimClient].[Client_Network_Name] AS [Client_Network_Name],
[DimClient].[Client_Region] AS [Client_Region],
[DimAd].[DFA_Ad_Name] AS [DFA_Ad_Name],
SUM([FactDailyResponse].[Indirect Sales]) AS [TEMP(Calculation_4630621065048410)(2339326404)(0)],
SUM([FactDailyResponse].[Calculated Cost]) AS [TEMP(Calculation_4630621065048410)(3060909973)(0)],
SUM([FactDailyResponse].[Direct Sales]) AS [TEMP(Calculation_4630621065048410)(3355975298)(0)],
SUM(CAST(([FactDailyResponse].[Clicks]) as BIGINT)) AS [TEMP(Calculation_7370621065401553)(370505876)(0)],
SUM(CAST(([FactDailyResponse].[Impressions]) as BIGINT)) AS [TEMP(Calculation_7370621065401553)(908667754)(0)]
FROM [dbo].[FactDailyResponse] [FactDailyResponse]
INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
INNER JOIN [dbo].[DimAd] [DimAd] ON ([FactDailyResponse].[Ad_Key] = [DimAd].[Ad_Key])
INNER JOIN [dbo].[Security Junction Table] [Security Junction Table] ON ([FactDailyResponse].[Client_Key] = [Security Junction Table].[ClientID])
WHERE ((CASE WHEN (‘russch’ = [Security Junction Table].[UserName]) THEN 1 WHEN NOT (‘russch’ = [Security Junction Table].[UserName]) THEN 0 ELSE NULL END) <> 0)
GROUP BY [DimClient].[AgencyTypes],
[DimClient].[Client_Network_Name],
[DimClient].[Client_Region],
[DimAd].[DFA_Ad_Name]
After:
query-compiled: SELECT [DimClient].[AgencyTypes] AS [AgencyTypes],
[DimClient].[Client_Network_Name] AS [Client_Network_Name],
[DimClient].[Client_Region] AS [Client_Region],
[DimAd].[DFA_Ad_Name] AS [DFA_Ad_Name],
SUM([FactDailyResponse].[Indirect Sales]) AS [TEMP(Calculation_4630621065048410)(2339326404)(0)],
SUM([FactDailyResponse].[Calculated Cost]) AS [TEMP(Calculation_4630621065048410)(3060909973)(0)],
SUM([FactDailyResponse].[Direct Sales]) AS [TEMP(Calculation_4630621065048410)(3355975298)(0)],
SUM(CAST(([FactDailyResponse].[Clicks]) as BIGINT)) AS [TEMP(Calculation_7370621065401553)(370505876)(0)],
SUM(CAST(([FactDailyResponse].[Impressions]) as BIGINT)) AS [TEMP(Calculation_7370621065401553)(908667754)(0)]
FROM [dbo].[FactDailyResponse] [FactDailyResponse]
INNER JOIN [dbo].[DimClient] [DimClient] ON ([FactDailyResponse].[Client_Key] = [DimClient].[Client_Key])
INNER JOIN [dbo].[DimAd] [DimAd] ON ([FactDailyResponse].[Ad_Key] = [DimAd].[Ad_Key])
INNER JOIN [dbo].[Security Junction Table] [Security Junction Table] ON ([FactDailyResponse].[Client_Key] = [Security Junction Table].[ClientID])
GROUP BY [DimClient].[AgencyTypes],
[DimClient].[Client_Network_Name],
[DimClient].[Client_Region],
[DimAd].[DFA_Ad_Name]
That WHERE clause seems to be the culprit. Since I no longer
used username() in a data source filter for my “modified” example, no WHERE
clause. I could have taken a look at the query plan in SQL Server to see why that made a difference, but I’m lazy, so I didn’t.
Summary:
If you run SQL Server 2016, you may
want to seriously consider implementing your RLS in the data tier. Its a best
practice to begin with, and it’ll make your life easier (and faster) in
Tableau.