TabMon is like a Fitbit for your Tableau Server

(…and without the annoying rashes) TabMon is a fantastic tool for monitoring your Tableau Server. It was released about two months ago and has become the go-to mechanism for taking care of important but sometimes time-consuming and thankless activities:

Establishing baseline performance information for future Tableau troubleshooting efforts
Creating baseline hardware performance profiles and monitoring deltas
Recording data for monthly “Is it time to upgrade?” reviews
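
If you want a feel for what "baseline and deltas" means in practice, here is a minimal sketch. It assumes you have already pulled a series of readings for a single counter out of wherever TabMon is writing; the function name, the sample numbers, and the "first N readings are the baseline" rule are all made up for illustration and are not anything TabMon does for you.

```python
from statistics import mean


def baseline_and_deltas(samples, baseline_count):
    """Split samples into a baseline window and later readings.

    Returns (baseline, deltas), where each delta is the percent change
    of a later reading relative to the mean of the baseline window.
    """
    if baseline_count <= 0 or baseline_count > len(samples):
        raise ValueError("baseline_count must be between 1 and len(samples)")
    baseline = mean(samples[:baseline_count])
    if baseline == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    deltas = [
        (value - baseline) / baseline * 100.0
        for value in samples[baseline_count:]
    ]
    return baseline, deltas


# Hypothetical CPU readings (percent); the first four form the baseline.
cpu = [40.0, 42.0, 38.0, 40.0, 50.0, 60.0]
baseline, deltas = baseline_and_deltas(cpu, baseline_count=4)
print(baseline, deltas)
```

A plain mean is the simplest possible baseline. If your server has strong daily or weekly patterns, you would probably want to compare like with like (Monday morning against Monday morning) rather than against one global average.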

Let’s assume you’ve installed and configured TabMon and that it has been running for a little while on your server. It’s been collecting data points, just like it’s supposed to.

The documentation on the counters TabMon records is a little basic, so I’ve put together a table which sheds a bit more light on the information available to you “out of the box”. Most of this information is observation-based since there aren’t any official definitions. I more or less played with this stuff over the weekend and am recording what I saw for my friends on the internets. FYI, I didn’t bother filling in information for every single cell in the table – just the highlights.

Process	Category	Category Description	Counter	Counter Description
Vizportal	Get View	Vizportal gets information about a view from the repository. For example, hovering over a view to see a tooltip with Name, project, owner information will execute this. This has nothing to do with rendering a viz.	AverageRequestLatency	Length in ms to return metadata
			RequestsFailed	Number of attempts to retrieve metadata which failed
			RequestsProcessed	Total number of requests processed
	Guest Login	I can't get this to increment. Regardless of whether I execute a viz as guest directly in the browser or do so with TabJolt, this remains 0.	AverageRequestLatency	-
			RequestsFailed	-
			RequestsProcessed	-
	Publish Workbook	Records Desktop, TabCmd (and I assume REST) publish actions. Does not capture Web Authoring Save & Save As events.	AverageRequestLatency	Includes Desktop-to-Server upload time - so big files/extracts will push this number up
			RequestsFailed
			RequestsProcessed
	Run Extract Tasks	Records an individual "Run Now" action for selected Extract Refresh Tasks in the Tasks list.	AverageBatchFailure	Number of tasks in a larger batch which could not be started. This counter does not reflect whether individual refresh tasks succeeded or not.
			AverageBatchSize	Indicates the number of Extract Refresh tasks that were selected / checked in the Extract List when "Run Now" was clicked.
			AverageRequestLatency	How quickly was the task started and/or queued. Does not indicate the length of time the task in question took to execute.
			RequestsFailed	This counter does not indicate the RESULT of refreshing an extract - only that the extract refresh task itself was launched successfully or not.
			RequestsProcessed	Number of times "Run Extract Tasks" was executed. Each execution could launch multiple tasks.
	Run Extract Refreshes on Workbooks	Indicates an admin selected one or more workbooks from the Workbooks List and then used the "Refresh Extracts" action on the Actions menu. Does NOT reflect workbooks which were automatically refreshed via a schedule they are associated with.	AverageBatchFailure	Average number of workbook-related extract refresh tasks which could not be interactively executed.
			AverageBatchSize	Number of workbooks which were chosen at the same time in the Workbooks list and then executed.
			AverageRequestLatency	Amount of time it took to begin executing and/or queue the workbook-related extract refresh tasks
			RequestsFailed
			RequestsProcessed
	Run Extract Refreshes on Data Sources	Indicates an admin selected one or more data sources from the Data Sources list and then used the "Refresh Extracts" action on the Actions menu. Does NOT reflect data sources which were automatically refreshed via a schedule they are associated with.	AverageBatchFailure	Number of selected data source refresh tasks which could not be executed.
			AverageBatchSize	Number of data sources which were chosen at the same time in the Data Sources list and then executed.
			AverageRequestLatency
			RequestsFailed
			RequestsProcessed
	Run Schedules	Represents the number of schedules that were interactively executed from the Schedules Page. Does not reflect the number of schedules that executed automatically based on their scheduled execution time. Running a schedule from the "Schedules" page will not cause "Run Extract Refreshes on Data Sources / Workbooks" counters to increment EVEN IF the schedule in question executes those tasks. This is because they were not directly executed in an interactive fashion.	AverageBatchFailure	Average number of schedules which could not be interactively executed.
			AverageBatchSize	The average number of schedules which were selected at the same time on the Schedules page and then executed.
			AverageRequestLatency
			RequestsFailed
			RequestsProcessed
	Search Metrics	Not exactly sure what this does - I've noted that RequestsProcessed increments when I jump to a different site in VizPortal using the "Sites" drop-down, or when I log in and choose which site I want to log in to, but that's it.	AverageQueryLatency
			RequestsProcessed
	Sync AD Group	Doesn't appear to function. These counters didn't change when I manually initiated an AD Sync or used the new-ish 9.1 mechanism to schedule a sync	AverageRequestLatency	-
			RequestsFailed	-
			RequestsProcessed	-
	User Login	Records users logging in. As expected, does not record trusted-ticket based login and I assume the same thing holds true for REST-related activity.	AverageRequestLatency
			RequestsFailed	Number of failed logins due to bad password or username
			RequestsProcessed
Vizql	External Query Cache	Cache which is shared across all processes (vizqls, backgrounders, etc.)	ExternalAbstractQueryCacheHits	Hits against a "generalized" cache of data which has been retrieved/stored without a specific relationship to the database-engine-specific query which was fired previously
			ExternalAbstractQueryCacheMisses	Failure to find cached data in the abstract query cache (likely that the question hasn't been asked before)
			ExternalQueryCacheHits	Unknown. This counter doesn't exist when calling getPerformanceMetrics(). Perhaps an older counter name from < 9.1 which was never removed?
			ExternalQueryCacheMisses	Unknown. This counter doesn't exist when calling getPerformanceMetrics(). Perhaps an older counter name from < 9.1 which was never removed?
	InProcess Query Cache	A per-process cache (each vizqlserver has a distinct, unsynchronized copy) that lives "inside" each vizqlserver process. It is checked before the External Query Cache.	InProcessAbstractQueryCacheHits	Same as ExternalAbstractQueryCacheHits, but in process.
			InProcessAbstractQueryCacheMisses	Same as before
			InProcessQueryCacheHits
			InProcessQueryCacheMisses
	Visual Model Cache	A model represents a "ready to render" plan to execute a viz. It represents data, color, size, position and shape of marks to be drawn. It is the output of the interpreter pipeline	VisualModelCacheHits	Number of hits
			VisualModelCacheMisses	Number of misses
			VisualModelCachePartialHits	A model exists for what we want to render, but it was created for a different screen size. The cached model can be copied and then "resized" which will still save some time vs. just starting from scratch
	Image Cache	Image tiles can be (and are) generated from the Visual Model. These tiles are saved and can be re-served to save time.	ImageCacheHits	Hits. Yay!
			ImageCacheMisses	Misses. Boooo!
	Sessions	Information about VizQLServer sessions	ActiveSessions	The number of open sessions under the vizqlserver process in question. These sessions are active (alive) but not necessarily in flight (doing work)
DataServer	Overall Metrics	Information pertaining to the Data Server component of Tableau Server	ActiveSessions	Open session count against the data server component. These sessions may not all be active, however.
			RequestsCount	Number of requests handled by Data Server. A single action in Tableau Desktop or the Web Browser can result in many requests against Data Server.

All memorized? Good.

For the uninitiated, here are a few things that may seem counter-intuitive to you. I want to call them out specifically so that you don’t get confused.

Request Latency

Very few of the Request Latency counters measure something you’ll be especially interested in. With the exception of maybe Publish Workbook and User Login, these values tend to focus on how long it took to start a process (extract refresh, etc.) versus how long the process took to complete. Watching these counters could be useful for noticing that, on the whole, certain processes are taking longer to initiate than they used to…but then again I suspect that when and if these numbers go up, you’ll have larger problems to worry about anyway.

VizPortal Counters are about what happens in the Application Server Portal.

You won’t find anything here about backgrounder activity. Once you’ve kicked off a schedule or executed a task in the portal, there is no counter which will tell you how long it took that schedule / task to complete. For the moment, you’ll still need to rely on system tables or the built-in reports that come with Tableau Server.
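
To make the "rely on system tables" route a bit more concrete, here is a rough sketch of turning job start/end timestamps into run times. It assumes you have already fetched rows of (job name, started at, completed at) from the repository, for example from its background_jobs table if your version exposes it. The function name and the sample rows are invented for illustration.

```python
from datetime import datetime


def job_durations(rows):
    """Return {job_name: [seconds, ...]} for jobs that have both timestamps."""
    durations = {}
    for job_name, started_at, completed_at in rows:
        if started_at is None or completed_at is None:
            continue  # job is still queued or still running
        seconds = (completed_at - started_at).total_seconds()
        durations.setdefault(job_name, []).append(seconds)
    return durations


# Hypothetical rows: one finished refresh and one that is still running.
rows = [
    ("Refresh Extracts", datetime(2015, 11, 1, 2, 0, 0), datetime(2015, 11, 1, 2, 5, 30)),
    ("Refresh Extracts", datetime(2015, 11, 1, 3, 0, 0), None),
]
result = job_durations(rows)
print(result)
```

Jobs without a completion time are skipped rather than counted as zero, so a long-running refresh doesn't drag the numbers down while it is still in flight.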

Wow, lots of information about caching. I guess I should worry about this, huh?

No, not really. Knowing that you’re getting cache hits is interesting, but there isn’t a “right number” that you should strive for. There’s also not much you can do to drive up your cache hit ratio without specifically warming the cache. I’ll get into this in a follow-up blog post that examines the sample workbook which comes w/ TabMon.
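
If you do want to eyeball a hit ratio anyway, the arithmetic is just hits divided by total lookups. The sketch below assumes you have a hits value and a misses value covering the same time window (whether the raw counters are cumulative or per-interval is something to check in your own data). The function name and numbers are made up for illustration.

```python
def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None  # nothing was looked up, so a ratio is meaningless
    return hits / total


# Hypothetical counter values for one interval.
print(hit_ratio(75, 25))
print(hit_ratio(0, 0))
```

Returning None for an idle interval keeps "no traffic" from showing up as a 0% hit ratio, which would look like a problem when it isn't one.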

Active Sessions: That must mean active users doing work?

No. I wish we’d call this “Open Sessions”. Even though we may have hundreds of sessions open (because users don’t generally log off when they’re done), only a few may be truly active. To me, “active” should mean actually doing something which drives additional resource usage. The majority of active sessions will just be sitting there until we time them out, kill them, and clean up.

What’s next?

In the next entry, we’ll explore the sample workbook that comes with TabMon. There are a couple things you should know about it.

After that, we’ll jump into some beautiful, exceptional things that you can do with TabMon if you go “off road” a bit. This is the good stuff. Don’t miss it.