In the previous two posts you learned how to install and configure TabJolt to drive load against a Tableau Server.
Today, we’ll delve into how to configure TabJolt to capture JMX and Windows Performance Monitor data. This data can be very interesting in terms of identifying and understanding which component of Tableau Server or the machine itself is “falling down” when you reach your saturation point.
It’s pretty rare that you’ll get this stuff working perfectly right from the start. Expect some error messages and multiple rounds of tinkering.
In general, there are two buckets of “problems” you’ll run into:
- You didn’t modify one of the config files correctly
- You’re running into network, firewall, or security issues
The Windows Firewall
The Windows Firewall is typically responsible for most of the agony experienced by those in the second group. It tends to get in the way quite regularly. To collect all the JMX/Windows performance data you want, you can count on putting in some quality firewall time.
Live fast, Die Young, and just turn it off?
Seriously. If you can, just turn the damn thing off while you test and you’ll eliminate a whole category of problems. You’ll also save yourself a ton of time. If you go this route, turn off the Domain Profile. Not allowed to turn this sucker off? I feel for you, and we’ll touch on typical problems you should be prepared for later.
Capturing JMX Data
By default, TabJolt does not collect information from the vizqlserver, dataserver, and vizportal processes of your Tableau Server.
Per the TabJolt documentation, when you’re ready to capture data, remove the comment markers that disable several applicableCounterGroup XML elements towards the bottom of /config/dataretriever.config.
When you do uncomment these puppies, make sure you change the default host (localhost) to the name of your Tableau Server. If you forget, you’ll be trying to capture information from the TabJolt box vs. the Tableau Server. I’ve done this several times and marveled at my own stupidity afterwards.
Before:
<hosts>
  <host name="localhost">
    <applicableCounterGroups>
      <applicableCounterGroup>machineStatus</applicableCounterGroup>
      <applicableCounterGroup>tableauProcess</applicableCounterGroup>
      <!--enable the following section only after you jmx counter for tableau-->
      <!--
      <applicableCounterGroup>vizqlserver</applicableCounterGroup>
      <applicableCounterGroup>dataserver</applicableCounterGroup>
      <applicableCounterGroup>vizportal</applicableCounterGroup>
      -->
    </applicableCounterGroups>
  </host>
</hosts>
After:
<hosts>
  <host name="tableau">
    <applicableCounterGroups>
      <applicableCounterGroup>machineStatus</applicableCounterGroup>
      <applicableCounterGroup>tableauProcess</applicableCounterGroup>
      <!--enable the following section only after you jmx counter for tableau-->
      <applicableCounterGroup>vizqlserver</applicableCounterGroup>
      <applicableCounterGroup>dataserver</applicableCounterGroup>
      <applicableCounterGroup>vizportal</applicableCounterGroup>
    </applicableCounterGroups>
  </host>
</hosts>
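If you want a quick sanity check before launching a run, a small script can read the config and tell you which hosts and counter groups are actually enabled. This is only a sketch: it assumes the elements are named exactly as in the snippets above and that the file uses no XML namespace. The function names are mine, not part of TabJolt.

```python
import sys
import xml.etree.ElementTree as ET

# Counter groups that need JMX, as listed in the config snippet above.
JMX_GROUPS = {"vizqlserver", "dataserver", "vizportal"}

# Sample for illustration: JMX groups enabled, but host still set to localhost.
SAMPLE = """<hosts>
  <host name="localhost">
    <applicableCounterGroups>
      <applicableCounterGroup>machineStatus</applicableCounterGroup>
      <applicableCounterGroup>tableauProcess</applicableCounterGroup>
      <applicableCounterGroup>vizqlserver</applicableCounterGroup>
      <applicableCounterGroup>dataserver</applicableCounterGroup>
      <applicableCounterGroup>vizportal</applicableCounterGroup>
    </applicableCounterGroups>
  </host>
</hosts>"""


def hosts_and_groups(config_xml):
    """Map each <host> name to its enabled counter groups.

    The default ElementTree parser drops XML comments, so groups that are
    still commented out do not show up here.
    """
    root = ET.fromstring(config_xml)
    result = {}
    for host in root.iter("host"):
        groups = [g.text.strip()
                  for g in host.iter("applicableCounterGroup") if g.text]
        result[host.get("name")] = groups
    return result


def localhost_warnings(config_xml):
    """Warn when JMX groups are enabled on a host still named localhost."""
    warnings = []
    for name, groups in hosts_and_groups(config_xml).items():
        enabled = sorted(JMX_GROUPS & set(groups))
        if name == "localhost" and enabled:
            warnings.append(
                "host 'localhost' has JMX groups enabled: " + ", ".join(enabled))
    return warnings


if __name__ == "__main__":
    # Pass the path to dataretriever.config, or run without arguments
    # to check the built-in sample.
    text = open(sys.argv[1], encoding="utf-8").read() if len(sys.argv) > 1 else SAMPLE
    for warning in localhost_warnings(text):
        print("WARNING:", warning)
```

An empty result from localhost_warnings only means the host name was changed; it says nothing about whether the name you typed actually resolves to your Tableau Server.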
Data Retriever errors, part 1
Let’s say you execute TabJolt and see this:
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver, service URL: /jndi/rmi://localhost:9400/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver, service URL: /jndi/rmi://localhost:9400/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver, service URL: /jndi/rmi://localhost:9400/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver#1, service URL: /jndi/rmi://localhost:9401/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver#1, service URL: /jndi/rmi://localhost:9401/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver#1, service URL: /jndi/rmi://localhost:9401/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: vizqlserver#1, service URL: /jndi/rmi://localhost:9401/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: dataserver, service URL: /jndi/rmi://localhost:10000/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: dataserver, service URL: /jndi/rmi://localhost:10000/jmxrmi
Failed to connect to the JMX connector due to the following error: ConnectException: Connection refused: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection refused: connect
The Health Service failed to open JMX connection with component: dataserver, service URL: /jndi/rmi://localhost:10000/jmxrmi
Whatever did you do wrong? You didn’t change the host name like I told you, silly! Note how the JMX connection messages reference localhost. Fix it, bub.
You’ll also get the same errors (except referring to the remote Tableau Server instead of localhost) if you forgot to enable JMX on your Tableau Server. Run tabadmin set service.jmx_enabled true, then tabadmin configure, and start your server.
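To tell these cases apart without waiting for TabJolt to complain, you can probe the JMX ports yourself from the TabJolt machine. The sketch below is plain Python socket code, not anything TabJolt ships. The ports are the ones that appear in the error messages above, and the host name is whatever you pass on the command line. Roughly speaking, "refused" means nothing is listening on that port (wrong host, or JMX not enabled), while "timed out" usually means something in between is silently dropping the traffic.

```python
import socket
import sys

# JMX ports taken from the error messages above (vizqlserver, vizqlserver#1,
# dataserver). Adjust to match your own server.
PORTS = [9400, 9401, 10000]


def probe(host, port, timeout=3.0):
    """Try a TCP connection and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # DNS failures, unreachable networks and so on end up here.
        return "error: {}".format(exc)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python probe_jmx.py <tableau-server-host>")
    else:
        for port in PORTS:
            print(sys.argv[1], port, probe(sys.argv[1], port))
```

An "open" result only proves the TCP port is reachable. It does not prove that a full JMX session will succeed.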
Data Retriever errors, part 2
How about this lovely problem?
Unhandled Exception: System.InvalidOperationException: The Counter layout for the Category specified is invalid, a counter of the type: AverageCount64, AverageTimer32, CounterMultiTimer, CounterMultiTimerInverse, CounterMultiTimer100Ns, CounterMultiTimer100NsInverse, RawFraction, or SampleFraction has to be immediately followed by any of the base counter types: AverageBase, CounterMultiBase, RawBase or SampleBase.
at System.Diagnostics.CategorySample.GetCounterDefinitionSample(String counter)
at System.Diagnostics.PerformanceCounter.get_CounterType()
at Com.Tableausoftware.DataRetriever.Core.PerformanceCounterHelper.IsNotCounterBase(PerformanceCounter p)
at Com.Tableausoftware.DataRetriever.Core.PerformanceCounterHelper.GetCounterNames(String category, String instance, String host, String regexPattern)
at Com.Tableausoftware.DataRetriever.Configurations.GetInstanceParallel.ThreadPoolCallback(Object threadContext)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
TabJolt can’t parse your dataretriever.config file.
As you modify dataretriever.config, you need to make sure each line in the file ends with a linefeed. If you somehow manage to edit one out of existence, TabJolt will throw the exception above. I like to use Notepad++’s “View | Show Symbol | Show All Characters” functionality to double-check myself. Here’s a properly “formatted” file in terms of linefeeds:
When in doubt, read the exception messages carefully – they’ll often point you to the line in the file where your problem lies. For example, I removed a closing quote around “5” on line 27, and here’s what TabJolt told me:
Unhandled Exception: System.InvalidOperationException: There is an error in XML document (28, 13). ---> System.Xml.XmlException: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 28, position 13.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseAttributeValueSlow(Int32 curPos, Char quoteChar, NodeData attr)
at System.Xml.XmlTextReaderImpl.ParseAttributes()
at System.Xml.XmlTextReaderImpl.ParseElement()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Skip()
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderDataRetrieverConfig.Read8_DataRetrieverConfig(Boolean isNullable, Boolean checkType)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderDataRetrieverConfig.Read9_dataRetrieverConfig()
--- End of inner exception stack trace ---
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at Com.Tableausoftware.DataRetriever.Configurations.DataRetrieverConfig.LoadConfig(String configPath)
at Com.Tableausoftware.DataRetriever.Console.Program.Main(String[] args)
Unable to initialize configuration object. Exiting. Exception: javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 28; columnNumber: 14; The value of attribute "threadCount" associated with an element type "scheduler" must not contain the '<' character.]
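You can catch this class of mistake before TabJolt does by running the file through any XML parser. Here is a minimal sketch using Python's standard library; the function name is mine. As in the example above, where the missing quote on line 27 was reported at line 28, the position a parser reports is where it gave up, which can be a line or so after the actual typo.

```python
import sys
import xml.etree.ElementTree as ET


def check_xml(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        line, column = exc.position  # where the parser stopped
        return line, column, str(exc)
    return None


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python check_config.py <path-to-dataretriever.config>")
    else:
        with open(sys.argv[1], encoding="utf-8") as handle:
            problem = check_xml(handle.read())
        if problem is None:
            print("well-formed")
        else:
            print("line {}, column {}: {}".format(*problem))
```

This only checks that the XML is well-formed. It knows nothing about which elements and attributes TabJolt expects, so a clean result does not guarantee the config is valid.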
Data Retriever errors, the final cut.
Assuming your config file is in order, your <host> element is pointing at your Tableau Server, and JMX is enabled on your Tableau Server, there are still a couple things you might see:
For example, you may see connection errors that mention only the second instance (#1) of vizqlserver or dataserver. That is, either or both of the following:
The Health Service failed to open JMX connection with component: vizqlserver#1, service URL: /jndi/rmi://localhost:9401/jmxrmi
The Health Service failed to open JMX connection with component: dataserver#1, service URL: /jndi/rmi://localhost:10001/jmxrmi
Note that no errors are returned about “vizqlserver” or “dataserver” here – only “vizqlserver#1” or “dataserver#1”. Are you actually RUNNING two instances of these services? Maybe not, and you just forgot. The default dataretriever.config file assumes you are running two, however. Comment out the <component> entries for any instances you’re not running.
<components>
  <component name="vizqlserver" serviceURL="service:jmx:rmi:///jndi/rmi://%s:9400/jmxrmi" />
  <component name="vizqlserver#1" serviceURL="service:jmx:rmi:///jndi/rmi://%s:9401/jmxrmi" /> <!-- Comment me out! -->
  <component name="dataserver" serviceURL="service:jmx:rmi:///jndi/rmi://%s:10000/jmxrmi" />
  <component name="dataserver#1" serviceURL="service:jmx:rmi:///jndi/rmi://%s:10001/jmxrmi" /> <!-- Or maybe me! -->
  <component name="wgserver" serviceURL="service:jmx:rmi:///jndi/rmi://%s:8300/jmxrmi" />
  <component name="searchservice" serviceURL="service:jmx:rmi:///jndi/rmi://%s:9004/jmxrmi" />
  <component name="vizportal" serviceURL="service:jmx:rmi:///jndi/rmi://%s:8900/jmxrmi" />
</components>
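To see at a glance which components (and ports) TabJolt will try to reach after your edits, you can list the entries that are still active. This is a rough sketch that assumes the serviceURL format shown above. The sample has vizqlserver#1 commented out for illustration, and the helper name is made up.

```python
import re
import xml.etree.ElementTree as ET

# Matches the port in URLs shaped like ...//%s:9400/jmxrmi
PORT_RE = re.compile(r":(\d+)/jmxrmi$")

# Abbreviated sample based on the <components> block above,
# with the second vizqlserver instance commented out.
SAMPLE = """<components>
  <component name="vizqlserver" serviceURL="service:jmx:rmi:///jndi/rmi://%s:9400/jmxrmi" />
  <!-- <component name="vizqlserver#1" serviceURL="service:jmx:rmi:///jndi/rmi://%s:9401/jmxrmi" /> -->
  <component name="dataserver" serviceURL="service:jmx:rmi:///jndi/rmi://%s:10000/jmxrmi" />
  <component name="vizportal" serviceURL="service:jmx:rmi:///jndi/rmi://%s:8900/jmxrmi" />
</components>"""


def active_component_ports(components_xml):
    """Map each active (not commented-out) component name to its JMX port."""
    root = ET.fromstring(components_xml)
    ports = {}
    for component in root.iter("component"):
        match = PORT_RE.search(component.get("serviceURL", ""))
        if match:
            ports[component.get("name")] = int(match.group(1))
    return ports


if __name__ == "__main__":
    for name, port in active_component_ports(SAMPLE).items():
        print(name, port)
```

If a component you are not running still shows up in this list, expect TabJolt to keep reporting connection errors for it.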
Windows Firewall Fun
It’s not uncommon to see some data collection components working, and others throwing errors every 15 seconds or so. For example, you might see these three lines shortly after you start a test, and again every 15 seconds…
Failed to connect to the JMX connector due to the following error: ConnectException: Connection timed out: connect
Failed to get the object from the pool due to the following error: ConnectException: Connection timed out: connect
The Health Service failed to open JMX connection with component: vizqlserver, service URL: /jndi/rmi://tableau:9400/jmxrmi
It’s the return of the firewall! Take a look at the list of <components> we were just talking about – it’s pretty clear that we’re dealing with vizqlserver. However, you may have already opened up port 9400 – what gives? Well, my friends, some of our components use more than one JMX port…and the TabJolt dataretriever.config file doesn’t tell you this. You’ll need to do some sleuthing. Specifically, look in tasks.yml which you’ll find in the /data/tabsvc/config folder of Tableau Server. Standard disclaimers apply – don’t edit this sucker by hand – if you make an error, you’re in for a world of hurt.
Anyway, search for Dcom.sun.management.jmxremote.rmi.port inside tasks.yml. You’ll land on the option which is fed to the JVM to set the port the process in question will listen on. For example, if I want to see how the first vizqlserver (vizqlserver0) on my server is launched, I take a gander at lines 79-84:
- name: Tableau Server Vizqlserver 0
directory: D:/Program Files/Tableau/Tableau Server/9.2/repository/jre/bin
command: "\"D:/Program Files/Tableau/Tableau Server/9.2/bin/vizqlserver.exe\" -c tabsvc -XX:+UseConcMarkSweepGC -Xmx512m -XX:NewRatio=2 -XX:SurvivorRatio=6 -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15 -Dcom.sun.management.jmxremote.port=9400 -Dcom.sun.management.jmxremote.rmi.port=9402 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:ErrorFile=\"D:/Program Files/Tableau/Tableau Server/data/tabsvc/logs/vizqlserver/hs_err-0_pid%%p.log\" -Dlicensing.logFileName=vizqlserver -Djava.util.logging.config.file=\"D:/Program Files/Tableau/Tableau Server/data/tabsvc/vizqlserver/0/conf/logging.properties\" -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.class.path=\"D:/Program Files/Tableau/Tableau Server/9.2/tomcat/bin/bootstrap.jar;D:/Program Files/Tableau/Tableau Server/9.2/tomcat/bin/tomcat-juli.jar\" -Dcatalina.base=\"D:/Program Files/Tableau/Tableau Server/data/tabsvc/vizqlserver/0\" -Dcatalina.home=\"D:/Program Files/Tableau/Tableau Server/9.2/tomcat\" -Djava.io.tmpdir=\"D:/Program Files/Tableau/Tableau Server/data/tabsvc/temp\" -Dconfig.properties=\"file:D:/Program Files/Tableau/Tableau Server/data/tabsvc/config/vizql.properties\" -Dconnections.properties=\"file:D:/Program Files/Tableau/Tableau Server/data/tabsvc/config/connections.properties\" -Dlog4j.configuration=\"file:D:/Program Files/Tableau/Tableau Server/data/tabsvc/vizqlserver/0/conf/log4j.xml\" -Duser.timezone=UTC -Dprocid=0 org.apache.catalina.startup.Bootstrap start"
shutdown.mode: ctrl-c
spawn.log.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/logs/vizqlserver
reconfigure: restart
See what’s going on? There are TWO ports being used for JMX for vizqlserver0. Both of these must be open. You may very well need to repeat this process of looking up ports for each process which complains, opening them up in the firewall, and then trying again. Here’s what I ended up with on my server to allow all the JMX traffic through without turning off the firewall itself:
- 8900-8901 for my single App Server (VizPortal)
- 9400-9403 for two VizQLS
- 10000-10001 for my single Data Server
Interestingly enough, I didn’t have to do this for wgserver (API Server) and the search service. Maybe we don’t actively monitor those even though they are listed in <components>? I dunno.
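Rather than eyeballing tasks.yml process by process, you can pull out every JMX port in one pass. The sketch below treats the file as plain text and only reads it (no YAML library, no edits), and assumes each task starts with a "- name:" line followed by a command line containing the jmxremote options, as in the excerpt above. The sample command is heavily abbreviated.

```python
import re
import sys

NAME_RE = re.compile(r"^\s*-\s*name:\s*(.+?)\s*$")
# Matches both -Dcom.sun.management.jmxremote.port=N and ...jmxremote.rmi.port=N
PORT_RE = re.compile(r"-Dcom\.sun\.management\.jmxremote\.(rmi\.)?port=(\d+)")

# Abbreviated version of the tasks.yml entry shown above.
SAMPLE = """- name: Tableau Server Vizqlserver 0
  directory: D:/Program Files/Tableau/Tableau Server/9.2/repository/jre/bin
  command: "vizqlserver.exe -Dcom.sun.management.jmxremote.port=9400 -Dcom.sun.management.jmxremote.rmi.port=9402 start"
  shutdown.mode: ctrl-c
"""


def jmx_ports(tasks_text):
    """Map each task name to the sorted JMX ports found in its lines."""
    ports = {}
    current = None
    for line in tasks_text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        found = {int(port) for _, port in PORT_RE.findall(line)}
        if found and current is not None:
            merged = set(ports.get(current, [])) | found
            ports[current] = sorted(merged)
    return ports


if __name__ == "__main__":
    # Pass the path to tasks.yml, or run without arguments to use the sample.
    text = open(sys.argv[1], encoding="utf-8").read() if len(sys.argv) > 1 else SAMPLE
    for task, found in jmx_ports(text).items():
        print(task, found)
```

The union of the ports printed here is a reasonable starting list for your inbound firewall rules.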
Now admit it – wouldn’t it have just been easier to turn off the blasted firewall? And we’re not even done yet 🙂
Monitoring multiple machines with TabJolt
Want to monitor several machines at the same time, you say? No sweat. In dataretriever.config, add a distinct <host> element for each machine you want to monitor. Here I am monitoring 3 of my machines at the same time:
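If you have more than a couple of machines, you can generate the <host> elements instead of copy-pasting them. The sketch below reuses the element names from the <hosts> snippet earlier in this post. The machine names are made up, and whether every machine should list the JMX counter groups depends on which processes it actually runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical machine names; replace with your own.
MACHINES = ["tableau-primary", "tableau-worker1", "tableau-worker2"]

# Counter groups as they appear in the <hosts> snippet earlier in the post.
GROUPS = ["machineStatus", "tableauProcess", "vizqlserver", "dataserver", "vizportal"]


def build_hosts(machines, groups):
    """Return a <hosts> block with one <host> element per machine."""
    hosts = ET.Element("hosts")
    for name in machines:
        host = ET.SubElement(hosts, "host", {"name": name})
        container = ET.SubElement(host, "applicableCounterGroups")
        for group in groups:
            ET.SubElement(container, "applicableCounterGroup").text = group
    ET.indent(hosts)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(hosts, encoding="unicode")


if __name__ == "__main__":
    print(build_hosts(MACHINES, GROUPS))
```

Paste the output over the existing <hosts> element in dataretriever.config, then trim the counter groups per machine as needed.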
Capturing PerfMon Data
Now that you can grab lots of JMX goodness, let’s not forget about OS-level performance information. There’s a bunch of good stuff there to be had…but also some potential errors to overcome.
The first thing to keep in mind is documented at the top of dataretriever.config:
Impersonation is only needs when you need to collect windows perf counters from other domain which your current runas user doesn't have permission to access
<!--impersonation is only needs when you need to collect windows perf counters from other domain which your current runas user doesn't have permission to access-->
<!--
<settings>
  <impersonation userName="user" password="password" domain="domainName"/>
</settings>
-->
I’ve never had to use this, but if your Tableau Server RunAs user doesn’t have permissions to grab perfmon data from the machine, choose a user who does and enter it here.
The main problems that I’ve had collecting PerfMon data have nothing to do with TabJolt, actually. TabJolt is simply a victim of the Windows Firewall (imagine that) or services which must be started on Windows Server in order to collect counters remotely. Typically, I’ll see errors like this:
Error Unable to get instances for category Processor of host tableau. Skip collecting perf counters for the category. Exception Message: The network path was not found
Error Unable to get instances for category Process of host tableau. Skip collecting perf counters for the category. Exception Message: The network path was not found
Error Unable to get instances for category Network Interface of host tableau. Skip collecting perf counters for the category. Exception Message: The network path was not found
Error Unable to get instances for category Memory of host tableau. Skip collecting perf counters for the category. Exception Message: The network path was not found
Error Unable to get instances for category LogicalDisk of host tableau. Skip collecting perf counters for the category. Exception Message: The network path was not found
How do you deal with these?
Forget TabJolt for the moment – Google for articles which will help you set up remote performance monitoring correctly. Here are my faves, along with a summary of what I found useful from each:
Making sure the correct perfmon-related Windows services are running on Tableau Server
In your Services snap-in, make sure the following services are enabled and have at least the specific startup type mentioned below:
Service | Startup Type |
---|---|
Remote Procedure Call (RPC) | Automatic |
Remote Registry | Automatic |
WMI Performance Adapter | Manual |
Performance Counter DLL Host | Manual |
Performance Logs and Alerts | Manual |
Remote Procedure Call (RPC) Locator | Manual |
Dealing with the ever-loving firewall
- Lists several inbound rules to play with. I found that the File and Printer Sharing (SMB-In) rule MUST be on in order to avoid the errors above
- …which tells you that Windows Server must be configured to allow some form of File sharing to begin with. I noted that the SMB 1.0/CIFS File Sharing Support feature is installed on my Windows Server
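Once the services and firewall rule are sorted, one way to confirm remote counters work without involving TabJolt at all is Windows' own typeperf command, run from the TabJolt machine. The sketch below just builds and (on Windows) runs that command. The host name "tableau" and the Processor counter come from the error messages above; the helper names are mine.

```python
import subprocess
import sys


def typeperf_command(host, counter, samples=1):
    """Build a typeperf call for one remote counter.

    The counter path has the form \\\\host\\Object(Instance)\\Counter.
    """
    path = "\\\\{}\\{}".format(host, counter)
    return ["typeperf", path, "-sc", str(samples)]


def run_typeperf(host, counter):
    """Run typeperf and return (exit code, combined output). Windows only."""
    done = subprocess.run(typeperf_command(host, counter),
                          capture_output=True, text=True)
    return done.returncode, done.stdout + done.stderr


if __name__ == "__main__":
    counter = "Processor(_Total)\\% Processor Time"
    if sys.platform == "win32" and len(sys.argv) > 1:
        code, output = run_typeperf(sys.argv[1], counter)
        print(code)
        print(output)
    else:
        # Elsewhere, just show the command you would run.
        print(" ".join(typeperf_command("tableau", counter)))
```

If typeperf cannot read the counter either, the problem is in Windows (services, firewall, or permissions) rather than in your TabJolt config.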
Monitoring additional PerfMon counters
Keep in mind that you can also add counters beyond what TabJolt monitors automatically. For example, I think watching read/write latency on the disk is a better way to gauge I/O impact on Tableau Server, so I add Avg. Disk sec/Transfer as a <counter> inside the <counterGroup> element towards the top of the dataretriever.config file:
That’s it! You managed to read the whole never-ending blog post. Go take a break.
Next up: Reporting!
Hi,
How are you? Have you experience this issue as well? I did not see it in your post. Searching now for other people who may have experienced it. Thanks.
Windows Firewall is off. Correct Tableau Server has been defined (had to go to a series of trial and error.) JMX has been enabled on the TS. So now, the issue is:
Failed to connect to the JMX connector due to the following error: NoSuchObjectException: no such object in table
Not sure what is this? I’m running Tabjolt for 9.2 as it says in can run to any TS 9.0 and higher?
Thanks,
David