How to Monitor Your Tableau Server – Part 2 – Tableau Server Application Monitoring


Hello there,

Following on from Part 1 of this series, here’s Part 2: how to monitor the Tableau Server application itself.

Now I don’t know Tableau Server in as much detail as some of the Jedi-level experts out there, so I’m totally open to different ways of doing things. My recommendations here are based as much on general IT service monitoring best practice as they are on Tableau specifics. If I’ve missed something then do point it out. I’m hoping the community can help me expand this article.

On that subject – I’m delighted to have been able to collaborate with Craig Bloodworth (@craigbloodworth), Mark Jackson (@ugamarkj) & Chris Schultz (@nalyticsatwork) on this. Thanks for your invaluable contributions guys.

Are we ok? That’s the ubiquitous question on an IT service manager’s mind. And it can be a real worry. But the fact is that there are a lot of tools and methods you can employ to cut down that worry and stress or even eliminate it.

 

Service Availability

Simply put, is your Tableau Server up or down? Tableau offer a “Server Status” view, but in my opinion that’s pretty useless for alerting, as you’re never going to be staring at it for the whole day. I’m also not sure how quickly it updates or responds to system activity. It never seems to change when I’m looking at it.

[Image: Tableau’s default server ‘Monitor’]

So it’s clear you’ll need something else to give you that early warning of any issues.

[Image: Tableau Server Monitor in XML]

Btw you can also get this status as XML output. Could be handy.

 

Process Monitoring (Enterprise Process)

[Image: Main Tableau Server processes]

These are the key processes (running programs) that are required for Tableau Server to function. If one of these has crashed then you’ll likely have a problem.

In Part 1 I talked about the enterprise monitoring tools used to monitor your Tableau infrastructure. You should be able to use the same tools to set up application monitoring. That’s monitoring of your own application, with rules that you define (and ideally configure yourself), producing alerts that come to you or your own support team via the enterprise process.

You should set up monitoring rules to alert on zero instances of each of these processes. The alerts need to be classed as a “Critical” severity so that they hit the alert list of the Level 1 team (non-critical alerts may not be visible). Make sure the monitoring rules apply 24 x 7.

Important – Make sure that the Level 1 & 2 teams that will get these alerts know exactly what to do with them. These teams will probably have a document or Runbook that you’ll need to fill out which will give them instructions as to what the alert means and who they should call. This needs to be crystal clear as they’ll usually follow it to the letter.

Process Monitoring (Paranoid Android Process)

[Image: Marvin. “I knew that alert would get lost. Don’t say I didn’t warn you.”]

Even if you set up the above monitoring using the Enterprise Process, you may still have issues. That process can break, meaning that your alert may take 30 mins to get to you (or a lot longer!).

Therefore I always encourage being as paranoid as possible when it comes to monitoring.

Luckily there are a number of things you can do to add an extra level to your monitoring.

 

Use a Simple Script

[Image: Monitor Tableau Server without the GUI]

Mike Roberts of Interworks has written a simple guide to scripting up a basic process check based on the default Tableau Server monitor xml output mentioned above. You can run that script using Windows Task Scheduler and get an email if any of the processes are detected to be down.
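To give a feel for what such a check involves, here is a minimal Python sketch (not Mike’s script, and not the PowerShell I use). It assumes the status XML lists each process as a child element of a machine element with a status attribute, and that “Active” means healthy. The sample XML, the port numbers and the “Down” value are illustrative assumptions, so check them against the real output from your own server before relying on any of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the status XML. The element names, attributes and
# status values are assumptions for illustration; compare with real output.
SAMPLE_STATUS_XML = """
<systeminfo>
  <machines>
    <machine name="PROD">
      <repository worker="PROD:8060" status="Active"/>
      <vizqlserver worker="PROD:9100" status="Active"/>
      <backgrounder worker="PROD:8250" status="Down"/>
    </machine>
  </machines>
  <service status="Active"/>
</systeminfo>
"""


def find_unhealthy(status_xml, healthy=("Active",)):
    """Return (machine, process, status) for each process not in a healthy state."""
    root = ET.fromstring(status_xml)
    problems = []
    for machine in root.iter("machine"):
        for proc in machine:  # each child element is assumed to be one process
            status = proc.get("status", "Unknown")
            if status not in healthy:
                problems.append((machine.get("name"), proc.tag, status))
    return problems


# Print one alert line per unhealthy process found in the sample.
for machine, process, status in find_unhealthy(SAMPLE_STATUS_XML):
    print(f"ALERT: {process} on {machine} is {status}")
```

Fetching the XML from the server and sending the email are left out on purpose, since both depend on your own setup. The `healthy` tuple is a parameter so that you can add other acceptable states if your server reports them.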

 

 

I don’t use that one, but I do have a very basic PowerShell script that I run using Task Scheduler every 5 mins. Does the same thing. It’s based on the following code.

powershell.exe -command "& {if (! (get-process -name postgres -erroraction SilentlyContinue)){Send-MailMessage -SmtpServer '' -from  -To  -Subject 'postgres.exe not running on PROD '}}"

All that does is execute in the background, and if the process name (in this example postgres) is not detected by the get-process command it sends an email to my team. Not foolproof, but combined with the enterprise process it gives me a better level of protection.
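If you would rather check several processes in one go, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the list of required names is a placeholder (postgres and backgrounder are the two named in this post, so add the others from your own server’s process list), and the process listing comes from the standard Windows tasklist command.

```python
import subprocess

# Illustrative list only: postgres and backgrounder are the two processes
# named in this post. Extend it with the names from your own server.
REQUIRED = ["postgres", "backgrounder"]


def _base_name(image_name):
    """Lower-case an image name and drop a trailing .exe, if present."""
    name = image_name.lower()
    return name[:-4] if name.endswith(".exe") else name


def missing_processes(running_names, required=REQUIRED):
    """Return the required process names that have zero running instances."""
    running = {_base_name(name) for name in running_names}
    return [name for name in required if name.lower() not in running]


def running_process_names():
    """List image names using the Windows `tasklist` command (CSV, no header)."""
    out = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    ).stdout
    # The first CSV field of each row is the quoted image name.
    return [line.split('","')[0].strip('"') for line in out.splitlines() if line]


# On the server itself you would call:
#   missing = missing_processes(running_process_names())
# and send an email if the list is not empty.
```

Keeping the comparison in `missing_processes` separate from the `tasklist` call means you can test the logic on any machine by passing in a plain list of names.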

Query the processes via URL

[Image: Querying processes via URL]

This is a new one on me. Apparently it is possible to query each process over HTTP and get a message back to indicate whether the process is OK. That opens up a lot of options for scripting remote checks or monitoring the URLs via third-party applications. It all adds to the arsenal of monitoring available to the service owner. Many thanks to Craig Bloodworth (@craigbloodworth) of The Information Lab for this tip. You can find more details in this blog post.
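As a rough idea of how a remote check like this might be scripted, here is a small Python sketch. The actual URLs and the exact message that comes back are described in Craig’s post and are not reproduced here, so this sketch simply assumes that an HTTP 200 response means the process is OK. You may well need to inspect the response body instead.

```python
from urllib.error import URLError
from urllib.request import urlopen


def http_ok(url, timeout=5):
    """True if the URL answers with HTTP 200 within the timeout.

    Treating a 200 as "healthy" is an assumption made for this sketch.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Covers refused connections, timeouts and HTTP error responses.
        return False


def failing_checks(urls, probe=http_ok):
    """Return the names of checks whose URL did not respond as healthy.

    `urls` maps a label of your choosing to the URL to query.
    """
    return [name for name, url in urls.items() if not probe(url)]
```

You would pass `failing_checks` a dictionary of labels and URLs for your own server and alert if the returned list is not empty. The `probe` argument lets you swap in a different health test without touching the loop.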

The Windows Event Log

By default Windows will log any messages or errors to the Windows Event Log. This can include system and application alerts and is a great source of data regarding system health.

[Image: Windows Event Log]

Fire up the Event Viewer (usually somewhere in the Administrative Tools menu). You should see a number of categories of event on the left, from system stuff to specific application messages. Some will be informational, others downright confusing, but there will be some gold dust in there that you need to be mindful of.

For example – the screenshot above shows that Tableau Server has been restarting the backgrounder process due to a crash. That’s not critical to know about immediately, but I’d sure be interested to understand if it is happening regularly.

There are ways you can export this data automatically and then create a Tableau datasource – we haven’t done that yet but are planning to.

Windows Performance Monitor

[Image: Windows Performance Monitor data collector]

You can also make use of the inbuilt Windows performance monitor to collect and export data regarding the performance of the Tableau processes on your server. We set up a collector and constructed a basic Server Health Dashboard.
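If you want a quick look at a collector’s output before building a dashboard, a few lines of Python will summarise it. This sketch assumes the collector writes CSV with a timestamp in the first column and one column per counter, and that the first sample of a counter can be blank. The host name, counters and values in the sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the shape of a Performance Monitor CSV export.
SAMPLE_PERFMON_CSV = r'''"(PDH-CSV 4.0) (GMT Standard Time)(0)","\\PROD\Process(postgres)\% Processor Time","\\PROD\Process(backgrounder)\% Processor Time"
"02/20/2014 09:00:00.000"," ","10.0"
"02/20/2014 09:00:15.000","4.0","20.0"
"02/20/2014 09:00:30.000","6.0","30.0"'''


def average_counters(csv_text):
    """Return {counter name: mean value}, skipping blank or non-numeric samples."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        # Column 0 is the timestamp; the remaining columns are counters.
        for name, value in zip(header[1:], row[1:]):
            try:
                number = float(value)
            except ValueError:
                continue  # blank sample, nothing to add
            totals[name] += number
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in counts}


for counter, mean in average_counters(SAMPLE_PERFMON_CSV).items():
    print(f"{counter}: {mean:.1f}")
```

For anything beyond a sanity check you are better off pointing Tableau at the exported file, which is what we did for the dashboard.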

 

 

 

[Image: Server Health Dashboard based on Windows perfmon stats]

It’s a good idea to subscribe to these dashboards to get them dropped in your inbox at the start and end of your production day.

The details for setting this up are on this Tableau KB article.

 

 

Tableau Log File Monitoring

To me the Tableau logs seem like a real mystery. There’s clearly a ton of information in there, but even the Tableau support folk don’t seem to know what’s important and what’s junk. There are also a lot of messages that seem like red herrings and some that are just plain confusing.

It’s a shame that there isn’t more clarity on which strings and messages we should pay attention to. At the moment I’m just guessing.

In terms of alerting, the enterprise monitoring tool you use will have log scraping functionality, just as it does for process monitoring. This will involve you telling the tool which text to alert on. Fairly simple. You can also write your own script in much the same way as the PowerShell process monitoring script mentioned earlier in this post.

I get really annoyed with the state of the Tableau Server logs. They’re a total mess. There are multiple locations and little consistency. I’ve not had time to analyse them properly, but it seems that some entries contain one of DEBUG, INFO, ERROR or FATAL, which would indicate whether you should trigger an alert on the occurrence. It doesn’t seem consistent though.

Ideally I’d like every log entry from every component to start with a timestamp, followed by one of these severity indicators. That would make it so easy.
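In the meantime, a home-grown scraper can at least pick out the entries that do carry a severity word. The Python sketch below counts lines by severity and collects the ERROR and FATAL ones for alerting. The sample log lines in the usage note are invented, and since the real logs are inconsistent, treat the pattern as a starting point rather than a reliable filter.

```python
import re
from collections import Counter

# The four severity words mentioned above, matched as whole words.
SEVERITY = re.compile(r"\b(DEBUG|INFO|ERROR|FATAL)\b")
ALERT_ON = {"ERROR", "FATAL"}


def scan_lines(lines):
    """Count lines per severity and collect the lines worth alerting on.

    Only the first severity word on a line is considered, and lines
    without any severity word are ignored.
    """
    counts = Counter()
    alerts = []
    for line in lines:
        match = SEVERITY.search(line)
        if not match:
            continue
        level = match.group(1)
        counts[level] += 1
        if level in ALERT_ON:
            alerts.append(line.rstrip("\n"))
    return counts, alerts
```

For example, feeding it the invented lines "2014-02-20 09:00:01 INFO started" and "2014-02-20 09:00:02 ERROR connection refused" gives one INFO, one ERROR, and the second line in the alert list. In practice you would pass it an open log file and email the alert list if it is not empty.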

 

Log analysis using Splunk

If you’ve not seen Splunk then you should take a look. It’s a great tool for aggregating and analysing masses of log file data and is in widespread use at many large enterprises. I don’t use it yet but it’s in the pipeline.

Another bit of collaboration – Chris Schultz has written a guide to using Splunk to analyse Tableau Server logs. It’s on his new blog here.

 

Monitoring Tableau Server Activity

[Image: Monthly Server Stats. A wealth of info is available from the Postgres DB]

So you’ll probably know that Tableau has an internal Postgres database. You may not know that you can interrogate this database easily and pull out pure gold! It’s an absolute treasure trove of information about your server performance, usage and pretty much anything else.

I’m not going to elaborate on it here as my good friend Mark Jackson (@ugamarkj) has written a comprehensive guide on it here.

This is critical ammo for the Tableau service manager, and making these dashboards available to your user community will get you some serious brownie points, especially with senior management. Most applications can’t provide this level of detail. Tableau can, and it’s a great feature.

Other Resources

As mentioned there are a ton of ways to do this and there are many more guides out there. Take a look at some of these links.

http://www.alansmitheepresents.org/2014/02/tableau-server-performance-monitoring.html
http://kb.tableausoftware.com/articles/knowledgebase/automation-checking-server-status
 

OK, that’s it for this part. Hopefully that’s given you an idea of what is possible in terms of monitoring the Tableau Server application. If you’ve got any ideas or methods of your own, then do share!

Cheers, Paul
