How to manage Tableau upgrades in an Enterprise environment

Hi,

Been a while since my last post. Been exceptionally busy at work due to widespread adoption of Tableau at my organisation. Usage has doubled in the last 3 months and we now have thousands of users to keep happy. That takes some doing, hence the blog hiatus.

Anyway, time to continue the series on Tableau as an IT Service, with a subject that I’m asked about a lot – just how do you manage your Tableau upgrades in an Enterprise environment?

This is actually a pretty big subject, especially in an Enterprise setup. There’s no perfect way to do it but hopefully some of these tips will be useful.

 

Section 1 – Pre-Upgrade Considerations

To upgrade or not? That is the question

Obviously you’ll need to decide whether or not you actually need to upgrade. Each time you upgrade a production system you risk impacting stability and introducing bugs or human error. An upgrade also needs testing and planning, and it eats resources and time. So if there’s no good reason to upgrade, then don’t.

Tableau release new versions at a pretty impressive cadence, generally once a month for ‘maintenance’ releases. So for each newly advertised release, take note of the following in order to make your decision.

  • Compelling functionality –  New features are entering the product all the time. Determine what may be useful to your user base.
  • Key bug fixes – Each new version will squash a few bugs. If there’s one that is affecting your users then it may be prudent to upgrade and quieten the noise. Remember that your upgrade may introduce new issues. Take note of the known issues section in the release notes.

Both of these are fully documented in the Release Notes. Review them carefully each time Tableau announce a new version. There are occasions where bug fixes are not announced in the release notes but your account manager will make you aware of those.

Also be aware that new versions can introduce new bugs and issues of their own. We have had situations where we were stable, decided to upgrade and then spent the next few months battling newly introduced issues, to the point that we probably should have stayed on the older version.

Be aware of compatibility issues

I hear a lot of complaints about compatibility issues between Server and Desktop, so it’s important to understand how the different versions behave with each other. Get this wrong and you may end up in a position where users have overwritten content originally created in an older version and don’t have a backup to roll back to. If you are crossing major versions (e.g. 8 -> 9), then you’ll certainly need to upgrade the Server and the Desktops at the same time.

Top tip – it is possible to hack the XML of a workbook to change the version and rescue the situation, provided no edits have been made to the content. The ever-so-talented Jen Vaughan (@butterflystory) explains all here.
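If you want to script that version check, here is a minimal Python sketch. It assumes a plain .twb file whose root <workbook> element carries a version attribute (a packaged .twbx would need unzipping first), and the function names are hypothetical. It only rewrites the declared version. It does not strip anything a newer release added, so it is only safe under the same condition as the manual hack: no edits made in the newer version. Keep a copy of the original file.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches the opening <workbook ...> tag, and a version='...' or version="..." attribute inside it.
_ROOT_TAG = re.compile(r"<workbook\b[^>]*>")
_VERSION_ATTR = re.compile(r"""(\sversion=)(['"])[^'"]*\2""")


def get_workbook_version(twb_text: str) -> Optional[str]:
    """Return the version attribute of the root <workbook> element, or None if absent."""
    root = ET.fromstring(twb_text)
    if root.tag != "workbook":
        raise ValueError(f"not a Tableau workbook: root element is <{root.tag}>")
    return root.get("version")


def set_workbook_version(twb_text: str, new_version: str) -> str:
    """Rewrite only the version attribute on the root tag; the rest of the file is untouched."""
    match = _ROOT_TAG.search(twb_text)
    if match is None:
        raise ValueError("no <workbook> start tag found")
    new_tag, count = _VERSION_ATTR.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_version}{m.group(2)}",
        match.group(0),
        count=1,
    )
    if count == 0:
        raise ValueError("root <workbook> tag has no version attribute")
    return twb_text[: match.start()] + new_tag + twb_text[match.end():]


if __name__ == "__main__":
    sample = "<workbook source-build='9.0.0' version='9.0'><datasources/></workbook>"
    print(get_workbook_version(sample))  # 9.0
    print(get_workbook_version(set_workbook_version(sample, "8.3")))  # 8.3
```

The edit is a targeted text substitution rather than a parse-and-reserialise. Writing the tree back out with ElementTree can rename namespace prefixes, and you don't want that in a file Desktop has to reopen.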

For more details see this article.

Don’t upgrade to version zero

Most risk-averse IT managers (like me) will resist the temptation to jump into that shiny new Vx.0 release the day it comes out. Version zero releases of any software are notorious for bugs and issues; that’s just the nature of software development. So at my org we always let at least one maintenance release slide by and go to the Vx.0.1 or Vx.0.2 release instead.

It can be hard to resist, especially when your users are clamouring for that shiny new version, but if there are major bugs in the zero release then it’s best to let someone else find them rather than you.

Caveat: This doesn’t always work of course. We waited for 9.0.2 and that ended up being one of the buggier releases. Oh the irony.
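If you track candidate releases in a script, the "skip version zero" rule is easy to encode. Here is a minimal Python sketch. It assumes dotted version strings like 9.0.2 and treats any x.y.0 release as a "zero" release. The function names and the min_maintenance knob are hypothetical. As the caveat above shows, passing this check is no guarantee of a good release.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Turn '9.0.2' into (9, 0, 2); missing trailing parts count as 0, so '9.0' -> (9, 0, 0)."""
    parts = [int(p) for p in text.strip().lstrip("vV").split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def is_upgrade_candidate(version: str, min_maintenance: int = 1) -> bool:
    """'Skip version zero' policy: accept x.y.z only once z >= min_maintenance."""
    _, _, maintenance = parse_version(version)
    return maintenance >= min_maintenance


if __name__ == "__main__":
    for v in ["9.0", "9.0.0", "9.0.1", "9.0.2", "9.1.0"]:
        print(v, is_upgrade_candidate(v))
```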

 

Understand the new version resource demands

This is important. You may be rocking away on your existing version, confident that your hardware can satisfy the software. But then you upgrade, and all of a sudden that new version eats up double the RAM or batters your CPU. You didn’t see that coming.

For version 9 Tableau released an updated scalability document. Annoyingly it was released quite a while after V9 went live. I was expecting a comfortable read but noticed phrases like the following, which were pretty alarming.

[Screenshots: two excerpts from the Tableau 9 scalability document]

Whaaaat?! That led to some discussions with Tableau tech folks (thanks Meredith!) and some fevered testing and all turned out to be ok. But those figures took us by surprise for sure. Don’t be in a position where you upgrade and then suddenly hit a capacity issue that your older version didn’t have.
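Before an upgrade, it's worth doing that back-of-envelope sum explicitly. Here is a minimal Python sketch. It assumes you can take a current peak figure (RAM or CPU) from your monitoring, plus an estimated growth factor for the new version from the vendor's scalability guidance or your own testing. The 80% ceiling and the example figures are hypothetical.

```python
def projected_utilisation(current_used: float, capacity: float, growth_factor: float) -> float:
    """Fraction of capacity the new version would use if demand scales by growth_factor."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return current_used * growth_factor / capacity


def fits_after_upgrade(current_used: float, capacity: float, growth_factor: float,
                       ceiling: float = 0.8) -> bool:
    """True if projected utilisation stays at or below the ceiling (0.8 = 80%)."""
    return projected_utilisation(current_used, capacity, growth_factor) <= ceiling


if __name__ == "__main__":
    # Hypothetical figures: 20 GB of RAM in use on a 32 GB node.
    for factor in (1.2, 2.0):
        pct = projected_utilisation(20, 32, factor) * 100
        print(f"x{factor}: {pct:.0f}% of RAM, fits={fits_after_upgrade(20, 32, factor)}")
```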

 

Test it!

Let’s say you upgrade. And it goes wrong. One of the first questions you’ll get asked is “Did you test this?”. You really don’t want to be answering “no” to that question.

You should have at least one non-production environment that you can run tests on. Due to the complex nature of Tableau it is impossible to test every aspect of functionality, but you should at least be able to cover a good number of scenarios. You may also not be able to simulate your production load on your test environment, but that is still better than nothing. The load testing utility TabJolt may be handy here; check Russell Christopher’s guide to TabJolt.

I have the following environments to play with. This gives me a lot of options.

  • Production – Main user facing environment
  • BCP – Disaster recovery environment in case production fails
  • UAT – A mirror of production. Same spec
  • Engineering – Lower spec, used to test the latest version available from the vendor
  • Beta – Even lower spec, used for testing beta versions
  • Alpha – for testing the alpha versions

Ensure that the tests you conduct are consistent, repeatable and that the outcomes are recorded. We use a tool called Quality Centre and the tests are performed by my level 2 support folks. This gives consistency and frees time for my main analysts.
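A scripted smoke test can complement a test management tool by making at least part of each run repeatable and recorded. Here is a minimal Python sketch. It assumes you keep a list of view URLs on the test server that should return HTTP 200 and can be fetched without an interactive login. The URLs and function names are hypothetical. It records a pass/fail outcome and timing per URL so runs can be compared before and after an upgrade. It is no substitute for TabJolt-style load testing.

```python
import csv
import time
import urllib.request
from typing import Callable, Dict, Iterable, List


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """GET the URL and return its HTTP status (urlopen raises for 4xx/5xx responses)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_smoke_tests(urls: Iterable[str],
                    fetch: Callable[[str], int] = fetch_status) -> List[Dict[str, object]]:
    """Run each check once and record the outcome plus elapsed seconds."""
    results: List[Dict[str, object]] = []
    for url in urls:
        start = time.perf_counter()
        try:
            status = fetch(url)
            outcome = "PASS" if status == 200 else f"FAIL (HTTP {status})"
        except Exception as exc:  # record the failure rather than abort the whole run
            outcome = f"ERROR ({type(exc).__name__})"
        results.append({"url": url, "outcome": outcome,
                        "seconds": round(time.perf_counter() - start, 3)})
    return results


def write_results(results: List[Dict[str, object]], path: str) -> None:
    """Save one row per check so each run leaves a record."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["url", "outcome", "seconds"])
        writer.writeheader()
        writer.writerows(results)
```

A run would then be something like write_results(run_smoke_tests(urls), "uat-run.csv"). The fetch parameter lets you swap in an authenticated client without touching the recording logic.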

Verify your licence details

Double check your licence maintenance end date for both Server and Desktop. If you’re out of maintenance then you won’t be able to use the application after you upgrade. I’ve seen licence issues way too many times with many applications after an upgrade. You won’t want to be trying to contact your account manager on a weekend to sort out a licence issue.

Opinion – IMHO I would much prefer it if applications didn’t crap out due to licence expiration. In 99% of cases there’s some paperwork misunderstanding that is easily sorted by your account manager. By all means let the application hit you with some warnings and also alert the vendor, but it shouldn’t mean a loss of service.
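The date check itself is easy to automate, so it nags you well before an upgrade weekend. Here is a minimal Python sketch. It assumes you maintain the maintenance end dates yourself and that 60 days' notice is enough. It does not query Tableau's licensing in any way, and the dates below are hypothetical.

```python
from datetime import date
from typing import Dict, List


def licence_warnings(end_dates: Dict[str, date], today: date, warn_days: int = 60) -> List[str]:
    """Return a warning per product whose maintenance has lapsed or ends within warn_days."""
    warnings = []
    for product, end in sorted(end_dates.items()):
        days_left = (end - today).days
        if days_left < 0:
            warnings.append(f"{product}: maintenance expired {-days_left} days ago")
        elif days_left <= warn_days:
            warnings.append(f"{product}: maintenance ends in {days_left} days")
    return warnings


if __name__ == "__main__":
    # Hypothetical end dates, typed in from your licensing paperwork.
    ends = {"Server": date(2015, 10, 1), "Desktop": date(2015, 8, 31)}
    for line in licence_warnings(ends, today=date(2015, 9, 5)):
        print(line)
```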

 

Your upgrade process & strategy

I’m not going to go into it here as it’s a full on subject in itself but make sure you follow your organisation’s Change Management procedure to the letter. Failure to follow change processes is generally a dismissible offence in most Enterprises.

Make sure you advertise your strategy for upgrades and maintenance to your users. You’ll get asked, so ensure this is specified in your Service Document. You may even have standard maintenance windows (e.g. on a weekend) where you can reserve the right to take down the system. Again, make sure that is documented and your users are aware.

 

Section 2 – The Upgrade

Create a package

Most enterprises will use some form of package deployment tool or team to perform the actual deployment of new software. That’s pretty handy. I have over 500 installations of Desktop to support and we need to ensure they all get upgraded at the same time. So I can create a software bundle and send it to the packaging team, who will then schedule and deploy it.

This gives you the chance to include those little extras in your package to give your users the best experience. Here’s what’s in our Tableau Desktop package.

  • The installer exe file
  • A sample “Getting Started” workbook with tips, best practices & help.
  • Preferences.tps file containing customised colour palettes
  • Most used drivers (Oracle, MySQL etc)

I would love to be able to customise the “Discover” pane to point to some of my internal resources but it doesn’t seem to be possible. Boo.

We also supply custom instructions for the packaging team, such as running the installer with a flag / registry update to disable the auto-update feature that is being introduced in the upcoming 9.1 release. Auto-update is a really baaaaad idea for Enterprise deployments.

One thing to be aware of with packaging is that it can take a long time. From request to deployment, the typical time at my org is an insane 7 weeks! By which time another version is out. We are hoping to speed that up a bit obviously.

Communicate to the max

You can’t over communicate any potential disruption to your service. Make sure a broadcast message goes to your users via whatever system your firm uses. And it doesn’t hurt to follow-up with your power users / senior stakeholders with a personal reminder that work is planned.

Take a backup

Whadda ya mean you didn’t take a backup?

Tableau is one of the easiest applications to upgrade that I’ve worked with. A simple uninstall / reinstall does the trick. But don’t take that for granted – make sure you take a manual backup prior to your upgrade. If you’re not doing this as a matter of principle then you’ll be getting a visit from the boys. And they won’t have had their dinner.

Handily, the uninstall process takes a backup anyway but don’t rely on that, take your own.

You should also back up all your .yml configuration files and ensure you know what each setting is. Tableau should preserve these settings during the upgrade but just in case it doesn’t then it’s handy to have a copy to refer back to.
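Here is a minimal Python sketch of copying the configuration files aside before you uninstall. The config directory is passed in rather than hard-coded because it varies by install and version, and the snapshot folder naming is hypothetical. It covers config files only. The content backup itself should still come from Tableau's own backup (tabadmin backup in the versions discussed here).

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Optional


def snapshot_configs(config_dir: str, backup_root: str,
                     patterns: Iterable[str] = ("*.yml",),
                     now: Optional[datetime] = None) -> List[Path]:
    """Copy matching files (recursively) into backup_root/config-YYYYmmdd-HHMMSS, keeping layout."""
    src_root = Path(config_dir)
    stamp = (now or datetime.now()).strftime("config-%Y%m%d-%H%M%S")
    dest_root = Path(backup_root) / stamp
    dest_root.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier snapshot
    copied: List[Path] = []
    for pattern in patterns:
        for src in sorted(src_root.rglob(pattern)):
            target = dest_root / src.relative_to(src_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copy2 also preserves file timestamps
            copied.append(target)
    return copied
```

Adding "httpd.conf.templ" to patterns would also capture the hand-edited webserver template discussed next.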

Server specific considerations

When you uninstall Tableau Server it backs up the content and the settings in the main yml configuration files. That’s cool, but do remember that if you’ve changed any of the other config files then they will be overwritten and you’ll have to make the changes again. For example we change the webserver timeout settings in the file “\Tableau\Tableau Server\data\tabsvc\config\httpd\httpd.conf.templ” – that gets blown away by an uninstall.
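If you saved a copy of a hand-edited file such as httpd.conf.templ, comparing it with the freshly installed version shows exactly which changes need reapplying. Here is a minimal Python sketch using the standard library's difflib. The Timeout values in the example are hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def config_drift(saved_text: str, current_text: str, name: str = "httpd.conf.templ") -> List[str]:
    """Unified diff from the saved (pre-upgrade) copy to the current (post-upgrade) file."""
    return list(difflib.unified_diff(
        saved_text.splitlines(keepends=True),
        current_text.splitlines(keepends=True),
        fromfile=f"saved/{name}",
        tofile=f"current/{name}",
    ))


def diff_files(saved_path: str, current_path: str) -> List[str]:
    """Read both files tolerantly and diff them; an empty list means nothing to reapply."""
    saved = Path(saved_path).read_text(encoding="utf-8", errors="replace")
    current = Path(current_path).read_text(encoding="utf-8", errors="replace")
    return config_drift(saved, current, Path(current_path).name)


if __name__ == "__main__":
    before = "Timeout 300\nKeepAlive On\n"
    after = "Timeout 60\nKeepAlive On\n"
    print("".join(config_drift(before, after)), end="")
```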

There may also be other settings in the Postgres db that you may have modified using tabadmin. Not all are retained from what I can see. Note I’m still researching this so not 100% sure of the behaviour.

Finally make sure you understand any changes to the Tableau Postgres DB schema in the new versions. It has generally remained pretty consistent but if any tables or fields are renamed then that may well break your Custom Admin Views.
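One way to catch renamed tables or fields before your users do is to snapshot the repository's column list on the old and new versions and diff them. Here is a minimal Python sketch. The SQL is a standard PostgreSQL information_schema query, but the 'public' schema filter is an assumption. Connecting to the repository (host, port, credentials) is left out because it depends on your setup, and the table and column names in the example are hypothetical. A rename shows up as one column removed and another added.

```python
from typing import Dict, Iterable, List, Tuple

# Standard PostgreSQL catalogue query; run it before and after upgrading and keep the rows.
# The 'public' schema filter is an assumption about where the views you query live.
SCHEMA_QUERY = """
SELECT table_name, column_name
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, column_name;
"""

Column = Tuple[str, str]  # (table_name, column_name)


def schema_changes(before: Iterable[Column], after: Iterable[Column]) -> Dict[str, List[Column]]:
    """Columns that disappeared or appeared between the two snapshots."""
    old, new = set(before), set(after)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def broken_references(used_by_views: Iterable[Column],
                      changes: Dict[str, List[Column]]) -> List[Column]:
    """Which of the columns your custom admin views rely on no longer exist."""
    removed = set(changes["removed"])
    return sorted({col for col in used_by_views if col in removed})


if __name__ == "__main__":
    before = [("_users", "name"), ("_views", "title")]
    after = [("_users", "name"), ("_views", "caption")]
    changes = schema_changes(before, after)
    print(changes)
    print(broken_references([("_views", "title"), ("_users", "name")], changes))
```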

Section 3 – Post upgrade

Test it! Again!

Not all issues come to light immediately. Perform testing, keep vigilant and liaise with your users closely in the next few days to understand if the application is behaving as it should be.

Ask for help!

Tableau Upgrade Assistance

If all this sounds a bit daunting and you’d like to get assistance then Tableau offer an “upgrade assistance” programme which might be worth looking at. Talk to your account manager for more information.

There are also other guides around. Have a look at this one from our good friends at InterWorks.

That’s it for this post. As I said it’s a big subject so do post comments if you feel I’ve missed anything.

Happy upgrading! Cheers, Paul

How to Monitor Your Tableau Server – Part 2 – Tableau Server Application Monitoring

Hello there,

Following on from Part 1 of this series, here’s Part 2: how to monitor the Tableau Server application itself.

Now, I don’t know Tableau Server in as much detail as some of the Jedi-level experts out there, so I’m totally open to different ways of doing things. My recommendations here are based as much on general IT service monitoring best practice as they are on Tableau specifics. If I’ve missed something then do point it out – I’m hoping the community can help me expand this article.

On that subject – I’m delighted to have been able to collaborate with Craig Bloodworth (@craigbloodworth), Mark Jackson (@ugamarkj) & Chris Schultz (@nalyticsatwork) on this. Thanks for your invaluable contributions guys.

Are we ok? That’s the ubiquitous question on an IT service manager’s mind. And it can be a real worry. But the fact is that there are a lot of tools and methods you can employ to cut down that worry and stress or even eliminate it.

 

Service Availability

Simply put, is your Tableau Server up or down? Tableau offer a “Server Status” view, but in my opinion that’s pretty useless as you’re never going to be staring at it for the whole day. I’m also not sure how quickly it updates or responds to the system activity. It never seems to change when I’m looking at it.

Tableau’s Default Server ‘Monitor’

So it’s clear you’ll need something else to give you that early warning of any issues.

Tableau Server Monitor in xml

By the way, you can also get this status as XML output, which could be handy for scripting.
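If you do script against the XML, a minimal Python sketch for pulling out anything unhealthy might look like this. It assumes the document lists each service as an element carrying status and worker attributes, with "Active" meaning healthy. Check that against your own server's output. The sample below is illustrative, not a copy of Tableau's schema.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def unhealthy_components(xml_text: str,
                         healthy: Iterable[str] = ("Active",)) -> List[Tuple[str, str, str]]:
    """Return (component, worker, status) for every element whose status is not healthy."""
    ok = set(healthy)
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        status = elem.get("status")
        if status is not None and status not in ok:
            problems.append((_local_name(elem.tag), elem.get("worker", "?"), status))
    return problems


# Illustrative document only; element names and layout are assumptions.
SAMPLE = """<systeminfo xmlns="urn:example:status">
  <machines>
    <machine name="node1">
      <repository worker="node1" status="Active"/>
      <backgrounder worker="node1" status="Down"/>
    </machine>
  </machines>
  <service status="Active"/>
</systeminfo>"""

if __name__ == "__main__":
    print(unhealthy_components(SAMPLE))  # [('backgrounder', 'node1', 'Down')]
```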

 

Process Monitoring (Enterprise Process)

Main Tableau Server Processes

These are the key processes (running programs) that Tableau Server needs in order to function. If one of these has crashed then you’ll likely have a problem.

So, referring back to Part 1, I talked about the enterprise monitoring tools used to monitor your Tableau infrastructure. You should be able to use these same tools to set up application monitoring: rules for your own application that you define (and ideally configure), producing alerts that come to you or your own support team via the enterprise process.

You should set up monitoring rules to alert on zero instances of each of these processes. The alerts need to be classed as a “Critical” severity so that they hit the alert list of the Level 1 team (non-critical alerts may not be visible). Make sure the monitoring rules apply 24 x 7.

Important – Make sure that the Level 1 & 2 teams that will get these alerts know exactly what to do with them. These teams will probably have a document or Runbook that you’ll need to fill out which will give them instructions as to what the alert means and who they should call. This needs to be crystal clear as they’ll usually follow it to the letter.

Process Monitoring (Paranoid Android Process)

“I knew that alert would get lost. Don’t say I didn’t warn you..”

So even if you set up the above monitoring using the Enterprise Process, you may still have issues. That process can be slow or break, meaning that your alert may take up to 30 mins to reach you (or a lot longer!).

Therefore I always encourage being as paranoid as possible when it comes to monitoring.

Luckily there are a number of things you can do to add an extra level to your monitoring.

 

Use a Simple Script

Monitor Tableau Server without the GUI

Mike Roberts of InterWorks has written a simple guide to scripting a basic process check based on the default Tableau Server monitor XML output mentioned above. You can run that script using Windows Task Scheduler and get an email if any of the processes are detected to be down.

 

 

I don’t use that one, but I do have a very basic PowerShell script that I run using Task Scheduler every 5 mins. It does the same thing and is based on the following code.

powershell.exe -command "& {if (! (Get-Process -Name postgres -ErrorAction SilentlyContinue)) {Send-MailMessage -SmtpServer '<smtp-server>' -From '<from-address>' -To '<to-address>' -Subject 'postgres.exe not running on PROD'}}"

All that does is run in the background, and if the process name (in this example postgres) is not detected by the Get-Process command, it sends an email to my team. It’s not foolproof, but when combined with the enterprise process it gives me a better level of protection.
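If PowerShell isn't your thing, the same check can be sketched in Python. This is a swapped-in alternative, not the script I run. It assumes Windows (it shells out to tasklist with CSV output) and a plain SMTP relay that needs no authentication. The required-process list and the addresses are hypothetical placeholders.

```python
import csv
import io
import smtplib
import subprocess
from email.message import EmailMessage
from typing import Iterable, List, Set

# Hypothetical list: use the process names shown on your own Server Status page.
REQUIRED = ["postgres.exe"]


def running_image_names(tasklist_csv: str) -> Set[str]:
    """Parse `tasklist /FO CSV /NH` output into lower-case image names."""
    return {row[0].lower() for row in csv.reader(io.StringIO(tasklist_csv)) if row}


def missing_processes(required: Iterable[str], running: Set[str]) -> List[str]:
    """Required processes with zero running instances."""
    return [name for name in required if name.lower() not in running]


def build_alert(missing: List[str], host: str, sender: str, recipient: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = f"{', '.join(missing)} not running on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Process check found no running instance of: " + ", ".join(missing))
    return msg


def check_and_alert(smtp_server: str, sender: str, recipient: str, host: str = "PROD") -> List[str]:
    """Windows only: list running processes and email if any required one is missing."""
    output = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                            capture_output=True, text=True, check=True).stdout
    missing = missing_processes(REQUIRED, running_image_names(output))
    if missing:
        with smtplib.SMTP(smtp_server) as smtp:
            smtp.send_message(build_alert(missing, host, sender, recipient))
    return missing
```

Scheduled every 5 minutes via Task Scheduler, check_and_alert("<smtp-server>", "<from-address>", "<to-address>") mirrors the one-liner above. The parsing and message-building are kept separate so they can be tested away from the server.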

Query the processes via URL

Querying processes via URL

This is a new one on me. Apparently it is possible to query each process by http and get a message back to indicate if the process is ok. Opens up a lot of options for more scripting of remote checks or monitoring of the URLs via third party applications. All adds to the arsenal of monitoring available to the service owner. Many thanks to Craig Bloodworth (@craigbloodworth) of The Information Lab for this tip. You can find more details in this blog post.
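The exact URLs are in that post, so treat this as a shape rather than a recipe. Here is a minimal Python sketch that probes a set of per-process health URLs and reports anything that doesn't answer with HTTP 200. The endpoint names and URLs are hypothetical placeholders.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def probe(url: str, timeout: float = 10.0) -> str:
    """Return 'OK' for HTTP 200, otherwise a short description of what went wrong."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "OK" if resp.status == 200 else f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:  # must come before its parent class URLError
        return f"HTTP {exc.code}"
    except OSError as exc:  # URLError, timeouts and refused connections are all OSErrors
        return f"UNREACHABLE ({type(exc).__name__})"


def probe_all(endpoints: Dict[str, str], fetch: Callable[[str], str] = probe) -> Dict[str, str]:
    """Probe every named endpoint once."""
    return {name: fetch(url) for name, url in endpoints.items()}


def failures(results: Dict[str, str]) -> Dict[str, str]:
    """Only the endpoints that need attention."""
    return {name: result for name, result in results.items() if result != "OK"}
```

Usage would be along the lines of failures(probe_all({"vizqlserver": "<health-url>"})), run from a box other than the server itself so that a dead host is also caught.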

The Windows Event Log

By default Windows will log any messages or errors to the Windows Event Log. This can include system and application alerts and is a great source of data regarding system health.

Windows Event Log

Fire up the Event Viewer (usually somewhere in the Administrative Tools menu). You should see a number of categories of event on the left, from system stuff to specific application messages. Some will be informational, others downright confusing, but there will be some gold dust in there that you need to be mindful of.

For example, one entry in our log shows that Tableau Server has been restarting the backgrounder process due to a crash. That’s not critical to know about immediately, but I’d sure be interested to understand if it is happening regularly.

There are ways you can export this data automatically and then create a Tableau datasource – we haven’t done that yet but are planning to.
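When we get round to that export, the aggregation step could be as simple as this minimal Python sketch. It assumes you have already exported events into (timestamp, source, message) rows by whatever means suits you. The source name, message text and timestamp format in the example are hypothetical. The output is one row per day, ready to save as CSV for a Tableau data source.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

Event = Tuple[str, str, str]  # (timestamp, source, message): hypothetical export layout


def daily_counts(events: Iterable[Event], keyword: str, source_prefix: str = "",
                 ts_format: str = "%Y-%m-%d %H:%M:%S") -> Counter:
    """Count events per day whose source starts with source_prefix and message contains keyword."""
    counts: Counter = Counter()
    needle = keyword.lower()
    for ts, source, message in events:
        if source.startswith(source_prefix) and needle in message.lower():
            counts[datetime.strptime(ts, ts_format).date()] += 1
    return counts


def as_rows(counts: Counter) -> List[Tuple[str, int]]:
    """Sorted (ISO date, count) rows, e.g. for csv.writer."""
    return [(day.isoformat(), n) for day, n in sorted(counts.items())]
```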

Windows Performance Monitor

Windows performance monitor data collector

You can also make use of the inbuilt Windows performance monitor to collect and export data regarding the performance of the Tableau processes on your server. We set up a collector and constructed a basic Server Health Dashboard.

 

 

 

Server Health Dashboard based on Windows perf mon stats

It’s a good idea to subscribe to these dashboards to get them dropped in your inbox at the start and end of your production day.

The details for setting this up are on this Tableau KB article.
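If you want to sanity-check the collector output outside Tableau, here is a minimal Python sketch that summarises a performance-counter CSV. It assumes the layout typeperf writes with -f CSV: a timestamp column, then one column per counter, under a header row naming each counter. Blank samples, which rate counters can produce on the first reading, are skipped. The sample data is made up.

```python
import csv
import io
from statistics import mean
from typing import Dict


def summarise_counters(csv_text: str) -> Dict[str, Dict[str, float]]:
    """Average, maximum and sample count per counter column; non-numeric cells are skipped."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary: Dict[str, Dict[str, float]] = {}
    for col, counter in enumerate(header[1:], start=1):  # column 0 is the timestamp
        values = []
        for row in data:
            if col < len(row):
                try:
                    values.append(float(row[col]))
                except ValueError:
                    pass  # e.g. the blank first sample of a rate counter
        if values:
            summary[counter] = {"avg": mean(values), "max": max(values), "samples": len(values)}
    return summary


SAMPLE = r'''"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\% Processor Time"
"09/04/2015 23:00:00.000"," "
"09/04/2015 23:01:00.000","20.0"
"09/04/2015 23:02:00.000","40.0"
'''

if __name__ == "__main__":
    for counter, stats in summarise_counters(SAMPLE).items():
        print(counter, stats)
```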

 

 

Tableau Log File Monitoring

To me the Tableau logs seem like a real mystery. There’s clearly a ton of information in there, but even the Tableau support folk don’t seem to know what’s important and what’s junk. There are also a lot of messages that seem like red-herrings and some that are just plain confusing.

It’s a shame that there isn’t more clarity on which strings and messages we should pay attention to; at the moment I’m just guessing.

In terms of alerting, the enterprise monitoring tool you use will have an equivalent log scraping functionality, just as it does for process monitoring. This will involve you telling the tool which text to alert on. Fairly simple. You can also write your own script in much the same way as the PowerShell process monitoring script mentioned earlier in this post.

I get really annoyed with the state of the Tableau Server logs. They’re a total mess: there are multiple locations and there’s little consistency. I’ve not had time to analyse them properly, but it seems that some entries contain a severity of DEBUG, INFO, ERROR or FATAL, which would give an indication of whether you should trigger an alert when one occurs. It doesn’t seem consistent though.

Ideally I’d like every log entry from every component to start with a timestamp, followed by one of these severity indicators. That would make it so easy.
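Until the logs settle on a consistent format, a scraper has to look for the severity word anywhere in the line. Here is a minimal Python sketch of that approach. It assumes plain-text log files and that ERROR and FATAL are the levels worth alerting on. The sample lines are made up. A scheduled version would also need to remember how far it had read in each file, so that it doesn't re-alert on old entries.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple

# First severity word found anywhere on the line, since formats differ between components.
SEVERITY = re.compile(r"\b(DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b")
ALERT_LEVELS = {"ERROR", "FATAL"}

Hit = Tuple[int, str, str]  # (line number, severity, line text)


def scan_lines(lines: Iterable[str], alert_levels: Set[str] = ALERT_LEVELS) -> List[Hit]:
    """Return the lines whose first severity word is one of the alert levels."""
    hits: List[Hit] = []
    for number, line in enumerate(lines, start=1):
        match = SEVERITY.search(line)
        if match and match.group(1) in alert_levels:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits


def scan_logs(log_dir: str, pattern: str = "*.log") -> Dict[str, List[Hit]]:
    """Scan every matching file under log_dir; only files with hits are returned."""
    results: Dict[str, List[Hit]] = {}
    for path in sorted(Path(log_dir).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            hits = scan_lines(fh)
        if hits:
            results[str(path)] = hits
    return results
```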

 

Log analysis using Splunk

If you’ve not seen Splunk then you should take a look. It’s a great tool for aggregating and analysing masses of log file data and is in widespread use at many large enterprises. I don’t use it yet but it’s in the pipeline.

Another bit of collaboration – Chris Schultz has written a guide to using Splunk to analyse Tableau Server logs. It’s on his new blog here.

 

Monitoring Tableau Server Activity

Monthly Server Stats

A wealth of info is available from the Postgres DB

So you’ll probably know that Tableau has an internal Postgres database. You may not know that you can interrogate this database easily and pull out pure gold! It’s an absolute treasure trove of information about your server performance, usage and pretty much anything else.

I’m not going to elaborate on it here as my good friend Mark Jackson (@ugamarkj) has written a comprehensive guide on it here.

This is critical ammo for the Tableau service manager, and making these dashboards available to your user community will get you some serious brownie points, especially with senior management. Most applications can’t provide this level of detail; Tableau can, and it’s a great feature.
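To give a flavour of the kind of query involved, here is a minimal sketch of monthly request and active-user counts. It assumes the repository's _http_requests view exposes created_at and user_id columns, which you should verify against your version. The SQL does the work in Postgres. The Python function performs the same aggregation if you've already pulled raw rows out. The sample rows in the test are made up.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumes _http_requests has created_at and user_id columns; check your version first.
MONTHLY_SQL = """
SELECT date_trunc('month', created_at) AS month,
       count(*)                        AS requests,
       count(DISTINCT user_id)         AS active_users
FROM _http_requests
GROUP BY 1
ORDER BY 1;
"""


def monthly_stats(rows: Iterable[Tuple[datetime, Optional[int]]]) -> Dict[str, Dict[str, int]]:
    """Same aggregation in Python for raw (created_at, user_id) rows; None user_ids aren't counted."""
    requests: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[int]] = defaultdict(set)
    for created_at, user_id in rows:
        month = created_at.strftime("%Y-%m")
        requests[month] += 1
        if user_id is not None:
            users[month].add(user_id)
    return {m: {"requests": requests[m], "active_users": len(users[m])} for m in sorted(requests)}
```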

Other Resources

As mentioned there are a ton of ways to do this and there are many more guides out there. Take a look at some of these links.

http://www.alansmitheepresents.org/2014/02/tableau-server-performance-monitoring.html
http://kb.tableausoftware.com/articles/knowledgebase/automation-checking-server-status
 

OK, that’s it for this part. Hopefully that’s given you an idea of what is possible in terms of monitoring the Tableau Server application. Got any ideas or methods of your own? Then do share!

Cheers, Paul

How to Monitor Your Tableau Server – Part 1 – Infrastructure Monitoring

Hello there,

I hope you are all well and recovered from #data14. What a great event that was.

I’m gonna get a bit serious on yo now. It’s time to talk monitoring.

For a Tableau service manager (or any IT service for that matter), the worst situation that can possibly occur is getting a phone call from your users to tell you that your service is down. At best you’ll look stupid, at worst it will cost you credibility and is a sure-fire way to destroy user confidence in your service.

So how do you avoid this? You could not have any outages – well you can forget that, it aint gonna happen. You’ll get issues so get ready for them. What you can do is monitor your service big time. That way you’ll get the heads up and you can answer that phone call with a “yep we know, we’ve just raised an incident ticket and we are on it” – or better still, get to the incident and fix it before users even notice! Remember that effective incident management can actually gain you plus points from your user base, and senior management.

The problem with monitoring is that it’s BORING. I should know, I did it for 12 years! But it’s also essential. Get it right and you’ll be making your life a lot easier. It also traditionally doesn’t get a whole lot of investment thrown its way, as there’s no immediate tangible business benefit.

Monitoring falls into these categories. This is likely to take me more than one post to explain and it’s a big subject so I’ll doubtless miss some bits out. As always, I’m happy to connect offline and explain.

  • Infrastructure monitoring
  • Application monitoring
  • Performance monitoring
  • Capacity monitoring
  • User experience monitoring

Infrastructure Monitoring

As the name suggests this is all about monitoring of your infrastructure. That’s your hardware and network, peripherals and components of the platform your Tableau Server application is running on.

Chances are the infrastructure will be owned by an IT team. You’ll need a great relationship with these folks so if you haven’t then start buying them some doughnuts now. From what I can see Tableau is often brought into organisations by business users and that then antagonizes IT, meaning this relationship isn’t always the best. That’s a separate conversation however.

 

How does infrastructure monitoring work?

Chances are your monitoring team will have decided on an enterprise monitoring tool for the whole organisation. It will probably take the form of a central server, receiving alerts from an agent that is deployed as standard on each server in the estate.

Commonly used monitoring tools include Nagios, among many others. I’ve got a fondness for ITRS Geneos myself but am not going to go into the relative merits of each tool. You won’t have a choice of which tool is used in your org anyway.

So what happens? Well the agent will have a set of monitoring “rules” that it adheres to. These will take the form of something like “check the disk space on partition X, every Y minutes and trigger an alert if greater than Z percentage full”. That’s all the agent does. Polls the server for process availability, disk space, memory usage etc on a scheduled frequency and triggers an alert to the central server if the condition is breached. Those parameters should be fully configurable.
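To make the rule concrete, here is a minimal Python sketch of the "partition X, every Y minutes, greater than Z percent full" check, using the standard library's shutil.disk_usage. It illustrates the logic an agent applies, not how any particular monitoring product implements it. The path and threshold are hypothetical. Scheduling and forwarding the alert to a central server are left out.

```python
import shutil
from collections import namedtuple
from dataclasses import dataclass
from typing import Optional

# Same field order as shutil.disk_usage()'s result, so either can be passed to evaluate().
Usage = namedtuple("Usage", "total used free")


@dataclass
class DiskRule:
    path: str                # "partition X"
    interval_minutes: int    # "every Y minutes" (honoured by whatever schedules the check)
    max_percent_used: float  # "greater than Z percentage full"


def evaluate(rule: DiskRule, usage: Optional[Usage] = None) -> Optional[str]:
    """Return an alert message if the partition breaches its threshold, else None."""
    if usage is None:
        usage = shutil.disk_usage(rule.path)
    percent = 100.0 * usage.used / usage.total
    if percent > rule.max_percent_used:
        return f"ALERT: {rule.path} is {percent:.1f}% full (threshold {rule.max_percent_used}%)"
    return None
```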

The central server will then display the alert on an event console. Alerts will be given a criticality such as minor, major or critical. The alert console will be viewed by a support team, usually an offshore Level 1 team that provides an initial triage of the alert. They may then pass it on to a Level 2 team for potential remediation, or pass it on to Level 3 – the main support team. That’s the usual process in a big organisation.

So what’s the issue with that? Well, there’s the time factor for one. It can sometimes take 20 – 30 mins for an alert to get to the person that matters, which is obviously not great. Then there’s the sheer volume of alerts: a big organisation can be dealing with tens of thousands of active alerts a week, many of them junk, and that increases the risk of your alert being missed. There are also a lot of break points in the process, and sometimes alerts just go missing due to lost packets, network issues etc. It happens. On the whole the process works though.

 

Who’s responsible and what for?

Your infra teams are 100% responsible for the monitoring of these components. This encompasses:

  • Server availability (ICMP ping)
  • CPU usage
  • Memory usage
  • Disk space (operating system partitions only)
  • Network throughput / availability

They’ll tell you not to worry about this. They’ll tell you that any alerts will go to their support teams and they’ll be on it should they detect an issue. My advice – don’t trust anyone. There have been many times where I’ve had an issue and lo and behold the monitoring hasn’t been configured properly, or hasn’t even been set up at all. Or there’s been a bad break in the process somewhere. That aint cool.

 

So what should I do?

Take these steps to keep your infra teams on their toes. They’re providing you a platform, you are entitled to ask. They might not like it, but stick to your guns – you’ll be the one who gets it in the neck if your Tableau Server goes down.

  • Ask for a breakdown of the infra monitoring thresholds – What’s the polling cycle for alerts? What thresholds are being monitored? Who decided them and why?
  • Ask for a process flow – What happens when an alert is generated? Where does it go? How long does it take for someone to get on it? How is root cause followed up?
  • Ask to have visibility of the infra changes – If there are changes going on to the environment that might affect your server, make sure you get notified. Make sure you attend the appropriate change management meetings so you know what’s going on.
  • Ask for a regular report on server performance – There will probably be a tool on the server that logs time series data on server performance. That should be accessible to you as well as them. Chuck the data into Tableau and make it available to your users.
  • Understand the infra team SLA – It’s important to realise that you are a customer of the infra teams. Ask them for a Service Catalogue document for the service that they are providing. Understand the SLA that they’re operating to. Don’t be out-of-order, but if you find they’re not giving you good service then don’t be scared to wave the SLA.
  • Ask for a report of successful backups – Just as important as monitoring
  • Ask for the ICMP ping stats – How many packets get lost in communications with your Tableau server? How many times does it drop off the network?
  • Be nice – The infra teams in big orgs have a tough job. They’ll have no money and little resource. Cut them some slack and don’t be a prat if they let you down occasionally. It happens.
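When the ping stats arrive, two numbers answer the packet-loss and drop-off questions above. Here is a minimal Python sketch. It assumes you get sent/received packet counts and a time series of reachable/unreachable poll results. The figures are made up.

```python
from typing import Sequence


def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of ICMP echo requests that got no reply."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return 100.0 * (sent - received) / sent


def drop_offs(reachable: Sequence[bool]) -> int:
    """Times the server went from reachable to unreachable between consecutive polls."""
    return sum(1 for prev, cur in zip(reachable, reachable[1:]) if prev and not cur)


if __name__ == "__main__":
    print(packet_loss_percent(sent=1000, received=990))  # 1.0
    print(drop_offs([True, True, False, True, False, False, True]))  # 2
```

Counting transitions rather than failed polls means a single long outage counts once. Report the total unreachable time separately if your SLA is written in minutes.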

Start with that lot. Your users will also love it if you can make this information available to them. Again, it inspires confidence that you know what you’re doing.

OK that’s it for infrastructure monitoring. Next up I’ll dive into how you monitor your Tableau Server application.

Cheers, Paul