How to PROPERLY Back Up Your Tableau Server

Hello there,

Whadda ya mean you didn’t take a backup?

Time for another post about Tableau Server and how to get the best out of it in a large-scale, enterprise deployment situation.

Today we are focusing on how to PROPERLY back up your Tableau Server installation.

Like many aspects of enterprise services, this is a simple concept, but one that, if you get it wrong, can spell disaster. It always amazes me how many people and organisations don’t do this properly, or even at all.

You know how annoyed you get when your mum tells you she isn’t backing up all her family photos – well that’s what I get like when I see IT systems neglecting backups.

Note this post refers to a standalone Tableau installation with a manual failover to DR. We don’t yet have a clustered environment. I’ll update the post with considerations when we implement that.

What’s a backup?

Seems a simple question, and there are a number of different types of backups that you can take, each useful in different situations. Here’s what I’ve got in place:

Full System Backup

This is a complete dump of the server filesystems to disk (or tape – there’s still plenty of tape backup infra out there). Most likely it will be one of the big vendor products that look like the mothership from Close Encounters of the Third Kind.

Your full system backup should be set up by your server team when you get your machine. However, the principle of “trust no-one” applies here as always and it’s up to you to check the following:

  • Have the backups been set up at all?
  • Are they backing up all the filesystems? – Many times I’ve seen that only operating system partition backups have been set up, and I’ve had to request the application partitions be included.
  • Have the backups been succeeding? – Get your backup team to send you a report every month of backup completion. They don’t always succeed and you probably won’t be told that there has been a failure.
  • If you need to perform a restore, do you know the process, and how long it takes?

If you get the okay on all of that then you’re good – but treat it only as an insurance policy. Full system backups can take a long time to restore, and may only run weekly, so you could end up losing data even with these in place. It’s up to you to ensure you’re covered rather than relying on other teams to do things correctly.

Nightly Tableau Backup

There’s no excuse for not having this in place. It’s easy to set up, and it’s a case of when, rather than if, it saves your ass.

The tabadmin backup command gets Tableau Server to dump all content & configuration to a single .tsbak file. You don’t have to stop the server to do this, and it doesn’t seem to impact performance too much while it is running, so this should be the first backup you configure.

A simple script like this will do the job.

@echo OFF
rem Paths to the Tableau bin directory and the local backup staging area
set Binpath=D:\Program Files\Tableau\Tableau Server\9.0\bin
set Backuppath=D:\Program Files\Tableau\Backups\nightly
echo %date% %time%: *** Housekeeping started ***

rem Dump all content & configuration to a single .tsbak file; -d appends the date
"%Binpath%\tabadmin" backup "%Backuppath%\ts_backup_" -d
timeout /t 5

rem Remove logs and temp files
"%Binpath%\tabadmin" cleanup

rem Get the backup file OFF the server and onto the share drive
move "%Backuppath%\*" \\tableau_shr\backups\nightly\
echo %date% %time%: *** Housekeeping completed ***

The tabadmin backup command does the actual work here, dumping everything to a file. It’s always a good idea to run tabadmin cleanup afterwards to remove logs etc.

We run this script at a quiet time for the server (not that there is one in my global environment). We use the Windows Task Scheduler on the server, but I’d recommend using a decent scheduler like Autosys, or whatever your enterprise standard is, as the built-in Windows scheduler is pretty poor.

IMPORTANT: You may have noticed the move command at the end there. That takes our newly created backup file and moves it OFF THE SERVER to a share drive accessible by my backup server. Why? Well what happens if you lose the server and your backup file is on it? You may as well have no backup. So move it somewhere else.

Update – this tip actually saved my ass this week when we lost our entire VM cluster (er.. hardware team – *cough* – what’s going on??). We were able to fail over to the backup server successfully. Going forward we will soon be implementing Tableau’s High Availability capability.

Do make sure you rotate your backup files with a script that deletes the old ones, or your share drive will fill up. I keep 4 days’ worth, just in case the current file is somehow corrupted – rare, but it can happen.
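
Here’s a minimal rotation sketch using the built-in Windows forfiles command (the share path mirrors the move target in the script above, and the 4-day window is just my retention choice):

rem Delete .tsbak files older than 4 days from the backup share
forfiles /p "\\tableau_shr\backups\nightly" /m *.tsbak /d -4 /c "cmd /c del @path"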

Weekly Restart

You may know I’m not a fan of running enterprise apps on Windows. I prefer Linux for a number of reasons that I’m not going to go into here. I know many users want Tableau Server on Linux, and the amazing Tamas Foldi has only gone and written it himself – so one day we may see it.

Anyway, with Windows apps I always build in a weekly application restart – in our case every Saturday morning. That involves a server reboot (to clean out any OS-related temp stuff), an application restart and a tabadmin cleanup. Running tabadmin cleanup with the server stopped has the added bonus of clearing out the temp files, which doesn’t happen when the server is running. These files can get pretty big so they’re worth clearing out.
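
For reference, the core of that sequence is only a few lines. A sketch, reusing the Binpath variable from the backup script, and assuming Tableau Server is configured to start automatically after a reboot:

rem Stop the application, then clean up while stopped so temp files are removed too
"%Binpath%\tabadmin" stop
"%Binpath%\tabadmin" cleanup
rem Reboot the box to clear out OS-level temp stuff
shutdown /r /t 0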

Virtual Machine Snapshots

If you’re running on a VM then you may be able to utilise the VM snapshot facility. Contact your VM admins for details. I’ve not needed to implement this but I know some teams that do, and VM snapshots are super handy.

Do be aware that Tableau don’t seem to support this, though.

Config File Backup

Sometimes it’s handy to just back up your Tableau Server config. I’ve got a script that grabs all the .yml and template files in my Server directory, zips them up and moves them off the server. Pretty useful to refer back to old config settings if you need to. Make sure you include workgroup.yml.
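
A sketch of that script, reusing the Backuppath staging area from the nightly script – the config directory, the 7-Zip location and the share path are all assumptions, so adjust for your install:

rem Zip up the .yml and .templ config files and ship them off the server
set Configpath=D:\Program Files\Tableau\Tableau Server\9.0\config
"C:\Program Files\7-Zip\7z.exe" a "%Backuppath%\ts_config.zip" "%Configpath%\*.yml" "%Configpath%\*.templ"
move "%Backuppath%\ts_config.zip" \\tableau_shr\backups\config\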

If you’re being really good then you’ll be checking your config files into a revision control repo like SVN.

Site Specific Backups

Tableau Server allows you to back up per site. This doesn’t give me much extra, but in orgs that have lots of sites, or a site per team or business unit, it can be very handy.

One thing that isn’t great about exporting a site is that the site is locked and inaccessible while the export is taking place. See Toby Erkson’s blog for more info on exporting a site.
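
If you do want it, a single site export is one command. A sketch, reusing the paths from the nightly script – the site ID is made up:

rem Export one site to a file that can later be fed to tabadmin importsite
"%Binpath%\tabadmin" exportsite Finance --file "%Backuppath%\finance_export"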

Backup File Size & Duration

As your environment grows you’ll need to be mindful of the size of your backup file. Mine is around 16GB and takes well over an hour to write; it takes about 25 mins to restore. You’ll need to understand those numbers as your system matures.

[Image: backup files can get pretty big]

Another variable that can affect backup time is the specification of your primary server. If your primary is low-spec then backups will take longer to write. I don’t have hard stats on that, but I’ve seen it in practice. Contact Jeff Mills of Tableau if you want more info on Weak Primaries & backup times.

Backup Your Logs

Less important, this one, but it’s handy to zip up your logs on a weekly basis. We have a much better solution for logfile management using Splunk – you’ll see a blog about that in the future.
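
tabadmin has a command for exactly this. A sketch, reusing the variables from the nightly script – the target filename is my choice, and it’s worth checking the tabadmin documentation for the full flag list:

rem Bundle the server logs into a zip and move it to the share
"%Binpath%\tabadmin" ziplogs "%Backuppath%\ts_logs.zip"
move "%Backuppath%\ts_logs.zip" \\tableau_shr\backups\logs\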

The Most Important Bit – TEST YOUR BACKUPS

OK so you’re backing up like a man / woman possessed? Fine. You’re only as good as your last restore. So TEST your backups periodically. Files get corrupted, and you don’t want to discover that your only backup is broken at the moment you need it.
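
On a test box, a restore drill boils down to a few lines. A sketch, reusing Binpath from the backup script – the .tsbak filename is illustrative, and tabadmin restore needs the server stopped:

rem Pull a recent backup from the share and restore it onto the test server
"%Binpath%\tabadmin" stop
"%Binpath%\tabadmin" restore "\\tableau_shr\backups\nightly\ts_backup_2015-11-10.tsbak"
"%Binpath%\tabadmin" start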

OK that’s it. Backups can save your life – don’t ignore them. Paranoia is king in IT!

Cheers, Paul

How to Monitor Your Tableau Server – Part 1 – Infrastructure Monitoring

Hello there,

I hope you are all well and recovered from #data14. What a great event that was.

I’m gonna get a bit serious on yo now. It’s time to talk monitoring.

For a Tableau service manager (or any IT service for that matter), the worst situation that can possibly occur is getting a phone call from your users to tell you that your service is down. At best you’ll look stupid, at worst it will cost you credibility and is a sure-fire way to destroy user confidence in your service.

So how do you avoid this? You could have no outages at all – well, you can forget that, it ain’t gonna happen. You’ll get issues, so get ready for them. What you can do is monitor your service big time. That way you’ll get the heads-up and you can answer that phone call with a “yep, we know, we’ve just raised an incident ticket and we are on it” – or better still, get to the incident and fix it before users even notice! Remember that effective incident management can actually gain you plus points from your user base and senior management.

The problem with monitoring is that it’s BORING. I should know, I did it for 12 years! But it’s also essential – get it right and you’ll be making your life a lot easier. It also traditionally doesn’t get a whole lot of investment thrown its way, as there’s no immediate tangible business benefit.

Monitoring falls into these categories. This is likely to take me more than one post to explain and it’s a big subject so I’ll doubtless miss some bits out. As always, I’m happy to connect offline and explain.

  • Infrastructure monitoring
  • Application monitoring
  • Performance monitoring
  • Capacity monitoring
  • User experience monitoring

Infrastructure Monitoring

As the name suggests this is all about monitoring of your infrastructure. That’s your hardware and network, peripherals and components of the platform your Tableau Server application is running on.

Chances are the infrastructure will be owned by an IT team. You’ll need a great relationship with these folks, so if you haven’t got one, start buying them doughnuts now. From what I can see, Tableau is often brought into organisations by business users, which antagonises IT, meaning this relationship isn’t always the best. That’s a separate conversation however.

How does infrastructure monitoring work?

Chances are your monitoring team will have decided on an enterprise monitoring tool for the whole organisation. It will probably take the form of a central server, receiving alerts from an agent that is deployed as standard on each server in the estate.

Some examples of commonly used monitoring tools include Nagios and ITRS Geneos. I’ve got a fondness for Geneos myself, but I’m not going to go into the relative merits of each tool – you won’t have a choice what tool is used in your org anyway.

So what happens? Well the agent will have a set of monitoring “rules” that it adheres to. These will take the form of something like “check the disk space on partition X, every Y minutes and trigger an alert if greater than Z percentage full”. That’s all the agent does. Polls the server for process availability, disk space, memory usage etc on a scheduled frequency and triggers an alert to the central server if the condition is breached. Those parameters should be fully configurable.
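
To make that concrete, here’s a toy version of a disk space rule as a batch sketch (the drive letter and the 90% threshold are arbitrary choices). A real agent does the same poll-and-compare, just with scheduling and alert forwarding on top:

rem Toy monitoring rule: shout if drive D: is more than 90 percent full
powershell -Command "$d = Get-PSDrive D; $pct = 100 * $d.Used / ($d.Used + $d.Free); if ($pct -gt 90) { Write-Output ('ALERT: D: is {0:N0} percent full' -f $pct) }"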

The central server will then display the alert on an event console. Alerts will be given a criticality such as minor, major or critical. The alert console will be viewed by a support team, usually an offshore Level 1 team that provides an initial triage of the alert. They may then pass it on to a Level 2 team for potential remediation, or escalate it straight to Level 3 – the main support team. That’s the usual process in a big organisation.

So what’s the issue with that? Well, there’s the time factor for one. It can sometimes take 20–30 mins for an alert to get to the person that matters. That’s obviously not great. Then there’s the sheer volume of alerts: a big organisation can be dealing with tens of thousands of active alerts a week, many of them junk. That increases the risk of your alert being missed. There are also a lot of break points in the process, and sometimes alerts just go missing due to lost packets, network issues etc. It happens. On the whole the process works though.

Who’s responsible and what for?

Your infra teams are 100% responsible for the monitoring of these components. This encompasses:

  • Server availability (ICMP ping)
  • CPU usage
  • Memory usage
  • Disk space (operating system partitions only)
  • Network throughput / availability

They’ll tell you not to worry about this. They’ll tell you that any alerts will go to their support teams and that they’ll be on it should they detect an issue. My advice – don’t trust anyone. There have been many times where I’ve had an issue and, lo and behold, the monitoring hasn’t been configured properly, or hasn’t even been set up at all. Or there’s been a bad break in the process somewhere. That ain’t cool.

So what should I do?

Take these steps to keep your infra teams on their toes. They’re providing you a platform; you’re entitled to ask. They might not like it, but stick to your guns – you’ll be the one who gets it in the neck if your Tableau Server goes down.

  • Ask for a breakdown of the infra monitoring thresholds – What’s the polling cycle for alerts? What thresholds are being monitored? Who decided them and why?
  • Ask for a process flow – What happens when an alert is generated? Where does it go? How long does it take for someone to get on it? How is root cause followed up?
  • Ask to have visibility of the infra changes – If there are changes going on to the environment that might affect your server, make sure you get notified. Make sure you attend the appropriate change management meetings so you know what’s going on.
  • Ask for a regular report on server performance – There will probably be a tool on the server that logs time series data on server performance. That should be accessible to you as well as them. Chuck the data into Tableau and make it available to your users (see the sketch after this list for a DIY option).
  • Understand the infra team SLA – It’s important to realise that you are a customer of the infra teams. Ask them for a Service Catalogue document for the service that they are providing. Understand the SLA that they’re operating to. Don’t be out-of-order, but if you find they’re not giving you good service then don’t be scared to wave the SLA.
  • Ask for a report of successful backups – just as important as the monitoring itself.
  • Ask for the ICMP ping stats – How many packets get lost in communications with your Tableau server? How many times does it drop off the network?
  • Be nice – The infra teams in big orgs have a tough job. They’ll have no money and little resource. Cut them some slack and don’t be a prat if they let you down occasionally. It happens.
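
On that server performance point: if nothing is set up, Windows can log the counters itself. A sketch using the built-in typeperf tool – the counter choice, interval and output path are mine, not a standard (note the doubled %% needed inside a .bat file):

rem Log CPU, memory and D: free space every 60s, 1440 samples (24 hours), to a CSV for Tableau
typeperf "\Processor(_Total)\%% Processor Time" "\Memory\Available MBytes" "\LogicalDisk(D:)\%% Free Space" -si 60 -sc 1440 -f CSV -o D:\perfdata\tableau_perf.csv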

Start with that lot. Your users will also love it if you can make this information available to them. Again, it inspires confidence that you know what you’re doing.

OK that’s it for infrastructure monitoring. Next up I’ll dive into how you monitor your Tableau Server application.

Cheers, Paul

Building a Tableau Centre of Excellence – Additional Resources

Hi

If you’re reading this then the chances are you attended my talk at Tableau Conference 2014. I hope you enjoyed what I had to say. I certainly enjoyed delivering it. As mentioned in the presentation, this blog post lists all the resources referred to in the talk.

Link to Presentation – on Prezi

In order of reference

Grab me anytime @paulbanoub if you’d like a chat about anything.

Cheers, Paul

5 Ways to Create a BI Centre of Excellence in the Enterprise

Hello all – I hope I find you well,

I’m delighted to have been invited to speak at the Information Age Data Leadership 2014 event in October. In my session I’ll be sharing tips for building a BI Centre of Excellence (COE) in an enterprise environment, based upon my experience of constructing and managing IT services at big enterprises for the last 12 years.

I’m currently in the process of helping to construct a data visualisation COE based on Tableau Software at a Tier 1 Investment Bank in London.

To give a flavour of what I’ll be speaking about, I’ve identified five areas fundamental to creating a successful Centre of Excellence. If you have any questions before then, ping me on Twitter @paulbanoub.

Choose the right tools

There are a ton of tools out there, and a lot of them aren’t that great. Enterprise users are time-poor, under constant pressure to deliver and generally impatient. For the long-term success of any COE it is fundamental that your applications are easy to use, agile and feature-rich.

I’ve been trying to achieve the holy grail of a great BI stack for years now, and finally it seems like tools are emerging that allow this vision to be a reality.

When evaluating applications, always look for agility and ease of use. Most of your users won’t have much time to learn the tool; they probably won’t read much of the documentation and probably won’t have time to attend any training courses. They’ll want to fire up the application and dive right in, so that experience needs to be great from the off. Then, once they’re running with it, can they generate their content or achieve their desired results quickly? They’ll generally be happy to trade off some of the more advanced functionality for a tool that gives rapid results.

Choose the right partners

It’s not just about the application. Is the vendor able to support your vision? Ensure your tool choice is backed up by a company that is dynamic, proactive and truly values its user community.

How does the company conduct itself? Do you as a subject matter expert feel that your opinions matter? If you’ve got an issue, can you get it to the people that matter quickly? And will they take notice of you? With truly great companies you’ll find yourself getting to know the top brass and support teams. You’ll be participating in industry events and being asked to share knowledge with other customers; you might even get an award or two from them.

With the best organisations, you’ll see enhancement requests from user forums making it into new releases regularly. You’ll see offers from them to come to your organisation and help with training, demos and Q&A sessions, and they’ll be constantly interested in how you’re using the tool and the value you’re getting. Bad companies will just sell it to you and then go quiet.

Build your service for ease

Your service must deliver on two key fronts. Firstly, it must allow users to express themselves, without smothering them in red tape. Secondly, it must be as easy as possible to support. Making both central to your service construction will give the best possible chance for success.

Big enterprises generally feature a lot of bureaucracy.  Users will already be dealing with enough of that on a daily basis and won’t want your service adding to it. It’s critical to be able to deliver a service that gets users onboarded quickly and with little fuss. Then once they’re onboard it’s vital that your service allows them to use the functionality of the tool quickly, easily and with as much flexibility as possible. There’s no point implementing a cool, agile BI tool and then miring your users in process.

That service also needs to be supportable. Chances are your support team will be light on bodies and pretty much flat out the whole time. To be a true Centre of Excellence you’ll want your team to be focusing on the good stuff, helping users get the best out of the tool, training people in advanced functionality and focusing on the industry best practice of the subject area. To do that you’ll need to have chosen the right infrastructure and technologies and implemented them well, supported with solid but agile IT processes.

Don’t sit back and admire

So you’ve got a great service? Don’t sit back and think how great you are. Your power users will be wanting more and more. It’s vital to have an overall BI vision. How are you going to expand your offerings and deliver even more value to your users?

I’m creating a Tableau COE. That takes care of data visualisation. But what about data modelling & management? Data integration and mining? They all form part of the overall BI stack and your users will want that. Maybe not immediately, but they’ll eventually ask the questions and to remain in control you’ll need your master plan.

Focus on community

Really successful applications and companies are backed by an almost fanatical level of community support. Making the most of this, both internally and externally, can turn a good service into an amazing one.

Creating a great community takes a lot of dedication. Obviously having the right tool and implementing it well is fundamental. It’s a lot easier to foster a culture of appreciation with a tool that users love to work with than with a turkey that makes their lives harder than they should be.

But get it right and you’ll see the benefits. Users will be blogging and discussing the merits of your latest functionality releases as well as suggesting their own enhancement requests. Brilliant blogs will spring up, guiding newbies and experts alike on how to get the best out of the tool and much more. This can all be replicated internally as well as externally.

Hopefully I’ll see some familiar faces at the session in October.

Best wishes, Paul

How UBS Created a Tableau Centre of Excellence – Additional Resources

Hi

If you’re reading this then the chances are you’ve just watched my talk at the London Tableau on Tour Conference in July 2014. I hope you enjoyed what I had to say. I certainly enjoyed delivering it. As mentioned in the presentation, this blog post lists all the resources referred to in the talk.

Link to Presentation – on Prezi

In order of reference

Grab me anytime @paulbanoub if you’d like a chat about anything.

Cheers, Paul