What to name your Tableau Server?

Hello all. Yes, contrary to popular belief, I did indeed survive Tableau Conference 2017. I’ll admit it was a close thing, but I got through it!

Now back to business..

This one all started with a tweet from my good pal, the Wizard of Excel (and Tableau genius) Mr Dan Harrison (@danosirra) who was wondering what to name his Tableau server host.

This awakened the server admin in me (not that it’s ever asleep).. It also made me question Dan’s taste in music.

An example being…

And this is the point. Server naming is critical in any organisation, large or small.

Picture the situation – you get woken up at 2am by an alert or a call from an operator – server “superman” is down. Fine if you’re experienced and know the environment inside out, but what if you’ve only just joined the company. You know the servers are named after superheroes, but that’s it. Was the production server superman or spiderman? Or was it WonderWoman? Do you need to take action? It’s not clear and you need to dig into a server inventory system to get more info, and that takes time. And you’re new, so you can’t remember the URL for the inventory system.

You assume superman is your production server. But it’s not reachable via console, and you need to call someone to get into the datacenter to fix it. So you call the hardware team – sure they’ll go in, but they need to know if the server is in the London or the Birmingham datacenter? Errrr…

So you dispatch Server 5’O to fix the box, and then the next day you find out it’s a development machine and could wait. What a waste of time.

So you get the problem here. It’s very tempting to name servers after Simpsons characters or animals (I’d choose sharks obviously), but those names convey no organisational information and as such, are useless.

So what should you do?

There are tons of different naming conventions you can use. Here’s one that I like

When I get that call in the middle of the night I (or the hardware technician) will want to know..

  • Where is the server?
  • Is it production, Disaster Recovery, UAT or development?
  • What application is it? Tableau, obviously.
  • Which worker of my Tableau cluster is it?
  • What operating system is it?

So I get called and am told server lnpwtsw3.company.com is down. Let me think..

  • ln – London
  • p – Production
  • w – Windows
  • ts – Tableau Server
  • w3 – Worker 3

And with that info, I know it needs acting on now, which platform admin to call if it’s an OS issue, and where to send the hardware technician if it needs physically addressing. I also know which worker it is so what the potential impact to my cluster is.

Yes the server name is a little harder to remember, but that soon comes and it is possible to create a DNS alias to reflect a friendlier name if needed. There’s also a balance to be had in terms of character count etc.

Another example – this time server name is lndwtsw1

  • ln – London
  • d – Development
  • w – Windows
  • ts – Tableau Server
  • w1 – Worker 1

Fine, that’s development. I’ll go back to sleep and deal with it in the morning.

So as you can see, it’s pretty important. Not just for me and my Tableau environment, what if you’re a server admin in charge of 5000 servers across the globe? This needs to be right!

Anyway, that’s it. Pretty simple. For the record, Dan made his choice…

See you next time, Paul

Advertisements

Tableau Server – all about the… Backgrounder

Hello everyone.

You may know me as a Tableau Centre of Excellence manager. That can involve a lot of paperclip pushing skills, with the real work being done by my excellent team (thx @jakesviz & The Information Lab). But I do try and get down and dirty with my lovely Tableau Server environment to keep my skills fresh. Obviously I don’t mess with it – @jakesviz gets pretty protective about his Server.

This series of posts is my attempt to shed some light on the internals of Server. Note there are many more experts in this field than me (Craig Bloodworth, Mark Jackson, Jen Vaughan, Tamas Foldi, Mike Roberts, Angie Greenhaw – to name but a few) so please do comment if anything is incorrect here. Maybe you guys could help me evolve this post?

What is the backgrounder?

The backgrounder is a process that runs as part of the Tableau Server application. As the name suggests, it handles background tasks such as refreshing extracts, running subscriptions and also processes tasks initiated from tabcmd.

Here are the backgrounder processes. The .exe file and the .war file. The WAR file is a Web Application Archive, and contains all the necessary components and resources needed for a Web Application such as Backgrounder.

On a clustered environment you’ll find these files in D:\Program Files\Tableau\Tableau Server\worker.1\bin (may vary slightly with your installation).

28-01-2016 14-07-52

Other files related to the backgrounder.

Template (.templ) files – These files TBD

28-01-2016 14-13-09

There are also a few .rb files in D:\Program Files\Tableau\Tableau Server\worker.1\tabmigrate\db\migrate.

And we also have a .properties file which contains all the config entries relevant to the backgrounder. It also has almost all of the other stuff that you’d find in the main workgroup.yml file which is odd. I’d  have expected it to be just the backgrounder config.

backprops

Here is the location of the backgrounder log files

backlogs

Here are 2 instances of backgrounder.exe running on my server (from Task Manager).

28-01-2016 14-01-08

 

Can I mess with it?

Backgrounder can be configured. There are several settings present in both workgroup.yml and backgrounder.properties. Workgroup.yml is the master config file, and it populates the backgrounder.properties (and other .properties files) when a ‘tabadmin configure’ is run.

I don’t know what all of these do (yet) and the only one I’ve ever edited is ‘backgrounder.extra_timeout_in_seconds‘ which sets the max time in seconds that a backgrounder session can run for. Tableau kills off the session if this threshold is reached. Useful for forcing users to optimise their extract times!

I also pay attention to the ‘backgrounder.vmopts‘ parameter, as this defines the size of the java heap space for this component. All components have a vmopts setting and I’ve had to increase them on occasion due to out of memory problems.

You may also want to change the ‘backgrounder.log.level‘ if you need more debug info, although Tableau logs are chatty enough for me.

If there’s a golden parameter in this lot that you get value from then let me know in the comments.

backgrounder.deploy.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/backgrounder
backgrounder.external_cache.concurrency_limit: 10
backgrounder.external_cache.enabled: true
backgrounder.external_cache.num_connections: 1
backgrounder.external_native_query_cache_disable: true
backgrounder.extra_timeout_in_seconds: 1800
backgrounder.failure_threshold_for_run_prevention: -1
backgrounder.jdbc.wg.connections: 8
backgrounder.jdbc.wg.idle_connections: 4
backgrounder.log.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/logs/backgrounder
backgrounder.log.level: info
backgrounder.native_api.log.level: info
backgrounder.out_of_date_schedule_minutes: 240
backgrounder.ping_dataengine.millis_to_wait: 5000
backgrounder.ping_dataengine.num_retries: 24
backgrounder.ping_services.num_retries: 10000000
backgrounder.ping_services.time_to_wait: 5000
backgrounder.purge_directories.directories: ""
backgrounder.querylimit: 7200
backgrounder.restart_interval_in_minutes: 480
backgrounder.restrict_serial_collections_to_site_level: true
backgrounder.search_index_verification.enabled: true
backgrounder.sheet_image_api.max_age: 240
backgrounder.sleep_interval: 10
backgrounder.sort_jobs_by_run_time_history_observable_hours: -1
backgrounder.sort_jobs_by_type_schedule_boundary_heuristics_milliSeconds: -1
backgrounder.timeout_tasks: refresh_extracts, increment_extracts, subscription_notify, single_subscription_notify
backgrounder.tomcat.threads: 4
backgrounder.urlprefix: backgrounder
backgrounder.vmopts: -XX:+UseConcMarkSweepGC -Xmx512m

Note that Tableau Support don’t like you to edit config files manually, they recommend that you use the tabadmin set commands to change any parameters. They might have to change that recommendation when we see Tableau Server on Linux.

For more about .templ, Ruby & properties files check out this from Tamas Foldi.

 

What are the problems with the backgrounder?

Here are some of the things that can be problematic with the backgrounder.

  • Single Threaded – This means the backgrounder process can only run one thread at a time, a thread being a set of executable instructions that a process can perform. The upshot of this is that your backgrounder works through a queue of tasks one-by-one.
  • Latency – Due to the single threaded nature of backgrounder, you may see delays or ‘latency’. For example, if you have one backgrounder, and 2 tasks for it to perform at 2am, then task 2 will have to wait until task 1 has finished. If task 1 takes an hour then task 2 won’t start until 3am. This can be annoying for users that expect their data to be refreshed by a certain time.
  • Resource Intensive – The backgrounder can consume a significant amount of processing power (CPU) and input / output (I/O) on your server. This is dependent on the type of task it is performing. It’s not uncommon to see a backgrounder node consuming 100% CPU.
  • Other Stuff – The backgrounder process also does other stuff on Tableau Server that isn’t concerned with extract refreshes. For example – reaping extracts, checking disk space, synching Active Directory groups, rebuilding the search index etc. Bear that in mind when building your system. In reality these tasks don’t take up too much resource but they do take some and you should be aware.

 

Isolating the backgrounders

A common configuration in a clustered environment is to dedicate one of your worker nodes to the backgrounder processes. This means you can dial up the number of backgrounders and let them do their stuff without worrying about any impact on other processes. This is one of the most common performance recommendations from Tableau support.

You can also get a lot of info out of the Postgres DB relating to backgrounder usage and performance. Nelson Davis has posted a guide to getting started here.

 

Improvements I’d like to see

Ok so here are a bunch of improvements I’d like to see to the topic of backgrounders and extract management. I know some of this will drop in upcoming releases and some of these problems have been solved with custom solutions at some customer sites.

Alerting – Tableau Server doesn’t alert (email / IM / SMS) when a task fails. This means you’ll need to set up external monitoring to detect issues. I know Tableau are on this though so expect to see it in an upcoming version. Some people in the community have also coded their own solutions to this problem but it really should be native functionality.

Control per site – We segregate our dev / test / prod user environments using sites , all on the same server. We run 8 backgrounder processes on that server, which are shared across the tasks on all sites. As an administrator I’d really like to be able to bind backgrounder processes to specific sites. For example, 2 backgrounders on each of dev & test sites, then the other 4 dedicated to the production site. That would ensure production tasks always have enough resource to be able to execute on time.

Control per process – I’d like to be able to stop / pause / mess with individual backgrounder processes easily. It is possible – see this from Toby Erkson, but it would be good to have this as part of an administrator console or something.

Control per type / size / pattern of extracts – It would be good if I could dedicate specific backgrounder processes to particular extracts based on their characteristics. In particular I’d like to allocate one backgrounder to all the extracts that take less than 1 minute to complete. Or even use this to reward users that show diligence with their extract management by dynamically prioritizing incremental refreshes or extracts that have a low failure rate.

Better metrics – I’d like to see exactly how much CPU is taken up by a particular backgrounder process or task / schedule or per project. This would be useful for chargeback.

Dynamic reprioritisation – I love all my users. But in particular the ones that take good care of their extract refreshes. I’d like Tableau to be able to dynamically increase the priority of tasks that complete quickly, are incremental and that have a low failure rate. The message being, if you want your stuff to get the best slice of available resource then help us out with best practice.65

Disable run now – We’ve had some issues with the “run now” option that allows users to kick off an extract refresh on-demand using the UI. In particular we’ve seen some trigger-happy users bring our server down by hammering on the run now option multiple times. I’d like to disable that or maybe throttle it somehow.

Better guidelines from Tableau – The documentation from Tableau isn’t great in this area.

05-05-2016 13-09-11

I have an 8 core 128GB server and  run 8 backgrounders with no capacity issues. And I know of other organisations running way more than that. According to this doc I should be running between 2-4. That would be some serious under-utilization of the server. I’d like to see some clearer recommendations, maybe taking into account the variety of use cases that I’ve seen in other big enterprise deployments.

 

OK that’s all for now. This post could have been more detailed but I figure that I’ll get some valuable inputs from the community that will help me expand it. Actually in the time I was writing this, Mike Roberts was doing the same – check his post for some excellent info.

Cheers, Paul

 

How to PROPERLY Back Up Your Tableau Server

Hello there,

Whadda ya mean you didn't take a backup?

Whadda ya mean you didn’t take a backup?

Time for another post about Tableau Server and how to get the best out of it in a large-scale, enterprise deployment situation.

Today we are focusing on how to PROPERLY back up your Tableau Server installation.

Like many aspects of enterprise services, this is a simple concept, but one that if you get wrong, can spell disaster. It always amazes me how many people / organisations don’t do this properly or even at all.

You know how annoyed you get when your mum tells you she isn’t backing up all her family photos – well that’s what I get like when I see IT systems neglecting backups.

Note this post refers to a standalone Tableau installation with a manual failover to DR. We don’t yet have a clustered environment. I’ll update the post with considerations when we implement that.

 

What’s a backup?

Seems a simple question, and there are a number of different types of backups that you can take, each useful in different situations. Here’s what I’ve got in place:

 

 

Full System Backup

This is a complete dump of the server filesystems to disk (or tape – there’s still plenty of tape backup infra out there). Most likely it will be one of the big vendor products that look like the mothership from Close Encounters of the Third Kind.

Your full system backup should be set up by your server team when you get your machine. However, the principle of “trust no-one” applies here as always and it’s up to you to check the following:

  • Have the backups been set up at all?
  • Are they backing up all the filesystems? – Many times I’ve seen that only operating system partition backups have been set up, and I’ve had to request the application partitions be included.
  • Have the backups been succeeding? – Get your backup team to send you a report every month of backup completion. They don’t always succeed and you probably won’t be told that there has been a failure.
  • If you need to perform a restore, do you know the process and how long does it take?

If you get the okay on that then you’re good. But only as an insurance policy. Full system backups can take a long time to restore, and may only be weekly so you could end up losing data even if these are in place. It’s up to you to ensure you’re covered rather than rely on other teams doing things correctly.

 

 

Nightly Tableau Backup

There’s no excuse for not having this in place. It’s easy to set up and it is a case of when rather than if it saves your ass.

The tabadmin backup command gets Tableau Server to dump all content & configuration to a single .tsbak file. You don’t have to stop the server to do this and it doesn’t seem to impact performance too much while it is running so this should be the first backup you configure.

A simple script like this will do the job.

@echo OFF
set Binpath="D:\Program Files\Tableau\Tableau Server\9.0\bin"
set Backuppath="D:\Program Files\Tableau\Backups\nightly"
echo %date% %time%: *** Housekeeping started ***

tabadmin backup %Backuppath%\ts_backup_ -d
timeout 5

tabadmin cleanup

move "D:\Program Files\Tableau\Backups\nightly\*" \\\tableau_shr\backups\nightly\
echo %date% %time%: *** Housekeeping completed ***

The tabadmin backup command does the actual work here, dumping everything to a file. Always a good idea to run tabadmin cleanup afterwards to remove logs etc.

We run this script at a quiet time for the server (not that there is one in my global environment). We use the Windows Scheduler on the server but I’d recommend using a decent scheduler like Autosys or whatever your enterprise standard is as WTS is pretty poor.

IMPORTANT: You may have noticed the move command at the end there. That takes our newly created backup file and moves it OFF THE SERVER to a share drive accessible by my backup server. Why? Well what happens if you lose the server and your backup file is on it? You may as well have no backup. So move it somewhere else.

Update – this tip actually saved my ass this week when we lost our entire VM cluster (er.. hardware team – *cough* – what’s going on??) . We were able to failover to the backup server successfully. Going forward we will be soon implementing Tableau’s High Availability capability.

Do make sure you rotate your backup files with a script that deletes the old files or your share drive will fill up. I keep 4 days worth, just in case the current file is somehow corrupted – rare but can happen.

 

 

Weekly Restart

You may know I’m not a fan of running enterprise apps on Windows. I prefer Linux for a number of reasons that I’m not going to go into here. I know many users want Tableau Server on Linux, and the amazing Tamas Foldi has only gone and written it himself – so one day we may see it.

Anyway, with Windows apps I always build in a weekly application restart. In our case every Saturday morning. That involves a server reboot (to clean out any OS related temp stuff), application restart and a tabadmin cleanup. The tabadmin cleanup with the server stopped has the added bonus of clearing out the temp files (doesn’t happen when the server is running). These files can get pretty big so worth clearing out.

 

 

Virtual Machine Snapshots

If you’re running on a VM then you may be able to utilise the VM snapshot facility. Contact your VM admins for details. I’ve not needed to implement this but I know some that do. VM snapshots are super handy.

Do be aware that Tableau don’t seem to support this though..

Screen Shot 2015-11-11 at 19.11.48

 

 

Config File Backup

Sometimes it’s handy to just back up your Tableau Server config. I’ve got a script that grabs all the .yml and template files in my Server directory, zips them up and moves them off the server. Pretty useful to refer back to old config settings if you need to. Make sure you include workgroup.yml.

If you’re being really good then you’ll be checking your config files into a revision control repo like SVN.

 

Site Specific Backups

Tableau Server allows you to backup per site. This doesn’t give me much extra but I know in orgs that have lots of sites, or a site per team / business unit it can be very handy.

One thing that isn’t great about exporting a site is that the site is locked and inaccessible as the export is taking place. See Toby Erkson’s blog for more info on exporting a site.

 

 

Backup File Size & Duration

As your environment grows you’ll need to be mindful of the size of your backup file. Mine is around 16GB and takes well over an hour to write. Takes about 25 mins to restore. You’ll need to understand those numbers as your system matures.

unnamed

Backup files can get pretty big

Another variable that can affect backup time is the specification of your primary server. If your primary is low spec then you’re gonna get a longer time to write a backup. I don’t have any stats on that but I know it is true. Contact Jeff Mills of Tableau if you want more info on Weak Primaries & backup times.

 

 

Backup Your Logs

Less important this, but handy to do on a weekly basis is to zip up your logs. We have a much better solution for logfile management using Splunk – you’ll see a blog about that in the future.

 

 

The Most Important Bit – TEST YOUR BACKUPS

OK so you’re backing up like a man / woman possessed? Fine. You’re only as good as your last restore. So TEST your backups periodically. Files get corrupted and you don’t want to be discovering that your only backup is broken when you need it.

OK that’s it. Backups can save your life – don’t ignore them. Paranoia is king in IT!

Cheers, Paul

Talkin’ bout a Revolution…

Hello, I trust you’re all ok,

There’s something stressing me about Tableau. It might not be the most obvious but I’ll have a go at describing it anyway.

See I demo Tableau Server all the time. Like daily. To some pretty senior people that I really want to impress. These demos often involve clicking around the server views to show some of the user content. Trouble is some of my user content isn’t that great. It can be badly designed and slow to load – that’s a separate issue that my team is working on.

So I click on the view and then, there it is…..

30x30REV

Spin, spin, spin. Will it be 2 seconds or 20? Will it even load at all? The room falls quiet as all eyes settle on the spinning circle. The audience is almost hypnotised. The tension grows, until the view pops into life (or occasionally doesn’t). It’s the moment I dread as I know everyone is staring at the circle, waiting. Seems like it makes 5 seconds feel like 50.

So here are my issues with this.

  • Positioning – The spinner is bang in the middle of the screen, you can’t avoid it. It grabs your attention and also the attention of the room. Everyone starts watching it whether they want to or not.
  • Information – The spinner doesn’t give any indication of the progress of the operation. I’m not sure how it can, given the nature of the underlying queries but it’s still a problem.
  • Inconsistency – Often the spin rate slows slightly just prior to completion. But sometimes it speeds up again. So I think it’s about to complete, then it carries on. I know it’s just an animated gif but it still seems to occur occasionally.
  • Errors – I’ve had occasions when the connection is lost with the server for whatever reason, but the circle keeps going. That’s very misleading.

I think everyone accepts that applications which perform loading operations will generally have some sort of indicator. But I think the spinning circle can be improved.

facebook_standard_loading_animation

Users blamed the system

facebook_custom_loading_animation

Users blamed the app

The psychology of waiting is certainly an interesting subject. This post refers to a study conducted by Facebook that seems to suggest the type of indicator used can affect how the user perceives the problem. Admittedly the post doesn’t provide an accurate source for this but I thought it interesting nonetheless.

.

Screen Shot 2015-01-09 at 22.57.05

There are many recommendations to making the waiting game feel less of a drag

There’s plenty of chat on how to make that waiting game feel less of a drag to users. This post from UXMovement.com has a number of suggestions such as

  • Use backwards moving ribbings
  • Increase number of pulsations
  • Accelerate progress, avoid pauses

Also take a look at this article by Chris Harrison. The associated video shows the theory in action and there’s a detailed study of the theories available.

.

I wonder how much thought Tableau have given the format and positioning of the spinning circle as well as the science of perceived performance as opposed to actual performance. I think the consensus is that a progress bar is the best option but I know that won’t happen as Tableau can’t easily know the duration of a query. However, there are plenty of recommendations out there that may be worth considering. Why not make use of every trick in the book to make users feel they’re getting a faster experience?

There has been some chat on the Tableau forums about this (thanks @johncmunoz). It seems such an insignificant component but why not add as much polish to the tool as possible? Anyway, I’m no expert on this so maybe someone who knows the subject can supply more info.

Until then I’ll just think of appropriate 90’s synth-pop as the circle spins.

Regards, Paul