You may know me as a Tableau Centre of Excellence manager. That can involve a lot of paperclip pushing skills, with the real work being done by my excellent team (thx @jakesviz & The Information Lab). But I do try and get down and dirty with my lovely Tableau Server environment to keep my skills fresh. Obviously I don’t mess with it – @jakesviz gets pretty protective about his Server.
This series of posts is my attempt to shed some light on the internals of Server. Note there are many more experts in this field than me (Craig Bloodworth, Mark Jackson, Jen Vaughan, Tamas Foldi, Mike Roberts, Angie Greenhaw – to name but a few) so please do comment if anything is incorrect here. Maybe you guys could help me evolve this post?
What is the backgrounder?
The backgrounder is a process that runs as part of the Tableau Server application. As the name suggests, it handles background tasks such as refreshing extracts, running subscriptions and processing tasks initiated from tabcmd.
The backgrounder is made up of two files: the .exe file and the .war file. A WAR file is a Web Application Archive, and contains all the components and resources needed by a web application such as the backgrounder.
On a clustered environment you’ll find these files in D:\Program Files\Tableau\Tableau Server\worker.1\bin (may vary slightly with your installation).
Other files related to the backgrounder.
Template (.templ) files – These files TBD
There are also a few .rb files in D:\Program Files\Tableau\Tableau Server\worker.1\tabmigrate\db\migrate.
And we also have a .properties file which contains all the config entries relevant to the backgrounder. It also has almost all of the other stuff that you’d find in the main workgroup.yml file which is odd. I’d have expected it to be just the backgrounder config.
Here is the location of the backgrounder log files
Here are 2 instances of backgrounder.exe running on my server (from Task Manager).
Can I mess with it?
Backgrounder can be configured. There are several settings present in both workgroup.yml and backgrounder.properties. Workgroup.yml is the master config file, and it populates the backgrounder.properties (and other .properties files) when a ‘tabadmin configure’ is run.
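I don’t know what all of these do (yet) and the only one I’ve ever edited is ‘backgrounder.extra_timeout_in_seconds’, which sets how many seconds beyond ‘backgrounder.querylimit’ (the max time allowed for a single extract refresh) a background job may run before it is cancelled. Between them, these two settings cap how long a backgrounder task can run, and Tableau kills the task once that threshold is reached. Useful for forcing users to optimise their extract times!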
I also pay attention to the ‘backgrounder.vmopts’ parameter, as this defines the size of the Java heap space for this component. All components have a vmopts setting and I’ve had to increase them on occasion due to out of memory problems.
You may also want to change the ‘backgrounder.log.level’ if you need more debug info, although Tableau logs are chatty enough for me.
If there’s a golden parameter in this lot that you get value from then let me know in the comments.
backgrounder.deploy.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/backgrounder
backgrounder.log.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/logs/backgrounder
backgrounder.timeout_tasks: refresh_extracts, increment_extracts, subscription_notify, single_subscription_notify
backgrounder.vmopts: -XX:+UseConcMarkSweepGC -Xmx512m
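If you want to inspect entries like these from a script rather than by eye, here is a minimal Python sketch. It assumes the settings are available as plain "key: value" lines exactly as quoted above (the real files hold many more entries and may be laid out differently), and the function names are made up for illustration.

```python
import re

# The four entries quoted in this post, as "key: value" lines.
SAMPLE = """\
backgrounder.deploy.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/backgrounder
backgrounder.log.dir: D:/Program Files/Tableau/Tableau Server/data/tabsvc/logs/backgrounder
backgrounder.timeout_tasks: refresh_extracts, increment_extracts, subscription_notify, single_subscription_notify
backgrounder.vmopts: -XX:+UseConcMarkSweepGC -Xmx512m
"""


def parse_settings(text):
    """Split each line on the first ': ' only, so paths like 'D:/...' survive."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(": ")
        settings[key] = value
    return settings


def max_heap_mb(vmopts):
    """Return the -Xmx value in megabytes, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([mMgG])", vmopts)
    if match is None:
        return None
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


settings = parse_settings(SAMPLE)
timeout_tasks = [t.strip() for t in settings["backgrounder.timeout_tasks"].split(",")]

print("Max heap (MB):", max_heap_mb(settings["backgrounder.vmopts"]))
print("Tasks subject to the timeout:", timeout_tasks)
```

This only reads the values. It is a quick way to check the heap size or which task types the timeout applies to before deciding whether anything needs changing.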
Note that Tableau Support don’t like you to edit config files manually, they recommend that you use the tabadmin set commands to change any parameters. They might have to change that recommendation when we see Tableau Server on Linux.
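As a rough illustration of the tabadmin route, the Python sketch below only builds the sequence of commands and prints them; it does not run anything. The stop / set / configure / start order is my understanding of the usual procedure, the value 3600 is just an example, and the helper name is hypothetical, so check the sequence against the documentation for your version before using it.

```python
def tabadmin_change_plan(option, value):
    """Return the tabadmin commands (as argument lists) for changing one setting.

    Assumption: the server is stopped before the change and reconfigured
    afterwards. Nothing is executed here.
    """
    return [
        ["tabadmin", "stop"],
        ["tabadmin", "set", option, str(value)],
        ["tabadmin", "configure"],
        ["tabadmin", "start"],
    ]


plan = tabadmin_change_plan("backgrounder.extra_timeout_in_seconds", 3600)
for command in plan:
    print(" ".join(command))
```

Keeping each command as an argument list means the same plan could later be handed to something like subprocess.run, but printing it first is a safer way to review what would happen.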
For more about .templ, Ruby & properties files check out this from Tamas Foldi.
What are the problems with the backgrounder?
Here are some of the things that can be problematic with the backgrounder.
- Single Threaded – This means the backgrounder process can only run one thread at a time, a thread being a set of executable instructions that a process can perform. The upshot of this is that your backgrounder works through a queue of tasks one-by-one.
- Latency – Due to the single threaded nature of backgrounder, you may see delays or ‘latency’. For example, if you have one backgrounder, and 2 tasks for it to perform at 2am, then task 2 will have to wait until task 1 has finished. If task 1 takes an hour then task 2 won’t start until 3am. This can be annoying for users that expect their data to be refreshed by a certain time.
- Resource Intensive – The backgrounder can consume a significant amount of processing power (CPU) and input / output (I/O) on your server. This is dependent on the type of task it is performing. It’s not uncommon to see a backgrounder node consuming 100% CPU.
- Other Stuff – The backgrounder process also does other stuff on Tableau Server that isn’t concerned with extract refreshes. For example – reaping extracts, checking disk space, synching Active Directory groups, rebuilding the search index etc. Bear that in mind when building your system. In reality these tasks don’t take up too much resource but they do take some and you should be aware.
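The latency example above can be sketched in a few lines of Python. This is a toy model, not how Tableau actually schedules work: it assumes each backgrounder handles one task at a time and that tasks are picked up strictly in schedule order, ignoring priorities. Times are minutes after midnight.

```python
import heapq


def simulate(tasks, backgrounders=1):
    """tasks: list of (name, scheduled_minute, duration_minutes).

    Returns {name: (start_minute, finish_minute)}.
    """
    free_at = [0] * backgrounders  # min-heap: when each backgrounder is next free
    heapq.heapify(free_at)
    result = {}
    for name, scheduled, duration in sorted(tasks, key=lambda t: t[1]):
        available = heapq.heappop(free_at)  # the backgrounder that frees up first
        start = max(scheduled, available)   # wait if nobody is free yet
        finish = start + duration
        heapq.heappush(free_at, finish)
        result[name] = (start, finish)
    return result


def hhmm(minute):
    """Format minutes after midnight as HH:MM."""
    return f"{minute // 60:02d}:{minute % 60:02d}"


# Two tasks scheduled for 2am; task 1 takes an hour, task 2 takes ten minutes.
tasks = [("task 1", 120, 60), ("task 2", 120, 10)]

for count in (1, 2):
    print(f"{count} backgrounder(s):")
    for name, (start, finish) in simulate(tasks, count).items():
        print(f"  {name}: starts {hhmm(start)}, finishes {hhmm(finish)}")
```

With one backgrounder, task 2 cannot start until 03:00. With two, both tasks start at 02:00, which is the basic argument for running more than one backgrounder process.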
Isolating the backgrounders
A common configuration in a clustered environment is to dedicate one of your worker nodes to the backgrounder processes. This means you can dial up the number of backgrounders and let them do their stuff without worrying about any impact on other processes. This is one of the most common performance recommendations from Tableau support.
You can also get a lot of info out of the Postgres DB relating to backgrounder usage and performance. Nelson Davis has posted a guide to getting started here.
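To give a flavour of what that data can tell you, here is a hedged Python sketch that turns job rows into average queue wait and run time per job type. The table name (background_jobs) and the column names are assumptions on my part, so verify them against your own repository. The query is shown for reference only and is not executed, and the sample rows are made up.

```python
from datetime import datetime
from statistics import mean

# Assumed query against the Tableau repository; table and column names are
# assumptions. Connecting and fetching rows is deliberately left out.
QUERY = """
SELECT job_name, created_at, started_at, completed_at, finish_code
FROM background_jobs
WHERE created_at > now() - interval '1 day'
"""


def summarise(rows):
    """Average queue wait and run time (seconds) per job_name."""
    waits, runs = {}, {}
    for row in rows:
        if row["started_at"] is None or row["completed_at"] is None:
            continue  # still queued or still running
        wait = (row["started_at"] - row["created_at"]).total_seconds()
        run = (row["completed_at"] - row["started_at"]).total_seconds()
        waits.setdefault(row["job_name"], []).append(wait)
        runs.setdefault(row["job_name"], []).append(run)
    return {
        name: {"avg_wait_s": mean(waits[name]), "avg_run_s": mean(runs[name])}
        for name in waits
    }


# Made-up rows shaped like the query result above.
sample_rows = [
    {"job_name": "Refresh Extracts", "created_at": datetime(2015, 1, 1, 2, 0),
     "started_at": datetime(2015, 1, 1, 2, 0),
     "completed_at": datetime(2015, 1, 1, 3, 0), "finish_code": 0},
    {"job_name": "Refresh Extracts", "created_at": datetime(2015, 1, 1, 2, 0),
     "started_at": datetime(2015, 1, 1, 3, 0),
     "completed_at": datetime(2015, 1, 1, 3, 10), "finish_code": 0},
]

summary = summarise(sample_rows)
print(summary)
```

A long average wait relative to the run time suggests tasks are queuing for a free backgrounder rather than being slow in themselves.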
Improvements I’d like to see
OK, so here are a bunch of improvements I’d like to see around backgrounders and extract management. I know some of this will drop in upcoming releases and some of these problems have been solved with custom solutions at some customer sites.
Alerting – Tableau Server doesn’t alert (email / IM / SMS) when a task fails. This means you’ll need to set up external monitoring to detect issues. I know Tableau are on this though so expect to see it in an upcoming version. Some people in the community have also coded their own solutions to this problem but it really should be native functionality.
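As an idea of what a home-grown check might look like, here is a minimal Python sketch that builds an alert email from a list of job rows. It assumes each row has job_name, title and finish_code fields and that a finish_code of 0 means success; both are assumptions about the repository data, and the addresses are placeholders. The message is built but not sent.

```python
from email.message import EmailMessage


def failed_jobs(rows):
    """Rows whose finish_code is not 0 (assumed here to mean 'not successful')."""
    return [row for row in rows if row["finish_code"] != 0]


def build_alert(rows, sender, recipient):
    """Return an EmailMessage listing the failed jobs, or None if all succeeded."""
    failures = failed_jobs(rows)
    if not failures:
        return None
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"{len(failures)} Tableau background task(s) failed"
    message.set_content(
        "\n".join(f"{row['job_name']}: {row['title']}" for row in failures)
    )
    return message


# Made-up rows for illustration.
rows = [
    {"job_name": "Refresh Extracts", "title": "Sales extract", "finish_code": 1},
    {"job_name": "Refresh Extracts", "title": "HR extract", "finish_code": 0},
]

alert = build_alert(rows, "tableau-monitor@example.com", "admins@example.com")
if alert is not None:
    print(alert["Subject"])
    print(alert.get_content())
```

Sending the message (for example with smtplib) and running the check on a schedule are left out, since both depend on your own environment.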
Control per site – We segregate our dev / test / prod user environments using sites, all on the same server. We run 8 backgrounder processes on that server, which are shared across the tasks on all sites. As an administrator I’d really like to be able to bind backgrounder processes to specific sites. For example, 2 backgrounders on each of dev & test sites, then the other 4 dedicated to the production site. That would ensure production tasks always have enough resource to be able to execute on time.
Control per process – I’d like to be able to stop / pause / mess with individual backgrounder processes easily. It is possible – see this from Toby Erkson, but it would be good to have this as part of an administrator console or something.
Control per type / size / pattern of extracts – It would be good if I could dedicate specific backgrounder processes to particular extracts based on their characteristics. In particular I’d like to allocate one backgrounder to all the extracts that take less than 1 minute to complete. Or even use this to reward users that show diligence with their extract management by dynamically prioritizing incremental refreshes or extracts that have a low failure rate.
Better metrics – I’d like to see exactly how much CPU is taken up by a particular backgrounder process or task / schedule or per project. This would be useful for chargeback.
Dynamic reprioritisation – I love all my users. But in particular the ones that take good care of their extract refreshes. I’d like Tableau to be able to dynamically increase the priority of tasks that complete quickly, are incremental and that have a low failure rate. The message being, if you want your stuff to get the best slice of available resource then help us out with best practice.
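To make the idea concrete, here is a purely hypothetical scoring function in Python. Nothing like this exists in Tableau Server as far as I know. The thresholds and weights are invented, and it assumes priorities run from 1 to 100 with lower numbers served first.

```python
def suggested_priority(avg_minutes, incremental, failure_rate, base=50):
    """Return a priority from 1 to 100, where a lower number runs sooner.

    avg_minutes  -- typical run time of the refresh
    incremental  -- True if the refresh is incremental
    failure_rate -- share of recent runs that failed, from 0.0 to 1.0
    All adjustments below are illustrative guesses, not tuned values.
    """
    priority = base
    if avg_minutes < 1:
        priority -= 20  # very quick refreshes jump the queue
    elif avg_minutes < 10:
        priority -= 10
    if incremental:
        priority -= 10
    priority += round(failure_rate * 40)  # flaky refreshes drop back
    return max(1, min(100, priority))


print(suggested_priority(avg_minutes=0.5, incremental=True, failure_rate=0.0))
print(suggested_priority(avg_minutes=90, incremental=False, failure_rate=0.5))
```

The point is only that run time, refresh type and failure rate are all things the server already records, so a rule like this could in principle be applied automatically.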
Disable run now – We’ve had some issues with the “run now” option that allows users to kick off an extract refresh on-demand using the UI. In particular we’ve seen some trigger-happy users bring our server down by hammering on the run now option multiple times. I’d like to disable that or maybe throttle it somehow.
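One way a throttle could work is a simple cooldown per user and extract. The Python sketch below is hypothetical: the class name, the ten-minute cooldown and the idea of keying on (user, extract) are my own illustration, not a Tableau feature.

```python
class RunNowThrottle:
    """Allow one 'run now' per (user, extract) within a cooldown window."""

    def __init__(self, cooldown_seconds=600):
        self.cooldown = cooldown_seconds
        self._last_allowed = {}  # (user, extract) -> time of last accepted request

    def allow(self, user, extract, now):
        """Return True if the request should go ahead; `now` is in seconds."""
        key = (user, extract)
        last = self._last_allowed.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down, ignore the click
        self._last_allowed[key] = now
        return True


throttle = RunNowThrottle(cooldown_seconds=600)
clicks = [0, 5, 10, 700]  # seconds at which one user clicks "run now"
decisions = [throttle.allow("alice", "Sales extract", t) for t in clicks]
print(decisions)
```

Passing the current time in as an argument keeps the sketch easy to test. A real implementation would read the clock itself and would probably need a limit across all users as well.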
Better guidelines from Tableau – The documentation from Tableau isn’t great in this area.
I have an 8-core, 128GB server and run 8 backgrounders with no capacity issues. And I know of other organisations running way more than that. According to this doc I should be running between 2 and 4. That would be some serious under-utilization of the server. I’d like to see some clearer recommendations, maybe taking into account the variety of use cases that I’ve seen in other big enterprise deployments.
OK that’s all for now. This post could have been more detailed but I figure that I’ll get some valuable inputs from the community that will help me expand it. Actually in the time I was writing this, Mike Roberts was doing the same – check his post for some excellent info.