One of the most important parts of the development process is application deployment. There are many tools built to make this process easy and painless: from the simple Inploy to complex all-in-one Chef-based solutions. My tool of choice is Capistrano, a simple and incredibly flexible piece of software. Today I’m going to talk about some advanced Capistrano usage scenarios.
1. Graceful Passenger restarts
The Passenger user guide contains a simple Capistrano recipe for application server restarts. It works pretty well in almost all cases, but there is a huge problem with a multi-server setup: it restarts all Passenger instances at the same time, so all client requests will hang (or even be dropped) for the time needed to start your application. The simplest solution is to restart Passenger instances one by one with some shift in time (for example, 15 seconds; choose this value based on how long it takes to get your application up and running), so at any given moment only one of your application servers will be unavailable. In this case HAProxy (you use it, don’t you?) won’t send any requests to the restarting server, and most of your users will continue their work without any trouble.
Let me show you how we could achieve this:
```ruby
namespace :deploy do
  desc <<-EOF
    Graceful Passenger restarts. By default, it restarts \
    Passenger on all servers with a 15 seconds interval, but \
    this delay could be changed with the smart_restart_delay \
    variable (in seconds). If you specify 0, the restart will be \
    performed on all your servers immediately.

      cap production deploy:smart_restart

    Yet another way to restart Passenger immediately everywhere is \
    to specify the NOW environment variable:

      NOW=1 cap production deploy:smart_restart
  EOF
  task :smart_restart, :roles => :app do
    delay = fetch(:smart_restart_delay, 15).to_i
    delay = 0 if ENV['NOW']

    if delay <= 0
      logger.debug "Restarting Passenger"
      run "touch #{shared_path}/restart.txt"
    else
      logger.debug "Graceful Passenger restart with a #{delay} seconds delay"
      parallel(:roles => :app, :pty => true, :shell => false) do |session|
        find_servers(:roles => :app).each_with_index do |server, idx|
          # Calculate the restart delay for this server
          sleep_time = idx * delay
          time_window = sleep_time > 0 ? "after #{sleep_time} seconds delay" : 'immediately'

          # The restart command sleeps a given number of seconds and then touches the restart.txt file
          touch_cmd = sleep_time > 0 ? "sleep #{sleep_time} && " : ''
          touch_cmd << "touch #{shared_path}/restart.txt && echo [`date`] Restarted Passenger #{time_window}"
          restart_cmd = "nohup sh -c '(#{touch_cmd}) &' 2>&1 >> #{current_release}/log/restart.log"

          # Run the restart command on the matching server only
          session.when "server.host == '#{server.host}'", restart_cmd
        end
      end
    end
  end
end
```
The trickiest part is the `parallel` call. We use it to run all the restart commands in parallel, but it has a significant limitation: there is no way to substitute parts of a command on the fly based on the server it is going to be executed on. So instead we build a `session.when` condition for each server in the :app role and calculate the time shift based on the server’s index.
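Since the task reads the delay with fetch(:smart_restart_delay, 15), it can be tuned per stage. A minimal sketch (the stage file name is just an example):

```ruby
# config/deploy/production.rb: pick a delay that covers your application's startup time
set :smart_restart_delay, 30
```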
Sometimes it’s necessary to perform an immediate restart (for example, when a database migration breaks the old code). We use an environment variable for this: cap production deploy:restart NOW=1
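This works because deploy:restart can simply delegate to smart_restart. The original wiring is not shown in the post, so here is a hypothetical sketch of the glue task:

```ruby
namespace :deploy do
  # Hypothetical glue task (an assumption, not from the original recipe):
  # point the standard restart at the graceful version, so
  # `cap production deploy:restart NOW=1` goes through smart_restart.
  task :restart, :roles => :app do
    smart_restart
  end
end
```

With something like this in place, any hook that triggers deploy:restart gets the graceful behavior for free.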
2. Generating deployment stages on the fly in multi-stage environments
At Scribd we use a single QA box for testing, with multiple applications configured on it. The only difference between the corresponding deployment scripts is the application path (e.g. /var/www/apps/qa/01, /var/www/apps/qa/02, etc.) So how do we keep them DRY? At first we created a single deployment stage called qa and deployed with cap qa deploy QAID=1. It worked, but smelled bad. Today’s version is much more elegant, although it took some effort to implement:
```ruby
(1..10).each do |idx|
  qid = '%02d' % idx
  name = "qa#{qid}"
  stages << name

  desc "Set the target stage to `#{name}'."
  task(name) do
    location = fetch(:stage_dir, "config/deploy")
    set :stage, :qa
    set :qa_id, qid
    load "#{location}/qa"
  end
end

# This is the tricky part. We need to re-define the multistage:ensure callback
# (which simply raises an exception), so it will not be executed for our newly
# defined stages.
if callbacks[:start]
  idx = callbacks[:start].index { |callback| callback.source == 'multistage:ensure' }
  callbacks[:start].delete_at(idx)
  on :start, 'multistage:ensure', :except => stages + ['multistage:prepare']
end
```
In the qa stage script we set the :deploy_to variable based on :qa_id. Now we can deploy using cap qa01 deploy. I leave the implementation of cap qa deploy, which would pick a free QA box and then deploy there, up to you (check Hint 4: Deploy locks, which explains how to prevent QA boxes from being stolen by overwriting deployments, using a simple locking technique).
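For completeness, here is a minimal sketch of what the shared qa stage file could look like; the exact contents are my assumption, only the paths come from the example above:

```ruby
# config/deploy/qa.rb (hypothetical sketch): loaded by the generated
# qa01..qa10 tasks, which set :qa_id before this file is loaded.
set(:deploy_to) { "/var/www/apps/qa/#{qa_id}" }
```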
3. Campfire notifications
This is the most straightforward and easiest-to-implement feature:
```ruby
begin
  gem 'tinder', '>= 1.4.0'
  require 'tinder'
rescue Gem::LoadError => e
  puts "Load error: #{e}"
  abort "Please update tinder, your version is out of date: 'gem install tinder -v 1.4.0'"
end

namespace :campfire do
  desc "Send a message to the campfire chat room"
  task :snitch do
    campfire = Tinder::Campfire.new 'SUBDOMAIN', :ssl => true, :token => 'YOUR_TOKEN'
    room = campfire.find_room_by_name 'YOUR ROOM'
    snitch_message = fetch(:snitch_message) do
      ENV['MESSAGE'] || abort('Campfire snitch message is missing. Use set :snitch_message, "Your message"')
    end
    room.speak(snitch_message)
  end

  desc "Send a message to the campfire chat room about the deploy start"
  task :snitch_begin do
    set :snitch_message, "BEGIN DEPLOY [#{stage.upcase}]: #{ENV['USER']}, #{branch}/#{real_revision[0, 7]} to #{deploy_to}"
    snitch
  end

  desc "Send a message to the campfire chat room about the deploy end"
  task :snitch_end do
    set :snitch_message, "END DEPLOY [#{stage.upcase}]: #{ENV['USER']}, #{branch}/#{real_revision[0, 7]} to #{deploy_to}"
    snitch
  end

  desc "Send a message to the campfire chat room about the rollback"
  task :snitch_rollback do
    set :snitch_message, "ROLLBACK [#{stage.upcase}]: #{ENV['USER']}, #{latest_revision[0, 7]} to #{previous_revision[0, 7]} on #{deploy_to}"
    snitch
  end
end

#############################################################
# Hooks
#############################################################

before :deploy do
  campfire.snitch_begin unless ENV['QUIET'].to_i > 0
end

after :deploy do
  campfire.snitch_end unless ENV['QUIET'].to_i > 0
end

before 'deploy:rollback', 'campfire:snitch_rollback'
```
To deploy without notifications, use cap production deploy QUIET=1 (but be careful: usually it’s not a good idea). You can also post an arbitrary message to the room with cap production campfire:snitch MESSAGE="Your message".
4. Deploy locks
Sometimes it’s useful to lock deploys to a specific stage. The most common reasons: you have pushed a heavy migration to master and want to run it yourself before the actual deploy, or you are performing maintenance on production servers and want to be sure nobody will interfere with your work.
```ruby
namespace :deploy do
  desc "Prevent other people from deploying to this environment"
  task :lock, :roles => :web do
    check_lock

    msg = ENV['MESSAGE'] || ENV['MSG'] || fetch(:lock_message, 'Default lock message. Use MSG=msg to customize it')
    timestamp = Time.now.strftime("%m/%d/%Y %H:%M:%S %Z")
    lock_message = "Deploys locked by #{ENV['USER']} at #{timestamp}: #{msg}"

    put lock_message, "#{shared_path}/system/lock.txt", :mode => 0644
  end

  desc "Check if deploys are OK here or if someone has locked down deploys"
  task :check_lock, :roles => :web do
    # We use echo at the end to reset the exit code when the lock file is missing
    # (without it, the deployment would fail on this command, which is not what we want)
    data = capture("cat #{shared_path}/system/lock.txt 2>/dev/null;echo").to_s.strip

    if data != '' and !(data =~ /^Deploys locked by #{ENV['USER']}/)
      logger.info "\e[0;31;1mATTENTION:\e[0m #{data}"
      if ENV['FORCE']
        logger.info "\e[0;33;1mWARNING:\e[0m You have forced the deploy"
      else
        abort 'Deploys are locked on this machine'
      end
    end
  end

  desc "Remove the deploy lock"
  task :unlock, :roles => :web do
    run "rm -f #{shared_path}/system/lock.txt"
  end
end

before :deploy, :roles => :web do
  deploy.check_lock
end
```
Now you can use cap production deploy:lock MSG="Running heavy migrations". To deploy anyway, pass FORCE=1; to remove the lock, run cap production deploy:unlock.
5. Generating servers list on the fly
Another interesting and sometimes very useful trick is to fetch the list of servers for a deploy from an external service. For example, you have an application cloud and do not want to change your deployment script every time you add, remove, or disable a node. Well, I have good news for you: it’s easy!
```ruby
namespace :deploy do
  task :set_nodes_from_remote_resource do
    # Here you will fetch the list of servers from somewhere
    nodes = %w(app01 app02 app03)

    # Clear the server lists of the :app and :db roles
    roles[:app].clear
    roles[:db].clear

    # Fill the :app role servers list
    nodes.each do |node|
      parent.role :app, node
    end

    # The first server in the list is the primary node and the db node (to run migrations)
    primary = roles[:app].first
    primary.options[:primary] = true
    roles[:db].push(primary)

    # Log information about where we are going to deploy to
    nodes_to_deploy = roles[:app].servers.map do |server|
      opts = server.options[:primary] ? ' (primary, db)' : ''
      "#{server.host}#{opts}"
    end.join(', ')
    logger.info "Deploying to #{nodes_to_deploy}"
  end
end

on :start, 'deploy:set_nodes_from_remote_resource'
```
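The hard-coded nodes list is a placeholder. In practice you would fetch it from your infrastructure service; here is a hypothetical sketch using only Ruby’s standard library (the endpoint URL and the JSON response format are assumptions):

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical inventory endpoint returning a JSON array of hostnames,
# e.g. ["app01", "app02", "app03"]. Replace the URL with your own service.
def fetch_nodes_from_inventory
  response = Net::HTTP.get(URI.parse('http://inventory.example.com/api/app_nodes.json'))
  JSON.parse(response)
end
```

You could then replace the `nodes = %w(app01 app02 app03)` line with `nodes = fetch_nodes_from_inventory`.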
When you run cap production deploy, something like this will be printed to your console:
```
  triggering start callbacks for `deploy'
  * executing `deploy:set_nodes_from_remote_resource'
 ** Deploying to app01 (primary, db), app02, app03
```
That’s all for today. Deployment automation can be a really tricky task, but with the right tool it turns out to be a pleasure. Do you have any questions, suggestions, or other example deployment recipes? Do me a favor and put them in a comment! Also, I have (surprise!) a Twitter account, @kpumuk, and you simply must follow me there. No excuses!
That is a dirty line of code: I think you want to check if the file exists first, and do an exit 1 at the end. I have not tested it, but that should get you just about what you were going for.
There is no reason to fail if the file does not exist. It is not a failure situation, so it’s totally OK to treat both a missing file and an empty file as no lock.
Thank you very much for item #5, generating server lists on the fly. This was a huge help to me!
The only change I needed to make was to add :web role node creation inside the loop that already creates the :app nodes, as sketched below.
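If it helps, that variation could look something like this inside the loop from recipe #5 (an untested sketch based on the comment above):

```ruby
# Register every node for both the :app and :web roles
# (remember to clear roles[:web] as well before refilling it).
nodes.each do |node|
  parent.role :app, node
  parent.role :web, node
end
```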