Ruby & Rails | Dmytro Shteflyuk's Home

Submitting a patch to the Open Source project: composite_primary_keys

Dmytro Shteflyuk — Thu, 17 Dec 2009 10:28:12 +0000

Not long ago I found a weird bug in the Open Source Ruby gem composite_primary_keys, which occurs when you specify the :primary_key option for a has_one or has_many association. There are two ways to get it fixed: submit an issue and wait until someone works on it, or fix it yourself and send a pull request to get the patch merged into the core. This is a great library and I use it in almost all my projects, so I decided to help the author and fix the bug myself. Here I will show you how to do that.

Specifically, I discovered that composite_primary_keys breaks my SQL queries when the :primary_key option is specified for either a has_many or a has_one association.

Step 0. Reproducing the bug

First of all we need to reproduce the bug. Please note: if you already know where the problem is, you can skip this step. Let’s start with a simple example:

rails cpk_bug && cd cpk_bug

Now we will add dependencies to the config/environment.rb:

config.gem 'composite_primary_keys'

and to the config/environments/test.rb:

config.gem 'rspec', :lib => 'spec'
config.gem 'rspec-rails', :lib => false
config.gem 'factory_girl'

Okay. Now we are ready to start. Let’s generate some models together with their migrations:

script/generate rspec_model document upload_request_id:integer title:string description:text
script/generate rspec_model upload_request filename:string state:integer
script/generate rspec_model copyright_request upload_request_id:integer explanation:text
rm -rf spec/fixtures
rake db:migrate && rake db:test:clone

Let’s create our factories:

Factory.define :copyright_request do |cr|
  cr.explanation "This document is copyrighted by O'Reilly"
end

Factory.define :document do |d|
  d.sequence(:title) { |n| "Document #{n}" }
  d.description { |a| "The perfect description for the document '#{a.title}'" }
  d.association :upload_request
end

Factory.define :upload_request do |ur|
  ur.sequence(:filename) { |n| "file#{'%03d' % n}.pdf" }
  ur.state 0
end

And our spec which should fail:

require 'spec/spec_helper'

describe Document do
  context 'when has copyright requests' do
    before :each do
      # Fake upload request used to desynchronise document and
      # upload request IDs
      Factory(:upload_request)
    end

    it 'should not have any copyright requests when just created' do
      @document = Factory(:document)
      @document.copyright_requests.should == []
    end

    it 'should return a list of copyright requests from #copyright_requests' do
      @document = Factory(:document)
      @request1 = Factory(:copyright_request, :upload_request => @document.upload_request)

      @document.copyright_requests.should == [@request1]
    end
  end
end

OK, it fails because we haven’t defined the associations on our models yet. Let’s do it:

class Document < ActiveRecord::Base
  belongs_to :upload_request

  has_many :copyright_requests, :primary_key => :upload_request_id, :foreign_key => :upload_request_id
end

class UploadRequest < ActiveRecord::Base
end

class CopyrightRequest < ActiveRecord::Base
  belongs_to :upload_request
end

Woohoo! Specs are failing with an error we’re trying to reproduce:

~/cpk_bug$ spec spec
..F.

1)
'Document when has copyright requests should return a list of copyright requests from #copyright_requests' FAILED
expected: [#],
got: [] (using ==)
/Users/kpumuk/cpk_bug/spec/models/document_spec.rb:20:

Finished in 0.155708 seconds

4 examples, 1 failure
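
To see why the fake upload request in the before block matters, here is a minimal plain-Ruby sketch of what the :primary_key option is supposed to mean for has_many. It is not how ActiveRecord or composite_primary_keys implement associations: the Struct classes and the associated helper are hypothetical stand-ins, and the assumption is that the buggy code matches on the owner's id instead of the configured column.

```ruby
# Hypothetical stand-ins for the models; these are not ActiveRecord classes.
DocumentRow = Struct.new(:id, :upload_request_id)
CopyrightRequestRow = Struct.new(:id, :upload_request_id)

# Select the rows whose foreign key column equals the owner's key column.
# primary_key plays the role of the :primary_key option (default :id).
def associated(owner, rows, foreign_key, primary_key = :id)
  rows.select { |row| row[foreign_key] == owner[primary_key] }
end

document = DocumentRow.new(1, 2)            # IDs are out of sync on purpose
requests = [CopyrightRequestRow.new(1, 2)]

# Matching on the document id (the assumed buggy behaviour):
p associated(document, requests, :upload_request_id)
# Matching on upload_request_id (what :primary_key asks for):
p associated(document, requests, :upload_request_id, :upload_request_id)
```

If the document and upload request IDs happened to be equal, both calls would return the same rows and the bug would stay hidden, which is why the spec desynchronises them.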

Step 1. Setting up an environment for composite_primary_keys gem

In general, to submit a patch to an Open Source project hosted on GitHub you perform the following steps: fork the repository on GitHub, write tests which fail, write the patch, make sure it works, push your changes to your fork, and submit a pull request. Let’s do just that!

To fork the repository I will use the excellent github gem by Dr Nic (BTW, he is the author of composite_primary_keys!):

sudo gem install github
gh clone drnic/composite_primary_keys
cd composite_primary_keys
gh fork

Now let’s configure our test environment and run tests:

rake local:setup
rake mysql:build_databases
rake test_mysql

You should see something like this:

Using native MySQL
Started
...................................................................................
Finished in 0.487679 seconds.

83 tests, 262 assertions, 0 failures, 0 errors

If you have any problems, check the test/README_tests.txt file for help.

Step 2. Reproducing failing tests inside composite_primary_keys test suite

We are doing TDD, right? So before fixing anything we have to write a failing test. The gem we’re hacking on has a powerful test suite with many database tables already created, so all we need to do is add associations which cover our issue to one of the models.

First, add this to the test/fixtures/membership.rb model:

has_many :readings, :primary_key => :user_id, :foreign_key => :user_id
has_one :reading, :primary_key => :user_id, :foreign_key => :user_id, :order => 'id DESC'

And this set of tests goes into test/test_associations.rb:

def test_has_many_with_primary_key
  @membership = Membership.find([1, 1])

  assert_equal 2, @membership.readings.size
end

def test_has_one_with_primary_key
  @membership = Membership.find([1, 1])

  assert_equal 2, @membership.reading.id
end

def test_joins_has_many_with_primary_key
  @membership = Membership.find(:first, :joins => :readings, :conditions => { :readings => { :id => 1 } })

  assert_equal [1, 1], @membership.id
end

def test_joins_has_one_with_primary_key
  @membership = Membership.find(:first, :joins => :reading, :conditions => { :readings => { :id => 2 } })

  assert_equal [1, 1], @membership.id
end

Now rake test_mysql produces the following error (there are 4 of them, I will show only the first one):

1) Error:
test_has_many_with_primary_key(TestAssociations):
ActiveRecord::StatementInvalid: Mysql::Error: Operand should contain 1 column(s): SELECT * FROM `readings` WHERE (`readings`.`user_id` = 1,1)
...

Well, these are the errors we are after: the full composite ID of the owner ("1,1") is compared with a single column. Time to fix them!

Step 3. Fixing the bug

I will not explain the fix in detail; you can check my commit for that. Here is the diff:

diff --git a/lib/composite_primary_keys/associations.rb b/lib/composite_primary_keys/associations.rb
index 6b63664..9a9e173 100644
--- a/lib/composite_primary_keys/associations.rb
+++ b/lib/composite_primary_keys/associations.rb
@@ -180,11 +180,12 @@ module ActiveRecord::Associations::ClassMethods
raise AssociationNotSupported, "Polymorphic joins not supported for composite keys"
else
foreign_key = options[:foreign_key] || reflection.active_record.name.foreign_key
+ primary_key = options[:primary_key] || parent.primary_key
" LEFT OUTER JOIN %s ON %s " % [
table_name_and_alias,
composite_join_clause(
full_keys(aliased_table_name, foreign_key),
- full_keys(parent.aliased_table_name, parent.primary_key)),
+ full_keys(parent.aliased_table_name, primary_key)),
]
end
when :belongs_to
@@ -338,7 +339,7 @@ module ActiveRecord::Associations
@finder_sql << " AND (#{conditions})" if conditions

else
- @finder_sql = full_columns_equals(@reflection.klass.table_name, @reflection.primary_key_name, @owner.quoted_id)
+ @finder_sql = full_columns_equals(@reflection.klass.table_name, @reflection.primary_key_name, owner_quoted_id)
@finder_sql << " AND (#{conditions})" if conditions
end

@@ -386,7 +387,7 @@ module ActiveRecord::Associations
"#{@reflection.klass.quoted_table_name}.#{@reflection.options[:as]}_id = #{@owner.quoted_id} AND " +
"#{@reflection.klass.quoted_table_name}.#{@reflection.options[:as]}_type = #{@owner.class.quote_value(@owner.class.base_class.name.to_s)}"
else
- @finder_sql = full_columns_equals(@reflection.klass.table_name, @reflection.primary_key_name, @owner.quoted_id)
+ @finder_sql = full_columns_equals(@reflection.klass.table_name, @reflection.primary_key_name, owner_quoted_id)
end

@finder_sql << " AND (#{conditions})" if conditions
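
The idea behind the change can be sketched in plain Ruby. This is only an illustration under assumptions, not the gem's code: OwnerRow and finder_condition are hypothetical names, and the real owner_quoted_id helper is not shown in the diff above. The point is that when :primary_key is given, the condition should use that single column of the owner rather than its full composite ID.

```ruby
# Hypothetical owner with a composite primary key (user_id, group_id).
OwnerRow = Struct.new(:user_id, :group_id) do
  # Mimics the comma-separated form seen in the failing SQL ("1,1").
  def quoted_id
    [user_id, group_id].join(',')
  end
end

# Build a WHERE fragment. When primary_key is given, compare against that
# single column of the owner instead of its full composite id.
def finder_condition(owner, table, foreign_key, primary_key = nil)
  value = primary_key ? owner[primary_key] : owner.quoted_id
  "`#{table}`.`#{foreign_key}` = #{value}"
end

owner = OwnerRow.new(1, 1)
puts finder_condition(owner, 'readings', 'user_id')            # composite id leaks in
puts finder_condition(owner, 'readings', 'user_id', :user_id)  # single column
```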

Run tests to get the following output:

~/cpk_bug/composite_primary_keys (master)$ rake test_mysql
Using native MySQL
Started
.......................................................................................
Finished in 0.511129 seconds.

87 tests, 266 assertions, 0 failures, 0 errors

We are done for now!

Step 4. Committing changes and sending a pull request

It’s time to commit our changes now:

~/cpk_bug/composite_primary_keys (master)$ git add .
~/cpk_bug/composite_primary_keys (master)$ git commit -m 'Fixed several bugs in has_one and has_many associations when :primary_key specified'
[master 3e29891] Fixed several bugs in has_one and has_many associations when :primary_key specified
3 files changed, 31 insertions(+), 3 deletions(-)
~/cpk_bug/composite_primary_keys (master)$ git push
Counting objects: 16, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 2.14 KiB, done.
Total 9 (delta 7), reused 0 (delta 0)
To git@github.com:kpumuk/composite_primary_keys.git
050d832..3e29891 HEAD -> master
~/cpk_bug/composite_primary_keys (master)$ gh home

Okay, the next step is a pull request. The last command opened a browser window with your fork. Navigate to the latest commit and press the “Pull Request” button (at the time of writing, gh pull-request didn’t work; you can try to fix it yourself to understand the workflow).

That’s all, we have contributed to the community! Today Darrin Holst merged my commit into the core, and you can find it here. Not everything went smoothly (I forgot to add a fixture, so tests were failing on the first run), but he helped me a lot to get it working. That’s how Open Source works: we help each other to develop high quality software.

Credits

First of all, thanks to Dr Nic for the great plugin, one of the best pieces of functionality, which I can’t imagine life without. Thanks to Darrin Holst for his patience and great help in debugging the test problems, and also for merging my commit into the composite_primary_keys core. Thanks to GitHub for the great Open Source code hosting solution, which makes working on Open Source projects so exciting.

Do you have comments or suggestions? You are welcome! Also, I will be happy if you follow me on Twitter.

The post Submitting a patch to the Open Source project: composite_primary_keys first appeared on Dmytro Shteflyuk's Home.

My top 7 RSpec best practices

Dmytro Shteflyuk — Wed, 25 Nov 2009 16:22:31 +0000

I use RSpec in all my projects. It’s really hard to overstate how helpful it is and how much easier your life becomes if you have good spec coverage. But its outstanding flexibility enables many ways to make your specs awful: horribly slow, bloated, sometimes even unreadable. I do not want to teach you BDD and RSpec here; instead I will give you some ideas on how to improve the quality of your specs and the efficiency of your BDD workflow.

1. Use `before :all` block carefully

Sometimes it looks like a good idea to create test data in a before :all block. But be careful: these blocks are not wrapped in a transaction, so the data will not be rolled back after the test. In this case you should clean up your data manually in the after :all block.

describe Friendship do
  before :all do
    @users = (1..5).collect { Factory(:user) }
  end

  after :all do
    # Not rolled back automatically, so remove the records by hand
    @users.each { |user| user.destroy }
  end

  it 'should do something' do
    # Something interesting with @users
  end
end

Another option is to move the code from your before :all blocks to before :each, so that it is rolled back automatically.

2. For each test create exactly what it needs

Fixtures are cool when you start working on a project. But they quickly become painful as the project grows: you add a new field to a fixture and break half of your tests. There are tons of plugins which simplify test data creation; I personally recommend factory_girl: it’s pretty slick and easy to use.

Factory.define :user do |f|
  f.sequence(:login) { |n| "user#{n}" }
  f.email { |a| "#{a.login}@example.com" }
  f.description "Ruby on Rails Developer"
end

# Somewhere in specs
@user = Factory(:user, :admin => true)

3. Do not create hundreds of records for a particular spec

Sometimes you want to test a method which operates on a large set of records (filtering, trimming, etc.). For example, a method returns the 50 most popular videos (and no more). The straightforward approach is to create 51 records and make sure that the size of the returned array is 50. When I first saw a code snippet like this in our project, I was surprised. There were a few more pieces sharing this behavior, so here is my advice: add a parameter to the method which limits the number of records to return. Then you can create 3 records and pass 2 as the parameter.

describe User do
  it 'should return top users in User.top method' do
    @users = (1..3).collect { Factory(:user) }
    top_users = User.top(2).all
    top_users.should have(2).entries
  end
end

class User < ActiveRecord::Base
  # Select N top users. Returns 10 entries when called without arguments.
  #   User.top.all.size    # => 10
  #   User.top(2).all.size # => 2
  #
  named_scope :top, lambda { |*args| { :limit => (args.size > 0 ? args[0] : 10) } }
end
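
The same idea can be tried without a database. The sketch below is a hypothetical in-memory version: UserRow, the rating attribute and the sorting rule are assumptions made for illustration (the named_scope above only applies a limit). What matters is the optional limit argument, which lets a test work with three records instead of dozens.

```ruby
# Hypothetical in-memory stand-in for the User model.
UserRow = Struct.new(:login, :rating)

# Return the highest-rated users; the limit defaults to 10 but can be
# lowered in specs so that only a handful of records is needed.
def top_users(users, limit = 10)
  users.sort_by { |user| -user.rating }.first(limit)
end

users = [UserRow.new('alice', 5), UserRow.new('bob', 9), UserRow.new('carol', 7)]
p top_users(users, 2).map(&:login)
```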

4. Do not over-mock

Mocking is an interesting and sometimes very useful technique. You may mock just about everything so that your specs never hit the database. But there is a catch: your model code may change some day, breaking its callers. Since you mock everything, your specs will keep passing, and you have to remember to update all your mocks to fit the new interface. You also will not be able to find errors in SQL queries if you have mocked them. Instead I use an integration approach: controllers talk to models, which hit the database. A real database with real data (OK, not so real). Practice 2 can help you with test data creation.

Bad:

describe VideosController do
  describe '.create action' do
    it 'should create a video' do
      params = { :title => 'new video', :description => 'video description' }
      @video = mock_model(Video)
      Video.should_receive(:new).and_return(@video)
      @video.should_receive(:update_attributes).with(params).and_return(true)
      post :create, :video => params
      assigns[:video].should be(@video)
    end
  end
end

Good:

describe VideosController do
  describe '.create action' do
    it 'should create a video' do
      params = { :title => 'new video', :description => 'video description' }
      post :create, :video => params
      assigns[:video].should_not be_new_record
      assigns[:video].title.should == params[:title]
      assigns[:video].description.should == params[:description]
    end
  end
end

But you can use mocks to skip retrieving records from the database (make sure you have specs covering the corresponding model code). Let me explain. For example, you need to render 20 entries in an RSS feed. You could create 21 records in the database using a factory and then ensure only 20 of them were retrieved, or you could mock your finder method and check its parameter. You may not like magic numbers like 20 in this particular case, and this is a good point. Just move the magic number to the config and ensure it was used to do the retrieval.

describe VideosController do
  describe '.index action' do
    it 'should assign top videos' do
      @videos = [mock_model(Video), mock_model(Video)]
      Video.should_receive(:top).with(50).and_return(@videos)
      get :index
      assigns[:top_videos].should be(@videos)
    end
  end
end
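
The "move the magic number to the config and check that it was used" advice can be sketched with a hand-rolled test double in plain Ruby. Everything here is hypothetical (FEED_SIZE, RecordingFinder, feed_entries); it only shows the shape of the check, not RSpec's mocking API.

```ruby
# Hypothetical configuration value replacing the magic number.
FEED_SIZE = 20

# Hand-rolled test double: remembers the limit it was asked for and
# returns canned records without touching any database.
class RecordingFinder
  attr_reader :received_limit

  def initialize(records)
    @records = records
  end

  def top(limit)
    @received_limit = limit
    @records
  end
end

# Code under test: asks the finder for FEED_SIZE entries.
def feed_entries(finder)
  finder.top(FEED_SIZE)
end

finder = RecordingFinder.new([:video1, :video2])
entries = feed_entries(finder)
```

The assertion then compares finder.received_limit with FEED_SIZE rather than with a literal 20, so changing the configured value does not break the test.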

5. Use contexts

An RSpec spec specifies how particular code should work. Usually you begin by stating what you are going to describe, and inside the describe block you specify what the code should do:

describe Video do
  it 'should return 5 records in Video.top method' do
    Video.top.should have(5).items
  end
end

Usually you have more than one it block for each method. To group related specs I recommend using nested describe blocks. Since describe is aliased as context when placed inside another describe, I think it’s a good idea to use context for grouping specs. Each context may have its own before and after blocks (in this case the parent before blocks are called right before the child ones).

describe Video do
  describe '.top' do
    it 'should return 5 records' do
      Video.top.should have(5).items
    end
  end

  context 'when just created' do
    before :each do
      @video = Video.new
    end

    # ...
  end
end

6. Create several test suites to speed up your workflow

There are many things you can do to make your BDD workflow more efficient. We will take a look at two of them: creating separate test suites and running recently modified specs.

There are several standard test suites configured in RSpec by default: spec:controllers, spec:views, spec:helpers, spec:lib. Check the rake -T spec output to get a list of available RSpec tasks. Let’s create a simple Rake task generator for spec suites:

SPEC_SUITES = [
  { :id => :acl, :title => 'access control', :files => %w(spec/controllers/**/acl_spec.rb) },
  { :id => :amazon, :title => 'Amazon libraries', :dirs => %w(spec/lib/amazon) }
]

namespace :spec do
  namespace :suite do
    SPEC_SUITES.each do |suite|
      desc "Run all specs in #{suite[:title]} spec suite"
      Spec::Rake::SpecTask.new(suite[:id]) do |t|
        spec_files = []
        if suite[:files]
          suite[:files].each { |glob| spec_files += Dir[glob] }
        end

        if suite[:dirs]
          suite[:dirs].each { |glob| spec_files += Dir["#{glob}/**/*_spec.rb"] }
        end

        t.spec_opts = ['--options', "#{Rails.root}/spec/spec.opts"]
        t.spec_files = spec_files
      end
    end
  end
end

Check what tasks are available now:

~/test$ rake -T spec:suite

(in /Users/kpumuk/test)
rake spec:suite:acl # Run all specs in access control spec suite
rake spec:suite:amazon # Run all specs in Amazon libraries spec suite

It was easy! Now let’s take a look at a Rake task for running recently touched specs (within the last 10 minutes).

# Grab recently touched specs
def recent_specs(touched_since)
  # Specs which correspond to recently modified application files
  recent_specs = Dir['app/**/*'].map do |path|
    if File.mtime(path) > touched_since
      spec = File.join('spec', File.dirname(path).split("/")[1..-1].join('/'),
        "#{File.basename(path, ".*")}_spec.rb")
      spec if File.exist?(spec)
    end
  end.compact

  # Plus specs which were recently modified themselves
  recent_specs += Dir['spec/**/*_spec.rb'].select do |path|
    File.mtime(path) > touched_since
  end
  recent_specs.uniq
end

namespace :spec do
  desc 'Run all recent specs in spec directory touched in last 10 minutes'
  Spec::Rake::SpecTask.new(:recent) do |t|
    t.spec_opts = ['--options', "#{RAILS_ROOT}/spec/spec.opts"]
    t.spec_files = recent_specs(Time.now - 10.minutes)
  end
end

And don’t forget to check out the autospec and watchr gems.
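
The path mapping buried in the task above is easier to follow in isolation. Here is a small sketch of just that step; spec_path_for is a hypothetical helper name, and it assumes the conventional layout where app/models/user.rb is covered by spec/models/user_spec.rb.

```ruby
# Map an application file to the spec file expected to cover it:
# drop the leading "app" directory, keep the rest of the directory
# structure, and append "_spec" to the base name.
def spec_path_for(app_path)
  dir = File.dirname(app_path).split('/')[1..-1].join('/')
  File.join('spec', dir, "#{File.basename(app_path, '.*')}_spec.rb")
end

puts spec_path_for('app/models/user.rb')
puts spec_path_for('app/controllers/admin/videos_controller.rb')
```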

7. Stop `spec_helper` from being loaded multiple times

Just don’t do that. If you have a big project, there is a chance that spec_helper will be required in many different ways (File.expand_path, File.join, etc.), which results in it being loaded several times and slows down your test suite!

To avoid this, add the following code at the top of your spec_helper.rb:

# figure out where we are being loaded from
if $LOADED_FEATURES.grep(/spec\/spec_helper\.rb/).any?
begin
raise "foo"
rescue => e
puts <<-MSG
===================================================
It looks like spec_helper.rb has been loaded
multiple times. Normalize the require to:

require "spec/spec_helper"

Things like File.join and File.expand_path will
cause it to be loaded multiple times.

Loaded this time from:

#{e.backtrace.join("\n ")}
===================================================
MSG
end
end

It will show where you’ve tried to load spec_helper from, so you will be able to fix it immediately. There is also an interesting snippet of code here, which will find and replace all wrong requires.

Conclusion

RSpec is not a silver bullet. You can have 100% coverage and fine-grained specs, but it does not mean your application is completely bug-free. Refactor your specs, improve your programming skills, and refactor again. Write specs for any issue that you, your QAs or your users have faced. And remember: do not over-mock.

Credits

This is my first article written completely in Google Wave in collaboration with several good Russian rubyists. Thank you all, guys. I want to acknowledge the editorial help of Roman Dmytrenko and Alexey Kovyrin. Robby Russell created a great picture illustrating how sexy RSpec is. And thank you, all my readers, for your attention.

Did you like this article? You should follow me on Twitter here.

The post My top 7 RSpec best practices first appeared on Dmytro Shteflyuk's Home.

Simplifying your Ruby on Rails code: Presenter pattern, cells plugin

Dmytro Shteflyuk — Wed, 09 Sep 2009 05:41:14 +0000

Today we will talk about code organization in Ruby on Rails projects. As everybody knows, Ruby on Rails is a convention-based framework, which means you should follow the framework architects’ decisions (put your controllers inside app/controllers, move all your logic into models, etc.). But there are many open questions around those conventions. In this write-up I will try to summarize my personal experience and show how I usually solve these problems.

Here is the list of questions we will talk about:

You have some logic in your view which uses your models extensively, and no other view needs this logic. The classic recommendation is to move this code into a model, but after a short time your models become bloated with stupid one-off helper methods. The solution: the Presenter pattern.
Your controller contains a lot of code to retrieve values for your views from the database or another storage. You have a lot of fragment_exist? calls to ensure none of your data is loaded when the corresponding fragment is already in the cache. It’s really hard to test a particular action because of its size. The solution: the Presenter pattern.
You have a partial used everywhere on the site. It accepts a lot of parameters to configure what the rendered code should look like. The header of this partial, which initializes default values of the parameters, grows larger and larger. The solution: the cells plugin.

Please note: sample application is available on GitHub.

Presenter Pattern

Okay, you have an idea of when to use these patterns. Let’s look at an example:

class HomeController < ApplicationController
  def show
    unless fragment_exist?('home/top_videos')
      @top_videos = Video.top.all(:limit => 10)
    end

    unless fragment_exist?('home/categories')
      @categories = Category.all(:order => 'name DESC')
    end

    unless fragment_exist?('home/featured_videos')
      @featured_videos = Video.featured.all(:limit => 5)
    end

    unless fragment_exist?('home/latest_videos')
      @latest_videos = Video.latest.all(:limit => 5)
    end
  end
end

And the view:

<h1>Home page</h1>

<div id="top_videos">
<h2>Top videos</h2>
<% cache('home/top_videos') do %>
<%= render 'videos', :videos => @top_videos, :hide_description => true %>
<% end %>
</div>

<div class="tabs">
<ul id="taxonomy">
<li><a href="#" id="categories" class="current">Categories</a></li>
</ul>
<div class="categories_panel">
<h2>Categories</h2>
<% cache('home/categories') do %>
<%= render 'categories' %>
<% end %>
</div>
</div>

<div class="box">
<div id="latest">
<h2>Latest videos</h2>
<% cache('home/latest_videos') do %>
<%= render 'videos', :videos => @latest_videos, :hide_thumbnail => true %>
<% end %>
</div>
<div id="featured">
<h2>Featured videos</h2>
<% cache('home/featured_videos') do %>
<%= render 'videos', :videos => @featured_videos, :hide_thumbnail => true %>
<% end %>
</div>
</div>

Note: this code is available in the first commit of my presenter example project.

Scary code, isn’t it? So let’s refactor it using the Presenter pattern. I prefer to put presenters into a separate folder, app/presenters, so first we should add it to the Rails load path. Add these lines to your config/environment.rb:

config.load_paths += %W(
  #{Rails.root}/app/presenters
)

Now we are ready to write our presenter (app/presenters/home_presenters/show_presenter.rb):

module HomePresenters
  class ShowPresenter
    def top_videos
      @top_videos ||= Video.top.all(:limit => 10)
    end

    def categories
      @categories ||= Category.all(:order => 'name DESC')
    end

    def featured_videos
      @featured_videos ||= Video.featured.all(:limit => 5)
    end

    def latest_videos
      @latest_videos ||= Video.latest.all(:limit => 5)
    end
  end
end

Sometimes presenters depend on parameters, so feel free to add an initialize method. It could accept particular params or the whole params collection:

def initialize(video_id)
  @video_id = video_id
end

Now let’s refactor our controller:

class HomeController < ApplicationController
  def show
    @presenter = HomePresenters::ShowPresenter.new
  end
end

Whoa, that’s nice! The view is now a little different:

<h1>Home page</h1>

<div id="top_videos">
<h2>Top videos</h2>
<% cache('home/top_videos') do %>
<%= render 'videos', :videos => @presenter.top_videos, :hide_description => true %>
<% end %>
</div>

<div class="tabs">
<ul id="taxonomy">
<li><a href="#" id="categories" class="current">Categories</a></li>
</ul>
<div class="categories_panel">
<h2>Categories</h2>
<% cache('home/categories') do %>
<%= render 'categories' %>
<% end %>
</div>
</div>

<div class="box">
<div id="latest">
<h2>Latest videos</h2>
<% cache('home/latest_videos') do %>
<%= render 'videos', :videos => @presenter.latest_videos, :hide_thumbnail => true %>
<% end %>
</div>
<div id="featured">
<h2>Featured videos</h2>
<% cache('home/featured_videos') do %>
<%= render 'videos', :videos => @presenter.featured_videos, :hide_thumbnail => true %>
<% end %>
</div>
</div>
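
The fragment_exist? checks could disappear because the presenter is lazy: a query runs only when the view calls the corresponding method, and that call sits inside a cache block, which Rails skips when the fragment is already cached. The plain-Ruby sketch below illustrates just the laziness and memoization; LazyPresenter, the loader argument and query_count are hypothetical names used for illustration, with a lambda standing in for the database query.

```ruby
# Hypothetical presenter: loader is any callable standing in for a
# database query such as Video.top.all(:limit => 10).
class LazyPresenter
  attr_reader :query_count

  def initialize(loader)
    @loader = loader
    @query_count = 0
  end

  # Runs the loader on first access only; later calls reuse the result.
  def top_videos
    @top_videos ||= begin
      @query_count += 1
      @loader.call
    end
  end
end

presenter = LazyPresenter.new(lambda { ['video 1', 'video 2'] })
puts presenter.query_count   # nothing has been loaded yet
presenter.top_videos
presenter.top_videos
puts presenter.query_count   # the loader ran once despite two calls
```

One caveat of the `||=` idiom: a nil or false result is not memoized, so the loader would run again on the next call.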

Testing presenters is much easier than testing bloated controllers:

describe HomePresenters::ShowPresenter do
  before :each do
    @presenter = HomePresenters::ShowPresenter.new
  end

  it 'should respond to :top_videos' do
    expect { @presenter.top_videos }.to_not raise_error
  end

  it 'should respond to :categories' do
    expect { @presenter.categories }.to_not raise_error
  end

  it 'should respond to :featured_videos' do
    expect { @presenter.featured_videos }.to_not raise_error
  end

  it 'should respond to :latest_videos' do
    expect { @presenter.latest_videos }.to_not raise_error
  end
end

Please note: this code is available in the second commit of my presenter example project.

Please note: you should not manipulate models in presenters. Presenters only decorate models with helper methods to be used inside controllers or views, nothing else. There are several articles describing the Conductor pattern as a presenter; do not repeat their mistake. See the first link in the list below to get an idea about the differences.

Cells Plugin

Okay, now we have a clean controller. But what about views? Let’s take a look at the videos partial:

<%
hide_thumbnail = hide_thumbnail === true;
hide_description = hide_description === true;
css_class ||= 'videos'
style ||= :div
case style.to_sym
when :section
parent_tag = 'section'
child_tag = 'div'
when :list
parent_tag = 'ul'
child_tag = 'li'
else
parent_tag = 'div'
child_tag = 'div'
end
%>

<% content_tag parent_tag, :class => css_class do %>
<% videos.each do |video| %>
<% content_tag child_tag do %>
<h3><%= h video.title %></h3>
<%= image_tag(video.thumbnail_url, :class => 'thumb') unless hide_thumbnail %>
<%= '<p>%s</p>' % h(video.description) unless hide_description %>
<% end %>
<% end %>
<% end %>

So, what the heck? Is this a view or a controller? Remember the old PHP days, with all that spaghetti code? This is it. It’s hard to test, it looks scary, it’s bad. This is where the cells plugin comes onto the stage.

First, we need to install the plugin:

script/plugin install git://github.com/apotonick/cells.git

Now let’s generate a cell:

script/generate cell Video videos

And write some code (app/cells/video.rb):

class VideoCell < Cell::Base
  def videos
    @videos = @opts[:videos]
    @hide_thumbnail = @opts[:hide_thumbnail] === true
    @hide_description = @opts[:hide_description] === true
    @css_class = @opts[:css_class] || 'videos'

    view = (@opts[:style] || :div).to_sym
    view = :div unless [:section, :list].include?(view)

    render :view => "videos_#{view}"
  end
end
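
Because the option handling now lives in ordinary Ruby, it can be exercised without rendering anything. The sketch below extracts the same defaulting rules into a standalone function; normalize_video_options and the returned hash are hypothetical and not part of the cells API.

```ruby
# Hypothetical helper mirroring the option handling of the cell state above.
def normalize_video_options(opts)
  style = (opts[:style] || :div).to_sym
  # Fall back to the :div layout for any unknown style
  style = :div unless [:section, :list].include?(style)

  {
    :hide_thumbnail   => opts[:hide_thumbnail] == true,
    :hide_description => opts[:hide_description] == true,
    :css_class        => opts[:css_class] || 'videos',
    :view             => "videos_#{style}"
  }
end

p normalize_video_options(:style => 'list', :hide_thumbnail => true)
p normalize_video_options(:style => :table)
```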

app/cells/video/videos_section.html.erb:

<section class="<%= @css_class %>">
  <% @videos.each do |video| %>
    <div>
      <%= render :partial => 'video', :locals => { :video => video } %>
    </div>
  <% end %>
</section>

app/cells/video/videos_list.html.erb:

<ul class="<%= @css_class %>">
  <% @videos.each do |video| %>
    <li>
      <%= render :partial => 'video', :locals => { :video => video } %>
    </li>
  <% end %>
</ul>

app/cells/video/videos_div.html.erb:

<div class="<%= @css_class %>">
  <% @videos.each do |video| %>
    <div>
      <%= render :partial => 'video', :locals => { :video => video } %>
    </div>
  <% end %>
</div>

app/cells/video/_video.html.erb:

<h3><%= h video.title %></h3>
<%= image_tag(video.thumbnail_url, :class => 'thumb') unless @hide_thumbnail %>
<%= '<p>%s</p>' % h(video.description) unless @hide_description %>

And the view:

<h1>Home page</h1>

<div id="top_videos">
<h2>Top videos</h2>
<% cache('home/top_videos') do %>
<%= render_cell :video, :videos, :videos => @presenter.top_videos, :hide_description => true %>
<% end %>
</div>

<div class="tabs">
<ul id="taxonomy">
<li><a href="#" id="categories" class="current">Categories</a></li>
</ul>
<div class="categories_panel">
<h2>Categories</h2>
<% cache('home/categories') do %>
<%= render 'categories' %>
<% end %>
</div>
</div>

<div class="box">
<div id="latest">
<h2>Latest videos</h2>
<% cache('home/latest_videos') do %>
<%= render_cell :video, :videos, :videos => @presenter.latest_videos, :hide_thumbnail => true %>
<% end %>
</div>
<div id="featured">
<h2>Featured videos</h2>
<% cache('home/featured_videos') do %>
<%= render_cell :video, :videos, :videos => @presenter.featured_videos, :hide_thumbnail => true %>
<% end %>
</div>
</div>

Wow! That’s pretty easy to read and modify. All the logic is in the code now, all the views are easy to read, and moreover, it’s really easy to test! I have a little plugin called rspec-cells, and yesterday I committed a patch to get it working with the latest RSpec. Here is what your spec could look like:

describe VideoCell do
  context '.videos' do
    it 'should initialize :videos variable' do
      videos = mock('Videos')
      render_cell :videos, :videos => videos
      assigns[:videos].should be(videos)
    end
  end
end

So it looks almost like a classic Ruby on Rails controller spec. I hope to review the code in the near future and send a pull request to the cells plugin author. Of course, if you find a bug, feel free to contact me.

Please note: this code is available in the third commit of my presenter example project.

Creating a simple but powerful profiler for Ruby on Rails

Dmytro Shteflyuk — Wed, 26 Aug 2009 21:23:15 +0000

You are developing a large Web application. Controllers are full of complex data retrieving logic, views contain tons of blocks, partials, loops. One day you will receive an email with user complaints about some of your pages slowness. There are many profiling tools, some of them are easy (ruby-prof), others are large and complex (newrelic), but regardless of this it’s really hard to find the particular place where you have a real bottleneck. So we created really simple, but über-useful tool for ruby code profiling.

First of all, we need to decide what features we need from this tool. I don’t know about you, but all I need is to measure the execution time of a particular block of Ruby code. Here is what I mean:

[home#index] debug: Logged in user home page
[home#index] progress: 0.7002 s [find top videos]
[home#index] progress: 0.0452 s [build categories list]
[home#index] progress: 0.0019 s [build tag cloud]
[home#index] progress: 0.0032 s [find featured videos]
[home#index] progress: 0.0324 s [find latest videos]
[home#index] debug: VIEW STARTED
[home#index] progress: 0.0649 s [top videos render]
[home#index] progress: 0.0014 s [categories render]
[home#index] progress: 2.5887 s [tag cloud render]
[home#index] progress: 0.0488 s [latest videos render]
[home#index] progress: 0.1053 s [featured video render]
[home#index] results: 3.592 seconds

So what do we see from this output? There are two slow blocks: retrieving the top videos and rendering the tag cloud. Now we know exactly what to optimize to make this page faster.

Let’s write the code:

module EasyProfiler
  class Profile
    @@profile_results = {}

    cattr_accessor :enable_profiling
    @@enable_profiling = false

    cattr_accessor :print_limit
    @@print_limit = 0.01

    def self.start(name, options = {})
      options[:enabled] ||= @@enable_profiling
      options[:limit] ||= @@print_limit
      return NoProfileInstance.new unless options[:enabled]

      if @@profile_results[name]
        puts "EasyProfiler::Profile.start() collision! '#{name}' is already started!"
        return NoProfileInstance.new
      end

      @@profile_results[name] = ProfileInstance.new(name, options)
    end

    def self.stop(name, options = {})
      options[:enabled] ||= @@enable_profiling
      options[:limit] ||= @@print_limit
      return unless options[:enabled]

      unless @@profile_results[name]
        puts "EasyProfiler::Profile.stop() error! '#{name}' is not started yet!"
        return false
      end

      total = @@profile_results[name].total

      if total > options[:limit]
        @@profile_results[name].buffer_checkpoint("results: %0.4f seconds" % total)
        @@profile_results[name].dump_results
      end

      @@profile_results.delete(name)
    end
  end

  class ProfileInstance
    def initialize(name, options = {})
      @name = name
      @start = @progress = Time.now.to_f
      @buffer = []
    end

    def progress(message)
      progress = (now = Time.now.to_f) - @progress
      @progress = now
      buffer_checkpoint("progress: %0.4f seconds [#{message}]" % progress)
    end

    def debug(message)
      @progress = Time.now.to_f
      buffer_checkpoint("debug: #{message}")
    end

    def total
      Time.now.to_f - @start
    end

    def buffer_checkpoint(message)
      @buffer << message
    end

    def dump_results
      profile_logger.info("[#{@name}] Benchmark results:")
      @buffer.each do |message|
        profile_logger.info("[#{@name}] #{message}")
      end
    end

    def profile_logger
      root = Object.const_defined?(:RAILS_ROOT) ? "#{RAILS_ROOT}/log" : File.dirname(__FILE__)
      @profile_logger ||= Logger.new(root + '/profile.log')
    end
  end

  class NoProfileInstance
    def progress(message)
    end

    def debug(message)
    end
  end
end

We have defined two class attributes: EasyProfiler::Profile.enable_profiling (to enable or disable the profiler globally) and EasyProfiler::Profile.print_limit (to keep code blocks that are fast enough out of the log).

Then we defined two methods, start and stop, which accept the name of a profile session (for example, “home#index”) and a hash of options. Possible options are :enabled (to enable profiling of a particular block) and :limit (a threshold in seconds; sessions that finish faster are not logged).
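One subtlety with the `||=` defaults used in start and stop: because `false || default` evaluates to the default, an explicit :enabled => false cannot override a global enable_profiling set to true. Here is a minimal sketch of an alternative. The helper name resolve_options is hypothetical and is not part of the code above:

```ruby
# Hypothetical helper (not in the original code): merge the caller's options
# over the global defaults without mutating the caller's hash.
def resolve_options(options, default_enabled, default_limit)
  { :enabled => default_enabled, :limit => default_limit }.merge(options)
end

# With `||=`, an explicit false is replaced by the global default:
with_or_equals = { :enabled => false }
with_or_equals[:enabled] ||= true
puts with_or_equals[:enabled]   # true: the explicit false was lost

merged = resolve_options({ :enabled => false }, true, 0.01)
puts merged[:enabled]           # false: the caller's choice survives
puts merged[:limit]             # 0.01
```

Whether this matters depends on how you use the global switch; if you only ever turn profiling on per request, the original behavior is fine.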

Method start returns a profiler instance, which is used to record checkpoints. It has two useful methods: debug (to log a custom message) and progress (to log a message along with the time spent since the last checkpoint). Both methods define a new checkpoint.

To simplify usage, let’s create a helper:

module Kernel
  def easy_profiler(name, options = {})
    yield EasyProfiler::Profile.start(name, options)
  ensure
    EasyProfiler::Profile.stop(name, options)
  end
end
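The important part of this helper is the ensure clause: the session is stopped even when the block raises. To try the idea without Rails, here is a self-contained sketch. CheckpointTimer and with_checkpoints are hypothetical stand-ins for ProfileInstance and easy_profiler, and I use a monotonic clock instead of Time.now.to_f, which is my own choice and not what the code above does:

```ruby
# Hypothetical stand-in for ProfileInstance: records the time elapsed
# since the previous checkpoint.
class CheckpointTimer
  attr_reader :lines

  def initialize
    @last = now
    @lines = []
  end

  def progress(message)
    current = now
    @lines << format('progress: %0.4f s [%s]', current - @last, message)
    @last = current
  end

  private

  # Monotonic clock: not affected by system clock adjustments.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Same shape as easy_profiler: yield the timer, collect results in `ensure`.
def with_checkpoints(report)
  timer = CheckpointTimer.new
  yield timer
ensure
  report.concat(timer.lines) if timer
end

report = []
begin
  with_checkpoints(report) do |t|
    sleep 0.01
    t.progress('slow step')
    raise 'boom'
  end
rescue RuntimeError
  # The exception still propagates, but the report was filled in first.
end
puts report
```

The report contains the checkpoint even though the block failed, which is exactly what you want from a profiler wrapped around a controller action.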

And now an example:

class HomeController < ApplicationController
  def index
    easy_profiler('home#index', :enabled => profile_request?, :limit => 2) do |p|
      p.debug 'Logged in user home page'

      @top_videos = Video.top(:limit => 10)
      p.progress 'find top videos'

      @categories = Category.all(:order => 'name DESC')
      p.progress 'build categories list'

      @tag_cloud = Tag.tag_cloud(:limit => 200)
      p.progress 'build tag cloud'

      @featured_videos = Video.featured(:limit => 5)
      p.progress 'find featured videos'

      @latest_videos = Video.latest(:limit => 5)
      p.progress 'find latest videos'

      @profiler = p
      p.debug 'VIEW STARTED'
    end
  end

  private

  # Returns +true+ if the current request should output profiling information
  def profile_request?
    params['_with_profiling'] == 'yes'
  end
end

and view:

<div id="top_videos">
<%= render :partial => 'top_videos' %>
<% @profiler.progress 'top videos render' %>
</div>

<div class="tabs">
<ul id="taxonomy">
<li><a href="#" id="categories" class="current">Categories</a></li>
<li><a href="#" id="tags">Tags</a></li>
</ul>
<div class="categories_panel">
<%= render :partial => 'categories' %>
<% @profiler.progress 'categories render' %>
</div>
<div class="categories_panel hidden">
<%= render :partial => 'tag_cloud' %>
<% @profiler.progress 'tag cloud render' %>
</div>
</div>

<div class="box">
<div id="latest">
<%= render :partial => 'videos', :videos => @latest_videos %>
<% @profiler.progress 'latest videos render' %>
</div>
<div id="featured">
<%= render :partial => 'videos', :videos => @featured_videos %>
<% @profiler.progress 'featured video render' %>
</div>
</div>

As you can see from this example, the profiler is enabled only when you pass the _with_profiling parameter with the value yes: http://example.com/home?_with_profiling=yes.

That’s all. If you have any questions, feel free to post a comment or contact me.

Update: I have created a Rails plugin called easy-prof, which is hosted on GitHub. It’s more powerful and feature-complete, so feel free to grab the sources and play with it yourself (check the RDoc documentation at rdoc.info). Do not forget to drop me a line with your impressions.

Memo #6: Using named routes and url_for outside the controller in Ruby on Rails

Dmytro Shteflyuk — Thu, 16 Jul 2009 12:11:50 +0000

Sometimes we need to write small console scripts that do some background processing on our models. At Scribd we are using Loops to run these tasks (check Alexey Kovyrin’s introduction post). One of our scripts is supposed to generate some sort of HTML report, and we need to generate links to our site in it.

In Ruby on Rails we use routes to generate any sort of link. So let’s include the routing mechanism in our own script or class.

First, you need to ensure that the Rails core is loaded (skip this if it is already loaded; for example, script/console does it for you). I’m assuming you’re creating a script in the /script folder:

ENV['RAILS_ENV'] ||= 'production'
require File.dirname(__FILE__) + '/../config/boot'
require RAILS_ROOT + '/config/environment'

Now you need to include the ActionController::UrlWriter module, which allows you to write URLs from arbitrary places in your codebase, and configure default_url_options[:host]:

# this is slow because all routes and resources are being calculated now
include ActionController::UrlWriter
default_url_options[:host] = 'www.example.com'

# map.connect ':controller/:action/:id'
url_for(:controller => 'folders', :action => 'show', :id => Folder.first)
# => "http://www.example.com/folders/2"

# map.resources :folders
folders_url
# => "http://www.example.com/folders"
folder_url(Folder.first)
# => "http://www.example.com/folders/2"
edit_folder_url(Folder.first)
# => "http://www.example.com/folders/2/edit"

# you can use relative paths too
folders_path
# => "/folders"

Easy and helpful technique. Enjoy!

Memo #5: Use ary.uniq method carefully in Ruby

Dmytro Shteflyuk — Wed, 15 Jul 2009 10:55:17 +0000

Today my friend asked me to help him with an unexpected behavior of Ruby’s Array#uniq method. Here is an example:

[{"id"=>667824693}, {"id"=>667824693}].uniq
# => [{"id"=>667824693}, {"id"=>667824693}]
[{"id"=>66782469}, {"id"=>66782469}].uniq
# => [{"id"=>66782469}]
[{"id"=>6678246931}, {"id"=>6678246931}].uniq
# => [{"id"=>6678246931}]

Check the result of the first command. Very disappointing, right? So what happened? A quick look through the Ruby source code explained it completely. Here is how this method works internally (this is just a prototype in Ruby; the original code is in C, but works the same way):

# Builds a hash whose keys are the array elements
def ary_make_hash(ary, val)
  ary.inject({}) do |hash, el|
    hash[el] = val
    hash
  end
end

def uniq(ary)
  ary = ary.dup
  uniq!(ary)
  ary
end

def uniq!(ary)
  hash = ary_make_hash(ary, 0)
  return nil if ary.length == hash.length

  j = 0
  (0...ary.length).each do |idx|
    # delete returns the stored value only for the first occurrence of a key
    if hash.delete(ary[idx])
      ary[j] = ary[idx]
      j += 1
    end
  end
  # drop everything after the first j (unique) elements
  ary.slice!(j..-1)
  ary
end

Let’s test it:

uniq([{"id"=>667824693}, {"id"=>667824693}])
# => [{"id"=>667824693}, {"id"=>667824693}]
uniq([{"id"=>66782469}, {"id"=>66782469}])
# => [{"id"=>66782469}]
uniq([{"id"=>6678246931}, {"id"=>6678246931}])
# => [{"id"=>6678246931}]

And just to make sure our conclusions are correct:

[{"id"=>667824693}, {"id"=>667824693}].map { |el| el.hash }
# => [29793216, 29793156]
[{"id"=>66782469}, {"id"=>66782469}].map { |el| el.hash }
# => [255119887, 255119887]
[{"id"=>6678246931}, {"id"=>6678246931}].map { |el| el.hash }
# => [482552381, 482552381]

So Array#uniq relies on the Hash#hash method, which produces different results for the two hashes in the first example. Be careful when calling obj.uniq on complex objects.
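The same rule applies to your own classes: uniq treats two elements as duplicates only when their hash values match and eql? returns true. A small sketch with two hypothetical classes (PlainPoint and ValuePoint are made up for this illustration):

```ruby
# Default behavior: two distinct objects are never eql?, even with equal fields.
class PlainPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Content-based identity: equal fields give equal hash values and eql? objects.
class ValuePoint < PlainPoint
  def hash
    [x, y].hash
  end

  def eql?(other)
    other.is_a?(ValuePoint) && x == other.x && y == other.y
  end
  alias_method :==, :eql?
end

puts [PlainPoint.new(1, 2), PlainPoint.new(1, 2)].uniq.size  # 2
puts [ValuePoint.new(1, 2), ValuePoint.new(1, 2)].uniq.size  # 1
```

Note that on Ruby 1.9 and later Hash#hash is computed from the hash contents, so the surprise with hash literals shown above is specific to older interpreters; for custom classes the rule still holds.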

Update: There is good research on the Hash#hash method here.

Memo #4: Managing Ruby Gems

Dmytro Shteflyuk — Mon, 13 Apr 2009 10:07:43 +0000

The power of Ruby is not only in its flexibility. It lets you create easy-to-maintain, reusable pieces of software, and also provides a way to redistribute them and integrate them with your applications: the RubyGems system. The only thing that could hurt a developer’s brain is managing installed gems. When you update an already installed gem, the previous version stays in the gems folder and remains available for use. But why do you need all these obsolete libraries? RubyGems has a command to clean up stale libraries: gem cleanup.

The first thing I want to mention is gem documentation. Every gem in the system has documentation based on rdoc, built during installation. The easiest way to view these docs is to run gem server and open http://localhost:8808/ in a browser:

gem server

You will be able to review which gems have been installed and check their documentation.

I’m pretty sure you have lots of gems with the same name but different versions. You can use the following command to remove all gem versions except the latest one:

gem cleanup

That’s all. You have only the latest gems installed. But wait, what about Mac OS X users, who have Ruby pre-installed? They (actually we) are in trouble, because…

Fixing problem with `gem cleanup` on Mac OS X 10.5.x (Leopard)

On Mac OS X, when you try to use this command, you will hit the following error:

Attempting to uninstall sqlite3-ruby-1.2.1
ERROR: While executing gem ... (Gem::InstallError)

That’s a common problem, and I will show how to fix it. First, run the following command:

kpumuk@kpumuk-mbp~: gem list -d sqlite3

*** LOCAL GEMS ***

sqlite3-ruby (1.2.4, 1.2.1)
Author: Jamis Buck
Homepage: http://sqlite-ruby.rubyforge.org/sqlite3
Installed at (1.2.4): /Library/Ruby/Gems/1.8
(1.2.1): /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8

SQLite3/Ruby is a module to allow Ruby scripts to interface with a
SQLite3 database.

You may now uninstall the offending gem with:

gem uninstall --install-dir /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/gems/1.8 sqlite3-ruby

So, in general

gem list -d

will get you the location of the gem, and

gem uninstall --install-dir

will uninstall the gem (found here).
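If you prefer to find such leftovers from Ruby itself, here is a rough sketch. It assumes a reasonably recent RubyGems where Gem::Specification can be enumerated and each specification responds to base_dir; I have not checked how far back that goes, so treat it as an illustration rather than a replacement for gem list -d:

```ruby
require 'rubygems'

# Group every installed gem specification by name and keep only the gems
# that are present in more than one version.
duplicates = Gem::Specification.group_by(&:name).select { |_, specs| specs.size > 1 }

duplicates.each do |name, specs|
  puts name
  specs.sort_by(&:version).each do |spec|
    # base_dir should correspond to the directory passed to --install-dir
    puts "  #{spec.version} installed at #{spec.base_dir}"
  end
end
```

The output depends entirely on what is installed on your machine; on a clean system it may print nothing at all.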

Memo #3: Advanced usage of Ruby hashes and arrays

Dmytro Shteflyuk — Fri, 23 Jan 2009 00:59:58 +0000

One of the most used features in any programming language is a Hash. Today we are going to talk about some of Ruby’s Hash features that are well documented but rarely used: the parameters of the Hash constructor. In the second part of this article we will take a look at the arguments of the Array class constructor.

Take a look at the following example.

a = %w(apple banana apple)
h = a.inject({}) do |h, fruit|
  h[fruit] ||= 0
  h[fruit] += 1
  h
end

Here we have an array of fruits and we need to calculate the frequency of each fruit. As you can see, in line 3 we initialize the frequency value to 0 if there was no fruit with this name before. We can simplify this code:

a = %w(apple banana apple)
h = a.inject(Hash.new(0)) do |h, fruit|
  h[fruit] += 1
  h
end

In line 2 we are creating a new hash whose default value is 0. This means that if we try to retrieve the value for a non-existing key, 0 is returned.
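One detail worth knowing: reading a missing key returns the default but does not store anything in the hash. A quick sketch (the variable names are mine):

```ruby
fruits = %w(apple banana apple)
counts = Hash.new(0)
fruits.each { |fruit| counts[fruit] += 1 }

p counts['apple']         # 2
p counts['cherry']        # 0: the default value
p counts.key?('cherry')   # false: the lookup did not create the key
p counts.size             # 2
```

Only the assignment hidden inside += creates a key, so the hash contains just the fruits that actually occurred.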

Let’s check another example:

a = %w(apple banana apple)
h = {}
a.each_with_index do |fruit, i|
  h[fruit] ||= []
  h[fruit] << i
end

Here we are collecting the indexes of each fruit in the source array. But now we can’t just create a new hash and pass [] as the default value, because every missing key would refer to the same array, so as a result we would get the array [0, 1, 2] for each fruit. So let’s try the following:

a = %w(apple banana apple)
h = Hash.new { |h, key| h[key] = [] }
a.each_with_index do |fruit, i|
  h[fruit] << i
end

In this case we are creating a new array object for any non-existing key that is accessed. So

h['some non-existing key']

will return [] and save it in the hash. The next time you hit this key, the previously created array will be returned.
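To see the difference between the two forms side by side, here is a small comparison (the variable names shared and per_key are mine):

```ruby
fruits = %w(apple banana apple)

shared = Hash.new([])          # one array object shared by every missing key
fruits.each_with_index { |fruit, i| shared[fruit] << i }
p shared.size                  # 0: nothing was ever stored in the hash
p shared['apple']              # [0, 1, 2]
p shared['apple'].equal?(shared['kiwi'])  # true: the very same object

per_key = Hash.new { |hash, key| hash[key] = [] }
fruits.each_with_index { |fruit, i| per_key[fruit] << i }
p per_key['apple']             # [0, 2]
p per_key['banana']            # [1]
```

With the plain default, every << mutates the single default array and the hash itself stays empty; with the block, each key gets its own array because the block assigns it.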

You can pass a block to the Array constructor too. For example, say you need an array of 10 random numbers:

a = []
10.times { a << rand(100) }

You can simplify it using the map method:

a = (1..10).map { rand(100) }

But you can do it even easier:

a = Array.new(10) { rand(100) }
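The block also receives the index of the element being created, and the shared-object pitfall from the Hash section applies to Array.new as well. A short sketch:

```ruby
squares = Array.new(5) { |i| i * i }
p squares                       # [0, 1, 4, 9, 16]

# Same pitfall as with Hash.new([]): a plain default is one shared object.
shared = Array.new(2, [])
shared[0] << :x
p shared                        # [[:x], [:x]]

# The block form builds a fresh object for every slot.
separate = Array.new(2) { [] }
separate[0] << :x
p separate                      # [[:x], []]
```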

Next Memo will cover managing Ruby Gems, so stay tuned.

Memo #1: Installing mysql and memcached gems on Mac OS X with MacPorts

Dmytro Shteflyuk — Mon, 22 Dec 2008 19:20:39 +0000

I have not posted anything here for a long time. It’s hard to start blogging again, so I will write a series of short tips and tricks called “Memo”. Today I’m going to talk about two Ruby gems I’m using in all my Ruby on Rails projects: mysql and memcached. Every time I try to install or update those gems on Mac OS X, the following error occurs:

Building native extensions. This could take a while...
ERROR: Error installing mysql:
ERROR: Failed to build gem native extension.

And then I google how to install these gems. It’s time to simplify my life and post the commands here.

Installing the ruby mysql gem on Mac OS X and MacPorts

Installing mysql5 from MacPorts:

sudo port install mysql5

Now we can install mysql gem:

kpumuk@kpumuk-mbp~: sudo gem install mysql -- --with-mysql-config=/opt/local/bin/mysql_config5
Building native extensions. This could take a while...
Successfully installed mysql-2.7
1 gem installed

Installing the ruby memcached gem on Mac OS X and MacPorts

First you need to install memcached and libmemcached:

sudo port install memcached libmemcached

And then memcached gem:

kpumuk@kpumuk-mbp~: sudo env ARCHFLAGS="-arch i386" gem install memcached --no-ri --no-rdoc -- --with-libmemcached-dir=/opt/local
Building native extensions. This could take a while...
Successfully installed memcached-0.12
1 gem installed

If you have any questions that could be covered in this series — ask me in comments.

Sphinx Client API 0.3.1 and 0.4.0 r909 for Sphinx 0.9.8 r909 released

Dmytro Shteflyuk — Sun, 09 Dec 2007 19:33:10 +0000

I have good news: Sphinx Client API has been updated and now supports all the brand new features of the unstable Sphinx 0.9.8 development snapshot. What does this mean for you as a developer? What features will you get if you decide to switch to the new version? In this article I will describe the most valuable improvements in Sphinx and show how to use them with the new Sphinx Client API 0.4.0 r909.

Multi-query support
Extended engine V2
64-bit document and word IDs support
Multiple-valued attributes
Geodistance feature
Download

Multi-query support

What does it mean? Multi-query support means sending multiple search queries to Sphinx at once. It saves network connection overhead and other round-trip costs. But what’s much more important, it unlocks possibilities to optimize “related” queries internally. Here is a quote from the Sphinx home page:

One typical Sphinx usage pattern is to return several different “views” on the search results. For instance, one might need to display per-category match counts along with product search results, or maybe a graph of matches over time. Yes, that could be easily done earlier using the grouping features. However, one had to run the same query multiple times, but with different settings.

From now on, if you submit such queries through newly added multi-query interface (as a side note, ye good olde Query() interface is not going anywhere, and compatibility with older clients should also be in place), Sphinx notices that the full-text search query is the same and it is just sorting/grouping settings which are different. In this case it only performs expensive full-text search once, but builds several different (differently sorted and/or grouped) result sets from retrieved matches. I’ve seen speedups of 1.5-2 times on my simple synthetic queries; depending on different factors, the speedup could be even greater in practice.

To perform a multi-query, add several queries using the AddQuery method (the parameters are exactly the same as in a Query call), and then call RunQueries. Please note that all parameters, filters, and query settings persist between AddQuery calls. This means that if you have specified a sort mode using SetSortMode before the first AddQuery call, the sort mode will be the same for the second AddQuery call. Currently you can reset only filters (using ResetFilters) and group-by settings (using ResetGroupBy). By the way, you can still use Query as usual to perform a single query, but don’t make this call after you have added a query to the batch using AddQuery.

Enough talking, let’s look at an example:

sphinx = Sphinx::Client.new
sphinx.SetFilter('group_id', [1])
sphinx.AddQuery('wifi')

sphinx.ResetFilters
sphinx.SetFilter('group_id', [2])
sphinx.AddQuery('wifi')

results = sphinx.RunQueries
pp results

As the result we will get an array of 2 hashes:

[{"total_found"=>2,
"status"=>0,
"matches"=>
[{"attrs"=>{"group_id"=>1, "created_at"=>1175658647}, "weight"=>2, "id"=>3},
{"attrs"=>{"group_id"=>1, "created_at"=>1175658490}, "weight"=>1, "id"=>1}],
"error"=>"",
"words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
"time"=>"0.000",
"attrs"=>{"group_id"=>1, "created_at"=>2},
"fields"=>["name", "description"],
"total"=>2,
"warning"=>""},
{"total_found"=>1,
"status"=>0,
"matches"=>
[{"attrs"=>{"group_id"=>2, "created_at"=>1175658555}, "weight"=>2, "id"=>2}],
"error"=>"",
"words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
"time"=>"0.000",
"attrs"=>{"group_id"=>1, "created_at"=>2},
"fields"=>["name", "description"],
"total"=>1,
"warning"=>""}]

Each hash contains the same data as the result of a Query method call. They also have the additional fields error and warning, which contain the error and warning messages respectively when not empty.

Note: I have added a ResetFilters call before creating the second query. Without this call the second query would have two filters with conflicting conditions, so there would be no results at all.
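To make the note about ResetFilters more concrete, here is a plain Ruby simulation of how accumulated filters behave. FilterState is a hypothetical class written only for this illustration; it is not part of Sphinx Client API and does not talk to Sphinx. The sample documents mirror the ids and group_id values from the results above:

```ruby
# Sample documents mirroring the ids and group_id values in the results above.
DOCS = [
  { 'id' => 1, 'group_id' => 1 },
  { 'id' => 2, 'group_id' => 2 },
  { 'id' => 3, 'group_id' => 1 }
]

# Hypothetical stand-in for the client's filter state; not the real API.
class FilterState
  def initialize
    @filters = []
  end

  def set_filter(attribute, values)
    @filters << [attribute, values]
  end

  def reset_filters
    @filters.clear
  end

  # A document matches only if it satisfies every stored filter.
  def matching_ids(docs)
    matched = docs.select do |doc|
      @filters.all? { |attribute, values| values.include?(doc[attribute]) }
    end
    matched.map { |doc| doc['id'] }
  end
end

state = FilterState.new
state.set_filter('group_id', [1])
p state.matching_ids(DOCS)          # [1, 3]

state.set_filter('group_id', [2])   # forgot to reset: both filters apply
p state.matching_ids(DOCS)          # []

state.reset_filters
state.set_filter('group_id', [2])
p state.matching_ids(DOCS)          # [2]
```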

Extended engine V2

The new querying engine (codenamed “extended engine V2”) is going to gradually replace all the currently existing matching modes. At the moment, it is fully identical to extended mode in functionality, but is much less CPU intensive for some queries. Here are notes from the Sphinx author:

I have already seen improvements of up to 3-5 times in extreme cases. The only currently known case when it’s slower is processing complex extended queries with tens to thousands keywords; but forthcoming optimizations will fix that.

V2 engine is currently in alpha state and does not affect any other matching mode yet. Temporary SPH_MATCH_EXTENDED2 mode was added to provide a way to test it easily. We are in the middle of extensive internal testing process (under simulated production load, and then actual production load) right now. Your independent testing results would be appreciated, too!

So, to use the new engine we should select the SPH_MATCH_EXTENDED2 matching mode. Let’s do it!

sphinx = Sphinx::Client.new
sphinx.SetMatchMode(Sphinx::Client::SPH_MATCH_EXTENDED2)
sphinx.Query('wifi')

Easy enough, right? You should try it yourself to feel the power of the new engine. Please note that this mode is temporary and will be removed after the release.

64-bit document and word IDs support

Before version 0.9.8, Sphinx was limited to indexing up to 4 billion documents because it used 32-bit keys. From now on it can use 64-bit IDs, and the new feature does not affect the performance of 32-bit keys. Let’s look at an example. First we will query an index with 32-bit keys:

sphinx = Sphinx::Client.new
result = sphinx.Query('wifi')
pp result['matches'][0]['id'].class

As you can see, the class of the id field is Fixnum. Now try to query an index with 64-bit keys: you will get Bignum as the result, which means that you can have more than 4 billion documents!
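For reference, this is where the “4 billion” figure comes from. The snippet is plain arithmetic and does not touch Sphinx; note that since Ruby 2.4 Fixnum and Bignum are unified into Integer, so on a modern interpreter the class check above prints Integer in both cases:

```ruby
max_32_bit_id = 2**32 - 1
max_64_bit_id = 2**64 - 1

puts max_32_bit_id          # 4294967295
puts max_64_bit_id          # 18446744073709551615
puts max_64_bit_id.class    # Integer on Ruby 2.4+, Bignum on older versions
```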

Multiple-valued attributes

Plain attributes only allow attaching one value to each document. However, there are cases (such as tags or categories) when it is necessary to attach multiple values of the same attribute and be able to filter on value lists. In these cases we can now use multiple-valued attributes.

sphinx = Sphinx::Client.new
sphinx.SetFilter('tag', [1,5])
pp sphinx.Query('wifi')

When using the multiple-valued attribute tag, you will get a result like:

{"total_found"=>2,
"status"=>0,
"matches"=>
[{"attrs"=>
{"tag"=>[4, 5],
"group_id"=>2,
"created_at"=>1175658555},
"weight"=>2,
"id"=>2},
{"attrs"=>
{"tag"=>[1, 2, 3],
"group_id"=>1,
"created_at"=>1175658490},
"weight"=>1,
"id"=>1}],
"error"=>"",
"words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
"time"=>"0.000",
"attrs"=>
{"price"=>5,
"tag"=>1073741825,
"is_active"=>4,
"group_id"=>1,
"created_at"=>2},
"fields"=>["name", "description"],
"total"=>2,
"warning"=>""}

As you can see, multiple-valued attributes are returned as arrays of integers.
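The result also hints at how the filter treats value lists: a document seems to be accepted when at least one of its tag values is in the filter list. Here is a plain Ruby simulation of that rule. It is an assumption consistent with the output above, not a call into Sphinx, and the third document is made up to show a non-match:

```ruby
matches = [
  { 'id' => 2, 'tag' => [4, 5] },
  { 'id' => 1, 'tag' => [1, 2, 3] },
  { 'id' => 3, 'tag' => [7] }          # hypothetical document, not in the listing
]
wanted = [1, 5]

# Keep documents whose tag list shares at least one value with the filter.
hits = matches.select { |doc| (doc['tag'] & wanted).any? }
p hits.map { |doc| doc['id'] }         # [2, 1]
```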

Geodistance feature

Sphinx is now able to compute the geographical distance between two points specified by latitude and longitude pairs (in radians). So you can now specify a per-query “anchor point” (and the attribute names to fetch per-entry latitude and longitude from), and then use the “@geodist” virtual attribute both in filters and in the sorting clause. In this case the distance (in meters) from the anchor point to each match will be computed, used for filtering and/or sorting, and returned as a virtual attribute too.

sphinx = Sphinx::Client.new
sphinx.SetGeoAnchor('lat', 'long', 0.87248, 0.63195)
result = sphinx.Query('wifi')
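If you want to sanity-check the distances on the Ruby side, here is a rough sketch. I do not know which formula Sphinx uses internally, so this uses the haversine great-circle formula with a mean Earth radius, both of which are my assumptions; the second point is a made-up match. Remember that SetGeoAnchor expects radians, so coordinates in degrees have to be converted first:

```ruby
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (an assumption)

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Haversine great-circle distance; all angles in radians, result in meters.
def geodistance(lat1, long1, lat2, long2)
  dlat = lat2 - lat1
  dlong = long2 - long1
  a = Math.sin(dlat / 2.0)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlong / 2.0)**2
  # Clamp to 1.0 to guard against floating-point overshoot before asin.
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt([a, 1.0].min))
end

anchor = [0.87248, 0.63195]                       # the anchor above, in radians
point  = [to_radians(50.45), to_radians(30.52)]   # hypothetical match, from degrees
puts geodistance(anchor[0], anchor[1], point[0], point[1]).round
```

Small differences from the values Sphinx returns are to be expected, since the radius and the formula may not match.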

Download

As always, you can download Sphinx Client API from the project home page. Take into account that version 0.3.1 of the client API is intended for use with Sphinx 0.9.7, while Sphinx Client API 0.4.0 r909 requires the Sphinx 0.9.8 r909 development snapshot. You can download Sphinx from the Download section of the Sphinx home page.
