Using Sphinx search engine in Ruby on Rails

Posted by Dmytro Shteflyuk on under Ruby & Rails

Almost all Web-applications needs data search logic and really often this logic should have full-text search capabilities. If you are using MySQL database, you can use its FULLTEXT search, but it’s not efficient when you have a large amout of data. In this case third party search engines used, and one of them (and I think, the most efficient) is Sphinx. In this article I’ll present my port of Sphinx client library for Ruby and show how to use it.

First of all, what is the Sphinx itself? Sphinx is a full-text search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL, or from an XML pipe.

Current Sphinx distribution includes the following software:

  • indexer: an utility to create fulltext indices;
  • search: a simple (test) utility to query fulltext indices from command line;
  • searchd: a daemon to search through fulltext indices from external software (such as Web scripts);
  • sphinxapi: a set of API libraries for popular Web scripting languages (currently, PHP);

I will not describe how to install engine, if you are new with Sphinx, look the official documentation (but if you want to see my vision, you can always ask me in comments, and I will explain installation procedure in one of future posts). Instead I will present port of Sphinx client library to Ruby and show how to use it (to use this library you need Sphinx 0.9.7-RC2).

First you need to download plugin from RubyForge, or from this site.

This is Ruby on Rails plugin, therefor just unpack it in your <app>/vendor/plugins directory (library can be used outside the Rails application). Now you can write something like following in your code:

1
2
3
4
5
6
7
8
9
10
11
sphinx = Sphinx.new
sphinx.set_match_mode(Sphinx::SPH_MATCH_ANY)
result = sphinx.query('term1 term2')

# Fetch corresponding models
ids = result[:matches].map { |id, value| id }.join(',')
posts = Post.find :all, :conditions => "id IN (#{ids})"

# Get excerpts
docs = posts.map { |post| post.body }
excerpts = sphinx.build_excerpts(docs, 'index', 'term1 term2')

It’s pretty simple, isn’t it? There are several options you can use to get more relevant search results:

  • set_limits(offset, limit) – first document to fetch and number of documents.
  • set_match_mode(mode) – matching mode (can be SPH_MATCH_ALL – match all words, SPH_MATCH_ANY – match any of words, SPH_MATCH_PHRASE – match exact phrase, SPH_MATCH_BOOLEAN – match boolean query).
  • set_sort_mode(mode) – sorting mode (can be SPH_SORT_RELEVANCE – sort by document relevance desc, then by date, SPH_SORT_ATTR_DESC – sort by document date desc, then by relevance desc, SPH_SORT_ATTR_ASC – sort by document date asc, then by relevance desc, SPH_SORT_TIME_SEGMENTS – sort by time segments (hour/day/week/etc) desc, then by relevance desc).

Other options you can be found in API documentation.

If you are interested with this library, found bugs or have ideas how to improve it – please leave comments.

Updated: Unfortunately, there are no Windows binaries for latest Sphinx 0.9.7-rc2 version. I’ve built Sphinx for Windows, and added my config file into archive. You can download my build here.

37 Responses to this entry

Subscribe to comments with RSS

Comments are closed

Comments for this entry are closed for a while. If you have anything to say – use a contact form. Thank you for your patience.