Good news: the Sphinx Client API has been updated and now supports all the brand new features of the unstable Sphinx 0.9.8 development snapshot. What does this mean for you as a developer? What features will you get if you decide to switch to the new version? In this article I will describe the most valuable improvements in Sphinx and show how to use them with the new Sphinx Client API 0.4.0 r909.

Multi-query support

What does it mean? Multi-query support means sending multiple search queries to Sphinx at once. This saves network connection overhead and other round-trip costs. More importantly, it unlocks the possibility of optimizing “related” queries internally. Here is a quote from the Sphinx home page:

One typical Sphinx usage pattern is to return several different “views” on the search results. For instance, one might need to display per-category match counts along with product search results, or maybe a graph of matches over time. Yes, that could be easily done earlier using the grouping features. However, one had to run the same query multiple times, but with different settings.

From now on, if you submit such queries through newly added multi-query interface (as a side note, ye good olde Query() interface is not going anywhere, and compatibility with older clients should also be in place), Sphinx notices that the full-text search query is the same and it is just sorting/grouping settings which are different. In this case it only performs expensive full-text search once, but builds several different (differently sorted and/or grouped) result sets from retrieved matches. I’ve seen speedups of 1.5-2 times on my simple synthetic queries; depending on different factors, the speedup could be even greater in practice.

To perform a multi-query, add several queries using the AddQuery method (its parameters are exactly the same as in the Query call), and then call RunQueries. Note that all parameters, filters, and query settings persist between AddQuery calls. This means that if you set the sort mode using SetSortMode before the first AddQuery call, the sort mode will be the same for the second AddQuery call. Currently you can reset only the filters (using ResetFilters) and the group-by settings (using ResetGroupBy). By the way, you can still use Query as usual to perform a single query, but don’t call it after you have added a query to the batch using AddQuery.

Enough talk, let’s look at an example:

sphinx = Sphinx::Client.new
sphinx.SetFilter('group_id', [1])
sphinx.AddQuery('wifi')

sphinx.ResetFilters
sphinx.SetFilter('group_id', [2])
sphinx.AddQuery('wifi')

results = sphinx.RunQueries
pp results

As a result, we get an array of 2 hashes (only some of the keys are shown):

[{"matches"=>
   [{"attrs"=>{"group_id"=>1, "created_at"=>1175658647}, "weight"=>2, "id"=>3},
    {"attrs"=>{"group_id"=>1, "created_at"=>1175658490}, "weight"=>1, "id"=>1}],
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"]},
 {"matches"=>
   [{"attrs"=>{"group_id"=>2, "created_at"=>1175658555}, "weight"=>2, "id"=>2}],
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"]}]

Each hash contains the same data as the result of a Query method call. Each also has the additional fields error and warning, which contain the error and warning messages respectively when not empty.
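Since every entry in the batch carries its own error and warning fields, it makes sense to check them per query rather than once for the whole batch. Here is a minimal sketch of that check. It runs on made-up sample data shaped like the output above, and the summarize helper is my own illustration, not part of the client API:

```ruby
# Sample data shaped like the RunQueries output; all values are made up.
results = [
  { 'matches' => [{ 'id' => 3, 'weight' => 2 }, { 'id' => 1, 'weight' => 1 }],
    'error' => '', 'warning' => '' },
  { 'matches' => [], 'error' => 'unknown index', 'warning' => '' }
]

# Hypothetical helper: turn one result hash into a one-line summary.
def summarize(result, index)
  error   = result['error'].to_s
  warning = result['warning'].to_s
  return "query #{index}: failed (#{error})" unless error.empty?

  ids  = result.fetch('matches', []).map { |match| match['id'] }
  line = "query #{index}: ids #{ids.inspect}"
  line += " (warning: #{warning})" unless warning.empty?
  line
end

summaries = results.each_with_index.map { |result, i| summarize(result, i) }
puts summaries
```

This way one failed query in the batch does not hide the results of the others.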

Note: I have added a ResetFilters call before creating the second query. Without this call, the second query would have two filters with conflicting conditions, so it would return no results at all.
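To see why the conflicting filters produce nothing, here is a toy model in plain Ruby. It assumes that a document must pass every active filter, which matches the behaviour described in the note. The documents and the apply_filters helper are made up for illustration and have nothing to do with how Sphinx implements filtering internally:

```ruby
# Toy documents; the attribute values are made up for illustration.
docs = [
  { id: 1, group_id: 1 },
  { id: 2, group_id: 2 },
  { id: 3, group_id: 1 }
]

# Hypothetical model of filter semantics: a document must pass every filter.
def apply_filters(docs, filters)
  docs.select do |doc|
    filters.all? { |attr, values| values.include?(doc[attr]) }
  end
end

filters = [[:group_id, [1]]]
first = apply_filters(docs, filters).map { |doc| doc[:id] }

# Forgetting to reset: the second filter is added on top of the first.
filters << [:group_id, [2]]
conflicting = apply_filters(docs, filters).map { |doc| doc[:id] }

# Resetting first leaves only the new condition.
filters = [[:group_id, [2]]]
second = apply_filters(docs, filters).map { |doc| doc[:id] }

p first        # [1, 3]
p conflicting  # []
p second       # [2]
```

No document can have group_id equal to 1 and 2 at the same time, so the middle result is empty.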

Extended engine V2

The new querying engine (codenamed “extended engine V2”) is going to gradually replace all the currently existing matching modes. At the moment, it is fully identical to extended mode in functionality, but is much less CPU intensive for some queries. Here are notes from the Sphinx author:

I have already seen improvements of up to 3-5 times in extreme cases. The only currently known case when it’s slower is processing complex extended queries with tens to thousands keywords; but forthcoming optimizations will fix that.

V2 engine is currently in alpha state and does not affect any other matching mode yet. Temporary SPH_MATCH_EXTENDED2 mode was added to provide a way to test it easily. We are in the middle of extensive internal testing process (under simulated production load, and then actual production load) right now. Your independent testing results would be appreciated, too!

So, to use the new matching engine we should select the SPH_MATCH_EXTENDED2 mode. Let’s do it!

sphinx = Sphinx::Client.new
sphinx.SetMatchMode(Sphinx::Client::SPH_MATCH_EXTENDED2)

Easy enough, right? Try it yourself to feel the power of the new engine. Note that this mode is temporary and will be removed after the release.

64-bit document and word IDs support

Before version 0.9.8, Sphinx was limited to indexing up to 4 billion documents because it used 32-bit keys. It can now use 64-bit IDs, and the new feature does not affect the performance of 32-bit keys. Let’s look at an example. First we will query an index with 32-bit keys:

sphinx = Sphinx::Client.new
result = sphinx.Query('wifi')
pp result['matches'][0]['id'].class

As you can see, the class of the id field is Fixnum. Now try querying an index with 64-bit keys: you will get a Bignum as the result, which means that you can have more than 4 billion documents!
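The “4 billion” figure is simply the largest value of an unsigned 32-bit key. A small sketch of the arithmetic follows; the fits_in_32_bits? helper is my own name, used only for illustration. Keep in mind that on Ruby 2.4 and later both Fixnum and Bignum are reported as Integer, so the class check above will print Integer in either case and a range check like this one is more telling:

```ruby
max_32_bit_id = 2**32 - 1 # largest unsigned 32-bit value
max_64_bit_id = 2**64 - 1 # largest unsigned 64-bit value

# Hypothetical helper: does this document ID fit into a 32-bit key?
def fits_in_32_bits?(id)
  id.between?(0, 2**32 - 1)
end

puts max_32_bit_id                      # 4294967295
puts fits_in_32_bits?(3)                # true
puts fits_in_32_bits?(5_000_000_000)    # false
puts max_64_bit_id > max_32_bit_id      # true
```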

Multiple-valued attributes

Plain attributes allow only one value to be attached to each document. However, there are cases (such as tags or categories) when it is necessary to attach multiple values of the same attribute and to filter on value lists. In these cases we can now use multiple-valued attributes.

sphinx = Sphinx::Client.new
sphinx.SetFilter('tag', [1,5])
pp sphinx.Query('wifi')

When using the multiple-valued attribute tag, you will get a result like:

     {"tag"=>[4, 5],
     {"tag"=>[1, 2, 3],
 "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
 "fields"=>["name", "description"],

As you can see, multiple-valued attributes are returned as arrays of integers.
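Both documents above pass the filter on [1, 5] even though neither contains both values. The sketch below models that in plain Ruby under the assumption that a filter on a multiple-valued attribute matches when at least one of the document's values is in the filter list. The documents and the mva_match? helper are made up for illustration:

```ruby
# Toy documents with a multiple-valued "tag" attribute (made-up values).
docs = [
  { id: 1, tag: [4, 5] },
  { id: 2, tag: [1, 2, 3] },
  { id: 3, tag: [7] }
]

# Assumed semantics: match when any document value is in the wanted list.
def mva_match?(doc_values, wanted)
  (doc_values & wanted).any?
end

wanted = [1, 5]
matched_ids = docs.select { |doc| mva_match?(doc[:tag], wanted) }
                  .map { |doc| doc[:id] }
p matched_ids # [1, 2]
```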

Geodistance feature

Sphinx is now able to compute the geographical distance between two points specified by latitude and longitude pairs (in radians). You can now specify a per-query “anchor point” (and the attribute names to fetch per-entry latitude and longitude from), and then use the “@geodist” virtual attribute both in filters and in the sorting clause. In this case the distance (in meters) from the anchor point to each match will be computed, used for filtering and/or sorting, and returned as a virtual attribute too.

sphinx = Sphinx::Client.new
sphinx.SetGeoAnchor('lat', 'long', 0.87248, 0.63195)
result = sphinx.Query('wifi')
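Two practical details are easy to trip over here: the anchor point must be given in radians, and the distance comes back in meters. The sketch below converts degrees to radians and computes a great-circle distance with the haversine formula, which is handy for sanity-checking @geodist values. The haversine formula and the Earth radius used here are my own choice for the illustration; the formula Sphinx uses internally may differ, so expect small discrepancies:

```ruby
# Convert degrees (as found in most data sets) to radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two points given in radians.
# The mean Earth radius is an assumption of this sketch.
def haversine_m(lat1, lon1, lat2, lon2, radius_m = 6_371_000.0)
  dlat = lat2 - lat1
  dlon = lon2 - lon1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlon / 2)**2
  2 * radius_m * Math.asin(Math.sqrt([a, 1.0].min))
end

# Example coordinates in degrees (made up for illustration).
anchor_lat = to_radians(50.0)
anchor_lon = to_radians(36.2)
doc_lat    = to_radians(51.0)
doc_lon    = to_radians(36.2)

distance = haversine_m(anchor_lat, anchor_lon, doc_lat, doc_lon)
puts distance.round # one degree of latitude, roughly 111 km
```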


As always, you can download the Sphinx Client API from the project home page. Take into account that version 0.3.1 of the client API is intended for use with Sphinx 0.9.7, while Sphinx Client API 0.4.0 r909 requires the Sphinx 0.9.8 r909 development snapshot. You can download Sphinx from the Download section of the Sphinx home page.

Sphinx Search Engine 0.9.7, Ruby Client API 0.3.0
https://kpumuk.info/ruby-on-rails/sphinx-search-engine-0-9-7-ruby-client-api-0-3-0/
Thu, 05 Apr 2007 14:44:36 +0000

It has happened! We all waited for a Sphinx update, and Andrew Aksyonoff has finally released version 0.9.7 of his wonderful search engine (if you do not know about it, see my previous posts here and here).




Major Sphinx updates include:

  • separate groups sorting clause in group-by mode
  • support for 1-grams, prefix and infix indexing
  • improved documentation

Now about the Sphinx Client API for Ruby. In this version I decided that it is not good to have different interfaces in different languages (BuildExcerpts in PHP and build_excerpts in Ruby). Therefore, applications that use version 0.1.0 or 0.2.0 of the API should be reviewed after updating. Check the documentation for details.
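If reviewing every call site at once is not practical, one possible stopgap is a thin adapter that forwards the old snake_case names to the new CamelCase ones. The sketch below is only an illustration of that idea: SnakeCaseAdapter is a hypothetical class that does not ship with the library, and FakeClient is a stand-in so that the example runs without Sphinx installed:

```ruby
# Stand-in for a client that only exposes the new CamelCase names.
class FakeClient
  def BuildExcerpts(docs, index, words)
    docs.map { |doc| "#{index}: #{doc} [#{words}]" }
  end
end

# Hypothetical adapter: forwards old snake_case calls to CamelCase methods.
class SnakeCaseAdapter
  def initialize(client)
    @client = client
  end

  def method_missing(name, *args, &block)
    target = camelize(name)
    return super unless @client.respond_to?(target)

    @client.public_send(target, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @client.respond_to?(camelize(name)) || super
  end

  private

  # build_excerpts -> BuildExcerpts
  def camelize(name)
    name.to_s.split('_').map(&:capitalize).join
  end
end

client = SnakeCaseAdapter.new(FakeClient.new)
excerpts = client.build_excerpts(['some text'], 'index', 'text')
p excerpts
```

An adapter like this only renames methods. It cannot paper over changes in arguments or return values, so the old code still needs a review.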

New things in the Sphinx Ruby API:

  • Completely synchronized API with PHP version.
  • Fixed bug with processing attributes in query response (thanks to shawn).
  • Fixed a bug with rounding of the query processing time (thanks to michael).
  • 100% covered by RSpec specifications.

You can always download the latest version from the Sphinx Client API for Ruby page.

If you are using Sphinx in your Ruby on Rails application, you should try the acts_as_sphinx plugin.





Sphinx 0.9.7-RC2 released, Ruby API updated
https://kpumuk.info/ruby-on-rails/sphinx-097-rc2-released-ruby-api-updated/
Wed, 20 Dec 2006 06:33:29 +0000

Today I found that the Sphinx search engine has been updated. Major new features include:

  • extended query mode with boolean, field limits, phrases, and proximity support (e.g. @title "hello world"~10 | @body example program);
  • extended sorting mode (e.g. @weight DESC @id ASC);
  • combined phrase+statistical ranking which takes words frequencies into account (currently in extended mode only);
  • official Python API;
  • contributed Perl and Ruby APIs.

I have updated the Sphinx Client Library along with the Sphinx 0.9.7-RC2 Windows build.

Using Sphinx search engine in Ruby on Rails
https://kpumuk.info/ruby-on-rails/using-sphinx-search-engine-in-ruby-on-rails/
Sun, 26 Nov 2006 08:55:20 +0000

Almost all web applications need data search logic, and very often this logic should have full-text search capabilities. If you are using a MySQL database, you can use its FULLTEXT search, but it is not efficient when you have a large amount of data. In this case third-party search engines are used, and one of them (and, I think, the most efficient) is Sphinx. In this article I’ll present my port of the Sphinx client library for Ruby and show how to use it.

First of all, what is Sphinx itself? Sphinx is a full-text search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL, or from an XML pipe.

Current Sphinx distribution includes the following software:

  • indexer: a utility to create fulltext indices;
  • search: a simple (test) utility to query fulltext indices from command line;
  • searchd: a daemon to search through fulltext indices from external software (such as Web scripts);
  • sphinxapi: a set of API libraries for popular Web scripting languages (currently, PHP).

I will not describe how to install the engine. If you are new to Sphinx, see the official documentation (but if you want to see my take on it, you can always ask me in the comments, and I will explain the installation procedure in one of my future posts). Instead I will present a port of the Sphinx client library to Ruby and show how to use it (to use this library you need Sphinx 0.9.7-RC2).

First you need to download the plugin from RubyForge, or from this site.

This is a Ruby on Rails plugin, so just unpack it into your <app>/vendor/plugins directory (the library can also be used outside a Rails application). Now you can write something like the following in your code:

sphinx = Sphinx.new
result = sphinx.query('term1 term2')

# Fetch corresponding models
ids = result[:matches].map { |id, value| id }.join(',')
posts = Post.find :all, :conditions => "id IN (#{ids})"

# Get excerpts
docs = posts.map { |post| post.body }
excerpts = sphinx.build_excerpts(docs, 'index', 'term1 term2')

It’s pretty simple, isn’t it? There are several options you can use to get more relevant search results:

  • set_limits(offset, limit) – first document to fetch and number of documents.
  • set_match_mode(mode) – matching mode (can be SPH_MATCH_ALL – match all words, SPH_MATCH_ANY – match any of words, SPH_MATCH_PHRASE – match exact phrase, SPH_MATCH_BOOLEAN – match boolean query).
  • set_sort_mode(mode) – sorting mode (can be SPH_SORT_RELEVANCE – sort by document relevance desc, then by date, SPH_SORT_ATTR_DESC – sort by document date desc, then by relevance desc, SPH_SORT_ATTR_ASC – sort by document date asc, then by relevance desc, SPH_SORT_TIME_SEGMENTS – sort by time segments (hour/day/week/etc) desc, then by relevance desc).
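Since set_limits takes an offset rather than a page number, paginated search needs a small conversion. Here is a minimal sketch; limits_for_page is a hypothetical helper of mine, not part of the library:

```ruby
# Hypothetical helper: convert a 1-based page number into [offset, limit].
def limits_for_page(page, per_page)
  raise ArgumentError, 'page must be >= 1' if page < 1
  raise ArgumentError, 'per_page must be >= 1' if per_page < 1

  [(page - 1) * per_page, per_page]
end

offset, limit = limits_for_page(3, 20)
puts "offset=#{offset} limit=#{limit}" # offset=40 limit=20
```

The resulting pair can then be passed to set_limits(offset, limit) before running the query.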

Other options can be found in the API documentation.

If you are interested in this library, have found bugs, or have ideas on how to improve it, please leave a comment.

Updated: Unfortunately, there are no Windows binaries for the latest Sphinx 0.9.7-rc2 version. I’ve built Sphinx for Windows and added my config file to the archive. You can download my build here.
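One more remark on the model-fetching step in the example above: an SQL "id IN (...)" condition does not guarantee that rows come back in the order the IDs were listed, so the relevance order from Sphinx can be lost. The sketch below restores it in plain Ruby. The data is made up and in_ranked_order is a hypothetical helper; in a Rails application the rows would be model instances rather than hashes:

```ruby
# IDs in the order the search engine ranked them (made-up values).
ranked_ids = [3, 1, 2]

# Rows as a database might return them for "id IN (3,1,2)".
rows = [
  { id: 1, title: 'First' },
  { id: 2, title: 'Second' },
  { id: 3, title: 'Third' }
]

# Hypothetical helper: reorder rows to follow the ranked ID list.
# IDs with no matching row (e.g. deleted since indexing) are skipped.
def in_ranked_order(rows, ranked_ids)
  by_id = rows.each_with_object({}) { |row, index| index[row[:id]] = row }
  ranked_ids.map { |id| by_id[id] }.compact
end

ordered = in_ranked_order(rows, ranked_ids)
p ordered.map { |row| row[:title] } # ["Third", "First", "Second"]
```

It is also worth skipping the database query entirely when the list of IDs is empty, because "id IN ()" is not valid SQL.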

