Sphinx Client API 0.3.1 and 0.4.0 r909 for Sphinx 0.9.8 r909 released

Posted by Dmytro Shteflyuk under Ruby & Rails

Good news: the Sphinx Client API has been updated and now supports all the new features of the unstable Sphinx 0.9.8 development snapshot. What does this mean for you as a developer? Which features will you get if you decide to switch to the new version? In this article I describe the most valuable improvements in Sphinx and show how to use them with the new Sphinx Client API 0.4.0 r909.


Multi-query support

What does it mean? Multi-query support means sending several search queries to Sphinx at once. This saves network connection overhead and other round-trip costs. More importantly, it makes it possible to optimize “related” queries internally. Here is a quote from the Sphinx home page:

One typical Sphinx usage pattern is to return several different “views” on the search results. For instance, one might need to display per-category match counts along with product search results, or maybe a graph of matches over time. Yes, that could be easily done earlier using the grouping features. However, one had to run the same query multiple times, but with different settings.

From now on, if you submit such queries through newly added multi-query interface (as a side note, ye good olde Query() interface is not going anywhere, and compatibility with older clients should also be in place), Sphinx notices that the full-text search query is the same and it is just sorting/grouping settings which are different. In this case it only performs expensive full-text search once, but builds several different (differently sorted and/or grouped) result sets from retrieved matches. I’ve seen speedups of 1.5-2 times on my simple synthetic queries; depending on different factors, the speedup could be even greater in practice.

To perform a multi-query, add several queries with the AddQuery method (its parameters are exactly the same as for Query), and then call RunQueries. Note that all parameters, filters and query settings persist between AddQuery calls: if you set a sort mode with SetSortMode before the first AddQuery call, the second AddQuery call will use the same sort mode. Currently you can reset only filters (with ResetFilters) and group-by settings (with ResetGroupBy). You can still use Query as usual to run a single query, but don't call it after you have added queries to the batch with AddQuery.

Enough talk, let’s look at an example:

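require 'pp'

sphinx = Sphinx::Client.new

# First query: only documents from group 1
sphinx.SetFilter('group_id', [1])
sphinx.AddQuery('wifi')

# Clear the previous filter, then restrict the second query to group 2
sphinx.ResetFilters
sphinx.SetFilter('group_id', [2])
sphinx.AddQuery('wifi')

# Send both queries to Sphinx in one round trip
results = sphinx.RunQueries
pp results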

As a result, we get an array of two hashes:

[{"total_found"=>2,
  "status"=>0,
  "matches"=>
   [{"attrs"=>{"group_id"=>1, "created_at"=>1175658647}, "weight"=>2, "id"=>3},
    {"attrs"=>{"group_id"=>1, "created_at"=>1175658490}, "weight"=>1, "id"=>1}],
  "error"=>"",
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "time"=>"0.000",
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"],
  "total"=>2,
  "warning"=>""},
 {"total_found"=>1,
  "status"=>0,
  "matches"=>
   [{"attrs"=>{"group_id"=>2, "created_at"=>1175658555}, "weight"=>2, "id"=>2}],
  "error"=>"",
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "time"=>"0.000",
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"],
  "total"=>1,
  "warning"=>""}]

Each hash contains the same data as the result of a Query call. In addition, each one has error and warning fields, which hold the error and warning messages respectively; they are empty strings when there is nothing to report.
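Because every entry carries its own error and warning fields, one query in a batch can fail while the others succeed. Below is a minimal sketch of how one might walk the array returned by RunQueries and separate usable results from failed ones. It works on plain hashes shaped like the output above (trimmed to the fields it uses), so it runs without a Sphinx server; `split_results` is a hypothetical helper, not part of the client API, and the third, failing entry is made up for illustration.

```ruby
# Hypothetical helper (not part of Sphinx::Client): splits a RunQueries-style
# array into [ok, failed] lists of [index, result] pairs, printing warnings.
def split_results(results)
  ok = []
  failed = []
  results.each_with_index do |result, i|
    if !result['error'].to_s.empty?
      failed << [i, result]
    else
      warn "query #{i} warning: #{result['warning']}" unless result['warning'].to_s.empty?
      ok << [i, result]
    end
  end
  [ok, failed]
end

# Trimmed copies of the two result hashes shown above, plus a made-up failure.
results = [
  { 'total_found' => 2, 'error' => '', 'warning' => '',
    'matches' => [{ 'id' => 3 }, { 'id' => 1 }] },
  { 'total_found' => 1, 'error' => '', 'warning' => '',
    'matches' => [{ 'id' => 2 }] },
  { 'error' => 'example error text', 'warning' => '' }
]

ok, failed = split_results(results)
ok.each { |i, r| puts "query #{i}: ids #{r['matches'].map { |m| m['id'] }.inspect}" }
failed.each { |i, r| puts "query #{i} failed: #{r['error']}" }
```

Checking the error field of each entry, rather than only the first one, keeps a single bad query from hiding the results of the rest of the batch.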

Note: I added a ResetFilters call before creating the second query. Without it, the second query would carry two filters with conflicting conditions (group_id 1 and group_id 2), so it would return no results at all.
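To see why skipping ResetFilters empties the second result set, here is a toy, pure-Ruby model of a client that, like Sphinx::Client, keeps its filters between queued queries. `ToyClient` and its snake_case methods are hypothetical stand-ins written only for this illustration; they mimic the behaviour described above (filters persist until reset, and all filters must pass), not the real client's internals. The three documents and their group_id values are taken from the sample output above.

```ruby
# Hypothetical stand-in for Sphinx::Client: it only models how filters
# persist between add_query calls until reset_filters is called.
class ToyClient
  # Documents from the sample output above: id => attributes
  DOCS = {
    1 => { 'group_id' => 1 },
    2 => { 'group_id' => 2 },
    3 => { 'group_id' => 1 }
  }.freeze

  def initialize
    @filters = [] # each filter is [attribute, allowed_values]
    @batch = []
  end

  def set_filter(attr, values)
    @filters << [attr, values]
  end

  def reset_filters
    @filters = []
  end

  # Snapshot the current filters, like queuing a query with AddQuery.
  def add_query
    @batch << @filters.dup
  end

  # A document is returned only if it passes every accumulated filter.
  def run_queries
    @batch.map do |filters|
      DOCS.select { |_id, attrs| filters.all? { |attr, allowed| allowed.include?(attrs[attr]) } }.keys
    end
  end
end

with_reset = ToyClient.new
with_reset.set_filter('group_id', [1]); with_reset.add_query
with_reset.reset_filters
with_reset.set_filter('group_id', [2]); with_reset.add_query
p with_reset.run_queries    # => [[1, 3], [2]]

without_reset = ToyClient.new
without_reset.set_filter('group_id', [1]); without_reset.add_query
without_reset.set_filter('group_id', [2]); without_reset.add_query
p without_reset.run_queries # => [[1, 3], []]
```

In the second batch the last query has to satisfy both group_id 1 and group_id 2, which no document can do.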

Extended engine V2

The new querying engine (codenamed “extended engine V2”) is going to gradually replace all the currently existing matching modes. At the moment it is functionally identical to extended mode, but it is much less CPU-intensive for some queries. Here are some notes from the Sphinx author:

I have already seen improvements of up to 3-5 times in extreme cases. The only currently known case when it’s slower is processing complex extended queries with tens to thousands keywords; but forthcoming optimizations will fix that.

V2 engine is currently in alpha state and does not affect any other matching mode yet. Temporary SPH_MATCH_EXTENDED2 mode was added to provide a way to test it easily. We are in the middle of extensive internal testing process (under simulated production load, and then actual production load) right now. Your independent testing results would be appreciated, too!

So, to use the new matching mode, set the match mode to SPH_MATCH_EXTENDED2. Let’s do it!

sphinx = Sphinx::Client.new
sphinx.SetMatchMode(Sphinx::Client::SPH_MATCH_EXTENDED2)
sphinx.Query('wifi')

Easy enough, right? Try it yourself to feel the power of the new engine. Please note that this mode is temporary and will be removed after the release.
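Because SPH_MATCH_EXTENDED2 is temporary, code that hard-codes it may break once the constant disappears. One defensive option is to choose the mode at runtime. This is only a sketch: `pick_match_mode` is a hypothetical helper, not something the client provides, and the two stub classes (with placeholder constant values) stand in for an older and a newer client so the example runs without Sphinx installed.

```ruby
# Hypothetical helper: prefer SPH_MATCH_EXTENDED2 when the client class
# defines it, otherwise fall back to the regular extended mode.
def pick_match_mode(client_class)
  if client_class.const_defined?(:SPH_MATCH_EXTENDED2)
    client_class::SPH_MATCH_EXTENDED2
  else
    client_class::SPH_MATCH_EXTENDED
  end
end

# Stand-ins for two client versions; the constant values are placeholders.
class OldClient
  SPH_MATCH_EXTENDED = 4
end

class NewClient
  SPH_MATCH_EXTENDED  = 4
  SPH_MATCH_EXTENDED2 = 6
end

p pick_match_mode(OldClient) # => 4
p pick_match_mode(NewClient) # => 6
```

With the real library you would pass the client class itself, for example `sphinx.SetMatchMode(pick_match_mode(Sphinx::Client))`.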

64-bit document and word IDs support

Before version 0.9.8, Sphinx could index at most about 4 billion documents because it used 32-bit keys. It can now use 64-bit IDs, and this feature does not affect performance with 32-bit keys. Let’s look at an example. First, we query an index with 32-bit keys:

require 'pp'

sphinx = Sphinx::Client.new
result = sphinx.Query('wifi')
# Print the class of the first match's document ID
pp result['matches'][0]['id'].class

As you can see, the id field is a Fixnum. Now let’s query an index with 64-bit keys: IDs that do not fit into a Fixnum come back as Bignum values, which means you can have more than 4 billion documents!
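The “4 billion” limit comes from 2**32 = 4,294,967,296 possible 32-bit keys. The sketch below shows how a 64-bit ID can be carried as two unsigned 32-bit big-endian words (high word first) and reassembled into a single Ruby integer. That wire layout is an assumption made for illustration, and `encode_id64`/`decode_id64` are hypothetical helpers, not client API methods.

```ruby
# Hypothetical helpers: pack a 64-bit document ID into two unsigned 32-bit
# big-endian words (high word first) and unpack it again.
def encode_id64(id)
  [id >> 32, id & 0xFFFF_FFFF].pack('NN')
end

def decode_id64(bytes)
  hi, lo = bytes.unpack('NN')
  (hi << 32) | lo
end

max_id32 = 2**32 - 1     # largest ID a 32-bit key can hold
big_id   = 5_000_000_000 # does not fit in 32 bits

p decode_id64(encode_id64(big_id)) == big_id # => true
p decode_id64(encode_id64(42))               # => 42
p big_id > max_id32                          # => true
```

Ruby integers grow as needed, so the reassembled value needs no special handling on the Ruby side.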

Multiple-valued attributes

Plain attributes allow only one value per document. However, there are cases (such as tags or categories) where you need to attach multiple values of the same attribute and be able to filter on those value lists. For these cases we can now use multiple-valued attributes.
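The filtering rule for multiple-valued attributes can be modelled in a few lines. Judging from the sample result later in this section, a document passes a SetFilter on such an attribute if at least one of its values is in the filter list. The sketch below applies that rule to plain hashes; `mva_filter` is a hypothetical helper, the tag lists of documents 1 and 2 are copied from that sample result, and document 4 is made up.

```ruby
# Hypothetical helper modelling a filter on a multiple-valued attribute:
# keep a document when any of its values appears in the allowed list.
def mva_filter(docs, attr, allowed)
  docs.select { |doc| (Array(doc[attr]) & allowed).any? }
end

docs = [
  { 'id' => 1, 'tag' => [1, 2, 3] },
  { 'id' => 2, 'tag' => [4, 5] },
  { 'id' => 4, 'tag' => [6] } # made-up document with no matching tag
]

p mva_filter(docs, 'tag', [1, 5]).map { |d| d['id'] } # => [1, 2]
```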

require 'pp'

sphinx = Sphinx::Client.new
# Match documents that have tag 1 or tag 5 among their tag values
sphinx.SetFilter('tag', [1,5])
pp sphinx.Query('wifi')

When tag is a multiple-valued attribute, you get a result like this:

{"total_found"=>2,
 "status"=>0,
 "matches"=>
  [{"attrs"=>
     {"tag"=>[4, 5],
      "group_id"=>2,
      "created_at"=>1175658555},
    "weight"=>2,
    "id"=>2},
   {"attrs"=>
     {"tag"=>[1, 2, 3],
      "group_id"=>1,
      "created_at"=>1175658490},
    "weight"=>1,
    "id"=>1}],
 "error"=>"",
 "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
 "time"=>"0.000",
 "attrs"=>
  {"price"=>5,
   "tag"=>1073741825,
   "is_active"=>4,
   "group_id"=>1,
   "created_at"=>2},
 "fields"=>["name", "description"],
 "total"=>2,
 "warning"=>""}

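As you can see, multiple-valued attributes are returned as arrays of integers. A document passes the tag filter if any of its values is in the filter list: document 2 matches through tag 5, and document 1 through tag 1.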
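The top-level attrs hash in this result maps each attribute name to a type code rather than a value, which is why tag shows up as 1073741825. That number is 0x40000001: a multi-value flag (0x40000000) combined with the integer type (1). The sketch below decodes such codes. The constant values are my reading of the 0.9.8 API and of the sample output, so check them against the SPH_ATTR_* constants of your client version; `describe_attr_type` is a hypothetical helper.

```ruby
# Assumed type codes, modelled on the SPH_ATTR_* constants of the 0.9.8 API.
ATTR_TYPES = { 1 => 'integer', 2 => 'timestamp', 3 => 'ordinal',
               4 => 'bool', 5 => 'float' }.freeze
ATTR_MULTI = 0x40000000 # flag bit marking a multiple-valued attribute

# Hypothetical helper: turn a type code into a readable description.
def describe_attr_type(code)
  base = code & ~ATTR_MULTI
  name = ATTR_TYPES.fetch(base, "unknown (#{base})")
  (code & ATTR_MULTI).zero? ? name : "multiple-valued #{name}"
end

# The attrs hash from the result above.
attrs = { 'price' => 5, 'tag' => 1073741825, 'is_active' => 4,
          'group_id' => 1, 'created_at' => 2 }

attrs.each { |attr, code| puts "#{attr}: #{describe_attr_type(code)}" }
```

For the sample result this prints price as float, tag as a multiple-valued integer, is_active as bool, group_id as integer and created_at as timestamp.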

Geodistance feature

Sphinx can now compute the geographical distance between two points specified by latitude and longitude pairs (in radians). You can specify a per-query “anchor point” (together with the names of the attributes that hold each entry’s latitude and longitude), and then use the “@geodist” virtual attribute in both filters and the sorting clause. The distance (in meters) from the anchor point to each match is then computed, used for filtering and/or sorting, and also returned as a virtual attribute.

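sphinx = Sphinx::Client.new
# Names of the attributes holding each document's latitude and longitude,
# followed by the anchor point's latitude and longitude in radians
sphinx.SetGeoAnchor('lat', 'long', 0.87248, 0.63195)
result = sphinx.Query('wifi')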
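To get a feel for the numbers @geodist produces, here is a great-circle distance in meters between two points given in radians, computed with the haversine formula. It is an approximation under my own assumptions: it uses a mean Earth radius of 6,371,000 m, and Sphinx’s internal formula and radius may differ, so expect small discrepancies from the server’s values. `geodist` is a hypothetical helper, and the second point is made up.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # assumed mean Earth radius in meters

# Great-circle distance (haversine formula) between two points whose
# latitude and longitude are given in radians; returns meters.
def geodist(lat1, lon1, lat2, lon2)
  dlat = lat2 - lat1
  dlon = lon2 - lon1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt([a, 1.0].min))
end

anchor = [0.87248, 0.63195] # the anchor point from the example above
point  = [0.87300, 0.63250] # hypothetical document coordinates

printf("%.0f m\n", geodist(*anchor, *point))
```

With a real index you could compare this figure with the @geodist value returned for each match.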

Download

As always, you can download the Sphinx Client API from the project home page. Note that version 0.3.1 of the client API is intended for Sphinx 0.9.7, while Sphinx Client API 0.4.0 r909 requires the Sphinx 0.9.8 r909 development snapshot. You can download Sphinx from the Download section of the Sphinx home page.

4 Responses to this entry


said on January 18th, 2008 at 11:43

Dmytro, any plans with the new release of sphinx (as of Jan 14, 2007)?
I saw some new cool features like the ranking engine for ext2 mode.

said on January 18th, 2008 at 11:49

Oh, thanks, I was trying to do it myself, but I’ve got stuck while translating some of the new functions added to php api.

said on January 19th, 2008 at 18:42

Sphinx Client API 0.4.0-r1065 for Sphinx 0.9.8 r1065 released! Get it here.
