Using Sphinx search engine in Ruby on Rails

Posted by Dmytro Shteflyuk on under MySQL, Ruby & Rails · Russian (43,826 views)

Almost every web application needs search, and quite often that search has to be full-text. If you are using MySQL, you can use its FULLTEXT search, but it is not efficient when you have a large amount of data. In that case, third-party search engines are used, and one of them (and, I think, the most efficient one) is Sphinx. In this article I'll present my port of the Sphinx client library to Ruby and show how to use it.

First of all, what is Sphinx itself? Sphinx is a full-text search engine meant to provide fast, size-efficient, and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, the built-in data sources support fetching data either via a direct connection to MySQL or from an XML pipe.

The current Sphinx distribution includes the following software:

  • indexer: a utility to create full-text indices;
  • search: a simple (test) utility to query full-text indices from the command line;
  • searchd: a daemon to search through full-text indices from external software (such as web scripts);
  • sphinxapi: a set of API libraries for popular web scripting languages (currently, PHP).

I will not describe how to install the engine; if you are new to Sphinx, see the official documentation (but if you want to see my take on it, you can always ask me in the comments, and I will explain the installation procedure in a future post). Instead, I will present the port of the Sphinx client library to Ruby and show how to use it (to use this library you need Sphinx 0.9.7-RC2).

First, download the plugin from RubyForge or from this site.

This is a Ruby on Rails plugin, so just unpack it into your <app>/vendor/plugins directory (the library can also be used outside a Rails application). Now you can write something like the following in your code:

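sphinx = Sphinx.new
sphinx.set_match_mode(Sphinx::SPH_MATCH_ANY)
result = sphinx.query('term1 term2')

# Fetch the corresponding models by document ID.
# Skip the database query when nothing matched: "id IN ()" is invalid SQL.
ids = result[:matches].map { |id, value| id }
posts = ids.empty? ? [] : Post.find(:all, :conditions => ['id IN (?)', ids])

# Build excerpts (snippets with highlighted terms) using the settings of the index named 'index'
docs = posts.map { |post| post.body }
excerpts = sphinx.build_excerpts(docs, 'index', 'term1 term2')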
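The `Post.find` call above returns rows in database order, not in Sphinx's relevance order. A minimal sketch of restoring that order; `in_sphinx_order` is a hypothetical helper (not part of the plugin), and `FakePost` is a stand-in for an ActiveRecord model so the example runs on its own. It accepts either a Hash of matches or an Array of `[doc_id, info]` pairs; with a Hash, the order is only as reliable as the Hash's iteration order (insertion order from Ruby 1.9 on, unspecified on Ruby 1.8).

```ruby
# Hypothetical helper: return `records` in the order of the Sphinx `matches`.
# `matches` may be a Hash {doc_id => info} or an Array of [doc_id, info] pairs;
# both yield (doc_id, info) to #map. `records` must respond to #id.
def in_sphinx_order(matches, records)
  ids = matches.map { |doc_id, _info| doc_id }
  by_id = {}
  records.each { |record| by_id[record.id] = record }
  # Drop IDs with no loaded record (e.g. rows deleted after indexing)
  ids.map { |doc_id| by_id[doc_id] }.compact
end

# Stand-in for an ActiveRecord model, for illustration only
FakePost = Struct.new(:id, :body)

matches = [[42, { 'weight' => 2 }], [7, { 'weight' => 1 }]]
loaded  = [FakePost.new(7, 'second'), FakePost.new(42, 'first')] # database order
puts in_sphinx_order(matches, loaded).map(&:id).inspect # prints [42, 7]
```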

It’s pretty simple, isn’t it? There are several options you can use to get more relevant search results:

  • set_limits(offset, limit) – offset of the first match to return and the number of matches to return.
  • set_match_mode(mode) – matching mode (can be SPH_MATCH_ALL – match all words, SPH_MATCH_ANY – match any of words, SPH_MATCH_PHRASE – match exact phrase, SPH_MATCH_BOOLEAN – match boolean query).
  • set_sort_mode(mode) – sorting mode (can be SPH_SORT_RELEVANCE – sort by document relevance desc, then by date, SPH_SORT_ATTR_DESC – sort by document date desc, then by relevance desc, SPH_SORT_ATTR_ASC – sort by document date asc, then by relevance desc, SPH_SORT_TIME_SEGMENTS – sort by time segments (hour/day/week/etc) desc, then by relevance desc).
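`set_limits` works with an offset and a count, while web pages are usually addressed by page number. A small sketch of the conversion; the helper name `sphinx_limits` and its 1-based `page` argument are assumptions for illustration, not part of the plugin.

```ruby
# Hypothetical helper: convert a 1-based page number and a page size into
# the [offset, limit] pair expected by set_limits(offset, limit).
def sphinx_limits(page, per_page)
  page = page.to_i
  page = 1 if page < 1 # treat missing or invalid page numbers as page 1
  [(page - 1) * per_page, per_page]
end

offset, limit = sphinx_limits(3, 20)
puts "#{offset} #{limit}" # prints "40 20"
# With the plugin this pair would then go to: sphinx.set_limits(offset, limit)
```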

Other options can be found in the API documentation.

If you are interested in this library, have found bugs, or have ideas on how to improve it, please leave a comment.

Update: unfortunately, there are no Windows binaries for the latest Sphinx version, 0.9.7-RC2. I've built Sphinx for Windows and added my config file to the archive. You can download my build here.

37 Responses to this entry


said on September 10, 2007 at 5:12 am
Index: vendor/plugins/sphinx/lib/client.rb
===================================================================
--- vendor/plugins/sphinx/lib/client.rb (revision 5885)
+++ vendor/plugins/sphinx/lib/client.rb (working copy)
@@ -391,18 +391,20 @@
       count = response[p, 4].unpack('N*').first; p += 4
       
       # read matches
-      result['matches'] = {}
+      result['matches'] = []
       while count > 0 and p < max
         count -= 1
         doc, weight = response[p, 8].unpack('N*N*'); p += 8
   
-        result['matches'][doc] ||= {}
-        result['matches'][doc]['weight'] = weight
+        doc_data = {}
+        doc_data['weight'] = weight
         attrs_names_in_order.each do |attr|
           val = response[p, 4].unpack('N*').first; p += 4
-          result['matches'][doc]['attrs'] ||= {}
-          result['matches'][doc]['attrs'][attr] = val
+          doc_data['attrs'] ||= {}
+          doc_data['attrs'][attr] = val
         end
+        
+        result['matches'] << [doc, doc_data]
       end
       result['total'], result['total_found'], msecs, words = response[p, 16].unpack('N*N*N*N*'); p += 16
       result['time'] = '%.3f' % (msecs / 1000.0)
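The patch appears to turn `result['matches']` from a Hash keyed by document ID into an Array of `[doc, data]` pairs. A likely reason is that Ruby 1.8 Hashes do not keep insertion order, so the best-first order sent by searchd could be lost. A sketch of consuming the patched shape; the sample IDs, weights, and the `group_id` attribute are made up for illustration.

```ruby
# Sample result in the shape produced by the patched client (values made up):
# 'matches' is an Array of [doc_id, {'weight' => ..., 'attrs' => {...}}] pairs.
result = {
  'matches' => [
    [101, { 'weight' => 2, 'attrs' => { 'group_id' => 1 } }],
    [57,  { 'weight' => 1, 'attrs' => { 'group_id' => 3 } }]
  ],
  'total' => 2
}

# Document IDs in the order searchd returned them
ids = result['matches'].map { |doc, _data| doc }

# [doc_id, weight] pairs, e.g. for showing a relevance score
weights = result['matches'].map { |doc, data| [doc, data['weight']] }

puts ids.inspect     # prints [101, 57]
puts weights.inspect # prints [[101, 2], [57, 1]]
```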

tolya said on September 11, 2008 at 12:54 pm

Hi, everyone!

I have a question about Sphinx; please help me find a solution.

I have the following structure in my configuration file:

sphinx.conf:

source sphinx_users_main
source sphinx_users_delta : sphinx_users_main
source sphinx_spaces_main
source sphinx_spaces_delta : sphinx_spaces_main
index users_main
index users_delta : users_main
index spaces_main
index spaces_delta : spaces_main

I came up with this structure so that a search could return IDs from a single table (by specifying which index from the configuration file to search).

Everything seems to work correctly:

search -a test

Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'users_main': query 'test ': returned 14 matches of 14 total in 0.000 sec

displaying matches:
1. document=3592, weight=2
2. document=4178, weight=2
3. document=4179, weight=2
4. document=4181, weight=2
5. document=6192, weight=2
6. document=2807, weight=1
7. document=3593, weight=1
8. document=4717, weight=1
9. document=4740, weight=1
10. document=6090, weight=1
11. document=6196, weight=1
12. document=6218, weight=1
13. document=6219, weight=1
14. document=6220, weight=1

words:
1. 'test': 14 documents, 19 hits

index 'users_delta': query 'test ': returned 0 matches of 0 total in 0.000 sec

words:
1. 'test': 0 documents, 0 hits

index 'spaces_main': query 'test ': returned 17 matches of 17 total in 0.000 sec

displaying matches:
1. document=937, weight=1
2. document=940, weight=1
3. document=942, weight=1
4. document=943, weight=1
5. document=944, weight=1
6. document=945, weight=1
7. document=964, weight=1
8. document=983, weight=1
9. document=984, weight=1
10. document=985, weight=1
11. document=986, weight=1
12. document=987, weight=1
13. document=988, weight=1
14. document=989, weight=1
15. document=990, weight=1
16. document=991, weight=1
17. document=992, weight=1

words:
1. 'test': 17 documents, 17 hits

index 'spaces_delta': query 'test ': returned 0 matches of 0 total in 0.000 sec

words:
1. 'test': 0 documents, 0 hits

But I can't figure out how to make Sphinx search only the index I specify, the way I do it from the console:

search -i spaces_main -a test

Sphinx 0.9.8-release (r1371)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'spaces_main': query 'test ': returned 17 matches of 17 total in 0.000 sec

displaying matches:
1. document=937, weight=1
2. document=940, weight=1
3. document=942, weight=1
4. document=943, weight=1
5. document=944, weight=1
6. document=945, weight=1
7. document=964, weight=1
8. document=983, weight=1
9. document=984, weight=1
10. document=985, weight=1
11. document=986, weight=1
12. document=987, weight=1
13. document=988, weight=1
14. document=989, weight=1
15. document=990, weight=1
16. document=991, weight=1
17. document=992, weight=1

words:
1. 'test': 17 documents, 17 hits

Could you please tell me how to do this?

Thanks

said on September 11, 2008 at 2:51 pm

The second parameter of the Query method is the name of the index to search:

sphinx.query('test', 'spaces_main')
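Since every source in the config above has a main and a delta index, a search usually needs to cover both. Sphinx accepts several index names in this argument (my understanding is that they may be separated by spaces or commas; check the Sphinx documentation for your version). A sketch of a hypothetical `index_list` helper that builds such a list from the `_main`/`_delta` naming convention used in that config:

```ruby
# Hypothetical helper: build an index list from the "<name>_main" /
# "<name>_delta" naming convention used in the sphinx.conf above.
# Main comes before delta for each name.
def index_list(*names)
  names.flat_map { |name| ["#{name}_main", "#{name}_delta"] }.join(' ')
end

puts index_list(:spaces)         # prints "spaces_main spaces_delta"
puts index_list(:users, :spaces) # prints "users_main users_delta spaces_main spaces_delta"
# Assumed usage, if the Ruby port's query takes the index as its second
# argument: sphinx.query('test', index_list(:spaces))
```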

tolya said on September 12, 2008 at 2:27 pm

Thank you very much for the answer.

Could you please tell me how I can change the pattern Sphinx uses to decide which results a query returns?
For example, for the query sphinx.Query('test')
I would like the results to also include, among other things: test16, test_12, hello@test.com.

Thanks
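One common approach is prefix search: build the index with prefix indexing (the `min_prefix_len` option) and star syntax enabled (`enable_star`), then query `test*`. Whether `test_12` and `hello@test.com` are split into separate words also depends on the index's `charset_table`. These option names come from the Sphinx configuration reference; please check them against your Sphinx version. A sketch of a hypothetical helper that rewrites user input into such a query:

```ruby
# Hypothetical helper: turn user input into a prefix-wildcard query, so that
# "test" becomes "test*". Assumes an index built with min_prefix_len and
# enable_star = 1. Runs of non-word characters are treated as separators.
def prefix_query(input)
  input.scan(/\w+/).map { |word| "#{word}*" }.join(' ')
end

puts prefix_query('test')           # prints "test*"
puts prefix_query('hello@test.com') # prints "hello* test* com*"
```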

Anatoliy said on September 24, 2008 at 5:23 pm

Hi, everyone!!!

Could you please tell me how to implement in Sphinx the same kind of search that '…LIKE %name%…' would give?

Thanks
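Sphinx matches whole words rather than arbitrary substrings, but infix indexing comes close to `LIKE '%name%'`: with `min_infix_len` set on the index and `enable_star` enabled, a query such as `*name*` can match the substring inside longer words. A sketch under those assumptions; `infix_query` and its default minimum length of 3 are hypothetical and should match the `min_infix_len` actually configured. Terms shorter than that minimum are kept as plain words, since infixes that short are not indexed.

```ruby
# Hypothetical helper approximating SQL's LIKE '%name%' with Sphinx star syntax.
# Assumes an index built with min_infix_len and enable_star = 1.
# Words shorter than min_infix_len are left as plain (whole-word) terms.
def infix_query(input, min_infix_len = 3)
  input.scan(/\w+/).map { |word| word.length >= min_infix_len ? "*#{word}*" : word }.join(' ')
end

puts infix_query('name')   # prints "*name*"
puts infix_query('ab cde') # prints "ab *cde*"
```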

Comments for this entry are closed for a while. If you have anything to say, please use the contact form. Thank you for your patience.