I think it was pretty nice and successful: I got many questions (really, really good ones), and instead of a one-hour introduction we spent more than two hours in discussions and talks. I enjoyed it.
Below you can find the slides from my talk (in English) and the video recorded during the session (in Russian).
Scribd Architecture Overview by Dmytro Shteflyuk on Scribd
The video has been split into two parts.
X1 Tech Talks #1: Scribd Architecture Overview, Part 1 from Dmytro Shteflyuk on Vimeo.
X1 Tech Talks #1: Scribd Architecture Overview, Part 2 from Dmytro Shteflyuk on Vimeo.
In this short recording you can see all the guests who took part in this event. Many thanks to all of you, guys.
X1 Tech Talks #1: All Guests from Dmytro Shteflyuk on Vimeo.
I hope you will enjoy this talk too. Your comments, suggestions, and ideas about upcoming talks are welcome.
The post X1 Tech Talks #1: Scribd Architecture Overview first appeared on Dmytro Shteflyuk's Home.

Every time I install the mysql or memcached gems on my Mac, I get an error like this:

```
Building native extensions. This could take a while...
ERROR: Error installing mysql:
ERROR: Failed to build gem native extension.
```
And every time I end up googling how to install these gems. It's time to simplify my life and post the commands here.
Installing mysql5 from MacPorts:
```
sudo port install mysql5
```
Now we can install the mysql gem:
```
kpumuk@kpumuk-mbp~: sudo gem install mysql -- --with-mysql-config=/opt/local/bin/mysql_config5
Building native extensions. This could take a while...
Successfully installed mysql-2.7
1 gem installed
```
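To make sure the gem compiled and linked against the MacPorts client library, a quick sanity check like this may help (the printed version string is illustrative):

```ruby
require 'mysql'
# Prints the version of the MySQL client library the gem was built against.
puts Mysql.client_info   # e.g. "5.0.51a"
```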
For the memcached gem, first you need to install memcached and libmemcached:
```
sudo port install memcached libmemcached
```
And then the memcached gem:
```
kpumuk@kpumuk-mbp~: sudo env ARCHFLAGS="-arch i386" gem install memcached --no-ri --no-rdoc -- --with-libmemcached-dir=/opt/local
Building native extensions. This could take a while...
Successfully installed memcached-0.12
1 gem installed
```
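A minimal smoke test for the freshly installed gem; it assumes a memcached server is already running locally (for example, started with memcached -d):

```ruby
require 'memcached'

cache = Memcached.new('localhost:11211')
cache.set('greeting', 'Hello from the memcached gem!')
puts cache.get('greeting')   # => "Hello from the memcached gem!"
```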
If you have any questions that could be covered in this series, ask me in the comments.
The post Memo #1: Installing mysql and memcached gems on Mac OS X with MacPorts first appeared on Dmytro Shteflyuk's Home.
Sphinx 0.9.8 introduces multi-query support. What does it mean? Multi-query support means sending multiple search queries to Sphinx at once, which saves network connection overhead and other round-trip costs. But what's much more important, it unlocks the possibility to optimize "related" queries internally. Here is a quote from the Sphinx home page:
One typical Sphinx usage pattern is to return several different “views” on the search results. For instance, one might need to display per-category match counts along with product search results, or maybe a graph of matches over time. Yes, that could be easily done earlier using the grouping features. However, one had to run the same query multiple times, but with different settings.
From now on, if you submit such queries through newly added multi-query interface (as a side note, ye good olde Query() interface is not going anywhere, and compatibility with older clients should also be in place), Sphinx notices that the full-text search query is the same and it is just sorting/grouping settings which are different. In this case it only performs expensive full-text search once, but builds several different (differently sorted and/or grouped) result sets from retrieved matches. I’ve seen speedups of 1.5-2 times on my simple synthetic queries; depending on different factors, the speedup could be even greater in practice.
To perform a multi-query you should add several queries using the AddQuery method (its parameters are exactly the same as in a Query call), and then call RunQueries. Please note that all parameters, filters, and query settings are preserved between AddQuery calls. This means that if you specified a sort mode with SetSortMode before the first AddQuery call, the sort mode will be the same for the second AddQuery call. Currently you can reset only the filters (using ResetFilters) and the group-by settings (using ResetGroupBy). By the way, you can still use Query as usual to perform a single query, but don't try to make this call after you have added queries to the batch using AddQuery.
Enough talking, let's look at an example:
```ruby
sphinx = Sphinx::Client.new

sphinx.SetFilter('group_id', [1])
sphinx.AddQuery('wifi')

sphinx.ResetFilters
sphinx.SetFilter('group_id', [2])
sphinx.AddQuery('wifi')

results = sphinx.RunQueries
pp results
```
As the result we get an array of two hashes:
```ruby
[{"total_found"=>2,
  "status"=>0,
  "matches"=>
   [{"attrs"=>{"group_id"=>1, "created_at"=>1175658647}, "weight"=>2, "id"=>3},
    {"attrs"=>{"group_id"=>1, "created_at"=>1175658490}, "weight"=>1, "id"=>1}],
  "error"=>"",
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "time"=>"0.000",
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"],
  "total"=>2,
  "warning"=>""},
 {"total_found"=>1,
  "status"=>0,
  "matches"=>
   [{"attrs"=>{"group_id"=>2, "created_at"=>1175658555}, "weight"=>2, "id"=>2}],
  "error"=>"",
  "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
  "time"=>"0.000",
  "attrs"=>{"group_id"=>1, "created_at"=>2},
  "fields"=>["name", "description"],
  "total"=>1,
  "warning"=>""}]
```
Each hash contains the same data as the result of a Query method call. They also have the additional fields error and warning, which contain the error and warning messages respectively when they are not empty.
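Since RunQueries reports per-query failures inside each result hash instead of raising an error, a quick check like this sketch (my addition) can be handy:

```ruby
results.each_with_index do |result, i|
  puts "query #{i} error: #{result['error']}"     unless result['error'].empty?
  puts "query #{i} warning: #{result['warning']}" unless result['warning'].empty?
end
```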
Note: I have added a ResetFilters call before creating the second query. Without this call the second query would have two filters with conflicting conditions, so there would be no results at all.
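To make the earlier quote concrete, here is a sketch of the "several views on one search" pattern: one full-text query yielding a plain result set plus per-group match counts. This example is mine, built with the standard SetGroupBy/ResetGroupBy calls of the client API:

```ruby
sphinx = Sphinx::Client.new

# View 1: plain search results.
sphinx.AddQuery('wifi')

# View 2: the same full-text query, grouped by an attribute,
# so each match carries the count of documents in its group.
sphinx.SetGroupBy('group_id', Sphinx::Client::SPH_GROUPBY_ATTR)
sphinx.AddQuery('wifi')

plain_results, group_counts = sphinx.RunQueries
```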
The new querying engine (codenamed "extended engine V2") is going to gradually replace all the currently existing matching modes. At the moment it is fully identical to the extended mode in functionality, but is much less CPU-intensive for some queries. Here are the notes from the Sphinx author:
I have already seen improvements of up to 3-5 times in extreme cases. The only currently known case when it’s slower is processing complex extended queries with tens to thousands keywords; but forthcoming optimizations will fix that.
V2 engine is currently in alpha state and does not affect any other matching mode yet. Temporary SPH_MATCH_EXTENDED2 mode was added to provide a way to test it easily. We are in the middle of extensive internal testing process (under simulated production load, and then actual production load) right now. Your independent testing results would be appreciated, too!
So, to use the new engine we should set the SPH_MATCH_EXTENDED2 matching mode. Let's do it!
```ruby
sphinx = Sphinx::Client.new
sphinx.SetMatchMode(Sphinx::Client::SPH_MATCH_EXTENDED2)
sphinx.Query('wifi')
```
Easy enough, right? You should try it yourself to feel the power of the new engine. Please note that this mode is temporary and will be removed after the release.
Before version 0.9.8, Sphinx was limited to indexing up to 4 billion documents because it used 32-bit document IDs. From now on it is able to use 64-bit IDs, and the new feature does not impact performance with 32-bit keys. Let's look at an example. First we will query an index with 32-bit keys:
```ruby
sphinx = Sphinx::Client.new
result = sphinx.Query('wifi')
pp result['matches'][0]['id'].class
```
As you can see, the class of the id field is Fixnum. Now make the same call against an index with 64-bit keys: you will get Bignum as the result, which means you can have more than 4 billion documents!
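For instance, the check against a 64-bit index might look like this (the index name products64 is made up for the example):

```ruby
sphinx = Sphinx::Client.new
result = sphinx.Query('wifi', 'products64')  # query the hypothetical 64-bit index
pp result['matches'][0]['id'].class          # => Bignum (Fixnum for a 32-bit index)
```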
Plain attributes allow attaching only one value to each document. However, there are cases (such as tags or categories) when it is necessary to attach multiple values of the same attribute and to be able to filter by value lists. In these cases we can now use multi-valued attributes.
```ruby
sphinx = Sphinx::Client.new
sphinx.SetFilter('tag', [1, 5])
pp sphinx.Query('wifi')
```
When you use a multi-valued attribute tag, you will get a result like:
```ruby
{"total_found"=>2,
 "status"=>0,
 "matches"=>
  [{"attrs"=>{"tag"=>[4, 5], "group_id"=>2, "created_at"=>1175658555},
    "weight"=>2,
    "id"=>2},
   {"attrs"=>{"tag"=>[1, 2, 3], "group_id"=>1, "created_at"=>1175658490},
    "weight"=>1,
    "id"=>1}],
 "error"=>"",
 "words"=>{"wifi"=>{"hits"=>6, "docs"=>3}},
 "time"=>"0.000",
 "attrs"=>
  {"price"=>5,
   "tag"=>1073741825,
   "is_active"=>4,
   "group_id"=>1,
   "created_at"=>2},
 "fields"=>["name", "description"],
 "total"=>2,
 "warning"=>""}
```
As you can see, multi-valued attributes are returned as arrays of integers.
Sphinx is now able to compute the geographical distance between two points specified by latitude/longitude pairs (in radians). You can now specify a per-query "anchor point" (plus the attribute names to fetch each entry's latitude and longitude from), and then use the "@geodist" virtual attribute both in filters and in the sorting clause. In this case the distance (in meters) from the anchor point to each match will be computed, can be used for filtering and/or sorting, and is returned as a virtual attribute too.
```ruby
sphinx = Sphinx::Client.new
sphinx.SetGeoAnchor('lat', 'long', 0.87248, 0.63195)
result = sphinx.Query('wifi')
```
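Building on that, here is a sketch (my addition) that filters matches to within 10 kilometers of the anchor point and sorts them by distance, using the @geodist virtual attribute described above:

```ruby
sphinx = Sphinx::Client.new
sphinx.SetGeoAnchor('lat', 'long', 0.87248, 0.63195)
sphinx.SetFilterFloatRange('@geodist', 0.0, 10_000.0)                 # within 10 km
sphinx.SetSortMode(Sphinx::Client::SPH_SORT_EXTENDED, '@geodist ASC') # nearest first
result = sphinx.Query('wifi')
```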
As always, you can download the Sphinx Client API from the project home page. Take into account that version 0.3.1 of the client API is intended for use with Sphinx 0.9.7, while Sphinx Client API 0.4.0 r909 requires the Sphinx 0.9.8 r909 development snapshot. You can download Sphinx from the Download section of the Sphinx home page.
The post Sphinx Client API 0.3.1 and 0.4.0 r909 for Sphinx 0.9.8 r909 released first appeared on Dmytro Shteflyuk's Home.

During the last few months we (Alexey Kovyrin and I) have been working on a major update of the Best Tech Videos site platform. If you have not seen the site before, it is time to take a look, because really soon everything will change. I don't mean that the idea of the site will change (you will still be able to find the best tech videos there), but usability, information availability, and many small but useful things will change for the better.
So, what are these major changes? First of all, the design of the site has been changed and, I think, it is much better now. Here are some examples of the front page, the login page, and the videos list page as they look on the test server now:
Video authors (producers) can now post their own videos on the site. To make a video available to site visitors and users (onlookers), it has to be approved by one of the site's moderators (instructors) or by another author. The video posting page will look like the following:
One more really great feature: you can now compose your own video RSS feed from the site's tag and category feeds. Simply choose the tags and categories you like and get your own RSS feed with the best videos from your point of view.
Besides the visual changes, we have decided to add some 'sociality' to the service: you can now vote for videos you like, add videos to your favorites list, and discuss videos with your fellow BTV users. Of course, you and your friends can get an RSS feed of your favorite videos list.
By the way, I want to say a few words about the technology. The current version of the site is based on the WordPress engine with a set of custom plugins. The new version has been created from scratch using Ruby on Rails, so now we have a pretty flexible foundation for further development of the service. So, if you have any ideas on how to make the service better, your suggestions are really welcome.
And now I want to disclose some of our stats. As of today, you can find 754 high-quality (custom-filtered) videos in 100 categories on our site, which is more than 385 hours of video content! This collection grows every day. So stay tuned: we will announce the final release of the service soon.
The post Best Tech Videos: Great Changes Coming first appeared on Dmytro Shteflyuk's Home.

Part 1. Ruby on Rails vs Java
Part 2. Ruby on Rails vs PHP
Part 3. Ruby on Rails vs PHP — Organization
Part 4. Ruby on Rails vs PHP — Changing Database
From Rails Envy.
The post Hi, I’m Ruby on Rails first appeared on Dmytro Shteflyuk's Home.
The first thing I love about Ruby is the ability to extend classes with my own methods. I can just add a to_permalink method to any string and then write something like @post.title.to_permalink everywhere. It's amazing!
Here is my version of the to_permalink method:
```ruby
class String
  def to_permalink
    result = strip_tags

    # Preserve escaped octets.
    result.gsub!(/-+/, '-')
    result.gsub!(/%([a-f0-9]{2})/i, '--\1--')
    # Remove percent signs that are not part of an octet.
    result.gsub!('%', '-')
    # Restore octets.
    result.gsub!(/--([a-f0-9]{2})--/i, '%\1')

    result.gsub!(/&.+?;/, '-') # kill entities
    result.gsub!(/[^%a-z0-9_-]+/i, '-')
    result.gsub!(/-+/, '-')
    result.gsub!(/(^-+|-+$)/, '')
    return result.downcase
  end

  private

  def strip_tags
    return clone if blank?
    if index('<')
      text = ''
      tokenizer = HTML::Tokenizer.new(self)

      while token = tokenizer.next
        node = HTML::Node.parse(nil, 0, 0, token, false)
        # result is only the content of any Text nodes
        text << node.to_s if node.class == HTML::Text
      end

      # strip any comments, and if they have a newline at the end (ie. line with
      # only a comment) strip that too
      text.gsub(/<!--(.*?)-->[\n]?/m, '')
    else
      clone # already plain text
    end
  end
end
```
How does it work? The first thing you will see is the private method strip_tags. Yes, I know about ActionView::Helpers::TextHelper#strip_tags, and this is an almost 100% copy of the Rails version (the only difference is that my version always returns a clone of the original string). I just don't want to rely on the Rails library.
Then the method replaces all special characters with dashes (only escaped octets like %A0 are kept) and trims dashes from the beginning and the end of the string. Finally, the whole string is lowercased.
Of course, in your application you should handle collisions (several posts with the same title should still get unique permalinks; for example, you could append numbers starting from 1: hello, hello-1, hello-2, etc.). It's not my goal to cover every difficulty you could face; it's a small post, remember? Still, a minimal sketch of one approach is shown below.
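In this sketch the Post model and its permalink column are hypothetical; adapt the lookup to your own schema:

```ruby
# Append -1, -2, ... to the generated permalink until it is unique.
def unique_permalink(title)
  base = title.to_permalink
  permalink, suffix = base, 0
  while Post.find_by_permalink(permalink)
    suffix += 1
    permalink = "#{base}-#{suffix}"
  end
  permalink
end
```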
Just for your pleasure, here are the RSpec tests for this method:
```ruby
describe 'String.to_permalink from extensions.rb' do
  it 'should replace all punctuation marks and spaces with dashes' do
    "!.@#$\%^&*()Test case\n\t".to_permalink.should == 'test-case'
  end

  it 'should preserve _ symbol' do
    "Test_case".to_permalink.should == 'test_case'
  end

  it 'should preserve escaped octets and remove redundant %' do
    'Test%%20case'.to_permalink.should == 'test-%20case'
  end

  it 'should strip HTML tags' do
    '<a href="http://example.com">Test</a> <b>case</b>'.to_permalink.should == 'test-case'
  end

  it 'should strip HTML entities and insert dashes' do
    'Test&nbsp;case'.to_permalink.should == 'test-case'
  end

  it 'should trim beginning and ending dashes' do
    '-. Test case .-'.to_permalink.should == 'test-case'
  end

  it 'should not use ---aa--- as octet' do
    'b---aa---b'.to_permalink.should == 'b-aa-b'
  end

  it 'should replace % with -' do
    'Hello%world'.to_permalink.should == 'hello-world'
  end

  it 'should not modify original string' do
    s = 'Hello, <b>world</b>%20'
    s.to_permalink.should == 'hello-world%20'
    s.should == 'Hello, <b>world</b>%20'

    s = 'Hello'
    s.to_permalink.should == 'hello'
    s.should == 'Hello'
  end
end
```
It’s funny, right?
The post Generating permalink from string in Ruby first appeared on Dmytro Shteflyuk's Home.

Here is a typical helper spec built with plain RSpec mocks:

```ruby
describe UserHelper do
  it 'should generate correct link to user profile in user_link' do
    @user = mock('User')
    @user.stub!(:id).and_return(10)
    @user.stub!(:new_record?).and_return(false)
    @user.stub!(:preferred_name).and_return('Dmytro S.')
    @user.stub!(:full_name).and_return('Dmytro Shteflyuk')
    user_link(@user).should == link_to('Dmytro S.', user_url(:id => 10), :title => 'Dmytro Shteflyuk')
  end
end
```
Well, what does it mean? First we create a mock object @user, which will be used instead of a real model. Please don't ask me why we need such complexity and why we can't just use a real model; I will explain it myself. On the one hand, mocks are much faster than database operations: when you have many tests (and you have, right?), the total test execution time will be much smaller with mock objects. On the other hand, you don't want to test the same methods again and again, right? In my example the preferred_name method has non-trivial logic (it's not a simple database field), and it has already been tested in the model spec. Imagine you used the real model in the helper tests. If this method broke down, two specifications would fail instead of one, the model specification. In addition, there is an interesting tool called rcov, which shows how well your code is covered by tests. If the model tests did not exist and the helper tests used the model, rcov would show the preferred_name method as covered, but that would not be true. Oops, I got distracted.
Oh yes, in case you don't know: a stub is just a method that does nothing except return the value you have passed to it.
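For example (the admin? method here is made up), once stubbed, the mock simply returns the canned value:

```ruby
@user = mock('User')
@user.stub!(:admin?).and_return(true)
@user.admin?   # => true
```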
So we have a test. Could we simplify it somehow? Yep!
```ruby
describe UserHelper do
  it 'should generate correct link to user profile in user_link' do
    @user = mock('User')
    add_stubs(@user, :id => 10, :new_record? => false,
                     :preferred_name => 'Dmytro S.', :full_name => 'Dmytro Shteflyuk')
    user_link(@user).should == link_to('Dmytro S.', user_url(:id => 10), :title => 'Dmytro Shteflyuk')
  end
end
```
Much better. The add_stubs helper adds the stubs, passed as a hash in the second parameter, to the object passed as the first parameter. But that's not all. Especially for Active Record models, RSpec has the mock_model method, which automatically creates several stubs common to all Ruby on Rails models, and accepts a hash of stubs just like add_stubs does:
```ruby
describe UserHelper do
  it 'should generate correct link to user profile in user_link' do
    @user = mock_model(User, :preferred_name => 'Dmytro S.', :full_name => 'Dmytro Shteflyuk')
    user_link(@user).should == link_to('Dmytro S.', user_url(:id => @user.id), :title => 'Dmytro Shteflyuk')
  end
end
```
You definitely noticed that I omitted the :id and :new_record? parameters, and it was no coincidence. First, mock_model automatically assigns a unique id to every model it creates. Second, it defines the methods to_param (which returns the string representation of the id) and new_record? (which returns false). The first parameter of the method is the model class, and the second one is a hash of stubs, as I have already said. That's all, folks.
The post Useful helpers for RSpec mocks first appeared on Dmytro Shteflyuk's Home.

Cache the Hell out of Everything
90% API Requests — cache them
Read Scaling Twitter.
The post Rails is just… first appeared on Dmytro Shteflyuk's Home.
To test ActionMailer classes with RSpec, we begin by defining a helper for reading fixtures and put it into the spec/mailer_spec_helper.rb file:
```ruby
require File.dirname(__FILE__) + '/spec_helper.rb'

module MailerSpecHelper
  private

  def read_fixture(action)
    IO.readlines("#{FIXTURES_PATH}/mailers/user_mailer/#{action}")
  end
end
```
Now we need to create fixtures for the mailers in the spec/fixtures/mailers folder, each mailer in a separate subfolder, like spec/fixtures/mailers/some_mailer/activation:
```
Hello, Bob
```
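For reference, the specs below assume a mailer along these lines. This sketch is my illustration, not code from the original post, and the exact body layout is hypothetical:

```ruby
# app/models/mailers/user_mailer.rb (hypothetical sketch)
module Mailers
  class UserMailer < ActionMailer::Base
    def activation(user)
      recipients user.email
      from       '[email protected]'
      subject    'Account activation'
      body       :user => user
    end
  end
end
```

ActionMailer then generates create_activation (builds the TMail object) and deliver_activation (builds and sends it) from this single activation method, which is exactly what the specs exercise.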
In spec/models/mailers/some_mailer_spec.rb we would write something like:
```ruby
require File.dirname(__FILE__) + '/../../mailer_spec_helper.rb'

context 'The SomeMailer mailer' do
  FIXTURES_PATH = File.dirname(__FILE__) + '/../../fixtures'
  CHARSET = 'utf-8'

  fixtures :users

  include MailerSpecHelper
  include ActionMailer::Quoting

  setup do
    # You don't need these lines while you are using create_ instead of deliver_
    #ActionMailer::Base.delivery_method = :test
    #ActionMailer::Base.perform_deliveries = true
    #ActionMailer::Base.deliveries = []

    @expected = TMail::Mail.new
    @expected.set_content_type 'text', 'plain', { 'charset' => CHARSET }
    @expected.mime_version = '1.0'
  end

  specify 'should send activation email' do
    @expected.subject = 'Account activation'
    @expected.body    = read_fixture('activation')
    @expected.from    = '[email protected]'
    @expected.to      = users(:bob).email

    Mailers::UserMailer.create_activation(users(:bob)).encoded.should == @expected.encoded
  end
end
```
That's almost all. Now we can be fully confident that emails look exactly as we expect. In the controller specs we don't need to test the mailers again; we just need to make sure the mailer gets called!
```ruby
context 'Given a signup action of UserController' do
  controller_name 'user'

  setup do
    @user = mock('user')
    @valid_user_params = { :email => '[email protected]' }
  end

  specify 'should deliver activation email to newly created user' do
    User.should_receive(:new).with(@valid_user_params).and_return(@user)
    Mailers::UserMailer.should_receive(:deliver_activation).with(@user)

    post :signup, :user => @valid_user_params
    response.should redirect_to(user_activate_url)
  end
end
```
For example, we are developing blogging software. We have Post and Category models, where a post can have one or more categories (a many-to-many relation):
```ruby
class Post < ActiveRecord::Base
  has_and_belongs_to_many :categories
end

class Category < ActiveRecord::Base
  has_and_belongs_to_many :posts
end
```
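For reference, has_and_belongs_to_many expects a join table named categories_posts with no model and no primary key. Here is a minimal migration sketch (my addition; column types are assumed):

```ruby
class CreateCategoriesPosts < ActiveRecord::Migration
  def self.up
    # Join table for the habtm association: no id column.
    create_table :categories_posts, :id => false do |t|
      t.column :category_id, :integer
      t.column :post_id, :integer
    end
  end

  def self.down
    drop_table :categories_posts
  end
end
```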
We need to display posts on the page along with the categories for each one. The simplest way is to do a find with :include:
```ruby
Post.find :all, :include => :categories
```
And the corresponding helper:
```ruby
def show_categories(post)
  post.categories.map(&:name).join(' ')
end
```
But what if you need to filter posts by category? Here is an example:
```ruby
Post.find :all, :include => :categories,
          :conditions => ['categories.id = ?', category_id]
```
It works, but there is one small problem: you will not get all the categories! The :conditions clause restricts the eager-loaded association as well, so each post ends up with only the matching category in its categories collection. We could fix it in the helper:
```ruby
def show_categories(post)
  # reload categories
  post.categories(true).map(&:name).join(' ')
end
```
In this case the categories for each post will be requested in a separate query (the classic N+1 queries problem). That's not so good, therefore I propose using a sub-query:
```ruby
Post.find :all, :include => :categories,
          :conditions => ['EXISTS (SELECT tmp_cp.category_id FROM categories_posts tmp_cp
                            WHERE posts.id = tmp_cp.post_id AND tmp_cp.category_id = ?)', category_id]
```
Now the posts are filtered by category and all the categories of each post are loaded properly. Do you have other ideas?
The post Using sub-queries to avoid multiple DB requests in Rails first appeared on Dmytro Shteflyuk's Home.