Today my friend asked me to help him with an unexpected behavior of Ruby’s `Array#uniq` method when it is called on an array of hashes. Here is an example:
```ruby
[{"id"=>667824693}, {"id"=>667824693}].uniq
# => [{"id"=>667824693}, {"id"=>667824693}]
[{"id"=>66782469}, {"id"=>66782469}].uniq
# => [{"id"=>66782469}]
[{"id"=>6678246931}, {"id"=>6678246931}].uniq
# => [{"id"=>6678246931}]
```
Look at the first result: the two identical hashes were not deduplicated. Very disappointing, right? So what happened? A quick look through the Ruby source explained it completely. Here is how the method works internally (this is only a prototype in Ruby; the original code is in C, but it works the same way):
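```ruby
# Build a lookup hash whose keys are the array elements. Two elements
# collapse into one key only if their #hash values match and #eql? is true.
def ary_make_hash(ary, val)
  ary.inject({}) do |hash, el|
    hash[el] = val
    hash
  end
end

# Non-destructive version: work on a copy.
def uniq(ary)
  ary = ary.dup
  uniq!(ary) # call the prototype below, not the built-in Array#uniq!
  ary
end

# Destructive version: returns nil when there is nothing to remove.
def uniq!(ary)
  hash = ary_make_hash(ary, 0)
  return nil if ary.length == hash.length # no duplicates detected

  j = 0
  (0...ary.length).each do |idx|
    # delete returns the stored value (0, which is truthy in Ruby) the first
    # time a key is seen and nil afterwards, so only first occurrences are kept.
    if hash.delete(ary[idx])
      ary[j] = ary[idx]
      j += 1
    end
  end
  ary.slice!(j..-1) # truncate the array to the j kept elements
  ary
end
```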
Let’s test it:
```ruby
uniq([{"id"=>667824693}, {"id"=>667824693}])
# => [{"id"=>667824693}, {"id"=>667824693}]
uniq([{"id"=>66782469}, {"id"=>66782469}])
# => [{"id"=>66782469}]
uniq([{"id"=>6678246931}, {"id"=>6678246931}])
# => [{"id"=>6678246931}]
```
And just to make sure our conclusions are correct:
```ruby
[{"id"=>667824693}, {"id"=>667824693}].map { |el| el.hash }
# => [29793216, 29793156]
[{"id"=>66782469}, {"id"=>66782469}].map { |el| el.hash }
# => [255119887, 255119887]
[{"id"=>6678246931}, {"id"=>6678246931}].map { |el| el.hash }
# => [482552381, 482552381]
```
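So the key to this `Array#uniq` behavior is the `Hash#hash` method. `uniq` stores the elements as keys of an internal hash, and two keys are treated as the same only when their `hash` values match (and `eql?` returns true). In the first example `Hash#hash` returns different values for the two equal hashes, so both of them survive. Be careful when calling `uniq` on arrays of complex objects.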
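The same rule applies to your own classes. Below is a minimal sketch with two hypothetical classes (the names are illustrative, not from the post): `BrokenKey`, whose `hash` is deliberately unrelated to the value that `eql?` compares, which mimics the old `Hash#hash` problem, and `GoodKey`, whose `hash` and `eql?` are derived from the same field.

```ruby
# Hypothetical class whose #hash disagrees with #eql?: two objects can be
# eql? (same value) while reporting different hash codes.
class BrokenKey
  attr_reader :value

  def initialize(value, hash_code)
    @value = value
    @hash_code = hash_code # deliberately unrelated to @value
  end

  def eql?(other)
    other.is_a?(BrokenKey) && value == other.value
  end

  def hash
    @hash_code
  end
end

# Hypothetical class that follows the contract: equal objects produce
# equal hash codes because both methods use the same field.
class GoodKey
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def eql?(other)
    other.is_a?(GoodKey) && value == other.value
  end
  alias == eql?

  def hash
    [self.class, value].hash
  end
end

broken = [BrokenKey.new(42, 1), BrokenKey.new(42, 2)]
good   = [GoodKey.new(42), GoodKey.new(42)]

p broken.uniq.size # hash codes differ, so both elements survive
p good.uniq.size   # hash codes match and eql? is true, so one survives
```

If you only need value semantics, `Struct` already defines `hash` and `eql?` from its members, which avoids writing them by hand.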
Update: There is a good write-up on the `Hash#hash` method here.
Eh, now I’ll have to implement my own uniq method. But thanks for the explanation. :)
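If you are stuck on an affected interpreter, one workaround is a uniq that never consults `#hash` and compares elements with `==` instead. A minimal sketch follows; `uniq_by_equality` is a hypothetical helper name, and because it compares each element against everything kept so far it runs in O(n²) time, so it only suits small arrays. On Ruby 1.9.2 and later, `uniq` also accepts a block, so you can deduplicate by a single key such as `"id"`, whose Integer hash is reliable.

```ruby
# Hypothetical helper: removes duplicates using == only (via include?),
# so it does not depend on #hash. Keeps the first occurrence of each
# element and preserves order. O(n^2), intended for small arrays.
def uniq_by_equality(ary)
  result = []
  ary.each do |el|
    result << el unless result.include?(el)
  end
  result
end

records = [{"id"=>667824693}, {"id"=>667824693}, {"id"=>66782469}]

p uniq_by_equality(records)

# Ruby 1.9.2+: deduplicate by the "id" value instead of the whole hash.
p records.uniq { |h| h["id"] }
```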
The good news is, it looks like this was fixed in 1.8.7 and 1.9.
I slightly modified your script:
```ruby
p [{"id"=>667824693}, {"id"=>667824693}].uniq
p [{"id"=>66782469}, {"id"=>66782469}].uniq
p [{"id"=>6678246931}, {"id"=>6678246931}].uniq
```
And ran it with multiruby:
```
VERSION = 1.8.7-p174
CMD = ~/.multiruby/install/1.8.7-p174/bin/ruby example.rb
[{"id"=>667824693}]
[{"id"=>66782469}]
[{"id"=>6678246931}]
RESULT = 0

VERSION = 1.8.6-p369
CMD = ~/.multiruby/install/1.8.6-p369/bin/ruby example.rb
[{"id"=>667824693}, {"id"=>667824693}]
[{"id"=>66782469}, {"id"=>66782469}]
[{"id"=>6678246931}, {"id"=>6678246931}]
RESULT = 0

VERSION = 1.9.1-p129
CMD = ~/.multiruby/install/1.9.1-p129/bin/ruby example.rb
[{"id"=>667824693}]
[{"id"=>66782469}]
[{"id"=>6678246931}]
RESULT = 0

TOTAL RESULT = 0 failures out of 3

Passed: 1.9.1-p129, 1.8.6-p369, 1.8.7-p174

Failed:
```
Ubuntu 9.04 ships Ruby 1.8.7-p72, and this “bug” is still there :)
So was this fixed in 1.8.7 at some point, then (between p72 and p174)?
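To check whether a particular interpreter and patch level is affected, you can run a small probe built from the failing value in the post: compare `Hash#hash` for two equal hashes constructed separately and see whether `uniq` collapses them. This is a sketch; since the original failure depended on the specific value, a clean result for this one value is not proof that every hash behaves. On a fixed Ruby both the `hash` comparison and the `uniq` size check should come out as `true` and `1`.

```ruby
# Two equal but separately constructed hashes, using the value that
# failed in the original example.
a = {"id"=>667824693}
b = {"id"=>667824693}

# RUBY_PATCHLEVEL may be missing on very old interpreters, so guard it.
patchlevel = defined?(RUBY_PATCHLEVEL) ? RUBY_PATCHLEVEL : "unknown"

puts "Ruby #{RUBY_VERSION}-p#{patchlevel}"
puts "a == b:           #{a == b}"
puts "a.hash == b.hash: #{a.hash == b.hash}"
puts "[a, b].uniq.size: #{[a, b].uniq.size}"
```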