Memo #5: Use ary.uniq method carefully in Ruby

Posted by Dmytro Shteflyuk under Ruby & Rails

Today a friend asked me to help him with some unexpected behavior of Ruby's Array#uniq method when the array contains hashes. Here is an example:

[{"id"=>667824693}, {"id"=>667824693}].uniq
# => [{"id"=>667824693}, {"id"=>667824693}]
[{"id"=>66782469}, {"id"=>66782469}].uniq
# => [{"id"=>66782469}]
[{"id"=>6678246931}, {"id"=>6678246931}].uniq
# => [{"id"=>6678246931}]

Look at the result of the first command. Very disappointing, right? So what happened? A quick look through the Ruby source explains it completely. Here is how this method works internally (this is just a prototype in Ruby; the original code is in C, but it works the same way):

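# Build a Hash whose keys are the array elements; equal keys collapse into one.
def ary_make_hash(ary, val)
  ary.inject({}) do |hash, el|
    hash[el] = val
    hash
  end
end

def uniq(ary)
  ary = ary.dup
  uniq!(ary) # call the prototype below, not the built-in Array#uniq!
  ary
end

def uniq!(ary)
  hash = ary_make_hash(ary, 0)
  # No duplicates were found if every element produced its own key.
  return nil if ary.length == hash.length

  j = 0
  (0...ary.length).each do |idx|
    # delete returns the stored value (0, which is truthy in Ruby) only the
    # first time a key is seen, and nil afterwards.
    if hash.delete(ary[idx])
      ary[j] = ary[idx]
      j += 1
    end
  end
  ary.slice!(j..-1) # drop the leftover tail, keeping the first j elements
  ary
end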

Let’s test it:

uniq([{"id"=>667824693}, {"id"=>667824693}])
# => [{"id"=>667824693}, {"id"=>667824693}]
uniq([{"id"=>66782469}, {"id"=>66782469}])
# => [{"id"=>66782469}]
uniq([{"id"=>6678246931}, {"id"=>6678246931}])
# => [{"id"=>6678246931}]

And just to make sure our conclusions are correct:

[{"id"=>667824693}, {"id"=>667824693}].map { |el| el.hash }
# => [29793216, 29793156]
[{"id"=>66782469}, {"id"=>66782469}].map { |el| el.hash }
# => [255119887, 255119887]
[{"id"=>6678246931}, {"id"=>6678246931}].map { |el| el.hash }
# => [482552381, 482552381]

So Array#uniq is built on Hash#hash, which produces different values for the two equal hashes in the first example. Be careful when calling uniq on arrays of complex objects.
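The same dependency can be seen with user-defined classes. The sketch below is only an illustration: PlainPoint and ValuePoint are hypothetical classes made up for this example. uniq treats two objects as duplicates only when they report the same hash value and are eql? to each other.

```ruby
# Hypothetical classes, for illustration only.
class PlainPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

class ValuePoint < PlainPoint
  # Two points count as the same Hash key when their coordinates match.
  def eql?(other)
    other.is_a?(ValuePoint) && x == other.x && y == other.y
  end

  # Objects that are eql? must also return equal hash values.
  def hash
    [x, y].hash
  end
end

# PlainPoint inherits identity-based hash and eql? from Object,
# so uniq sees two distinct elements.
plain = [PlainPoint.new(1, 2), PlainPoint.new(1, 2)].uniq

# ValuePoint defines both methods, so the duplicate collapses.
value = [ValuePoint.new(1, 2), ValuePoint.new(1, 2)].uniq

puts plain.length
puts value.length
```

Here plain keeps both elements, while value keeps only one.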

Update: There is a good write-up on the Hash#hash method here.
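If you are stuck on a Ruby version that shows the behavior above, one possible workaround is to deduplicate by an explicit key instead of relying on the hash value of the whole element. This is a minimal sketch, not a drop-in replacement: uniq_by is a hypothetical helper name (it is not part of Ruby's standard library), and it assumes each element can be reduced to a simple key such as an integer id.

```ruby
# Hypothetical helper: keeps the first element seen for each key that the
# block returns. Assumes the key itself hashes by value (e.g. an Integer).
def uniq_by(ary)
  seen = {}
  ary.select do |el|
    key = yield(el)
    if seen.key?(key)
      false            # key already seen: drop this element
    else
      seen[key] = true # first occurrence: remember the key and keep it
    end
  end
end

records = [{"id"=>667824693}, {"id"=>667824693}, {"id"=>66782469}]
unique = uniq_by(records) { |el| el["id"] }
p unique
```

Later Ruby versions let you pass a block to Array#uniq directly, which makes a helper like this unnecessary there.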

4 Responses to this entry


Comment on July 15th, 2009 at 13:57

Ah, now I'll have to implement my own uniq method. But thanks for the explanation. :)

Comment on July 15th, 2009 at 17:14

The good news is, it looks like this was fixed in 1.8.7 and 1.9.

I slightly modified your script:

p [{"id"=>667824693}, {"id"=>667824693}].uniq
p [{"id"=>66782469}, {"id"=>66782469}].uniq
p [{"id"=>6678246931}, {"id"=>6678246931}].uniq

And ran it with multiruby:

~ $ multiruby example.rb

VERSION = 1.8.7-p174
CMD     = ~/.multiruby/install/1.8.7-p174/bin/ruby example.rb

[{"id"=>667824693}]
[{"id"=>66782469}]
[{"id"=>6678246931}]

RESULT = 0

VERSION = 1.8.6-p369
CMD     = ~/.multiruby/install/1.8.6-p369/bin/ruby example.rb

[{"id"=>667824693}, {"id"=>667824693}]
[{"id"=>66782469}, {"id"=>66782469}]
[{"id"=>6678246931}, {"id"=>6678246931}]

RESULT = 0

VERSION = 1.9.1-p129
CMD     = ~/.multiruby/install/1.9.1-p129/bin/ruby example.rb

[{"id"=>667824693}]
[{"id"=>66782469}]
[{"id"=>6678246931}]

RESULT = 0

TOTAL RESULT = 0 failures out of 3

Passed: 1.9.1-p129, 1.8.6-p369, 1.8.7-p174
Failed:

Comment on July 16th, 2009 at 13:39

Ubuntu 9.04 has Ruby 1.8.7-p72, and this “bug” is still there :)

Comment on September 30th, 2009 at 19:12

So was this fixed in 1.8.7 at some point then, somewhere between p72 and p174?
