Generating permalink from string in Ruby

Posted by Dmytro Shteflyuk on under Ruby & Rails

If you are creating Ruby on Rails application like a blog, you most probably want to generate URLs using post titles. It’s good practice, because search engines like keywords in URL, and it looks more human-readable. Just compare: http://example.com/posts/10 and http://example.com/posts/generating-permalinks-from-string (yeah, it’s long, but self-descriptive). Anyways, this is small post about converting a title to a permalink.

First thing I love in Ruby is an ability to extend classes with my own methods. I could just add method to_permalink to any string and then everywhere I could write something like @post.title.to_permalink. It’s amazing!

Here is my version of to_permalink method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
class String
  def to_permalink
    result = strip_tags
    # Preserve escaped octets.
    result.gsub!(/-+/, '-')
    result.gsub!(/%([a-f0-9]{2})/i, '--\1--')
    # Remove percent signs that are not part of an octet.
    result.gsub!('%', '-')
    # Restore octets.
    result.gsub!(/--([a-f0-9]{2})--/i, '%\1')

    result.gsub!(/&.+?;/, '-') # kill entities
    result.gsub!(/[^%a-z0-9_-]+/i, '-')
    result.gsub!(/-+/, '-')
    result.gsub!(/(^-+|-+$)/, '')
    return result.downcase
  end

  private
 
    def strip_tags
      return clone if blank?
      if index('<')
        text = ''
        tokenizer = HTML::Tokenizer.new(self)

        while token = tokenizer.next
          node = HTML::Node.parse(nil, 0, 0, token, false)
          # result is only the content of any Text nodes
          text << node.to_s if node.class == HTML::Text
        end
        # strip any comments, and if they have a newline at the end (ie. line with
        # only a comment) strip that too
        text.gsub(/<!--(.*?)-->[\n]?/m, '')
      else
        clone # already plain text
      end
    end
end

How it’s working? First thing you would see is a private method strip_tags. Yes, I know about ActionView::Helpers::TextHelper::strip_tags, and this is almost 100% copy of Rails version (the only difference is that my version always returns clone of the original string). I just don’t want to rely on the Rails library.

Then my method replaces all special characters with dashes (only octets like %A0 would be kept), and trims dashed from the beginning and the end of the string. Finally full string will be lowercased.

Of course, in your application you should check collisions (several posts which have the same title should have unique permalinks, for example you could append numbers starting from 1: hello, hello-1, hello-2, etc). This is not my goal to cover all difficulties you could face, it’s small post, do you remember?

Just for your pleasure, here are the RSpec tests for this method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
describe 'String.to_permalink from extensions.rb' do
  it 'should replace all punctuation marks and spaces with dashes' do
    "[email protected]#$\%^&*()Test case\n\t".to_permalink.should == 'test-case'
  end

  it 'should preserve _ symbol' do
    "Test_case".to_permalink.should == 'test_case'
  end
 
  it 'should preserve escaped octets and remove redundant %' do
    'Test%%20case'.to_permalink.should == 'test-%20case'
  end

  it 'should strip HTML tags' do
    '<a href="http://example.com">Test</a> <b>case</b>'.to_permalink.should == 'test-case'
  end

  it 'should strip HTML entities and insert dashes' do
    'Test&nbsp;case'.to_permalink.should == 'test-case'
  end

  it 'should trim beginning and ending dashes' do
    '-. Test case .-'.to_permalink.should == 'test-case'
  end

  it 'should not use ---aa--- as octet' do
    'b---aa---b'.to_permalink.should == 'b-aa-b'
  end
 
  it 'should replace % with -' do
    'Hello%world'.to_permalink.should == 'hello-world'
  end

  it 'should not modify original string' do
    s = 'Hello,&nbsp;<b>world</b>%20'
    s.to_permalink.should == 'hello-world%20'
    s.should == 'Hello,&nbsp;<b>world</b>%20'

    s = 'Hello'
    s.to_permalink.should == 'hello'
    s.should == 'Hello'
  end
end

It’s funny, right?

5 Responses to this entry

Subscribe to comments with RSS

demjan
said on May 18th, 2007 at 20:59 · Permalink

Да. ))
Интересно сколько займет ресурсов и как будет выглядеть реализация на php в свете недавнего обсуждения на харахабарачтототам

said on May 18th, 2007 at 21:30 · Permalink

Да ровно столько же! Все эти войны Something vs. Something another — чистой воды треп людей, которым нечем заняться. При грамотном подходе код в любом языке будет лаконичным и эффективным. Вопрос в компактности и читабельности — это да, целиком ложится на совесть языка.

Вон в комментах к статье о подзапросах я приводил пример кода пхп. Да, задача не повседневная, но классическая, когда надо обработать выборку. И ну никак всего этого уродства не обойти. Можно размазать по коду, это да. Или вынести в отдельный файл и никому не показывать. Но оно останется. Потому что ПХП — это набор знаков препинания, разделенных словами.

demjan
said on May 19th, 2007 at 08:57 · Permalink

Аналога этому в не обьектном языке не напишешь:

1
2
3
4
5
class String
  def to_permalink
  end
end
'string'.to_permalink

Только вроде такого

1
2
3
function to_permalink ($name) {
}
to_permalink($post->title);
said on May 19th, 2007 at 12:41 · Permalink

Ну почему же. В C# 3.0 можно будет писать extension методы:

1
2
3
4
5
6
7
8
public static class Extensions
{
    public static String ToPermalink(this String s)
    {
    }
}

'Something'.ToPermalink();

Это не аналогия открытости классов руби, а просто синтаксический сахар. Но результаты в общем-то неразличимы невооруженным взглядом :-)

kelyar
said on June 4th, 2007 at 11:29 · Permalink

а почему просто не переопределить to_param модели?

Comments are closed

Comments for this entry are closed for a while. If you have anything to say – use a contact form. Thank you for your patience.