Scribd open source projects

Posted by Dmytro Shteflyuk on under Development

It’s time to summarize what we have done for the Open Source community. Scribd is pretty open company, we release a lot of code into the public after a time (sometimes it is short, sometimes it is not). Here I want to mention all the code we have opensourced. Please take into account that time is moving on, so we are publishing more and more code. I will update this post periodically, so stay tuned. Follow me on Twitter to get instant updates.

Table of Contents

Here is the list of our projects in alphabetical order:

  • bounces-handler — Email Bounces Processing System with Rails plugin to prevent Rails mailers from sending any messages to a blocked addresses.
  • db-charmer — ActiveRecord Connections Magic (slaves, multiple connections, etc).
  • easy-prof — Simple and easy to use Ruby code profiler, which could be used as a Rails plugin.
  • Fast Sessions — Sessions class for ActiveRecord sessions store created to work fast (really fast).
  • loops — Simple background loops framework for Ruby on Rails and Merb.
  • magic-enum — Method used to define ENUM-like attributes in your model (int fields actually).
  • rlibsphinxclient — A Ruby wrapper for pure C searchd client API library.
  • rscribd — Ruby client library for the Scribd API.
  • Rspec Cells — A library for testing applications that are using Cells in RSpec.
  • Scribd Desktop Uploader — A fully native Cocoa Macintosh uploader app for the Scribd.com website.

bounces-handler

Bounces-handler package is a simple set of scripts to automatically process email bounces and ISP’s feedback loops emails, maintain your mailing blacklists and a Ruby on Rails plugin to use those blacklists in your RoR applications.

This piece of software has been developed as a part of more global work on mailing quality improvement in Scribd.com, but it was one of the most critical steps after setting up reverse DNS records, DKIM and SPF.

Links: Project Home Page on GitHub | Introduction Blog Post | RDoc Documentation.

db-charmer

DbCharmer is a simple yet powerful plugin for ActiveRecord that does a few things:

  • Allows you to easily manage AR models’ connections (switch_connection_to method)
  • Allows you to switch AR models’ default connections to a separate servers/databases
  • Allows you to easily choose where your query should go (Model.on_db methods)
  • Allows you to automatically send read queries to your slaves while masters would handle all the updates.
  • Adds multiple databases migrations to ActiveRecord

It requires Ruby on Rails version 2.3 or later. The main purpose of this plugin is to put all the databases-related code we have been using in Scribd for a while into a single easy-to use package.

Links: Project Home Page on GitHub | Test Rails Application on GitHub | RDoc Documentation.

easy-prof

Simple and easy to use Ruby code profiler, which could be used as a Rails plugin. The main idea behind the easy-prof is creating check points and your code and measuring time needed to execute code blocks. Here is the example of easy-prof output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
[home#index] Benchmark results:
[home#index] debug: Logged in user home page
[home#index] progress: 0.7002 s [find top videos]
[home#index] progress: 0.0452 s [build categories list]
[home#index] progress: 0.0019 s [build tag cloud]
[home#index] progress: 0.0032 s [find featured videos]
[home#index] progress: 0.0324 s [find latest videos]
[home#index] debug: VIEW STARTED
[home#index] progress: 0.0649 s [top videos render]
[home#index] progress: 0.0014 s [categories render]
[home#index] progress: 2.5887 s [tag cloud render]
[home#index] progress: 0.0488 s [latest videos render]
[home#index] progress: 0.1053 s [featured video render]
[home#index] results: 3.592 s

From this output you can see what checkpoints takes longer to reach, and what code fragments are pretty fast.

Links: Project Home Page on GitHub | Introduction Blog Post | RDoc Documentation.

Fast Sessions

FastSessions is a sessions class for ActiveRecord sessions store created to work fast (really fast). It uses some techniques which are not so widely known in developers’ community and only when they cause huge problems, performance consultants are trying to help with them.

FastSessions plugin was born as a hack created for Scribd.com (large RoR-based web project), which was suffering from InnoDB auto-increment table-level locks on sessions table.

So, first of all, we removed id field from the table. Next step was to make lookups faster and we’ve used a following technique: instead of using (session_id) as a lookup key, we started using (CRC32(session_id), session_id) — two-columns key which really helps MySQL to find sessions faster because key cardinality is higher (so, mysql is able to find a record earlier w/o checking a lots of index rows). We’ve benchmarked this approach and it shows 10–15% performance gain on large sessions tables.

And last, but most powerful change we’ve tried to make was to not create database records for empty sessions and to not save sessions data back to database if this data has not been changed during current request processing. With this change we basically reduce inserts number by 50-90% (depends 0n application).

All of these changes were implemented and you can use them automatically after a simple plugin installation.

There is a fork patched by mudge for full compatibility with Ruby on Rails version 2.3 or later.

Links: Project Home Page on Google Code | Introduction Blog Post | Fork on GitHub Compatible with Rails 2.3 and Later.

loops

loops is a small and lightweight library for Ruby on Rails, Merb and other frameworks created to support simple background loops in your application which are usually used to do some background data processing on your servers (queue workers, batch tasks processors, etc).

Originally loops plugin was created to make our own loops code more organized. We used to have tens of different modules with methods that were called with script/runner and then used with nohup and other not so convenient backgrounding techniques. When you have such a number of loops/workers to run in background it becomes a nightmare to manage them on a regular basis (restarts, code upgrades, status/health checking, etc).

After a short time of writing our loops in more organized ways we were able to generalize most of the loops code so now our loops look like a classes with a single mandatory public method called run. Everything else (spawning many workers, managing them, logging, backgrounding, pid-files management, etc) is handled by the plugin itself.

Links: Project Home Page on GitHub | Introduction Blog Post | RDoc Documentation.

magic-enum

Method used to define ENUM-like attributes in your model (int fields actually). It’s easier to show what it does in code rather than to explain in plain English:

1
2
3
4
5
6
7
Statuses = {
  :unknown => 0,
  :draft => 1,
  :published => 2,
  :approved => 3
}
define_enum :status, :default => 1, :raise_on_invalid => true, :simple_accessors => true

is identical to

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Statuses = {
  :unknown => 0,
  :draft => 1,
  :published => 2,
  :approved => 3
}
StatusesInverted = Statuses.invert

def status
  StatusesInverted[self[:status].to_i] || StatusesInverted[1]
end

def status=(value)
  raise ArgumentError, "Invalid value "#{value}" for :status attribute of the #{self.class} model" if
  Statuses[value].nil?
  self[:status] = Statuses[value]
end

def unknown?
  status == :unknown
end

def draft?
  status == :draft
end

def published?
  status == :published
end

def approved?
  status == :approved
end

This plugin was originally developed for Best Tech Videosand later was cleaned up in Scribd repository and released to the public.

Links: Project Home Page on GitHub | RDoc Documentation.

rlibsphinxclient

A Ruby wrapper for pure C searchd client API library. It works much faster than any Ruby client for Sphinx, so you can check it to ensure you application works as fast as possible.

Please note: this is *highly experimental* library so use it at your own risk.

Links: Project Home Page on GitHub | RDoc Documentation.

rscribd

Ruby client library for the Scribd API. This gem provides a simple and powerful library for the Scribd API, allowing you to write Ruby applications or Ruby on Rails websites that upload, convert, display, search, and control documents in many formats. For more information on the Scribd platform, visit the Scribd Platform Documentation page.

The main features are:

  • Upload your documents to Scribd’s servers and access them using the gem
  • Upload local files or from remote web sites
  • Search, tag, and organize documents
  • Associate documents with your users’ accounts

Links: Project Home Page on GitHub | Scribd Platform Documentation | RDoc Documentation.

Rspec Cells

This plugin allows you to test your cells easily using RSpec. Basically, it adds an example group especially for cells, with several helpers to perform cells rendering.

If you are not sure what is cells, please visit its home page.

Spec for a regular cell could look like:

1
2
3
4
5
6
7
8
9
10
11
12
13
describe VideoCell do
    integrate_views

    context '.videos' do
      it 'should initialize :videos variable' do
        params[:id] = 10
        session[:user_id] = 20
        opts[:opt] = 'value'
        result = render_cell :videos, { :videos => [] }, :slug => 'hello'
        result.should have_tag('div', :class => :videos)
      end
    end
  end

Links: Project Home Page on GitHub | Cells Home Page | Cells Home Page on GitHub.

Scribd Desktop Uploader

A fully native Cocoa Macintosh uploader app for the Scribd.com website. Supports following features:

  • Upload many files at once from your desktop.
  • Edit titles, tags, and other metadata before uploading.
  • Quickly and easily manage bulk uploads, straight from your desktop.
  • Right-click to start uploading files directly to Scribd (Windows only).

Links: Project Home Page on GitHub | Home Page on Scribd.com.

Changelog

  • September 9, 2009
    • Fixed a link to GitHub homepage of the rspec-cells plugin.
  • September 8, 2009

No Responses to this entry

Subscribe to comments with RSS

Comments are closed

Comments for this entry are closed for a while. If you have anything to say – use a contact form. Thank you for your patience.