Saturday, January 28, 2006

Using Darcs

The caboose blog has a good intro to using darcs for version control.

I was first exposed to darcs when I heard a cool talk by David Roundy, the author of darcs, at ICFP 05 last September (darcs is probably the most popular Haskell software out there, aside from the Haskell compilers). The talk brought up other distributed SCMs, such as the subversion-based SVK.

Ian and I started out using subversion for Openomy but kept having trouble with branching: the process of looking through commit logs to find the right revision numbers to merge was really annoying, and we screwed it up more than once. It got to the point where merging branches actually made us nervous!

We decided to try something new and give darcs a shot-- we haven't looked back since. We don't use it in a purely decentralized manner: Ian and I each have our own repositories, but we keep centralized production and development repositories on our servers that we push to whenever we're done with new functionality. The merging headache we faced with subversion is totally gone, and we can pick individual patches to promote to production and hold others back in dev. It's a really great SCM and I encourage other developers to give it a try. Here's a link to the darcs manual to get you started.

Tuesday, January 17, 2006

Ruby Multiple Inheritance

As many people know, Ruby doesn't support multiple inheritance out of the box. Instead, it provides mix-in modules that let you do almost the same thing. However, if halfway through your program you realize that you need multiple inheritance-like behavior, you are forced to move the code you want to share into a module and mix it in everywhere it's needed. That can be slightly annoying. But we can get around this with a pretty cool hack.
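
For reference, here is a minimal sketch of the usual mix-in approach. The names Greeter, Base, and Child are made up for illustration.

```ruby
# Shared behavior lives in a module instead of a second superclass.
module Greeter
  def greet
    "hello from #{self.class}"
  end
end

class Base
  def base; "base"; end
end

# Child has exactly one superclass (Base) and mixes in Greeter.
class Child < Base
  include Greeter
end

child = Child.new
child.base               #-> "base"
child.greet              #-> "hello from Child"
Child.ancestors.first(3) #-> [Child, Greeter, Base]
```

The catch is that greet has to live in a module from the start. If it was originally written as an instance method of some other class, it has to be moved before it can be shared this way.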

The key observation that makes this possible is that the class definition syntax is pretty flexible. The only requirement is that when you say "class A < B", B needs to be an expression that evaluates to a Class object. In particular, you could define a function Transformer() that returns a Class object, and legally say "class A < Transformer()". The delegate library in the Ruby stdlib (its DelegateClass() method) is a good example of this trick.
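
A small sketch of that idea, assuming nothing beyond core Ruby. Transformer, its greeting parameter, and the Greeting class are hypothetical names used only for this example.

```ruby
# Hypothetical method: builds an anonymous class and returns it.
# The capitalized name is legal because it is called with parentheses.
def Transformer(greeting)
  Class.new do
    # define_method lets the new class close over the argument
    define_method(:greet) { greeting }
  end
end

# The superclass expression is evaluated when the class is defined,
# so any expression that yields a Class object works here.
class Greeting < Transformer("hi")
  def shout; greet.upcase + "!"; end
end

g = Greeting.new
g.greet                   #-> "hi"
g.shout                   #-> "HI!"
Greeting.superclass.class #-> Class
```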

We can exploit this trick to simulate multiple inheritance without the refactoring. I'll define a function Multiple() that takes in many classes and returns a new class that answers all of the right methods to emulate multiple inheritance.

def Multiple(*args)
  c = Class.new do
    def initialize(*a)
      # ugly, but retrieves the value saved below
      # (only works when self.class directly subclasses the class returned here)
      myparents = self.class.superclass.instance_variable_get(:@parents)

      # initialize each parent class
      @parents = []
      myparents.each do |klass|
        @parents << klass.new(*a)
      end
    end

    def method_missing(meth, *a, &blk)
      # find the first parent that responds to this message
      # and send it over there. if none do, then error
      @parents.each do |p|
        return p.send(meth, *a, &blk) if p.respond_to? meth
      end
      raise NoMethodError.new("undefined method '#{meth}' for #{self.class}", meth)
    end

    def respond_to?(meth, include_all = false)
      # mirror behavior in method_missing, check each parent
      # before passing the message up to Object#respond_to?
      @parents.each do |p|
        return true if p.respond_to? meth
      end
      super
    end
  end

  # we need to save the parent classes so they can be initialized when
  # our class is initialized
  c.instance_variable_set(:@parents, args)
  return c
end

Using this function, we can do the following:

class A
  def a; "a"; end
end
class B
  def b; "b"; end
end
class AorB < Multiple(A, B)
  def foo; "foo"; end
end

test = AorB.new
test.a #-> "a"
test.b #-> "b"
test.foo #-> "foo"
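
Two consequences of delegating through method_missing are worth seeing in action: when both parents define the same method, the first one listed wins, and the new class is not really a subclass of either parent. The sketch below uses a condensed variant of Multiple() so that it runs on its own. It differs from the version above by capturing the parent list in a closure and using respond_to_missing?. The Left, Right, and Both classes are made up for illustration.

```ruby
# Condensed variant of Multiple(): parents are captured by the block.
def Multiple(*classes)
  Class.new do
    define_method(:initialize) do |*a|
      @parents = classes.map { |klass| klass.new(*a) }
    end

    def method_missing(meth, *a, &blk)
      # delegate to the first parent that responds, else fall back to
      # the default behavior (which raises NoMethodError)
      target = @parents.find { |p| p.respond_to?(meth) }
      target ? target.send(meth, *a, &blk) : super
    end

    def respond_to_missing?(meth, include_all = false)
      @parents.any? { |p| p.respond_to?(meth, include_all) } || super
    end
  end
end

class Left
  def greet; "left"; end
end
class Right
  def greet; "right"; end
  def only_right; "R"; end
end
class Both < Multiple(Left, Right); end

both = Both.new
both.greet                    #-> "left" (Left is listed first)
both.only_right               #-> "R"
both.respond_to?(:only_right) #-> true
both.is_a?(Left)              #-> false (Both only wraps a Left instance)
```

So this is delegation dressed up as inheritance: code that checks is_a? or relies on super reaching a parent's method will not see A or B in the ancestor chain.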


You can download the full code for this from my projects page.

Follow up: Maybe Lessig is wrong...

I wrote a pretty long blog post last night about Google Book Search and about Kelly v. Arriba Soft (see that post for links to Lessig's argument about it).

Here's a summary of last night's explanation: When you download an image, you are also making a copy of it. This copy is then processed to create a thumbnail, which the court found to be valid fair use. I claimed that the situation in GBS is analogous: you copy the book, process it, and create snippets.

After thinking about it a little more this morning, I'm no longer totally convinced Kelly v. Arriba Soft applies in this case. Is copying a book really the same as downloading an image? I'm not so sure anymore.

The main difference is who creates the copy:
  • On a website, the full copy is made by the webserver when the data is being transmitted through the connection. Assuming the webserver is "authorized" to do this by the owner of the image, there is no copyright issue here. In the physical world, this is as if the printing presses at O'Reilly produced an extra copy of a book whose publishing rights O'Reilly owns.
  • In GBS, Google is making the full copy, without any authorization from the publisher or author.


I'm only going on the information in the two pieces I linked to last night. Perhaps someone with more information can clarify the issue?

Monday, January 16, 2006

Google Book Search Drama

I haven't been following the Google Book Search drama too closely, but a couple of good arguments have been going on in the blogosphere this weekend about it. Here is Lessig's pro-Google argument (it's a video), and here is Brian Dear's rebuttal. I don't agree with several of Dear's points.

Lessig compares GBS with serving thumbnails on products like Google Image Search, which was found to be valid fair use in the Kelly v. Arriba Soft case. Dear contends that while the display of book "snippets" may be protected by fair use, if Google is making a full copy of the book and then generating snippets from it, fair use certainly doesn't cover that initial full copy, as in RIAA vs MP3.com.

However, in order to generate a thumbnail, one also needs to make a copy of the original image-- by downloading it. Then, the original is processed to produce a thumbnail, and the original is deleted. The GBS implementation can easily follow the same process: Make a copy of the book and index it. For each search term in the reverse index into that book, create a snippet of a few lines around that term. Once you are done, delete the full copy. At a high level, both processes are identical. Kelly v. Arriba Soft says that following this process to produce a transformation of the original work is fair use.

The key is the transformation-- no one considers the thumbnail to be a replacement for the original image. Similarly, a 3 line snippet is no substitute for the book. On the other hand, an mp3 album is easily a substitute for a full album on CD. This is the main difference between this scenario and the MP3.com case.

Lessig claims that GBS would give us access to a significant chunk of our past that would otherwise be either lost or expensive to learn from. This is a common theme in a lot of Lessig's writing. Dear claims that this doesn't matter: "copyright is copyright," he says.

That is true, copyright is copyright. However, the whole point of copyright is to give authors adequate economic incentive to create, so that they may enrich our culture. GBS is a clear example of a technology that has the potential to greatly enrich our culture by allowing free access to otherwise hard-to-reach information. A good friend of mine, Joshua Steinman, wrote an article for the Chicago Maroon where he presents his own solution to the GBS controversy, which explains this point fairly well:
To the extent that the history of humanity is a history of recorded text, written work in spirit belongs to the canon of our collective soul, and, in a pure realization of culture, written words would belong to the collective consciousness, and would be easily accessible. [...] electronic categorization presents the next best option for the efficient collection of human knowledge.


I agree with Randy Picker of the UChicago Law School that although a market failure in copyright justifies fair use, Google needs to give publishers a simple way to opt out of the snippets, similar to how websites can opt out of being included in a web search index. Fair use and an opt-out mechanism would allow GBS to provide a great service to society while respecting the rights of publishers and authors, rights that they aren't even benefiting from if the books are out of print.

Personally, I can't wait until this lawsuit is over, so that Google can build out GBS and release a Book Search API. Picture this scenario: You are writing a history research paper and you need a quote. Instead of spending hours at the library, you just select a few words in your word processor. Right-click. Select "Find Related Quotes on GBS", and instantly find the most relevant quotes from the most important books in the field (author data, bibliographies, and a book's text should give GBS enough data to come up with pretty good results). I'm sure there are thousands of undergrads and grad students that would kill for that feature.