Saturday, December 27, 2008

Intellectual Property and Open Source

A few months ago, I received a complimentary copy of Van Lindberg's new O'Reilly book, Intellectual Property and Open Source: A Practical Guide to Protecting Code. The first thing that happened when the book was unwrapped at home was that three of us began arguing over who got to read it first.

This may seem like an odd thing to happen for what one could easily assume was a dry and less than interesting topic. However, at the time I was strongly considering the possibility of beginning a non-tech-industry startup built with both open source and proprietary code. The discussions with the potential founders of the startup had been very vigorous and exciting, but the big questions that remained revolved around patents, protecting IP, and providing protection against big business while still offering powerful, free code for use by individuals/private consumers. If you've read the book or even seen the table of contents, you can see why everyone wanted to be the first to read it and learn from the insights provided between its covers.

Instead of jumping into another startup, I ended up joining Canonical; this has kept me both very busy and exceptionally happy. The holiday break has provided an opportunity to finish reading the book, and it has been a delight. I have friends working on startups that depend upon exciting code to power some or all of the business models for their visions, and this book should be on their shelves, close at hand. Even if you're not involved directly with open source and intellectual property, this book is an excellent read.

Intellectual Property and Open Source accomplishes the difficult goal of sharing dense information while keeping the subject matter engaging. This is done through examples, thought experiments, and well-developed analogies. Van does an excellent job of igniting a powerful curiosity in the reader and then rewarding it with lucid explanations of the related laws and perspectives. I am resisting the urge to turn this post into a long series of quotes, but at the very least I want to mention a few little "spoilers" ;-)

The book starts off with an excellent foundation, giving an overview of the origins of intellectual property from an economic and legal perspective. This was particularly useful for me, as I have no background in this field. Van Lindberg does a really great job of expressing some of the widely held (and diverse) views of IP in the open source community.

The book then launches the reader into an array of well-organized chapters on patents, the patent system, trademarks, copyright, trade secrets and licenses. Every open source developer should read chapter 10 on choosing an open source license (the opening dialog had me laughing out loud, a hilarious parody of newsgroup and IRC arguments as well as a nod to The Princess Bride). There's also a chapter dedicated to patches and their relationships to copyright; another on reverse engineering; and the final one provides information and advice on establishing non-profits for open source projects -- the author even mentions our friends at the Software Freedom Conservancy (the umbrella non-profit for the Twisted Software Foundation).

In all honesty, I can't rave enough about this book. I've re-read parts of it just because I enjoyed the clarity of the explanations so much. Law is a twisty maze of easily confused subtleties to those who have not been trained in its dark arts. Through explicit language and examples, the author guides us past pitfalls of misunderstanding and brings us directly to all the major points.

If you are an Amazon shopper, you may want to act quickly: last I checked, there were only two copies left.


Monday, December 15, 2008

Ubuntu Developer Summit

For the past two weeks, I've been listening, learning, discussing, and hacking various Landscape and Ubuntu initiatives with members of the Ubuntu community and fellow Canonical employees. It was an amazing experience, and we've got the next 6 months crammed full of plans... with the next 3 months already spec'ed out.

Canonical has surprised me. It's an extraordinary company... both in the modern business-sense of the word as well as the original sense: a fellowship of companions with a common goal. While so far I have only had a chance to hear some personal histories, it's evident that every member of this company is an extraordinary individual with a rich background and a great deal to offer to the whole. Everyone works with an unprecedented amount of motivation towards the company vision, one that is well and tightly integrated into the corporate culture.

There is a bright future ahead for this amazing group...

Monday, December 08, 2008

The State of Graphs in Python

There is a sad need for standardization of graphs in Python. The topic has come up numerous times on various mailing lists, newsgroups, forums, etc. There is even a wiki page dedicated to the discussion of the topic. Ach, when will the madness end?

As far as I can tell, Guido van Rossum essentially solved this issue 10 years ago when he published his paper on Python Patterns - Implementing Graphs. The graph representation is a simple dict and he provided a few functions for demonstration purposes. In 2004, UC Irvine professor David Eppstein started making public his Python graph-theoretic efforts (with a functional programming approach). Both of these represent a direct approach that appeals to my aesthetic sense.
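To make the dict idea concrete, here is a minimal sketch in the spirit of that approach. It is not a copy of the code from Guido's paper: the sample graph and the details of find_path are my own assumptions, written only to show how little is needed when a graph is just a dict mapping each node to a list of its neighbors.

```python
# A directed graph as a plain dict: each key is a node, each value is
# the list of nodes reachable from it by a single edge.
# (Sample data is illustrative only.)
graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["C"],
    "E": ["F"],
    "F": ["C"],
}


def find_path(graph, start, end, path=None):
    """Return one path from start to end as a list of nodes, or None."""
    path = (path or []) + [start]
    if start == end:
        return path
    for node in graph.get(start, []):
        if node not in path:  # skip nodes already on this path (avoids cycles)
            found = find_path(graph, node, end, path)
            if found is not None:
                return found
    return None


print(find_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
print(find_path(graph, "D", "A"))  # None
```

No custom vertex or edge classes are involved; anything hashable can serve as a node.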

Now, after years of tracking the lack of progress made in standardizing graph representations in Python, I've recently found myself in strong need of one. I did some checking around, and found projects that potentially met my needs. Sadly, none of them had the simplicity of Guido's original implementation (nor, therefore, its anticipated speed benefits).

I was looking for graph implementations with no cruft, no external dependencies, no afterthoughts. I need something that balances runtime performance with a usable API, preferably created using PEP-8 (or similar) coding style.

Here's what I found, with some notes that I used to make a decision for my own needs:
  • PADS - David Eppstein's work; functional programming style; very strong math; leaves the implementation of the graph up to the developer-user
  • altgraph - too many utility and special-purpose methods for my taste; uses a custom graph object
  • python-graph - a new implementation; uses its own objects; seems to take the "framework" approach to graph implementation
  • graph - requires the use of custom vertex and edge objects
  • NetworkX - fairly complete; lots of redundant code; covers more than just a graph implementation (I only include it here because it seems to be fairly highly used)
If you know me, then you've guessed what's coming next. Yes, I'm going to contribute to the general chaos and announce yet another graph library. What I hope to accomplish with this is provide a very simple implementation based on Guido van Rossum's approach (dictionary-based) that doesn't consume much memory, can be operated on quickly, and can be used anywhere.

In keeping with this motivation, I've started a new project on Launchpad and named it simple-graph. My initial efforts will be aimed at implementing a dict-based graph per Guido's paper, with the possible inclusion of some of David's functions (updated to operate on a dict object). I will then spend some time taking inspiration from the best of what the other graph libraries have to offer while keeping things simple.
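As a rough illustration of what "functions that operate on a dict object" could look like, here is a small sketch. The names add_edge and shortest_path are hypothetical, chosen for this example only; they are not the simple-graph API, and they are not taken from PADS.

```python
from collections import deque


def add_edge(graph, tail, head):
    """Add a directed edge tail -> head to a dict-of-sets graph."""
    graph.setdefault(tail, set()).add(head)
    graph.setdefault(head, set())  # make sure the head node exists too


def shortest_path(graph, start, end):
    """Breadth-first search; return a shortest path as a list, or None."""
    if start not in graph:
        return None
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the parent links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


g = {}
for tail, head in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]:
    add_edge(g, tail, head)

print(shortest_path(g, "a", "d"))  # ['a', 'c', 'd']
```

Because the graph stays an ordinary dict, it can be built, inspected, copied and serialized with nothing but the standard library. One limitation of this sketch is that None cannot be used as a node, since it marks the start of the parent chain.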

As I stated on the web panel at PyCon 2007, diversity is a good thing; it gives us a rich gene pool from which a full and healthy process of natural selection may occur. Let's hope that the efforts of so many Python programmers eventually lead to the inclusion of a graph object in the Python standard library.