Monday, December 08, 2008

The State of Graphs in Python


There is a sad need for standardization of graphs in Python. The topic has come up numerous times on various mail lists, news groups, forums, etc. There is even a wiki page dedicated to the discussion of the topic on python.org. Ach, when will the madness end?

As far as I can tell, Guido van Rossum essentially solved this issue 10 years ago when he published his paper on Python Patterns - Implementing Graphs. The graph representation is a simple dict and he provided a few functions for demonstration purposes. In 2004, UC Irvine professor David Eppstein started making public his Python graph-theoretic efforts (with a functional programming approach). Both of these represent a direct approach that appeals to my aesthetic sense.

Now, after years of tracking the lack of progress made in standardizing graph representations in Python, I've recently had strong need of them. I did some checking around, and found projects that potentially met my needs. Sadly, none of them had the simplicity of Guido's original implementation (and therefore, anticipated speed benefits).

I was looking for graph implementations with no cruft, no external dependencies, no afterthoughts. I need something that balances runtime performance with a usable API, preferably created using PEP-8 (or similar) coding style.

Here's what I found, with some notes that I used to make a decision for my own needs:
  • PADS - David Eppstein's work; functional programming style; very strong math; leaves the implementation of the graph up to the developer-user
  • altgraph - too many utility and special-purpose methods for my taste; uses a custom graph object
  • python-graph - a new implementation; uses its own objects; seems to take the "framework" approach to graph implementation
  • graph - requires the use of custom vertex and edge objects
  • NetworkX - fairly complete; lots of redundant code; covers more than just a graph implementation (I only include it here because it seems to be fairly highly used)
If you know me, then you've guessed what's coming next. Yes, I'm going to contribute to the general chaos and announce yet another graph library. What I hope to accomplish with this is provide a very simple implementation based on Guido van Rossum's approach (dictionary-based) that doesn't consume much memory, can be operated on quickly, and can be used anywhere.

In keeping with this motivation, I've started a new project on Launchpad and named it simple-graph. My initial efforts will be aimed at implementing a dict-based graph per Guido's paper, with the possible inclusion of some of David's functions (updated to operate on a dict object). I will then spend some time taking inspiration from the best of what the other graph libraries have to offer while keeping things simple.

As I stated on the web panel at PyCon 2007, diversity is a good thing; it gives us a rich gene pool from which a full and healthy process of natural selection may occur. Let's hope that the efforts of so many Python programmers eventually lead to the inclusion of a graph object in the Python standard library.


17 comments:

Mike Thompson said...

Here's another for the list:

http://www.osl.iu.edu/~dgregor/bgl-python/

René Dudfield said...

hello,

maybe call it simplegraph so it is a pdp-8 compatible module name?

Another one is kjbuckets. It contained a goodish graph implementation ages ago that used much less memory than using dicts... but I don't know if it's been updated.

cu,

Unknown said...

Sorry to spoil the fun but...

I understand the pleasure to reinvent the wheel but the reasonable thing to do would be to test the different frameworks/modules.

When I mean test, I mean real code not simply looking at the docs.

Even if you do your own thing after this evaluation, you would have a much better understanding of the shortcoming of the others modules.

More, you would better understand why the original authors had made the choices they did.

More often than not, you need to code with a module to fully grok what trade-off the author have choosen.

*That* would add value, as they say.

John said...

Also igraph.

DrKJam said...

I've been using NetworkX for over a year now and have yet to find a better Python graph library.

Personally, all the features I needed are here (configurable graph behaviour, decent graph traversal, persistence, access to graphviz, etc).

However, NetworkX does have downsides. You graph is stored entirely in memory which can be very expensive and limits the maximum size of your graphs. I'd love to see a graph implementation that uses Berkeley DB under the covers. Perhaps an idea for simple-graph Duncan ;-)

Until the latest (official) release NetworkX didn't have support for graphs that stored data at the vertices (edges only in the past), but this is now solved in version 0.99 with the inclusion of LabeledGraph and LabeledDiGraph.

One serious beef I do have is how long it takes to start up. Try running this several times :-

#!/usr/bin/env python
import networkx

You'll notice it can take 2-3 seconds even with all the .pyc files in place.

IMHO, this is too costly for anything other than fairly long running processes. This is a real shame and indeed an argument for something a bit simpler with less of the 'kitchen sink' about it.

I'll be watching your progress with interest ;-)

Dave M.

Duncan McGreggor said...

It looks like I need to clarify a little more. And then I'll address the comments that have been posted :-)

What would we expect from a graph module in the standard library? What sort of functionality would it contain? What dependencies would it have?

In my opinion, such an object would have a bare minimum of methods -- just those needed to actually define a graph and a graph's basic operations. It would have no dependencies other than that which is already in the standard Python library. Anything visual (including labels) is right out. That sort of thing should be supported by projects that need it, not the base object itself.

This graph object should -- given its simplicity -- be a perfect basis for all the specialized graphs that various projects need. The intent of this project is not so much a reinvention of the wheel, but rather determining the most basic version of the wheel that could (ideally) be used as the basis for other wheels.

Now to respond to the comments...

Mike and John:

Thanks for the links. As you know,the projects you indicated are not simple graph implementations, but rather something like a "graph suite". NetworkX falls into that same category; the only reason I included it in my original list is because it seems to be the most popular.

illume:

The project is called "simple-graph" but the module name (and in fact, the URL) use the name "simplegraph."

LBaret:

No fun was spoiled :-) Did something I saw give you the impression I only looked at the docs? I'm sorry if that's the case! I have used some of the listed modules for months (and some of them even years). I am intimately familiar with their implementations and am fairly confident of the reasons certain decisions were made. Of them all, Bob's probably comes the closest to meeting my needs.

DrKJam:

Sadly, NetworkX has built up some cruft over time. The project could probably benefit from a long "tuning" sprint where everyone worked on improving test coverage (mostly edge cases) and refactoring to eliminate redundant code.

Thanks for the feedback folks, and I hope this response has cleared up any confusion :-)

Unknown said...

If your goal is to have a basic graph module in the standard library, I think it would be a good idea to write (and submitting) a PEP, to check if the core developers think that such a module is a desirable feature.

Duncan McGreggor said...

usagi:

If all goes well, that is exactly what I would like to see happen. However, my immediate goal is to have a module that could be easily accepted into the standard library. Whether it goes beyond that (and does eventually get accepted) is yet to be seen. If it does get that far, I'll be looking at PEP 218. But! One step at a time :-)

Regardless of how the core developers feel about a graph object in the standard library, one is certainly needed in the community. A basic graph with basic operations could benefit everyone.

Dorian said...

This might be useful to you too:
http://www.sagemath.org:9001/graph_survey

http://wiki.sagemath.org/graph

Anonymous said...

I went through the same list when porting LAD (lines and dots) from C++ to Python, ultimately deciding on NetworkX. NetworkX is used by Sage and is pure Python (you can easy_install it!), both of which are pluses for me.

Now for some shameless self promotion:
LAD - Lines and Dots, a graph theory exploration and research program

Unfortunately I haven't had the time or motivation to continue working on it. I thought it was a promising idea though and would like to pick up where I left off someday. I even still have the TODO list for it, and support for other graph libraries is in there!

Kamil Kisiel said...

I'd love to help out with this effort. I've been wishing for a clean implementation of a bare-bones graph and tree module for a long time, but don't have enough confidence in my own Python-fu to create something general-purpose enough for mass consumption.

I've whipped up some basic directed and undirected graph classes based on dictionaries.

If you actually want to go ahead with your plan, let me know, I'd be glad to try and help out.

Duncan McGreggor said...

Kamil,

Thanks for your interest! I've created a new team on Launchpad for this. If you can email me your lp id, I will invite you.

Also, if you branch the project ("bzr branch lp:simplegraph"), just add a sandbox directory for yourself, add your dict classes, and push it up to Launchpad ("bzr push lp~yourusername/simplegraph/yourbranchname") and I'll take a look at the code.

All the stuff I've been working on is also in the sandbox... ;-)

Thanks!

mrjbq7 said...

Here's another for you to look at:

http://nodebox.net/code/index.php/Graph

geremycondra said...

Yet another one for the heap- we call it Graphine, and we're aiming at Python 3. You can find us here

Duncan McGreggor said...

gcc,

Graphine looks fantastic! Seriously. This is just the sort of thing I was looking for: nice and simple, something that anyone with special graph-theoretic needs could base their work on.

/me muses on a Python 2.x branch...

dobson said...

I'm one of the python-graph maintainers. Thanks for the link. I'd love to know if anybody is using our stuff.

We've recently changed focus to Python 3.x - it puts us in a better place. Our focus is to create a truly modular collection of graph tools. The core libraries have no dependancies, however we support optional add-ins which provide support for popular products such as Dot or Numpy.

Bruce said...

When I started reading this article, I immediately identify with your intent. In my case, I am constantly on the hunt for good numerical methods. However most responses to questions trigger a gush of library names... and this is less than useless. These names all usually found on the 1st Google search anyway and of course I would try them out.

The reason that Guido's article was so powerful is because he provides a very simple graph definition and demonstrated some basic searches with raw Python. Nothing more.

It's not a mistake but I think you left out a useful step in your program development.

For example, I have been playing with A* search. All implementations I found on line were too complicated and hide the logic of the algorithm. As far as I am concerned, the raw Python is even better than the pseudocode in illustrating the logic.

Take a leaf out of Guido's book and start by extending his functions.

[see here for my feeble attempt...]