Tuesday, August 31, 2004

The File System as a Database and the Last Post of Summer

software :: python


Heh, that title reads like the title of a Victorian-era
pseudo-science journal article. However, I really do feel that August
31st is the last day of Summer... that come September first, Everything
Changes. But not in a bad way -- Fall is amazing. It's exhilarating in
a funny, eerie kind of way.



So, this is the last blog entry of the Summer, 'nough said.



As for the rest of the title, I've been discussing this topic with
friends of late. Basically, how can we access the file system in
Python, treat it like a database, search on it like a database, and
write code for it in such a way that when we are ready, we can
migrate to a database with no code changes (only config or module
import changes)?



Well, we've debated the issue(s) back and forth. I even asked
Phillip Eby about it on the PEAK mail list (we will be using PEAK for
this project... once we learn it!). But I think we've all been trying
to make the problem and solution too general. A very simple and rigid
API would work for us right now. It means less time spent on R&D,
and since this is for a paying customer, that means more money for us
in the long run.... as long as what we implement leaves enough
flexibility for future change.



So, without further ado, an adaptation from a post to the mail list today:



If we have a file on the filesystem, then the full path + the file
name uniquely identifies that file. In my limited knowledge of OODBs,
this is pretty standard (path-to-object = UID). Then there's the file
itself, which contains some data. In addition, though, there is the path:
it contains data that is just as important as the data inside the file.



UID: full path + filename

Data: stored in file at /fullpath/filename

Data: stored in path and filename



How do we think about this problem? If this were a table, we might be looking at a schema like this:



Table

-----

id: full_path + filename

blob: rrd file/text file/ini file/xml file with DTD/whatever

additional field 1:

additional field 2:

additional field 3:

...

additional field n:



I'm not proposing an OR mapping here: that's complicated shit. Way
beyond me. Some of the biggest brains in the software development world
are working on ways to do that which make sense and work right.



I'm just talking about doing something simple and straightforward.
Something that's easy to configure and easy to migrate from a
filesystem to an RDBMS.
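
To make the "no code changes" goal a little more concrete, here is a rough sketch under some loud assumptions: the class names `FileStore` and `SqlStore`, the `put`/`get` methods, the `records` table, and the sample key are all invented for illustration, and SQLite stands in for whatever RDBMS would really be used. The point is only that the application code talks to one small, rigid API and never learns which backend it was handed.

```python
import os
import sqlite3
import tempfile


class FileStore:
    """Records live in files; the key is the path relative to a root directory."""

    def __init__(self, root):
        self.root = root

    def put(self, key, data):
        path = os.path.join(self.root, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()


class SqlStore:
    """The same two methods, backed by a single SQLite table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, key, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO records VALUES (?, ?)", (key, data)
        )

    def get(self, key):
        row = self.conn.execute(
            "SELECT data FROM records WHERE id = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


def save_and_load(store):
    """Application code: identical no matter which backend is passed in."""
    store.put("routers/core1/traffic.txt", b"42")
    return store.get("routers/core1/traffic.txt")


with tempfile.TemporaryDirectory() as root:
    from_files = save_and_load(FileStore(root))
from_sql = save_and_load(SqlStore(sqlite3.connect(":memory:")))
```

Swapping backends then comes down to which constructor gets called, which is the sort of thing a config file or a module import can decide.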



This shouldn't be as hard as I was thinking originally. The only
issue is that for every implementation, there will need to be a
configuration. This is because, by their nature, database tables and
fields defined therein are fixed; directory structures aren't/don't
have to be. A configuration would "lock" a directory structure... you'd
have to have an API (or something) that defined what each level of the
directory structure indicated, as these would have to be mappable to
fields defined in a table (for migration to a SQL framework).
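
As a rough illustration of what such a configuration might look like, the sketch below names each directory level and uses those names to turn a path into a table-like row. Everything specific here is made up for the example: the level names `device_type` and `device`, the root `/data`, the sample path, and the helper `path_to_row`.

```python
from pathlib import PurePosixPath

# Hypothetical config: one field name per directory level below the root.
LEVELS = ("device_type", "device")


def path_to_row(root, path, levels=LEVELS):
    """Map each directory level below `root` to a named field."""
    rel = PurePosixPath(path).relative_to(root)
    *dirs, filename = rel.parts
    if len(dirs) != len(levels):
        # The config "locks" the structure: other depths are rejected.
        raise ValueError(f"{path!r} does not match the configured structure")
    row = dict(zip(levels, dirs))
    row["id"] = str(path)          # the full path stays the unique id
    row["filename"] = filename
    return row


row = path_to_row("/data", "/data/routers/core1/traffic.rrd")
```

A path with too few or too many directories raises `ValueError`, which is one simple way to enforce the fixed shape that a table would demand.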



Additionally, if you wanted to move the data stored in your blob
field out of its own little format into SQL, you'd have to define an
additional config/API for mapping its data to more fields in the
table...
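
For an ini-style file, that second mapping could be as small as the sketch below. The section and option names, the column names, and the `FIELD_MAP` structure are assumptions invented for illustration; only `configparser` itself is standard library.

```python
import configparser

# Hypothetical mapping: table column -> (section, option) inside the ini file.
FIELD_MAP = {
    "owner": ("meta", "owner"),
    "interval": ("collect", "interval"),
}


def blob_to_fields(text, field_map=FIELD_MAP):
    """Pull the configured values out of an ini-formatted blob."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        column: parser.get(section, option)
        for column, (section, option) in field_map.items()
    }


sample = """
[meta]
owner = ops

[collect]
interval = 300
"""
fields = blob_to_fields(sample)
```

Values come back as strings; converting them to the column types of a real table would be one more thing for the config to describe.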



So what you'd really have here is a directory structure schema and
then a file storage schema. Using the two together, you'd get what I
originally asked Phillip Eby about...



PEAK already has 'peak.storage.files' which lets you interact with
text files transactionally. We could do something like this for other
types of data-containing files at the end of whatever directory tree.
The combination of this with an implementation of a queryable filesystem
data interface should leave us with a fairly powerful tool for many of
our projects.
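
A queryable interface over the directory tree does not need much machinery to get started. The following is a bare-bones sketch, not anything PEAK provides: the function `find`, the level names, and the sample tree are hypothetical, and it only supports equality matches on the directory-level fields.

```python
import os
import tempfile

# Hypothetical names for each directory level below the root.
LEVELS = ("device_type", "device")


def find(root, levels=LEVELS, **criteria):
    """Yield a dict per file under `root` whose path fields match `criteria`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        dirs = [] if rel == os.curdir else rel.split(os.sep)
        if len(dirs) != len(levels):
            continue  # only files at the full configured depth count as rows
        row = dict(zip(levels, dirs))
        if all(row.get(key) == value for key, value in criteria.items()):
            for name in filenames:
                yield {**row, "path": os.path.join(dirpath, name)}


# Build a small throwaway tree and query it.
with tempfile.TemporaryDirectory() as root:
    for device_type, device in [
        ("routers", "core1"),
        ("routers", "edge1"),
        ("switches", "sw1"),
    ]:
        os.makedirs(os.path.join(root, device_type, device))
        open(os.path.join(root, device_type, device, "traffic.txt"), "w").close()
    routers = sorted(r["device"] for r in find(root, device_type="routers"))
```

The keyword arguments play the role of a very small WHERE clause; anything fancier (ranges, joins, ordering) is left out on purpose.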



Questions to ask:

* What constitutes a database? (root dir and below?)

* What constitutes a table? (all directories at the first level, inside the root dir?)

* What constitutes a row? (every branch from root? this means all paths
from the root dir have to have the same number of dirs, subdirs, etc.)

* Can there be no file at the end of a path?

* Can there be empty files at the end of a path?

* Can there be multiple files at the end of a path?

* Can there be files in intermediate directories? (dirs that aren't the end of a path; good place for metadata?)
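
Several of these questions can at least be checked mechanically against an existing tree. As a sketch (the function name `survey`, the report keys, and the sample tree are invented here), the following walks a root directory and reports the depths at which paths end, which end-of-path directories hold no file, and which intermediate directories contain files:

```python
import os
import tempfile


def survey(root):
    """Collect facts about a tree that bear on the row/table questions."""
    report = {"leaf_depths": set(), "empty_leaves": [], "intermediate_files": []}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else len(rel.split(os.sep))
        if not dirnames:                  # a leaf: the end of a path
            report["leaf_depths"].add(depth)
            if not filenames:
                report["empty_leaves"].append(rel)
        elif filenames:                   # files sitting above the leaves
            report["intermediate_files"].append(rel)
    return report


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "routers", "core1"))
    os.makedirs(os.path.join(root, "switches"))   # shallower, and empty
    open(os.path.join(root, "routers", "core1", "traffic.txt"), "w").close()
    open(os.path.join(root, "routers", "meta.txt"), "w").close()
    report = survey(root)

# True only if every path from the root has the same number of directories.
uniform = len(report["leaf_depths"]) == 1
```

It answers none of the design questions by itself, but it shows quickly whether a given tree already fits the "every row has the same depth" assumption.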



Tuesday, August 24, 2004

Freemind

software

It's been a couple of years since I checked out freemind (free "mind mapping" software for organizing text/notes in loose relationships), and I just downloaded the latest version for Mac OS X. I must say, I'm impressed.

The software is very stable, provides *exactly* what I need, and is free :-) I highly recommend interested parties check it out:

http://freemind.sourceforge.net/.

Now, if they could only develop a live wiki version of the same thing ;-) Well, it's Java, so I imagine it wouldn't be too hard to turn this into a servlet... of course, I'm not a Java programmer and I have no right to make that statement!

Python port, anyone?


Thursday, August 19, 2004

PyHTMLWidgets

python :: html :: web



Well, I've been working with an ancient code base (1995) to produce HTML for PyHTMLWidgets... and thanks to a friend of mine (Paul Taney), I have realized that the proper tool has been in front of me the whole time: HTMLgen.



Once I am finished with the looming projects (two weeks, I hope!), I
will update the code base for PyHTMLWidgets and provide a download.
It's not rocket science... in fact, it's not science at all: it's
convenience. In the world of awesome templating systems and complete
tool sets for projects, there are customers out there that want things
in a particular way... sometimes, this precludes the use of some of the
better tool sets out there, and the need arises for something
completely independent and without dependencies on a larger system.



Thus, the ever-present need for separate tools with the ability to loosely bind them together...



Update:



A benchmark of different XML tools in Python led me to the ElementTree web site and I am excited to see what this tool can do. There are a couple of possibilities for PyHTMLWidgets:



* Use HTMLgen

* Use ElementTree instead


* Rewrite HTMLgen to use ElementTree



I'll have to spend more time thinking about potential features (mmm,
candy... mmm, scope creep) as well as running some benchmarks, but this
could actually be an interesting project...



Update:



Well, I had a chance to look at the source for HTMLgen, and it's
really not what I need. On the other hand, I've been playing with
ElementTree and that is a fantastic tool set. I am very pleased with
its potential, and I think that it will provide *very* significant
increases in speed over what we are currently using.
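
For a feel of what generating HTML with ElementTree looks like, here is a small sketch. It uses the copy of ElementTree that now ships in the Python standard library as `xml.etree.ElementTree`; the helper `select_widget` and its arguments are hypothetical and are not taken from PyHTMLWidgets.

```python
import xml.etree.ElementTree as ET


def select_widget(name, options):
    """Build a <select> element from (value, label) pairs."""
    select = ET.Element("select", {"name": name})
    for value, label in options:
        option = ET.SubElement(select, "option", {"value": value})
        option.text = label
    return select


widget = select_widget("lang", [("py", "Python"), ("java", "Java")])
# Serialize the element tree to an HTML string.
html = ET.tostring(widget, encoding="unicode", method="html")
```

Because the widget is a tree of elements rather than a string, it can be nested inside other elements or modified before it is serialized.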


My Dream Home Robot

technology



Two years ago, I had the desire to build a small flying "robot"
that I could control remotely and connect to an X10 home automation
setup. Turns out that I probably couldn't have done it at the size I
wanted, seeing how Seiko Epson has just produced it and it has the
smallest, lightest gyro sensor in the world. (See http://news.bbc.co.uk/1/hi/technology/3579232.stm.)
Hell, I probably couldn't have done it no matter what, seeing how I've
never built a robot in my life! To be useful for what I wanted, it had
to be wirelessly remoteable and capable of sending images (either
static snapshots or live video). This little critter meets such
requirements.



This is especially cool to me, far more cool than wheeled robots. It
receives messages via Bluetooth, and with its micro camera, you could
remote it through your house, checking things like ovens, toasters,
irons, etc. It can operate for 3 hours before running out of juice in
its on-board batteries (rechargeable?). I would imagine that once you
had a device like this plugged up to your home automation system, new
ideas for usage would crop up by the hundreds...


Wednesday, August 18, 2004

! Not Blogging

technology


Yup, that's a double negative. I have no time to do it, so I
can't... and I'm not... not... blogging. In fact, I'm supposed to be
writing code for AOL...



Anyway, I think a friend of mine got mentioned in Clay Shirky's latest article
about Situated Software (of course, when I saw that title, all I could
think about was ATHF's Meatwad singing MC Peepants'/MC Chris's lyrics
"...get re-situated..."; yeah, I know I'm fucked ;-) ). Paul Barry is
my friend's name; he's working for NYU and taking classes. Really smart
dude. He built an LMS from the ground up for NYU. Unless my memory and
email addresses have failed me, Clay typo'ed Paul's name. Or it might
not be the same Paul... still awaiting verification from Paul...


Insanely Busy...

technology


I have absolutely ZERO time to blog these days. Python
programming is keeping me busy 120% of the day. The other 3.2 hours of
my 32 hour days (yeah, that's my circadian rythm... or however the hell
you spell it) are spent learning programming concepts and theory I am
only becoming aware of through amazing and ungodly uberhackers in the
open source community.



Business is good, but I would love to have more time to ruminate on
the things the sundry collection of neurons in my body find amusing.
These days, it seems to be matters sociological and economic that
really do it for me. Of particular interest (and in need of comment):



* one of Tim O'Reilly's excellent essays

* The Dysons at OSCon

* The social and catalytic impact conferences can have


* architect Christopher Alexander's three-volume set

* Conversation with Alan Thompson about software glue and business trends



We shall see...



Until then, I leave the place-holder and personal reminder ;-)