Relations with Python Named Tuples


Back in 2006, I wrote an entry called "Python Tuples are Not Just Constant Lists" which, after the Dr Horrible covers, is my most visited blog post ever.

In it, I suggest that:

the index in a tuple has an implied semantic. The point of a tuple is that the i-th slot means something specific. In other words, it's an index-based (rather than name-based) data structure.

In that same post, I pointed out the connection between relational algebra and this notion of a tuple, and further suggested that:

it might be useful to have the notion of a tuple whose slots could additionally be named and then accessed via name.

I implemented aspects of this in my initial explorations of relational Python. Basically, a relation in relational algebra is a set of dictionaries (called tuples) where each dictionary has identical keys (called attributes). In my post "Basic Class for Relations", I actually used Python tuples internally, but they went in and out as dictionaries. As I said in that post:

Basically, I store each tuple internally as a Python tuple rather than a dictionary, and the relation also keeps an ordered list of the attributes, which is used as the index into the tuples. Amongst other things, this gets around dictionaries not being hashable. It's also a storage optimization akin to using __slots__ for Python attributes.
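To make the hashability point concrete, here is a small sketch written for Python 3 (not code from the original post). A dictionary cannot be a member of a set, but the same values stored as a tuple, ordered by a separate attribute list, can. The names `row`, `attributes` and `stored` are illustrative only.

```python
# Dictionaries are mutable and therefore unhashable, so they cannot be set members.
row = {"ENO": "E1", "ENAME": "Lopez", "DNO": "D1", "SALARY": "40K"}
try:
    {row}
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# An ordered attribute list lets us store the same data as a hashable tuple.
attributes = ("ENO", "ENAME", "DNO", "SALARY")
stored = tuple(row[a] for a in attributes)
relation = {stored}

# The attribute list is the index into the tuple: slot i holds attributes[i].
salary = stored[attributes.index("SALARY")]
print(dict_is_hashable, stored, salary)
```

The attribute order is the only thing that gives each slot its meaning, which is exactly the "implied semantic" of a tuple index described above.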

Here is a slightly cleaned-up version of my code at the time:

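```python
class Rel:
    def __init__(self, *attributes):
        # ordered attribute names; position i names slot i of every stored tuple
        self.attributes_ = tuple(attributes)
        self.tuples_ = set()

    def add(self, **tupdict):
        # store the values as a plain (hashable) tuple, in attribute order
        self.tuples_.add(tuple([tupdict[attribute] for attribute in self.attributes_]))

    def attributes(self):
        return set(self.attributes_)

    def tuples(self):
        # convert each stored tuple back into a dictionary on the way out
        for tup in self.tuples_:
            tupdict = {}
            for col in range(len(self.attributes_)):
                tupdict[self.attributes_[col]] = tup[col]
            yield tupdict
```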

One could then say stuff like:

```python
rel1 = Rel("ENO", "ENAME", "DNO", "SALARY")
rel1.add(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")
rel1.add(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K")
rel1.add(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K")
```
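Reading the data back out shows the "in and out as dictionaries" behaviour. The sketch below is a condensed Python 3 restatement of the class above (the `dict(zip(...))` in `tuples` is assumed to be an equivalent shorthand for the explicit column loop, not the original code). Because storage is a set, adding an identical tuple twice leaves one copy, and since set iteration order is arbitrary the example sorts by `ENO` before printing.

```python
class Rel:
    """Python 3 restatement of the dictionary-in, dictionary-out Rel above."""

    def __init__(self, *attributes):
        self.attributes_ = tuple(attributes)  # ordered: slot i <-> attribute i
        self.tuples_ = set()

    def add(self, **tupdict):
        self.tuples_.add(tuple(tupdict[a] for a in self.attributes_))

    def attributes(self):
        return set(self.attributes_)

    def tuples(self):
        # dict(zip(...)) pairs each attribute name with the value in its slot,
        # equivalent to the explicit column loop in the original.
        for tup in self.tuples_:
            yield dict(zip(self.attributes_, tup))


rel1 = Rel("ENO", "ENAME", "DNO", "SALARY")
rel1.add(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")
rel1.add(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K")
rel1.add(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K")
# Adding an identical tuple again has no effect, since storage is a set.
rel1.add(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")

# Set iteration order is arbitrary, so sort by ENO for stable output.
rows = sorted(rel1.tuples(), key=lambda row: row["ENO"])
for row in rows:
    print(row["ENO"], row["ENAME"])
```

Each dictionary is rebuilt on every call to `tuples()`, which is the conversion cost that the named tuple version below avoids.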

and in subsequent posts I started to show how some relational operations could be performed on this data structure.

Well, now in Python 2.6, some of this can be simplified. Python 2.6 introduced a wonderful new type in the collections module, namedtuple: a factory that creates tuple subclasses whose slots can also be accessed by name.
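A quick illustration of what a named tuple gives you, written for Python 3; the `Employee` type and the values are just examples reusing the data from this post.

```python
from collections import namedtuple

# Field names can be given as one space-separated string or as a sequence.
Employee = namedtuple("Employee", "ENO ENAME DNO SALARY")

e = Employee(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")

# Slots are reachable both by position and by name.
assert e[1] == e.ENAME == "Lopez"

# It is still a real tuple: it compares equal to a plain tuple with the same
# values, and it is hashable, so it can live in a set.
assert e == ("E1", "Lopez", "D1", "40K")
employees = {e}

print(Employee._fields)
print(e)
```

So a named tuple keeps the index-based semantics of a tuple while adding the name-based access suggested in the 2006 post.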

Now I can do something similar to Rel above as follows:

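```python
from collections import namedtuple

class Rel:
    def __init__(self, typename, field_names):
        # build a named tuple type whose fields are the relation's attributes
        self.tuple_type = namedtuple(typename, field_names)
        self.tuples = set()

    def add(self, **tupdict):
        # named tuples are hashable, so they can be stored directly in the set
        self.tuples.add(self.tuple_type(**tupdict))

    def attributes(self):
        return set(self.tuple_type._fields)
```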

and use it as follows:

```python
rel1 = Rel("EMPLOYEE", "ENO ENAME DNO SALARY")
rel1.add(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")
rel1.add(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K")
rel1.add(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K")
```

then:

```python
>>> rel1.attributes()
set(['SALARY', 'ENAME', 'DNO', 'ENO'])
>>> rel1.tuples
set([EMPLOYEE(ENO='E1', ENAME='Lopez', DNO='D1', SALARY='40K'), EMPLOYEE(ENO='E3', ENAME='Finzi', DNO='D2', SALARY='30K'), EMPLOYEE(ENO='E2', ENAME='Cheng', DNO='D1', SALARY='42K')])
>>> for employee in rel1.tuples:
...     print employee.ENO, employee.ENAME
...
E1 Lopez
E3 Finzi
E2 Cheng
```
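As a hedged illustration of how a relational operation might carry over to the named tuple version, here is a projection sketch in Python 3. The `project` function and the `DEPT` type name are hypothetical, not the implementation from my earlier posts; the `Rel` class is restated from above so the example runs on its own.

```python
from collections import namedtuple


class Rel:
    # Restatement of the namedtuple-based Rel above, for a self-contained example.
    def __init__(self, typename, field_names):
        self.tuple_type = namedtuple(typename, field_names)
        self.tuples = set()

    def add(self, **tupdict):
        self.tuples.add(self.tuple_type(**tupdict))

    def attributes(self):
        return set(self.tuple_type._fields)


def project(rel, typename, field_names):
    """Hypothetical projection helper: keep only the named attributes.

    The result is a new Rel; because its tuples live in a set, rows that
    become identical after projection collapse into one.
    """
    result = Rel(typename, field_names)
    for tup in rel.tuples:
        # getattr reads each kept slot by name from the source named tuple.
        result.add(**{name: getattr(tup, name) for name in result.tuple_type._fields})
    return result


rel1 = Rel("EMPLOYEE", "ENO ENAME DNO SALARY")
rel1.add(ENO="E1", ENAME="Lopez", DNO="D1", SALARY="40K")
rel1.add(ENO="E2", ENAME="Cheng", DNO="D1", SALARY="42K")
rel1.add(ENO="E3", ENAME="Finzi", DNO="D2", SALARY="30K")

depts = project(rel1, "DEPT", "DNO")
print(sorted(t.DNO for t in depts.tuples))
```

Projecting three employees onto DNO yields only two tuples, because Lopez and Cheng are both in D1; the set storage gives relational (duplicate-free) semantics for free.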

The original post was in the categories python and relational_python, but I'm still in the process of migrating categories over.

The original post had 2 comments, which I'm in the process of migrating over.