James Tauber

journeyman of some

blog > 2005 > 11 > 30 >

Relational Python: Restrict

The next relational operator I implemented in my relational python experiment was RESTRICT.

Restrict filters the set of tuples by some condition. In many formulations of the relational algebra, the restriction is on just one attribute at a time and you chain RESTRICTs together but for relational python, we'll use a lambda that can do rich testing across one or more attributes.

Here was my first attempt:

def RESTRICT(orig_rel, restriction):
    new_rel = Rel(orig_rel.attributes())
    for tup in orig_rel.tuples():
        if restriction(tup):
            new_rel.add(tup)
    return new_rel

but I thought it would be neater as a list comprehension. The only problem was, there was no way in Rel to add multiple tuples at a time so I added the following method to Rel:

    def add_multiple(self, tupset):
        self.tuples_.update(set([self._convert_dict(tup) for tup in tupset]))

This enabled me to rewrite RESTRICT as:

def RESTRICT(orig_rel, restriction):
    new_rel = Rel(orig_rel.attributes())
    new_rel.add_multiple([tup for tup in orig_rel.tuples() if restriction(tup)])
    return new_rel

Here's how to use the function:

rel4 = RESTRICT(rel1, lambda tup: tup["SALARY"] > "30K")

And, of course, I had to write a lazy "view" version:

class RESTRICT_VIEW(Rel):

    def __init__(self, orig_rel, restriction):
        Rel.__init__(self, orig_rel.attributes())
        self.orig_rel = orig_rel
        self.restriction = restriction

    def add(self, tup):
        raise Exception

    def tuples(self):
        for tup in self.orig_rel.tuples():
            if self.restriction(tup):
                yield tup

As always, suggestions for improvements are welcome in comments.

Next up, I'll implement the cross product.

Categories:
prev « python » next
prev « relational_python » next

Comments (6)

Calvin Spealman on Nov. 30, 2005:

Curious, why do you limit your relations to being expressed as mappings? Couldn't the 'columns' in this case be attributes as well? as a matter of fact, it would be more powerful in that case, allowing any python operation to be modeled relationally. Maybe. So, basically, RESTRICT is the more direct relational-algebra equivalent to the WHERE clause? I have more expirience with relational databases than relational algebra.

James Tauber on Nov. 30, 2005:

Sorry, Calvin, I'm not fully following what you are saying here.

"why do you limit your relations to being expressed as mappings?"

Are you asking why are tuples dictionaries rather than Python objects?

"Couldn't the 'columns' in this case be attributes as well?"

I assume by "attributes" you are referring to Python attributes rather than relational attributes because 'columns' *are* attributes in the terminology of relational algebra.

Yes, RESTRICT corresponds most closely to WHERE in SQL.

Dorai Thodla on March 27, 2006:

You bring back 20 year old memories with this article. Our first implementation of Integra SQL was based on relational opeators.

I think this idea has some merit. First of all, it provides a lot of the power of SQL with a simple API. It is better than a low level API like BerkeleyDB which deals with records.

By passing around references and not copying records (in project), you can improve the performance as well. My Python knowledge is very elementary. However, I think this library (as a C library embedded in different scripting languages would be a great, useful tool).

Alex on April 18, 2006:

Very interresting set of posts.
I was thinking that it would be very interesting to have such a simple implementation then be able to play with it to test a dynamic algorithm.
Say you have a request, it can be viewed as a sequence of relationnal operations, some of which may be interverted. An interesting thing would be to have them self-ordering according to real-time performance measurement.
Adding some cache on previous state for a request, trying to move them around if performance seems to change (kind of simulated requi where changes in performance warms temperature up) it could be interresting as opposed to static analysis of request made in modern database engines.

Dan Milstein on Sept. 13, 2006:

I would be interested in seeing if there was a way to hook those operators up to a persistent RDBMS backend, rather than an in-memory collection of data.

You'd get the benefit of the elegant and flexible relational algebra which Date describes; you'd have seamless access to persistent data from within Python (and not through a horrific ORM); you'd have all your data safe, accessible to other apps and ad hoc queries, etc, etc.

Obviously, there are some challenges (;-), but it'd be fun to go after.

ahmed on Feb. 19, 2007:

good

Created: Nov. 29, 2005
Last Modified: Nov. 29, 2005
Author: James Tauber