Disclaimer: this post is about ORMs. If you don't know what an ORM is, then this post may not be for you.
Disclaimer for people who made it through the first disclaimer: some of this code is untested. There may be typos, logical fallacies, or accidental invocations of unspeakable, nameless
things from beyond time.
$DAYJOB has recently involved spending lots of time using a certain ORM which suffers from what Michael Bayer, creator of SQLAlchemy, would call "the 80% problem": that is, many toolsets solve 80% of a problem domain; if your requirements lie outside that 80%, you need roll your own solution, or add a dependency to another toolset which solves a different 80% of the problem domain.
SQLAlchemy is a Python ORM which breaks the 80% barrier. Actually it's more than that; at its heart is an SQL expression language which allows you to construct SQL expressions programmatically without having to do manual string mangling. This even includes DDL ("CREATE TABLE" and friends).
On top of that is a mapper, which allows you to map any Python class onto any SQLAlchemy "selectable". A selectable is anything you can select from; so obviously tables are selectables, and the most common use case is to map classes onto tables, so instances represent rows and foreign keys are translated into properties of instances etc.
That's fine, and it may make you happy. So far this is what all ORMs can do, even the simple ones. This is known as the
Active Record pattern.
An SQL SELECT statement is also a type of selectable, because you can use it as a subquery and select from it. SQLAlchemy allows you to map a class to a select statement. Here's the example from the docs:
s = select([customers, func.count(orders).label('order_count'), func.max(orders.price).label('highest_order')], customers.c.customer_id==orders.c.customer_id, group_by=[c for c in customers.c] ).alias('somealias')
class Customer(object): pass mapper(Customer, s) This defines a select statement which joins
customers to
orders, grouping by all the fields in
customers, and retrieving a count of orders per customer and the value of their highest order. Then it creates an empty class called
Customer, and finally maps the class to the select statement (it's assumed that the table objects have been defined elsewhere in this example.)
The end result is like simply mapping the
Customer class to the
customers table, but with the benefit that all the
Customer objects will have an
order_count property and a
highest_order property. Which might be useful if you want something like that.
Here's a simplified example from something I was mucking about with this weekend. It's a basic wiki with page-versioning abilities. There's a
pages table:
CREATE TABLE pages ( id INTEGER NOT NULL, name VARCHAR NOT NULL, PRIMARY KEY (id), UNIQUE (name) ); And a
versions table:
CREATE TABLE versions (
rev INTEGER NOT NULL AUTOINCREMENT,
page_id INTEGER NOT NULL,
title VARCHAR NOT NULL,
content VARCHAR NOT NULL,
PRIMARY KEY (rev),
FOREIGN KEY(page_id) REFERENCES pages (id)
);
And matching
Page and
Version classes with basic column <-> attribute mapping.
Every time a page is created or edited, a new record is created in
versions. To get latest version of the content of a page (id=5), you could do:
SELECT * FROM versions
WHERE page_id = 5
ORDER BY rev DESC
LIMIT 1
Which is fine, for getting one at a time. What I want, though, is a property on
Page instances called
latest_version. And if I query lots of pages all at once, I want to get their
latest_versions efficiently, not by running one query per
Page object.
Easy. Well, no, I had to work it out, but once you understand, it's easy: SQLAlchemy mapped classes can interpret foreign key relationships, even across a "linker" table, and give you an object or a collection of objects as a property. So in my definition of Page, I have:
class Page(Base):
# ...other junk...
versions = relationship(Version, order_by=Version.rev.desc)
# ...other junk...
Which basically says, "follow the only link you can find between this class's table and the
Version class's table, and give me a collection of
Versions on instances of this class."
That's a good-ish start. Now I can say my_page_instance.versions[0] and I get the latest one. But that triggers a query, because by default relationships are loaded lazily (mean on request). I
could say
: class Page(Base):
# (other setup)
versions = relationship(Version, order_by=Version.rev.desc, lazy="joined")
Which would automatically join versions to pages when I query. But then I'd get one results row for every version, and even though SQLAlchemy would sort through it for me and create the right objects, it would be slow and yucky. 99% of the time I just want the
latest version of each page.
The solution:
- When you tell SQLAlchemy about a relationship, it understands many-many joins across "linker tables"
- and by "linker tables" I mean any selectable
- and a subquery is a selectable
- so I can do a subquery which aggregates versions and gets max(rev) for every page_id, and use that to join pages to versions
- and I can do this right in the ORM, and SQL will understand and create the right objects for me
In full, it looks a little something like this (using SQLAlchemy's "declarative" extension to cut down on typing):
class Page(Base):
__table__ = Table("pages", Base.metadata,
Column("id", Integer, primary_key=True),
Column("name", String(1000), nullable=False, unique=True)
)
versions = relation(Version,order_by=Version.rev.desc) # our old friend
# a select statement, written in SQL expression language
max_rev_query = select(
[
Version.page_id.label("pageid"),
func.max(Version.rev).label("max_rev")
],
group_by=Version.page_id
)
# a many-many relationship, using that select statement as "linker"
_latest_version = relationship(Version,
secondary=max_rev_query,
primaryjoin = max_rev_query.c.pageid == __table__.
c.id,
secondaryjoin = max_rev_query.c.max_rev == Version.rev,
foreign_keys = [max_rev_query.c.pageid, max_rev_query.c.max_rev],
lazy="joined",
viewonly=True)
# custom accessor for many-many to make it look one-one
@synonym_for('_latest_version')
@property
def latest_version(self):
return self._latest_version[0]
(That last block takes the _latest_version property and wraps in a special accessor which just gets _latest_version[0], because _latest version is a many-many relationship so it returns a collection).
So now, out there in application-land, I can go:
pages = session.query(Page).order_by(sql.desc("max_rev")).limit(5)
for page in pages:
print
page.id, " ", page.latest_version.title, "\n"
And it prints out the last five pages to be edited, in order, along with their most recent titles. This is the kind of crazy fun that SQLAlchemy
totally lets you do right in the ORM, where other, lesser solutions throw up their hands and say "Hey, you can always pass plain SQL!" Doesn't that always sound like an admission of defeat?
Lastly, I'll just point out one more efficiency win here. You may have looked at that last query and said, "But Neil! You're querying for five pages, retrieving their full content, and then not using the body text! So wasteful!"
If you did say that, and I'll take it on trust that you did, you're wrong. I *didn't* retrieve the full body text, because in the definition of my
Version class, I did this:
class Version(Base):
# ... other junk...
content = deferred(Column(Text))
# ... more junk...
See that deferred() wrapping the column definition? That means that unless I specify otherwise, the content column will *not* be loaded in the main query. If I try and use that property, SQLAlchemy will fire off a query to get it. If I do want to get everything all at once, I can do something like:
session.query(Page).options(undefer(Version.content)).get(47)
And I'll get page(id=47) and its latest version and content, all in one query.
The magic that makes SQLAlchemy do all this stuff is the
data mapper pattern, and it's not the only ORM that uses it. For Java and .Net, Hibernate and NHibernate have been playing this game for years, and Ruby's astonishingly-named DataMapper does it too. I wrote about SQLAlchemy because, well, it's the one I've been playing with just lately, and it does absolutely rock.