The "Excuse Pattern" Pattern

Here's a fool-proof technique for making any code instantly become good: prefix it with a comment which uses the word "pattern".

// this implements the modular universal computation pattern:
for (i in numberOfAtomsInUniverse) {
    calculatePi();
}

Any maintenance programmer coming along after you will instantly assume that the "pattern" you've cited is some profound bit of software engineering wisdom, and not notice that you've perpetrated a terrible crime against programming.

I hereby call this the "excuse pattern" pattern.

SQLAlchemy happiness

Disclaimer: this post is about ORMs. If you don't know what an ORM is, then this post may not be for you.

Disclaimer for people who made it through the first disclaimer: some of this code is untested. There may be typos, logical fallacies, or accidental invocations of unspeakable, nameless things from beyond time.

$DAYJOB has recently involved spending lots of time using a certain ORM which suffers from what Michael Bayer, creator of SQLAlchemy, would call "the 80% problem": that is, many toolsets solve 80% of a problem domain; if your requirements lie outside that 80%, you need roll your own solution, or add a dependency to another toolset which solves a different 80% of the problem domain.

SQLAlchemy is a Python ORM which breaks the 80% barrier. Actually it's more than that; at its heart is an SQL expression language which allows you to construct SQL expressions programmatically without having to do manual string mangling. This even includes DDL ("CREATE TABLE" and friends).

On top of that is a mapper, which allows you to map any Python class onto any SQLAlchemy "selectable". A selectable is anything you can select from; so obviously tables are selectables, and the most common use case is to map classes onto tables, so instances represent rows and foreign keys are translated into properties of instances etc.

That's fine, and it may make you happy. So far this is what all ORMs can do, even the simple ones. This is known as the Active Record pattern.

An SQL SELECT statement is also a type of selectable, because you can use it as a subquery and select from it. SQLAlchemy allows you to map a class to a select statement. Here's the example from the docs:

s = select([customers,
         func.count(orders).label('order_count'),
         func.max(orders.price).label('highest_order')],
         customers.c.customer_id==orders.c.customer_id,
        group_by=[c for c in customers.c]
    ).alias('somealias')

class Customer(object):

     pass

mapper(Customer, s)

This defines a select statement which joins customers to orders, grouping by all the fields in customers, and retrieving a count of orders per customer and the value of their highest order. Then it creates an empty class called Customer, and finally maps the class to the select statement (it's assumed that the table objects have been defined elsewhere in this example.)

The end result is like simply mapping the Customer class to the customers table, but with the benefit that all the Customer objects will have an order_count property and a highest_order property. Which might be useful if you want something like that.

Here's a simplified example from something I was mucking about with this weekend. It's a basic wiki with page-versioning abilities. There's a pages table:

CREATE TABLE pages (
        id INTEGER NOT NULL,
        name VARCHAR NOT NULL,
        PRIMARY KEY (id),
        UNIQUE (name)
);

And a versions table:

CREATE TABLE versions (
        rev INTEGER NOT NULL AUTOINCREMENT,
        page_id INTEGER NOT NULL,
        title VARCHAR NOT NULL,
        content VARCHAR NOT NULL,
        PRIMARY KEY (rev),
        FOREIGN KEY(page_id) REFERENCES pages (id)
);

And matching Page and Version classes with basic column <-> attribute mapping.

Every time a page is created or edited, a new record is created in versions. To get latest version of the content of a page (id=5), you could do:

SELECT * FROM versions
WHERE page_id = 5
ORDER BY rev DESC
LIMIT 1

Which is fine, for getting one at a time. What I want, though, is a property on Page instances called latest_version. And if I query lots of pages all at once, I want to get their latest_versions efficiently, not by running one query per Page object.

Easy. Well, no, I had to work it out, but once you understand, it's easy: SQLAlchemy mapped classes can interpret foreign key relationships, even across a "linker" table, and give you an object or a collection of objects as a property. So in my definition of Page, I have:

class Page(Base):
    # ...other junk...
    versions = relationship(Version, order_by=Version.rev.desc)
    # ...other junk...

Which basically says, "follow the only link you can find between this class's table and the Version class's table, and give me a collection of Versions on instances of this class."

That's a good-ish start. Now I can say my_page_instance.versions[0] and I get the latest one. But that triggers a query, because by default relationships are loaded lazily (mean on request). I could say:

class Page(Base):
    # (other setup)
    versions = relationship(Version, order_by=Version.rev.desc, lazy="joined")

Which would automatically join versions to pages when I query. But then I'd get one results row for every version, and even though SQLAlchemy would sort through it for me and create the right objects, it would be slow and yucky. 99% of the time I just want the latest version of each page.

The solution:

  • When you tell SQLAlchemy about a relationship, it understands many-many joins across "linker tables"
  • and by "linker tables" I mean any selectable
  • and a subquery is a selectable
  • so I can do a subquery which aggregates versions and gets max(rev) for every page_id, and use that to join pages to versions
  • and I can do this right in the ORM, and SQL will understand and create the right objects for me
In full, it looks a little something like this (using SQLAlchemy's "declarative" extension to cut down on typing):

class Page(Base):
    __table__ = Table("pages", Base.metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(1000), nullable=False, unique=True)
    )
    versions = relation(Version,order_by=Version.rev.desc) # our old friend

    # a select statement, written in SQL expression language
    max_rev_query = select(
        [
            Version.page_id.label("pageid"),
            func.max(Version.rev).label("max_rev")
        ],
        group_by=Version.page_id
    )
   
    # a many-many relationship, using that select statement as "linker"
    _latest_version = relationship(Version,
        secondary=max_rev_query,
        primaryjoin = max_rev_query.c.pageid == __table__.c.id,
        secondaryjoin = max_rev_query.c.max_rev == Version.rev,
        foreign_keys = [max_rev_query.c.pageid, max_rev_query.c.max_rev],
        lazy="joined",
        viewonly=True)
   
    # custom accessor for many-many to make it look one-one
    @synonym_for('_latest_version')
    @property
    def latest_version(self):
        return self._latest_version[0]

(That last block takes the _latest_version property and wraps in a special accessor which just gets _latest_version[0], because _latest version is a many-many relationship so it returns a collection).

So now, out there in application-land, I can go:

pages = session.query(Page).order_by(sql.desc("max_rev")).limit(5)

for page in pages:
    print page.id, " ", page.latest_version.title, "\n"


And it prints out the last five pages to be edited, in order, along with their most recent titles. This is the kind of crazy fun that SQLAlchemy totally lets you do right in the ORM, where other, lesser solutions throw up their hands and say "Hey, you can  always pass plain SQL!" Doesn't that always sound like an admission of defeat?

Lastly, I'll just point out one more efficiency win here. You may have looked at that last query and said, "But Neil! You're querying for five pages, retrieving their full content, and then not using the body text! So wasteful!"

If you did say that, and I'll take it on trust that you did, you're wrong. I *didn't* retrieve the full body text, because in the definition of my Version class, I did this:

class Version(Base):
    # ... other junk...
    content = deferred(Column(Text))
    # ... more junk...
   
See that deferred() wrapping the column definition? That means that unless I specify otherwise, the content column will *not* be loaded in the main query. If I try and use that property, SQLAlchemy will fire off a query to get it. If I do want to get everything all at once, I can do something like:

session.query(Page).options(undefer(Version.content)).get(47)

And I'll get page(id=47) and its latest version and content, all in one query.

The magic that makes SQLAlchemy do all this stuff is the data mapper pattern, and it's not the only ORM that uses it. For Java and .Net, Hibernate and NHibernate have been playing this game for years, and Ruby's astonishingly-named DataMapper does it too. I wrote about SQLAlchemy because, well, it's the one I've been playing with just lately, and it does absolutely rock.

Buggered by the debugger

Yesterday I checked in a change to a project I work on which moved it from jQuery 1.3.2 to 1.4.2.

There was a problem with a page which used jTemplates. IE (all versions) reported a syntax error on a very large line number (anywhere between 90,000,000 and 130,000,000,) but could not display that line (obviously.) No other browser had a problem. Well, the problem has now been resolved. The stars are unlikely to ever be aligned that way again, but just in case, here’s a short story about mime-types:

 

  •  jQuery.ajax() takes a “dataType” parameter to tell it what to do with the data it receives (decode as JSON, parse as HTML, execute as script etc.)
  • In jQuery 1.3.2, if you leave out that parameter, it defaults to treating it as plain text
  • In jQuery 1.4.2, if you leave it out, it will attempt to autodetect based on the Content-Type header of the response
  • jTemplates requests its content templates using jQuery.ajax, and leaves out the dataType parameter
  • The templates being used by the project were in files with the extension “content.js”, so IIS was serving them up as “text/javascript”
  • So jQuery was attempting to (effectively) eval() the templates, hence causing syntax error
  • Most browsers were then simply discarding that block of code
  • But in IE it gets executed *right then and there*. Unfortunately this means that although it can cause, and report, a syntax error, it’s not in a “real” line of code anywhere, so it helpfully makes up a huge line number which I can only guess is a memory offset.

 

One possible fix would have been to monkeypatch jTemplates to send a dataType parameter. What I did, though, was to rename the template *.js files to be *.html. This means no syntax error in IE, but you still get syntax highlighting while editing them.

More on HTML5gate

After thinking about HTML5gate yesterday, I was going to make a HILARIOUS spoof this morning. Then this popped up in my RSS feeder so now I don’t need to.

Incidentally, I hadn’t realized the full majesty of Apple’s epic fuckwadness until after I posted: not only are they browser-sniffing to block non-Safari browsers, but even if you spoof your UA string, many of the demos don’t work because they’re written with vendor-specific -webkit-whatever CSS prefixes. Obviously Chrome is okay because it uses the same prefix, but Firefox and Opera are out in the cold, even if they implement an equivalent property with a different prefix (-moz-whatever or just plain whatever.)

It’s just a terrible demonstration of what HTML5 is supposed to be about, and, I fear, it betrays their real attitude towards the open web.

Daring Fireball on HTML5gate

Is the problem that Apple has written Safari-specific demos, or is it that Apple is showing off things only Safari can do? There’s a big difference.

They're not only showing off things that only Safari can do - many of the demos work fine in other browsers[1]. So by elimination it must the the artificial Safari-only restriction that's the problem.

[1] In the interests of balance, it should be pointed out that other browsers can do things that Safari can't.They're all developing along ultimately convergent, but distinct paths.

Edit: John Gruber has come down on the side of reason. Maybe Apple will listen.

Apple's Safari-only HTML5 demo

So Apple have launched a showcase to show off the HTML5 features of Safari. 

They've called it an "HTML5 showcase", not a "Safari Showcase", but they're using browser sniffing to block anything other than Safari from viewing the demos. So the internet is outraged, because:
  1. Browser sniffing is kinda ghetto.
  2. Browser sniffing is kinda 1998.
  3. It implies that no other browser can do any of the stuff in the demos, when in fact many browsers can do a lot of it.
  4. At the top of the page, it says, very clearly, "Standards aren’t add-ons to the web. They are the web." And yet they're demonstrating the power of Safari, not the power of standards..
  5. chromeexperiments.com gets it right - if your browser won't support a standard needed for a particular demo, you're offered a video of it, with links to download either Chrome or Safari, and the option to go barrelling in and run it anyway.
  6. People want to see cool stuff on the web without having to inflict Apple's godawful software updater on their computer.
Apple wants perfection at all costs. This means not trying to run a demo in a browser which nominally supports a given feature, but not fast enough to make it look good. This is Apple's trademark mentality, and it's a good one because it brought us the iPhone and the iPad. But it's a closed mentality. It's defensive. It's exclusive, not inclusive. It's worked very well for Apple so far, and will continue to do so in future. It will continue to scare the competition into trying harder. But it absolutely does not belong on the web.

A New Type of Phishing Attack

  1. A user navigates to your normal looking site.
  2. You detect when the page has lost its focus and hasn’t been interacted with for a while.
  3. Replace the favicon with the Gmail favicon, the title with “Gmail: Email from Google”, and the page with a Gmail login look-a-like. This can all be done with just a little bit of Javascript that takes place instantly.

Visit Aza's article at http://www.azarask.in/blog/post/a-new-type-of-phishing-attack/ , go look at another browser tab for a few seconds, then go back.

When I did it, muscle memory actually kicked in and I started moving my pointer to the login form.