Skip to content

Re: Bad ORM is infinitely worse than bad SQL

This a response to Mattis Geniar’s post, titled Bad ORM is infinitely worse than bad SQL, where he talks about his bad encounters with ORMs, and how easy it is to mess up performance, and forget about what really happens behind the scenes. He ends the post with

If you disagree with me, prove me wrong.

Be my guest.

Edit 2012.04.15.

I have kinda went off the point with this article. I felt that the original article generalized too much, by saying things like If you’ve done PHP for more than 3 months, the above will give you shivers down your spine. This post is more about how to use an ORM correctly, and what to look out for, to avoid the problem that the original author mentions (which is very easy to avoid). My opinion still stands, an ORM is just a tool like everything else, that you can use incorrectly, and that will lead to trouble. I don’t believe it is harder to fix problems that arise from using an ORM incorrectly, than fixing badly performing SQL. So, if you are interested, read on.

I will be using Doctrine2 as the ORM for my examples. I have the whole demo app up on GitHub if you want to check it out. It uses Composer to pull in the dependencies, see the README on how to bootstrap, and run the application.

So, let’s take a look at the first code piece

// First, get all the companies in your database
$companies = $this->getAllCompanies();
$totalValue = 0;
foreach ($companies as $company) {
    // For each company there is, retrieve the total value of all their orders
    $orders = $company->getOrders();
    foreach ($orders as $order) {
        $totalValue += (float) $order->getValue();
    }

    echo "This company made us ". $totalValue ." euro already.";
}

If you’ve done PHP for more than 3 months, the above will give you shivers down your spine. It looks like perfectly valid code, it even makes sense to read it. But in reality, it probably does something like this.
[...]

This is a classical example of the N+1 problem. I’ll quote one of my earlier articles (which is incidentally about optimizing a PHP site, which uses an ORM):

In case you are not familiar with the expression, the N+1 problem is when you have an entity with a to-many, relation to another entity, and you execute 1 query to get the root entity, and execute N queries to get all the related entities (the many side), resulting in N+1 queries. I don’t have to say how inefficient is this, and I have a feeling this is why many projects blame the ORM for not scaling. You can avoid this, if you correctly use your ORM.

The above PHP code is perfectly fine, it’s just a matter of the Company entities in the $companies array were fetched. If they were fetched lazily, then you have a problem. If they have been fetched eagerly, then you are golden. So how do you specify the fetch method? There are several ways, I’m going to show one solution, which involves using DQL (Doctrine Query Language), which is similar to SQL, but instead of referencing tables and columns, you refer to entities, and their properties. The difference is subtle, and if you know SQL, you can grasp DQL in 5 minutes.

Eager load

$dql = "SELECT c, o
        FROM   ORMDemo\\Company c
        JOIN   c.orders o";
$query = $em->createQuery($dql);
return $query->getResult();

Lazy load

$dql = "SELECT c
        FROM   ORMDemo\\Company c
        JOIN   c.orders o";
$query = $em->createQuery($dql);
return $query->getResult();

Notice the difference? It’s the missing “o” from the SELECT clause. Entities that you refer to in the SELECT clause will get eager loaded, everything else will be loaded on the first access (lazy). In case you are wondering, the first DQL example will be translated to the following SQL:

SELECT c0_.id AS id0, c0_.name AS name1, o1_.id AS id2, o1_.value AS value3, o1_.company_id AS company_id4
FROM companies c0_
INNER JOIN orders o1_ ON c0_.id = o1_.company_id

This a pretty perfect query to use in this case. Obviously the column names are horrible, but you will never see them, because your application will get the hydrated models. Kinda sucks when debugging, agreed.

My problem with ORM is that it’s too easy to write bad code. It’s very easy to use all the default mappings and just do “SELECT *” in the background.

Ever nuked a partition by swapping the parameters for dd by mistake, when all you tried to do was write your [favorite linux distro] image to your USB drive? Wanted to empty a folder, and inadvertently pressed space, and executed rm -rf ./folder/ *? Because these types of errors are also easy to make, yet everyone still uses rm -rf, and wikis and tutorials still recommend using dd for making a bootable pendrive with your favorite distro on it. The destruction that these commands can bring is also much greater.

Just like with every tool out there, you can shoot yourself in the foot with an ORM, yes, but you can take precautions. Make a query logger, that counts the number of queries executed, and if that number gets too high, pop up in image of Clippy, telling the developer that “It seems you executed 94 queries, would you like help?”. Make it output this number to the lower right corner, so it can be always seen. If you have QA, ask them to report any page they encounter, where this number is greater than 20. Even if you mess up, fixing it is just as easy, because you just have to modify the DQL query to include the relations in the SELECT clause, and you are done.

It’s also just as equally easy to blow yourself up with SQL, because then you will have to worry about SQL injection, and those are harder to track down. Also what’s the worst that can happen if your app goes out to production with shitty queries by the ORM? It will crash under the load, and will stay down until you fix the queries.

What happens if your app goes out with SQL injection holes? It can get hacked, which can possibly lead to your database getting filled with malicious data, like backdoor users, or simply getting stolen (revealing that you store your passwords in plain text, nowadays). This is much worse than your app crashing under the load, yet, you don’t see people advocating that You should stop using SQL, and store your data in CSV. No, people advocate teaching how to correctly escape your statements. THIS IS THE SAME THING. Learn how to use your ORM, and you won’t have basic problems like this.

And don’t quote me saying that Norbert told me that I should use an ORM, so I don’t have to worry about SQL injection, because that’s not what I said, and I will hunt you down.

Every ORM-system gives you the ability to write custom SQL queries, but that sort of defies the point of ORM in general, hence hardly anyone does it.

Saying “hardly anyone does it” is pulling stuff out of the air, and it does not defy the point of an ORM. If you have some pages that require a complex query that cannot be expressed in the language of the ORM, or too cumbersome to express, or it get’s translated to an inefficient SQL query, then use native SQL, and work with the result set arrays. It’s all right. You can still enjoy the shitload amount of other features of the ORM like:

  • Lifting the boring, repetitive, error-prone CRUD work from your shoulders
  • Generating your database schema for you. Updating it, as the mapping files change
  • The previous point makes database migrations, and deployment easy. YOU DON’T HAVE TO STORE DB DIFFS.
  • Letting you listen to events on your models, like when they get loaded/saved/deleted. Cascade delete is not the same, because this is much more powerful
  • Automatic transaction handling (And I can still have explicit transactions, if I want)
  • Generating skeleton classes from your mapping files

As more and more developers use only ORM to create applications, they lose their touch with the database interaction, the queries behind it, the reasoning of why to use a certain kind of query, the performance impact of an INNER or OUTER joins, … The example above, causing many small queries, could also be replaced with one efficient query to give you the same result.

SELECT SUM(o.VALUE) AS TotalValue, c.name AS CompanyName
FROM company AS c
LEFT JOIN "order" AS o ON o.companyid = c.companyid
GROUP BY o.companyid;

Instead of lose their touch, I’d say these people simply don’t care, and that’s not the fault of the ORM. You don’t start to forget SQL because you started using an ORM. You still need the same knowledge. The same query can be easily expressed with DQL by the way:

$dql = "SELECT   c, SUM(o.value)
        FROM     ORMDemo\\Company c
        JOIN     c.orders o
        GROUP BY c.id, c.name";
$query = $em->createQuery($dql);

return $query->getResult();

Which is going to produce the same SQL:

SELECT c0_.id AS id0, c0_.name AS name1, SUM(o1_.value) AS sclr2
FROM companies c0_
INNER JOIN orders o1_ ON c0_.id = o1_.company_id
GROUP BY c0_.id, c0_.name

While the ORM example looks perfectly valid and the performance bottleneck may not immediately be clear.

They never are. You have to profile your code to see where the bottlenecks are. Using an ORM will not change this.

And that just may be my biggest frustration with ORM: as a developer, you lose focus on the underlying SQL queries. Some may claim that it’s not important, that it is exactly the reason why ORM is gaining so much attention. But if you’re serious about tuning your application, you need knowledge of the SQL that is being performed. You need to know how to write efficient queries with complex JOIN’s, GROUP BY’s and aggregate functions such as SUM(), AVG(), …

Well, don’t lose focus. The third google hit for “SQL logger Doctrine2″ is the official documentation, showing you how to register a query logger. It even gives you pointers on how to implement your own. You also have the option of calling $query->getSQL(), if you want to see individual queries. The query logger is mentioned in the Configuration section of the documentation, which will probably be read by everyone, but even if someone misses it, the first few Google hits will set them on the right track.

ORMs are not gaining attention (they have been around for a long time anyway), because they make, or try to make SQL unimportant. This is stupid, Hey, let’s learn a library specific SQL-like dialect, so we don’t have to use SQL.. You still have a need that SQL knowledge, to express the complex queries in the ORM’s language. If that is not possible, write a custom query for it, that the ORM can use to hydrate your objects. If that is also not possible, use native SQL, and map the results sets to your objects by hand. If even that is not possible/cumbersome/not worth it, then use the arrays that came back from the query.

Yes. You don’t write stupid code if you master your tools, and that includes ORM frameworks as well. I’m sure there are people out there that write perfectly performant ORM-related code, it seems I have just yet to find them.

If you disagree with me, prove me wrong.

You don’t have to master it. It certainly helps, but if you spend an hour or two flipping through the docs, you can start using it just fine.

I’m not saying that an ORM can never be a bottleneck, if you use it right, but, in my opinion, it usually is not. What I really want to get out of this article, is that an ORM is not an automatic performance killer, and it’s not easier to shoot yourself in the foot with it, than with SQL injections. Spend some time to learn it, because it can become a fucking awesome tool that can considerably speed up development.

7 Comments

  1. Björn Steinbrink wrote:

    That last example is bad, you should drop “o” from the SELECT clause.

    The generated SQL isn’t valid, because the orders columns in the SELECT clause aren’t part of the GROUP BY clause, neither are they used in an aggregate function. MySQL ignores this error (unless your sql_mode includes only_full_group_by) and gives you indeterminate values (coming from orders in the group) for the order columns in the SELECT clause.

    2012.04.04. at 07:55 | Permalink
  2. Mal Haak wrote:

    I read his. I read yours. Congrats on your strawman. You are correct you can write good code. He never said it was impossible to do so. He stated that it was easy to write bad code. In your first example you reuse his code, then suggest changes to it to make it good. That is not his point. His point was that it is easy to write bad code and think it to be good. Sure you can always fix bad code, if you know it is bad. You clearly missed that distinction he was making.

    2012.04.04. at 14:51 | Permalink
  3. @Björn Steinbrink:
    Thank you, fixed it.

    @Mal Haak:
    A lot of tools can falsely lead you to believe that you are doing fine, I’m stating that an ORM is no different in this regard, ie. it’s not easier to mess up with it, than with anything else.

    I did not suggest improving on the code. The original article says “If you’ve done PHP for more than 3 months, the above will give you shivers down your spine.”, which is is simply not true, because the given code piece depends on context, namely how $companies is assembled. I just provided an example for that.

    2012.04.04. at 19:43 | Permalink
  4. Arthur ice wrote:

    Good tools make it easy to the right thing and hard (but not impossible) to do the wrong thing.

    Bad tools make it easy to do the wrong thing or hard to do the right thing.

    Your post solves the problem and essentially says ‘see? you can do it the right way’ which is perfectly true. The problem though is that it simply reinforces his initial point. ORM tools would make it seem that the problem is solved correctly with the most straight forward of direct actions any imperative programmer would do. The results do not match these expectations.

    SQL at least is a hint that things do not work the old way, things are strange and new and different, that you should do things SQL’s way first and your language’s way second. Not that it is perfect, but it is slightly better then the ORM’s will-o-wisp lure in behavior.

    2012.04.04. at 20:19 | Permalink
  5. Justin wrote:

    I’m not sure the original author originally wrote “ORM is worse than SQL”. I’m pretty sure he wrote “Bad ORM is infinitely worse than bad SQL”. I’d have to agree with that statement.

    Good ORM isn’t worse than bad SQL, nor is it worse than SQL in general. It’s just an abstraction, and should be used to help the programmer with the task of managing complexity, because, hey, that’s what abstraction is for.

    Bad ORM is infinitely worse than bad SQL. It achieves none of its goals, but at the same time, creates obscufucation and costs resources on some level. Because ORM is ‘more’ than SQL by containing it and driving SQL, comparitively, bad ORM will have a worse impact than bad SQL because you have to lift the ORM costs and the costs from the bad SQL it makes.

    Nobody ever said good ORM was a bad thing, so I am unsure what you are positing by showing us good ORM. Glad we could all agree using a tool well is a good thing.

    Also, Hitler. Can we all go make cool things now?

    2012.04.05. at 00:56 | Permalink
  6. I will have to chime in and also say that it reads like a straw man argument. However, good blog and keep writing new posts!

    2012.04.10. at 10:33 | Permalink
  7. I’ve encountered several really bad ORM problems. I can say that they’re really much harder to deal with than when using SQL.

    2012.04.17. at 19:57 | Permalink

Post a Comment

Your email is never published nor shared. You can use [php][/php], [html][/html], etc. tags around your code blocks, and they will receive syntax highlighting.

Please solve this totally random captcha