Edit 2012.04.15.
I have kinda went off the point with this article. I felt that the original article generalized too much, by saying things like If you've done PHP for more than 3 months, the above will give you shivers down your spine. This post is more about how to use an ORM correctly, and what to look out for, to avoid the problem that the original author mentions (which is very easy to avoid). My opinion still stands, an ORM is just a tool like everything else, that you can use incorrectly, and that will lead to trouble. I don't believe it is harder to fix problems that arise from using an ORM incorrectly, than fixing badly performing SQL. So, if you are interested, read on.
I will be using Doctrine2 as the ORM for my examples. I have the whole Composer to pull in the dependencies, see the README on how to bootstrap, and run the application.
So, let's take a look at the first code piece
// First, get all the companies in your database $companies = $this->getAllCompanies(); $totalValue = 0; foreach ($companies as $company) { // For each company there is, retrieve the total value of all their orders $orders = $company->getOrders(); foreach ($orders as $order) { $totalValue += (float) $order->getValue(); } echo "This company made us ". $totalValue ." euro already."; }
If you've done PHP for more than 3 months, the above will give you shivers down your spine. It looks like perfectly valid code, it even makes sense to read it. But in reality, it probably does something like this. [...]
This is a classical example of the N+1 problem. I'll quote one of my earlier articles (which is incidentally about optimizing a PHP site, which uses an ORM):
In case you are not familiar with the expression, the N+1 problem is when you have an entity with a to-many, relation to another entity, and you execute 1 query to get the root entity, and execute N queries to get all the related entities (the many side), resulting in N+1 queries. I don’t have to say how inefficient is this, and I have a feeling this is why many projects blame the ORM for not scaling. You can avoid this, if you correctly use your ORM.
The above PHP code is perfectly fine, it's just a matter of the Company entities in the $companies array were fetched. If they were fetched lazily, then you have a problem. If they have been fetched eagerly, then you are golden. So how do you specify the fetch method? There are several ways, I'm going to show one solution, which involves using DQL (Doctrine Query Language), which is similar to SQL, but instead of referencing tables and columns, you refer to entities, and their properties. The difference is subtle, and if you know SQL, you can grasp DQL in 5 minutes.
Eager load
$dql = "SELECT c, o FROM ORMDemo\\Company c JOIN c.orders o"; $query = $em->createQuery($dql); return $query->getResult();
Lazy load
$dql = "SELECT c FROM ORMDemo\\Company c JOIN c.orders o"; $query = $em->createQuery($dql); return $query->getResult();
Notice the difference? It's the missing "o" from the SELECT clause. Entities that you refer to in the SELECT clause will get eager loaded, everything else will be loaded on the first access (lazy). In case you are wondering, the first DQL example will be translated to the following SQL:
SELECT c0_.id AS id0, c0_.name AS name1, o1_.id AS id2, o1_.value AS value3, o1_.company_id AS company_id4 FROM companies c0_ INNER JOIN orders o1_ ON c0_.id = o1_.company_id
This a pretty perfect query to use in this case. Obviously the column names are horrible, but you will never see them, because your application will get the hydrated models. Kinda sucks when debugging, agreed.
My problem with ORM is that it's too easy to write bad code. It's very easy to use all the default mappings and just do "SELECT *" in the background.
Ever nuked a partition by swapping the parameters for dd by mistake, when all you tried to do was write your [favorite linux distro] image to your USB drive? Wanted to empty a folder, and inadvertently pressed space, and executed rm -rf ./folder/ *? Because these types of errors are also easy to make, yet everyone still uses rm -rf, and wikis and tutorials still recommend using dd for making a bootable pendrive with your favorite distro on it. The destruction that these commands can bring is also much greater.
Just like with every tool out there, you can shoot yourself in the foot with an ORM, yes, but you can take precautions. Make a query logger, that counts the number of queries executed, and if that number gets too high, pop up in image of Clippy, telling the developer that "It seems you executed 94 queries, would you like help?". Make it output this number to the lower right corner, so it can be always seen. If you have QA, ask them to report any page they encounter, where this number is greater than 20. Even if you mess up, fixing it is just as easy, because you just have to modify the DQL query to include the relations in the SELECT clause, and you are done.
It's also just as equally easy to blow yourself up with SQL, because then you will have to worry about SQL injection, and those are harder to track down. Also what's the worst that can happen if your app goes out to production with shitty queries by the ORM? It will crash under the load, and will stay down until you fix the queries.
What happens if your app goes out with SQL injection holes? It can get hacked, which can possibly lead to your database getting filled with malicious data, like backdoor users, or simply getting stolen (revealing that you store your passwords in plain text, nowadays). This is much worse than your app crashing under the load, yet, you don't see people advocating that You should stop using SQL, and store your data in CSV. No, people advocate teaching how to correctly escape your statements. THIS IS THE SAME THING. Learn how to use your ORM, and you won't have basic problems like this.
And don't quote me saying that Norbert told me that I should use an ORM, so I don't have to worry about SQL injection, because that's not what I said, and I will hunt you down.
Every ORM-system gives you the ability to write custom SQL queries, but that sort of defies the point of ORM in general, hence hardly anyone does it.
Saying "hardly anyone does it" is pulling stuff out of the air, and it does not defy the point of an ORM. If you have some pages that require a complex query that cannot be expressed in the language of the ORM, or too cumbersome to express, or it get's translated to an inefficient SQL query, then use native SQL, and work with the result set arrays. It's all right. You can still enjoy the shitload amount of other features of the ORM like:
- Lifting the boring, repetitive, error-prone CRUD work from your shoulders
- Generating your database schema for you. Updating it, as the mapping files change
- The previous point makes database migrations, and deployment easy. YOU DON'T HAVE TO STORE DB DIFFS.
- Letting you listen to events on your models, like when they get loaded/saved/deleted. Cascade delete is not the same, because this is much more powerful
- Automatic transaction handling (And I can still have explicit transactions, if I want)
- Generating skeleton classes from your mapping files
As more and more developers use only ORM to create applications, they lose their touch with the database interaction, the queries behind it, the reasoning of why to use a certain kind of query, the performance impact of an INNER or OUTER joins, ... The example above, causing many small queries, could also be replaced with one efficient query to give you the same result.
SELECT SUM(o.VALUE) AS TotalValue, c.name AS CompanyName FROM company AS c LEFT JOIN "order" AS o ON o.companyid = c.companyid GROUP BY o.companyid;
Instead of lose their touch, I'd say these people simply don't care, and that's not the fault of the ORM. You don't start to forget SQL because you started using an ORM. You still need the same knowledge. The same query can be easily expressed with DQL by the way:
$dql = "SELECT c, SUM(o.value) FROM ORMDemo\\Company c JOIN c.orders o GROUP BY c.id, c.name"; $query = $em->createQuery($dql); return $query->getResult();
Which is going to produce the same SQL:
SELECT c0_.id AS id0, c0_.name AS name1, SUM(o1_.value) AS sclr2 FROM companies c0_ INNER JOIN orders o1_ ON c0_.id = o1_.company_id GROUP BY c0_.id, c0_.name
While the ORM example looks perfectly valid and the performance bottleneck may not immediately be clear.
They never are. You have to profile your code to see where the bottlenecks are. Using an ORM will not change this.
And that just may be my biggest frustration with ORM: as a developer, you lose focus on the underlying SQL queries. Some may claim that it's not important, that it is exactly the reason why ORM is gaining so much attention. But if you're serious about tuning your application, you need knowledge of the SQL that is being performed. You need to know how to write efficient queries with complex JOIN's, GROUP BY's and aggregate functions such as SUM(), AVG(), ...
Well, don't lose focus. The third google hit for "SQL logger Doctrine2" is the official documentation, showing you how to register a query logger. It even gives you pointers on how to implement your own. You also have the option of calling $query->getSQL(), if you want to see individual queries. The query logger is mentioned in the Configuration section of the documentation, which will probably be read by everyone, but even if someone misses it, the first few Google hits will set them on the right track.
ORMs are not gaining attention (they have been around for a long time anyway), because they make, or try to make SQL unimportant. This is stupid, Hey, let's learn a library specific SQL-like dialect, so we don't have to use SQL.. You still have a need that SQL knowledge, to express the complex queries in the ORM's language. If that is not possible, write a custom query for it, that the ORM can use to hydrate your objects. If that is also not possible, use native SQL, and map the results sets to your objects by hand. If even that is not possible/cumbersome/not worth it, then use the arrays that came back from the query.
Yes. You don't write stupid code if you master your tools, and that includes ORM frameworks as well. I'm sure there are people out there that write perfectly performant ORM-related code, it seems I have just yet to find them.
You don't have to master it. It certainly helps, but if you spend an hour or two flipping through the docs, you can start using it just fine. I'm not saying that an ORM can never be a bottleneck, if you use it right, but, in my opinion, it usually is not. What I really want to get out of this article, is that an ORM is not an automatic performance killer, and it's not easier to shoot yourself in the foot with it, than with SQL injections. Spend some time to learn it, because it can become a fucking awesome tool that can considerably speed up development.