Of course I have a backup!

Random blobs of wisdom about software development

What is Doctrine2 anyway

Monday, February 27, 2012

Doctrine2 is an ORM (object-relational mapper) library for PHP, and one of the best things to happen to the PHP world so far. Put simply, the main point of an ORM is to let you map PHP objects to database tables. You can save and load your objects to and from the database in a general way; that is, you don't need to write SELECT/INSERT/UPDATE/DELETE queries for them, because the ORM takes care of that for you.

Oh I see, I have already written my own ORM

Let me reassure you, everyone has. It's a stepping stone, just like learning to organize your code blocks into functions, put your functions into separate include files, group your functions logically into classes, and so on. After spending maybe an hour writing SQL for your object saving/loading logic, it becomes evident that you need a way to automate it, because it is tedious, boring, repetitive, and error-prone. The problem is that it's really hard to implement an ORM right. It's alright to write and use your own, just realize that you will hit its limits sooner or later. For example, does your ORM support these features?

Relation, and collection handling

In other words "An article belongs to a category", or "The articles can have tags attached to them". These are the classic examples for a one-to-many, and a many-to-many relation. I want to be able to "walk on the associations", as in, if I have an $article, I should be able to call $article->getCategory() to get the Category it belongs to, or $article->getTags() to get the tags, and so on, if it has more relations.

Lazy, and eager loading support

I must be able to specify at runtime whether I want both the Category and its Articles to be loaded together (eager), or the Articles to be loaded only when they are first accessed (lazy). When loading a relation, N+1 queries are not acceptable (e.g. loading a Category with 1 SELECT, and then loading every Article that belongs to it with N separate SELECTs).
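To make the "lazy" part less abstract, here is a minimal sketch of a collection that defers its query until first access. It is not Doctrine's implementation; LazyCollection, the loader closure and the $queries counter are names I made up for the illustration, and the closure only stands in for a real SELECT.

```php
<?php
// Hypothetical lazy collection: runs the loader once, on first access.
final class LazyCollection implements IteratorAggregate, Countable
{
    private $loader;
    private $items = null;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    // Loads the items on the first call and reuses them afterwards.
    private function load()
    {
        if ($this->items === null) {
            $this->items = call_user_func($this->loader);
        }
        return $this->items;
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->load());
    }

    public function count(): int
    {
        return count($this->load());
    }
}

$queries = 0;
$articles = new LazyCollection(function () use (&$queries) {
    $queries++; // stands in for one "SELECT ... WHERE category_id = ?"
    return array('A Game of Thrones', 'A Clash of Kings');
});

$queriesBeforeAccess = $queries; // nothing has been loaded yet

$total = count($articles);       // first access triggers the single load
$titles = array();
foreach ($articles as $title) {  // reuses the already loaded items
    $titles[] = $title;
}
```

Eager loading is the opposite choice: you pay for the query up front (ideally as a JOIN in the same SELECT), so nothing is deferred.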

Return the same object for the same id

If I load the same object with the same ID, I must get back the exact same instance of the class, and not two different ones, because I may have already made changes to it:

$id = 1;
$obj1 = $orm->load('Customer', $id);
$obj1->setName('Alex');
$obj2 = $orm->load('Customer', $id);

// does $obj2->getName() return Alex?

This is important because if you load the same entity twice, you won't be able to tell which copy holds the changes when the time comes to save the entities. The problem becomes more apparent when you start working with relations, because loading a Category might have already loaded all of its Articles without you knowing.
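The usual way to get this behaviour is an identity map: a lookup table keyed by class and id that is consulted before anything is built from the database. Below is a minimal sketch of the idea, not Doctrine's actual code; TinyOrm and its in-memory $rows array are hypothetical stand-ins for a real ORM and a real table.

```php
<?php
// Plain object standing in for an entity.
final class Customer
{
    private $id;
    private $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }

    public function setName($name)
    {
        $this->name = $name;
    }
}

// Hypothetical loader with an identity map keyed by class name and id.
final class TinyOrm
{
    private $identityMap = array();
    private $rows; // stands in for the database table

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function load($class, $id)
    {
        $key = $class . '#' . $id;
        if (!isset($this->identityMap[$key])) {
            // First request: "query" the row and build the object once.
            $this->identityMap[$key] = new $class($id, $this->rows[$id]['name']);
        }
        // Every later request returns the very same instance.
        return $this->identityMap[$key];
    }
}

$orm = new TinyOrm(array(1 => array('name' => 'Bob')));
$obj1 = $orm->load('Customer', 1);
$obj1->setName('Alex');
$obj2 = $orm->load('Customer', 1);
```

With the map in place, $obj1 and $obj2 are the same object, so the unsaved name change is visible through both variables.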

Change tracking, and insert/delete ordering

If I load something from the DB, and then change a property on it, I want the ORM to pick up that change, without me telling it so. That means, I shouldn't have to call any kind of save(), on entities that have been loaded through the ORM. The ORM must be clever enough to figure out the order in which records must be inserted, across multiple tables, without causing foreign key errors. The same goes for removing entities.
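One common way to detect changes without an explicit save() is to keep a snapshot of every loaded object and compare against it later. The sketch below shows only that idea under simplifying assumptions: TinyUnitOfWork and TrackedArticle are hypothetical names, and the properties are public purely to keep the example short.

```php
<?php
// Entity with public properties, only to keep the sketch short.
final class TrackedArticle
{
    public $id;
    public $title;

    public function __construct($id, $title)
    {
        $this->id = $id;
        $this->title = $title;
    }
}

// Hypothetical unit of work: snapshot on registration, diff on demand.
final class TinyUnitOfWork
{
    private $managed = array();   // object hash => entity
    private $snapshots = array(); // object hash => original property values

    public function register($entity)
    {
        $oid = spl_object_hash($entity);
        $this->managed[$oid] = $entity;
        $this->snapshots[$oid] = get_object_vars($entity);
    }

    // Returns the changed fields per entity as field => array(old, new).
    public function computeChangeSets()
    {
        $changeSets = array();
        foreach ($this->managed as $oid => $entity) {
            $changes = array();
            foreach (get_object_vars($entity) as $field => $value) {
                if ($this->snapshots[$oid][$field] !== $value) {
                    $changes[$field] = array($this->snapshots[$oid][$field], $value);
                }
            }
            if ($changes) {
                $changeSets[$oid] = $changes;
            }
        }
        return $changeSets;
    }
}

$uow = new TinyUnitOfWork();
$article = new TrackedArticle(3, 'A feast for crows');
$untouched = new TrackedArticle(4, 'A Dance with Dragons');
$uow->register($article);
$uow->register($untouched);

$article->title = 'A Feast for Crows'; // no save() call anywhere

$changeSets = $uow->computeChangeSets();
```

Only the modified article shows up in $changeSets, which is exactly the information needed to issue a single UPDATE for the one column that changed.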

Do I really need those? Won't those make my app slow?

In my opinion, these 4 things are mandatory for doing any serious work at all. If you don't think the same, it might be better to skip using an ORM, because you will probably feel that it hinders you more, than helps. You can always come back later, when you have changed your mind :).

There are two common misconceptions that I often encounter:

  • You cannot use SQL if you use an ORM

This is simply stupid. You can use whatever data access library you have used in the past alongside an ORM. It does not magically disable mysql_query (or whatever your DAL uses).

  • Using an ORM kills scalability

This is not true, at least not in this form. Using an ORM is indeed a tradeoff between development time and runtime performance. However, using one does not automatically mean that your site will be slow. If you use it incorrectly, you can indeed shoot yourself in the foot, just like with almost anything else in the world.

OK, let's say I'm still interested, show me code.

Continuing with the Article - Category example, let's see those two classes. Your model classes are POPOs (plain old PHP objects), with no external dependencies whatsoever. This is because Doctrine uses a pattern called data mapper, instead of active record. A separate class (the EntityManager, more on it later) is responsible for keeping track of your objects, while your own classes have no knowledge of Doctrine whatsoever, and this is why I consider Doctrine2 the only truly usable PHP ORM. It does not impose any kind of restrictions or requirements on your classes; you can design them in any way you like.

article.php
namespace MyProject\Model;

class Article
{
    private $id;
    private $title;
    private $category;

    public function getId()
    {
        return $this->id;
    }

    public function setTitle($title)
    {
        $this->title = $title;
        return $this;
    }

    public function getTitle()
    {
        return $this->title;
    }

    public function setCategory(\MyProject\Model\Category $category = null)
    {
        $this->category = $category;
        return $this;
    }

    public function getCategory()
    {
        return $this->category;
    }
}
category.php
namespace MyProject\Model;

class Category
{
    private $id;
    private $name;
    private $articles;

    public function __construct()
    {
        $this->articles = new \Doctrine\Common\Collections\ArrayCollection();
    }

    public function getId()
    {
        return $this->id;
    }

    public function setName($name)
    {
        $this->name = $name;
        return $this;
    }

    public function getName()
    {
        return $this->name;
    }

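    public function addArticle(\MyProject\Model\Article $article)
    {
        // Keep the owning side (Article::$category) in sync; Doctrine persists only that side.
        $article->setCategory($this);
        $this->articles[] = $article;
    }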

    public function getArticles()
    {
        return $this->articles;
    }
}

You don't even need the getters/setters, because Doctrine uses reflection to manipulate your objects. Now that we have the two entities ready, we need to set up the mapping information. The mapping information is what describes how our objects relate to each other, what table they should be saved to, what properties should be saved (you don't have to save all of them), or how the relations should be loaded by default (lazy/eager).
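If filling private properties without setters sounds like magic, here is a minimal sketch of how reflection makes it possible. This is not Doctrine's actual hydration code; the hydrate() function and the Book class are hypothetical names used only for this illustration.

```php
<?php
// Entity with private state and no setters at all.
final class Book
{
    private $id;
    private $title;

    public function getId()
    {
        return $this->id;
    }

    public function getTitle()
    {
        return $this->title;
    }
}

// Hypothetical helper: builds an object from a row without calling
// its constructor or any setter.
function hydrate($class, array $row)
{
    $reflection = new ReflectionClass($class);
    $object = $reflection->newInstanceWithoutConstructor();

    foreach ($row as $field => $value) {
        $property = $reflection->getProperty($field);
        if (PHP_VERSION_ID < 80100) {
            // Before PHP 8.1, private properties must be made accessible first.
            $property->setAccessible(true);
        }
        $property->setValue($object, $value);
    }

    return $object;
}

$book = hydrate('Book', array('id' => 7, 'title' => 'A feast for crows'));
```

The getters are there only so that the rest of your code can read the values; the hydration itself never touches them.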

You can describe your mappings in XML, YAML, or using annotations. In case you don't know, annotations are just a fancy way of writing metadata in code comments, that other tools (like Doctrine) can inspect, look here for an example on all three. I go with YAML because it's terse and easy to read:

MyProject.Model.Article.dcm.yml
MyProject\Model\Article:
    type: entity
    table: article
    id:
        id:
            type: integer
            generator:
                strategy: AUTO
    fields:
        title:
            type: string
    manyToOne:
        category:
            targetEntity: Category
            inversedBy: articles
MyProject.Model.Category.dcm.yml
MyProject\Model\Category:
    type: entity
    table: category
    id:
        id:
            type: integer
            generator:
                strategy: AUTO
    fields:
        name:
            type: string
    oneToMany:
        articles:
            targetEntity: Article
            mappedBy: category

It is pretty self-explanatory. The filename and the first line must correspond to the FQCN (fully qualified class name) of the class that we are describing the mappings for. Next, we define which table we want to save to, what fields we want to save (we could also define what column to save into, but it defaults to the property name, which is fine for now), and the relations. We have defined a many-to-one relation, which means that many Articles can belong to one Category, and the reverse side of it, a one-to-many relation, which means that one Category can have many Articles.

Defining the relations is what permits us to do stuff like $article->getCategory(), and the inverse of it, $category->getArticles(). It is not mandatory to always define both sides; you have to decide based on your use cases. If there is no reason to list the Articles of a Category, then you can leave that side of the relation out.
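For comparison, this is roughly how the Article mapping could look with annotations instead of YAML. Treat it as a sketch: it uses the unprefixed annotation names (@Entity, @Column, @ManyToOne and so on), and depending on how your annotation reader is configured you may need a prefix such as @ORM\ in front of them. The getters and setters are left out to keep it short.

```php
<?php
namespace MyProject\Model;

/**
 * @Entity
 * @Table(name="article")
 */
class Article
{
    /**
     * @Id
     * @Column(type="integer")
     * @GeneratedValue(strategy="AUTO")
     */
    private $id;

    /** @Column(type="string") */
    private $title;

    /** @ManyToOne(targetEntity="Category", inversedBy="articles") */
    private $category;
}
```

Since annotations live in comments, the class stays a plain PHP class that works without Doctrine; the price is that the mapping is no longer separate from the model.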

Usage

Once you have your classes and your mappings set up, you will have to use the EntityManager (called $em from now on), which is the main entry point into Doctrine2. You ask for your objects (load them) through the $em, and when you create new objects, you have to tell the $em to track them (save them).

I will skip the part about configuring the $em itself, because I want to concentrate on how the code that uses it actually looks like, but after you have decided to give it a try, you can look here for information on configuring it.

// Loading an Article with a specified id
$article = $em->find('MyProject\\Model\\Article', 3);

// Changing its title
$article->setTitle('A feast for crows');

// Changing the name of the Category that the article belongs to
$article->getCategory()->setName('Fantasy');

// Load a different Article, and mark it for deletion
$toDelete = $em->find('MyProject\\Model\\Article', 5);
$em->remove($toDelete);

/* Inserting a new Article, under a new Category.
   Notice that we need to call $em->persist() on newly created objects, BUT ONLY on newly created ones.
   If you load something with $em->find() it is automatically tracked.
*/

$newArticle = new MyProject\Model\Article();
$newCategory = new MyProject\Model\Category();
$newArticle->setTitle('Hunger Games');
$newCategory->setName('Thriller');
$newCategory->addArticle($newArticle);

$em->persist($newCategory);
$em->persist($newArticle);

/* Save the changes to the database. As long as you don't call this method, nothing will be written
   to the database, which is usually favourable, because if there is an exception, or fatal error,
   you won't have partial UPDATEs/DELETEs. Either everything goes in, or nothing. In this case this
   will issue 2 UPDATE, 1 DELETE, and 2 INSERT statements. You only call this once, at the end
   of the request.
*/
$em->flush();

What I see, is that you have replaced SQL with "mappings"

Indeed, that is true, but there is more to it, than that.

  • Writing mapping information, is nowhere near as error prone as writing SQL for all of your objects.
  • If you want to understand the relations between entities, instead of going through the code, you can read the mapping information.
  • Your SQL statements will be batched together into a single transaction. You don't have to worry about the order of inserting, or removing entities, the foreign keys will be taken care of.
  • You have decoupled your models from the underlying ORM. They don't depend on anything, you are free to design them in whatever way you want.
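The foreign key ordering mentioned in the list is essentially a dependency sort: tables that are referenced must be written before the tables that reference them. Here is a minimal sketch of that idea, not Doctrine's real commit order calculation; insertOrder() and the $dependencies array are hypothetical.

```php
<?php
// Hypothetical helper: orders tables so that referenced tables come first.
// $dependencies maps a table to the tables its foreign keys point at.
function insertOrder(array $dependencies)
{
    $ordered = array();
    $visiting = array();

    $visit = function ($table) use (&$visit, &$ordered, &$visiting, $dependencies) {
        if (in_array($table, $ordered, true)) {
            return; // already placed
        }
        if (isset($visiting[$table])) {
            throw new RuntimeException("Circular foreign keys at $table");
        }
        $visiting[$table] = true;
        $targets = isset($dependencies[$table]) ? $dependencies[$table] : array();
        foreach ($targets as $target) {
            $visit($target); // place the referenced table first
        }
        unset($visiting[$table]);
        $ordered[] = $table;
    };

    foreach (array_keys($dependencies) as $table) {
        $visit($table);
    }

    return $ordered;
}

$inserts = insertOrder(array(
    'article'  => array('category'), // article.category_id references category.id
    'category' => array(),
));
$deletes = array_reverse($inserts);  // removals go in the opposite order
```

For our two tables this puts category before article for INSERTs, and article before category for DELETEs.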

You can generate skeleton classes from your mapping information, with this command:

php doctrine.php orm:generate-entities

This will generate the exact same Article, and Category class as what I've written above. You also have the option of generating your whole database from the mappings:

php doctrine.php orm:schema-tool:create

This will create all your database tables, with foreign keys set up. Yes, this will generate actual SQL statements. It can even diff your database, against the mapping. If you add some new fields to your Article, you can run:

php doctrine.php orm:schema-tool:update --dump-sql

And it will show you what SQL commands it would execute to bring your DB in sync with your mappings. This is a godsend when you have to deploy to a different server (e.g. production), because instead of keeping track of SQL diffs, you can just use the command above to do everything for you. (You can replace --dump-sql with --force to execute the commands instead of displaying them.)

You can also subscribe to lifecycle events on your models. For example, you can have a piece of code executed each time one of your Articles gets saved/loaded. This means you could write a generalized "seo-friendly-title-creator" for your Articles, that generates an SEO-friendly link for the Article that just got saved.
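A rough sketch of what such a title creator could look like follows. The slugify() function, the SluggableArticle class and the refreshSlug() method are names I made up for this example. The assumption is that you would register refreshSlug as a prePersist (and probably preUpdate) lifecycle callback in the mapping, so Doctrine calls it for you; here it is called by hand so that the snippet runs on its own.

```php
<?php
// Hypothetical helper: turns a title into a URL-friendly slug.
function slugify($title)
{
    $slug = strtolower(trim($title));
    // Collapse every run of characters other than a-z and 0-9 into one dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

class SluggableArticle
{
    private $title;
    private $slug;

    public function setTitle($title)
    {
        $this->title = $title;
        return $this;
    }

    public function getSlug()
    {
        return $this->slug;
    }

    // Meant to be wired up as a lifecycle callback (assumption, see above).
    public function refreshSlug()
    {
        $this->slug = slugify($this->title);
    }
}

$article = new SluggableArticle();
$article->setTitle('  A Feast for Crows!  ');
$article->refreshSlug(); // the ORM would call this once the callback is mapped
```

Note that this simple version only handles ASCII titles; accented characters would be replaced by dashes.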

There are many more features, but this is what Doctrine2, and ORMs in general, are about. They help you manage your objects by sparing you from writing repetitive SQL. I know that this article wasn't much help in actually getting you started with it, and that's alright, because it was meant to get your attention, not teach the specifics. This great library exists, and it can solve many of your problems. Use it. I will write more about it in the future, because I really think that it is one of the best things in the PHP world at the moment (with Packagist and Composer following closely).

Khayrattee Wasseem wrote
Hi Norbert,

you have written a really nice article in explaining the "what and why" to use Doctrine2. I agree that a data mapper is far better than active record.

One thing, the time to structure the yaml or xml for all the tables that exists and setting up all those class schemas/blue-print, don't you think it might take up about same time or even more as compared to writing direct SQL statements with PDO? Would appreciate your feedback..

//Wasseem

2012-02-27 10:29:42

Norbert Kéri wrote
In my experience, no, development time is greatly reduced by using Doctrine. Setting up the classes must be done either way, so you won't save any time on that. If you use handwritten SQL, you need to write at least 4 statements, one for each of load/insert/update/delete. Each time you add a new field, or remove an old one, you have to modify two of these (the insert and the update one), and also the select one, unless you use SELECT * in your statements. Also, depending on how you handle relations (probably INNER JOINs), you will have to update the other SELECT queries that join this entity in. After that, you have to update your DB schema, and take care to track the DB change somehow, because you will need to deploy it to production, and must share it with everyone else who works on the project.

With Doctrine, you update the mappings, it will generate the SQL you need to update your tables, and you can already forget about that, because everyone else will just have to run the schema-update command to have their DB updated. You don't have to worry about the different types of queries, because they are generated by Doctrine from the mapping information. As an added bonus, you get an orm:info command, which will tell you if your mappings are correct (syntactically), and if not, tell you which one has a problem. Can you validate all your handwritten SQL preemptively? I don't think so, that will be detected at runtime.

2012-02-27 14:10:27

Jacob Greenleaf wrote
There is a small grammatical error, where the article reads "[..] you won’t be able to identify the changes, when you the time comes to save the entities," it should read "[...] you won't be able to identify the changes, when the time comes to save the entities."

2012-02-27 14:15:41

Norbert Kéri wrote
Fixed, thanks.

2012-02-27 14:34:18

Jeebus wrote
One day the dark ages of Object will end and everyone will finally realize how stupid OOP was.

2013-06-19 04:05:17
