What can an extension do?

Or, what advantages are there to writing an extension? An extension is basically a way to modify the browser's behavior without diving into the browser's code. You can add your own buttons, toolbars, and menus; interact with the browser's features, like querying open tabs/windows, bookmarks, and history; modify the DOM of any page being loaded; or make cross-domain AJAX requests without the usual Access-Control-* headers. If you need any of these features, writing an extension is the way to go.

Obviously there are some disadvantages, and the biggest one is that you will have to learn the vendor-specific APIs for the browsers that you target. If you are familiar with the pain of writing cross-browser compatible HTML/CSS/JS, you will be fine, as these APIs are actually documented well. You will also have to be prepared to help users install and troubleshoot the extension, and problems are much harder to track down because there are no server logs to look at. We had a user who was kind enough to spend an hour on the phone while we helped him debug an issue, and it turned out he had Firefox 3. It hadn't even occurred to us to put version checks in place, because we thought all browsers were auto-updating and we wouldn't have to deal with that.
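For what it's worth, the kind of version check we were missing could look roughly like the sketch below. It only handles Firefox, the minimum version is a made-up value, and the function names are mine, so treat it as an illustration rather than what we actually shipped.

```javascript
// Minimum major Firefox version we are willing to support (hypothetical value).
const MIN_FIREFOX_MAJOR = 4;

// Extract the major Firefox version from a user-agent string,
// or return null if the string does not look like Firefox.
function firefoxMajorVersion(userAgent) {
  const match = /Firefox\/(\d+)/.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

// True when the browser is Firefox and older than the minimum we support.
function isUnsupportedFirefox(userAgent) {
  const major = firefoxMajorVersion(userAgent);
  return major !== null && major < MIN_FIREFOX_MAJOR;
}

// In the extension you would pass navigator.userAgent; a sample string is used here.
const sampleUA =
  'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19';
console.log(isUnsupportedFirefox(sampleUA)); // true
```

With something like this in place, the extension could show a "please update your browser" notice instead of failing silently.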

Browsers also provide an auto-update mechanism for extensions, so you can seamlessly update your extension, but this carries a risk of its own. You must specify a URL that the browser can poll at regular intervals for version information, and the browser updates the extension if a newer version is available. The specified auto-update URL must stay alive, even if you migrate to a new domain, unless you are willing to ask your users to download the extension again.
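Conceptually, the check the browser performs boils down to comparing the installed version with the one advertised at the update URL. The exact comparison rules are browser-specific, so the following is only a sketch for plain dotted numeric versions, with function names I made up:

```javascript
// Compare two dotted numeric version strings such as "1.9" and "1.10.3".
// Returns a negative number, zero, or a positive number, like a sort comparator.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    // Missing components count as 0, so "2.0" equals "2.0.0".
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// True when the advertised version is strictly newer than the installed one.
function updateAvailable(installed, advertised) {
  return compareVersions(advertised, installed) > 0;
}

console.log(updateAvailable('1.9', '1.10')); // true
console.log(updateAvailable('2.0', '2.0.0')); // false
```

Note that the components are compared as numbers, not as strings, which is why 1.10 counts as newer than 1.9.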

What skills are involved when writing an extension?

It's similar to writing a simple web page; any web developer worth their salt should be able to do it, really. The extension's UI is composed of HTML and CSS, and the logic behind it is driven by JavaScript. The JS is the exact same JS that you are familiar with, only with a few more APIs (functions and objects) that you can call. You will also need patience, and the ability to read documentation and experiment through trial and error. You don't need to write cross-browser compatible code (since you are already targeting a specific browser with the extension), but you will have to know which features your target browser supports.

Background scripts and content scripts

There is one concept that you have to grasp, and it will become clearer as you actually start writing code. An extension is split into a background script and any number of content scripts (possibly zero). The background script is executed only once during the lifetime of the browser, either when the browser starts or when the extension is installed. It is tied to the browser instance, so regardless of how many windows or tabs the browser has open, it is still executed only once. Content scripts, on the other hand, are executed in the context of a webpage, so they run on every page load, and each gets its own variable namespace. You can employ a blacklist/whitelist for content scripts if you want to modify only specific pages instead of every page that gets loaded into the browser.
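To make the blacklist/whitelist idea concrete, here is a minimal sketch of how such a filter could decide whether a content script should run on a URL. Real browsers have their own pattern syntax with its own rules; the wildcard handling and the function names below are my simplification, not any browser's API.

```javascript
// Turn a simple wildcard pattern such as "*://*.example.com/*" into a RegExp.
// Only "*" is special; every other character is matched literally.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Decide whether a content script should run on the given URL.
// A blacklist match always wins; an empty whitelist means "every page".
function shouldInject(url, whitelist, blacklist) {
  const matches = (patterns) => patterns.some((p) => patternToRegExp(p).test(url));
  if (matches(blacklist)) return false;
  return whitelist.length === 0 || matches(whitelist);
}

const whitelist = ['*://*.example.com/*'];
const blacklist = ['*://*.example.com/admin/*'];
console.log(shouldInject('https://www.example.com/blog/post', whitelist, blacklist)); // true
console.log(shouldInject('https://www.example.com/admin/users', whitelist, blacklist)); // false
console.log(shouldInject('https://other.org/', whitelist, blacklist)); // false
```

In a real extension you would normally declare the patterns in the extension's configuration and let the browser do the matching for you.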

Communication between background scripts, content scripts, and the web page

Content scripts cannot access a background script's variables, and vice versa, at least not directly, but there is a way to pass data between them. Content scripts also cannot interact with the variables in the page itself: e.g. if the page that your extension is loaded in provides a function called hello(), you will not be able to call hello() in your content script, because the two live in separate, isolated namespaces. Don't worry, there is a way to pass data between the content script and the page itself, through the HTML5 messaging API (the wiki page is very detailed, don't worry about it, I'll show an example as we go along), so you can implement a two-way messaging system between all three players (background script, content script, the loaded page).
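As a small taste of the page-to-content-script half of that, here is a sketch built on window.postMessage. The envelope format, the channel name, and the helper names are all my own invention for illustration; the only real API used is window.postMessage and the "message" event.

```javascript
// Tag that marks a message as ours; the name is a hypothetical placeholder.
const CHANNEL = 'my-extension';

// Wrap a payload in an envelope that says who sent it and what it is.
function wrap(from, type, payload) {
  return { channel: CHANNEL, from: from, type: type, payload: payload };
}

// Accept only envelopes on our channel that come from the expected sender.
// Any script on the page can call postMessage, so everything else is ignored.
function accept(data, expectedFrom) {
  return Boolean(data) && data.channel === CHANNEL && data.from === expectedFrom;
}

// Content-script side. The guard keeps this file loadable outside a browser.
if (typeof window !== 'undefined') {
  window.addEventListener('message', function (event) {
    // Ignore other frames, foreign messages, and our own outgoing replies.
    if (event.source !== window || !accept(event.data, 'page')) return;
    // Reply to the page; the target is our own window, hence "*".
    window.postMessage(wrap('content', 'reply', { received: event.data.type }), '*');
  });
}
```

The page would send `window.postMessage(wrap('page', 'hello', {}), '*')` and listen for envelopes where `from` is `'content'`. Tagging each message with its sender is what stops the content script from reacting to its own replies.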

There are various restrictions on what a background script and a content script can do. For example, a background script can listen to tab change events while a content script cannot (since a content script is tied to a single page), but a content script can modify a loaded page's DOM while a background script can't.

My use case

For my current project, we are developing an extension that allows the user to translate any word on the webpage they are visiting. If you have the extension installed and turned on, then any word that you double-click will be looked up, and its definitions and translation are shown in a simple dialog. You can try out a demo, if you want to (you don't have to install anything, it's a demo).

To put it another way: the extension adds a button to the browser's toolbar that has 2 states (enabled or disabled). When you visit a webpage with the button turned on, we wait for DOMReady and inject some JavaScript into the current page that listens for double-clicks. If a double-click happens, we send an AJAX request to our server, which sends back an HTML response with the translations and definitions, which we then embed into the current page.
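The double-click part of that flow can be sketched as follows. The endpoint URL, the query parameter names, and the helper names are placeholders I made up for this post, not our real server API, and the browser wiring only logs the URL instead of firing the request.

```javascript
// Hypothetical endpoint; the real server URL is not part of this post.
const LOOKUP_ENDPOINT = 'https://translate.example.com/lookup';

// Reduce the text selected by a double-click to a single word, or null
// when the selection is empty or spans several words.
function selectedWord(selectionText) {
  const text = (selectionText || '').trim();
  return text !== '' && !/\s/.test(text) ? text : null;
}

// Build the request URL for a word; searchParams takes care of the encoding.
function lookupUrl(word, targetLang) {
  const url = new URL(LOOKUP_ENDPOINT);
  url.searchParams.set('word', word);
  url.searchParams.set('to', targetLang);
  return url.toString();
}

// Browser-only wiring: on a double-click, work out which URL we would request.
if (typeof document !== 'undefined') {
  document.addEventListener('dblclick', function () {
    const word = selectedWord(String(window.getSelection()));
    if (word) console.log('would request', lookupUrl(word, 'en'));
  });
}

console.log(lookupUrl('über', 'en'));
// https://translate.example.com/lookup?word=%C3%BCber&to=en
```

The trim matters because some browsers include the trailing space in the selection when you double-click a word.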

We initially started with a simple bookmarklet (like Pinterest offers, scroll down a bit) that you could drag to your favorites bar, but that meant that on every page the user visits, they have to click the bookmarklet. We wanted it to load automatically on every webpage (as long as it's turned on), and for that we needed to write an extension.

In the coming posts, I'm going to walk through an example of how to develop an extension that is very similar to the above.