Unless you’ve been hiding somewhere, you’ll already be familiar with HTML5. The latest version of the specification adds support for a host of technologies that allow designers and developers to create more intuitive and user-focused websites.
It wasn’t until the arrival of the (almost) ubiquitous Flash Player plug-in’s support for audio and video streams that it became possible to deliver rich content reliably over the Internet. The arrival of services such as YouTube, Vimeo and now SoundCloud has spearheaded the adoption of, and demand for, rich media, democratising the content-creation and sharing process and opening up a new world of creativity that didn’t exist ten years ago.
But the reliance on Flash as the delivery mechanism for video and audio wasn’t all roses; the plug-in has historically had stability and security issues, not to mention the strain it puts on computer hardware – especially pertinent when dealing with lower-powered, mobile devices such as smartphones.
When Apple famously announced that it wouldn’t ever support the Flash platform on its iOS-powered devices, an existing movement towards a more open, standards-based approach to rich media delivery was cemented: native browser-rendered rich media content was almost guaranteed to happen. The W3C were already working on a specification to enable this, and the earliest iterations of iPhone OS leveraged this specification to deliver video to the device natively – taking advantage of hardware-decoding to present video smoothly without plug-ins.
Fast-forward to the present day, and all the major browser vendors now support native video and audio using HTML5. YouTube, Vimeo and SoundCloud have all adopted HTML5 as a delivery mechanism, proving that the web is ready for HTML5-delivered media. In the following pages we’ll explain everything you need to know to make the most of HTML5’s media capabilities, and start using it today – all while having the fallback of Flash available should you ever need it!
What HTML5 means to video and audio
Practically every web-delivered video you’ll have seen prior to HTML5 will have been delivered using a third-party plug-in. In the main, this has likely meant Adobe Flash, although you may have also come across Silverlight videos, QuickTime or RealPlayer content along the way.
All of these plug-ins share one thing in common: they sit on top of the web browser, providing additional functionality not found natively in the browser itself. They act as an intermediary, converting video and audio into something that can be rendered on your screen or via your speakers.
Plug-ins can cause headaches
This system did work, but the lack of a common, core standard for encoding and embedding media in a web page meant that different websites used different plug-ins, media-encoding and delivery methods. The result was that you might have had up to half a dozen different plug-ins installed just to allow you access to this rich content. Furthermore, if you arrived at a site that used a system you didn’t already have installed, you’d have to go through the process of installing a new plug-in before you could view the video or listen to the audio.
The HTML5 approach
HTML5 addresses all of these problems by defining a standard way of embedding both video and audio in a webpage. Instead of using plug-ins, you can now embed media directly using a simple set of HTML tags to define what kind of media you’d like to add, along with a link (or set of links) to the media itself.
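To make the "simple set of HTML tags" concrete, here is a minimal sketch that assembles the markup for a video with a set of links to the media. The helper names (`buildVideoMarkup`, `escapeAttr`, `EncodedSource`) and the file paths are hypothetical, chosen purely for illustration; only the `<video>` and `<source>` elements and their `src`, `type` and `controls` attributes come from HTML5 itself.

```typescript
// Hypothetical description of one encoded copy of a clip.
interface EncodedSource {
  url: string;  // where the file lives (placeholder paths are used below)
  type: string; // MIME type, so the browser can skip files it cannot play
}

// Escape characters that would otherwise break out of an HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a <video> element with one <source> child per encoded file.
// Browsers that understand <video> ignore the fallback content;
// older browsers ignore the unknown tags and show the fallback instead.
function buildVideoMarkup(sources: EncodedSource[], fallbackHtml: string): string {
  const sourceTags = sources
    .map((s) => `  <source src="${escapeAttr(s.url)}" type="${escapeAttr(s.type)}">`)
    .join("\n");
  return `<video controls>\n${sourceTags}\n  ${fallbackHtml}\n</video>`;
}

const markup = buildVideoMarkup(
  [
    { url: "media/clip.mp4", type: "video/mp4" },
    { url: "media/clip.webm", type: "video/webm" },
  ],
  "<p>Your browser does not support HTML5 video.</p>"
);
console.log(markup);
```

The browser works through the `<source>` children in order and plays the first one it can decode, which is why listing more than one file matters.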
Unfortunately it’s not as simple as swapping HTML in place of Flash or Silverlight; as with all the other exciting features of HTML5, we have to temper our enthusiasm with the knowledge that a significant proportion of users are still using browsers that don’t support modern web standards. The most obvious and oft-cited example is Internet Explorer, but it’s certainly not the only culprit!
HTML first, plug-ins for fallback
While we’re waiting for many of these older browsers to be upgraded, we still need to provide a fallback or polyfill to allow access to our media even for out-of-date software. As such, we can’t yet free ourselves of plug-ins entirely – but we can take an HTML-first approach. This means we’ll render with HTML first and keep plug-ins as the failsafe, rather than rendering with plug-ins first and falling back to HTML.
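One way to picture the HTML-first decision is as a small feature test: try native playback, and only reach for the plug-in when the browser can't help. The sketch below assumes a hypothetical `chooseRenderer` helper and a pared-down `VideoProbe` shape; on a real page you would pass in `document.createElement("video")`, whose `canPlayType` method returns an empty string for formats it cannot play.

```typescript
// Minimal shape of the element we probe. Browsers without HTML5 video
// create an unknown element that has no canPlayType method at all.
interface VideoProbe {
  canPlayType?: (type: string) => string;
}

type Renderer = "html5" | "plugin";

// Decide how to render: native HTML5 first, plug-in only as the failsafe.
function chooseRenderer(probe: VideoProbe, types: string[]): Renderer {
  if (typeof probe.canPlayType !== "function") {
    return "plugin"; // no native video support whatsoever
  }
  for (const type of types) {
    // "" means "cannot play"; "maybe" and "probably" both justify trying HTML5
    if (probe.canPlayType(type) !== "") {
      return "html5";
    }
  }
  return "plugin"; // native video exists, but none of our formats will play
}

// Stand-in for an out-of-date browser, for illustration only.
console.log(chooseRenderer({}, ["video/mp4", "video/webm"]));
```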
Formats and encoding
Sadly, although you may now have bought into the utopian world of HTML5 video and audio, the reality of disparate formats and encoding will bring you back down to earth. While there’s now a single, common standard that defines how you embed video and audio in a webpage, there are a number of different formats and codecs that can be used to package up your video and audio. Regrettably there isn’t one particular combination of container format and codec that will work across every browser (although H.264 now comes very close with Firefox’s announcement that they would be supporting the codec).
What is a container?
When you’re looking at video formats, you need to understand that there are two different elements that make up video files. The first is the container. Think of the container as being like a box into which the different elements that make up the video are placed. These elements include the video itself, the audio soundtrack, plus metadata, poster artwork and so on.
You will already be familiar with several different container formats: the most common include AVI, MP4, FLV and WebM. The important thing to understand is that the container itself doesn’t determine the encoding of the video, which you’ll see is incredibly important.
The problem with encoding format compatibility stems from patents and licensing, which place restrictions on some formats. There are three principal encoding types relevant to the web: H.264, Theora and VP8. Each of these three tends to be delivered inside a particular container type (MP4 for H.264, OGG for Theora and WebM for VP8), although it’s not always the case.
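The split between container and codec shows up directly in the type strings used on the web: the MIME type names the container, and an optional `codecs` parameter names what is inside it. The sketch below is illustrative only. The `Encoding` shape and `toTypeString` helper are hypothetical, and the audio codec paired with each video codec is an assumption based on common practice rather than anything stated above.

```typescript
// Hypothetical record separating the "box" from what is packed inside it.
interface Encoding {
  container: string;  // MIME type of the container
  videoCodec: string; // codec identifier for the picture
  audioCodec: string; // codec identifier for the soundtrack (assumed pairing)
}

// Combine container and codecs into a single type string.
function toTypeString(e: Encoding): string {
  return `${e.container}; codecs="${e.videoCodec}, ${e.audioCodec}"`;
}

const pairings: Encoding[] = [
  // H.264 baseline profile with AAC audio in an MP4 container
  { container: "video/mp4", videoCodec: "avc1.42E01E", audioCodec: "mp4a.40.2" },
  // Theora with Vorbis audio in an Ogg container
  { container: "video/ogg", videoCodec: "theora", audioCodec: "vorbis" },
  // VP8 with Vorbis audio in a WebM container
  { container: "video/webm", videoCodec: "vp8", audioCodec: "vorbis" },
];

for (const p of pairings) {
  console.log(toTypeString(p));
}
```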
H.264 compresses video with a lossy algorithm, with the aim of providing a high-quality, low-bandwidth file. H.264 doesn’t have a single algorithm, but defines different profiles that offer different levels of compression and features, traded off against file size and complexity of decoding. This allows low-power mobile devices to access “baseline” encoded content, while offering a higher quality option for desktops that can handle decoding the high profile.
Many mobile devices, Blu-ray players and set-top boxes have dedicated H.264 chips that decode the video stream in hardware – but as H.264 is subject to patents, its adoption hasn’t been universal. iOS, Android, Safari, Chrome and Internet Explorer all offer out-the-box support for H.264. In February, Firefox announced it would be supporting H.264 natively too.
Theora is a patent-free codec (it was originally based on a patented codec, which has since been licensed royalty-free). It is used natively on Linux systems and Firefox supports it out-the-box.
VP8 comes from the same team that developed the original codec upon which Theora was based, and was open-sourced by Google following its acquisition of the company On2. It offers the same quality as H.264 with less decoding complexity, and is supported natively in Chrome, Android and Opera.
Assuming you’re not overly concerned with Opera on the desktop, you can encode your video into two formats to cover all bases: H.264/MP4 and VP8/WebM.
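With two encodings on the server, a script can ask the browser which one it prefers before loading anything. The following is a rough sketch under stated assumptions: `pickPlayableSource` and `CandidateSource` are hypothetical names, the file paths are placeholders, and the audio codecs in the type strings are assumed pairings. The `canPlayType` method itself is real and answers with an empty string, "maybe" or "probably".

```typescript
interface CandidateSource {
  url: string;  // placeholder path to the encoded file
  type: string; // container MIME type plus codecs parameter
}

// Return the first source the browser is confident about ("probably"),
// falling back to the first "maybe", or null if nothing will play.
function pickPlayableSource(
  canPlayType: (type: string) => string,
  candidates: CandidateSource[]
): CandidateSource | null {
  let firstMaybe: CandidateSource | null = null;
  for (const candidate of candidates) {
    const answer = canPlayType(candidate.type);
    if (answer === "probably") {
      return candidate;
    }
    if (answer === "maybe" && firstMaybe === null) {
      firstMaybe = candidate;
    }
  }
  return firstMaybe;
}

const candidates: CandidateSource[] = [
  { url: "media/clip.mp4", type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' },
  { url: "media/clip.webm", type: 'video/webm; codecs="vp8, vorbis"' },
];

// Stand-in for a browser that only decodes WebM. On a real page you would
// pass (type) => video.canPlayType(type) for an actual <video> element.
const webmOnly = (type: string): string =>
  type.indexOf("video/webm") === 0 ? "probably" : "";

const chosen = pickPlayableSource(webmOnly, candidates);
console.log(chosen === null ? "nothing playable" : chosen.url);
```

In plain markup the browser makes this choice for you by walking the `<source>` list; scripting it is only useful when you need to know the answer yourself, for example to decide whether to load a plug-in fallback.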
Just like video, audio has its own particular encoding formats and, as you might have come to expect by now, different browsers support different codecs. This is important to understand, whether you’re producing stand-alone audio or video with a soundtrack – both use the same set of codecs, and both suffer from the same browser limitations. There are three principal audio codecs that work on the web:
> MP3 (MPEG-1 Audio Layer 3)
> AAC (Advanced Audio Coding)
> Vorbis
MP3 is the most prevalent stand-alone format, and accounts for much of the way music is transmitted and sold on the web – although it is subject to patents. AAC is also widely used, being the format Apple chose for distribution of music through iTunes. However, AAC is also subject to patents. Predictably enough, the third option – Vorbis – doesn’t have any patents that apply to it, meaning it can be used freely by anyone. Browser support largely follows the same pattern as for the video codecs.
Targeting mobile devices
As we’ve seen, one of the key drivers behind the adoption of HTML5 video was the arrival of rich-web consumption devices such as smartphones and tablets. Broadly speaking, if you want to target iPhones, Android phones and tablets (including the iPad), the safest option is to encode using H.264. The vast majority of phones and tablets on the market today have hardware decoding support for H.264, allowing the device to hand decoding over to its dedicated chip without adversely affecting its battery life or performance.
Designing custom interfaces
The trend seems to be that playing video and audio on small-screen devices such as the iPhone is rather interruptive: the user effectively has to leave the browser while viewing or listening, and the standard OS controls take over from your page.
On some Android phones – and across larger-screened devices such as the wide range of tablets available – your media content plays in position on the page, with no interruption taking place at all.
With the API exposed as part of the HTML5 specification, you can construct a custom user interface and interactions, including your own skin. This means you can match it to the look and feel of your page, and it effectively replaces the default browser controls with whatever you choose.
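As a taste of what a custom interface involves, here is a minimal sketch of two building blocks: a play/pause toggle and a time label. The `PlayableMedia` shape lists only the media element members the sketch relies on (`paused`, `play()` and `pause()`, which are all part of the HTML5 media API); the function names and button labels are hypothetical.

```typescript
// The subset of the media element API this sketch depends on.
// A real <video> or <audio> element satisfies this shape.
interface PlayableMedia {
  paused: boolean;
  play(): void;
  pause(): void;
}

// Toggle playback and return the label the custom button should now show.
function togglePlayback(media: PlayableMedia): string {
  if (media.paused) {
    media.play();
    return "Pause";
  }
  media.pause();
  return "Play";
}

// Turn a position in seconds (such as currentTime) into m:ss for display.
// duration is NaN until the metadata has loaded, so guard against that.
function formatTime(totalSeconds: number): string {
  if (!isFinite(totalSeconds) || totalSeconds < 0) {
    return "0:00";
  }
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

console.log(formatTime(75)); // 1:15
```

On a page you would typically call `togglePlayback` from your own button's click handler and refresh the time label whenever the element fires its `timeupdate` event, omitting the `controls` attribute so the browser's default interface stays hidden.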