Streamalot: Streaming Media Tips, Tricks, and Hints
This Month's Top Question:
How do I choose a video capture card?
Choosing a Video Capture Card
By Tim Siglin
(This article first appeared in the 2007 Streaming Media Industry)
Just when it seems safe to make a video capture card purchase, something new seems to come along. What’s a content creator to do, and what criteria should one assess to make the best purchase? And is a capture card even needed now that computers are more powerful? This article will provide several options to consider.
To properly assess content capture, we’re going to break down the field into three areas: live capture and transmission; asynchronous capture and delivery (live capture and encoding for later playback); and edited on-demand content—in other words, content that may be created in pieces, edited together, and then output for consumption.
Live Capture and Transmission
This area is the most misunderstood and also the one with the least amount of innovation, at least in recent years. Perhaps the lack of innovation is due to financial difficulties facing particular industry leaders or the simple fact that we’re in a transition from standard-definition content to high-definition content as the primary medium. One thing is certain, though: Live streaming has never been easier or more cost-effective to do.
To properly assess live streaming capture cards, one must consider three areas: the types of inputs, both audio and video, that will be captured; the formats and bit rates that must be generated; and the decision regarding whether all content will be served from a single server or mirrored on multiple servers. Let’s look at each of these three areas.
Inputs
GIGO (Garbage In, Garbage Out). Even with a high-quality camera as the acquisition device, too many content creators settle for composite inputs and substandard cabling in their streaming media projects. The rule of thumb for analog capture is that every 3 decibels of noise (snow, artifacts, etc.) doubles the bit rate required to achieve equivalent quality. The converse is also true: eliminate noise or artifacting in the acquisition device, and the bit rate will fall while the compressed content quality remains high.
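The 3-decibel rule of thumb above can be expressed as a quick calculation. The sketch below only illustrates that rule; the function name and the 400Kbps baseline are assumptions chosen for the example, not measurements from any particular codec.

```python
def bitrate_for_added_noise(clean_kbps: float, noise_db: float) -> float:
    """Estimate the bit rate needed to hold quality as analog noise rises.

    Applies the rule of thumb that every 3 dB of added noise doubles the
    required bit rate. This is a rough guide, not a codec model.
    """
    return clean_kbps * 2 ** (noise_db / 3.0)


# Hypothetical baseline: a clean source that looks acceptable at 400 Kbps.
for noise_db in (0, 3, 6):
    print(noise_db, "dB ->", bitrate_for_added_noise(400, noise_db), "Kbps")
```

Under this rule, 6 decibels of added noise would call for roughly four times the clean-source bit rate, which is why cleaning up the signal path usually pays off before any encoder setting does.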
Consider using a camera with a 3-CCD (or newer CMOS) capture chipset, and then use either S-video or component video outputs to connect the camera to the streaming media card’s input. Better yet, with the advent of digital video cameras—especially those that use USB 2 or FireWire/i.Link connectors—consider connecting directly to the capture card or the computer if it contains internal FireWire connectors in order to maintain a direct digital path from the camera to the streaming software. On really high-end cameras, the option to use SDI (serial digital interface) should also be considered, as this is offered on several streaming capture cards. A note of clarification, though: with the exception of very new cameras that use the H.264 codec, any content captured with a digital camera will still be re-encoded with a streaming codec. The reason for this re-encoding is strictly a matter of bandwidth: most digital video cameras capture at 25 megabits per second, or almost 50 times the average sustained bandwidth of a consumer’s cable modem or DSL.
On the audio side, if possible, don’t use the on-camera microphone for audio capture. Doing so adds extraneous noise that is very difficult for the audio codec to encode cleanly and extremely distracting to consumers when they listen through low-end desktop or laptop speakers. Instead, place an external microphone as close as possible to the subject of your video, and use wireless microphones if necessary.
Inputs on streaming audio boards and analog-to-USB audio capture devices typically fall into two categories: balanced and unbalanced. Unbalanced connectors are typically RCA connectors or the noise-prone 1/8” (3.5mm) stereo jack—the same types of connectors used on VCRs or headphones, respectively. Balanced connections, on the other hand, are typically 3-pin XLR connectors, the type of connector used on a professional microphone. The XLR connector is normally attached to a streaming media capture card by way of a breakout cable, as the connector itself is too large to fit on a standard PCI card. Some cards will also generate enough power via the XLR connector to power an external microphone through a process known as phantom powering.
Formats and Bit Rates
When the proper connectors have been determined, the next step in live capture is to determine the codec and bit rate of the content that will be streamed. The use of multiple codecs and bit rates used to require a 1:1 ratio of inputs to capture cards, along with a significant amount of external gear. Fortunately, companies such as ViewCast have created software solutions like SimulStream, which allows multiple bit rates or even different codecs to be simultaneously captured and streamed from a single video and audio input. At the time of this writing, the most popular live codecs are Windows Media, QuickTime, and Real, but the recent advent of a live streaming SDK for the On2 VP6 codec (better known as Flash Video 8) will probably propel that format to one of the top three codecs required for live streaming, especially given the fact that the installed base of Flash players far exceeds the number of installed Real Players.
Delivery
Once the inputs, codecs, and bit rates have been determined, the last step in the live streaming scenario is to choose whether to deliver directly from the streaming capture device or to offload delivery to other servers that have more robust streaming and bandwidth capacity. This decision is typically based on three key criteria: number of simultaneous users, number of chosen codecs, and processor power/bandwidth available at the point of capture.
Let’s use a scenario to illustrate this decision. John has been asked to shoot a wedding at a local house of worship. He’s chosen to use an all-in-one capture and streaming device like NewTek’s TriCaster PRO, as he knows that he will be using three cameras and graphics (such as the names of the different members of the wedding party) and will need to stream the show live to four relatives who were unable to attend. John checks with the church and is told that they have a cable modem with a 2 megabit per second upload speed, and he confirms that each potential guest has broadband on which they’ll watch the ceremony.
Based on that information, John calculates that he can safely send a 400 kilobit per second stream to 4 viewers, if each one is watching the same video format (in the instance of the TriCaster PRO, it would be limited to Windows Media). The calculation is 400Kbps x 4 = 1600Kbps, or 1.6 megabits per second. This leaves roughly 400Kbps of the 2 megabit per second uplink as headroom to compensate for a drop in upload speed, an occasional church worker sending out small emails, or any other programs that might be running on one of the church’s machines. If John finds out that one of the guests requires Real or QuickTime to view the wedding, though, he will have to use one of the four potential streams for that format and may need to hire space on a streaming server for a block of time during the wedding to accommodate the use of a different format or additional viewers.
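John's arithmetic can be sketched as a small budget check. The function names are hypothetical, the 2 megabit per second uplink is taken as 2000Kbps, and the 20 percent default headroom is an assumption chosen to match the spare capacity in the scenario rather than an industry figure. The sketch also assumes direct delivery, where the capture machine sends one copy of the stream per viewer.

```python
def stream_budget(upload_kbps: int, stream_kbps: int, viewers: int) -> dict:
    """Total the outgoing streams and report what is left of the uplink."""
    total_kbps = stream_kbps * viewers
    return {
        "total_kbps": total_kbps,
        "spare_kbps": upload_kbps - total_kbps,
        "fits": total_kbps <= upload_kbps,
    }


def max_viewers(upload_kbps: int, stream_kbps: int, headroom_percent: int = 20) -> int:
    """Largest viewer count that still leaves the requested headroom."""
    usable_kbps = upload_kbps * (100 - headroom_percent) // 100
    return usable_kbps // stream_kbps


# John's numbers: 2 Mbps uplink (taken here as 2000 Kbps), 400 Kbps streams.
print(stream_budget(2000, 400, 4))
print(max_viewers(2000, 400))
```

With these inputs the four streams total 1600Kbps and leave 400Kbps spare. A fifth direct viewer would consume all of the remaining headroom, which is the point at which offloading delivery to a hosted streaming server starts to make sense.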
Other robust live capture and encoding cards include Digital Rapids’ StreamZ line and ViewCast’s Niagara GoStream.
Asynchronous Capture and Delivery
Live capture for later delivery is quite similar to live capture, but with the marked difference that content can be captured at a higher overall bit rate and then compressed after the fact to a lower bit rate. In fact, if one is using a digital video camera to capture the event, such as one that uses MiniDV or the newer high-definition HDV format, content can be captured directly in the camera as video data and then transferred directly to a computer via USB or FireWire to be further compressed for web delivery. This process is known as transcoding if the content will be converted from one format to another, or transrating if only the bit rate will be changed while the codec remains the same.
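The distinction between transcoding and transrating comes down to which property of the file changes. A minimal sketch, with a hypothetical function name and illustrative codec labels and bit rates:

```python
def conversion_type(src_codec: str, dst_codec: str, src_kbps: int, dst_kbps: int) -> str:
    """Name the conversion using the definitions given in the article."""
    if src_codec.lower() != dst_codec.lower():
        return "transcoding"      # the format itself changes
    if src_kbps != dst_kbps:
        return "transrating"      # same codec, different bit rate
    return "no conversion needed"


# Illustrative cases: a 25 Mbps DV capture converted for the web, and an
# existing Windows Media file reduced to a lower bit rate.
print(conversion_type("DV", "Windows Media", 25000, 400))
print(conversion_type("Windows Media", "Windows Media", 1000, 300))
```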
For those without access to a digital camera, it is highly recommended to use a capture card directly, rather than recording to analog tape and then transferring to the computer. The reasoning goes back to initial comments in this article: analog tape introduces a significant amount of video noise (and some audio noise) in the recording process, which is eliminated if the computer is used to capture a first-generation digital file.
For those who might also provide nonlinear video editing services, the great news is that a video capture card for your NLE can serve double duty as your streaming video capture card. In fact, many cards on the market for NLE work have the ability to stream live, using computer drivers that come with the card but aren’t utilized by the NLE system. Companies such as Aja make robust breakout boxes that have all the connection types noted above (S-video, component, SDI, balanced audio, unbalanced audio) and can be mounted several yards away from a laptop or small desktop if space is at a premium when it comes to capturing for transcoding.
The major difference between this type of capture and edited on-demand content, which we’ll cover later, is that it’s assumed this capture is fully self-contained and requires no additional editing or graphics. In the scenario above, where John was asked to shoot a wedding video at his local house of worship, this type of capture might be used if the church has no or only limited DSL or cable Internet service, or if too many guests wish to view the ceremony and there is not enough time to plan for additional streaming capacity. In this instance, as the remaining guests would like to see the video immediately after the ceremony ends, John would set the TriCaster PRO to capture both an archival (25Mbps) version and a streaming version (300-400Kbps) and then upload the smaller streaming file as soon as he is finished shooting, saving the archival version for inclusion on a DVD or for future editing.
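To see why John uploads only the streaming version, it helps to estimate both file sizes. The sketch below assumes a hypothetical 30-minute ceremony and the 2 megabit per second uplink from the earlier scenario (taken as 2000Kbps); it counts only the stated bit rates, ignores container overhead, and uses decimal megabytes.

```python
def file_size_megabytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate file size: bits per second times seconds, divided by 8."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def upload_seconds(size_megabytes: float, upload_kbps: float) -> float:
    """Time to push a file through an uplink at its full rated speed."""
    return size_megabytes * 1_000_000 * 8 / (upload_kbps * 1000)


ceremony_s = 30 * 60  # hypothetical 30-minute ceremony

archival_mb = file_size_megabytes(25_000, ceremony_s)   # 25 Mbps archival copy
streaming_mb = file_size_megabytes(400, ceremony_s)     # 400 Kbps streaming copy

print(f"archival:  {archival_mb:.0f} MB")
print(f"streaming: {streaming_mb:.0f} MB, "
      f"upload about {upload_seconds(streaming_mb, 2000) / 60:.0f} minutes")
```

Under these assumptions the streaming file is about 90 MB and uploads in roughly 6 minutes, while the archival copy runs to more than 5 GB and is better carried home on disk.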
Edited On-Demand Content
This type of streaming content probably covers the majority of content that is streamed. From movie trailers to some YouTube videos, a significant amount of web video is still pre-produced, piecing together different clips of video, adding music and graphics and perhaps a bit of 3D work and then rendering the content into a single digital media file.
The process for dealing with this type of content is quite a bit different from the previous two examples, primarily due to the number of different still-image, graphic, audio, and video formats that must be combined into the typical nonlinear editing timeline. Fortunately, products such as Adobe Premiere Pro, Sony Vegas, Apple Final Cut Pro, Grass Valley EDIUS, Avid Liquid, and higher-end offerings from Avid and others provide software-only or hardware-assisted editing of myriad content types and formats.
For those who have pre-recorded analog video content captured on tape, a nonlinear video editing capture card will be required, as noted in the section above. Several companies offer low-cost S-video-to-USB converters that aren’t powerful enough to allow for direct encoding to streaming formats but are more than adequate for raw capture to add to the nonlinear editing system’s timeline.
Once all content is edited and ready for output, the majority of the nonlinear editing systems are adequate for output to high-end formats such as DVD MPEG-2 content as well as streaming formats. Apple Compressor, an add-on to Final Cut Pro software, as well as Adobe Media Encoder, which comes bundled with the Adobe Creative Suite, provide dozens of formats and hundreds of settings for customized content delivery formats.
For those who might need an additional tool to do the heavy lifting of converting files from one format to many formats, or to encode complex scenes that are causing the output file to choke on high-bandwidth data spikes, several other tools do the trick, including Sorenson Squeeze, Grass Valley ProCoder, and Autodesk Cleaner. All of these offer three features that make life easier for the content creator: profiles that can be saved and applied to multiple video files; batch processing, which allows tens or hundreds of files to be transcoded with no need to manually start encoding each file; and automated watch folders, which automatically begin to apply encoding profiles to any file that appears in a particular folder. Additional tools from Digital Rapids (noted above) and Inlet perform similar functions to Cleaner and can tweak particular parts of a file rather than completely re-encoding the entire file, with the added benefit of rapid transcoding thanks to hardware accelerators sold by both companies.
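The watch-folder idea is simple enough to sketch in a few lines. This is a generic illustration of the concept, not how Squeeze, ProCoder, or Cleaner implement it: the function names and profile labels are hypothetical, and the `encode` argument is a placeholder for whatever encoder is actually in use.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Set


def scan_watch_folder(
    folder: Path,
    profiles: Sequence[str],
    encode: Callable[[Path, str], None],
    seen: Set[str],
) -> List[str]:
    """Run one polling pass over a watch folder.

    Every file that has not been seen before is handed to `encode` once per
    profile, then remembered so later passes skip it.
    """
    newly_processed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        for profile in profiles:
            encode(path, profile)  # placeholder for the real encoder call
        seen.add(path.name)
        newly_processed.append(path.name)
    return newly_processed


def print_job(path: Path, profile: str) -> None:
    """Stand-in encoder that only reports what it would do."""
    print(f"would encode {path.name} with profile {profile}")
```

A caller would invoke `scan_watch_folder` repeatedly, pausing a few seconds between passes and keeping the same `seen` set. A production watch folder would also need to wait until a file has finished copying before encoding it, which this sketch does not attempt.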
Conclusion
The industry continues its shift in two directions. On the standard-definition front, the move is away from add-on video cards and toward software-only solutions that use internal FireWire or USB ports or small outboard capture devices and a powerful central processing unit (CPU) to perform streaming encoding and delivery chores. While these software-based solutions perform admirably in many circumstances, the pre-processing of video and audio signals provided on capture cards made by NewTek, Digital Rapids, Inlet, ViewCast, and others will ensure a market for those products for at least the next year.
On the high-definition front, software-only solutions are adequate for encoding on-demand content, but will not suffice for capturing high-definition content live and simultaneously streaming it. Even complex hardware solutions are just approaching the ability to transcode content into other high-definition formats or down to standard-definition formats, a trend that will continue throughout 2007 as a significant amount of content begins to be captured in high-definition formats. Software-based high-definition capture is possible for previously compressed video formats such as HDV, which uses the same 25Mbps capture rate as its standard-definition MiniDV predecessor (or, for 720-line acquisition, as low as 19Mbps), but this content must be cached first on the capture computer and then transcoded, typically over a period of 2 to 3 times the length of the initial video and at 90-95% CPU load, before being available at bit rates low enough to be streamed on the Internet.