HDSLR Guide Chapter 16: Video Basics
The amount of equipment and electrical power required to match a powerful studio strobe with a continuous-output movie light is prodigious.
We’re helped in this regard by the exceptional sensitivity of cameras such as the 5D Mark II and 1D Mark IV, which have vanishingly low noise even at comparatively high ISO ratings and, therefore, high gain. Even the smaller-sensor 7D is acceptable up to a rating of at least several hundred, which compares favorably to current 35mm film stocks in the ISO 500 range.
The problem, then, is not so much one of absolute amount of light, which is good, because it frees the filmmaker up to consider the type and quality of light. Because DSLRs react badly to overexposure, lighting setups can be more complex than with film, with the need to achieve a more even field and flag off areas of overexposure on a case-by-case basis. One of the most useful items is a set of flags, perhaps something like the Matthews Road Rags kit, which contains both solid flags, to block light, and a set of nets, to diffuse and diminish it.
In many cases this will actually be more useful than a huge amount of lighting, especially if there's a need to shoot interviews of people outside in full sun, which would otherwise be harsh.
Of course, lighting a big dramatic scene or enormous conference hall remains a particular challenge regardless of the choice of imaging device. If in doubt, hire and consult with a good gaffer. For more information, please read this article about HDSLR video lighting.
While most stills photographers will be used to thinking of photographs as part of a sequence, as something that doesn’t necessarily stand alone, there are special concerns for moving-image work.
Here are some basic checkpoints:
Think sequence. While stills can tell a story, motion picture work does it much more explicitly. Consider what the shot will look like when cut directly into another, and consider shot-listing. Many shoots involve two people talking to one another, which will involve taking (at least) one shot of each of them—shots that will be very nearly back to back. Ensure the lighting looks consistent, even if it isn’t: watch any of the CSI shows for examples of lighting that looks good superficially, but when critically examined is logically inconsistent.
When shooting those two people, use similar lenses and be at a similar angle to them; large disparities in either of these can be jarring when cutting from one to another.
Before getting clever, master the standard techniques: an establishing wide shot, over-the-shoulder cutaways of the two talking protagonists, and inserts as required to show whatever else might be needed. The closer we are to the character looking into the lens, the more personal it will become.
Be aware of where the camera is in the vertical, not just the horizontal, plane. Looming over someone, then being loomed over by that person’s conversation partner, is a powerful way to suggest a seniority differential among individuals.
Camera movement during the take, as on a track and dolly, is great because it encodes information about the 3D layout of the scene in the parallax shift that can be seen when moving around. Look at cranes, track and dolly equipment and Steadicam. Mainstream cinema has an established visual language, which can be learned by examination: evaluate likes and dislikes by watching it.
This is an absolutely massive topic, and it can only be touched upon in this guide. For more information, consider reading some cinematography textbooks by Kris Malkiewicz or Blain Brown.
In technical terms, the greatest difference is simply that the cinematographer has a much reduced choice of shutter speeds. Motion picture shutter speeds are discussed in terms of shutter angle; that is, the open portion of the rotating circular shutter in a film camera. In a real film camera, the shutter typically turns once per frame while the film advances intermittently behind its closed portion, so a 180-degree shutter is open 50% of the time. Note that this applies at any frame rate, so a 24fps shot with a 180-degree shutter will have a shutter speed of 1/48th of a second.
Shooting at higher rates—referred to as overcranking from the historic film practice of hand-cranked cameras—will produce slow motion when played back at a more normal speed. In the case of a shot at 60fps, the shutter angle might still be 180 degrees, meaning that each frame then has an exposure of 1/120th of a second, but the apparent motion blur in the slowed-down shot is appropriate to the apparently-slowed movement.
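The relationship between shutter angle, frame rate and exposure time described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any camera's software; the function name `exposure_time` is made up for the example, and it assumes whole-number frame rates.

```python
from fractions import Fraction

def exposure_time(frame_rate, shutter_angle=180):
    """Exposure time in seconds for a given frame rate and shutter angle.

    The shutter is open for (angle / 360) of each frame interval,
    and each frame interval lasts 1 / frame_rate seconds.
    """
    return Fraction(shutter_angle, 360) / Fraction(frame_rate)

# The two cases discussed in the text, both with a 180-degree shutter.
for fps in (24, 60):
    print(f"{fps} fps at 180 degrees -> {exposure_time(fps)} s")
```

Using exact fractions keeps the results in the familiar "1/48" form rather than as decimals.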
Video DSLRs are fundamentally still cameras and aren’t generally rated in shutter-angle terms. Nevertheless, it’s useful to be able to perform a rough conversion with a bit of mental arithmetic, since changing the shutter angle, and therefore the exposure time, has a pronounced effect on the rendering of motion. Some cameras don’t actually provide a shutter speed that represents a 180-degree shutter at the frame rate being used. For instance, the closest thing to a 1/48th shutter speed for 24 fps work is often 1/50th. This small disparity won’t be visible, but larger errors (more than 20%, for instance) might be. It’s always a good idea to shoot tests with the intended settings.
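The rough conversion in the other direction, from a camera's shutter speed back to an equivalent shutter angle, might look like the sketch below. The list of available shutter speeds is hypothetical and only there for illustration; real cameras differ, and the function names are invented for this example.

```python
def equivalent_angle(frame_rate, shutter_denominator):
    """Shutter angle in degrees for an exposure of 1/shutter_denominator s."""
    return 360.0 * frame_rate / shutter_denominator

def closest_shutter(frame_rate, available, target_angle=180.0):
    """Pick the available speed (as a denominator) nearest the target angle."""
    return min(
        available,
        key=lambda d: abs(equivalent_angle(frame_rate, d) - target_angle),
    )

# Hypothetical set of shutter speeds a camera might offer: 1/30, 1/40, ...
AVAILABLE = [30, 40, 50, 60, 80, 100, 125]

best = closest_shutter(24, AVAILABLE)
angle = equivalent_angle(24, best)
error = abs(angle - 180.0) / 180.0
print(f"1/{best} s -> {angle:.1f} degrees ({error:.1%} away from 180)")
```

For 24 fps this selects 1/50th, which works out to roughly 172.8 degrees, an error of about 4% and well inside the 20% margin mentioned above.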
It might seem that these shutter speeds are pretty slow compared with the way stills are shot, and it’s true—examining a freeze frame of any motion picture will reveal an awful lot of motion blur. This is a necessary component of motion-picture imaging because the frame rate is comparatively low, having been chosen largely for business reasons in order to limit film stock consumption. The persistence of vision in our eyes needs a little help to blend 24 stills per second into fluid motion, and even with a 180-degree shutter (or 1/48th exposures) there can still be problems (see below).
Deliberately selecting an exposure representative of a shutter angle wildly different from 180 degrees generally has a fairly visible effect on motion rendering. Very narrow shutter angles of down to 45 degrees (equivalent to 1/192nd exposure) were used during the combat scenes in Saving Private Ryan, and have been much imitated since. The resulting lack of motion blur creates a staccato, flickering quality to movement that can give the shot an intense and high-energy feel, exacerbating the instability of handheld camerawork. Conversely, very wide shutter angles—some video cameras can get to almost 360 degrees—produce a smeary, fluid look that is somewhat less popular.
There are essentially four principal combinations of resolution and frame rate that represent the majority of current video distribution. These are supported by nonlinear editors, video tape equipment manufacturers, the software companies who write compression codecs, and most importantly by a world full of devices designed to reproduce the material for consumption.
Standard definition, including NTSC (the first color system, at 29.97 fps) and PAL, a similar but improved system at 25 fps. PAL and NTSC are intrinsically distribution formats, with the majority of their effort going toward defining a radio broadcast standard and the video stream it would carry. Standard-definition frames are usually 720 pixels wide by either 576 (PAL) or 486 (NTSC) high. Standard definition tape formats include DVCAM, DVCPRO and Digital Betacam. Standard-def post is often done using a compression codec that exactly duplicates the data format used on DVCAM tape, so that the tape’s contents can be captured onto hard disk and edited without being altered from the way they were originally recorded.
720p HD, being a 1280x720 pixel frame, is lower in resolution than what’s increasingly called full HD, but more usually runs at a higher frame rate of either 50 or 60 frames per second, so the amount of information involved is not much smaller. It’s commonly used for sports broadcasting, where the improved temporal resolution makes fast-moving objects such as a ball easier to follow. The DVCPRO-HD tape format originally worked at 1280 x 720.
1080-line HD, at 1920 x 1080 pixels, may be either interlaced (I) or progressive (P), but for our application is invariably progressive. Many tape formats handle 1080p: HDCAM is Sony’s HD answer to DVCAM, and their HDCAM-SR format offers increased technical quality.
Larger, digital cinema formats: Digital post-production work for feature films has traditionally been done on images 2048 or, very rarely, 4096 pixels wide. While a 2048-pixel image holds negligibly more detail than a 1920 pixel HD frame, the way it’s handled is often very different, with frames saved as individual still-image files, completely uncompressed and with greater than usual color precision. Some video I/O boards have the ability to work with 2048 x 1556 images for monitoring purposes, but videotape is rarely involved.
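To put rough numbers on the claim that 720p carries nearly as much information as 1080-line HD, the sketch below multiplies frame size by frame rate. It assumes nominal whole-number frame rates (25, 30 and 60 rather than 29.97 or 59.94) and ignores compression and color subsampling, so treat the figures as order-of-magnitude comparisons only.

```python
# (width, height, nominal frames per second); the rates are simplifying assumptions.
FORMATS = {
    "PAL SD": (720, 576, 25),
    "720p60": (1280, 720, 60),
    "1080p30": (1920, 1080, 30),
}

def pixels_per_second(width, height, fps):
    """Raw pixel throughput, before any compression or subsampling."""
    return width * height * fps

for name, (width, height, fps) in FORMATS.items():
    rate = pixels_per_second(width, height, fps)
    print(f"{name:8s} {rate / 1e6:6.1f} million pixels per second")
```

On these assumptions 720p60 delivers about 55 million pixels per second against about 62 million for 1080p30, while PAL standard definition is closer to 10 million.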
The final purely technical issue to be aware of is the possibility of finishing on an interlaced video format. This is rapidly becoming of historical interest, but anyone who needs to produce material that will go to PAL or NTSC standard-definition consumers can expect to need it for many years to come.
Interlacing involves broadcasting, receiving and displaying first only the odd rows of the picture, then the even rows, in a repetitive sequence of two fields. This technique was originally developed to make better use of radio bandwidth on the original broadcast TV formats, since it allows a CRT-based monitor to be in an almost continuous state of drawing images onto the screen. Without interlacing, the monitor would be required to very quickly draw all the lines, hold them there, then very quickly draw the next frame’s lines, which would cause pronounced flicker. Interlacing smoothed out these lumps in information delivery and made life easier for engineers working with the technology available at the dawn of television. Now we can easily store a progressively scanned frame in video memory, but forty or fifty years ago, such technologies had not yet been developed.
Video DSLRs don’t work this way. Instead, they produce progressively scanned frames in which all the rows of pixels are captured simultaneously or, thanks to rolling shutter, at least sequentially. Happily, this doesn’t really matter: While it’s normal for interlaced video cameras to capture the two fields some time apart, it isn’t absolutely required that they do that. A DSLR video can be split into two fields and reproduced as interlaced on most CRT video monitors without problems. In fact, this happens by default if a DVD of a production is played on a CRT monitor. This is how movies were shown on TV for decades and it doesn’t cause difficulty.
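Splitting a progressive frame into two fields amounts to taking alternate rows. The sketch below models a frame as a plain list of rows; the function names are invented for illustration, and it assumes an even number of rows. Which field is transmitted first depends on the video format, so the ordering here is an assumption too.

```python
def split_fields(frame):
    """Split a progressive frame (a list of rows) into two fields."""
    first_field = frame[0::2]   # rows 1, 3, 5, ... counting from one
    second_field = frame[1::2]  # rows 2, 4, 6, ...
    return first_field, second_field

def weave(first_field, second_field):
    """Rebuild the progressive frame by interleaving the two fields."""
    frame = []
    for upper, lower in zip(first_field, second_field):
        frame.extend([upper, lower])
    return frame

frame = [f"row {n}" for n in range(1, 7)]
first, second = split_fields(frame)
print(first)
print(second)
```

Because both fields come from the same instant, weaving them back together recovers the original frame exactly, which is why progressive material survives interlaced delivery without trouble.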
The only slight wrinkle in this situation is what happens with NTSC. Since the video format runs at 29.97fps and film at 24 (or, properly, 23.976 once it is transferred to video), there aren’t enough film frames to fill the NTSC frame count. The solution involves spreading each film frame across the fields of more than one video frame. The technique, called 3:2 pulldown, involves deriving three consecutive interlaced fields (and thus one and a half frames) from one film frame, followed by two consecutive fields (one frame’s worth, but not necessarily both fields of the same video frame) from the next film frame. This technical workaround has a visible effect on motion, but it’s about the best possible way to get 24p video to NTSC viewers, as it preserves at least something of the original motion rendering.
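The cadence is easier to see written out. The sketch below follows the description above (three fields from one film frame, then two from the next) and pairs the resulting fields into video frames. It is a simplified model: it ignores field parity (upper versus lower) and the 0.1% speed change between 24 and 23.976, and the function name is made up for the example.

```python
def pulldown_32(film_frames):
    """Spread film frames over interlaced video frames in a 3, 2, 3, 2 cadence.

    Returns a list of (field, field) pairs, one pair per video frame.
    """
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeats)
    # Pair consecutive fields into video frames; a trailing odd field is dropped.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]

for video_frame in pulldown_32(["A", "B", "C", "D"]):
    print(video_frame)
```

Four film frames become five video frames, two of which mix fields from different film frames; those mixed frames are the source of the visible effect on motion.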
Two terms which are often confused are color space and color subsampling. Color space refers to the chromatic values enclosed by a set of three color primaries, or, more simply put, defines how red a red can get, how green a green can get, and so on. Clearly, no green can be a deeper, more saturated green than the green used as an RGB primary in a particular recording and display system. The situation is complicated by the presence of other systems that define their primaries in terms other than red, green and blue. For instance, the YUV strategy used by many image compression technologies separates brightness from color information. This is done to exploit the human visual system’s inability to perceive colors as sharply as it perceives brightness, making it more efficient to compress the data.
Color subsampling is the technique used by the YUV system to reduce the resolution of the U and V color difference channels.
This system encodes color images by representing a deviation of an uncolored pixel toward a colored result; in essence, it takes a black and white image and bends its pixels toward a color in each of two axes. Several related color-difference systems are loosely referred to as YUV, such as YCbCr, and they may use slightly different axes of deflection with the same intention.
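As a concrete illustration of separating brightness from color, the sketch below converts a normalized RGB value to one luma and two color-difference values. It assumes the ITU-R BT.601 weights, which are common in standard-definition video; the source text does not name a standard, HD material normally uses different (BT.709) weights, and the function name is invented for the example.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert normalized RGB (0.0 to 1.0) to Y, Cb, Cr using BT.601 weights.

    Y is the brightness; Cb and Cr describe how far the pixel is bent
    toward blue and red respectively, and are zero for any neutral gray.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr

print(rgb_to_ycbcr(0.5, 0.5, 0.5))  # mid gray: both color differences near zero
print(rgb_to_ycbcr(1.0, 0.0, 0.0))  # pure red: Cr at its maximum of about 0.5
```

A gray pixel carries all of its information in Y, which is what makes it reasonable to store the two color channels less precisely.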
This separation of chrominance (color) from luminance (brightness) allows the chrominance information to be stored at a lower resolution, usually by simply skipping pixels. A 4:2:2 YUV color image has half the horizontal color resolution as compared to the brightness resolution; a 4:1:1 image has quarter-resolution chrominance data horizontally. The 4:2:0 notation describes a system using chrominance data halved in resolution in both the horizontal and vertical axes; its color performance is better than 4:1:1 horizontally but poorer vertically, although in general it will be perceived as a better compromise.
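The effect of each scheme on the amount of stored data can be worked out directly from the divisors described above. The sketch below is illustrative only: the table and function names are made up, and it assumes frame dimensions that divide evenly.

```python
# (horizontal divisor, vertical divisor) applied to each chrominance channel.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # full-resolution color
    "4:2:2": (2, 1),  # half horizontal color resolution
    "4:1:1": (4, 1),  # quarter horizontal color resolution
    "4:2:0": (2, 2),  # half resolution in both axes
}

def chroma_size(width, height, scheme):
    """Resolution of each chrominance channel for a given scheme."""
    h_div, v_div = SUBSAMPLING[scheme]
    return width // h_div, height // v_div

def samples_per_frame(width, height, scheme):
    """Total samples: one luminance channel plus two chrominance channels."""
    chroma_w, chroma_h = chroma_size(width, height, scheme)
    return width * height + 2 * chroma_w * chroma_h

for scheme in SUBSAMPLING:
    print(scheme, chroma_size(1920, 1080, scheme), samples_per_frame(1920, 1080, scheme))
```

For a 1920 x 1080 frame, 4:1:1 and 4:2:0 store the same number of samples, half that of 4:4:4; they differ only in how the saving is distributed between the horizontal and vertical axes.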
Many key compression technologies use YUV color subsampling, including the JPEG and H.264 systems found on many HDSLRs. As we can see from this, it is possible to use different color spaces with different subsampling, and the techniques are quite separate.