
The Cinematic VR Field Guide

A Guide to Best Practices for Shooting 360

Written by Grant Anderson


Contributions on sound by Adam Somers
2017 Jaunt, Inc.
jauntvr.com
First Edition 2017

Version 1.5, January 2017


Table of Contents

Introduction
Virtual Reality Basics
    Types of Virtual Reality
        What is Game Based Virtual Reality?
        What is Cinematic Virtual Reality?
    Stereoscopic vs. Monoscopic VR
    360 Video
    Minimum VR Requirements
    Types of HMDs
Camera
    Types of VR Cameras
        Panoptic
        Mirror Rigs
        Fisheye
        Light-field
        Photogrammetry
    Stitching Approaches
        Geometric
        Optical Flow
        Live Stitching
        Avoiding Artifacts
        Resolution
    Distance to Subject
    Camera Motion
        Sickness in VR
        Guidelines to Minimize Sickness in VR
    Mounts & Rigging
        Types of Mounts
        Drones
        Clean Plates
    Water Issues
        Rain
        Underwater
Directing the Action
    Getting out of the Way
    Live Preview
    Blocking & Framing
        FOV & VR's Answer to the 3D Gimmick
        Close-ups, Over-the-shoulder Shots, & Other 2D Remnants
        Getting the Viewer's Attention
        Rig Height & Identity
        POV & Eye Contact
Lighting & Exposure
    Extreme Contrast
    Flares
    Rigging
Spatial Audio for Cinematic VR
    Binaural Audio Basics: How We Hear
        Binaural recording
        HRTFs
        Caveat: personalization
    Spatial audio formats for cinematic VR
    Ambisonic B-format in depth
        Ambisonics overview
        B-format explained
        B-format representations
        B-format playback
        Recording B-format
        Mixing B-format
    Dolby Atmos
    Facebook 360 Spatial Workstation
Post-Production
    Fixing Stitching Artifacts
    Editing
        Working with Proxies
        Available Tools
        Final Conform
    Post Stabilization
    Color Correction
    Compositing & Adding VFX
        Working in the 360 Equirectangular Format
        Nuke & CaraVR
        Rendering in 360
Interactivity
Appendix
    Guidelines for Avoiding Artifacts using the Jaunt ONE
        Distance from the camera rig
        Leveling & placing the camera rig
        Challenging situations to avoid
Legal


Introduction
Virtual reality (VR) is truly a new medium. Along with the excitement at the creative possibilities
there is also much confusion within the film industry on how best to shoot a compelling piece of
VR content. Questions regarding camera movement, blocking, lighting, stereoscopic 3D versus
mono, spatial sound capture, and interactivity all get asked repeatedly.
As Jaunt is at the forefront of cinematic virtual reality production, the purpose of this guide is to
share our experiences shooting a vast array of VR content with the wider community: what
works and what doesn't. We are not, however, trying to produce an exhaustive text on the
entirety of filmmaking but rather trying to cover the additional complexities and challenges that
come with shooting in VR.
Much of what will be discussed is framed through the lens (so to speak) of the Jaunt ONE
camera system, as that is the rig with which we are most familiar, and we provide specific details
on it wherever applicable. The vast majority of this guide, however, covers general VR shooting
techniques, and we attempt to keep the material as camera-agnostic as possible.
Virtual reality technology, as well as the language of cinematic VR, is changing at a breakneck
pace, so we will endeavor to update this guide from time to time as new techniques present
themselves, new technology develops, and we receive feedback from the community.
We're interested to hear your feedback and what is working (or not) for your production teams.
To send feedback, please shoot an email to fieldguide@jauntvr.com.
We hope you enjoy this guide.

The Jaunt Team

Virtual Reality Basics


According to Wikipedia, virtual reality is "a computer technology that replicates an environment,
real or imagined, and simulates a user's physical presence and environment to allow for user
interaction." [1] On a computer or cell phone this usually means sight and sound on a display
device and speakers or headphones. Devices for touch or force feedback are starting to be
introduced. Smell and taste are quite a ways off still.
The key to any form of virtual reality is presence and immersion. It's these qualities that
separate it from any media that has come before it and can create an intense emotional
connection to the material. Chris Milk, one of the early directors in VR, has been frequently
quoted as calling it the "empathy engine."

Types of Virtual Reality


There are two basic camps of virtual reality: cinematic and game-engine based. These differ in
their means of production, playback method, realism, and the amount of interactivity allowed.

What is Game Based Virtual Reality?


For a long while this is what people typically thought of when they thought of virtual reality:
computer graphics generated in realtime, typically by a 3D CG gaming engine such as Unity or
Unreal. Since the world is generated on the fly, users wearing specialized head mounted
displays (HMDs) with motion tracking, like the Oculus Rift or HTC Vive, are able to walk
around the environment as if it were real.

[1] Wikipedia: https://en.wikipedia.org/wiki/Virtual_reality


UNREAL SHOWDOWN CINEMATIC VR DEMO. EPIC GAMES

This type of VR also lends itself to highly interactive content, with the Rift and Vive also offering
tracked hand controllers allowing you to pick up and move objects, wield a sword, shoot a gun,
and generally interact with the entire environment. It's very much like being dropped into a video
game.
Just because the first round of HMDs has been heavily targeted at gamers does not mean that
this type of technology is only for gaming. Game engines are just as capable of making an
interactive film or music video as they are a game, and they excel at creating worlds you can
visit that are completely unlike real life.

What is Cinematic Virtual Reality?


Jaunt specializes in cinematic virtual reality. This is 360 video filmed using a panoramic video
camera system and played back as an equirectangular video file, which allows the user to look
around the scene as it unfolds. An equirectangular file is a fancy way of saying the image is an
unwrapped sphere. Depending on the camera system and stitching process the scenes can be
either monoscopic (flat) or stereoscopic (3D).
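To make the "unwrapped sphere" idea concrete, here is a minimal sketch (in Python, not code
from any particular player) of how a viewing direction maps to a pixel in an equirectangular
frame: longitude maps linearly to the horizontal axis and latitude to the vertical axis.

    def direction_to_equirect(yaw_deg, pitch_deg, width, height):
        # yaw: -180..180 degrees (left/right), pitch: -90..90 degrees (down/up).
        # Longitude maps linearly to x and latitude to y -- the unwrapped sphere.
        x = (yaw_deg + 180.0) / 360.0 * width
        y = (90.0 - pitch_deg) / 180.0 * height
        return x, y

    # Looking straight ahead lands at the centre of a 3840 x 1920 mono frame.
    print(direction_to_equirect(0, 0, 3840, 1920))   # (1920.0, 960.0)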


ESCAPE THE LIVING DEAD EQUIRECTANGULAR IMAGE

Here you have the advantage of scenes looking completely real and not computer generated as
with game engines. Scenes are also usually captured with spatial sound microphones, making
them sound just as real. If you hear a dog to your right and turn your head, you'll see the dog in
the spot the sound came from. It's as if you were dropped into a movie.
Unlike in game engines, however, you cannot move around the scene freely. Only if the camera
is moved during filming do you move. As new camera systems and acquisition technologies are
developed eventually you will be able to move around filmed scenes as well. See below under
Types of VR Cameras for more on this.
Though not as interactive as full game based environments, you can still add interactivity to
cinematic VR. Branching "Choose Your Own Adventure" stories, gaze detection, interactive
overlays and interfaces, audio or clip triggers, gestures, and even full CG integration are all
possible.
All of this leads to a completely new form of media: a blank canvas with which we've only just
begun to realize what's possible. The killer app in VR will be some combination of cinema,
gaming, and interactive theatre. Right now we're only in the dress rehearsal and anything is
possible. Even just five years from now VR content will look nothing like it does today.

Stereoscopic vs. Monoscopic VR


VR footage can be either monoscopic or stereoscopic 3D. Mono footage is flat and has no
depth, with everything projected back to the same depth of the 360 viewing sphere. While you
can still turn your head and look around the scene, nothing ever truly gets closer to you, only

bigger. This is similar to the difference between a closeup in a 2D film versus how something
actually gets closer and comes out at you in a 3D film.
With 360 stereoscopic 3D, on the other hand, you have full 3D in every direction and objects can
actually get closer to the viewer. This leads to a much more naturalistic and immersive feeling,
as this is how we actually experience things in real life. Imagine filming a shark underwater in
VR. For maximum impact you'd want the viewer to feel the shark actually getting up close and
personal. With stereoscopic 3D you can achieve that, while in mono, although
still menacing, the shark won't actually ever get any closer and you lose that sense of presence
and immersion (and fear factor!).
Wherever possible you should always strive to shoot in full 360 3D. Why doesn't everyone just
do that then? As you would expect, the camera rigs are more expensive, the stitching process is
much more complicated, and it's difficult to get good results without a lot of post effort and
dollars. Jaunt's Jaunt ONE camera and the Jaunt Cloud Services (JCS) are meant to ease this
process, greatly automating the entire effort.

JAUNT CLOUD SERVICES (JCS) PROJECT PAGE

See the Jaunt Cloud Services documentation for more information.


All that said, not every scene or shot necessarily requires shooting in 3D, nor is it always
possible. Currently, there are very few stereoscopic 360 rigs for shooting underwater. Due to
the confines of the protective encasement it's harder to fit a stereo rig within, and smaller mono
rigs are typically used. See the Underwater section below for more.
Likewise, when shooting in the confines of a car, where things are going to be very close quarters,
you usually have a better shot using a smaller GoPro style rig and stitching the material in
mono. Most cameras have a minimum distance to subject that you must respect in order to get
a quality stitch, and these distances are generally greater when stitching in 3D. If you need to get
very close to a subject it may be better to go the mono route. See Distance to Subject below for
more information.
Similarly, when using drones weight is always an issue, so there are many instances where we
can again use a smaller, lighter GoPro rig and stitch in mono. Very often you are far enough
above the landscape that you're not getting much stereo parallax anyway and the viewer will
hardly notice.
In any given show we might have the majority of footage shot in full 360 3D with a smattering of
shots, as in the above cases, filmed with smaller, lighter GoPro rigs and stitched in mono. If done
correctly and in the right circumstances your audience will likely not notice.


360 Video
A note must be made about what we call 360 video. How is this different from VR? In an effort
to get people into VR and leverage the heavily trafficked platforms that exist now, many
companies, Facebook and Google's YouTube in particular, have started promoting 360 video.
This is video you can watch in a browser on the web and pan around the scene with your
mouse, or move and tilt your smartphone around to do the same in mobile apps.
As well, our Jaunt smartphone and web apps offer this capability for those times, or those users,
where a VR viewing device for experiencing the content in full 3D is not yet available.
Brands and companies such as Facebook love 360 video as it allows them to leverage their
massive user bases on platforms that everyone is already using.
We must be careful, however, about calling this virtual reality. If too many users believe this is the real
deal then they may think that they have actually experienced virtual reality, with all of its
presence and immersion in 3D with spatial sound, and not been that impressed. We need to
make a clear distinction between 360 video and true VR and use the former to bring viewers
fully into the latter, or risk VR dying an early death similar to what happened with broadcast 3D
TV.
Which leads us to what the requirements to be considered virtual reality actually are.

Minimum VR Requirements
You could talk to 100 people about what is essential to be considered virtual reality and get as
many answers. As we are looking for maximum immersion and presence (the feeling of actually
being there), Jaunt assumes a minimum of four things:
360 Equirectangular Images: a scene in which you can look around, up, and down a
full 360 degrees. Some camera rigs have instead opted for 180, particularly cameras that are
streaming live, to reduce bandwidth and stitching complexity. However, as soon as you look
behind you, you're pulled right out of the scene. Often, to combat just having a black
background behind you, a graphic will be inserted such as a poster frame from the show, a stat
card from a game, etc.
Stereoscopic 3D: this is one of the more contentious requirements, as many people are filming
in mono today since it is both cheaper and simpler to capture and stitch, per the reasons given
above. However, to truly get that sense of presence (of being present) that is the hallmark of VR,
you really need to shoot in stereoscopic 3D wherever possible. Stereo 3D vision is how we see
in real life and is equally important in VR.
Spatial 3D Sound: sound is always a hugely important part of any production. In VR it is critical.
Not only does it help with immersion but it is one of the few cues, along with motion and light, for
getting your viewer's attention for an important moment, as they could be looking anywhere.
Capturing spatial audio increases your sense of place.
Viewed in an HMD: finally, none of the above is any good unless you have a method of actually
viewing it. Though 360 video is often created for those without a viewing device and allows you

to pan around the image, it doesn't allow you to see in 3D or provide you with spatial audio
playback. For the full experience you really must use a proper HMD. The good news is you
don't need an expensive Oculus Rift or HTC Vive. There are some very inexpensive or even
free options on the market, with the selection increasing at a dizzying rate.

Types of HMDs
There are many different types of HMDs, or head mounted displays, that vary drastically in price
and capability, ranging from the very simple Google Cardboard to the Samsung Gear VR to the
Oculus Rift and HTC Vive. It was the cell phone and its suite of miniaturized components
(gyroscopes, accelerometers, small hi-resolution screens) that led to the resurgence of viable
virtual reality and allowed Palmer Luckey to create the first Oculus headset, and it's the cellphone
that is the basis of all of them, even the high end Rift and Vive.
The higher end HMDs provide full positional tracking, and some also include hand controllers,
creating a room scale VR system that allows you to move about and interact with your
environment. But using just your cellphone with some simple lenses housed in a cardboard or
plastic enclosure gets you a pretty amazing experience. This will only get better as cell phone
manufacturers integrate better VR subsystems into their handsets.
The list of HMDs is growing at a breakneck pace, but for a good overall list of the current
HMDs on the market or in development see the VR Times.

VARIOUS HEAD MOUNTED DISPLAYS (HMDS)


Camera
In this section we discuss the various types of VR camera rigs you will encounter, some of the
gotchas to be aware of with VR cinematography and how to avoid them, mounts and rigging
solutions, the importance of clean plates, and underwater and aerial VR shoots.

Types of VR Cameras
There are many types of camera systems for shooting VR and the space is evolving rapidly.
Each has its own strengths and weaknesses, and we cover the major forms below. There are
many other forms of panoramic cameras but we won't cover those that don't allow for video
capture, such as slit-scan cameras. Where possible, it's best to research and test each one
based on your own needs.

Panoptic
These camera systems are generally inspired by the visual systems of flying insects and
consist of many discrete camera modules arranged on a sphere, dome, or other shape. The
term comes from Panoptes, a giant with a hundred eyes in Greek mythology.
Jaunt's camera systems, including the Jaunt ONE, are of this variety.

JAUNT ONE CAMERA


This is by far the most popular type of VR camera rig and many companies have jumped into
the fray by designing lightweight rigs to support a variety of off-the-shelf camera modules,
usually the GoPro. Being small, lightweight, and relatively inexpensive, the GoPro has proved to
be the go-to camera for snapping together a VR camera rig. In fact, Jaunt's first production
camera, the GP16, consisted of sixteen GoPro cameras in a custom 3D printed enclosure.
However, there are numerous problems with a GoPro-based system, including image quality,
heat dissipation, and, most importantly, lack of sync. When shooting VR it is crucial that all of
your camera modules are in lockstep so that overlapping images match precisely and can be
easily stitched together in post. Out of the box, GoPros have no built-in syncing capability, and
even when properly synced in post based on audio/visual cues they can drift over time.
JAUNT GP16 CAMERA

This isn't to pan GoPro cameras. The mere fact that they have enabled so many different VR
rigs is a feat unto itself, but they weren't originally conceived for this task and the cracks are
showing.
Jaunt has since moved on to twenty-four custom-built camera modules in the Jaunt ONE that
provide four times the sensor size with better low light performance, higher dynamic range with
eleven stops of latitude, better color reproduction, global shutters to prevent tearing of fast
moving objects, and, most importantly, synced camera modules.
The number of cameras in any given system is a function of overlap. You need enough cameras
to provide sufficient overlap between images, of around 15-20%, in order to properly stitch
adjacent frames together, and more if you want to produce a stereo stitch. The more cameras you
have in a rig, and the more closely they are spaced, the shorter the minimum distance
to camera, allowing subjects to get much closer before stitching falls apart.
See Stitching Approaches and Distance to Subject below for more information.
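As a rough illustration of that overlap math, here is a small Python sketch; the 28 degree lens
field of view is a hypothetical number for illustration, not the specification of any particular rig.

    def horizon_overlap(num_cameras, lens_hfov_deg):
        # Angular overlap between adjacent cameras in a ring, and that overlap
        # expressed as a fraction of each camera's horizontal field of view.
        spacing = 360.0 / num_cameras
        overlap_deg = lens_hfov_deg - spacing
        return overlap_deg, overlap_deg / lens_hfov_deg

    # Hypothetical ring: 16 cameras around the horizon, 28 degree HFOV each.
    deg, frac = horizon_overlap(16, 28.0)
    print(round(deg, 1), "deg overlap =", round(frac * 100), "% of each frame")   # 5.5 deg, ~20%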

Mirror Rigs
Another common type of panoramic 360 camera is the mirror rig. This typically has a number of
cameras in a circular configuration shooting up into a collection of mirrors that are facing out into
the scene at an angle. A good example of this kind of rig is the Fraunhofer OmniCam.


FRAUNHOFER OMNICAM FRAUNHOFER HEINRICH HERTZ INSTITUTE

These rigs can be either mono or stereo and are generally bigger and heavier than other types
of panorama rigs due to the mirrors. A big benefit of these rigs, however, is that the mirrors allow
the cameras to shoot into a virtual nodal point within the mirrors, providing minimal or no
parallax in the scene and making stitching very easy and relatively artifact free.
Because of that, many of these rigs allow for realtime stitching and transmission of live 360
imagery. By having two cameras shooting into each mirror you can create a seamless stereo
stitch. The main drawbacks, again, are the size and weight of these rigs and the relatively
powerful computer they must be attached to for live stitching.

Fisheye
Many consumer panoramic cameras are of this variety because they are relatively cheap, small,
lightweight, and are easily stitched, usually in-camera. Some use one lens, like the Kodak 360
Action Cam, and capture 180 degrees, while a two lens system, like the Ricoh Theta, captures a
full 360 degrees by stitching the two halves together.


KODAK PIXPRO SP360-4K & RICOH THETA CAMERAS

Though they are convenient and easily stitched, the quality of this type of camera is relatively
low. Many can stream to an iPhone or Android device, making them a good remote viewing
solution if your VR camera doesn't provide one. See below under Live Preview for more
information.
Prosumer versions of these types of cameras also exist with much larger lenses and sensors.
Unfortunately, all cameras of this type produce only monoscopic images and not stereoscopic
3D, lessening the immersion for VR purposes.

Light-field
Light-field cameras are the latest technology to hit the panoramic market. They represent the
future of virtual reality filmmaking, though their practical use is still a ways off. Instead of
recording only the light focused through a single lens onto a sensor, hundreds of tiny
micro-lenses capture light rays arriving from every conceivable direction.


This allows for some pretty amazing things to be done in post, including shifting parallax by
moving your head in an HMD, refocusing the image, generating depth mattes and stereo 3D,
and pulling mattes without a green screen.
Light field cameras were first popularized in the consumer market with the Lytro Illum still
camera. Recently Lytro entered the professional video market with the Lytro Cinema and the
Lytro Immerge for VR. Visit Lytro for more information.
LYTRO LIGHTFIELD MICRO-LENS ARRAY

Unlike a light field still camera with its micro-array of lenses, most video based light field
cameras use numerous camera modules arranged in a grid or sphere configuration. With some
fancy processing these multiple video streams can be packed into a compressed light field
format that enables you to move around the scene as the video plays, albeit within limits.

LYTRO IMMERGE VR LIGHTFIELD CAMERA CONCEPTUAL RENDERING LYTRO

Your movement is limited to roughly the diameter of the sphere or the width of the grid from
which the scene was captured. You won't be fully walking around the room, but you will be able to
move your head and see shifting parallax, which can help minimize motion sickness in VR.


Unfortunately, the practical uses of these cameras in production are currently limited as they
require a large array of computers attached to the camera for data capture and processing. In
addition, the light-field movie stream, even though it is compressed, is enormous, making
working with, downloading, or streaming it incredibly difficult at today's bandwidth limits.
Though it is currently difficult to film light field imagery, it is quite possible to render it out in CG
using technology developed by OTOY. For a good video describing holographic or light field
video rendering see OTOY's website.

Photogrammetry
To fully realize scene capture for VR you need to change your thinking entirely and move from
the current inside-out methodology to an outside-in perspective. That is, instead of filming with
an array of cameras that are facing out into the scene, surround the scene with an array of
cameras that are looking in.
Microsoft has created a video based photogrammetry technology called Free Viewpoint Video,
used to create holographic videos for its HoloLens augmented reality headset. An array of
cameras placed around a green screen stage captures video from many different angles, which
is then processed using advanced photogrammetry techniques to create a full 3D mesh with
projection mapped textures of whatever is in the scene. Their technology uses advanced mesh
tessellation, smoothed mesh reduction, and compression to create scenes that you can actually
walk around in VR or AR.
For more information on the process see this Microsoft video on YouTube.
Another company working in this space, 8i, uses a similar array of cameras to capture what they
call volumetric video, stored in a proprietary compressed light field format. This technology does
not create a full CG mesh (though that is an option) but still allows you to walk about the
scene and observe it from any angle. For more info visit 8i.
Whatever the technology or approach, advanced realtime photogrammetry techniques will be an
important capture technology in the not too distant future allowing you to fully immerse yourself
in any scene. As the technology improves and reduces in cost it will also allow consumers to
truly connect like never before through holographic video feeds and social environments.
For a list of current camera technologies mentioned in this section and additional information,
please visit The Full Dome Blog's Collection of 360 Video Rigs.

Stitching Approaches
Once you have shot the scene with your 360 camera, you'll need to stitch all the individual
cameras together to create a single, seamless 360 spherical image. Creating an image without
visible seams or artifacts is one of the more difficult and time consuming issues in VR
filmmaking, particularly when creating a 3D image.

Jaunt's Cloud Services (JCS) has made this process nearly automatic and currently supports
the Jaunt ONE and Nokia Ozo cameras. There are a variety of approaches to stitching, outlined
below. Jaunt has experimented with several techniques but has currently settled on optical flow
as the technology that provides the best 3D with no seams and a minimum of artifacts.

Geometric
Geometric stitching is the approach used by most off-the-shelf software like Kolor Autopano and
was first used at Jaunt with our GP16 camera. In this approach, barrel distortion due to lensing is
corrected in each image, the images are aligned based on like points between them, and then
smoothly blended together. This creates a full 360x180 equirectangular unwrapped spherical
image.
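Below is a minimal sketch of that idea for a single pair of neighboring cameras, assuming
Python with OpenCV. The file names, intrinsics, and distortion coefficients are placeholders that
would normally come from a lens calibration, and a production stitcher warps everything into a
shared spherical projection rather than the simple planar homography used here.

    import cv2
    import numpy as np

    # Placeholder intrinsics and barrel-distortion coefficients for one module.
    K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
    dist = np.array([-0.28, 0.08, 0.0, 0.0])

    # Correct lens distortion in each neighboring image.
    img_a = cv2.undistort(cv2.imread("cam_a.jpg"), K, dist)
    img_b = cv2.undistort(cv2.imread("cam_b.jpg"), K, dist)

    # Align based on like points between the two images.
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = orb.detectAndCompute(cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_a, des_b)
    matches = sorted(matches, key=lambda m: m.distance)[:200]

    src = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp camera B into camera A's frame; the overlap would then be blended
    # smoothly before the result is mapped onto the 360x180 sphere.
    warped_b = cv2.warpPerspective(img_b, H, (img_a.shape[1] * 2, img_a.shape[0]))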

GEOMETRIC STITCHING

Stitching in stereo 3D is more difficult and requires creating a virtual stereo camera pair within
the sphere, using slices of each image for the left- and right-eye virtual cameras to be stitched.
As mentioned, this is not always perfect and can lead to visible seams as well as 3D artifacts
where portions of the scene are not at the correct depth. For that reason, Jaunt has moved on
to the following optical flow technique.


Optical Flow
Jaunt is currently using an optimized optical flow algorithm as the basis of its Jaunt Cloud
Services (JCS) online stitching platform, both for the Jaunt ONE and Nokia Ozo cameras.
Optical flow is a technique that has been used in computer graphics for some time. It has many
applications, including motion estimation, tracking and camera creation, matte creation, and
stereo disparity generation. At its core, an optical flow algorithm calculates the movement of
every pixel in a scene, usually in a time based analysis from frame to frame.

OPTICAL FLOW FIELD VECTORS [2]

In the case of stitching a stereo panorama the flow graph is used to track distinct pixels
representing like visual elements in the scene between adjacent cameras. Using this in
conjunction with known physical camera geometry and lens characteristics it is possible to
determine the distance of each pixel between the cameras and therefore a disparity (depth)
map of the entire scene.

[2] Better Exploiting Motion for Better Action Recognition. Available from: https://www.researchgate.net/publication/261524984_Better_Exploiting_Motion_for_Better_Action_Recognition [accessed Sep 9, 2016]


This disparity map is used to interpolate synthesized virtual cameras for both the left and right
eyes that sit between the physical cameras, creating a stereoscopic 3D scene in 360. This
technique provides a much better sense of depth with fewer artifacts and few if any seams.
However, the approach is not perfect, as errors can creep into the flow graph generation, or
there may be blind spots due to objects occluding certain regions from one or more cameras. In
this case, the occluded area must be estimated and artifacts called "halos" can form around
those objects.

DISPARITY DEPTH MAP
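The relationship between measured disparity and actual distance is the classic rectified-pair
formula: depth equals focal length times baseline divided by disparity. A tiny Python sketch
follows; the focal length and lens spacing are illustrative values only. Note that the near subjects
producing the largest disparities are exactly the ones most likely to occlude a neighboring
camera and produce halos.

    def disparity_to_depth(disparity_px, focal_px, baseline_m):
        # Larger disparity (a bigger pixel shift between adjacent cameras)
        # means the point is closer to the rig.
        if disparity_px <= 0:
            return float("inf")   # no measurable shift: treat as "at infinity"
        return focal_px * baseline_m / disparity_px

    # Illustrative numbers: 1100 px focal length, 3.5 cm between adjacent lenses.
    for d in (1, 5, 20, 80):
        print(d, "px ->", round(disparity_to_depth(d, 1100.0, 0.035), 2), "m")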

Though difficult to see in the picture below since it is not moving, you can see the warping and
distortion around the gentleman's nose. In the video, this area pops and wiggles back and forth
around his outline. Generally, the closer objects are to camera, the harder it is for adjacent
cameras to see similar points, which causes more estimation and larger halos. For more
information see Avoiding Artifacts and Distance to Subject below.
Unfortunately, it is nearly impossible for current algorithms to fully eliminate all errors, and some
post processing will be required to remove them. See the Post-Production section below for
more information on techniques.

Live Stitching
Live stitching uses similar methods as above but obviously must do them in realtime so that a
live 360 image can be transmitted. There are very few products that currently do live stitching,
or do it well. Most of them either operate in less than 360, typically going for a 180 approach,
or in monoscopic instead of stereoscopic 3D.

One off-the-shelf solution is VideoStitch, which enables you to do live monoscopic stitching via
their Vahana VR product, with live stereo 3D on the horizon. It is a software solution that works
with a variety of VR camera rigs. See VideoStitch for more information.
Many live stitching solutions are of the aforementioned mirror rig variety as their configuration
allows for easier, quicker stitching. The Fraunhofer OmniCam is one such solution and has two
models, one for mono and one capable of live transmission of 3D 360 streams. See Mirror Rigs
above for more information.

Avoiding Artifacts
Each of the above cameras and algorithms has its own idiosyncrasies in terms of how well
it will stitch without introducing artifacts. Nearly all algorithms introduce some form of
undesirable artifact. These can be lessened if the scene is shot correctly, so it is worth your
while to investigate and test the capabilities of your particular camera/algorithm combination.
In the case of the Jaunt ONE camera and its optical flow based stitching algorithm, for example,
we sometimes see chattering halos around moving objects or objects too close to camera.
These often occur because the flow algorithm has a hard time finding similar pixels between
adjacent cameras in order to do the stitch. If a person is standing in front of a bright, blown out
window, it's difficult for the algorithm to tell which pixel is which, as the pixels around the subject
are all of similar value and there is no detail in that area.
Likewise, if there are too many points that look exactly the same you can run into a similar
issue. If a person is standing in front of wallpaper with fine vertical stripes, the flow algorithm can
have a tough time figuring out which point to match between many points that look the same.
The solution is either to place your subject over a different portion of the background that does
not have similar repeating detail (in the case of the wallpaper) or to expose the camera down so
that there is more information within the window for the algorithm to discern.
You can also run into this problem with objects that are too close to camera. If an object is too
close, then one camera may see detail in the scene that is blocked by that close object in the
adjacent camera. In this case, there is no way for the algorithm to see around that object and it
must make pixels up. These estimated regions around those objects are the halos, and since they
are evaluated from frame to frame they may vary over time, resulting in their chattery nature.
The solution to this problem is simple: move your subject back. For the Jaunt ONE camera the
safe distance to the closest subject is 4 to 6 feet.
Algorithms are becoming smarter all the time and artifacts are expected to be reduced if not
eliminated in the not too distant future. In the meantime, though you may never eliminate all
artifacts you can drastically reduce them by taking some time to think about how your algorithm
will stitch the shot and compose your scene accordingly.


Resolution
A note about resolution: most individual camera modules in a VR rig, be they GoPros, machine
vision cameras, or custom modules as in the Jaunt ONE camera, are usually HD (1920 x 1080)
or 2K (2048 x 1556) resolution. When these images get stitched together they obviously create a
much larger panorama.
The Jaunt ONE camera, for instance, is capable of creating up to a 12K resolution image.
Working with an image this size with today's technology, however, is completely impractical. Just a
couple of years ago the industry was working with HD TV or 2K film images. Then in rapid
succession the industry introduced stereo 3D, ultra-high definition (UHD) and 4K, high dynamic
range (HDR), wide color gamut, and high frame rate (HFR) imagery, each with the potential to at
least double the amount of data, if not worse.
Where each of the above was a concern on its own before, now we come to VR, which calls for
all of the above combined. And we need it all right now. But it is not possible to work with such
large files in post production with current technology; the CPUs and GPUs even on the fastest
machines can't keep up. Given this, Jaunt has currently limited the resolution we output to 3840
x 3840 stacked left/right eye equirectangular stereo 3D at 60fps.
Even this size of file can be difficult to work with in post and beefy machines are needed for
compositing, editing, color correction, and CGI at that resolution and frame rate. And although
we compress the final file delivered to the consumer, bandwidth around the world remains highly
variable and is also an issue.
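To put that in perspective, here is a quick back-of-the-envelope calculation of the uncompressed
bandwidth of that 3840 x 3840, 60fps, 8-bit RGB stereo master (a sketch that ignores container
overhead and chroma subsampling):

    def raw_data_rate_gbps(width, height, fps, bits_per_pixel=24):
        # Uncompressed video bandwidth in gigabits per second.
        return width * height * fps * bits_per_pixel / 1e9

    # 3840 x 3840 stacked left/right-eye equirectangular at 60 fps:
    print(round(raw_data_rate_gbps(3840, 3840, 60), 1), "Gb/s")   # about 21.2 Gb/s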
Ultimately, the biggest bottleneck is the resolution of the final display device. While there is
always a desire to future-proof your media, it is important to keep in mind that all of the
major HMDs, including the Rift, Vive, and PlayStation VR, currently only support around 1K per
eye. This will of course improve over time, but you want to balance the desire for massive
resolution with what is practical today. As technology improves over time we will be able to
increase the resolution and fidelity of our images.

Distance to Subject
With any type of rig that uses an array or ball of outward facing cameras, as the Jaunt ONE
camera does, one of the chief constraints is the distance of the subject to camera. Get too close
and the stitch will fall apart and may be unusable. The closest distance you can achieve is a
factor of overlap: how many cameras you have and how closely they are spaced.
The issue is, the closer you come to camera, the closer the cameras must be to each other.
Otherwise, one camera may see an object where its neighboring camera does not, like part of a
face in a closeup, for example.
Of course the closer the cameras are to each other, the more cameras you need to cover the
full 360, and the smaller they physically must become. The smaller the cameras, the smaller
the sensors, the worse the image quality and low light sensitivity. Designing a proper VR
camera then becomes a game of optimization and tradeoffs.
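To get a rough feel for that tradeoff, here is a simplified two-dimensional Python sketch: lenses
sit on a ring pointing straight out, and we search for the closest point on a seam that still falls
inside the horizontal field of view of both neighboring lenses. The camera count, lens FOV, and
ring radius are hypothetical, and a real stitcher needs considerably more margin than this bare
geometric minimum.

    import math

    def min_stitch_distance(num_cameras, lens_hfov_deg, ring_radius_m):
        # Closest distance from the rig centre, along a seam between two
        # adjacent lenses, at which a point is still seen by both lenses.
        spacing = 2.0 * math.pi / num_cameras
        half_fov = math.radians(lens_hfov_deg) / 2.0
        cams = []
        for a in (-spacing / 2.0, spacing / 2.0):
            pos = (ring_radius_m * math.cos(a), ring_radius_m * math.sin(a))
            axis = (math.cos(a), math.sin(a))            # pointing radially outward
            cams.append((pos, axis))

        def seen_by_both(dist):
            px, py = dist, 0.0                           # point on the seam line
            for (cx, cy), (ax, ay) in cams:
                vx, vy = px - cx, py - cy
                norm = math.hypot(vx, vy)
                cos_angle = (vx * ax + vy * ay) / norm
                if math.acos(max(-1.0, min(1.0, cos_angle))) > half_fov:
                    return False
            return True

        d = ring_radius_m
        while d < 20.0 and not seen_by_both(d):
            d += 0.01                                    # search in 1 cm steps
        return d

    # Hypothetical rig: 16 lenses around the horizon, 30 deg HFOV, 10 cm radius.
    print(round(min_stitch_distance(16, 30.0, 0.10), 2), "m")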



JAUNT CAMERA EVOLUTION: PROTOTYPE, GP16, & THE JAUNT ONE

The Jaunt ONE camera has a total of 24 lenses: 16 around the horizon, four facing up, and four
facing down. In this configuration and with our optics we can achieve a minimum distance to
subject of about 4 feet.
Ignore these distances at your peril, as it can cost you thousands of dollars in post trying to fix
these shots: you are literally trying to recreate the missing information seen in one camera but
not in its neighbor. Many shots are simply not repairable.
Many scenes can benefit from getting much closer to camera than the minimum distance may
allow. One of the hallmarks of VR is being able to elicit visceral emotions from viewers by
bringing actors or objects right up to them. Unlike in 2D, where a close-up is really just a
"big-up" and not really any closer at all, you can create real intimacy or anxiety in your viewers
by having someone step up to the camera, actually getting closer to their POV.
A way around these minimum distance limits is to shoot the main environment in 360 3D with
the VR camera, then use a traditional stereo camera rig to shoot the actor against green screen in
the same environment and lighting, and composite them into the 360 background in post.
This obviously takes additional equipment and more time but can be worth the payoff when you
really need an extreme closeup.

Camera Motion
Of all the issues related to cinematic VR, none is more important than moving the camera, as
it has the potential to literally make your audience sick. Directors and filmmakers are used to
moving the camera to achieve dynamism within shots and as a tool to add extra emotion to
scenes. In 2D this rarely presents a problem, but in VR it can cause dizziness, disorientation,
nausea, or even vomiting. Not something you want to do to your audience! Special care must be
taken to ensure that doesn't happen while still enabling interesting camera moves.

Sickness in VR
Motion sickness in VR is thought to occur due to the separation between your visual system and
your vestibular system. Normally, when you move there is a corresponding action between what
you see and the fluid within your inner ear. In VR, unless you are actually moving in


tandem with what you are experiencing within the headset, there is a disconnect. This
disconnect produces a physiological effect similar to being poisoned, and your body reacts the same way.
For more on this see the article on virtual reality sickness on Wikipedia.
This can happen in gaming or cinematic VR but is currently more prevalent in the cinematic
case due to technological limitations: you only experience movement when the camera
was moved by the filmmaker, yet you yourself are not physically moving. In room-scale gaming
VR you are very often self-directing your own moves in the virtual world through your actual
physical actions, since realtime gaming engines can generate everything on the fly, and there is
no disconnect between the physical and the virtual.
Put yourself in a spaceship within a gaming engine, however, and the possibility for sickness
once again becomes very real, as you are now zipping through space with no corresponding
physical forces on you. To counter this, some of the emerging location-based VR experiences
are actually built around motion rides which mimic the moves within the virtual space.
In addition to providing a more thrilling experience, the possibility of feeling sick is diminished.

Guidelines to Minimize Sickness in VR


The above might have you scared to ever move the camera. Rest assured it is possible. Over
the course of much trial and error we have narrowed down what constitutes a successful
camera move in VR.
Stable/Level Horizon Lines
Every attempt should be made to keep your horizon lines stable and horizontal. Swaying
horizons recall being on a boat in rough seas and can easily lead to virtual seasickness.
Use of a gyroscopic stabilization system is highly recommended when doing camera moves to
prevent sway. Likewise, mounting a virtual reality camera system to your head is not
recommended as your body movement causes the camera to sway back and forth in tandem
with your steps.
If you do not have a physical stabilization rig on the camera, post stabilization can remove some
of the horizontal sway, but there are limits due to the necessity of maintaining the stereo 3D
effect when played back in the HMD. If your horizon line sways too much, then when post-rotated
back into position it can introduce vertical offsets into the 3D which can be very
discomforting for the viewer. The only way around this is to stabilize before you stitch, which
complicates the workflow as you'll need to do this on every camera within the rig.
Likewise, even if your camera is not moving you will want to make sure the camera is leveled to
the horizon. Otherwise you will force a crooked view on your audience or, more likely, they will
rotate their head to level it, again introducing painful stereo vertical offsets.
There are many different pieces of software available to perform post stabilization. See the
Post-Production section for more details. See also the section Mounts & Rigging below for
stabilization options during shooting.
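For reference, here is a simplified sketch of what such a post rotation looks like, assuming
Python with NumPy and OpenCV: every output pixel's direction on the sphere is rotated back
into the original orientation and the source frame is resampled. As described above, this only
corrects rotation, it must be run per camera before the stitch for stereo material, and the sign
and order of the roll/pitch correction depend on how the error was measured.

    import cv2
    import numpy as np

    def level_horizon(equirect_bgr, roll_deg, pitch_deg):
        # Re-orient an equirectangular frame by a roll/pitch correction.
        h, w = equirect_bgr.shape[:2]
        yaw = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi          # -pi .. pi
        pitch = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi        # +pi/2 .. -pi/2
        yaw, pitch = np.meshgrid(yaw, pitch)
        # Direction of every output pixel (x right, y up, z forward).
        dirs = np.stack([np.cos(pitch) * np.sin(yaw),
                         np.sin(pitch),
                         np.cos(pitch) * np.cos(yaw)], axis=-1)
        r, p = np.radians(roll_deg), np.radians(pitch_deg)
        Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
        Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
        d = dirs @ (Rz @ Rx).T                      # rotate back into the source frame
        src_yaw = np.arctan2(d[..., 0], d[..., 2])
        src_pitch = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
        map_x = ((src_yaw + np.pi) / (2.0 * np.pi) * w).astype(np.float32)
        map_y = ((np.pi / 2.0 - src_pitch) / np.pi * h).astype(np.float32)
        return cv2.remap(equirect_bgr, map_x, map_y, cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_WRAP)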

Minimal Bumps
You also want to make sure that there are minimal bumps or jostling of the camera during your
moves, otherwise it will feel as though you are on a mountain bike going down a bumpy hill.
Again, you'll want a stabilization system that can mitigate these bumps during shooting, or use
post stabilization software to remove as much as possible after the fact. However, due to the
nature of spherical video your options in post here are limited, as you can't translate in spherical
space to offset any bumps and you'll have to live with any motion blur present in the video.
No Pans
You shouldn't pan (yaw) the camera about its vertical axis. This effectively forces a head turn on the
viewer, which is very disconcerting in VR. You should instead allow the viewer to turn their head
naturally within the environment and look where they choose.
If you need the viewer to look at a particular spot so as not to miss a crucial piece of action, you
should use lighting, movement, or sound to guide their eye instead. Depending on your
playback engine and resources, there is also the possibility of adding some interactivity to the
piece such that particular pieces of action are only triggered when the viewer actually looks in that
direction.
See Getting the Viewer's Attention under Directing the Action below.
Minimal Acceleration
Finally, you should limit the acceleration present in your camera moves. Fast acceleration and
deceleration can definitely cause motion sickness.
Ideally you would want no acceleration at all, instead cutting into the shot with smooth, controlled
motion already under way. If this is not possible, then very slow acceleration or deceleration is
generally acceptable. Any rigs you use should be precise enough to allow for this without unneeded
rubber-banding or sway. See below for more information.
The Last Word
Of course, rules were made to be broken! All of the above should be heeded in most cases, but
there are shots where you may be going for a little bit of motion sickness: being pushed off the
side of a building, perhaps, or riding a rollercoaster. Here, the side effects of motion in virtual reality
can actually work in your favor. Use sparingly.


Mounts & Rigging


As mentioned above, it is essential to have a proper mount that can help to stabilize the camera.
Excessive motion can lead to severe motion sickness and every attempt should be made to
eliminate or at least reduce it. Not all undesirable motion can be removed in post, so finding a
solution during shooting is highly recommended.
The main problem, however, is that any rig you use will ultimately be seen by the
camera. In 2D you can usually frame the shot such that the rig is never seen; not so when
shooting in 360. Every inch of the scene is shot. That's why some rigs work better than others,
but all have some kind of footprint, no matter how small, and will either need to be covered up by
a logo or graphic or removed in post. Depending on the rig and approach, removing it in post can
be painstaking and expensive.

Types of Mounts
Over the course of many projects we have experimented with all of the types of rigs below in an
attempt to create smooth, stable shots. There is no one best solution and all require some form
of rig removal in the end.
Tripod
The simplest and most widely used solution in VR: put the camera on sticks and don't move it.
Here you have no possibility of motion sickness and a fairly compact form. The tripod will still be
seen by the bottom cameras, but pulling in the legs and making the footprint as small as possible
will minimize the ground occlusion. Clean plates can be taken with a DSLR camera, which can
help with the removal. See Clean Plates below.

Attaching a sling to the tripod underneath the camera also provides a good place to stash
additional equipment like sound gear or computers. We've even had camera operators
contorting themselves to hide underneath the tripod in scenes where they would otherwise be
seen!
Though this isn't the most dynamic of solutions, it roots the viewer in the scene and lets them
fully explore it without distraction and with no chance of motion sickness. The vast majority of
your shots will likely be of this form, with moving shots sprinkled in for extra effect, and in VR you
really feel them, giving those moves an added sense of weight.


Dolly
As in 2D, the dolly can create some very smooth, beautiful moves, and it's very easy to mount
almost any VR camera on a dolly. However, this is also one of the bulkier, most visible rigs
available. Due to its size, and the fact that you have at least one operator controlling it (and most
likely tracks), there will be a huge swath of the scene that is occluded. It will be virtually
impossible to paint this out on any reasonable budget, making the dolly a poor solution for VR.
Remote Controlled Slider
The slider is a length of track that comes in various lengths and build qualities, with mounts that
can move a camera down the track at various speeds. These can usually be controlled by a
computer to create complex moves that can be repeated.
Sliders vary drastically in quality, both in terms of track gauge and motor speed. You'll want one
sturdy enough to support your camera of choice, with a motor that can drive the camera at a
sufficient speed for the move without a lot of error. Cheaper rigs can produce significant recoil
when stopping and starting, which is not ideal given the motion issues in VR.
Sliders are much smaller than dollies but generally have a fairly long run of track that will
obviously be in the shot. There is no easy way of removing them other than painting them out.
See Clean Plates below.
Motion Control Rig
Similar to the slider above, motion control rigs use a set of tracks and a motorized rig to
reproduce moves accurately again and again. They are generally bulkier than a slider, being
almost a combination of a slider and a dolly but with many more degrees of freedom. They can
take several forms, the first large-scale application of which was by Lucasfilm for Star Wars, in
order to film the many passes that were necessary for the final composites in the films.


They can accommodate almost any camera but are very bulky, and a big portion of the 360
scene will be blocked, making paint-out costly. However, if you need to composite many
different layers of action with a single camera move in a mixed CG/live-action virtual reality
scene, shooting those layers against green screen using a MoCo rig could be a good option, as
the rig can be easily removed in that case.
Cable Cam
Provided that a cable cam rig is sturdy enough to support your VR camera, it is actually a pretty
good solution for 360 shooting. The camera is mounted upside down to a mount on a small
sliding cart attached to the cable. The rig is fairly small and, if outdoors, the area it blocks is
likely just sky and therefore easily painted out.
Again, you'll want to make sure that the motor on the cart is precise enough to prevent sway
when both starting and stopping, otherwise you'll have the dreaded swaying horizon lines and
motion sickness. You can use the camera as counterbalance and also attach gyroscopic
stabilizers to combat this, as seen here.
Here the weight of your camera is the major consideration, so make sure your cable can support
it or use smaller, lighter VR camera rigs, often of the GoPro variety.
Steadicam
The Steadicam is the workhorse of stabilized solutions for 2D productions. It is a mechanical
solution that isolates the camera from the operator's movement. The operator wears a vest
which is attached to an arm that is connected by a multi-axis, low-friction gimbal to the Steadicam
armature, which has the camera on one end and a counterbalance on the other.
This setup allows for very fluid movement of the operator without affecting the camera, yet easy
camera moves with very little effort when desired. It produces very smooth and pleasing moves
in VR. Unfortunately, it also means you have an operator extremely close to camera in all of your
shots. When the viewer looks behind them they get an extreme closeup of the operator's face
right in theirs. Very disconcerting to say the least!
It is nearly impossible to remove the operator from the scene in post. We have gone so far as to
attach a GoPro camera to the operator's back and attempt to patch that back into the scene in
compositing. Unfortunately, the framing, lens, and differing perspective never allow a decent fit


and you're left blurring it into place. We have also tried blurring out the operator or replacing
him with a black vignette. All are far from ideal and they remove the viewer from the experience.
If you are going to use a Steadicam, the best option is to place a graphic to cover the operator. This
could be a static logo or an animated stat card for a player at an NFL game, for example. Here the
viewer turns around, sees the graphic, and then can decide whether or not to turn back around
later. Yet they still remain grounded in the experience.
Gyroscopic Stabilizer
With virtual reality skyrocketing and the need for stabilized movement without occluding the
scene, many other options are coming to market. The Mantis 360 from Motion Impossible is one
of the more interesting solutions as it combines a wheel-dampened, remote controlled dolly
(buggy) with a gimbal-stabilized monopod for the VR camera.
This allows you to remotely move your camera with smooth, stabilized motion, no operator in
view, and a very small footprint (smaller than even a tripod), allowing for easier ground
replacement or smaller logos to cover it up. Eventually you will be able to plot courses within a
set or room and repeat them as though it were a MoCo rig as well.
For more information visit Motion Impossible.


Drones
Drones are a special category of camera rig. Originally it was thought to be a bad idea to mount
a VR camera to a drone due to the potential for very bumpy, unstable movement. But it turns out
that, if properly stabilized and flown according to the guidelines for movement above, drones can
make for some amazing aerial shots, making it seem as though you are flying.
Typically the camera is mounted inverted to the drone with some kind of stabilization system, as
seen in the picture to the side. You want the stabilization to occlude as little of the scene as
possible. Obviously the top of the scene (the bottom of the camera if mounted upside down) will
see the drone, but this is relatively easily painted out in post as it is usually just sky.

You may need to reconfigure the drone so that the landing gear is not seen. If your drone is not
equipped with retractable landing gear, you may be able to remove the legs and land it on a
base, as seen in this sawhorse example. This can make for some tricky landings, especially in
high winds, so use caution!

As with the cable cam, weight is of primary importance here. The heavier your camera, the bigger
the drone required. Bigger drones require a special pilot's license and come with additional FAA
restrictions, which limit where and how you can fly them and increase the cost of operating them.
For these shots you may need to use a smaller, lighter GoPro-based rig, as seen in the picture
above with Jaunt's older GP16 rig.
Finally, most drones, particularly the bigger ones, drown out any audio. You will likely need to
replace the audio with a separate voice-over or music track.


Clean Plates
All of the rigs above occlude the view of the scene from the VR camera in some capacity.
Usually the ground will be occluded by a tripod, slider, or remote dolly, and it will be necessary to
paint it back in or cover it up with a logo.
Logos are fine for quick turnarounds or for travel or news pieces, for example, but not ideal for
more cinematic or dramatic content, where you'll want to paint out any rig that is in the scene.
Depending on the rig it could be relatively easy or very time consuming and expensive.
To aid in any rig removal it is highly recommended that you shoot clean plates of the ground that
your rig is covering so that you can use them to paint it back in. The most common tool for this
is a DSLR still camera with a wide enough lens to cover the area of ground occluded by your rig
from the distance of the bottom of the VR camera rig.
In a pinch, even an iPhone can be used. Depending on the area covered by the rig, you may
need to take multiple overlapping shots with your still camera. You'll also need to ensure that
your feet or shadows do not end up in the shots. A simple still-camera rig for clean plates,
consisting of sandbagged C-stands, can assist in getting you out of your own way.
Obviously, if you have a moving shot, creating clean plates becomes more complicated.
Depending on the distance covered you may need to shoot many overlapping plates. It is
recommended that you overlap your plates by 15-20% to enable aligning and stitching these
plates together in post to create a ground patch that can be used to fully paint out the rig over
the full length of travel.
If the distance is long, such as when remotely moving the Mantis rig above, it may be better to attach a
GoPro or other small video camera to the back of the rig and either manually or procedurally
pull still frames from it for ground-plane reconstruction in post.
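If you want to estimate ahead of time how many stills a moving shot will need, the arithmetic is simple. Below is a rough planning sketch in Python; the per-plate ground coverage is a made-up example value, and the 15-20% overlap figure comes from the guideline above.

    import math

    def plates_needed(travel_length_m, plate_coverage_m, overlap=0.175):
        """Estimate how many clean plates cover a camera path.

        travel_length_m: ground distance the rig travels
        plate_coverage_m: ground length captured by one still frame
        overlap: fraction of each plate shared with its neighbour (15-20% guideline)
        """
        if travel_length_m <= plate_coverage_m:
            return 1
        advance = plate_coverage_m * (1.0 - overlap)  # new ground gained per plate
        return 1 + math.ceil((travel_length_m - plate_coverage_m) / advance)

    # Example: a 12 m dolly move, with each still covering about 2 m of ground
    print(plates_needed(12.0, 2.0))  # -> 8 plates at 17.5% overlap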


Water Issues
Working in and around water comes up constantly. Some of the most exciting shots in VR can
actually be underwater: picture coming nose to nose with a great white shark or exploring the
Great Barrier Reef. Unfortunately, the complexities of many VR camera rigs make shooting in
these environments cumbersome or impossible.

Rain
When shooting in 2D with a framed shot, rain is rarely an issue, as the camera can be placed
under a waterproof tarp with a hood over the lens or some other such mechanism. Not so when
shooting in 360: the tarp or hood would be visible in the shot and block much of the scene.
This makes shooting in rain, even a light rain, very difficult. Not only are many rigs not built to
withstand water exposure, but even a few drops would ultimately land on one or more lenses,
making that portion of the scene blurry and likely unstitchable.
Many a shoot has been called off due to inclement weather or a chance of rain where it's just
not worth the potential damage to the camera. Likewise, depending on your camera rig, even snow
might be too risky. In the case of the Jaunt ONE, a light snow might be doable, but anything more and
you risk water exposure and damage. In any event, within a matter of seconds snow would hit
one of the lenses and the shot would be compromised.

Underwater
Obviously shooting underwater is the ultimate test for any
camera rig. For standard 2D video cameras (or even 3D
cameras) there are many underwater housing options
available that allow for full submersion. It's much more
complicated to devise such an enclosure for VR rigs without
the enclosure obscuring the scene or compromising the lenses.
Typically these enclosures are for the smaller GoPro-type rigs;
one such rig is the Kolor Abyss.
Unfortunately, most underwater VR rigs are currently mono
only and don't support the somewhat larger 360 stereo 3D
rigs, which can limit their impact. Nose to nose with a shark isn't as great if it never really gets
any closer to you. More options need to become available in this space, and for a wider variety of rigs.
Even if you do have such a rig available, shooting underwater presents its own challenges. The
sea is unpredictable and uncontrollable, and sea life likely won't respect whatever minimum distance
from the camera your rig requires, making some shots unstitchable. It's also usually
difficult to operate all the camera modules within the rig, and you must pay special attention to
battery life and memory card space, as gaining access to these enclosures typically is not easy
and takes a fair amount of time away from shooting.


Jaunt is currently investigating options for a waterproof enclosure for the Jaunt ONE camera
system, as underwater shooting in full 360 3D VR is too ripe with possibilities to miss. Many
companies in the underwater camera housing space are also looking to get into VR and
devising solutions for a variety of rigs.


Directing the Action


Everything about VR is different, and this includes how you direct for it. We've had over 100
years of cinema and television to develop a language for shooting in 2D. Most everyone, even
laypeople, understands the concepts of a close-up, an over-the-shoulder shot, or a cutaway.
What scares people most about VR is that much of this no longer applies. We no longer have
the concept of a frame, and you can no longer count on people always looking where you need
them to.
It is this lack of a frame and the viewer's ability to look around in a realistic 3D environment that
make this a truly unique new medium with unlimited creative potential. It also changes the way
you must think about production, including blocking, lighting, camera operation, even writing.
This was partly true with the advent of 3D films (and their resurgence), but it is exponentially true
in virtual reality.
Those who embrace this new canvas without trying to force their 2D sensibilities onto it will be
the ones who succeed and contribute to the development of the new language of cinematic VR.

Getting out of the Way


One of the most important and practical considerations you must plan for is that, since there is
no longer a frame, there is no "behind the camera" and everything in the scene, every inch of it,
will be shot. This includes your crew, lighting, vehicles, looky-loos, everything.
This presents a big challenge when you are the director and need to see and hear what your
actors are doing during the take. Likewise, it becomes very difficult to light the scene if you are a
DP and can't have big lighting rigs lying about. Everything must now be hidden organically
within the set or it must be painted out.
This includes your crew. Everyone must vacate the set and hide nearby in a hallway, a closet,
behind a piece of furniture, even sometimes under the camera. It's always a game of hide-and-seek
when you're shooting a VR scene. On a typical Hollywood set this generally doesn't pose
a problem, but if you are out in the open it can become quite challenging, and you may even
need to have the set designer or location scout construct a place where crew and gear can be
safely stashed out of sight yet close by. A VR-savvy production designer goes a long way here.


HIDING BEHIND A TREE DURING THE NORTH FACE: CLIMB SHOOT

There is another option in certain circumstances. If you have control of the scene and there
aren't too many objects moving in and out, sunlight and shadows changing, trees blowing, etc.,
then you might be able to shoot the scene in two halves: shoot the main action in one 180-degree half
with crew and lights on the opposite side, then switch everything over and film
the second half. These two sections would then be comped together in post. You would
actually film and stitch both shots fully in 360 but only use each half as needed in the comp to
enable a proper blend.
Again, this only works if you have command of the scene. If lighting and shadows change or
something moves, then the two halves may not comp together properly and the shot will be
ruined unless you spend more time in post painting things out. This also won't work if you have
a moving camera unless you are using a MoCo rig and the movement for the two halves is
nearly exact. If it isn't, it will be impossible to blend the two due to parallax differences.
More generally, you can use this technique to create clean plates and remove unwanted objects
by filming the scene with and without the objects present in their particular locations. By
continuing to roll until an unwanted object is out of the scene, you can comp the clean portion
into the main action, thereby removing that object. This is often used for removing crew and
other production gear when piggybacking onto an existing 2D production where control is
limited. Again, this only works with a static camera.


Live Preview
So if you can't be near the camera while you're filming, how do you see what you're doing? The
simple answer is that you need live previewing out of your camera. This seems like a simple feature that
2D cameras have had for years at video village. Of course, being VR, things aren't that simple.
The first problem is that there are many camera modules in any given VR rig. In the Jaunt ONE
there are 24 cameras, all shooting HD footage. This is a huge amount of data to manage even if
cabled. And remember, having no "behind the camera" means you would see those cables
traveling to video village. Any solution must therefore be wireless, which is impossible with
that many cameras given current bandwidth limitations. As anyone who's worked with WiFi
broadcast from a GoPro knows, there are many inherent limitations with wireless signals, not the
least of which are distance and obstructions. If you're hiding behind a steel wall, say bye-bye to
WiFi.
Even if you were able to wirelessly receive all camera feeds, live stitching algorithms still aren't
great and require extra processing horsepower on set. If you can't view a full 360 stitched
preview out of your camera, then which camera do you choose to monitor? Your rig must provide
the ability to choose which camera or cameras to look at on the fly, or you must use another
solution.
Luckily, there are several inexpensive solutions available in a pinch if your rig doesn't support
some form of live preview. Consumer cameras such as the aforementioned Ricoh
Theta or Kodak 360 Action Cam can be placed on or near your camera and stream a live-stitched
360 mono image to your iPhone or iPad over WiFi, enabling you to move around the scene
and see what is happening. Coupled with standard wireless audio feeds, you can safely direct your
actors. Again, these are consumer cameras, and WiFi and image
quality are not the best, so be aware of the limitations.
Another, higher-quality solution is the Teradek Sphere. This allows
you to connect up to four HDMI cameras (such as GoPros) to its hub
and wirelessly transmit and stitch the 360 image directly on your
iPhone or iPad. This solution is small enough to mount underneath
your VR rig and provides a high-quality stitched mono image that
you can move around in.
The Jaunt ONE camera was designed to record directly onto SD
cards with no cables necessary, making it largely autonomous: simply
press the button to record and walk away. However, for the reasons
above, a wireless live preview capability will be added to the camera in
a future firmware update.

TERADEK SPHERE


Blocking & Framing


Blocking the action and framing your shots take on a whole new meaning when there is no
longer a frame and you can have action all around the camera in 360. New language and
techniques need to be developed to take advantage of the creative possibilities this affords.
Below we highlight some of the issues you'll encounter.

FOV & VR's Answer to the 3D Gimmick


The single biggest mistake people make when starting to shoot VR is feeling the need to have
action occurring all around the camera at all times. This is very similar to the early days of 3D
filmmaking, when it wasn't considered good 3D unless something was poking you in the eye.
That became very gimmicky very quickly. Good 3D should immerse you and connect you
emotionally to the scene in ways that a 2D film cannot. Use of extreme negative space (the space
off the screen towards the audience) should be limited to infrequent moments that are
organic to the film.
Similarly, with VR you do not need to have action occurring
around the camera at all times. In fact, it becomes very
fatiguing for the audience to constantly contort
themselves around as if watching a tennis match. They will
tire very quickly, and tire of your content. In real life we are
not constantly thrashing our heads around looking in all
directions. It's best to keep the action within about 150 degrees in front of
us and save the looking behind us for moments that are
organic to the piece: a car crash or a lurking monster.
Where does 150 come from? The view in most HMDs is
roughly 90 degrees. You can comfortably turn your head roughly 30 degrees
in each direction, which gives you a total field of view, when
moving your head, of about 150 degrees. You should endeavor to keep the
main action within these limits.

Close-ups, Over-the-shoulder shots, & Other 2D Remnants


Another common mistake seen with filmmakers new to VR is trying to force their 2D sensibilities
and film language onto what is a truly new medium. How do I do a close-up or an over-the-shoulder
shot? How do I zoom the camera? How best do I shoot coverage? Does cutting
work? All good questions. As Yoda once said, "You must unlearn what you have learned." While
it is of course natural to build on the skills you already have as a filmmaker, it is also
important to embrace this new canvas for what it is and experiment with new techniques and
new ways of telling stories unencumbered by the past.


Close-ups & Distance


Take the close-up, for example. In 2D filming the close-up isn't really any closer at all, it's just
bigger. In VR, even though you can't zoom most cameras due to their nature, because you are
(hopefully) shooting in 3D you can do a close-up simply by having the actor approach the
camera! They will not only get bigger but will actually get closer to the viewer when viewed
in an HMD. This not only focuses the viewer's attention on the subject, since they fill
most of the view (if the viewer is looking in that direction!), but also instills a sense of emotional
connection to the subject.
Taken further, you can use this to great effect for grabbing the viewer's attention, which is
sometimes difficult to do given they could be looking anywhere (see below). After several
million years of evolution, human beings have become acutely attuned to focusing on things that
are close to them, especially if they are moving; otherwise the woolly mammoth, the tiger, and the
lion would have killed us off long ago. You can therefore snap the viewer's attention to
objects simply by having them jump toward the camera, even if just in the viewer's peripheral vision.
The opposite is also true. To marginalize objects or reduce their
importance, you can place them off in the distant
background. There, apparent movement is reduced due
to the lack of parallax and is less likely to draw the viewer's attention. It
will also make your viewer feel less connected to that
subject, which is good for the cowboy riding off into the sunset, for
instance.
So what are the minimum and maximum distances you
should strive for? The minimum distance Oculus
recommends is 0.75 meters, or about 2.5 feet, before the
viewer starts going cross-eyed. On the other end of the
spectrum, beyond 30 feet depth perception begins to fall off,
and after 60 feet there is virtually no perceived parallax.
This gives a range of roughly 2.5 to 30 feet in
which to place critical content that you want to focus the viewer's attention on. Bear in mind that
many VR cameras have greater minimum depth limits than 2.5 feet, with most doubling that to 5
feet in order to stitch properly. See above under Distance to Subject for more.
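These comfort limits are easy to encode as a quick check when planning blocking. A minimal sketch, assuming the roughly 150-degree forward zone and the 0.75 m to roughly 9 m (30 ft) depth range discussed above; treat the numbers as guidelines rather than hard rules.

    import math

    MAX_YAW_DEG = 75.0   # half of the ~150-degree comfortable field of regard
    MIN_DEPTH_M = 0.75   # Oculus-recommended minimum distance
    MAX_DEPTH_M = 9.0    # ~30 ft, where depth perception starts to fall off

    def blocking_ok(x, y):
        """Check a subject position in metres (camera at origin, +x forward, +y left)."""
        distance = math.hypot(x, y)
        yaw = math.degrees(math.atan2(y, x))  # 0 = straight ahead
        return abs(yaw) <= MAX_YAW_DEG and MIN_DEPTH_M <= distance <= MAX_DEPTH_M

    print(blocking_ok(2.0, 0.5))   # True: close and slightly to the left
    print(blocking_ok(-1.0, 3.0))  # False: behind the default view direction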
Coverage & Cutting
Many people think that you can't cut in VR, or that it will be too jarring, and this can be true
depending on what you are cutting from and to. Sitting in a chair and then cutting to dangling
over the side of a cliff will obviously be very off-putting. Though maybe that's the desired effect.
You can absolutely shoot coverage in VR from different camera positions and cut between
them. Just remember that your viewer can be looking anywhere when you cut, so it may not
have the desired effect. They might be looking off to the side, so when you cut they're looking at
a chair and not your actor. See Getting the Viewer's Attention below for ways to handle this.
Most important is the pace of the cuts. Every time you cut, it's like you're teleporting the viewer to a
different location, and that can be very jarring, especially if the pace is too quick, so you'll want to
slow this way down. The viewer needs a good amount of time within a new position to fully
immerse themselves, look around, and get their bearings. Cut too quickly and your viewer will
be frantically looking about trying to figure out what is going on and what to look at, all while you
are tiring them out.
Though hard cuts can work and are effective for abrupt changes, in general it is much
gentler and more effective to blink. This is where the scene dims to black and then comes back up on
the new scene over the course of about a second or more. It's very much like blinking in real
life and opening your eyes in a new location, and it's surprisingly effective. Spherical reveals or wipes in
360 are also a great way to gently unroll the next scene.
New methods of coverage need to be developed for VR that get around some of these limitations
and are more suitable to the medium. For instance, instead of cutting to a close-up from a more
distant wide shot, stay in the wide shot and overlay a 2D or 3D inset close-up of the
main subject. This not only keeps you from feeling teleported but also provides visual interest in
the scene by introducing other elements overlaid at different depths. You can even project these
in post onto different objects in the scene, to make a video wall for example, as we did with a shot
including Ryan Seacrest for Tastemade's A Perfect Day in LA.

TASTEMADE'S A PERFECT DAY IN LA WITH RYAN SEACREST

Ultimately, many more of these types of innovations will be needed to evolve our language of
storytelling in VR.

Getting the Viewer's Attention


One of the single biggest anxieties in VR filmmaking is how to focus the viewer's attention on
what you want them to see in the scene at any given moment. In traditional filmmaking you have
frame composition, lighting, and depth of field to guide the viewer's eye to where you need it.

Not so in VR. The viewer is free to look around in any direction they like and there is no frame,
which means it's entirely possible they may miss an important story point.
So how do you get the viewer's attention and keep it? Luckily, many of the same tricks from film
still apply: motion, light, and sound.
Motion
As mentioned above, humans are finely attuned to motion and will generally gravitate to anything
moving in the scene. Have a butterfly flit in and around and most viewers will follow it.
Set this up with enough preroll for them to see it and you can guide them precisely to what
you want them to see. This is doubly true if you couple movement with the stereoscopic depth
you have at your disposal: have the butterfly also fly towards the viewer and you are
guaranteed to grab their interest.
Light
Light is also a motivating factor. Just as in a 2D frame, light can draw attention to objects or
subjects. As viewers look about the scene, a ray of light highlighting something is a cue
that they should pay attention to that object. Similarly, dappled light can highlight and
heighten different depth cues along with actual stereoscopic depth.

DAPPLED FOREST LIGHT MICHAELECAINE

Sound
Sound is an incredibly important component of any piece of content, but exponentially so in VR
because of what it provides in capturing the viewer's attention. Many VR platforms, including the
Jaunt app, are capable of playing back spatial 3D audio in the ambisonic or even Dolby Atmos
formats. These sound formats record and emanate sound from where it actually occurred in
the scene. This gives you an extraordinary opportunity to use sound to direct the viewer's
interest. Place a car crash behind them with corresponding sound and they are guaranteed to
look. See the Sound section below for more information.
Interactive
Depending on how your content is being distributed, you may have some interactive capabilities
at your disposal. If so, this is another great way, perhaps the best way, to make sure your
viewers are looking where you want.
Most platforms or development environments use gaze detection to know exactly what
portion of the 360 scene the viewer is looking at during any given moment. If you can harness
that information interactively, you can do some very cool things. For one, if someone isn't looking
at what you need them to for an important story point, you can pause or loop the scene until they
do and then trigger the scene to continue.
Also, as noted above, viewers may be looking somewhere else in the scene so that when
cutting to a new shot they aren't at all focused on what you need them to be. Using gaze
detection you can cut to the new shot and change its yaw value (the rotation of the 360 sphere)
to match the object of focus in that scene to the direction in which the viewer is looking. Magic.
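Conceptually, the retarget is just a subtraction of angles at the moment of the cut. A minimal sketch; the function and variable names are hypothetical, and the actual call depends entirely on your player or game engine.

    def wrap_degrees(angle):
        """Wrap an angle to the range [-180, 180)."""
        return (angle + 180.0) % 360.0 - 180.0

    def cut_yaw_offset(viewer_yaw_deg, target_yaw_in_new_shot_deg):
        """How far to rotate the incoming sphere so the new shot's point of
        interest lands where the viewer is already looking."""
        return wrap_degrees(viewer_yaw_deg - target_yaw_in_new_shot_deg)

    # The viewer is looking 40 degrees to the left; the actor in the next shot
    # was stitched at 130 degrees. Rotate the new sphere by -90 degrees at the cut.
    print(cut_yaw_offset(40.0, 130.0))  # -> -90.0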
There are many more ways in which interactivity can skirt tricky issues in VR and engage
viewers more fully. See below in the Interactivity section for more information.
None of the Above
Finally, maybe you should just let go and let viewers look wherever they please! In this
new medium it might be best to relinquish such strict control and let the audience have their own
experience. Secondary actions in scenes can give viewers more to look at and can enhance
the narrative. This can lead to personalized experiences and repeat viewings. Tricky stuff to be
sure, and it needs to be planned in from the script-writing phase, but this is the future of
storytelling in VR.

Rig Height & Identity


Camera height has always been an important part of the emotional composition of scenes and,
as with many topics, this is even more true in VR. Due to the immersive nature of VR and the feeling
of actually being in the scene, the height of the camera plays an incredibly important part in
shaping the viewer's identity.
Generally the camera is placed at an average human height of around 5'10". This makes the
average viewer feel as if they are standing within the scene. Place the camera lower, say
around 4 feet, and it will feel as though you are a child. Place the camera higher and you will start to
feel like a giant. Higher still and it will feel like you are floating or looking God-like down on the
scene. The first few times a viewer experiences this perspective it can be very disconcerting, as
it feels as though they are going to fall, as seen in this picture from Jaunt's North Face:
Climb.

EQUIRECTANGULAR IMAGE OF DRONE SHOT FROM THE NORTH FACE: CLIMB

DOWNWARD VIEW FROM ABOVE SHOT AS SEEN THROUGH HMD

Placing the camera on the ground is also very unnatural and feels as though you are embedded
or trapped within the floor, leading to a very claustrophobic feeling for some! While unnatural,
this can be used to great effect in the right circumstances, as Doug Liman did in Jaunt's
first thriller series, Invisible. Here, after the invisible killer's first kill, the viewer feels a sense of
hopelessness and claustrophobia as they look eye-level into the dead victim's face.


INVISIBLE SHOOT WITH DOUG LIMAN

In general, for most circumstances, it's best to place the camera at the height of an average
human and move up or down from there, remaining conscious of how this affects the viewer's
perception and identity. There are of course many situations where you will want to play
with the viewer's emotions, making them feel small or powerful, and camera height is a great
way to achieve this.
Another very important part of identity that should be mentioned is your body, or in this case
the lack of it. In any of these setups it can be very disconcerting for the viewer to look down and not
see the rest of their torso, arms, and legs. It can make the viewer feel like a disembodied ghost.
Here again, interactivity and more advanced body tracking and sensors will enable us to overlay
a CG avatar of the viewer's body that gives them a heightened sense of presence. Already, hand
controllers with the HTC Vive enable this in real-time gaming engines, and it's only a matter of
time before this becomes the standard in cinematic VR as well. In the meantime, be aware that
the lack of a body can reduce immersion.


POV & Eye Contact


Following on from camera height, and because of the immersive nature of VR, every shot really
needs to be thought of as a point-of-view shot. If you want to know what it's going to feel like in
the shot, simply put your head at the location of the camera and look around! The shot, when
experienced in the headset, will feel very much to the viewer as if they are actually standing in
that spot on location.
Because of the realism VR affords, the viewer will feel very much a part of the scene from a
third-person perspective, watching on as events unfold. But have one of the actors break the fourth
wall and look into the camera, and the viewer will instantly feel as though they are drawn into the
story and now participating from a first-person perspective. Eye contact in VR is even more
powerful than in 2D for drawing the viewer in, doubly so if the actor engages the viewer or
talks directly to them.
Because of the heightened realism, and with no sense of a frame separating the viewer from the
fictional world, the viewer can often feel compelled to respond. This of course is usually not
possible unless there is some form of interactivity available in the delivery platform. Simple
responses can be supplied using gaze detection, hotspots, and Yes/No-type answer overlays,
but this is perhaps too simplistic and serves to pull the viewer out of their immersive state.
Ultimately, pairing advanced speech recognition with an AI that can understand a variety of
responses, similar to Siri or Cortana (or using one or both of those), will go a long way towards
making this type of interaction more realistic.
All of these types of interactions will need to be scripted, of course, and this is what the
gaming community has struggled with for years: telling a compelling, tight narrative where the
viewer still has some control. With the immersive nature of VR it will become critical to
provide viewers with some sense of agency and control over their environment, and so this will
become necessary and commonplace. A new type of storytelling will develop that integrates
cinema, gaming, and interactive theatre.


Lighting & Exposure


Lighting is a critical part of any cinematography process, and VR is no different. Because we are
shooting in a full 360, however, there are some additional challenges involving contrast, flares,
and lighting rigs that you need to be aware of.

Extreme Contrast
Unlike in traditional filmmaking, where you are generally only exposing for the section of the
environment in frame, with 360 filming you need to account for the entire environment. In many
cases, especially outside, that means you might have a very bright sunlit side and a darker
shadow side.

HIGH CONTRAST SCENE FROM THE NORTH FACE: CLIMB

Generally, your camera rig should allow for individual exposures for the cameras that make up
the rig. Most often they would be set to auto exposure to adequately expose the scene through
each camera. The stitching algorithm will then blend these exposures to make for a seamless
stitch. You may want to lock a particular camera's exposure settings for a certain effect, or to
keep the camera from cycling up and down in the presence of strobe lights, for example. Just
beware that some cameras may blow out in the highlight areas or become underexposed, with
lots of noise and not enough detail in the shadow areas.
The Jaunt ONE camera with Jaunt ONE Controller software is able to globally or individually
control each camera's ISO or shutter speed to create a proper exposure around the camera
while in manual mode. Full automatic mode with control over Exposure Value (EV) bias is also
available for each camera to auto-expose the scene, and this is the recommended mode for general
use.
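The arithmetic behind matching exposures across cameras is ordinary photographic EV math rather than anything Jaunt-specific. A hedged sketch, assuming a fixed f-number (typical for VR camera lenses); it is illustrative only, not the Controller's actual logic.

    import math

    def shutter_for_ev(ev100, iso, f_number=2.8):
        """Shutter time (seconds) matching a metered scene brightness.

        ev100: exposure value metered at ISO 100 (roughly 15 in full sun)
        iso: sensor sensitivity in use
        f_number: lens aperture, assumed fixed here
        """
        ev = ev100 + math.log2(iso / 100.0)  # higher ISO tolerates a higher EV
        return (f_number ** 2) / (2.0 ** ev)

    # Sunlit side vs. shaded side of the rig, both held at ISO 100 and f/2.8:
    print(shutter_for_ev(15, 100))  # ~1/4200 s on the bright side
    print(shutter_for_ev(12, 100))  # ~1/520 s in the shade, three stops darker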

JAUNT ONE CONTROLLER EXPOSURE SETTINGS

For more information please consult the Jaunt ONE Controller user guide.

Flares
Flares are caused when sunlight hits a lens and scatters, causing circles, streaks, haze, or a
combination of these to form across the image. Generally these are undesirable, though certain
filmmakers go to great lengths to include them. Just ask J.J. Abrams! In stereoscopic
cinematography they can create stereo artifacts between the left and right eyes, as the
refraction differs between adjacent cameras due to their varying angles of incidence to the light
source, and they should generally be avoided.


DIFFERING FLARE ARTIFACTS BETWEEN LEFT/RIGHT EYES

Of course, in VR things are more difficult. With a traditional camera you can usually eliminate
flares just by blocking the source of the light that's causing the flare near to the lens.
Generally this means mounting a flag or lens hood (a piece of dark cardboard or fabric) on or
near the camera housing, or on a C-stand, to block the light. This is possible because you can
usually mount the flag outside of the frame so it is not seen by the lens. With a 360 VR
camera there is no frame, of course, and anything mounted around the camera will be seen and
recorded. Thus, to eliminate flares in VR you need to get creative.
If possible, you should first try to position or rotate the camera in a way that minimizes
any flares. Hopefully your camera system is capable of doing quick 360 previews. With the
Jaunt ONE, you can use the Jaunt ONE Controller software to shoot a quick 360 preview still
frame or short video sequence to check for flares. It should be immediately obvious if any of the
lenses are getting hit.
For more information see the Jaunt ONE Controller guide.


JAUNT ONE CONTROLLER PREVIEW PANE

If you can't rotate or position the camera to eliminate or minimize the flares, then you
should try to move it behind something within the scene that can block the light and act as an
organic flag. A tree, a rock, a vehicle, a building, or even another person can go a long way
towards blocking the light, and none of these would need to be removed later.
If you can't move or reposition the camera, then you may be forced to just live with the flares if
they aren't too distracting or don't cause discomfort in stereo due to left/right eye rivalry. Your final
option, if they are too distracting, is to remove them in post by painting them out. In VR you have
many adjacent cameras available, some of which may not have been hit by flares, which you can
use as clean plates to clone into one or both eyes after stitching.

Rigging
Much like flares, rigging lights becomes a bigger
issue in VR due to the lack of a frame and there
being no "behind the camera." Big, expensive lighting
rigs may become a thing of the past simply because
you'll always need to hide them. DPs of the future
will need to become much more adept at hiding their
lighting and making it more organic to the scene, or
at taking better advantage of natural light.


One very useful tool in the VR DP's toolkit is LED strip lighting. Many manufacturers are coming
out with these in a variety of formats and configurations, and they can easily be placed on or
around the camera housing or tripod while not being seen in the shot. These can provide
enough illumination to fill the scene or subject with light from the camera.

NINJA LED STRIP LIGHTING W/ DMX CONTROL

Additional banks of these can be hidden within the scene to provide extra lighting such as key
lights, rim lights, hair lights, etc. Many of these are fully dimmable and color-temperature
selectable, with some fully changeable to any of millions of colors. Some are capable of local
dimming, where you can individually control each LED on the strip for various lighting effects.
Some, like the Ninja lights above, can even be remote controlled via WiFi, so you
have a full DMX controller on your iPhone or iPad to control each of the individual strips that
make up your scene lighting.
If these solutions don't end up providing enough light, you can always bring in traditional set
lighting and try to remove it in post. If you have full control over the environment, this is most
easily done by shooting in 180-degree halves and then compositing them together in post. You would
first shoot the action in the front half with the lighting behind the camera in the rear half,
then switch this around and shoot the action or scenery in the other half while lighting from the
reverse. This only works if you can control the environment well enough to stitch these two
halves together later. If you are outside and something moves between the two halves, a car
disappears, or the lighting changes on you, this won't work. You'll also need to be able to
separate the action and lighting into discrete halves or it will become more difficult and
expensive to blend the two in post.
Ultimately, the lighting community, both on the creative side and the hardware side, is going to
need to get a lot more resourceful in how it approaches VR in order to simplify
shooting and reduce post costs while still creating beautifully lit scenes.


Spatial Audio for Cinematic VR


In traditional media formats such as stereo music and 5.1 surround, sound is produced relative
to a singular point of view. In traditional cinema, sound mixes are meant to be heard while
looking forward at the screen. VR does away with this constraint, allowing the user to turn their
head in any direction. Therefore, it is necessary for sound to be transmitted as a complete 360
scene and rendered appropriately for the listener's point of view, in real time. This chapter
covers the formats and production techniques that can be used to create audio for cinematic
VR. But first, we will provide an overview of the broader landscape of 3D audio as it relates to
VR.

Binaural Audio Basics: How We Hear


Binaural audio refers to a kind of audio signal which contains spatial cues that our brains use to
localize sound. First, it is necessary to form an understanding of how we hear.
When we hear sounds in our environment, our brains are given signals provided by our ears,
which have a unique shape and spacing for each person. The process by which our brains
interpret sonic variations between our ears, allowing us to place the origin of a sound relative to
our bodies, is called localization. The auditory system is very complex, but localization cues can
be summarized as three independent components:
1. Interaural time delay (ITD): The difference in time of arrival of a sound to each ear.
2. Interaural level difference (ILD): The difference in volume of a sound between each ear.
3. Pinna filter: The characteristic effect caused by the shape of the outer ear (the pinna).
ITD and ILD work in conjunction to provide left/right localization. If
a sound is coming from your left, for example, it will arrive at your
left ear first and will be slightly louder in your left ear. On the other
hand, if a sound is coming from directly in front of you, there is no
time or level difference. In this case, the pinna filter provides the
cues required to determine whether a sound is in front of you or
behind you. The three components of localization cues work in
conjunction to provide your brain with the information it needs to
determine where a sound originates relative to your body.
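To make the ITD cue concrete, the classic Woodworth spherical-head approximation estimates the delay from the source azimuth. This is a textbook model, not how any particular HRTF set is measured, but it shows the scale of the cue the brain is working with.

    import math

    HEAD_RADIUS_M = 0.0875   # average head radius, roughly 8.75 cm
    SPEED_OF_SOUND = 343.0   # metres per second at room temperature

    def itd_seconds(azimuth_deg):
        """Woodworth's spherical-head estimate of interaural time delay.

        azimuth_deg: source direction, 0 = straight ahead, 90 = fully to one side.
        """
        theta = math.radians(abs(azimuth_deg))
        return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

    print(round(itd_seconds(0) * 1e6))   # 0 microseconds for a frontal source
    print(round(itd_seconds(90) * 1e6))  # ~656 microseconds, near the maximum ITD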

Binaural recording
One application of binaural audio is binaural recording, in which a pair of microphones are
placed into the ear canals of a person or a dummy head in order to record live sound with the
physiological aspects of the head, torso, and ears, affecting the left and right channel
accordingly. The resulting recording is intended to be played back over headphones, providing
not only an accurate stereo image, but the illusion of being present in the sonic environment. In
the real world, however, we are able to turn our heads from side to side, changing the relative
positions of objects in the scene. Head-mounted displays provide this capability, so binaural
recording is not a technique that can be used for capturing audio for VR.

HRTFs
In order to provide the spatial cues of binaural audio and allow for head movement, VR audio
requires the use of head-related transfer functions, or HRTFs. An HRTF is a series of audio
impulse responses measured using a binaural recording apparatus, with each IR captured at a
different position relative to the head. When placing a sound into a VR scene, the following
attributes are considered:
1. The object's position within the scene
2. The user's position within the scene
3. The direction the user is facing
A fully-featured object sound rendering system will use this information to select the
appropriate directional IR from the HRTF, with additional processing to create distance cues.
Furthermore, the size, shape, and material composition of the virtual environment can be
modeled to create an acoustic simulation for an even more convincing immersive experience. In
general, such advanced features are available only in game engines, in which the experience is
generated in real time. As will be discussed later, 360 video players generally employ a subset
of these features for the sake of being able to take advantage of bitstream formats such as AAC
audio, avoiding the need for large downloads.

Caveat: personalization
It should be noted that each of us has a unique physiology when it comes to ear shape,
head size, and other physical attributes.
from our own ears, not the ears of another person or those of a dummy head. Therefore, the
effectiveness of a binaural recording or binaural audio achieved with HRTFs is reduced the
more the recording apparatus differs from our own body. Several approaches exist for
generalizing binaural audio as a one-size-fits-all solution, but the results vary from person to
person. The only way to fully achieve the illusion of auditory presence is by using personalized
HRTFs, but this is impractical for any kind of widespread adoption. The good news is that the
visual component of VR makes up for the inaccuracy of generalized HRTFs to a certain extent.


Spatial audio formats for cinematic VR


There are many toolsets and formats for delivering object-based VR audio scenes. This section
covers only those formats which apply to cinematic VR content. The criteria for cinematic VR
audio are as follows:
1. The format represents the full 360 scene, including height information
2. The format can be streamed over a network
3. The scene can be rotated using data from a head-mounted display
4. The scene can be rendered for headphone playback

The rest of this chapter will discuss the following formats which fulfill the above criteria:
Ambisonic B-format via compressed PCM audio (e.g. AAC)
Dolby Atmos via Dolby Digital Plus E-AC3
Another format is the Facebook 360 Spatial Workstation (formerly Two Big Ears' 3Dception),
which defines a format for cinematic VR that is not yet able to be streamed over a network. Its
toolset, however, allows for B-format conversion and can be used for cinematic VR production.
Also worth mentioning is so-called quad binaural, which is produced by the 3Dio Omni
microphone. This format provides four directions' worth of stationary binaural recordings, which
can be cross-faded using head tracking data.

Ambisonic B-format in depth


Ambisonics overview
Ambisonics describes a multi-channel audio signal that encodes a spherical sound field. In
ambisonics, the degree of spatial resolution that can be achieved with B-format is determined by the ambisonic
order, a positive integer. A brief description of common ambisonic orders
follows:
First order ambisonics (FOA) contains four channels. This is the minimum order required to
represent a full sound field. Note that commonly available ambisonic microphones capture
FOA in so-called A-format. Any order above first order is referred to as higher order
ambisonics, or HOA.
Second order contains nine channels and is not commonly used.
Third order (TOA) contains sixteen channels. TOA is widely considered the optimal
format for ambisonics. Though FOA is the most common today, it is possible that TOA will
become the dominant format in the future, and some software systems have
been developed to scale up to TOA where available. Note that the first four channels of any
HOA signal are themselves a complete FOA signal, so if you produce TOA you can easily
deliver FOA if that is all the playback software can handle.

Increasing order above TOA is possible, but for practical purposes is rather uncommon
(fourth order requires 25 channels, etc.).

B-format explained
Ambisonics is typically encoded in B-format, which represents a collection of spherical
harmonics. Since the mathematics involved in understanding ambisonics is beyond the scope of
this guide, we prefer to use an analogy to a well-understood concept in sound reproduction:
B-format is to surround sound as mid/side is to left/right stereo.
In other words, similarly to how a mid/side recording can be converted to left/right stereo, B-format
can be converted to a surround sound speaker setup. In fact, FOA is itself an extension
of mid/side stereo.
From Wikipedia: Ambisonics can be understood as a three-dimensional extension of M/S (mid/side)
stereo, adding additional difference channels for height and depth. The resulting signal set
is called B-format. Its component channels are labelled W for the sound pressure (the M in M/S),
X for the front-minus-back sound pressure gradient, Y for left-minus-right (the S in M/S) and
Z for up-minus-down.
The W signal corresponds to an omnidirectional microphone, whereas XYZ are the components
that would be picked up by figure-of-eight capsules oriented along the three spatial axes.
HOA involves more complex pressure gradient patterns than figure-of-eight microphones, but
the analogy holds.
It is important to understand that B-format is not a channel-based surround sound format such
as 5.1. Surround sound formats deliver audio channels intended to be played over speakers at
specific positions. B-format, on the other hand, represents the full sphere and can be decoded
to extract arbitrary directional components. This is sometimes useful in playing back ambisonics
over a speaker array, but for the purposes of VR, ambisonics to binaural conversion is
performed in the player application, such as the Jaunt VR app.

B-format representations
There are two predominant channel layouts for B-format. The most common is called Furse-Malham
(FuMa), which orders the four first-order channels W, X, Y, Z. HOA representations
using FuMa involve the use of a lookup table to map spherical harmonics to channel numbers.
The second representation is called Ambix, which has recently been established as a format
unto itself. Ambix orders the first-order components W, Y, Z, X. Ambix's channel ordering is
based on an equation that maps spherical harmonics to channel indices, and it scales up to
arbitrarily high orders without the need for a lookup table. For this reason, Ambix is the
preferred representation for HOA and has been adopted by Google and others as the de facto
standard transmission format for ambisonics. Unfortunately, most production tools available
today were built with FuMa in mind, so conversions are necessary when interoperating with
Ambix tools and publishing to YouTube. At the time of writing, Jaunt VR continues to use FuMa.
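For first-order material, the conversion between the two layouts is just a channel reorder plus a gain change on the W channel. A sketch, assuming FuMa's usual -3 dB (1/sqrt 2) W convention and SN3D-normalized Ambix; verify against your plugin chain before relying on it.

    import math

    SQRT2 = math.sqrt(2.0)

    def fuma_to_ambix_foa(w, x, y, z):
        """One FOA sample frame: FuMa (W, X, Y, Z) to Ambix ACN/SN3D (W, Y, Z, X)."""
        return (w * SQRT2, y, z, x)   # FuMa stores W 3 dB low, so scale it back up

    def ambix_to_fuma_foa(w, y, z, x):
        """Inverse conversion: Ambix ACN/SN3D back to FuMa order and gain."""
        return (w / SQRT2, x, y, z)

    # Round-trip check on an arbitrary sample frame
    frame = (0.5, 0.1, -0.2, 0.05)
    print(ambix_to_fuma_foa(*fuma_to_ambix_foa(*frame)))  # ~(0.5, 0.1, -0.2, 0.05)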

B-format playback
Unlike stereo or surround formats, B-format cannot simply be played back by mapping its
channels to speakers. B-format signals must be decoded to directional components, which can
be mapped to speakers or headphones. The most common method of B-format decoding
employs an algorithm to extract a virtual microphone signal with a specific polar pattern and
yaw/pitch orientation. For example, an FOA signal can be used to synthesize a cardioid
microphone pointing in any direction. HOA signals allow for beamforming narrower polar
patterns for more spatially precise decoding. In general, the higher the order, the narrower the
virtual mics, and the more speakers can be used to reproduce the sound field.
In VR playback applications involving FOA, a cube-shaped decoder is often used, employing
eight virtual microphones. To achieve binaural localization, each virtual mic is processed
through an HRTF of the corresponding direction. The resulting processed signals are then
summed for the left ear HRTFs and right ear HRTFs. Prior to binaural rendering, the soundfield
is rotated using head tracking data from a head mounted display.
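To make the virtual-microphone idea concrete, here is a per-sample sketch for FuMa FOA. Pattern parameterization and sign conventions differ between decoders, so treat this as illustrative rather than a reference implementation.

    import math

    def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg, pattern=0.5):
        """Extract one first-order virtual microphone from a FuMa B-format frame.

        pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
        Assumes FuMa gains (W recorded 3 dB low) and azimuth positive to the left.
        """
        az = math.radians(azimuth_deg)
        el = math.radians(elevation_deg)
        directional = (x * math.cos(az) * math.cos(el)
                       + y * math.sin(az) * math.cos(el)
                       + z * math.sin(el))
        return pattern * math.sqrt(2.0) * w + (1.0 - pattern) * directional

    # A cube decoder would call this eight times (one virtual mic per cube corner),
    # convolve each output with the matching left/right HRIRs, and sum the results.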

Recording B-format
There are microphones which are designed specifically for the purpose of capturing first order
B-format, including:

Core Sound Tetramic
TSL Soundfield SPS-200
Sennheiser Ambeo
Brahma Microphone

FOUR SPATIAL AUDIO SYSTEMS. TOP: ZOOM H2N, LEFT: SENNHEISER AMBEO, RIGHT: CORE SOUND TETRAMIC, BOTTOM: EIGENMIKE EM32


All of these microphones capture in A-format, which is a tetrahedral configuration of four
cardioid microphone capsules. A-format audio must be converted to B-format for further mixing
and playback. Generic recording devices can be used to record with these mics, but each input
channel must be carefully level-matched for the A-to-B format conversion to work properly. Each
microphone vendor provides tools for performing these conversions. Jaunt's Media Manager
software performs the conversion automatically, provided the audio was recorded with a
supported recording unit such as the TASCAM DR-680 or DR-701d.
The Zoom H2N Handy Recorder can also be used to capture B-format. Its spatial audio mode
utilizes its internal microphones to create an Ambix-formatted four-channel recording. It should
be noted that these recordings lack any height information (the Z channel is silent).
For recording HOA, the only commercially available product is the MH Acoustics EM-32
Eigenmike. This is a 32-capsule microphone array that is typically used in automotive and
industrial use cases in which precise, highly directional beamforming is required. An AudioUnit
plugin is available to convert the 32-channel signal into TOA B-format.

Mixing B-format
When mixing for cinematic VR, traditional DAW workflows are preferred. Since 360 video is a
linear format just like any other video, it makes sense to use tools and workflows that are
already well established in the film and video production industry. Also, since 360 video is
linear, it can be streamed to the end user's device without the need for downloading the entire
scene. This means we can deliver our spatial audio mixes as traditional PCM audio and
package them within MOV and MP4 containers along with h.264 video. This applies to Dolby Atmos
as well as ambisonic B-format.
This section covers some common activities you will encounter when mixing B-format:
1. Create an ambisonic mix using monophonic sound sources (e.g. lavaliers, sound effects)
2. Edit an ambisonic field recording
3. Adapt a surround sound recording to B-format
4. Match audio sources with 360 video
5. Combine the B-format audio mix with 360 video

In our walkthrough of these steps, we assume the use of the Reaper DAW from Cockos Inc.
Reaper is currently the best DAW for working with ambisonics because it allows for tracks with
arbitrary channel counts. Other DAWs only support well-known surround sound channel
arrangements such as 5.1, while Reaper allows for the four- and sixteen-channel tracks required
to create FOA and TOA.
In addition to Reaper, you will need a suite of VST plug-ins that support B-format processing. A
number of high quality plug-in suites exist, but we recommend the following options:
For TOA or FOA: Blue Ripple Sound TOA Core VST (Free)
For FOA: ATK Community Ambisonic Toolkit
For HOA or FOA: Ambix Plugin Suite from Matthias Kronlachner

The Blue Ripple Sound plugins will cover all of your needs up to third order processing
(including converting to/from Ambix), and other packages are available for more advanced
functionality. The ATK plugin suite is designed for Reaper only and is quite simple and user-friendly,
but the range of functions is limited and you can only mix FOA. The Ambix Plugin
Suite has limited functionality, but it can be used to assemble very high order ambisonics (up to
seventh order, with 64 channels per track), and its converter plugin is particularly helpful
when going between different ambisonic representations. This guide assumes you are
producing FOA (four channels), but you can easily switch to TOA if using the Blue Ripple TOA
VST plugins.

1. Create a mix from monophonic sources

MONO TRACK SPATIALIZED TO AMBISONICS, VISUALIZER & BINAURAL DECODER SHOWN

Assuming you have some sound recordings from the set of a video shoot (lavs, booms, etc.),
the stems of a multitrack board feed from a live concert, or some sound effects to add to your
mix, you can insert these into your Reaper project using an ambisonic panner plugin. Panners
take a 1-channel input and provide control over yaw and pitch. Set your desired yaw and pitch
to match the object's position within the scene. The output of the panner will be a 4-channel
B-format soundfield. Repeat this process for as many sources as you like.
In order to audition the mix over speakers or, ideally, in headphones, you will need a decoder
plugin inserted on the master bus. Make sure that the source track is a 4-channel track, and the
master is also 4-channels. The binaural decoder plugin takes a 4-channel B-format input and
outputs 2-channel binaural. Route the master bus to your stereo audio device and you can
audition over headphones.

To simulate the effect of head rotation in VR, you can insert an ambisonic rotation plugin in the
chain before the binaural decoder. The rotation plugin will give you control over yaw, pitch, and
roll. It takes a 4-channel B-format input and produces a 4-channel B-format output.
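Yaw rotation of an FOA signal is just a two-dimensional rotation of the X and Y components, with W and Z untouched. A sketch under the same FuMa conventions as the panner sketch above.

    import math
    import numpy as np

    def rotate_yaw(bformat, yaw_deg):
        """Rotate a FuMa FOA array of shape (4, n) about the vertical axis.

        Positive yaw_deg turns the whole soundfield to the left, which is
        equivalent to the listener turning to the right.
        """
        w, x, y, z = bformat
        a = math.radians(yaw_deg)
        x_rot = x * math.cos(a) - y * math.sin(a)
        y_rot = x * math.sin(a) + y * math.cos(a)
        return np.stack([w, x_rot, y_rot, z])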
Finally, it can be very helpful to see the soundfield using an ambisonic visualizer plugin. This is
especially useful in debugging your channel routing configuration. Insert a visualizer plugin in
the chain after the rotation plugin and before the binaural decoder. Play a track with a single
panned mono source, and tweak the yaw parameter. You should see the visualizer heatmap
display move from side to side. Tweak the yaw parameter of the rotation plugin, and you should
see the same behavior. The Jaunt Player application also provides a heatmap function which
can be useful in visualizing your mix overlaid onto your video.
Become intimately familiar with panning, rotation, visualization, and decoder plugins. They will
serve as the bread and butter of your ambisonic production work.

2. Edit an ambisonic field recording


Assuming you have recorded some A-format ambisonics with a mic such as the Core Sound
Tetramic, you will need an A-to-B format converter. If using the Tetramic or Brahma mic,
VVEncode from VVAudio is the recommended option, because it provides the ability to perform
calibration using data provided by the manufacturer. The ATK suite also provides an A-to-B
format converter. The input of the converter is 4-channel A-format and the output is 4-channel B-format.
If your recording is a 4-channel PCM file, you can add it to a 4-channel track and insert the A-to-B
format converter. If your recording is four mono PCM files, you will need to merge them to a
single track. In Reaper this is straightforward using a 4-channel folder track with four 1-channel
stems nested within. You will need to configure the channel routings to map each A-format
signal to the appropriate channel, in the correct order.


CONVERTING A-FORMAT TRACKS TO B-FORMAT

When starting with A-format, it is probably easiest to convert to B-format as a separate task, and
work with the B-format audio file in your session. When you do this, be sure to disable any
decoder plugins that might be running on your master bus.
Editing a 4-channel B-format audio file is just like editing any other audio track. You can apply
volume changes, perform cuts, mix multiple sources, etc. Additionally, you can rotate the
recording to fix any camera/microphone alignment issues, or use virtual microphones to extract
specific directional components from the recording. The Blue Ripple TOA Manipulators VST
suite proves an abundance of tools for performing more advanced operations on your B-format
field recordings.

3. Adapt a surround sound recording to B-format


It is sometimes helpful to include audio beds that have been produced in traditional surround
sound formats such as 5.1. You might also wish to convert a complete 5.1 mix to B-format for
distribution in 360 players such as the Jaunt VR app. To do this easily, you will need the TOA
Upmixers VST suite from Blue Ripple Sound, which provides a drop-in solution for this type of
conversion. Assuming 5.1 audio, put the file on a 6-channel track and insert the 5.1-to-B
upmixer VST. The output will be a 4-channel B-format signal that can be rotated, visualized,
decoded, and exported just like any other B-format track.
If you would prefer not to pay for the upmixer plugin, you can achieve the same result by routing
each of the 5.1 mix's five speaker channels to a panner track as described in (1). The yaw of
each panner must be set according to the surround sound standard in which your mix is stored.
For ITU 5.1 (L, R, C, LFE, Ls, Rs), that means:


Channel            ITU 5.1 Designation    Panner yaw value (degrees)
Left               L                      30
Right              R                      -30
Center             C                      0
LFE                LFE                    n/a
Left Surround      Ls                     110
Right Surround     Rs                     -110
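In code, the manual route amounts to panning each speaker feed at the yaw from the table above and summing the results. A sketch reusing the FuMa encoding equations from earlier; the LFE channel is simply skipped here, though some mixers fold it into W at reduced gain.

    import math
    import numpy as np

    ITU_51_YAWS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}

    def surround_to_foa(channels):
        """Convert a dict of named 5.1 channels (numpy arrays) to FuMa FOA.

        channels: {"L": ..., "R": ..., "C": ..., "LFE": ..., "Ls": ..., "Rs": ...}
        """
        num_samples = len(next(iter(channels.values())))
        out = np.zeros((4, num_samples))
        for name, yaw_deg in ITU_51_YAWS.items():  # LFE intentionally omitted
            az = math.radians(yaw_deg)
            sig = channels[name]
            out[0] += sig / math.sqrt(2.0)         # W
            out[1] += sig * math.cos(az)           # X (elevation is zero)
            out[2] += sig * math.sin(az)           # Y
        return out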

4. Match audio sources to 360 video


When mixing for 360 video, you will most likely need to be able to view the video file in your
DAW session. You can do this using the DAW's video player with an equirectangular render of
the video. This should be sufficient for syncing temporal events in your timeline. Keep in mind
that you will also have to map the XY position of an actor in the picture to the yaw and pitch
values of your panning plugins. This can become rather difficult, especially as actors move to
the rear of the panorama and wrap around from one side to the other. To resolve this, it is
preferable to view the video in a 360 video player such as the Jaunt Player. Ideally, you will be
able to synchronize your DAW to the 360 player using MTC or LTC. If you rotate the image of
the 360 player, or utilize a head-mounted display, you will additionally need to map the headset
rotation to an ambisonic rotation plugin. You can do this with the Jaunt Player by configuring it to
broadcast the headset yaw and pitch using Open Sound Control, and using Reaper's Control
Surfaces menu to map the incoming messages to your rotation plugin.
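The XY-to-angle mapping itself is straightforward once the conventions are pinned down. A sketch assuming the centre of the equirectangular frame is yaw 0 / pitch 0 and that positive yaw is to the viewer's left (matching the panner sketch earlier); check your player's conventions before trusting the signs.

    def equirect_to_yaw_pitch(x_norm, y_norm):
        """Map a normalized image position to panner angles in degrees.

        x_norm: 0.0 at the left edge of the frame, 1.0 at the right edge
        y_norm: 0.0 at the top of the frame, 1.0 at the bottom
        """
        yaw = (0.5 - x_norm) * 360.0    # left half of the frame -> positive yaw
        pitch = (0.5 - y_norm) * 180.0  # upper half of the frame -> positive pitch
        return yaw, pitch

    # An actor slightly right of centre and a little below the horizon
    print(equirect_to_yaw_pitch(0.60, 0.55))  # -> (-36.0, -9.0)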
If the setup of synchronization and head tracking proves onerous, there are alternative approaches.
One option is to overlay a mapping from equirectangular coordinates to polar (yaw/pitch) coordinates
onto the video image. If you are working with the Jaunt toolset, an overlay of this type can be
provided by Jaunt Studios. The Jaunt Player also provides a heads-up display that indicates the yaw
and pitch of the current view direction.
Note that automation of moving audio sources in 360 video has not been solved by any VR audio
production toolset to date: you must carefully automate the panning plugin's parameters by hand to
follow moving sources. For these cases, working with the equirectangular overlay has proven to be a
good compromise.
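
When you do work from the equirectangular render and its overlay, the pixel-to-angle mapping is linear, so converting an actor's position in the frame to panner values is straightforward. The sketch below assumes the image center corresponds to yaw 0 (straight ahead), with yaw increasing to the viewer's left and pitch increasing upward; conventions differ between players and panner plugins, so treat the signs as assumptions to verify.

def equirect_xy_to_yaw_pitch(x, y, width, height):
    # Map a pixel position in an equirectangular frame to panner yaw/pitch in degrees.
    yaw = (0.5 - x / width) * 360.0     # -180 .. +180, positive to the left
    pitch = (0.5 - y / height) * 180.0  # -90 .. +90, positive upward
    return yaw, pitch

# Example: an actor at pixel (960, 480) in a 3840 x 1920 frame
print(equirect_xy_to_yaw_pitch(960, 480, 3840, 1920))   # -> (90.0, 45.0)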

5. Combine the mix with 360 video


When you are ready to preview your mix in VR, you must export the master bus in B-format,
being sure to disable any decoder plugins. Per Jaunt's specifications, your finished mix should
be exported as a 4-channel FuMa B-format, 48 kHz, 24-bit signed integer WAV file. This file can
be played directly in the Jaunt Player for review through Jaunt's ambisonic decoding algorithms.
You may want to use the audio visualization features of Jaunt Player, as well, which will reveal
the waveforms of each audio channel and a heatmap of the ambisonic soundfield.
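
As a quick sanity check before review or muxing, you can verify that an exported file matches this spec. The sketch below uses the soundfile Python package (the file name is a placeholder); it simply inspects the file header rather than validating the ambisonic content itself.

import soundfile as sf

def check_jaunt_master(path):
    # Verify channel count, sample rate, and bit depth against the spec described above.
    info = sf.info(path)
    assert info.channels == 4, "expected 4-channel FuMa B-format (W, X, Y, Z)"
    assert info.samplerate == 48000, "expected a 48 kHz sample rate"
    assert info.subtype == "PCM_24", "expected 24-bit signed integer samples"
    print(f"{path}: OK ({info.channels} ch, {info.samplerate} Hz, {info.subtype})")

check_jaunt_master("final_mix_bformat.wav")   # placeholder file name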


MP4 VIDEO IN JAUNT PLAYER WITH WAVEFORM OVERLAY

Once you have validated that your mix sounds correct in VR, you will want to attach it to your
video. If using Jaunt Cloud Services, follow the workflow for uploading the mix as a master to
the project you are working on. You can then assign your mix to the cut in the cloud, and
download a transcode of the video for playback within the Jaunt Player. If you are not using
Jaunt Cloud Services, you can combine the mix with an MP4 video file using a muxing tool such
as iFFmpeg. Jaunt's specs recommend converting the audio to 4-channel AAC at a 320 kbps bitrate.
When your mix is properly combined with video, it should be contained within an MP4 file in
which stream 0 is an H.264 video transcode and stream 1 is your 4-channel AAC audio. This
video file will play in the Jaunt Player and can be viewed in VR using an Oculus Rift DK 2 or
CV-1. Use the heatmap overlay feature of Jaunt Player to ensure your sound sources have
been properly panned relative to the picture.
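
For reference, the muxing step can also be scripted. The sketch below shells out to the command-line FFmpeg tool from Python (file names are placeholders); the video stream is copied untouched so that stream 0 remains the H.264 transcode, and the mix is encoded to 4-channel AAC at 320 kbps as stream 1.

import subprocess

subprocess.run([
    "ffmpeg",
    "-i", "video_equirect.mp4",      # stream 0: existing H.264 transcode
    "-i", "final_mix_bformat.wav",   # stream 1: 4-channel FuMa B-format mix
    "-map", "0:v:0", "-map", "1:a:0",
    "-c:v", "copy",                  # do not re-encode the video
    "-c:a", "aac", "-b:a", "320k",   # 4-channel AAC at 320 kbps
    "muxed_for_jaunt_player.mp4",
], check=True)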


MP4 VIDEO IN JAUNT PLAYER WITH AUDIO HEATMAP OVERLAY

Dolby Atmos
As an alternative to B-format, Dolby Atmos provides similar capabilities and fulfills the criteria for
a complete solution for cinematic VR audio. Unlike B-format, Atmos does not encode all the
scene information into a baked PCM audio soundfield. Instead, Atmos utilizes up to 118 object
tracks with spatial metadata in order to convey the full sound scene. For transmission, Atmos
printmasters are encoded to Dolby Digital Plus E-AC3, a codec that is supported by a wide
variety of software and hardware. Using the Dolby Atmos for Virtual Reality Applications
decoding library, E-AC3 streams can be decoded and rendered as binaural audio of very high
quality. Since B-format is becoming fairly widespread among 360 video players, the Dolby
Atmos tools also provide a B-format render from the print master, so you can use the Dolby
Atmos authoring tools whether or not your distribution platform supports E-AC3. The Jaunt VR
application supports Dolby Atmos, and Jaunt Cloud Services accepts audio masters in E-AC3
format.
The Atmos authoring tools are a suite of AAX plug-ins for ProTools. If you already mix for film
using ProTools, the Atmos workflow extends your existing setup in order to enable production
for VR experiences. Atmos does not yet natively support B-format inputs, so if you are working
with B-format field recordings you will need to convert these to a surround bed for inclusion
within your mix.

Please refer to Dolby's documentation for further details on producing VR audio with the Dolby
Atmos toolset.

Facebook 360 Spatial Workstation


Facebook's Spatial Workstation is another alternative to B-format production. Like Atmos, the
Spatial Workstation provides a suite of plugins for your DAW of choice and encodes to a
proprietary format with a .tbe extension. An advantage of this system is that it works with many
DAWs besides Reaper and ProTools. As with Atmos, you can also export your mix to B-format for
distribution in 360 players such as YouTube 360 and Jaunt VR.

Please refer to Facebook's documentation for further details on producing VR audio with the
Spatial Workstation tools.


Post-Production
This section is marked COMING SOON in this edition of the guide. The planned topics are:
Fixing Stitching Artifacts
Editing
Working with Proxies
Available Tools
Dashwood Stereo VR Toolbox
Mettle Skybox 360
Post Stabilization
Color Correction
Final Conform
Compositing & Adding VFX
Working in the 360 Equirectangular Format
Nuke & CaraVR
Rendering in 360

Interactivity


Appendix
Guidelines for Avoiding Artifacts using the Jaunt ONE
This appendix provides a checklist of best practices to effectively capture VR video with the
Jaunt ONE camera rig in conjunction with Jaunt Cloud Services (JCS) while avoiding artifacts.

Distance from the camera rig


All distances in this section are given in meters from the center of the camera rig. The minimum
distance for stitching to work is 1.25 m.
In general, stitching quality improves the farther content is from the camera rig, but compelling
stereo tends to occur at distances between 2.5 and 15 m. Beyond roughly 25 m there is no
perceptible parallax (a rough disparity calculation is sketched at the end of this section).
However, as objects get closer to the camera the following will happen:
Likelihood of stitching artifacts increases.
Stitched imagery stretches vertically. This starts to become noticeable for content less than
2 m away.
We recommend that objects of interest, such as people, stay at least 2 m away from the camera.
It is best to keep the areas above and below the camera simple (i.e. avoid tree branches overhead
or any other objects crossing inside the 1.25 m minimum distance).
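
The 25 m figure can be sanity-checked with a rough disparity calculation. The sketch below assumes a 64 mm interaxial (a typical human interpupillary distance; the rig's effective interaxial may differ) and an HMD with roughly 0.1 degree of angular resolution per pixel. Both values are assumptions rather than Jaunt specifications.

import math

IPD_M = 0.064              # assumed interaxial distance in meters
HMD_DEG_PER_PIXEL = 0.1    # assumed HMD angular resolution

def binocular_disparity_deg(distance_m):
    # Angular disparity between the two eyes for a point at the given distance.
    return math.degrees(2 * math.atan((IPD_M / 2) / distance_m))

for d in (2.5, 5, 15, 25, 50):
    print(f"{d:5.1f} m -> {binocular_disparity_deg(d):.3f} deg")
# At 25 m the disparity (about 0.15 deg) amounts to only one or two HMD pixels,
# which is why stereo depth stops being noticeable around that distance.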

Leveling & placing the camera rig


For stereoscopic output, it is highly advisable to keep the camera rig as level as possible: a tilted
rig is difficult to correct in post-production without sacrificing stereo quality. For monoscopic
deliveries, however, tilt can easily be corrected in post.
Please keep in mind that the camera rig replaces the viewer's head. If the camera rig is tilted,
the viewer will look at tilted content. Also, the camera rig should be positioned at roughly the
viewer's head/shoulder height.

Challenging situations to avoid


Certain kinds of scenes are more likely to cause artifacts in stitched footage. In practice, JCS
handles most of the following cases correctly, but minimizing these cases lowers the chance of
artifacts and potential discomfort for the viewer:

Camera motion - Any motion other than constant velocity in a straight line can lead to
nausea.
Lens flares - Lens flares can cause inconsistencies between individual cameras and should
be avoided wherever possible.
Repeated texture - Repeated similar textures, such as highly repetitive wallpaper, can
cause temporal inconsistency and localized stitching artifacts.
Thin structures - Thin structures (e.g. ropes, tree branches) are hard to reconstruct
without artifacts. Artifacts can be reduced if thin structures are in front of and close to (at a
similar depth as) a bigger background object. Results also improve with increasing distance between
thin objects and the camera rig. Objects in front of or behind thin objects may also cause artifacts
(e.g. a person behind a mesh fence).
Semi-transparent surfaces - The stitcher estimates a single depth for each point in the scene,
which can lead to issues with semi-transparent surfaces.


Legal
Any third party marks or other third party intellectual property used herein are owned by their
respective owners. No right to reproduce or otherwise use such marks or property is granted
herein.
This guide is for informational and educational purposes only. No express or implied
representations or warranties are made herein and are expressly disclaimed.
Any use of the Jaunt ONE camera and any other products or devices referenced herein is subject
to separate specifications and use and safety requirements of Jaunt, Inc. and third party
manufacturers.
This Field Guide is not intended to provide legal or safety advice. See manufacturers'
specifications for further information.
Jaunt (word mark and logo) is a TM of Jaunt, Inc.
Use of the Jaunt ONE camera requires compliance with applicable laws.
No endorsement of third party products is intended by this Field Guide. Any critiques of third
party products are based solely on the opinions of the author(s) of this Field Guide.

