
Augmented TV: An Augmented Reality System

for TV Programs Beyond the TV Screen

Hiroyuki Kawakita*, Toshio Nakagawa


Science and Technology Research Laboratories
NHK (Japan Broadcasting Corporation)
Tokyo, Japan
{kawakita.h-dq, nakagawa.t-ic}@nhk.or.jp

Abstract-TV services that use a mobile device as a second screen are increasing. We propose a new TV system, which we named augmented TV, that can augment the representation of TV programs beyond the TV screen. In this system, animated 3DCG content interlocked with the TV program is overlaid by augmented reality techniques on live video from the mobile device camera, so that a representation such as a TV character coming out of the screen can be provided. To achieve a representation that gives such surprise or reality to the viewer, accurate synchronization of the overlay display is required. A conventional synchronization method for multiple devices or a visible light communication method does not meet this accuracy requirement. Therefore, we developed an accurate synchronization method and an authoring environment for augmented TV content. We implemented augmented TV and confirmed frame-accurate synchronization (synchronization error time of about 0.03 seconds or less). We also confirmed that the authoring environment makes it easy to produce augmented TV content that uses a video clip with a 3DCG character at TV program quality.

Keywords-Augmented TV; Augmented Reality; Second Screen; Synchronization Method; Authoring Environment

I. INTRODUCTION

Viewing TV with a mobile device, such as a smartphone or a tablet, as a second screen is becoming increasingly common [1,2]. Using second screens, broadcasters and content distributors can provide new content that cannot be composed on a single TV.

At the same time, as mobile devices become increasingly popular, augmented reality (AR), in which virtual objects are overlaid on live video from the mobile device camera, has been a topic of intense research for about ten years [3,4]. AR allows viewers to experience target objects augmented with virtual objects such as 3DCG.

By applying AR techniques with a second screen to the moving pictures on a TV screen, the popular representation of a TV character coming out of the screen [5] can be provided. The attraction of this representation is that the viewer cannot foresee the circumstances in which a TV character on the 2D screen comes out of the screen and appears as a 3D character in front of the viewer. By using AR techniques, we hope to bring such surprise and reality to TV services.

We propose a novel AR system, which we named augmented TV, that overlays animated 3DCG related to the TV program to augment the TV pictures beyond the TV screen when the TV program is viewed through the mobile device camera. Since our purpose is to enrich the representation and impress the viewer, a high quality of service is required. We developed an accurate synchronization method between the captured pictures and the 3DCG. We also developed an authoring environment that lets the content producer examine the unique representation of augmented TV.

II. SYSTEM MODEL AND REQUIREMENTS

Figure 1 shows the service model of augmented TV. The viewer captures the broadcast TV program, called main content, from the TV screen by using the mobile device camera and watches the mobile device screen, which is running an augmented TV application. On the mobile device, animated 3DCG content related to the TV program, called sub content, is downloaded from the Internet in advance. It is then computed based on the viewer's circumstances, especially the relative position of the TV screen and the mobile device, and overlaid on the live video from the mobile device camera. The sub content has a scenario with a time code, and the scenario is driven in accordance with the playback time of the sub content, called mobile time, which indicates the time code to be interlocked with the main content.

Figure 1. Service Model (the broadcasting station delivers the main content over the air; the sub content, i.e., 3DCG data, is delivered over the Internet)
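As an illustration of how sub content could carry such a time-coded scenario, the following is a minimal sketch in Python; the field names and action strings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ScenarioEvent:
    time_code: float   # playback time (sec.) at which the event fires
    action: str        # action command for the 3DCG character (hypothetical)

# Hypothetical sub-content scenario: each event is driven when the mobile time
# (kept interlocked with the TV time) reaches its time code.
scenario = [
    ScenarioEvent(0.0, "character: appear inside TV picture"),
    ScenarioEvent(12.5, "character: walk out of TV screen"),
    ScenarioEvent(20.0, "character: sit down next to viewer"),
]

def due_events(scenario, previous_time, mobile_time):
    """Return the events whose time codes fall inside the last update interval."""
    return [e for e in scenario if previous_time < e.time_code <= mobile_time]
```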



To realize augmented TV, a synchronization method between main and sub content is required. To reduce the cost of producing the content, an authoring environment is also required. To make the method practicable for universal use, the requirements of the synchronization method are defined below. The synchronization accuracy is derived from the frame rate stated in the broadcasting specifications of Japan (about 30 frames per second, i.e., roughly 0.03 seconds per frame).

• The synchronization method should be compatible with various kinds of TVs and mobile devices (including devices not connected to a network).

• The playback time of the main content on the TV, called TV time, must not be delayed.

• Main and sub content on the mobile device screen should be synchronized within one frame (about 0.03 seconds or less).

The requirements of the authoring environment are defined below.

• It should be an integrated environment for producing both main and sub content.

• Universal formats for component files should be available.

• The 3DCG character's basic motions, such as walking or sitting down, should be easy to designate, for example, by action commands.

• Interactive content should be able to be produced.

III. RELATED WORK

In broadcasting, specifications using a second screen have been considered [6,7]. In these specifications, the second screen is assumed to be used mainly as a terminal that provides the viewer with information related to the TV program or with a new TV user interface, and the TV is assumed to communicate with the second screen through a network. Only in augmented TV, where even the relative position between the TV screen and the mobile device is considered, is it possible to provide a novel representation that integrates a TV and a second screen as its essential components.

Regarding synchronization methods for a device not connected to a network, Sugimoto et al. advocated Display-based Computing [8], a concept for achieving communication, measurement, control, and appropriate presentation that influence the real world by using display devices. Since it is supposed to utilize existing infrastructure, this concept has cost benefits. In a broadcasting service, which is required to be a public service, this point has considerable merit. Hence, we adopted the concept to achieve accurate synchronization not over a network but by using only the display.

In [8], an application that controls robots with a display device is introduced, but it does not deal with improving the synchronization accuracy of a virtual object on a mobile device screen. We previously developed a fast generation method [9] of a 2D code on a TV using data broadcasting [10], but the 2D code is too static to be used for accurate synchronization. Even if the time code is introduced into the TV program by visible light communication [11] with a display, synchronization on the mobile device screen is not necessarily accurate because of delay elements such as decoding and buffering (see Section IV-A). This problem arises not only with visible light communication but also with general synchronization methods for multiple devices, such as [12]. Therefore, we target more accurate synchronization on the mobile device screen itself.

In regard to authoring environments, there are many powerful environments for authoring individual animated 3DCG content, such as Unity [13]. Since in augmented TV the viewer watches the mobile device screen showing the main content with the sub content overlaid, to facilitate producing the content we need to develop an integrated authoring environment in which it is possible to examine whether the overlaid content appears as intended.

IV. PROPOSED SYNCHRONIZATION METHOD

A. Approach and Overview

Generally, a synchronization method makes the clocks of the devices the same. If augmented TV used this method, as shown in Figure 2, the display process for each picture (or 3DCG) frame of the main and sub content would start in accordance with the simultaneous clocks, and the content would be displayed only after the processes have completed, with a delay due to, for example, decoding and buffering. Since the delay time depends on the specifications of each device, the frames seen on the mobile device screen are not synchronized even when simultaneous clocks are used. Therefore, we propose a method that displays the TV time or mobile time on each device and corrects the mobile time by comparing the TV time with the mobile time on the mobile device screen.

Figure 2. Block Diagram of General Synchronization Method (delay elements whose processing time depends on the device)

Figure 3 shows the proposed method. In this method, a moving (complementary) time marker symbolizing the playback time of each picture (or 3DCG) frame is overlaid on the frame. By comparing the markers on the mobile device screen, the proposed method handles the difference between the times of the devices and synchronizes the pictures of the main and sub content on the mobile device screen.

Figure 3. Block Diagram of Proposed Synchronization Method (the TV picture is captured by the mobile device camera)
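To make the problem with clock-based synchronization concrete, the following minimal sketch (the delay values are hypothetical and chosen only for illustration) shows how device-dependent delays leave the frames on the mobile device screen out of step even when both devices start from the same clock tick.

```python
# Hypothetical, device-dependent delays between "start displaying frame k"
# at a shared clock tick and the frame actually appearing on screen
# (decoding, buffering, rendering).
TV_DELAY_S = 0.08      # assumed delay of the TV
MOBILE_DELAY_S = 0.15  # assumed delay of the mobile device

def visible_frame(clock_time_s, delay_s, frame_rate=29.97):
    """Index of the frame actually visible at a given shared clock time."""
    return int((clock_time_s - delay_s) * frame_rate)

t = 10.0  # some shared clock time after playback started
print(visible_frame(t, TV_DELAY_S))      # frame currently shown on the TV
print(visible_frame(t, MOBILE_DELAY_S))  # frame currently shown on the mobile device
# The two indices differ by roughly (MOBILE_DELAY_S - TV_DELAY_S) * frame_rate
# frames, which is why the proposed method compares time markers on the
# captured mobile device screen instead of relying on synchronized clocks.
```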
B. Marker Design and Details

Figure 4 shows a time marker design example on the TV. To reduce the constraint on the content layout of the TV screen and still achieve accurate synchronization, the TV time is divided into an integer part (a 2D matrix) and a fraction part (a time marker). The 2D matrix, consisting of black and white cells, represents a binary number. As shown in Figure 5, the time marker moves continuously and cyclically.

Figure 4. Time Marker Design Example on TV (position detection patterns, the integer part of the playback time as a 2D matrix, and the fraction part as a time marker)

Figure 5. Motion of (Complementary) Time Marker (the time marker at 0.0, 0.25, 0.50, and 0.75 sec.; the complementary time marker is overlaid on the time marker, and both move continuously and cyclically)

The mobile device detects the position detection patterns in the picture frame captured by the mobile device camera. As shown in Figure 5, the mobile device then overlays the fraction part of the mobile time as a complementary time marker on the time marker located at the relative position given by the patterns. If the white part of the time marker is fully covered by the complementary time marker, the synchronization process is finished. If it is not fully covered and a white space remains, the mobile time is corrected in accordance with the width of the white space. As shown in Figure 6, the relation between this width n (px) and the synchronous error time t (sec.) is given by the following equation:

n = \frac{t \, v \, w}{2 d \tan(\theta / 2)},  (1)

where v (m/s) is the velocity of the time marker, w (px) is the width of the captured picture on the mobile device screen, d (m) is the distance between the TV and the mobile device, and θ (rad.) is the horizontal angle of view of the mobile device camera.

Figure 6. Parameters Related to Synchronization Accuracy
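As a worked example (a minimal sketch, not part of the paper), equation (1) can be evaluated directly; with the evaluation parameters later listed in Table 2 (t = 0.03 sec., v = 0.059 m/s, w = 1440 px, θ = 50.4 deg.), it reproduces the synchronization decision thresholds n reported there.

```python
import math

def sync_threshold_pixels(t, v, w, d, theta_deg):
    """Equation (1): width n (px) of white space corresponding to a synchronous
    error time t (sec.), for marker velocity v (m/s), captured picture width
    w (px), TV-to-device distance d (m), and horizontal angle of view theta (deg.)."""
    theta = math.radians(theta_deg)
    return t * v * w / (2.0 * d * math.tan(theta / 2.0))

# Parameters from Table 2 of the paper.
for d in [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25]:
    n = sync_threshold_pixels(t=0.03, v=0.059, w=1440, d=d, theta_deg=50.4)
    print(f"d = {d:.2f} m -> n = {n:.1f} px")
# Prints n = 5.4, 3.6, 2.7, 2.2, 1.8, 1.5, 1.4, 1.2 px, matching Table 2.
```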
The details of the algorithm are as follows:

Step 1: Set the synchronization decision threshold n to the value calculated with equation (1), with t set to the required synchronous error time (0.03 sec. in this work).

Step 2: Set the black-and-white decision threshold b to the mean value of the black and white elements at a fixed location, such as the position detection patterns.

Step 3: Set the mobile time to zero (the mobile time then increases with the system time, except for the time correction in Step 6).

Step 4: When the fraction part of the mobile time is 0.5, capture the mobile device screen and scan a scanning line as shown in Figure 6.

Step 5: Count the pixels with a brightness greater than b as white pixels.

Step 6: If the number of white pixels is greater than n, calculate the time equivalent to the number of white pixels with equation (1), correct the mobile time by that amount, and go to Step 4. If not, scan the 2D matrix and set the integer part of the mobile time to the number indicated by the 2D matrix.
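The following is a minimal sketch of Steps 1 to 6 in Python, under the simplifying assumptions that captured frames are available as grayscale arrays and that the scanning-line position is already known from the position detection patterns; `capture_screen` and `decode_2d_matrix` are placeholders, and the sign of the time correction is an assumption.

```python
import math
import time
import numpy as np

def width_to_seconds(pixels, v, w, d, theta_deg):
    """Inverse of equation (1): time equivalent of a white-space width (px)."""
    return pixels * 2.0 * d * math.tan(math.radians(theta_deg) / 2.0) / (v * w)

def run_synchronization(capture_screen, decode_2d_matrix, scanline_row,
                        v=0.059, w=1440, d=1.0, theta_deg=50.4, t_req=0.03):
    # Step 1: synchronization decision threshold n (px) from equation (1).
    n = t_req * v * w / (2.0 * d * math.tan(math.radians(theta_deg) / 2.0))
    # Step 2: black-and-white decision threshold b (placeholder: mean brightness
    # along the scanning line of one captured frame).
    b = float(capture_screen()[scanline_row].mean())
    # Step 3: mobile time starts at zero and advances with the system time.
    offset = -time.monotonic()
    while True:
        mobile_time = time.monotonic() + offset
        # Step 4: act only when the fraction part of the mobile time is near 0.5.
        if abs(mobile_time % 1.0 - 0.5) > 0.01:
            time.sleep(0.001)
            continue
        frame = capture_screen()                      # grayscale numpy array
        # Step 5: count white pixels on the scanning line.
        white = int((frame[scanline_row] > b).sum())
        if white > n:
            # Step 6: white space remains, so correct the mobile time and retry
            # (the sign of the correction depends on the marker geometry; assumed here).
            offset -= width_to_seconds(white, v, w, d, theta_deg)
        else:
            # Step 6: markers coincide; take the integer part from the 2D matrix.
            offset += decode_2d_matrix(frame) - int(mobile_time)
        time.sleep(0.05)  # leave the current 0.5 window before checking again
```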
V. PROPOSED AUTHORING ENVIRONMENT

A. Approach

NHK has developed an easy TV program production language called TVML (TV program Making Language) [14,15]. A program script written in TVML is read by software called the TVML player, which produces the program by real-time rendering. Universal formats of 3DCG data, including model, texture, and motion files, are supported by the TVML player. A 3DCG character's basic motions can be designated by action commands in TVML. Since the TVML player can be controlled by external software, interactive content can also be produced using user software, and that software can also control the virtual camera used for rendering in the TVML player.

From the above, the TVML player as an augmented TV content authoring environment meets the requirements except for being an integrated environment. Below we discuss the integrated environment and the system architecture built around a TVML player.

B. Authoring Environment

In general AR a real object is augmented, whereas in augmented TV a moving picture on a TV screen is augmented. Accordingly, there are various ways to place a 3DCG object of the sub content, depending on the representation. Since we regard the way of augmenting as similar to general AR from the perspective of augmenting the TV screen, we defined the coordinate axes of the sub content so as to simply extend the world captured by the camera for the main content to the outside of the TV screen.

Under this definition, we discuss the integrated environment for the main and sub content. TVML has a 3D virtual space, and the 2D TV program is produced by capturing this 3D virtual space with a virtual camera. When the captured TV program is considered as the main content, the space in front of the near clipping plane of the virtual camera can be considered as the space for the sub content. We use this virtual space as an authoring environment for both the main and sub content. Therefore, we introduced a boundary corresponding to the TV screen, called a clipping plane, into the virtual space in TVML.

By setting the space in front of the clipping plane (the left side in Figure 7) as a rendering area, we can examine the sub content from an arbitrary viewpoint with the virtual camera. By setting the clipping plane as the near clipping plane of the virtual camera (filming the right side in Figure 7), we obtain the main content.

Figure 7. Introduction of Clipping Plane (the clipping plane divides the virtual space in TVML into the sub content area and the main content area)

C. Mobile System Architecture

A TVML player into which we introduce a clipping plane in the virtual space can be used as it is, not only as the authoring environment but also as the presentation environment of the sub content on the mobile device.

Figure 8 shows the system architecture in the mobile device. Since the TVML player itself cannot manage time with high accuracy, to achieve accurate synchronization we compose the system so that controller software drives the TVML player with a time-based scenario, and we implemented the synchronization method in this controller (shown by the red broken line in Figure 8).

Figure 8. System Architecture in Mobile Device (a TVML controller running in the background, the TVML player, the touch panel, and the speaker)
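As an illustration of the clipping-plane idea in Section V-B (a minimal geometric sketch, not the TVML implementation; the plane position and object coordinates are made up), objects in the shared coordinate system can be assigned to the main or sub content simply by which side of the TV-screen plane they are on:

```python
import numpy as np

# Clipping plane corresponding to the TV screen: a point on the plane and a
# unit normal pointing toward the viewer (i.e., toward the sub content area).
PLANE_POINT = np.array([0.0, 0.0, 0.0])
PLANE_NORMAL = np.array([0.0, 0.0, 1.0])

def is_sub_content(position):
    """True if the object lies in front of the clipping plane (outside the TV),
    so it is rendered as overlaid 3DCG sub content; otherwise it belongs to the
    main content filmed through the virtual camera's near clipping plane."""
    return float(np.dot(position - PLANE_POINT, PLANE_NORMAL)) > 0.0

# Hypothetical character positions at two moments of the scenario.
print(is_sub_content(np.array([0.2, 0.0, -1.5])))  # False: still inside the TV picture
print(is_sub_content(np.array([0.2, 0.0, 0.8])))   # True: has come out of the screen
```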


Figure 9. Synchronous Error Time against Distance (synchronous error time in sec. plotted against the distance between the TV and the mobile device in m; individual measurement values and the average of the absolute values of the measurements at each distance)

VI. IMPLEMENTATION AND EVALUATION

To verify the proposed method, we implemented and evaluated it on a tablet. Tables 1 and 2 show the parameters used in the evaluation. Figure 9 shows the synchronous error time against the distance between the TV and the mobile device. We measured the synchronous error time five times at each distance and increased the distance in steps of 0.25 m. Most measurement values are within plus or minus 0.03 sec. We calculated the average of the absolute values of the measurements at each distance; the averages are about 0.01-0.02 sec. regardless of the distance. From these results, we confirmed frame-accurate synchronization by capturing the tablet screen.

Table 1. Specifications of Tablet
OS: Windows 8
CPU: Intel Core i3, 1.8 GHz
System memory: 4 GB
Adopted camera resolution: 640 x 480 px

Table 2. Synchronization Parameters
t (sec.): 0.03
v (m/s): 0.059
w (px): 1440
θ (deg.): 50.4
d (m):  0.50  0.75  1.00  1.25  1.50  1.75  2.00  2.25
n (px): 5.4   3.6   2.7   2.2   1.8   1.5   1.4   1.2

Figure 10 shows our implementation of augmented TV. As an application, we produced content including a video clip with TV program quality in which a 3DCG character watches the TV together with the viewer. Figure 11 shows tablet screen captures of a scene where the character comes out of the TV screen. At first, the character is on the 2D TV screen as the main content. Next, it comes out face first, and the part of it outside the TV screen naturally becomes 3DCG as the sub content. Finally, the whole character becomes 3DCG, and the viewer can watch it from an arbitrary viewpoint. In this way, our method enables a seamless representation beyond the TV screen.

Figure 10. Appearance of Our System

Figure 11. Example of Changes of Tablet Screen Captures

VII. DISCUSSION

A. The Synchronization Method

We discuss lens distortion of the mobile device camera as a caveat of the proposed synchronization method. Since the method is based on an analog physical quantity, measurement error due to lens distortion translates directly into synchronization error time. Lens distortion is a phenomenon in which a non-linear displacement, depending on the distance from the optical axis in the radial direction, is generated with respect to an ideal pinhole camera. In our augmented TV system, the processing steps affected by lens distortion are as follows:
(a) detecting the position detection patterns, and

(b) the synchronization decision.

In (a), the distortion affects the coordinate system calculated from the patterns, which is needed to represent the relative position between the TV and the mobile device; through this coordinate system it also affects the placement of the complementary time marker. In (b), since the complementary time marker, which has no distortion, is overlaid on the captured picture, which does have distortion, the distortion directly affects the synchronization accuracy. In a general AR system, lens distortion is corrected by software processing, which reduces the effect of (a).

Table 3 shows the synchronization error time per one pixel of measurement error for the measurement parameters (t, v, w, θ in Table 2). A one-pixel error affects the time in proportion to the distance.
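For a concrete check (a minimal sketch, not part of the paper), the per-pixel error time is simply equation (1) solved for t with n = 1, i.e., 2d tan(θ/2) / (v w); evaluated with the Table 2 parameters it reproduces the values in Table 3 below.

```python
import math

def error_time_per_pixel(v, w, d, theta_deg):
    """Synchronization error time (sec.) caused by one pixel of measurement
    error: equation (1) inverted with n = 1."""
    return 2.0 * d * math.tan(math.radians(theta_deg) / 2.0) / (v * w)

for d in [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 2.00, 2.25]:
    print(f"d = {d:.2f} m -> {error_time_per_pixel(0.059, 1440, d, 50.4):.4f} sec./px")
# 0.0055, 0.0083, 0.0111, 0.0138, 0.0166, 0.0194, 0.0221, 0.0249 sec./px,
# matching Table 3; at d = 2.25 m even a single pixel of error is already
# close to the required 0.03 sec. bound.
```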
In this measurement, including the results in Figure 9, we deliberately did not correct the distortion, in order to examine its effect. The lens we used had barrel distortion. In the case of d = 2.25 m, when the time marker was captured near the left edge of the mobile device screen, we confirmed that the mobile time was about 0.1 seconds behind the TV time owing to the combined effect of (a) and (b). Conversely, when it was captured near the right edge, the mobile time ran about 0.1 seconds ahead of the TV time.

In the measurement shown in Figure 9, to reduce this effect, we placed the mobile device so that the optical axis of its camera passed through the center of the two position detection patterns placed at the bottom of the screen. As a result, the requirement could be met. From these results, we found that, to achieve accurate synchronization, appropriate correction of the distortion is needed when the time marker may be captured at an arbitrary position, and that sufficient accuracy can be obtained without the correction if the mobile device is placed at an appropriate position.

Table 3. Synchronization Error Time per Pixel
d (m):             0.50   0.75   1.00   1.25   1.50   1.75   2.00   2.25
error time (sec.): 0.0055 0.0083 0.0111 0.0138 0.0166 0.0194 0.0221 0.0249

B. Interactive Content

Figure 12 shows an example of interactive content. In this content, if the viewer taps the character on the mobile device screen, the character looks back and says, "What?". We confirmed that such interactive content could be produced with the authoring environment.

Figure 12. Example of Interactive Content

VIII. CONCLUSION

We advocated a novel media concept, named augmented TV, which can augment the representation of TV pictures beyond the TV screen when they are viewed through the mobile device camera. We developed the synchronization method and the integrated authoring environment that are necessary for augmented TV and confirmed their effectiveness. Since the method constrains the layout of the TV program on the TV screen, our future work is to develop a method that eliminates these constraints.

REFERENCES

[1] Nielsen, "Action Figures: How Second Screens are Transforming TV Viewing," http://www.nielsen.com/us/en/newswire/2013/action-figures--how-second-screens-are-transforming-tv-viewing.html
[2] Disney Second Screen, http://disneysecondscreen.go.com/
[3] H. Kato and M. Billinghurst, "Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System," in Proceedings of the 2nd International Workshop on Augmented Reality (IWAR 99), October 1999, San Francisco, USA.
[4] B. Parhizkar, Z. M. Gebril, W. K. Obeidy, M. N. A. Ngan, S. A. Chowdhury, and A. H. Lashkari, "Android Mobile Augmented Reality Application Based on Different Learning Theories for Primary School Children," International Conference on Multimedia Computing and Systems (ICMCS), 2012.
[5] Gore Verbinski (Director), "The Ring" [Motion picture], United States, DreamWorks, 2002.
[6] H. Ohmata, M. Takechi, S. Mitsuya, K. Otsuki, A. Baba, K. Matsumura, K. Majima, and S. Sunasaki, "Hybridcast: A New Media Experience by Integration of Broadcasting and Broadband," Proceedings of the ITU Kaleidoscope Academic Conference, S5.1, 2013.
[7] Hybrid Broadcast Broadband TV, ETSI TS 102 796 V1.1.1, 2010.
[8] M. Sugimoto, G. Kagotani, M. Kojima, H. Nii, A. Nakamura, and M. Inami, "Augmented coliseum: display-based computing for augmented reality inspiration computing robot," ACM SIGGRAPH 2005 Emerging Technologies, July 31-August 4, 2005, Los Angeles, California.
[9] ARIB STD-B24, "Data Coding and Transmission Specification for Digital Broadcasting," Version 5.1, Mar. 2007.
[10] H. Kawakita, Y. Nishimoto, and T. Inoue, "Fast Generation Method of 2D Code on DTV Receivers," IEEE International Conference on Consumer Electronics 2011, T08-S07/3, pp. 794-795, 2011.
[11] Visible Light Communications Consortium, http://www.vlcc.net/
[12] Y. Ishibashi and S. Tasaka, "A synchronization mechanism for continuous media in multicast communications," in Conf. Rec. IEEE GLOBECOM, pp. 746-752, Apr. 1997.
[13] Unity, http://unity3d.com/
[14] NHK TVML Player, http://www.nhk.or.jp/strl/tvml/english/player2/index.html
[15] M. Hayashi, H. Ueda, and T. Kurihara, "TVML (TV program Making Language) - Automatic TV Program Generation from Text-based Script -," Proceedings of Imagina'99, 1999.
