Diffstat (limited to 'doc/rfc/rfc7205.txt')
-rw-r--r-- doc/rfc/rfc7205.txt 955
1 files changed, 955 insertions, 0 deletions
diff --git a/doc/rfc/rfc7205.txt b/doc/rfc/rfc7205.txt
new file mode 100644
index 0000000..5de54f8
--- /dev/null
+++ b/doc/rfc/rfc7205.txt
@@ -0,0 +1,955 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                        A. Romanow
+Request for Comments: 7205                                         Cisco
+Category: Informational                                        S. Botzko
+ISSN: 2070-1721                                             M. Duckworth
+                                                                 Polycom
+                                                            R. Even, Ed.
+                                                     Huawei Technologies
+                                                              April 2014
+
+
+ Use Cases for Telepresence Multistreams
+
+Abstract
+
+ Telepresence conferencing systems seek to create an environment that
+ gives users (or user groups) that are not co-located a feeling of co-
+ located presence through multimedia communication that includes at
+ least audio and video signals of high fidelity. A number of
+ techniques for handling audio and video streams are used to create
+ this experience. When these techniques are not similar,
+ interoperability between different systems is difficult at best, and
+ often not possible. Conveying information about the relationships
+ between multiple streams of media would enable senders and receivers
+ to make choices to allow telepresence systems to interwork. This
+ memo describes the most typical and important use cases for sending
+ multiple streams in a telepresence conference.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc7205.
+
+
+
+
+
+
+
+
+
+Romanow, et al. Informational [Page 1]
+
+RFC 7205 Telepresence Use Cases April 2014
+
+
+Copyright Notice
+
+ Copyright (c) 2014 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 2. Overview of Telepresence Scenarios . . . . . . . . . . . . . 4
+ 3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . 6
+ 3.1. Point-to-Point Meeting: Symmetric . . . . . . . . . . . . 7
+ 3.2. Point-to-Point Meeting: Asymmetric . . . . . . . . . . . 7
+ 3.3. Multipoint Meeting . . . . . . . . . . . . . . . . . . . 9
+ 3.4. Presentation . . . . . . . . . . . . . . . . . . . . . . 10
+ 3.5. Heterogeneous Systems . . . . . . . . . . . . . . . . . . 11
+ 3.6. Multipoint Education Usage . . . . . . . . . . . . . . . 12
+ 3.7. Multipoint Multiview (Virtual Space) . . . . . . . . . . 14
+ 3.8. Multiple Presentation Streams - Telemedicine . . . . . . 15
+ 4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16
+ 5. Security Considerations . . . . . . . . . . . . . . . . . . . 16
+ 6. Informative References . . . . . . . . . . . . . . . . . . . 16
+
+1. Introduction
+
+ Telepresence applications try to provide a "being there" experience
+ for conversational video conferencing. Often, this telepresence
+ application is described as "immersive telepresence" in order to
+ distinguish it from traditional video conferencing and from other
+ forms of remote presence not related to conversational video
+ conferencing, such as avatars and robots. The salient
+ characteristics of telepresence are often described as: being actual
+ sized, providing immersive video, preserving interpersonal
+ interaction, and allowing non-verbal communication.
+
+ Although telepresence systems are based on open standards such as RTP
+ [RFC3550], SIP [RFC3261], H.264 [ITU.H264], and the H.323 [ITU.H323]
+ suite of protocols, they cannot easily interoperate with each other
+ without operator assistance and expensive additional equipment that
+ translates from one vendor's protocol to another.
+
+ The basic features that give telepresence its distinctive
+ characteristics are implemented in disparate ways in different
+ systems. Currently, telepresence systems from diverse vendors
+ interoperate to some extent, but this is not supported in a
+ standards-based fashion. Interworking requires that translation and
+ transcoding devices be included in the architecture. Such devices
+ increase latency, reducing the quality of interpersonal interaction.
+ Use of these devices is often not automatic; it frequently requires
+ substantial manual configuration and a detailed understanding of the
+ nature of underlying audio and video streams. This state of affairs
+ is not acceptable for the continued growth of telepresence -- these
+ systems should have the same ease of interoperability as do
+ telephones. Thus, a standard way of describing the multiple streams
+ constituting the media flows and the fundamental aspects of their
+ behavior would allow telepresence systems to interwork.
+
+ This document presents a set of use cases describing typical
+ scenarios. Requirements will be derived from these use cases in a
+ separate document. The use cases are described from the viewpoint of
+ the users. They are illustrative of the user experience that needs
+ to be supported. It is possible to implement these use cases in a
+ variety of different ways.
+
+ Many different scenarios need to be supported. This document
+ describes in detail the most common and basic use cases. These will
+ cover most of the requirements. There may be additional scenarios
+ that bring new features and requirements that can be used to extend
+ the initial work.
+
+ Point-to-point and multipoint telepresence conferences are
+ considered. In some use cases, the number of screens is the same at
+ all sites; in others, the number of screens differs at different
+ sites. Both use cases are considered. Also included is a use case
+ describing display of presentation material or content.
+
+ The multipoint use cases may include a variety of systems from
+ conference room systems to handheld devices, and such a use case is
+ described in the document.
+
+ This document's structure is as follows: Section 2 gives an overview
+ of scenarios, and Section 3 describes use cases.
+
+2. Overview of Telepresence Scenarios
+
+ This section describes the general characteristics of the use cases
+ and what the scenarios are intended to show. The typical setting is
+ a business conference, which was the initial focus of telepresence.
+ More recently, consumer products have also been developed. We
+ specifically do not include in our scenarios the physical
+ infrastructure aspects of telepresence, such as room construction,
+ layout, and decoration. Furthermore, these use cases do not describe
+ all the aspects needed to create the best user experience (for
+ example, the human factors).
+
+ We also specifically do not attempt to precisely define the
+ boundaries between telepresence systems and other systems, nor do we
+ attempt to identify the "best" solution for each presented scenario.
+
+ Telepresence systems are typically composed of one or more video
+ cameras and encoders and one or more display screens of large size
+ (diagonal around 60 inches). Microphones pick up sound, and audio
+ codec(s) produce one or more audio streams. The cameras used to
+ capture the telepresence users are referred to as "participant
+ cameras" (and likewise for screens). There may also be other
+ cameras, such as for document display. These will be referred to as
+ "presentation cameras" or "content cameras", which generally have
+ different formats, aspect ratios, and frame rates from the
+ participant cameras. The presentation streams may be shown on
+ participant screens or on auxiliary display screens. A user's
+ computer may also serve as a virtual content camera, generating an
+ animation or playing a video for display to the remote participants.
+
+ We describe such a telepresence system as sending one or more video
+ streams, audio streams, and presentation streams to the remote
+ system(s).
+
+ The fundamental parameters describing today's typical telepresence
+ scenarios include:
+
+ 1. The number of participating sites
+
+ 2. The number of visible seats at a site
+
+ 3. The number of cameras
+
+ 4. The number and type of microphones
+
+ 5. The number of audio channels
+
+ 6. The screen size
+
+ 7. The screen capabilities -- such as resolution, frame rate,
+ aspect ratio
+
+ 8. The arrangement of the screens in relation to each other
+
+ 9. The number of primary screens at each site
+
+ 10. Type and number of presentation screens
+
+ 11. Multipoint conference display strategies -- for example, the
+ camera-to-screen mappings may be static or dynamic
+
+ 12. The camera point of capture
+
+ 13. The cameras' fields of view and how they spatially relate to each
+ other
+
+ As discussed in the introduction, the basic features that give
+ telepresence its distinctive characteristics are implemented in
+ disparate ways in different systems.
+
+ There is no agreed-upon way to adequately describe the semantics of
+ how streams of various media types relate to each other. Without a
+ standard for stream semantics to describe the particular roles and
+ activities of each stream in the conference, interoperability is
+ cumbersome at best.
+
+ In a multiple-screen conference, the video and audio streams sent
+ from remote participants must be understood by receivers so that they
+ can be presented in a coherent and life-like manner. This includes
+ the ability to present remote participants at their actual size for
+ their apparent distance, while maintaining correct eye contact,
+ gesticular cues, and simultaneously providing a spatial audio sound
+ stage that is consistent with the displayed video.
+
+ The receiving device that decides how to render incoming information
+ needs to understand a number of variables such as the spatial
+ position of the speaker, the field of view of the cameras, the camera
+ zoom, which media stream is related to each of the screens, etc. It
+ is not simply that individual streams must be adequately described
+ (to a large extent, this already exists), but rather that the semantics
+ of the relationships between the streams must be communicated. Note
+ that all of this is still required even if the basic aspects of the
+ streams, such as the bit rate, frame rate, and aspect ratio, are
+ known. Thus, this problem has aspects considerably beyond those
+ encountered in interoperation of video conferencing systems that have
+ a single camera/screen.
+
+3. Use Cases
+
+ The use cases focus on typical implementations. There are a number
+ of possible variants for these use cases; for example, the audio
+ supported may differ at the end points (such as mono or stereo versus
+ surround sound), etc.
+
+ Many of these systems offer a "full conference room" solution, where
+ local participants sit at one side of a table and remote participants
+ are displayed as if they are sitting on the other side of the table.
+ The cameras and screens are typically arranged to provide a panoramic
+ view of the remote room (left to right from the local user's
+ viewpoint).
+
+ The sense of immersion and non-verbal communication is fostered by a
+ number of technical features, such as:
+
+ 1. Good eye contact, which is achieved by careful placement of
+ participants, cameras, and screens.
+
+ 2. Camera field of view and screen sizes are matched so that the
+ images of the remote room appear to be full size.
+
+ 3. The left side of each room is presented on the right screen at
+ the far end; similarly, the right side of the room is presented
+ on the left screen. The effect of this is that participants of
+ each site appear to be sitting across the table from each other.
+ If 2 participants at the same site glance at each other, all
+ participants can observe it. Likewise, if a participant at one
+ site gestures to a participant at the other site, all
+ participants observe the gesture itself and the participants it
+ includes.
+
+3.1. Point-to-Point Meeting: Symmetric
+
+ In this case, each of the 2 sites has an identical number of screens,
+ with cameras having fixed fields of view, and 1 camera for each
+ screen. The sound type is the same at each end. As an example,
+ there could be 3 cameras and 3 screens in each room, with stereo
+ sound being sent and received at each end.
+
+ Each screen is paired with a corresponding camera. Each camera/
+ screen pair is typically connected to a separate codec, producing an
+ encoded stream of video for transmission to the remote site, and
+ receiving a similarly encoded stream from the remote site.
+
+ Each system has one or multiple microphones for capturing audio. In
+ some cases, stereophonic microphones are employed. In other systems,
+ a microphone may be placed in front of each participant (or pair of
+ participants). In typical systems, all the microphones are connected
+ to a single codec that sends and receives the audio streams as either
+ stereo or surround sound. The number of microphones and the number
+ of audio channels are often not the same as the number of cameras.
+ Also, the number of microphones is often not the same as the number
+ of loudspeakers.
+
+ The audio may be transmitted as multi-channel (stereo/surround sound)
+ or as distinct and separate monophonic streams. Audio levels should
+ be matched, so the sound levels at both sites are identical.
+ Loudspeaker and microphone placements are chosen so that the sound
+ "stage" (orientation of apparent audio sources) is coordinated with
+ the video. That is, if a participant at one site speaks, the
+ participants at the remote site perceive her voice as originating
+ from her visual image. In order to accomplish this, the audio needs
+ to be mapped at the receiving site in the same fashion as the video.
+ That is, audio received from the right side of the room needs to be
+ output from loudspeaker(s) on the left side at the remote site, and
+ vice versa.
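
The left/right reversal described above for both video and audio can be
sketched as an index mirror. This is an illustration only, not part of
the memo: the name mirrored_index and the convention that positions are
numbered left to right from each room's own viewpoint are assumptions
made for the example.

```python
def mirrored_index(capture_index: int, count: int) -> int:
    """Map a left-to-right capture position to the far-end render position.

    Assumes both rooms have the same number of positions (the symmetric
    case) and that each room numbers them left to right from its own
    viewpoint.
    """
    if not 0 <= capture_index < count:
        raise ValueError("capture_index out of range")
    return count - 1 - capture_index


# Hypothetical 3-camera / 3-screen room.
positions = ["left", "center", "right"]
for i, name in enumerate(positions):
    # The same mapping would apply to loudspeaker positions for audio.
    print(name, "->", positions[mirrored_index(i, len(positions))])
```

The same function can be applied to audio channels, so that a voice
captured on the right side of one room is played from the left side of
the other, matching the displayed image.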
+
+3.2. Point-to-Point Meeting: Asymmetric
+
+ In this case, each site has a different number of screens and cameras
+ than the other site. The important characteristic of this scenario
+ is that the number of screens is different between the 2 sites. This
+ creates challenges that are handled differently by different
+ telepresence systems.
+
+ This use case builds on the basic scenario of 3 screens to 3 screens.
+ Here, we use the common case of 3 screens and 3 cameras at one site,
+ and 1 screen and 1 camera at the other site, connected by a point-to-
+ point call. The screen sizes and camera fields of view at both sites
+ are basically similar, such that each camera view is designed to show
+ 2 people sitting side by side. Thus, the 1-screen room has up to 2
+ people seated at the table, while the 3-screen room may have up to 6
+ people at the table.
+
+ The basic considerations of defining left and right and indicating
+ relative placement of the multiple audio and video streams are the
+ same as in the 3-3 use case. However, handling the mismatch between
+ the 2 sites of the number of screens and cameras requires more
+ complicated maneuvers.
+
+ For the video sent from the 1-camera room to the 3-screen room, the
+ usual approach is simply to use 1 of the 3 screens and keep the other
+ 2 screens inactive or, for example, show the current date on them.
+ This maintains the "full-size" image of the remote side.
+
+ For the other direction, the 3-camera room sending video to the
+ 1-screen room, there are more complicated variations to consider.
+ Here are several possible ways in which the video streams can be
+ handled.
+
+ 1. The 1-screen system might simply show only 1 of the 3 camera
+ images, since the receiving side has only 1 screen. 2 people are
+ seen at full size, but 4 people are not seen at all. The choice
+ of which one of the 3 streams to display could be fixed, or could
+ be selected by the users. It could also be made automatically
+ based on who is speaking in the 3-screen room, such that the
+ people in the 1-screen room always see the person who is
+ speaking. If the automatic selection is done at the sender, the
+ transmission of streams that are not displayed could be
+ suppressed, which would avoid wasting bandwidth.
+
+ 2. The 1-screen system might be capable of receiving and decoding
+ all 3 streams from all 3 cameras. The 1-screen system could then
+ compose the 3 streams into 1 local image for display on the
+ single screen. All 6 people would be seen, but smaller than full
+ size. This could be done in conjunction with reducing the image
+ resolution of the streams, such that encode/decode resources and
+ bandwidth are not wasted on streams that will be downsized for
+ display anyway.
+
+ 3. The 3-screen system might be capable of including all 6 people in
+ a single stream to send to the 1-screen system. For example, it
+ could use PTZ (Pan Tilt Zoom) cameras to physically adjust the
+ cameras such that 1 camera captures the whole room of 6 people.
+ Or, it could recompose the 3 camera images into 1 encoded stream
+ to send to the remote site. These variations also show all 6
+ people but at a reduced size.
+
+ 4. Or, there could be a combination of these approaches, such as
+ simultaneously showing the speaker in full size with a composite
+ of all 6 participants in a smaller size.
+
+ The receiving telepresence system needs to have information about the
+ content of the streams it receives to make any of these decisions.
+ If the systems are capable of supporting more than one strategy,
+ there needs to be some negotiation between the 2 sites to figure out
+ which of the possible variations they will use in a specific point-
+ to-point call.
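
The trade-offs between the first three options, and the need for the
2 sites to agree on one of them, can be sketched as follows. This is a
minimal illustration under stated assumptions: the Strategy names, the
outcome() helper, and the preference-list negotiation are all
hypothetical and do not correspond to any defined protocol.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Strategy(Enum):
    SINGLE_STREAM = 1       # option 1: show only one of the camera images
    RECEIVER_COMPOSITE = 2  # option 2: receiver decodes all streams, composes locally
    SENDER_COMPOSITE = 3    # option 3: sender delivers one stream covering the room


def outcome(strategy: Strategy, cameras: int = 3,
            people_per_camera: int = 2) -> Tuple[int, bool]:
    """Return (people visible on the single screen, shown at full size?)."""
    if strategy is Strategy.SINGLE_STREAM:
        return people_per_camera, True
    # Both composite options show everyone, but smaller than full size.
    return cameras * people_per_camera, False


def negotiate(receiver_preference: Iterable[Strategy],
              sender_supported: Iterable[Strategy]) -> Optional[Strategy]:
    """Pick the receiver's most preferred strategy that the sender supports."""
    supported = set(sender_supported)
    for strategy in receiver_preference:
        if strategy in supported:
            return strategy
    return None  # no common strategy; a real system would need a fallback
```

With the defaults, outcome(Strategy.SINGLE_STREAM) gives (2, True) and
either composite strategy gives (6, False), matching the 2-versus-6
people trade-off described in the list above.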
+
+3.3. Multipoint Meeting
+
+ In a multipoint telepresence conference, there are more than 2 sites
+ participating. Additional complexity is required to enable media
+ streams from each participant to show up on the screens of the other
+ participants.
+
+ Clearly, there are a great number of topologies that can be used to
+ display the streams from multiple sites participating in a
+ conference.
+
+ One major objective for telepresence is to be able to preserve the
+ "being there" user experience. However, in multi-site conferences,
+ it is often (in fact, usually) not possible to simultaneously provide
+ full-size video, eye contact, and common perception of gestures and
+ gaze by all participants. Several policies can be used for stream
+ distribution and display: all provide good results, but they all make
+ different compromises.
+
+ One common policy is called site switching. Let's say the speaker is
+ at site A and the other participants are at various "remote" sites.
+ When the room at site A is shown, all the camera images from site A are
+ forwarded to the remote sites. Therefore, at each receiving remote
+ site, all the screens display camera images from site A. This can be
+ used to preserve full-size image display, and also provide full
+ visual context of the displayed far end, site A. In site switching,
+ there is a fixed relation between the cameras in each room and the
+ screens in remote rooms. The room or participants being shown are
+ switched from time to time based on who is speaking or by manual
+ control, e.g., from site A to site B.
+
+ Segment switching is another policy choice. In segment switching
+ (assuming still that site A is where the speaker is, and "remote"
+ refers to all the other sites), rather than sending all the images
+ from site A, only the speaker at site A is shown. The camera images
+ of the current speaker and previous speakers (if any) are forwarded
+ to the other sites in the conference. Therefore, the screens in each
+ site are usually displaying images from different remote sites -- the
+ current speaker at site A and the previous ones. This strategy can
+ be used to preserve full-size image display and also capture the non-
+ verbal communication between the speakers. In segment switching, the
+ display depends on the activity in the remote rooms (generally, but
+ not necessarily based on audio/speech detection).
+
+ A third possibility is to reduce the image size so that multiple
+ camera views can be composited onto one or more screens. This does
+ not preserve full-size image display, but it provides the most visual
+ context (since more sites or segments can be seen). Typically in
+ this case, the display mapping is static, i.e., each part of each
+ room is shown in the same location on the display screens throughout
+ the conference.
+
+ Other policies and combinations are also possible. For example,
+ there can be a static display of all screens from all remote rooms,
+ with part or all of one screen being used to show the current speaker
+ at full size.
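
The difference between the site-switching and segment-switching
policies described above can be sketched as two selection functions.
This is an illustration only: the (site, camera) tuple representation,
the function names, and the rule that a site never receives its own
images are assumptions made for the example, and the placement of the
selected segments on particular screens is left open.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (site name, camera index); hypothetical representation


def site_switch(speaker_history: List[Segment], viewer_site: str,
                cameras_per_site: int = 3) -> List[Segment]:
    """All screens show the cameras of the most recent remote speaker's site."""
    for site, _camera in reversed(speaker_history):
        if site != viewer_site:
            return [(site, cam) for cam in range(cameras_per_site)]
    return []


def segment_switch(speaker_history: List[Segment], viewer_site: str,
                   screens: int = 3) -> List[Segment]:
    """Show the current and previous speakers' segments, most recent first."""
    shown: List[Segment] = []
    for segment in reversed(speaker_history):
        if segment[0] != viewer_site and segment not in shown:
            shown.append(segment)
        if len(shown) == screens:
            break
    return shown


# Oldest speaker first; the current speaker is in front of camera 1 at site A.
history = [("B", 0), ("C", 2), ("A", 1)]
print(site_switch(history, viewer_site="D"))
print(segment_switch(history, viewer_site="D"))
```

For a viewer at site D, site switching selects all 3 cameras from site
A, while segment switching selects one segment each from sites A, C,
and B.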
+
+3.4. Presentation
+
+ In addition to the video and audio streams showing the participants,
+ additional streams are used for presentations.
+
+ In systems available today, generally only one additional video
+ stream is available for presentations. Often, this presentation
+ stream is half-duplex in nature, with presenters taking turns. The
+ presentation stream may be captured from a PC screen, or it may come
+ from a multimedia source such as a document camera, camcorder, or a
+ DVD. In a multipoint meeting, the presentation streams for the
+ currently active presentation are always distributed to all sites in
+ the meeting, so that the presentations are viewed by all.
+
+ Some systems display the presentation streams on a screen that is
+ mounted either above or below the 3 participant screens. Other
+ systems provide screens on the conference table for observing
+ presentations. If multiple presentation screens are used, they
+ generally display identical content. There is considerable variation
+ in the placement, number, and size of presentation screens.
+
+ In some systems, presentation audio is pre-mixed with the room audio.
+ In others, a separate presentation audio stream is provided (if the
+ presentation includes audio).
+
+ In H.323 [ITU.H323] systems, H.239 [ITU.H239] is typically used to
+ control the video presentation stream. In SIP systems, similar
+ control mechanisms can be provided using the Binary Floor Control
+ Protocol (BFCP) [RFC4582] for the presentation token. These
+ mechanisms are suitable for managing a single presentation stream.
+
+ Although today's systems remain limited to a single video
+ presentation stream, there are obvious uses for multiple presentation
+ streams:
+
+ 1. Frequently, the meeting convener is following a meeting agenda,
+ and it is useful for her to be able to show that agenda to all
+ participants during the meeting. Other participants at various
+ remote sites are able to make presentations during the meeting,
+ with the presenters taking turns. The presentations and the
+ agenda are both shown, either on separate screens, or perhaps
+ rescaled and shown on a single screen.
+
+ 2. A single multimedia presentation can itself include multiple
+ video streams that should be shown together. For instance, a
+ presenter may be discussing the fairness of media coverage. In
+ addition to slides that support the presenter's conclusions, she
+ also has video excerpts from various news programs that she shows
+ to illustrate her findings. She uses a DVD player for the video
+ excerpts so that she can pause and reposition the video as
+ needed.
+
+ 3. An educator may be presenting a multiscreen slide show. This
+ show requires that the placement of the images on the multiple
+ screens at each site be consistent.
+
+ There are many other examples where multiple presentation streams are
+ useful.
+
+3.5. Heterogeneous Systems
+
+ It is common in meeting scenarios for people to join the conference
+ from a variety of environments, using different types of endpoint
+ devices. A multiscreen immersive telepresence conference may include
+ someone on a PC-based video conferencing system, a participant
+ calling in by phone, and (soon) someone on a handheld device.
+
+ What experience/view will each of these devices have?
+
+ Some may be able to handle multiple streams, and others can handle
+ only a single stream. (Here, we are not talking about legacy
+ systems, but rather systems built to participate in such a
+ conference, although they are single stream only.) In a single video
+ stream, the stream may contain one or more compositions depending on
+ the available screen space on the device. In most cases, an
+ intermediate transcoding device will be relied upon to produce a
+ single stream, perhaps with some kind of continuous presence.
+
+ Bit rates will vary -- the handheld device and phone having lower bit
+ rates than PC and multiscreen systems.
+
+ Layout is accomplished according to different policies. For example,
+ a handheld device and PC may receive the active speaker stream. The
+ decision can be made either explicitly by the receiver, or by the
+ sender if it can receive some kind of rendering hint. The same is
+ true for audio: the device receives either a mixed stream or, if
+ mixing is not available in the network, the streams of a number of
+ the loudest speakers.
+
+ For the PC-based conferencing participant, the user's experience
+ depends on the application. It could be single stream, similar to a
+ handheld device but with a bigger screen. Or, it could be multiple
+ streams, similar to an immersive telepresence system but with a
+ smaller screen. Control for manipulation of streams can be local in
+ the software application, or in another location and sent to the
+ application over the network.
+
+ The handheld device is the most extreme. How will that participant
+ be viewed and heard? It should be an equal participant, though the
+ bandwidth will be significantly less than that of an immersive system. A
+ receiver may choose to display output coming from a handheld device
+ differently based on the resolution, but that would be the case with
+ any low-resolution video stream, e.g., from a powerful PC on a bad
+ network.
+
+ The handheld device will send and receive a single video stream,
+ which could be a composite or a subset of the conference. The
+ handheld device could say what it wants or could accept whatever the
+ sender (conference server or sending endpoint) thinks is best. The
+ handheld device will have to signal any actions it wants to take the
+ same way that an immersive system signals actions.
+
+3.6. Multipoint Education Usage
+
+ The importance of this example is that the multiple video streams are
+ not used to create an immersive conferencing experience with
+ panoramic views at all the sites. Instead, the multiple streams are
+ dynamically used to enable full participation of remote students in a
+ university class. In some instances, the same video stream is
+ displayed on multiple screens in the room; in other instances, an
+ available stream is not displayed at all.
+
+ The main site is a university auditorium that is equipped with 3
+ cameras. One camera is focused on the professor at the podium. A
+ second camera is mounted on the wall behind the professor and
+ captures the class in its entirety. The third camera is co-located
+ with the second and is designed to capture a close-up view of a
+ questioner in the audience. It automatically zooms in on that
+ student using sound localization.
+
+ Although the auditorium is equipped with 3 cameras, it is only
+ equipped with 2 screens. One is a large screen located at the front
+ so that the class can see it. The other is located at the rear so
+ the professor can see it. When someone asks a question, the front
+ screen shows the questioner. Otherwise, it shows the professor
+ (ensuring everyone can easily see her).
+
+ The remote sites are typical immersive telepresence rooms, each with
+ 3 camera/screen pairs.
+
+ All remote sites display the professor on the center screen at full
+ size. A second screen shows the entire classroom view when the
+ professor is speaking. However, when a student asks a question, the
+ second screen shows the close-up view of the student at full size.
+ Sometimes the student is in the auditorium; sometimes the speaking
+ student is at another remote site. The remote systems never display
+ the students that are actually in that room.
+
+ If someone at a remote site asks a question, then the screen in the
+ auditorium will show the remote student at full size (as if they were
+ present in the auditorium itself). The screen in the rear also shows
+ this questioner, allowing the professor to see and respond to the
+ student without needing to turn her back on the main class.
+
+ When no one is asking a question, the screen in the rear briefly
+ shows a full-room view of each remote site in turn, allowing the
+ professor to monitor the entire class (remote and local students).
+ The professor can also use a control on the podium to see a
+ particular site -- she can choose either a full-room view or a
+ single-camera view.
+
+ Realization of this use case does not require any negotiation between
+ the participating sites. Endpoint devices (and a Multipoint Control
+ Unit (MCU), if present) need to know who is speaking and what video
+ stream includes the view of that speaker. The remote systems need
+ some knowledge of which stream should be placed in the center. The
+ ability of the professor to see specific sites (or for the system to
+ show all the sites in turn) would also require the auditorium system
+ to know what sites are available and to be able to request a
+ particular view of any site. Bandwidth is optimized if video that is
+ not being shown at a particular site is not distributed to that site.
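
The display rules in this use case can be sketched as simple selection
functions. This is an illustration only, with hypothetical names. The
text does not say what a remote site shows when the questioner is in
that same room, or what the rear screen shows during a local question;
the fallbacks below are assumptions.

```python
from typing import Optional, Sequence


def remote_second_screen(viewer_site: str,
                         questioner_site: Optional[str]) -> str:
    """Second screen at a remote site: questioner close-up or class view."""
    if questioner_site is not None and questioner_site != viewer_site:
        return f"close-up of questioner at {questioner_site}"
    # Assumption: a site never shows its own students, so fall back
    # to the auditorium class view.
    return "auditorium class view"


def rear_screen(remote_sites: Sequence[str],
                questioner_site: Optional[str], turn: int) -> str:
    """Rear auditorium screen: remote questioner, else cycle through sites."""
    if not remote_sites:
        raise ValueError("at least one remote site is required")
    if questioner_site in remote_sites:
        return f"close-up of questioner at {questioner_site}"
    # Assumption: keep cycling when nobody, or only a local student, is asking.
    return f"full-room view of {remote_sites[turn % len(remote_sites)]}"
```

A stream that neither function selects for a given site need not be
sent there, which is the bandwidth optimization the paragraph above
describes.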
+
+3.7. Multipoint Multiview (Virtual Space)
+
+ This use case describes a virtual space multipoint meeting with good
+ eye contact and spatial layout of participants. The use case was
+ proposed very early in the development of video conferencing systems
+ as described in 1983 by Allardyce and Randall [virtualspace]. The use
+ case is illustrated in Figure 2-5 of their report. The virtual space
+ expands the point-to-point case by having all multipoint conference
+ participants "seated" in a virtual room. In this case, each
+ participant has a fixed "seat" in the virtual room, so each
+ participant expects to see a different view having a different
+ participant on his left and right side. Today, the use case is
+ implemented in multiple telepresence-type video conferencing systems
+ on the market. The term "virtual space" was used in their report.
+ The main difference between the results obtained with modern systems
+ and those from 1983 is the larger screen sizes.
+
+ Virtual space multipoint as defined here assumes endpoints with
+ multiple cameras and screens. Usually, there is the same number of
+ cameras and screens at a given endpoint. A camera is positioned
+ above each screen. A key aspect of virtual space multipoint is the
+ details of how the cameras are aimed. The cameras are each aimed at
+ the same area of view of the participants at the site. Thus, each
+ camera takes a picture of the same set of people but from a different
+ angle. Each endpoint sender in the virtual space multipoint meeting
+ therefore offers a choice of video streams to remote receivers, each
+ stream representing a different viewpoint. For example, a camera
+ positioned above a screen to a participant's left may take video
+ pictures of the participant's left ear; while at the same time, a
+ camera positioned above a screen to the participant's right may take
+ video pictures of the participant's right ear.
+
+ Since a sending endpoint has a camera associated with each screen, an
+ association is made between the receiving stream output on a
+ particular screen and the corresponding sending stream from the
+ camera associated with that screen. These associations are repeated
+ for each screen/camera pair in a meeting. The result of this system
+ is a horizontal arrangement of video images from remote sites, one
+ per screen. The image on each screen is paired with the output of
+ the camera above that screen, resulting in excellent eye
+ contact.
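+
+ The screen/camera association described above can be sketched as
+ follows. This is an illustrative sketch only; the function name
+ pair_cameras_with_screens and the site labels are hypothetical, and
+ the sketch assumes that screens are numbered left to right and that
+ camera k sits above screen k.
+

```python
def pair_cameras_with_screens(screen_assignment):
    """Map each remote site to the local camera whose stream it receives.

    screen_assignment[k] is the remote site shown on local screen k.
    Because camera k sits above screen k, the stream from camera k is
    the one sent back to the site displayed on screen k.
    """
    if len(set(screen_assignment)) != len(screen_assignment):
        raise ValueError("each remote site must occupy exactly one screen")
    return {site: camera for camera, site in enumerate(screen_assignment)}
```

+ For a site whose three screens show sites "B", "C", and "D" from
+ left to right, the sketch sends the output of the leftmost camera to
+ "B", the center camera to "C", and the rightmost camera to "D".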
+
+3.8. Multiple Presentation Streams - Telemedicine
+
+ This use case describes a scenario where multiple presentation
+ streams are used. In this use case, the local site is a surgery room
+ connected to one or more remote sites that may have different
+ capabilities. At the local site, 3 main cameras capture the whole
+ room (the typical 3-camera telepresence case). Also, multiple
+ presentation inputs are available: a surgery camera that is used to
+ provide a zoomed view of the operation, an endoscopic monitor, a
+ fluoroscope (X-ray imaging), an ultrasound diagnostic device, an
+ electrocardiogram (ECG) monitor, etc. These devices are used to
+ provide multiple local video presentation streams to help the surgeon
+ monitor the status of the patient and assist in the surgical process.
+
+ The local site may have 3 main screens and one (or more) presentation
+ screen(s). The main screens can be used to display the remote
+ experts. The presentation screen(s) can be used to display multiple
+ presentation streams from local and remote sites simultaneously. The
+ 3 main cameras capture different parts of the surgery room. The
+ surgeon can decide the number, the size, and the placement of the
+ presentations displayed on the local presentation screen(s). He can
+ also indicate which local presentation captures are provided for the
+ remote sites. The local site can send multiple presentation captures
+ to remote sites, and it can receive from them multiple presentations
+ related to the patient or the procedure.
+
+ One type of remote site is a single- or dual-screen and one-camera
+ system used by a consulting expert. In the general case, the remote
+ sites can be part of a multipoint telepresence conference. The
+ presentation screens at the remote sites allow the experts to see the
+ details of the operation and related data. As at the main site, the
+ experts can decide the number, the size, and the placement of the
+ presentations displayed on the presentation screens. The
+ presentation screens can display presentation streams from the
+ surgery room, from other remote sites, or from local presentation
+ streams. Thus, the experts can also start sending presentation
+ streams that can carry medical records, pathology data, or their
+ references and analysis, etc.
+
+ Another type of remote site is a typical immersive telepresence room
+ with 3 camera/screen pairs, allowing more experts to join the
+ consultation. These sites can also be used for education. The
+ teacher, who is not necessarily the surgeon, and the students are in
+ different remote sites. Students can observe and learn the details
+ of the whole procedure, while the teacher can explain and answer
+ questions during the operation.
+
+ All remote education sites can display the surgery room. Another
+ option is to display the surgery room on the center screen, and the
+ rest of the screens can show the teacher and the student who is
+ asking a question. For all the above sites, multiple presentation
+ screens can be used to enhance visibility: one screen for the zoomed
+ surgery stream and the others for medical image streams, such as MRI
+ images, cardiograms, ultrasonic images, and pathology data.
+
+4. Acknowledgements
+
+ The document has benefitted from input from a number of people
+ including Alex Eleftheriadis, Marshall Eubanks, Tommy Andre Nyquist,
+ Mark Gorzynski, Charles Eckel, Nermeen Ismail, Mary Barnes, Pascal
+ Buhler, and Jim Cole.
+
+ Special acknowledgement to Lennard Xiao, who contributed the text for
+ the telemedicine use case, and to Claudio Allocchio for his detailed
+ review of the document.
+
+5. Security Considerations
+
+ While there are likely to be security considerations for any solution
+ for telepresence interoperability, this document has no security
+ considerations.
+
+6. Informative References
+
+ [ITU.H239] ITU-T, "Role management and additional media channels for
+ H.300-series terminals", ITU-T Recommendation H.239,
+ September 2005.
+
+ [ITU.H264] ITU-T, "Advanced video coding for generic audiovisual
+ services", ITU-T Recommendation H.264, April 2013.
+
+ [ITU.H323] ITU-T, "Packet-based Multimedia Communications Systems",
+ ITU-T Recommendation H.323, December 2009.
+
+ [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+ A., Peterson, J., Sparks, R., Handley, M., and E.
+ Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+ June 2002.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, July 2003.
+
+ [RFC4582] Camarillo, G., Ott, J., and K. Drage, "The Binary Floor
+ Control Protocol (BFCP)", RFC 4582, November 2006.
+
+ [virtualspace]
+ Allardyce, L. and L. Randall, "Development of
+ Teleconferencing Methodologies with Emphasis on Virtual
+ Space Video and Interactive Graphics", April 1983,
+ <http://www.dtic.mil/docs/citations/ADA127738>.
+
+Authors' Addresses
+
+ Allyn Romanow
+ Cisco
+ San Jose, CA 95134
+ US
+
+ EMail: allyn@cisco.com
+
+
+ Stephen Botzko
+ Polycom
+ Andover, MA 01810
+ US
+
+ EMail: stephen.botzko@polycom.com
+
+
+ Mark Duckworth
+ Polycom
+ Andover, MA 01810
+ US
+
+ EMail: mark.duckworth@polycom.com
+
+
+ Roni Even (editor)
+ Huawei Technologies
+ Tel Aviv
+ Israel
+
+ EMail: roni.even@mail01.huawei.com
+