summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc6189.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc6189.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc6189.txt')
-rw-r--r--doc/rfc/rfc6189.txt6443
1 files changed, 6443 insertions, 0 deletions
diff --git a/doc/rfc/rfc6189.txt b/doc/rfc/rfc6189.txt
new file mode 100644
index 0000000..0afbf09
--- /dev/null
+++ b/doc/rfc/rfc6189.txt
@@ -0,0 +1,6443 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) P. Zimmermann
+Request for Comments: 6189 Zfone Project
+Category: Informational A. Johnston, Ed.
+ISSN: 2070-1721 Avaya
+ J. Callas
+ Apple, Inc.
+ April 2011
+
+
+ ZRTP: Media Path Key Agreement for Unicast Secure RTP
+
+Abstract
+
+ This document defines ZRTP, a protocol for media path Diffie-Hellman
+ exchange to agree on a session key and parameters for establishing
+ unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice
+ over IP (VoIP) applications. The ZRTP protocol is media path keying
+ because it is multiplexed on the same port as RTP and does not
+ require support in the signaling protocol. ZRTP does not assume a
+ Public Key Infrastructure (PKI) or require the complexity of
+ certificates in end devices. For the media session, ZRTP provides
+ confidentiality, protection against man-in-the-middle (MiTM) attacks,
+ and, in cases where the signaling protocol provides end-to-end
+ integrity protection, authentication. ZRTP can utilize a Session
+ Description Protocol (SDP) attribute to provide discovery and
+ authentication through the signaling channel. To provide best effort
+ SRTP, ZRTP utilizes normal RTP/AVP (Audio-Visual Profile) profiles.
+ ZRTP secures media sessions that include a voice media stream and can
+ also secure media sessions that do not include voice by using an
+ optional digital signature.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6189.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 1]
+
+RFC 6189 ZRTP April 2011
+
+
+Copyright Notice
+
+ Copyright (c) 2011 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 2. Terminology .....................................................5
+ 3. Overview ........................................................6
+ 3.1. Key Agreement Modes ........................................7
+ 3.1.1. Diffie-Hellman Mode Overview ........................7
+ 3.1.2. Preshared Mode Overview .............................9
+ 3.1.3. Multistream Mode Overview ...........................9
+ 4. Protocol Description ...........................................10
+ 4.1. Discovery .................................................10
+ 4.1.1. Protocol Version Negotiation .......................11
+ 4.1.2. Algorithm Negotiation ..............................13
+ 4.2. Commit Contention .........................................14
+ 4.3. Matching Shared Secret Determination ......................15
+ 4.3.1. Calculation and Comparison of Hashes of
+ Shared Secrets .....................................17
+ 4.3.2. Handling a Shared Secret Cache Mismatch ............18
+ 4.4. DH and Non-DH Key Agreements ..............................19
+ 4.4.1. Diffie-Hellman Mode ................................19
+ 4.4.1.1. Hash Commitment in Diffie-Hellman Mode ....20
+ 4.4.1.2. Responder Behavior in
+ Diffie-Hellman Mode .......................21
+ 4.4.1.3. Initiator Behavior in
+ Diffie-Hellman Mode .......................22
+ 4.4.1.4. Shared Secret Calculation for DH Mode .....22
+ 4.4.2. Preshared Mode .....................................25
+ 4.4.2.1. Commitment in Preshared Mode ..............25
+ 4.4.2.2. Initiator Behavior in Preshared Mode ......26
+ 4.4.2.3. Responder Behavior in Preshared Mode ......26
+ 4.4.2.4. Shared Secret Calculation for
+ Preshared Mode ............................27
+
+
+
+
+Zimmermann, et al. Informational [Page 2]
+
+RFC 6189 ZRTP April 2011
+
+
+ 4.4.3. Multistream Mode ...................................28
+ 4.4.3.1. Commitment in Multistream Mode ............29
+ 4.4.3.2. Shared Secret Calculation for
+ Multistream Mode ..........................29
+ 4.5. Key Derivations ...........................................31
+ 4.5.1. The ZRTP Key Derivation Function ...................31
+ 4.5.2. Deriving ZRTPSess Key and SAS in DH or
+ Preshared Modes ....................................32
+ 4.5.3. Deriving the Rest of the Keys from s0 ..............33
+ 4.6. Confirmation ..............................................35
+ 4.6.1. Updating the Cache of Shared Secrets ...............35
+ 4.6.1.1. Cache Update Following a Cache Mismatch ...36
+ 4.7. Termination ...............................................37
+ 4.7.1. Termination via Error Message ......................37
+ 4.7.2. Termination via GoClear Message ....................37
+ 4.7.2.1. Key Destruction for GoClear Message .......39
+ 4.7.3. Key Destruction at Termination .....................40
+ 4.8. Random Number Generation ..................................40
+ 4.9. ZID and Cache Operation ...................................41
+ 4.9.1. Cacheless Implementations ..........................42
+ 5. ZRTP Messages ..................................................42
+ 5.1. ZRTP Message Formats ......................................44
+ 5.1.1. Message Type Block .................................44
+ 5.1.2. Hash Type Block ....................................45
+ 5.1.2.1. Negotiated Hash and MAC Algorithm .........46
+ 5.1.2.2. Implicit Hash and MAC Algorithm ...........47
+ 5.1.3. Cipher Type Block ..................................47
+ 5.1.4. Auth Tag Type Block ................................48
+ 5.1.5. Key Agreement Type Block ...........................49
+ 5.1.6. SAS Type Block .....................................51
+ 5.1.7. Signature Type Block ...............................52
+ 5.2. Hello Message .............................................53
+ 5.3. HelloACK Message ..........................................56
+ 5.4. Commit Message ............................................56
+ 5.5. DHPart1 Message ...........................................60
+ 5.6. DHPart2 Message ...........................................62
+ 5.7. Confirm1 and Confirm2 Messages ............................63
+ 5.8. Conf2ACK Message ..........................................66
+ 5.9. Error Message .............................................66
+ 5.10. ErrorACK Message .........................................68
+ 5.11. GoClear Message ..........................................68
+ 5.12. ClearACK Message .........................................69
+ 5.13. SASrelay Message .........................................69
+ 5.14. RelayACK Message .........................................72
+ 5.15. Ping Message .............................................72
+ 5.16. PingACK Message ..........................................73
+ 6. Retransmissions ................................................74
+
+
+
+
+Zimmermann, et al. Informational [Page 3]
+
+RFC 6189 ZRTP April 2011
+
+
+ 7. Short Authentication String ....................................77
+ 7.1. SAS Verified Flag .........................................78
+ 7.2. Signing the SAS ...........................................79
+ 7.2.1. OpenPGP Signatures .................................81
+ 7.2.2. ECDSA Signatures with X.509v3 Certs ................82
+ 7.2.3. Signing the SAS without a PKI ......................83
+ 7.3. Relaying the SAS through a PBX ............................84
+ 7.3.1. PBX Enrollment and the PBX Enrollment Flag .........87
+ 8. Signaling Interactions .........................................89
+ 8.1. Binding the Media Stream to the Signaling Layer
+ via the Hello Hash ........................................90
+ 8.1.1. Integrity-Protected Signaling Enables
+ Integrity-Protected DH Exchange ....................92
+ 8.2. Deriving the SRTP Secret (srtps) from the
+ Signaling Layer ...........................................93
+ 8.3. Codec Selection for Secure Media ..........................94
+ 9. False ZRTP Packet Rejection ....................................95
+ 10. Intermediary ZRTP Devices .....................................97
+ 11. The ZRTP Disclosure Flag ......................................98
+ 11.1. Guidelines on Proper Implementation of the
+ Disclosure Flag .........................................100
+ 12. Mapping between ZID and AOR (SIP URI) ........................100
+ 13. IANA Considerations ..........................................102
+ 14. Media Security Requirements ..................................102
+ 15. Security Considerations ......................................104
+ 15.1. Self-Healing Key Continuity Feature .....................107
+ 16. Acknowledgments ..............................................108
+ 17. References ...................................................109
+ 17.1. Normative References ....................................109
+ 17.2. Informative References ..................................111
+
+1. Introduction
+
+ ZRTP is a key agreement protocol that performs a Diffie-Hellman key
+ exchange during call setup in the media path and is transported over
+ the same port as the Real-time Transport Protocol (RTP) [RFC3550]
+ media stream which has been established using a signaling protocol
+ such as Session Initiation Protocol (SIP) [RFC3261]. This generates
+ a shared secret, which is then used to generate keys and salt for a
+ Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from
+ [PGPfone]. A reference implementation of ZRTP is available in
+ [Zfone].
+
+ The ZRTP protocol has some nice cryptographic features lacking in
+ many other approaches to media session encryption. Although it uses
+ a public key algorithm, it does not rely on a public key
+ infrastructure (PKI). In fact, it does not use persistent public
+ keys at all. It uses ephemeral Diffie-Hellman (DH) with hash
+
+
+
+Zimmermann, et al. Informational [Page 4]
+
+RFC 6189 ZRTP April 2011
+
+
+ commitment and allows the detection of man-in-the-middle (MiTM)
+ attacks by displaying a short authentication string (SAS) for the
+ users to read and verbally compare over the phone. It has Perfect
+ Forward Secrecy, meaning the keys are destroyed at the end of the
+ call, which precludes retroactively compromising the call by future
+ disclosures of key material. But even if the users are too lazy to
+ bother with short authentication strings, we still get reasonable
+ authentication against a MiTM attack, based on a form of key
+ continuity. It does this by caching some key material to use in the
+ next call, to be mixed in with the next call's DH shared secret,
+ giving it key continuity properties analogous to Secure SHell (SSH).
+ All this is done without reliance on a PKI, key certification, trust
+ models, certificate authorities, or key management complexity that
+ bedevils the email encryption world. It also does not rely on SIP
+ signaling for the key management, and in fact, it does not rely on
+ any servers at all. It performs its key agreements and key
+ management in a purely peer-to-peer manner over the RTP packet
+ stream.
+
+ ZRTP can be used and discovered without being declared or indicated
+ in the signaling path. This provides a best effort SRTP capability.
+ Also, this reduces the complexity of implementations and minimizes
+ interdependency between the signaling and media layers. However,
+ when ZRTP is indicated in the signaling via the zrtp-hash SDP
+ attribute, ZRTP has additional useful properties. By sending a hash
+ of the ZRTP Hello message in the signaling, ZRTP provides a useful
+ binding between the signaling and media paths, which is explained in
+ Section 8.1. When this is done through a signaling path that has
+ end-to-end integrity protection, the DH exchange is automatically
+ protected from a MiTM attack, which is explained in Section 8.1.1.
+
+ ZRTP is designed for unicast media sessions in which there is a voice
+ media stream. For multiparty secure conferencing, separate ZRTP
+ sessions may be negotiated between each party and the conference
+ bridge. For sessions lacking a voice media stream, MiTM protection
+ may be provided by the mechanisms in Sections 8.1.1 or 7.2. In terms
+ of the RTP topologies defined in [RFC5117], ZRTP is designed for
+ Point-to-Point topologies only.
+
+2. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ [RFC2119].
+
+ In this document, a "call" is synonymous with a "session".
+
+
+
+
+Zimmermann, et al. Informational [Page 5]
+
+RFC 6189 ZRTP April 2011
+
+
+3. Overview
+
+ This section provides a description of how ZRTP works. This
+ description is non-normative in nature but is included to build
+ understanding of the protocol.
+
+ ZRTP is negotiated the same way a conventional RTP session is
+ negotiated in an offer/answer exchange using the standard RTP/AVP
+ profile. The ZRTP protocol begins after two endpoints have utilized
+ a signaling protocol, such as SIP, and are ready to exchange media.
+ If Interactive Connectivity Establishment (ICE) [RFC5245] is being
+ used, ZRTP begins after ICE has completed its connectivity checks.
+
+ ZRTP is multiplexed on the same ports as RTP. It uses a unique
+ header that makes it clearly differentiable from RTP or Session
+ Traversal Utilities for NAT (STUN).
+
+ ZRTP support can be discovered in the signaling path by the presence
+ of a ZRTP SDP attribute. However, even in cases where this is not
+ received in the signaling, an endpoint can still send ZRTP Hello
+ messages to see if a response is received. If a response is not
+ received, no more ZRTP messages will be sent during this session.
+ This is safe because ZRTP has been designed to be clearly different
+ from RTP and have a similar structure to STUN packets received
+ (sometimes by non-supporting endpoints) during an ICE exchange.
+
+ Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello
+ message to the other endpoint. The purpose of the Hello message is
+ to confirm that the endpoint supports the protocol and to see what
+ algorithms the two ZRTP endpoints have in common.
+
+ The Hello message contains the SRTP configuration options and the
+ ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
+ that is generated once at installation time. ZIDs are discovered
+ during the Hello message exchange. The received ZID is used to look
+ up retained shared secrets from previous ZRTP sessions with the
+ endpoint.
+
+ A response to a ZRTP Hello message is a ZRTP HelloACK message. The
+ HelloACK message simply acknowledges receipt of the Hello. Since RTP
+ commonly uses best effort UDP transport, ZRTP has retransmission
+ timers in case of lost datagrams. There are two timers, both with
+ exponential backoff mechanisms. One timer is used for
+ retransmissions of Hello messages and the other is used for
+ retransmissions of all other messages after receipt of a HelloACK.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 6]
+
+RFC 6189 ZRTP April 2011
+
+
+ If an integrity-protected signaling channel is available, a hash of
+ the Hello message can be sent. This allows rejection of false ZRTP
+ Hello messages injected by an attacker.
+
+ Hello and other ZRTP messages also contain a hash image that is used
+ to link the messages together. This allows rejection of false ZRTP
+ messages injected during an exchange.
+
+3.1. Key Agreement Modes
+
+ After both endpoints exchange Hello and HelloACK messages, the key
+ agreement exchange can begin with the ZRTP Commit message. ZRTP
+ supports a number of key agreement modes including both Diffie-
+ Hellman and non-Diffie-Hellman modes as described in the following
+ sections.
+
+ The Commit message may be sent immediately after both endpoints have
+ completed the Hello/HelloACK discovery handshake, or it may be
+ deferred until later in the call, after the participants engage in
+ some unencrypted conversation. The Commit message may be manually
+ activated by a user interface element, such as a GO SECURE button,
+ which becomes enabled after the Hello/HelloACK discovery phase. This
+ emulates the user experience of a number of secure phones in the
+ Public Switched Telephone Network (PSTN) world [comsec]. However, it
+ is expected that most simple ZRTP user agents will omit such buttons
+ and proceed directly to secure mode by sending a Commit message
+ immediately after the Hello/HelloACK handshake.
+
+3.1.1. Diffie-Hellman Mode Overview
+
+ An example ZRTP call flow is shown in Figure 1. Note that the order
+ of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed.
+ That is, either Alice or Bob might send the first Hello message.
+ Note that the endpoint that sends the Commit message is considered
+ the initiator of the ZRTP session and drives the key agreement
+ exchange. The Diffie-Hellman public values are exchanged in the
+ DHPart1 and DHPart2 messages. SRTP keys and salts are then
+ calculated.
+
+ The initiator needs to generate its ephemeral key pair before sending
+ the Commit, and the responder generates its key pair before sending
+ DHPart1.
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 7]
+
+RFC 6189 ZRTP April 2011
+
+
+ Alice Bob
+ | |
+ | Alice and Bob establish a media session. |
+ | They initiate ZRTP on media ports |
+ | |
+ | F1 Hello (version, options, Alice's ZID) |
+ |-------------------------------------------------->|
+ | HelloACK F2 |
+ |<--------------------------------------------------|
+ | Hello (version, options, Bob's ZID) F3 |
+ |<--------------------------------------------------|
+ | F4 HelloACK |
+ |-------------------------------------------------->|
+ | |
+ | Bob acts as the initiator. |
+ | |
+ | Commit (Bob's ZID, options, hash value) F5 |
+ |<--------------------------------------------------|
+ | F6 DHPart1 (pvr, shared secret hashes) |
+ |-------------------------------------------------->|
+ | DHPart2 (pvi, shared secret hashes) F7 |
+ |<--------------------------------------------------|
+ | |
+ | Alice and Bob generate SRTP session key. |
+ | |
+ | F8 Confirm1 (MAC, D,A,V,E flags, sig) |
+ |-------------------------------------------------->|
+ | Confirm2 (MAC, D,A,V,E flags, sig) F9 |
+ |<--------------------------------------------------|
+ | F10 Conf2ACK |
+ |-------------------------------------------------->|
+ | SRTP begins |
+ |<=================================================>|
+ | |
+
+ Figure 1: Establishment of an SRTP Session Using ZRTP
+
+ ZRTP authentication uses a Short Authentication String (SAS), which
+ is ideally displayed for the human user. Alternatively, the SAS can
+ be authenticated by exchanging an optional digital signature (sig)
+ over the SAS in the Confirm1 or Confirm2 messages (described in
+ Section 7.2).
+
+ The ZRTP Confirm1 and Confirm2 messages are sent for a number of
+ reasons, not the least of which is that they confirm that all the key
+ agreement calculations were successful and thus the encryption will
+ work. They also carry other information such as the Disclosure flag
+ (D), the Allow Clear flag (A), the SAS Verified flag (V), and the
+
+
+
+Zimmermann, et al. Informational [Page 8]
+
+RFC 6189 ZRTP April 2011
+
+
+ Private Branch Exchange (PBX) Enrollment flag (E). All flags are
+ encrypted to shield them from a passive observer.
+
+3.1.2. Preshared Mode Overview
+
+ In the Preshared mode, endpoints can skip the DH calculation if they
+ have a shared secret from a previous ZRTP session. Preshared mode is
+ indicated in the Commit message and results in the same call flow as
+ Multistream mode. The principal difference between Multistream mode
+ and Preshared mode is that Preshared mode uses a previously cached
+ shared secret, rs1, instead of an active ZRTP Session key as the
+ initial keying material.
+
+ This mode could be useful for slow processor endpoints so that a DH
+ calculation does not need to be performed every session. Or, this
+ mode could be used to rapidly re-establish an earlier session that
+ was recently torn down or interrupted without the need to perform
+ another DH calculation.
+
+ Preshared mode has forward secrecy properties. If a phone's cache is
+ captured by an opponent, the cached shared secrets cannot be used to
+ recover earlier encrypted calls, because the shared secrets are
+ replaced with new ones in each new call, as in DH mode. However, the
+ captured secrets can be used by a passive wiretapper in the media
+ path to decrypt the next call, if the next call is in Preshared mode.
+ This differs from DH mode, which requires an active MiTM wiretapper
+ to exploit captured secrets in the next call. However, if the next
+ call is missed by the wiretapper, he cannot wiretap any further
+ calls. Thus, it preserves most of the self-healing properties
+ (Section 15.1) of key continuity enjoyed by DH mode.
+
+3.1.3. Multistream Mode Overview
+
+ Multistream mode is an alternative key agreement method used when two
+ endpoints have an established SRTP media stream between them with an
+ active ZRTP Session key. ZRTP can derive multiple SRTP keys from a
+ single DH exchange. For example, an established secure voice call
+ that adds a video stream uses Multistream mode to quickly initiate
+ the video stream without a second DH exchange.
+
+ When Multistream mode is indicated in the Commit message, a call flow
+ similar to Figure 1 is used, but no DH calculation is performed by
+ either endpoint and the DHPart1 and DHPart2 messages are omitted.
+ The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
+ the cache is not affected during this mode, multiple Multistream ZRTP
+ exchanges can be performed in parallel between two endpoints.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 9]
+
+RFC 6189 ZRTP April 2011
+
+
+ When adding additional media streams to an existing call, only
+ Multistream mode is used. Only one DH operation is performed, just
+ for the first media stream.
+
+4. Protocol Description
+
+ This section begins the normative description of the protocol.
+
+ ZRTP MUST be multiplexed on the same ports as the RTP media packets.
+
+ To support best effort encryption from the Media Security
+ Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media
+ lines in the initial offer/answer exchange. The ZRTP SDP attribute
+ a=zrtp-hash defined in Section 8 SHOULD be used in all offers and
+ answers to indicate support for the ZRTP protocol.
+
+ ZRTP can be utilized by endpoints that do not have a common
+ signaling protocol but both support SRTP and are relying on a
+ gateway for conversion. As such, it is not always possible for
+ the signaling protocol to relay the zrtp-hash as can be done using
+ SIP.
+
+ The Secure RTP/AVP (SAVP) profile MAY be used in subsequent offer/
+ answer exchanges after a successful ZRTP exchange has resulted in an
+ SRTP session, or if it is known that the other endpoint supports this
+ profile. Other profiles MAY also be used.
+
+ The use of the RTP/SAVP profile has caused failures in negotiating
+ best effort SRTP due to the limitations on negotiating profiles
+ using SDP. This is why ZRTP supports the RTP/AVP profile and
+ includes its own discovery mechanisms.
+
+ In all key agreement modes, the initiator SHOULD NOT send RTP media
+ after sending the Commit message, and it MUST NOT send SRTP media
+ before receiving either the Conf2ACK or the first SRTP media (with a
+ valid SRTP auth tag) from the responder. The responder SHOULD NOT
+ send RTP media after receiving the Commit message, and MUST NOT send
+ SRTP media before receiving the Confirm2 message.
+
+4.1. Discovery
+
+ During the ZRTP discovery phase, a ZRTP endpoint discovers if the
+ other endpoint supports ZRTP and the supported algorithms and
+ options. This information is transported in a Hello message, which
+ is described in Section 5.2.
+
+ ZRTP endpoints SHOULD include the SDP attribute a=zrtp-hash in offers
+ and answers, as defined in Section 8.
+
+
+
+Zimmermann, et al. Informational [Page 10]
+
+RFC 6189 ZRTP April 2011
+
+
+ The Hello message includes the ZRTP version, Hash Type, Cipher Type,
+ SRTP authentication tag type, Key Agreement Type, and Short
+ Authentication String (SAS) algorithms that are supported. The Hello
+ message also includes a hash image as described in Section 9. In
+ addition, each endpoint sends and discovers ZIDs. The received ZID
+ is used later in the protocol as an index into a cache of shared
+ secrets that were previously negotiated and retained between the two
+ parties.
+
+ A Hello message can be sent at any time, but it is usually sent at
+ the start of an RTP session to determine if the other endpoint
+ supports ZRTP and also if the SRTP implementations are compatible. A
+ Hello message is retransmitted using timer T1 and an exponential
+ backoff mechanism detailed in Section 6 until the receipt of a
+ HelloACK message or a Commit message.
+
+ The use of the a=zrtp-hash SDP attribute to authenticate the Hello
+ message is described in Section 8.1.
+
+ If a Hello message, or any other ZRTP message, indicates that there
+ is a synchronization source (SSRC) collision, an Error message
+ (Section 5.9) MUST be sent with the Error Code indicating SSRC
+ collision, and the ZRTP negotiation MUST be terminated. The
+ procedures of RFC 3550, Section 8.2 [RFC3550], SHOULD be followed by
+ both endpoints to resolve this condition, and if it is resolved, a
+ new ZRTP secure session SHOULD be negotiated.
+
+4.1.1. Protocol Version Negotiation
+
+ This specification defines ZRTP version 1.10. Since new versions of
+ ZRTP may be developed in the future, this specification defines a
+ protocol version negotiation in this section.
+
+ Each party declares what version of the ZRTP protocol they support
+ via the version field in the Hello message (Section 5.2). If both
+ parties have the same version number in their Hello messages, they
+ can proceed with the rest of the protocol. To facilitate both
+ parties reaching this state of protocol version agreement in their
+ Hello messages, ZRTP should use information provided in the signaling
+ layer, if available. If a ZRTP endpoint supports more than one
+ version of the protocol, it SHOULD declare them all in a list of SIP
+ SDP a=zrtp-hash attributes (defined in Section 8), listing separate
+ hashes, with separate ZRTP version numbers in each item in the list.
+
+ Both parties should inspect the list of ZRTP version numbers supplied
+ by the other party in the SIP SDP a=zrtp-hash attributes. Both
+ parties SHOULD choose the highest version number that appears in both
+ parties' list of a=zrtp-hash version numbers, and use that version
+
+
+
+Zimmermann, et al. Informational [Page 11]
+
+RFC 6189 ZRTP April 2011
+
+
+ for their Hello messages. If both parties use the SIP signaling in
+ this manner, their initial Hello messages will have the same ZRTP
+ version number, provided they both have at least one supported
+ protocol version in common. Before the ZRTP key agreement can
+ proceed, an endpoint MUST have sent and received Hellos with the same
+ protocol version.
+
+ It is best if the signaling layer is used to negotiate the protocol
+ version number. However, the a=zrtp-hash SDP attribute is not always
+ present in the SIP packet, as explained in Section 8.1. In the
+ absence of any guidance from the signaling layer, an endpoint MUST
+ send the highest supported version in initial Hello messages. If the
+ two parties send different protocol version numbers in their Hello
+ messages, they can reach an agreement to use a common version, if one
+ exists. They iteratively apply the following rules until they both
+ have matching version fields in their Hello messages and the key
+ agreement can proceed:
+
+ o If an endpoint receives a Hello message with an unsupported
+ version number that is higher than the endpoint's current Hello
+ message version, the received Hello message MUST be ignored. The
+ endpoint continues to retransmit Hello messages on the standard
+ retry schedule (Section 6).
+
+ o If an endpoint receives a Hello message with a version number that
+ is lower than the endpoint's current Hello message, and the
+ endpoint supports a version that is less than or equal to the
+ received version number, the endpoint MUST stop retransmitting the
+ old version number and MUST start sending a Hello message with the
+ highest supported version number that is less than or equal to the
+ received version number.
+
+ o If an endpoint receives a Hello message with an unsupported
+ version number that is lower than the endpoint's current Hello
+ message, the endpoint MUST send an Error message (Section 5.9)
+ indicating failure to support this ZRTP version.
+
+ The above comparisons are iterated until the version numbers match,
+ or until it exits on a failure to match.
+
+ For example, assume that Alice supports protocol versions 1.10 and
+ 2.00, and Bob supports versions 1.10 and 1.20. Alice initially
+ sends a Hello with version 2.00, and Bob initially sends a Hello
+ with version 1.20. Bob ignores Alice's 2.00 Hello and continues
+ to send his 1.20 Hellos. Alice detects that Bob does not support
+ 2.00 and she stops sending her 2.00 Hellos and starts sending a
+ stream of 1.10 Hellos. Bob sees the 1.10 Hello from Alice and
+ stops sending his 1.20 Hellos and switches to sending 1.10 Hellos.
+
+
+
+Zimmermann, et al. Informational [Page 12]
+
+RFC 6189 ZRTP April 2011
+
+
+ At that point, they have converged on using version 1.10 and the
+ protocol proceeds on that basis.
+
+ When comparing protocol versions, a ZRTP endpoint MUST include only
+ the first three octets of the version field in the comparison. The
+ final octet is ignored, because it is not significant for
+ interoperability. For example, "1.1 ", "1.10", "1.11", or "1.1a" are
+ all regarded as a version match, because they would all be
+ interoperable versions.
+
+ Changes in protocol version numbers are expected to be infrequent
+ after version 1.10. Supporting multiple versions adds code
+ complexity and may introduce security weaknesses in the
+ implementation. The old adage about keeping it simple applies
+ especially to implementing security protocols. Endpoints SHOULD NOT
+ support protocol versions earlier than version 1.10.
+
+4.1.2. Algorithm Negotiation
+
+ A method is provided to allow the two parties to mutually and
+ deterministically choose the same DH key size and algorithm before a
+ Commit message is sent.
+
+ Each Hello message lists the algorithms in the order of preference
+ for that ZRTP endpoint. Endpoints eliminate the non-intersecting
+ choices from each of their own lists, resulting in each endpoint
+ having a list of algorithms in common that might or might not be
+ ordered the same as the other endpoint's list. Each endpoint
+ compares the first item on their own list with the first item on the
+ other endpoint's list and SHOULD choose the faster of the two
+ algorithms. For example:
+
+ o Alice's full list: DH2k, DH3k, EC25
+
+ o Bob's full list: EC38, EC25, DH3k
+
+ o Alice's intersecting list: DH3k, EC25
+
+ o Bob's intersecting list: EC25, DH3k
+
+ o Alice's first choice is DH3k, and Bob's first choice is EC25.
+
+ o Thus, both parties choose EC25 (ECDH-256) because it's faster.
+
+ To decide which DH algorithm is faster, the following ranking, from
+ fastest to slowest is defined: DH-2048, ECDH-256, DH-3072, ECDH-384,
+ ECDH-521. These are all defined in Section 5.1.5.
+
+
+
+
+Zimmermann, et al. Informational [Page 13]
+
+RFC 6189 ZRTP April 2011
+
+
+ If both endpoints follow this method, they may each start their DH
+ calculations as soon as they receive the Hello message, and there
+ will be no need for either endpoint to discard their DH calculation
+ if the other endpoint becomes the initiator.
+
+ This method is used only to negotiate DH key size. For the rest of
+ the algorithm choices, it's simply whatever the initiator selects
+ from the algorithms in common. Note that the DH key size influences
+ the Hash Type and the size of the symmetric cipher key, as explained
+ in Section 5.1.5.
+
+ Unfavorable choices will never be made by this method, because each
+ endpoint will omit from their respective lists choices that are too
+ slow or not secure enough to meet their security policy.
+
+4.2. Commit Contention
+
+ After both parties have received compatible Hello messages, a Commit
+ message (Section 5.4) can be sent to begin the ZRTP key exchange.
+ The endpoint that sends the Commit is known as the initiator, while
+ the receiver of the Commit is known as the responder.
+
+ If both sides send Commit messages initiating a secure session at the
+ same time, the following rules are used to break the tie:
+
+ o If one Commit is for a DH mode while the other is for Preshared
+ mode, then the Preshared Commit MUST be discarded and the DH
+ Commit proceeds.
+
+ o If the two Commits are both Preshared mode, and one party has set
+ the MiTM (M) flag in the Hello message and the other has not, the
+ Commit message from the party who set the (M) flag MUST be
+ discarded, and the one who has not set the (M) flag becomes the
+ initiator, regardless of the nonce values. In other words, for
+ Preshared mode, the phone is the initiator and the PBX is the
+ responder.
+
+ o If the two Commits are either both DH modes or both non-DH modes,
+ then the Commit message with the lowest hvi (hash value of
+ initiator) value (for DH Commits), or lowest nonce value (for
+ non-DH Commits), MUST be discarded and the other side is the
+ initiator, and the protocol proceeds with the initiator's Commit.
+ The two hvi or nonce values are compared as large unsigned
+ integers in network byte order.
+
+ If one Commit is for Multistream mode while the other is for non-
+ Multistream (DH or Preshared) mode, a software error has occurred and
+ the ZRTP negotiation should be terminated. This should never occur
+
+
+
+Zimmermann, et al. Informational [Page 14]
+
+RFC 6189 ZRTP April 2011
+
+
+ because of the constraints on Multistream mode described in
+ Section 4.4.3.
+
+ In the event that Commit messages are sent by both ZRTP endpoints at
+ the same time, but are received in different media streams, the same
+ resolution rules apply as if they were received on the same stream.
+ The media stream in which the Commit was received or sent will
+ proceed through the ZRTP exchange while the media stream with the
+ discarded Commit must wait for the completion of the other ZRTP
+ exchange.
+
+ If a commit contention forces a DH Commit message to be discarded,
+ the responder's DH public value should only be discarded if it does
+ not match the initiator's DH key size. This will not happen if both
+ endpoints choose a common key size via the method described in
+ Section 4.1.2.
+
+4.3. Matching Shared Secret Determination
+
+ The following sections describe how ZRTP endpoints generate and/or
+ use the set of shared secrets s1, auxsecret, and pbxsecret through
+ the exchange of the DHPart1 and DHPart2 messages. This doesn't cover
+ the Diffie-Hellman calculations. It only covers the method whereby
+ the two parties determine if they already have shared secrets in
+ common in their caches.
+
+ Each ZRTP endpoint maintains a long-term cache of shared secrets that
+ it has previously negotiated with the other party. The ZID of the
+ other party, received in the other party's Hello message, is used as
+ an index into this cache to find the set of shared secrets, if any
+ exist. This cache entry may contain previously retained shared
+ secrets, rs1 and rs2, which give ZRTP its key continuity features.
+ If the other party is a PBX, the cache may also contain a trusted
+ MiTM PBX shared secret, called pbxsecret, defined in Section 7.3.1.
+
+ The DHPart1 and DHPart2 messages contain a list of hashes of these
+ shared secrets to allow the two endpoints to compare the hashes with
+ what they have in their caches to detect whether the two sides share
+ any secrets that can be used in the calculation of the session key.
+ The use of this shared secret cache is described in Section 4.9.
+
+ If no secret of a given type is available, a random value is
+ generated and used for that secret to ensure a mismatch in the hash
+ comparisons in the DHPart1 and DHPart2 messages. This prevents an
+ eavesdropper from knowing which types of shared secrets are available
+ between the endpoints.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 15]
+
+RFC 6189 ZRTP April 2011
+
+
+ Section 4.3.1 refers to the auxiliary shared secret auxsecret. The
+ auxsecret shared secret may be defined by the VoIP user agent out-of-
+ band from the ZRTP protocol. In some cases, it may be provided by
+ the signaling layer as srtps, which is defined in Section 8.2. If it
+ is not provided by the signaling layer, the auxsecret shared secret
+ may be manually provisioned in other application-specific ways that
+ are out of band, such as computed from a hashed pass phrase by prior
+ agreement between the two parties or supplied by a hardware token.
+ Or, it may be a family key used by an institution to which the two
+ parties both belong. It is a generalized mechanism for providing a
+ shared secret that is agreed to between the two parties out of scope
+ of the ZRTP protocol. It is expected that most typical ZRTP
+ endpoints will rarely use auxsecret.
+
+ For both the initiator and the responder, the shared secrets s1, s2,
+ and s3 will be calculated so that they can all be used later to
+ calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
+ calculated by both parties.
+
+ The shared secret s1 will be either the initiator's rs1 or the
+ initiator's rs2, depending on which of them can be found in the
+ responder's cache. If the initiator's rs1 matches the responder's
+ rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only
+ if that match fails, then if the initiator's rs2 matches the
+ responder's rs1 or rs2, then s1 MUST be set to the initiator's rs2.
+ If that match also fails, then s1 MUST be set to null. The
+ complexity of the s1 calculation is to recover from any loss of cache
+ sync from an earlier aborted session, due to the Two Generals'
+ Problem [Byzantine].
+
+ The shared secret s2 MUST be set to the value of auxsecret if and
+ only if both parties have matching values for auxsecret, as
+ determined by comparing the hashes of auxsecret sent in the DH
+ messages. If they don't match, s2 MUST be set to null.
+
+ The shared secret s3 MUST be set to the value of pbxsecret if and
+ only if both parties have matching values for pbxsecret, as
+ determined by comparing the hashes of pbxsecret sent in the DH
+ messages. If they don't match, s3 MUST be set to null.
+
+ If s1, s2, or s3 have null values, they are assumed to have a zero
+ length for the purposes of hashing them later during the s0
+ calculation in Section 4.4.1.4.
+
+ The comparison of hashes of rs1, rs2, auxsecret, and pbxsecret is
+ described in Section 4.3.1.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 16]
+
+RFC 6189 ZRTP April 2011
+
+
+4.3.1. Calculation and Comparison of Hashes of Shared Secrets
+
+ Both parties calculate a set of non-invertible hashes (implemented
+ via the MAC defined in Section 5.1.2.1) of shared secrets that may be
+ present in each of their caches. These hashes are truncated to the
+ leftmost 64 bits:
+
+ rs1IDr = MAC(rs1, "Responder")
+
+ rs2IDr = MAC(rs2, "Responder")
+
+ auxsecretIDr = MAC(auxsecret, Responder's H3)
+
+ pbxsecretIDr = MAC(pbxsecret, "Responder")
+
+ rs1IDi = MAC(rs1, "Initiator")
+
+ rs2IDi = MAC(rs2, "Initiator")
+
+ auxsecretIDi = MAC(auxsecret, Initiator's H3)
+
+ pbxsecretIDi = MAC(pbxsecret, "Initiator")
+
+ The responder sends rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr in
+ the DHPart1 message. The initiator sends rs1IDi, rs2IDi,
+ auxsecretIDi, and pbxsecretIDi in the DHPart2 message.
+
+ The responder uses the locally computed rs1IDi, rs2IDi, auxsecretIDi,
+ and pbxsecretIDi to compare against the corresponding fields in the
+ received DHPart2 message. The initiator uses the locally computed
+ rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr to compare against the
+ corresponding fields in the received DHPart1 message.
+
+ From these comparisons, s1, s2, and s3 are calculated per the methods
+ described in Section 4.3. The secrets corresponding to matching
+ hashes are kept while the secrets corresponding to the non-matching
+ ones are replaced with a null, which is assumed to have a zero length
+ for the purposes of hashing them later. The resulting s1, s2, and s3
+ values are used later to calculate s0 in Section 4.4.1.4.
+
+ For example, consider two ZRTP endpoints who share secrets rs1 and
+ pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID
+ and pbxsecretID will match but auxsecretID will not. As a result,
+ s1 = rs1, s2 will be null, and s3 = pbxsecret.
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 17]
+
+RFC 6189 ZRTP April 2011
+
+
+4.3.2. Handling a Shared Secret Cache Mismatch
+
+ A shared secret cache mismatch is defined to mean that we expected a
+ cache match because rs1 exists in our local cache, but we computed a
+ null value for s1 (per the method described in Section 4.3).
+
+ If one party has a cached shared secret and the other party does not,
+ this indicates one of two possible situations. Either there is a
+ MiTM attack or one of the legitimate parties has lost their cached
+ shared secret by some mishap. Perhaps they inadvertently deleted
+ their cache or their cache was lost or disrupted due to restoring
+ their disk from an earlier backup copy. The party that has the
+ surviving cache entry can easily detect that a cache mismatch has
+ occurred, because they expect their own cached secret to match the
+ other party's cached secret, but it does not match. It is possible
+ for both parties to detect this condition if both parties have
+ surviving cached secrets that have fallen out of sync, due perhaps to
+ one party restoring from a disk backup.
+
+ If either party discovers a cache mismatch, the user agent who makes
+ this discovery must treat this as a possible security event and MUST
+ alert their own user that there is a heightened risk of a MiTM
+ attack, and that the user should verbally compare the SAS with the
+ other party to ascertain that no MiTM attack has occurred. If a
+ cache mismatch is detected and it is not possible to compare the SAS,
+ either because the user interface does not support it or because one
+ or both endpoints are unmanned devices, and no other SAS comparison
+ mechanism is available, the session MAY be terminated.
+
+ The session need not be terminated on a cache mismatch event if:
+
+ o the mechanism described in Section 8.1.1 is available, which
+ allows authentication of the DH exchange without human assistance,
+ or
+
+ o any mechanism is available to determine if the SAS matches. This
+ would require either circumstances that allow human verbal
+ comparisons of the SAS or by use of the OPTIONAL digital signature
+ feature on the SAS hash, as described in Section 7.2.
+
+ Even if the user interface does not permit an SAS comparison, the
+ human user MUST be warned and may elect to proceed with the call at
+ their own risk.
+
+ If and only if a cache mismatch event occurs, the cache update
+ mechanism in Section 4.6.1 is affected, requiring the user to verify
+ the SAS before the cache is updated. The user will thus be alerted
+ of this security condition on every call until the SAS is verified.
+
+
+
+Zimmermann, et al. Informational [Page 18]
+
+RFC 6189 ZRTP April 2011
+
+
+ This is described in Section 4.6.1.1.
+
+ Here is a non-normative example of a cache-mismatch alert message
+ from a ZRTP user agent (specifically, [Zfone]), designed for a
+ desktop PC graphical user interface environment. It is by no means
+ required that the alert be this detailed:
+
+ We expected the other party to have a shared secret cached from a
+ previous call, but they don't have it. This may mean your partner
+ simply lost his cache of shared secrets, but it could also mean
+ someone is trying to wiretap you. To resolve this question you
+ must check the authentication string with your partner. If it
+ doesn't match, it indicates the presence of a wiretapper.
+
+ If the alert is rendered by a robot voice instead of a GUI, brevity
+ may be more important:
+
+ Something's wrong. You must check the authentication string with
+ your partner. If it doesn't match, it indicates the presence of a
+ wiretapper.
+
+ A mismatch of auxsecret is handled differently than a mismatch of
+ rs1. An auxsecret mismatch is defined to mean that auxsecret exists
+ locally, but we computed a null value for s2 (per the method
+ described in Section 4.3). This mismatch should be made visible to
+ whichever user has auxsecret defined. The mismatch should be made
+ visible to both users if they both have auxsecret defined but they
+ fail to match. The severity of the user notification is
+ implementation dependent. Aborting the session is not required. If
+ auxsecret matches, it should not excuse a mismatch of rs1, which
+ still requires a strong warning to the user.
+
+4.4. DH and Non-DH Key Agreements
+
+ The next step is the generation of a secret for deriving SRTP keying
+ material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes,
+ described in the following subsections.
+
+4.4.1. Diffie-Hellman Mode
+
+ The purpose of the Diffie-Hellman (either Finite Field Diffie-Hellman
+ or Elliptic Curve Diffie-Hellman) exchange is for the two ZRTP
+ endpoints to generate a new shared secret, s0. In addition, the
+ endpoints discover if they have any cached or previously stored
+ shared secrets in common, and it uses them as part of the calculation
+ of the session keys.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 19]
+
+RFC 6189 ZRTP April 2011
+
+
+ Because the DH exchange affects the state of the retained shared
+ secret cache, only one in-process ZRTP DH exchange may occur at a
+ time between two ZRTP endpoints. Otherwise, race conditions and
+ cache integrity problems will result. When multiple media streams
+ are established in parallel between the same pair of ZRTP endpoints
+ (determined by the ZIDs in the Hello messages), only one can be
+ processed. Once that exchange completes with Confirm2 and Conf2ACK
+ messages, another ZRTP DH exchange can begin. This constraint does
+ not apply when Multistream mode key agreement is used since the
+ cached shared secrets are not affected.
+
+4.4.1.1. Hash Commitment in Diffie-Hellman Mode
+
+ From the intersection of the algorithms in the sent and received
+ Hello messages, the initiator chooses a hash, cipher, auth tag, Key
+ Agreement Type, and SAS Type to be used.
+
+ A Diffie-Hellman mode is selected by setting the Key Agreement Type
+ in the Commit to one of the DH or Elliptic Curve Diffie-Hellman
+ (ECDH) values from the table in Section 5.1.5. In this mode, the key
+ agreement begins with the initiator choosing a fresh random Diffie-
+ Hellman (DH) secret value (svi) based on the chosen Key Agreement
+ Type value, and computing the public value. (Note that to speed up
+ processing, this computation can be done in advance.) For guidance
+ on generating random numbers, see Section 4.8.
+
+ For Finite Field Diffie-Hellman, the value for the DH generator g,
+ the DH prime p, and the length of the DH secret value, svi, are
+ defined in Section 5.1.5.
+
+ pvi = g^svi mod p
+
+ where g and p are determined by the Key Agreement Type value. The DH
+ public value pvi value is formatted as a big-endian octet string and
+ fixed to the bit-length of the DH prime; leading zeros MUST NOT be
+ truncated.
+
+ For Elliptic Curve DH, pvi is calculated and formatted according to
+ the ECDH specification in Section 5.1.5, which refers in detail to
+ certain sections of NIST SP 800-56A [NIST-SP800-56A].
+
+ The hash commitment is performed by the initiator of the ZRTP
+ exchange. The hash value of the initiator, hvi, includes a hash of
+ the entire DHPart2 message as shown in Figure 9 (which includes the
+ Diffie-Hellman public value, pvi), and the responder's Hello message
+ (where '||' means concatenation). The hvi hash is truncated to 256
+ bits:
+
+
+
+
+Zimmermann, et al. Informational [Page 20]
+
+RFC 6189 ZRTP April 2011
+
+
+ hvi = hash(initiator's DHPart2 message ||
+ responder's Hello message)
+
+ Note that the Hello message includes the fields shown in Figure 3.
+
+ The information from the responder's Hello message is included in the
+ hash calculation to prevent a bid-down attack by modification of the
+ responder's Hello message.
+
+ The initiator sends the hvi in the Commit message.
+
+ The use of hash commitment in the DH exchange constrains the attacker
+ to only one guess to generate the correct Short Authentication String
+ (SAS) (Section 7) in his attack, which means the SAS can be quite
+ short. A 16-bit SAS, for example, provides the attacker only one
+ chance out of 65536 of not being detected. Without this hash
+ commitment feature, a MiTM attacker would acquire both the pvi and
+ pvr public values from the two parties before having to choose his
+ own two DH public values for his MiTM attack. He could then use that
+ information to quickly perform a bunch of trial DH calculations for
+ both sides until he finds two with a matching SAS. To raise the cost
+ of this birthday attack, the SAS would have to be much longer. The
+ Short Authentication String would have to become a Long
+ Authentication String, which would be unacceptable to the user. A
+ hash commitment precludes this attack by forcing the MiTM to choose
+ his own two DH public values before learning the public values of
+ either of the two parties.
+
+4.4.1.2. Responder Behavior in Diffie-Hellman Mode
+
+ Upon receipt of the Commit message, the responder generates its own
+ fresh random DH secret value, svr, and computes the public value.
+ (Note that to speed up processing, this computation can be done in
+ advance, with no need to discard this computation if both endpoints
+ chose the same algorithm via Section 4.1.2.) For guidance on random
+ number generation, see Section 4.8.
+
+ For Finite Field Diffie-Hellman, the value for the DH generator g,
+ the DH prime p, and the length of the DH secret value, svr, are
+ defined in Section 5.1.5.
+
+ pvr = g^svr mod p
+
+ The pvr value is formatted as a big-endian octet string, fixed to the
+ bit-length of the DH prime; leading zeros MUST NOT be truncated.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 21]
+
+RFC 6189 ZRTP April 2011
+
+
+ For Elliptic Curve DH, pvr is calculated and formatted according to
+ the ECDH specification in Section 5.1.5, which refers in detail to
+ certain sections of NIST SP 800-56A.
+
+ Upon receipt of the DHPart2 message, the responder checks that the
+ initiator's DH public value is not equal to 1 or p-1. An attacker
+ might inject a false DHPart2 message with a value of 1 or p-1 for
+ g^svi mod p, which would cause a disastrously weak final DH result to
+ be computed. If pvi is 1 or p-1, the user SHOULD be alerted of the
+ attack and the protocol exchange MUST be terminated. Otherwise, the
+ responder computes its own value for the hash commitment using the DH
+ public value (pvi) received in the DHPart2 message and its own Hello
+ message and compares the result with the hvi received in the Commit
+ message. If they are different, a MiTM attack is taking place and
+ the user is alerted and the protocol exchange terminated.
+
+ The responder then calculates the Diffie-Hellman result:
+
+ DHResult = pvi^svr mod p
+
+4.4.1.3. Initiator Behavior in Diffie-Hellman Mode
+
+ Upon receipt of the DHPart1 message, the initiator checks that the
+ responder's DH public value is not equal to 1 or p-1. An attacker
+ might inject a false DHPart1 message with a value of 1 or p-1 for
+ g^svr mod p, which would cause a disastrously weak final DH result to
+ be computed. If pvr is 1 or p-1, the user should be alerted of the
+ attack and the protocol exchange MUST be terminated.
+
+ The initiator then sends a DHPart2 message containing the initiator's
+ DH public value and the set of calculated shared secret IDs as
+ defined in Section 4.3.1.
+
+ The initiator calculates the same Diffie-Hellman result using:
+
+ DHResult = pvr^svi mod p
+
+4.4.1.4. Shared Secret Calculation for DH Mode
+
+ A hash of the received and sent ZRTP messages in the current ZRTP
+ exchange in the following order is calculated by both parties:
+
+ total_hash = hash(Hello of responder || Commit || DHPart1 ||
+ DHPart2)
+
+ Note that only the ZRTP messages (Figures 3, 5, 8, and 9), not the
+ entire ZRTP packets, are included in the total_hash.
+
+
+
+
+Zimmermann, et al. Informational [Page 22]
+
+RFC 6189 ZRTP April 2011
+
+
+ For both the initiator and responder, the DHResult is formatted as a
+ big-endian octet string and fixed to the width of the DH prime;
+ leading zeros MUST NOT be truncated. For example, for a 3072-bit p,
+ DHResult would be a 384 octet value, with the first octet the most
+ significant. DHResult may also be the result of an ECDH calculation,
+ which is discussed in Section 5.1.5.
+
+ Key | Size of
+ Agreement | DHResult
+ ------------------------
+ DH-3072 | 384 octets
+ ------------------------
+ DH-2048 | 256 octets
+ ------------------------
+ ECDH P-256 | 32 octets
+ ------------------------
+ ECDH P-384 | 48 octets
+ ------------------------
+
+ The authors believe the calculation of the final shared secret, s0,
+ is in compliance with the recommendations in Sections 5.8.1 and
+ 6.1.2.1 of NIST SP 800-56A [NIST-SP800-56A]. This is done by hashing
+ a concatenation of a number of items, including the DHResult, the
+ ZID's of the initiator (ZIDi) and the responder (ZIDr), the
+ total_hash, and the set of non-null shared secrets as described in
+ Section 4.3.
+
+ In Section 5.8.1 of [NIST-SP800-56A], NIST requires certain
+ parameters to be hashed together in a particular order, which NIST
+ refers to as: Z, AlgorithmID, PartyUInfo, PartyVInfo, SuppPubInfo,
+ and SuppPrivInfo. In our implementation, our DHResult corresponds to
+ Z, "ZRTP-HMAC-KDF" corresponds to AlgorithmID, our ZIDi and ZIDr
+ correspond to PartyUInfo and PartyVInfo, our total_hash corresponds
+ to SuppPubInfo, and the set of three shared secrets s1, s2, and s3
+ corresponds to SuppPrivInfo. NIST also requires a 32-bit big-endian
+ integer counter to be included in the hash each time the hash is
+ computed, which we have set to the fixed value of 1 because we only
+ compute the hash once. NIST refers to the final hash output as
+ DerivedKeyingMaterial, which corresponds to our s0 in this
+ calculation.
+
+ s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi ||
+ ZIDr || total_hash || len(s1) || s1 || len(s2) ||
+ s2 || len(s3) || s3)
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 23]
+
+RFC 6189 ZRTP April 2011
+
+
+ Note that temporary values s1, s2, and s3 were calculated per the
+ methods described in Section 4.3. DHResult, s1, s2, and s3 MUST all
+ be erased from memory immediately after they are used to calculate
+ s0.
+
+ The length of the DHResult field was implicitly agreed to by the
+ negotiated DH prime size. The length of total_hash is implicitly
+ determined by the negotiated hash algorithm. All of the explicit
+ length fields, len(), in the above hash are 32-bit big-endian
+ integers, giving the length in octets of the field that follows.
+ Some members of the set of shared secrets (s1, s2, and s3) may have
+ lengths of zero if they are null (not shared) and are each preceded
+ by a 4-octet length field. For example, if s2 is null, len(s2) is
+ 0x00000000, and s2 itself would be absent from the hash calculation,
+ which means len(s3) would immediately follow len(s2). While
+ inclusion of ZIDi and ZIDr may be redundant, because they are
+ implicitly included in the total_hash, we explicitly include them
+ here to follow NIST SP 800-56A. The fixed-length string "ZRTP-HMAC-
+ KDF" (not null-terminated) identifies for what purpose the resulting
+ s0 will be used, which is to serve as the key derivation key for the
+ ZRTP HMAC-based key derivation function (KDF) defined in
+ Section 4.5.1 and used in Section 4.5.3.
+
+ The authors believe ZRTP DH mode is in full compliance with two
+ relevant NIST documents that cover key derivations. First, Section
+ 5.8.1 of [NIST-SP800-56A] computes what NIST refers to as
+ DerivedKeyingMaterial, which ZRTP refers to as s0. This s0 then
+ serves as the key derivation key, which NIST refers to as KI in the
+ key derivation function described in Sections 5 and 5.1 of
+ [NIST-SP800-108], to derive all the rest of the subkeys needed by
+ ZRTP. For ECDH mode, the authors believe the s0 calculation is also
+ in compliance with Section 3.1 of the National Security Agency's
+ (NSA's) Suite B Implementer's Guide to NIST SP 800-56A
+ [NSA-Suite-B-Guide-56A].
+
+ The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
+ use of a KDF Context field (per [NIST-SP800-108] guidelines), which
+ should include the ZIDi, ZIDr, and a nonce value known to both
+ parties. The total_hash qualifies as a nonce value, because its
+ computation included nonce material from the initiator's Commit
+ message and the responder's Hello message.
+
+ KDF_Context = (ZIDi || ZIDr || total_hash)
+
+ At this point in DH mode, the two endpoints proceed to the key
+ derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
+ now that there is a defined s0.
+
+
+
+
+Zimmermann, et al. Informational [Page 24]
+
+RFC 6189 ZRTP April 2011
+
+
+4.4.2. Preshared Mode
+
+ The Preshared key agreement mode can be used to generate SRTP keys
+ and salts without a DH calculation, instead relying on a shared
+ secret from previous DH calculations between the endpoints.
+
+ This key agreement mode is useful to rapidly re-establish a secure
+ session between two parties who have recently started and ended a
+ secure session that has already performed a DH key agreement, without
+ performing another lengthy DH calculation, which may be desirable on
+ slow processors in resource-limited environments. Preshared mode
+ MUST NOT be used for adding additional media streams to an existing
+ call. Multistream mode MUST be used for this purpose.
+
+ In the most severe resource-limited environments, Preshared mode may
+ be useful with processors that cannot perform a DH calculation in an
+ ergonomically acceptable time limit. Shared key material may be
+ manually provisioned between two such endpoints in advance and still
+ allow a limited subset of functionality. Such a "better than
+ nothing" implementation would have to be regarded as non-compliant
+ with the ZRTP specification, but it could interoperate in Preshared
+ (and if applicable, Multistream) mode with a compliant ZRTP endpoint.
+
+ Because Preshared mode affects the state of the retained shared
+ secret cache, only one in-process ZRTP Preshared exchange may occur
+ at a time between two ZRTP endpoints. This rule is explained in more
+ detail in Section 4.4.1, and applies for the same reasons as in DH
+ mode.
+
+ Preshared mode is only included in this specification to meet the
+ R-REUSE requirement in the Media Security Requirements [RFC5479]
+ document. A series of preshared-keyed calls between two ZRTP
+ endpoints should use a DH key exchange periodically. Preshared mode
+ is only used if a cached shared secret has been established in an
+ earlier session by a DH exchange, as discussed in Section 4.9.
+
+4.4.2.1. Commitment in Preshared Mode
+
+ Preshared mode is selected by setting the Key Agreement Type to
+ Preshared in the Commit message. This results in the same call flow
+ as Multistream mode. The principal difference between Multistream
+ mode and Preshared mode is that Preshared mode uses a previously
+ cached shared secret, rs1, instead of an active ZRTP Session key,
+ ZRTPSess, as the initial keying material.
+
+ Preshared mode depends on having a reliable shared secret in its
+ cache. Before Preshared mode is used, the initial DH exchange that
+ gave rise to the shared secret SHOULD have used at least one of these
+
+
+
+Zimmermann, et al. Informational [Page 25]
+
+RFC 6189 ZRTP April 2011
+
+
+ anti-MiTM mechanisms: 1) A verbal comparison of the SAS, evidenced by
+ the SAS Verified flag, or 2) an end-to-end integrity-protected
+ delivery of the a=zrtp-hash in the signaling (Section 8.1.1), or 3) a
+ digital signature on the sashash (Section 7.2).
+
+4.4.2.2. Initiator Behavior in Preshared Mode
+
+ The Commit message (Figure 7) is sent by the initiator of the ZRTP
+ exchange. From the intersection of the algorithms in the sent and
+ received Hello messages, the initiator chooses a hash, cipher, auth
+ tag, Key Agreement Type, and SAS Type to be used.
+
+ To assemble a Preshared commit, we must first construct a temporary
+ preshared_key, which is constructed from one of several possible
+ combinations of cached key material, depending on what is available
+ in the shared secret cache. If rs1 is not available in the
+ initiator's cache, then Preshared mode MUST NOT be used.
+
+ preshared_key = hash(len(rs1) || rs1 || len(auxsecret) || auxsecret ||
+ len(pbxsecret) || pbxsecret)
+
+ All of the explicit length fields, len(), in the above hash are 32-
+ bit big-endian integers, giving the length in octets of the field
+ that follows. Some members of the set of shared secrets (rs1,
+ auxsecret, and pbxsecret) may have lengths of zero if they are null
+ (not available), and are each preceded by a 4-octet length field.
+ For example, if auxsecret is null, len(auxsecret) is 0x00000000, and
+ auxsecret itself would be absent from the hash calculation, which
+ means len(pbxsecret) would immediately follow len(auxsecret).
+
+ In place of hvi in the Commit message, two smaller fields are
+ inserted by the initiator:
+
+ - A random nonce of length 4 words (16 octets).
+
+ - A keyID = MAC(preshared_key, "Prsh") truncated to 64 bits.
+
+ Note: Since the nonce is used to calculate different SRTP key and
+ salt pairs for each session, a duplication will result in the same
+ key and salt being generated for the two sessions, which would
+ have disastrous security consequences.
+
+4.4.2.3. Responder Behavior in Preshared Mode
+
+ The responder uses the received keyID to search for matching key
+ material in its cache. It does this by computing a preshared_key
+ value and keyID value using the same formula as the initiator,
+ depending on what is available in the responder's local cache. If
+
+
+
+Zimmermann, et al. Informational [Page 26]
+
+RFC 6189 ZRTP April 2011
+
+
+ the locally computed keyID does not match the received keyID in the
+ Commit, the responder recomputes a new preshared_key and keyID from a
+ different subset of shared keys from the cache, dropping auxsecret,
+ pbxsecret, or both from the hash calculation, until a matching
+ preshared_key is found or it runs out of possibilities. Note that
+ rs2 is not included in the process.
+
+ If it finds the appropriate matching shared key material, it is used
+ to derive s0 and a new ZRTPSess key, as described in the next section
+ on shared secret calculation, Section 4.4.2.4.
+
+ If the responder determines that it does not have a cached shared
+ secret from a previous DH exchange, or it fails to match the keyID
+ hash from the initiator with any combination of its shared keys, it
+ SHOULD respond with its own DH Commit message. This would reverse
+ the roles and the responder would become the initiator, because the
+ DH Commit must always "trump" the Preshared Commit message as
+ described in Section 4.2. The key exchange would then proceed using
+ DH mode. However, if a severely resource-limited responder lacks the
+ computing resources to respond in a reasonable time with a DH Commit,
+ it MAY respond with a ZRTP Error message (Section 5.9) indicating
+ that no shared secret is available.
+
+ If both sides send Preshared Commit messages initiating a secure
+ session at the same time, the contention is resolved and the
+ initiator/responder roles are settled according to Section 4.2, and
+ the protocol proceeds.
+
+ In Preshared mode, both the DHPart1 and DHPart2 messages are skipped.
+ After receiving the Commit message from the initiator, the responder
+ sends the Confirm1 message after calculating this stream's SRTP keys,
+ as described below.
+
+4.4.2.4. Shared Secret Calculation for Preshared Mode
+
+ Preshared mode requires that the s0 and ZRTPSess keys be derived from
+ the preshared_key, and this must be done in a way that guarantees
+ uniqueness for each session. This is done by using nonce material
+ from both parties: the explicit nonce in the initiator's Preshared
+ Commit message (Figure 7) and the H3 field in the responder's Hello
+ message (Figure 3). Thus, both parties force the resulting shared
+ secret to be unique for each session.
+
+ A hash of the received and sent ZRTP messages in the current ZRTP
+ exchange for the current media stream is calculated:
+
+ total_hash = hash(Hello of responder || Commit)
+
+
+
+
+Zimmermann, et al. Informational [Page 27]
+
+RFC 6189 ZRTP April 2011
+
+
+ Note that only the ZRTP messages (Figures 3 and 7), not the entire
+ ZRTP packets, are included in the total_hash.
+
+ The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
+ use of a KDF Context field (per [NIST-SP800-108] guidelines), which
+ should include the ZIDi, ZIDr, and a nonce value known to both
+ parties. The total_hash qualifies as a nonce value, because its
+ computation included nonce material from the initiator's Commit
+ message and the responder's Hello message.
+
+ KDF_Context = (ZIDi || ZIDr || total_hash)
+
+ The s0 key is derived via the ZRTP key derivation function
+ (Section 4.5.1) from preshared_key and the nonces implicitly included
+ in the total_hash. The nonces also ensure KDF_Context is unique for
+ each session, which is critical for security.
+
+ s0 = KDF(preshared_key, "ZRTP PSK", KDF_Context,
+ negotiated hash length)
+
+ The preshared_key MUST be erased as soon as it has been used to
+ calculate s0.
+
+ At this point in Preshared mode, the two endpoints proceed to the key
+ derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
+ now that there is a defined s0.
+
+4.4.3. Multistream Mode
+
+ The Multistream key agreement mode can be used to generate SRTP keys
+ and salts for additional media streams established between a pair of
+ endpoints. Multistream mode cannot be used unless there is an active
+ SRTP session established between the endpoints, which means a ZRTP
+ Session key is active. This ZRTP Session key can be used to generate
+ keys and salts without performing another DH calculation. In this
+ mode, the retained shared secret cache is not used or updated. As a
+ result, multiple ZRTP Multistream mode exchanges can be processed in
+ parallel between two endpoints.
+
+ Multistream mode is also used to resume a secure call that has gone
+ clear using a GoClear message as described in Section 4.7.2.1.
+
+ When adding additional media streams to an existing call, Multistream
+ mode MUST be used. The first media stream MUST use either DH mode or
+ Preshared mode. Only one DH exchange or Preshared exchange is
+ performed, just for the first media stream. The DH exchange or
+ Preshared exchange MUST be completed for the first media stream
+ before Multistream mode is used to add any other media streams. In a
+
+
+
+Zimmermann, et al. Informational [Page 28]
+
+RFC 6189 ZRTP April 2011
+
+
+ Multistream session, a ZRTP endpoint MUST use the same ZID for all
+ media streams, matching the ZID used in the first media stream.
+
+4.4.3.1. Commitment in Multistream Mode
+
+ Multistream mode is selected by the initiator setting the Key
+ Agreement Type to "Mult" in the Commit message (Figure 6). The
+ Cipher Type, Auth Tag Length, and Hash in Multistream mode SHOULD be
+ set by the initiator to the same as the values as in the initial DH
+ Mode Commit. The SAS Type is ignored as there is no SAS
+ authentication in this mode.
+
+ Note: This requirement is needed since some endpoints cannot
+ support different SRTP algorithms for different media streams.
+ However, in the case of Multistream mode being used to go secure
+ after a GoClear, the requirement to use the same SRTP algorithms
+ is relaxed if there are no other active SRTP sessions.
+
+ In place of hvi in the Commit, a random nonce of length 4 words (16
+ octets) is chosen. Its value MUST be unique for all nonce values
+ chosen for active ZRTP sessions between a pair of endpoints. If a
+ Commit is received with a reused nonce value, the ZRTP exchange MUST
+ be immediately terminated.
+
+ Note: Since the nonce is used to calculate different SRTP key and
+ salt pairs for each media stream, a duplication will result in the
+ same key and salt being generated for the two media streams, which
+ would have disastrous security consequences.
+
+ If a Commit is received selecting Multistream mode, but the responder
+ does not have a ZRTP Session Key available, the exchange MUST be
+ terminated. Otherwise, the responder proceeds to the next section on
+ shared secret calculation, Section 4.4.3.2.
+
+ If both sides send Multistream Commit messages at the same time, the
+ contention is resolved and the initiator/responder roles are settled
+ according to Section 4.2, and the protocol proceeds.
+
+ In Multistream mode, both the DHPart1 and DHPart2 messages are
+ skipped. After receiving the Commit message from the initiator, the
+ responder sends the Confirm1 message after calculating this stream's
+ SRTP keys, as described below.
+
+4.4.3.2. Shared Secret Calculation for Multistream Mode
+
+ In Multistream mode, each media stream requires that a set of keys be
+ derived from the ZRTPSess key, and this must be done in a way that
+ guarantees uniqueness for each media stream. This is done by using
+
+
+
+Zimmermann, et al. Informational [Page 29]
+
+RFC 6189 ZRTP April 2011
+
+
+ nonce material from both parties: the explicit nonce in the
+ initiator's Multistream Commit message (Figure 6) and the H3 field in
+ the responder's Hello message (Figure 3). Thus, both parties force
+ the resulting shared secret to be unique for each media stream.
+
+ A hash of the received and sent ZRTP messages in the current ZRTP
+ exchange for the current media stream is calculated:
+
+ total_hash = hash(Hello of responder || Commit)
+
+ This refers to the Hello and Commit messages for the current media
+ stream, which is using Multistream mode, not the original media
+ stream that included a full DH key agreement. Note that only the
+ ZRTP messages (Figures 3 and 6), not the entire ZRTP packets, are
+ included in the hash.
+
+ The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
+ use of a KDF Context field (per [NIST-SP800-108] guidelines), which
+ should include the ZIDi, ZIDr, and a nonce value known to both
+ parties. The total_hash qualifies as a nonce value, because its
+ computation included nonce material from the initiator's Commit
+ message and the responder's Hello message.
+
+ KDF_Context = (ZIDi || ZIDr || total_hash)
+
+ The current stream's SRTP keys and salts for the initiator and
+ responder are calculated using the ZRTP Session Key ZRTPSess and the
+ nonces implicitly included in the total_hash. The nonces also ensure
+ that KDF_Context will be unique for each media stream, which is
+ critical for security. For each additional media stream, a separate
+ s0 is derived from ZRTPSess via the ZRTP key derivation function
+ (Section 4.5.1):
+
+ s0 = KDF(ZRTPSess, "ZRTP MSK", KDF_Context,
+ negotiated hash length)
+
+ Note that the ZRTPSess key was previously derived from material that
+ also includes a different and more inclusive total_hash from the
+ entire packet sequence that performed the original DH exchange for
+ the first media stream in this ZRTP session.
+
+ At this point in Multistream mode, the two endpoints begin key
+ derivations in Section 4.5.3.
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 30]
+
+RFC 6189 ZRTP April 2011
+
+
+4.5. Key Derivations
+
+4.5.1. The ZRTP Key Derivation Function
+
+ To derive keys from a shared secret, ZRTP uses an HMAC-based key
+ derivation function, or KDF. It is used throughout Section 4.5.3 and
+ in other sections. The HMAC function for the KDF is based on the
+ negotiated hash algorithm defined in Section 5.1.2.
+
+ The authors believe the ZRTP KDF is in full compliance with the
+ recommendations in NIST SP 800-108 [NIST-SP800-108]. Section 7.5 of
+ the NIST document describes "key separation", which is a security
+ requirement for the cryptographic keys derived from the same key
+ derivation key. The keys shall be separate in the sense that the
+ compromise of some derived keys will not degrade the security
+ strength of any of the other derived keys or the security strength of
+ the key derivation key. Strong preimage resistance is provided.
+
+ The ZRTP KDF runs the NIST pseudorandom function (PRF) in counter
+ mode, with only a single iteration of the counter. The NIST PRF is
+ based on the HMAC function. The ZRTP KDF never has to generate more
+ than 256 bits (or 384 bits for Suite B applications) of output key
+ material, so only a single invocation of the HMAC function is needed.
+
+ The ZRTP KDF is defined in this manner, per Sections 5 and 5.1 of
+ [NIST-SP800-108]:
+
+ KDF(KI, Label, Context, L) = HMAC(KI, i || Label ||
+ 0x00 || Context || L)
+
+ The HMAC in the KDF is keyed by KI, which is a secret key derivation
+ key that is unknown to the wiretapper (for example, s0). The HMAC is
+ computed on a concatenated set of nonsecret fields that are defined
+ as follows. The first field is a 32-bit big-endian integer counter
+ (i) required by NIST to be included in the HMAC each time the HMAC is
+ computed, which we have set to the fixed value of 0x000001 because we
+ only compute the HMAC once. Label is a string of nonzero octets that
+ identifies the purpose for the derived keying material. The octet
+ 0x00 is a delimiter required by NIST. The NIST KDF formula has a
+ "Context" field that includes ZIDi, ZIDr, and some optional nonce
+ material known to both parties. L is a 32-bit big-endian positive
+ integer, not to exceed the length in bits of the output of the HMAC.
+ The output of the KDF is truncated to the leftmost L bits. If SHA-
+ 384 is the negotiated hash algorithm, the HMAC would be HMAC-SHA-384;
+ thus, the maximum value of L would be 384, the negotiated hash
+ length.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 31]
+
+RFC 6189 ZRTP April 2011
+
+
+ The ZRTP KDF is not to be confused with the SRTP KDF defined in
+ [RFC3711].
+
+4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared Modes
+
+ Both DH mode and Preshared mode (but not Multistream mode) come to
+ this common point in the protocol to derive ZRTPSess and the SAS from
+ s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this
+ point, s0 has been calculated, as well as KDF_Context. These
+ calculations are done only for the first media stream, not for
+ Multistream mode.
+
+ The ZRTPSess key is used only for these two purposes: 1) to generate
+ the additional s0 keys (Section 4.4.3.2) for adding additional media
+ streams to this session in Multistream mode, and 2) to generate the
+ pbxsecret (Section 7.3.1) that may be cached for use in future
+ sessions. The ZRTPSess key is kept for the duration of the call
+ signaling session between the two ZRTP endpoints. That is, if there
+ are two separate calls between the endpoints (in SIP terms, separate
+ SIP dialogs), then a ZRTP Session Key MUST NOT be used across the two
+ call signaling sessions. ZRTPSess MUST be destroyed no later than
+ the end of the call signaling session.
+
+ ZRTPSess = KDF(s0, "ZRTP Session Key", KDF_Context,
+ negotiated hash length)
+
+ Note that KDF_Context is unique for each media stream, but only the
+ first media stream is permitted to calculate ZRTPSess.
+
+ There is only one Short Authentication String (SAS) (Section 7)
+ computed per call, which is applicable to all media streams derived
+ from a single DH key agreement in a ZRTP session. KDF_Context is
+ unique for each media stream, but only the first media stream is
+ permitted to calculate sashash.
+
+ sashash = KDF(s0, "SAS", KDF_Context, 256)
+
+ sasvalue = sashash [truncated to leftmost 32 bits]
+
+ Despite the exposure of the SAS to the two parties, the rest of the
+ keying material is protected by the key separation properties of the
+ KDF (Section 4.5.1).
+
+ ZRTP-enabled VoIP clients may need to support additional forms of
+ communication, such as text chat, instant messaging, or file
+ transfers. These other forms of communication may need to be
+ encrypted, and would benefit from leveraging the ZRTP key exchange
+ used for the VoIP part of the call. In that case, more key material
+
+
+
+Zimmermann, et al. Informational [Page 32]
+
+RFC 6189 ZRTP April 2011
+
+
+ MAY be derived and "exported" from the ZRTP protocol and provided as
+ a shared secret to the VoIP client for these non-VoIP purposes. The
+ application can use this exported key in application-specific ways,
+ outside the scope of the ZRTP protocol.
+
+ ExportedKey = KDF(s0, "Exported key", KDF_Context,
+ negotiated hash length)
+
+ Only one ExportedKey is computed per call. KDF_Context is unique for
+ each media stream, but only the first media stream is permitted to
+ calculate ExportedKey.
+
+ The application may use this exported key to derive other subkeys for
+ various non-ZRTP purposes, via a KDF using separate KDF label strings
+ defined by the application. This key or its derived subkeys can be
+ used for encryption, or used to authenticate other key exchanges
+ carried out by the application, protected by ZRTP's MiTM defense
+ umbrella. The exported key and its descendants may be used for as
+ long as needed by the application, maintained in a separate crypto
+ context that may outlast the VoIP session.
+
+ At this point in DH mode or Preshared mode, the two endpoints proceed
+ on to the key derivations in Section 4.5.3, now that there is a
+ defined s0 and ZRTPSess key.
+
+4.5.3. Deriving the Rest of the Keys from s0
+
+ DH mode, Multistream mode, and Preshared mode all come to this common
+ point in the protocol to derive a set of keys from s0. It can be
+ assumed that s0 has been calculated, as well the ZRTPSess key and
+ KDF_Context. A separate s0 key is associated with each media stream.
+
+ Subkeys are not drawn directly from s0, as done in NIST SP 800-56A.
+ To enhance key separation, ZRTP uses s0 to key a Key Derivation
+ Function (Section 4.5.1) based on [NIST-SP800-108]. Since s0 already
+ included total_hash in its derivation, it is redundant to use
+ total_hash again in the KDF Context in all the invocations of the KDF
+ keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF
+ Context to be defined for the KDF, and nonce material is required in
+ some KDF invocations (especially for Multistream mode and Preshared
+ mode), so total_hash is included as a nonce in the KDF Context.
+
+ Separate SRTP master keys and master salts are derived for use in
+ each direction for each media stream. Unless otherwise specified,
+ ZRTP uses SRTP with no Master Key Identifier (MKI), 32-bit
+ authentication using HMAC-SHA1, AES-CM 128 or 256-bit key length,
+ 112-bit session salt key length, 2^48 key derivation rate, and SRTP
+ prefix length 0. Secure RTCP (SRTCP) is also used, deriving the
+
+
+
+Zimmermann, et al. Informational [Page 33]
+
+RFC 6189 ZRTP April 2011
+
+
+ SRTCP keys from the same master keys and salts as SRTP, using the
+ mechanisms specified in [RFC3711], without requiring a separate ZRTP
+ negotiation for RTCP.
+
+ The ZRTP initiator encrypts and the ZRTP responder decrypts packets
+ by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts
+ and the ZRTP initiator decrypts packets by using srtpkeyr and
+ srtpsaltr. The SRTP key and salt values are truncated (taking the
+ leftmost bits) to the length determined by the chosen SRTP profile.
+ These are generated by:
+
+ srtpkeyi = KDF(s0, "Initiator SRTP master key", KDF_Context,
+ negotiated AES key length)
+
+ srtpsalti = KDF(s0, "Initiator SRTP master salt", KDF_Context, 112)
+
+ srtpkeyr = KDF(s0, "Responder SRTP master key", KDF_Context,
+ negotiated AES key length)
+
+ srtpsaltr = KDF(s0, "Responder SRTP master salt", KDF_Context, 112)
+
+ The MAC keys are the same length as the output of the underlying hash
+ function in the KDF and are thus generated without truncation. They
+ are used only by ZRTP and not by SRTP. Different MAC keys are needed
+ for the initiator and the responder to ensure that GoClear messages
+ in each direction are unique and can not be cached by an attacker and
+ reflected back to the endpoint.
+
+ mackeyi = KDF(s0, "Initiator HMAC key", KDF_Context,
+ negotiated hash length)
+
+ mackeyr = KDF(s0, "Responder HMAC key", KDF_Context,
+ negotiated hash length)
+
+ ZRTP keys are generated for the initiator and responder to use to
+ encrypt the Confirm1 and Confirm2 messages. They are truncated to
+ the same size as the negotiated SRTP key size.
+
+ zrtpkeyi = KDF(s0, "Initiator ZRTP key", KDF_Context,
+ negotiated AES key length)
+
+ zrtpkeyr = KDF(s0, "Responder ZRTP key", KDF_Context,
+ negotiated AES key length)
+
+ All key material is destroyed as soon as it is no longer needed, no
+ later than the end of the call. s0 is erased in Section 4.6.1, and
+ the rest of the session key material is erased in Sections 4.7.2.1
+ and 4.7.3.
+
+
+
+Zimmermann, et al. Informational [Page 34]
+
+RFC 6189 ZRTP April 2011
+
+
+4.6. Confirmation
+
+ The Confirm1 and Confirm2 messages (Figure 10) contain the cache
+ expiration interval (defined in Section 4.9) for the newly generated
+ retained shared secret. The flagoctet is an 8-bit unsigned integer
+ made up of these flags: the PBX Enrollment flag (E) defined in
+ Section 7.3.1, the SAS Verified flag (V) defined in Section 7.1, the
+ Allow Clear flag (A) defined in Section 4.7.2, and the Disclosure
+ flag (D) defined in Section 11.
+
+ flagoctet = (E * 2^3) + (V * 2^2) + (A * 2^1) + (D * 2^0)
+
+ Part of the Confirm1 and Confirm2 messages are encrypted using full-
+ block Cipher Feedback Mode and contain a 128-bit random Cipher
+ FeedBack (CFB) Initialization Vector (IV). The Confirm1 and Confirm2
+ messages also contain a MAC covering the encrypted part of the
+ Confirm1 or Confirm2 message that includes a string of zeros, the
+ signature length, flag octet, cache expiration interval, signature
+ type block (if present), and signature (Section 7.2) (if present).
+ For the responder:
+
+ confirm_mac = MAC(mackeyr, encrypted part of Confirm1)
+
+ For the initiator:
+
+ confirm_mac = MAC(mackeyi, encrypted part of Confirm2)
+
+ The mackeyi and mackeyr keys are computed in Section 4.5.3.
+
+ The exchange is completed when the responder sends either the
+ Conf2ACK message or the responder's first SRTP media packet (with a
+ valid SRTP auth tag). The initiator MUST treat the first valid SRTP
+ media from the responder as equivalent to receiving a Conf2ACK. The
+ responder may respond to Confirm2 with either SRTP media, Conf2ACK,
+ or both, in whichever order the responder chooses (or whichever order
+ the "cloud" chooses to deliver them).
+
+4.6.1. Updating the Cache of Shared Secrets
+
+ After receiving the Confirm messages, both parties must now update
+ their retained shared secret rs1 in their respective caches, provided
+ the following conditions hold:
+
+ (1) This key exchange is either DH or Preshared mode, not
+ Multistream mode, which does not update the cache.
+
+ (2) Depending on the values of the cache expiration intervals that
+ are received in the two Confirm messages, there are some
+
+
+
+Zimmermann, et al. Informational [Page 35]
+
+RFC 6189 ZRTP April 2011
+
+
+ scenarios that do not update the cache, as explained in
+ Section 4.9.
+
+ (3) The responder MUST receive the initiator's Confirm2 message
+ before updating the responder's cache.
+
+ (4) The initiator MUST receive either the responder's Conf2ACK
+ message or the responder's SRTP media (with a valid SRTP auth
+ tag) before updating the initiator's cache.
+
+ The cache update may also be affected by a cache mismatch, according
+ to Section 4.6.1.1.
+
+ For DH mode only, before updating the retained shared secret rs1 in
+ the cache, each party first discards their old rs2 and copies their
+ old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
+ session interruption after one party has updated his own rs1 but
+ before the other party has enough information to update her own rs1.
+ If that happens, they may regain cache sync in the next session by
+ using rs2 (per Section 4.3). This mitigates the well-known Two
+ Generals' Problem [Byzantine]. The old rs1 value is not saved in
+ Preshared mode.
+
+ For DH mode and Preshared mode, both parties compute a new rs1 value
+ from s0 via the ZRTP key derivation function (Section 4.5.1):
+
+ rs1 = KDF(s0, "retained secret", KDF_Context, 256)
+
+ Note that KDF_Context is unique for each media stream, but only the
+ first media stream is permitted to update rs1.
+
+ Each media stream has its own s0. At this point in the protocol for
+ each media stream, the corresponding s0 MUST be erased.
+
+4.6.1.1. Cache Update Following a Cache Mismatch
+
+ If a shared secret cache mismatch (as defined in Section 4.3.2) is
+ detected in the current session, it indicates a possible MiTM attack.
+ However, there may be evidence to the contrary, if either one of the
+ following conditions are met:
+
+ o Successful use of the mechanism described in Section 8.1.1, but
+ only if fully supported by end-to-end integrity-protected delivery
+ of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or
+ better still, Dan Wing's SIP Identity using Media Path
+ [SIP-IDENTITY]. This allows authentication of the DH exchange
+ without human assistance.
+
+
+
+
+Zimmermann, et al. Informational [Page 36]
+
+RFC 6189 ZRTP April 2011
+
+
+ o A good signature is received and verified using the digital
+ signature feature on the SAS hash, as described in Section 7.2, if
+ this feature is supported.
+
+ If there is a cache mismatch in the absence of the aforementioned
+ mitigating evidence, the cache update MUST be delayed in the current
+ session until the user verbally compares the SAS with his partner
+ during the call and confirms a successful SAS verify via his user
+ interface as described in Section 7.1. If the session ends before
+ that happens, the cache update is not performed, leaving the rs1/rs2
+ values unmodified in the cache. Regardless of whether a cache
+ mismatch occurs, s0 must still be erased.
+
+ If no cache entry exists, as is the case in the initial call, the
+ cache update is handled in the normal fashion.
+
+4.7. Termination
+
+ A ZRTP session is normally terminated at the end of a call, but it
+ may be terminated early by either the Error message or the GoClear
+ message.
+
+4.7.1. Termination via Error Message
+
+ The Error message (Section 5.9) is used to terminate an in-progress
+ ZRTP exchange due to an error. The Error message contains an integer
+ Error Code for debugging purposes. The termination of a ZRTP key
+ agreement exchange results in no updates to the cached shared secrets
+ and deletion of all crypto context for that media stream. The ZRTP
+ Session key, ZRTPSess, is only deleted if all ZRTP media streams that
+ are using it are terminated.
+
+ Because no key agreement has been reached, the Error message cannot
+ use the same MAC protection as the GoClear message. A denial of
+ service is possible by injecting fake Error messages. (However, even
+ if the Error message were somehow designed with integrity protection,
+ it would raise other questions. What would a badly formed Error
+ message mean if it were sent to report a badly formed message? A
+ good message?)
+
+4.7.2. Termination via GoClear Message
+
+ The GoClear message (Section 5.11) is used to switch from SRTP to
+ RTP, usually because the user has chosen to do that by pressing a
+ button. The GoClear uses a MAC of the Message Type Block sent in the
+ GoClear message computed with the mackey derived from the shared
+ secret. This MAC is truncated to the leftmost 64 bits. When sent by
+ the initiator:
+
+
+
+Zimmermann, et al. Informational [Page 37]
+
+RFC 6189 ZRTP April 2011
+
+
+ clear_mac = MAC(mackeyi, "GoClear ")
+
+ When sent by the responder:
+
+ clear_mac = MAC(mackeyr, "GoClear ")
+
+ Both of these MACs are calculated across the 8-octet "GoClear "
+ Message Type Block, including the trailing space.
+
+ A GoClear message that does not receive a ClearACK response must be
+ resent. If a GoClear message is received with a bad MAC, ClearACK
+ MUST NOT be sent and the GoClear MUST NOT be acted on by the
+ recipient, but it MAY be processed as a security exception, perhaps
+ by logging or alerting the user.
+
+ A ZRTP endpoint MAY choose to accept GoClear messages after the
+ session has switched to SRTP, allowing the session to revert to RTP.
+ This is indicated in the Confirm1 or Confirm2 messages (Figure 10) by
+ setting the Allow Clear flag (A). If an endpoint sets the Allow
+ Clear (A) flag in their Confirm message, it indicates that they
+ support receiving GoClear messages.
+
+ A ZRTP endpoint that receives a GoClear MUST authenticate the message
+ by checking the clear_mac. If the message authenticates, the
+ endpoint stops sending SRTP packets, and generates a ClearACK in
+ response. It MUST also delete all the crypto key material for all
+ the SRTP media streams, as defined in Section 4.7.2.1.
+
+ Until confirmation from the user is received (e.g., clicking a
+ button, pressing a dual-tone multi-frequency (DTMF) key, etc.), the
+ ZRTP endpoint MUST NOT resume sending RTP packets. The endpoint then
+ renders to the user an indication that the media session has switched
+ to clear mode and waits for confirmation from the user. This blocks
+ the flow of sensitive discourse until the user is forced to take
+ notice that he's no longer protected by encryption. To prevent
+ pinholes from closing or NAT bindings from expiring, the ClearACK
+ message MAY be resent at regular intervals (e.g., every 5 seconds)
+ while waiting for confirmation from the user. After confirmation of
+ the notification is received from the user, the sending of RTP
+ packets may begin.
+
+ After sending a GoClear message, the ZRTP endpoint stops sending SRTP
+ packets. When a ClearACK is received, the ZRTP endpoint deletes the
+ crypto context for the SRTP session, as defined in Section 4.7.2.1,
+ and may then resume sending RTP packets.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 38]
+
+RFC 6189 ZRTP April 2011
+
+
+ In the event a ClearACK is not received before the retransmissions of
+ GoClear are exhausted, the key material is deleted, as defined in
+ Section 4.7.2.1.
+
+ After the users have transitioned from SRTP media back to RTP media
+ (clear mode), they may decide later to return to secure mode by
+ manual activation, usually by pressing a GO SECURE button. In that
+ case, a new secure session is initiated by the party that presses the
+ button, by sending a new Commit message, leading to a new session key
+ negotiation. It is not necessary to send another Hello message, as
+ the two parties have already done that at the start of the call and
+ thus have already discovered each other's ZRTP capabilities. It is
+ possible for users to toggle back and forth between clear and secure
+ modes multiple times in the same session, just as they could in the
+ old days of secure PSTN phones.
+
+4.7.2.1. Key Destruction for GoClear Message
+
+ All SRTP session key material MUST be erased by the receiver of the
+ GoClear message upon receiving a properly authenticated GoClear. The
+ same key destruction MUST be done by the sender of GoClear message,
+ upon receiving the ClearACK. This must be done for the key material
+ for all of the media streams.
+
+ All key material that would have been erased at the end of the SIP
+ session MUST be erased, as described in Section 4.7.3, with the
+ single exception of ZRTPSess. In this case, ZRTPSess is destroyed in
+ a manner different from the other key material. Both parties replace
+ ZRTPSess with a KDF-derived non-invertible function of itself:
+
+ ZRTPSess = KDF(ZRTPSess, "New ZRTP Session", (ZIDi || ZIDr),
+ negotiated hash length)
+
+ ZRTPSess will be replaced twice if a session generates separate
+ GoClear messages for both audio and video streams, and the two
+ endpoints need not carry out the replacements in the same order.
+
+ The destruction of key material meets the requirements of Perfect
+ Forward Secrecy (PFS), but still preserves a new version of ZRTPSess,
+ so that the user can later re-initiate secure mode during the same
+ session without performing another Diffie-Hellman calculation using
+ Multistream mode, which requires and assumes the existence of
+ ZRTPSess with the same value at both ZRTP endpoints. A new key
+ negotiation after a GoClear SHOULD use a Multistream Commit message.
+
+ Note: Multistream mode is preferred over a Diffie-Hellman mode
+ since this does not require the generation of a new hash chain and
+ a new signaling exchange to exchange new Hello Hash values.
+
+
+
+Zimmermann, et al. Informational [Page 39]
+
+RFC 6189 ZRTP April 2011
+
+
+ Later, at the end of the entire call, ZRTPSess is finally destroyed
+ along with the other key material, as described in Section 4.7.3.
+
+4.7.3. Key Destruction at Termination
+
+ All SRTP session key material MUST be erased by both parties at the
+ end of the call. In particular, the destroyed key material includes
+ the SRTP session keys and salts, SRTP master keys and salts, and all
+ material sufficient to reconstruct the SRTP keys and salts, including
+ ZRTPSess and s0 (although s0 should have been destroyed earlier, in
+ Section 4.6.1). This must be done for the key material for all of
+ the media streams. The only exceptions are the cached shared secrets
+ needed for future sessions, including rs1, rs2, and pbxsecret.
+
+4.8. Random Number Generation
+
+ The ZRTP protocol uses random numbers for cryptographic key material,
+ notably for the DH secret exponents and nonces, which must be freshly
+ generated with each session. Whenever a random number is needed, all
+ of the following criteria must be satisfied:
+
+ Random numbers MUST be freshly generated, meaning that they must not
+ have been used in a previous calculation.
+
+ When generating a random number k of L bits in length, k MUST be
+ chosen with equal probability from the range of [1 < k < 2^L].
+
+ It MUST be derived from a physical entropy source, such as radio
+ frequency (RF) noise, acoustic noise, thermal noise, high-resolution
+ timings of environmental events, or other unpredictable physical
+ sources of entropy. One possible source of entropy for a VoIP client
+ would be microphone noise. For a detailed explanation of
+ cryptographic grade random numbers and guidance for collecting
+ suitable entropy, see [RFC4086] and Chapter 10 of "Practical
+ Cryptography" [Ferguson]. The raw entropy must be distilled and
+ processed through a deterministic random-bit generator (DRBG).
+ Examples of DRBGs may be found in [NIST-SP800-90], in [Ferguson], and
+ in [RFC5869]. Failure to use true entropy from the physical
+ environment as a basis for generating random cryptographic key
+ material would lead to a disastrous loss of security.
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 40]
+
+RFC 6189 ZRTP April 2011
+
+
+4.9. ZID and Cache Operation
+
+ Each instance of ZRTP has a unique 96-bit random ZRTP ID, or ZID,
+ that is generated once at installation time. It is used to look up
+ retained shared secrets in a local cache. A single global ZID for a
+ single installation is the simplest way to implement ZIDs. However,
+ it is specifically not precluded for an implementation to use
+ multiple ZIDs, up to the limit of a separate one per callee. This
+ then turns it into a long-lived "association ID" that does not apply
+ to any other associations between a different pair of parties. It is
+ a goal of this protocol to permit both options to interoperate
+ freely. A PBX acting as a trusted man in the middle will also
+ generate a single ZID and use that ZID for all endpoints behind it,
+ as described in Section 10.
+
+ There is no protocol mechanism to invalidate a previously used ZID.
+ An endpoint wishing to change ZIDs would simply generate a new one
+ and begin using it.
+
+ The ZID should not be hard coded or hard defined in the firmware of a
+ product. It should be randomly generated by the software and stored
+ at installation or initialization time. It should be randomly
+ generated rather than allocated from a preassigned range of ZID
+ values, because 96 bits should be enough to avoid birthday collisions
+ in realistic scenarios.
+
+ Each time a new s0 is calculated, a new retained shared secret rs1 is
+ generated and stored in the cache, indexed by the ZID of the other
+ endpoint. This cache updating is described in Section 4.6.1. For
+ the new retained shared secret, each endpoint chooses a cache
+ expiration value that is an unsigned 32-bit integer of the number of
+ seconds that this secret should be retained in the cache. The time
+ interval is relative to when the Confirm1 message is sent or
+ received.
+
+ The cache intervals are exchanged in the Confirm1 and Confirm2
+ messages (Figure 10). The actual cache interval used by both
+ endpoints is the minimum of the values from the Confirm1 and Confirm2
+ messages. A value of 0 seconds means the newly computed shared
+ secret SHOULD NOT be stored in the cache, and if a cache entry
+ already exists from an earlier call, the stored cache interval should
+ be set to 0. This means if either Confirm message contains a null
+ cache expiration interval, and there is no cache entry already
+ defined, no new cache entry is created. A value of 0xffffffff means
+ the secret should be cached indefinitely and is the recommended
+ value. If the ZRTP exchange is Multistream mode, the field in the
+ Confirm1 and Confirm2 is set to 0xffffffff and is ignored; the cache
+ is not updated.
+
+
+
+Zimmermann, et al. Informational [Page 41]
+
+RFC 6189 ZRTP April 2011
+
+
+ The expiration interval need not be used to force the deletion of a
+ shared secret from the cache when the interval has expired. It just
+ means the shared secret MAY be deleted from that cache at any point
+ after the interval has expired without causing the other party to
+ note it as an unexpected security event when the next key negotiation
+ occurs between the same two parties. This means there need not be
+ perfectly synchronized deletion of expired secrets from the two
+ caches, and makes it easy to avoid a race condition that might
+ otherwise be caused by clock skew.
+
+ If the expiration interval is not properly agreed to by both
+ endpoints, it may later result in false alarms of MiTM attacks, due
+ to apparent cache mismatches (Section 4.3.2).
+
+ The relationship between a ZID and a SIP AOR is explained in
+ Section 12.
+
+4.9.1. Cacheless Implementations
+
+ It is possible to implement a simplified but nonetheless useful (and
+ still compliant) profile of the ZRTP protocol that does not support
+ any caching of shared secrets. In this case, the users would have to
+ rely exclusively on the verbal SAS comparison for every call. That
+ is, unless MiTM protection is provided by the mechanisms in Section
+ 8.1.1 or 7.2, which introduce their own forms of complexity.
+
+ If a ZRTP endpoint does not support the caching of shared secrets, it
+ MUST set the cache expiration interval to zero, and MUST set the SAS
+ Verified (V) flag (Section 7.1) to false. In addition, because the
+ ZID serves mainly as a cache index, the ZID would not be required to
+ maintain the same value across separate SIP sessions, although there
+ is no reason why it should not.
+
+ Cacheless operation would sacrifice the key continuity (Section 15.1)
+ features, as well as Preshared mode (Section 4.4.2). Further, if the
+ pbxsecret is also not cached, there would be no PBX trusted MiTM
+ (Section 7.3) features, including the PBX security enrollment
+ (Section 7.3.1) mechanism.
+
+5. ZRTP Messages
+
+ All ZRTP messages use the message format defined in Figure 2. All
+ word lengths referenced in this specification are 32 bits, or 4
+ octets. All integer fields are carried in network byte order, that
+ is, most-significant byte (octet) first, commonly known as big-
+ endian.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 42]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 0 0 1|Not Used (set to zero) | Sequence Number |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Magic Cookie 'ZRTP' (0x5a525450) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ZRTP Message (length depends on Message Type) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | CRC (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 2: ZRTP Packet Format
+
+ The Sequence Number is a count that is incremented for each ZRTP
+ packet sent. The count is initialized to a random value. This is
+ useful in estimating ZRTP packet loss and also detecting when ZRTP
+ packets arrive out of sequence.
+
+ The ZRTP Magic Cookie is a 32-bit string that uniquely identifies a
+ ZRTP packet and has the value 0x5a525450.
+
+ Source Identifier is the SSRC number of the RTP stream to which this
+ ZRTP packet relates. For cases of forking or forwarding, RTP, and
+ hence ZRTP, may arrive at the same port from several different
+ sources -- each of these sources will have a different SSRC and may
+ initiate an independent ZRTP protocol session. SSRC collisions would
+ be disruptive to ZRTP. SSRC collision handling procedures are
+ described in Section 4.1.
+
+ This format is clearly identifiable as non-RTP due to the first two
+ bits being zero, which looks like RTP version 0, which is not a valid
+ RTP version number. It is clearly distinguishable from STUN since
+ the Magic Cookies are different. The 12 unused bits are set to zero
+ and MUST be ignored when received. In early versions of this spec,
+ ZRTP messages were encapsulated in RTP header extensions, which made
+ ZRTP an eponymous variant of RTP. In later versions, the packet
+ format changed to make it syntactically distinguishable from RTP.
+
+ The ZRTP messages are defined in Figures 3 to 17 and are of variable
+ length.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 43]
+
+RFC 6189 ZRTP April 2011
+
+
+ The ZRTP protocol uses a 32-bit Cyclic Redundancy Check (CRC) as
+ defined in RFC 4960, Appendix B [RFC4960], in each ZRTP packet to
+ detect transmission errors. ZRTP packets are typically transported
+ by UDP, which carries its own built-in 16-bit checksum for integrity,
+ but ZRTP does not rely on it. This is because of the effect of an
+ undetected transmission error in a ZRTP message. For example, an
+ undetected error in the DH exchange could appear to be an active man-
+ in-the-middle attack. A false announcement of this by ZRTP clients
+ can be psychologically distressing. The probability of such a false
+ alarm hinges on a mere 16-bit checksum that usually protects UDP
+ packets, so more error detection is needed. For these reasons, this
+ belt-and-suspenders approach is used to minimize the chance of a
+ transmission error affecting the ZRTP key agreement.
+
+ The CRC is calculated across the entire ZRTP packet shown in
+ Figure 2, including the ZRTP header and the ZRTP message, but not
+ including the CRC field. If a ZRTP message fails the CRC check, it
+ is silently discarded.
+
+5.1. ZRTP Message Formats
+
+ ZRTP messages are designed to simplify endpoint parsing requirements
+ and to reduce the opportunities for buffer overflow attacks (a good
+ goal of any security extension should be to not introduce new attack
+ vectors).
+
+ ZRTP uses a block of 8 octets (2 words) to encode the Message Type.
+ 4-octet (1 word) blocks are used to encode Hash Type, Cipher Type,
+ Key Agreement Type, and Authentication Tag Type. The values in the
+ blocks are ASCII strings that are extended with spaces (0x20) to make
+ them the desired length. Currently defined block values are listed
+ in Tables 1-6.
+
+ Additional block values may be defined and used.
+
+ ZRTP uses this ASCII encoding to simplify debugging and make it
+ "Wireshark (Ethereal) friendly".
+
+5.1.1. Message Type Block
+
+ Currently, 16 Message Type Blocks are defined -- they represent the
+ set of ZRTP message primitives. ZRTP endpoints MUST support the
+ Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2,
+ Conf2ACK, SASrelay, RelayACK, Error, ErrorACK, and PingACK message
+ types. ZRTP endpoints MAY support the GoClear, ClearACK, and Ping
+ messages. In order to generate a PingACK message, it is necessary to
+ parse a Ping message. Additional messages may be defined in
+ extensions to ZRTP.
+
+
+
+Zimmermann, et al. Informational [Page 44]
+
+RFC 6189 ZRTP April 2011
+
+
+ Message Type Block | Meaning
+ ---------------------------------------------------
+ "Hello " | Hello Message
+ ---------------------------------------------------
+ "HelloACK" | HelloACK Message
+ ---------------------------------------------------
+ "Commit " | Commit Message
+ ---------------------------------------------------
+ "DHPart1 " | DHPart1 Message
+ ---------------------------------------------------
+ "DHPart2 " | DHPart2 Message
+ ---------------------------------------------------
+ "Confirm1" | Confirm1 Message
+ ---------------------------------------------------
+ "Confirm2" | Confirm2 Message
+ ---------------------------------------------------
+ "Conf2ACK" | Conf2ACK Message
+ ---------------------------------------------------
+ "Error " | Error Message
+ ---------------------------------------------------
+ "ErrorACK" | ErrorACK Message
+ ---------------------------------------------------
+ "GoClear " | GoClear Message
+ ---------------------------------------------------
+ "ClearACK" | ClearACK Message
+ ---------------------------------------------------
+ "SASrelay" | SASrelay Message
+ ---------------------------------------------------
+ "RelayACK" | RelayACK Message
+ ---------------------------------------------------
+ "Ping " | Ping Message
+ ---------------------------------------------------
+ "PingACK " | PingACK Message
+ ---------------------------------------------------
+
+ Table 1. Message Type Block Values
+
+5.1.2. Hash Type Block
+
+ The hash algorithm and its related MAC algorithm are negotiated via
+ the Hash Type Block found in the Hello message (Section 5.2) and the
+ Commit message (Section 5.4).
+
+ All ZRTP endpoints MUST support a Hash Type of SHA-256 [FIPS-180-3].
+ SHA-384 SHOULD be supported and MUST be supported if ECDH-384 is
+ used. Additional Hash Types MAY be used, such as the NIST SHA-3 hash
+ [SHA-3] when it becomes available. Note that the Hash Type refers to
+
+
+
+
+Zimmermann, et al. Informational [Page 45]
+
+RFC 6189 ZRTP April 2011
+
+
+ the hash algorithm that will be used throughout the ZRTP key
+ exchange, not the hash algorithm to be used in the SRTP
+ Authentication Tag.
+
+ The choice of the negotiated Hash Type is coupled to the Key
+ Agreement Type, as explained in Section 5.1.5.
+
+ Hash Type Block | Meaning
+ ----------------------------------------------------------
+ "S256" | SHA-256 Hash defined in FIPS 180-3
+ ----------------------------------------------------------
+ "S384" | SHA-384 Hash defined in FIPS 180-3
+ ----------------------------------------------------------
+ "N256" | NIST SHA-3 256-bit hash (when published)
+ ----------------------------------------------------------
+ "N384" | NIST SHA-3 384-bit hash (when published)
+ ----------------------------------------------------------
+
+ Table 2. Hash Type Block Values
+
+ At the time of this writing, the NIST SHA-3 hashes [SHA-3] are not
+ yet available. NIST is expected to publish SHA-3 in 2012, as a
+ successor to the SHA-2 hashes in [FIPS-180-3].
+
+5.1.2.1. Negotiated Hash and MAC Algorithm
+
+ ZRTP makes use of message authentication codes (MACs) that are keyed
+ hashes based on the negotiated Hash Type. For the SHA-2 and SHA-3
+ hashes, the negotiated MAC is the HMAC based on the negotiated hash.
+ This MAC function is also used in the ZRTP key derivation function
+ (Section 4.5.1).
+
+ The HMAC function is defined in [FIPS-198-1]. A discussion of the
+ general security of the HMAC construction may be found in [RFC2104].
+ Test vectors for HMAC-SHA-256 and HMAC-SHA-384 may be found in
+ [RFC4231].
+
+ The negotiated Hash Type does not apply to the hash used in the
+ digital signature defined in Section 7.2. For example, even if the
+ negotiated Hash Type is SHA-256, the digital signature may use SHA-
+ 384 if an Elliptic Curve Digital Signature Algorithm (ECDSA) P-384
+ signature key is used. Digital signatures are optional in ZRTP.
+
+ Except for the aforementioned digital signatures, and the special
+ cases noted in Section 5.1.2.2, all the other hashes and MACs used
+ throughout the ZRTP protocol will use the negotiated Hash Type.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 46]
+
+RFC 6189 ZRTP April 2011
+
+
+ A future hash may include its own built-in MAC, not based on the HMAC
+ construct, for example, the Skein hash function [Skein]. If NIST
+ chooses such a hash as the SHA-3 winner, Hash Types "N256", and
+ "N384" will still use the related HMAC as the negotiated MAC. If an
+ implementer wishes to use Skein and its built-in MAC as the
+ negotiated MAC, new Hash Types must be used.
+
+5.1.2.2. Implicit Hash and MAC Algorithm
+
+ While most of the hash and MAC usage in ZRTP is defined by the
+ negotiated Hash Type (Section 5.1.2), some hashes and MACs must be
+ precomputed prior to negotiations, and thus cannot have their
+ algorithms negotiated during the ZRTP exchange. They are implicitly
+ predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256.
+
+ These are the hashes and MACs that MUST use the Implicit hash and MAC
+ algorithm:
+
+ The hash chain H0-H3 defined in Section 9.
+
+ The MACs that are keyed by this hash chain, as defined in
+ Section 8.1.1.
+
+ The Hello Hash in the a=zrtp-hash attribute defined in
+ Section 8.1.
+
+ ZRTP defines a method for negotiating different ZRTP protocol
+ versions (Section 4.1.1). SHA-256 is the Implicit Hash and HMAC-SHA-
+ 256 is the Implicit MAC for ZRTP protocol version 1.10. Future ZRTP
+ protocol versions may, if appropriate, use another hash algorithm as
+ the Implicit Hash, such as the NIST SHA-3 hash [SHA-3], when it
+ becomes available. For example, a future SIP packet may list two
+ a=zrtp-hash SDP attributes, one based on SHA-256 for ZRTP version
+ 1.10, and another based on SHA-3 for ZRTP version 2.00.
+
+5.1.3. Cipher Type Block
+
+ The block cipher algorithm is negotiated via the Cipher Type Block
+ found in the Hello message (Section 5.2) and the Commit message
+ (Section 5.4).
+
+ All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES-
+ 192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced
+ Encryption Standard is defined in [FIPS-197].
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 47]
+
+RFC 6189 ZRTP April 2011
+
+
+ The use of AES-128 in SRTP is defined by [RFC3711]. The use of AES-
+ 192 and AES-256 in SRTP is defined by [RFC6188]. The choice of the
+ AES key length is coupled to the Key Agreement Type, as explained in
+ Section 5.1.5.
+
+ Other block ciphers may be supported that have the same block size
+ and key sizes as AES. If implemented, they may be used anywhere in
+ ZRTP or SRTP in place of the AES, in the same modes of operation and
+ key size. Notably, in counter mode to replace AES-CM in [RFC3711]
+ and [RFC6188], as well as in CFB mode to encrypt a portion of the
+ Confirm message (Figure 10) and SASrelay message (Figure 16). ZRTP
+ endpoints MAY support the TwoFish [TwoFish] block cipher.
+
+ Cipher Type Block | Meaning
+ -------------------------------------------------
+ "AES1" | AES with 128-bit keys
+ -------------------------------------------------
+ "AES2" | AES with 192-bit keys
+ -------------------------------------------------
+ "AES3" | AES with 256-bit keys
+ -------------------------------------------------
+ "2FS1" | TwoFish with 128-bit keys
+ -------------------------------------------------
+ "2FS2" | TwoFish with 192-bit keys
+ -------------------------------------------------
+ "2FS3" | TwoFish with 256-bit keys
+ -------------------------------------------------
+
+ Table 3. Cipher Type Block Values
+
+5.1.4. Auth Tag Type Block
+
+ All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for
+ SRTP, with both 32-bit and 80-bit length tags as defined in
+ [RFC3711].
+
+ ZRTP endpoints MAY support 32-bit and 64-bit SRTP authentication tags
+ based on the Skein hash function [Skein]. The Skein-512-MAC key
+ length is fixed at 256 bits for this application, and the output
+ length is adjustable. The Skein MAC is defined in Sections 2.6 and
+ 4.3 of [Skein] and is not based on the HMAC construct. Reference
+ implementations for Skein may be found at [Skein1]. A Skein-based
+ MAC is significantly more efficient than HMAC-SHA1, especially for
+ short SRTP payloads.
+
+ The Skein MAC key is computed by the SRTP key derivation function,
+ which is also referred to as the AES-CM PRF, or pseudorandom
+ function. This is defined either in [RFC3711] or in [RFC6188],
+
+
+
+Zimmermann, et al. Informational [Page 48]
+
+RFC 6189 ZRTP April 2011
+
+
+ depending on the selected SRTP AES key length. To compute a Skein
+ MAC key, the SRTP PRF output for the authentication key is left
+ untruncated at 256 bits, instead of the usual truncated length of 160
+ bits (the key length used by HMAC-SHA1).
+
+ Auth Tag Type Block | Meaning
+ ----------------------------------------------------------
+ "HS32" | 32-bit authentication tag based on
+ | HMAC-SHA1 as defined in RFC 3711.
+ ----------------------------------------------------------
+ "HS80" | 80-bit authentication tag based on
+ | HMAC-SHA1 as defined in RFC 3711.
+ ----------------------------------------------------------
+ "SK32" | 32-bit authentication tag based on
+ | Skein-512-MAC as defined in [Skein],
+ | with 256-bit key, 32-bit MAC length.
+ ----------------------------------------------------------
+ "SK64" | 64-bit authentication tag based on
+ | Skein-512-MAC as defined in [Skein],
+ | with 256-bit key, 64-bit MAC length.
+ ----------------------------------------------------------
+
+ Table 4. Auth Tag Type Values
+
+ Implementers should be aware that AES-GCM and AES-CCM for SRTP are
+ expected to become available when [SRTP-AES-GCM] is published as an
+ RFC. If an implementer wishes to use these modes when they become
+ available, new Auth Tag Types must be added.
+
+5.1.5. Key Agreement Type Block
+
+ All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and
+ MAY support EC25, EC38, and DH2k.
+
+ If a ZRTP endpoint supports multiple concurrent media streams, such
+ as audio and video, it MUST support Multistream (Section 4.4.3) mode.
+ Also, if a ZRTP endpoint supports the GoClear message
+ (Section 4.7.2), it SHOULD support Multistream, to be used if the two
+ parties choose to return to the secure state after going Clear (as
+ explained in Section 4.7.2.1).
+
+ For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH
+ parameters defined in [RFC3526], as follows. DH3k uses the 3072-bit
+ modular exponentiation group (MODP). DH2k uses the 2048-bit MODP
+ group. The DH generator g is 2. The random Diffie-Hellman secret
+ exponent SHOULD be twice as long as the AES key length. If AES-128
+ is used, the DH secret value SHOULD be 256 bits long. If AES-256 is
+ used, the secret value SHOULD be 512 bits long.
+
+
+
+Zimmermann, et al. Informational [Page 49]
+
+RFC 6189 ZRTP April 2011
+
+
+ If Elliptic Curve DH is used, the ECDH algorithm and key generation
+ is from [NIST-SP800-56A]. The curves used are from [NSA-Suite-B],
+ which uses the same curves as ECDSA defined by [FIPS-186-3], and can
+ also be found in RFC 5114, Sections 2.6 through 2.8 [RFC5114]. ECDH
+ test vectors may be found in RFC 5114, appendices A.6 through A.8
+ [RFC5114]. The validation procedures are from [NIST-SP800-56A],
+ Section 5.6.2.6, method 3, Elliptic Curve Cryptography (ECC) Partial
+ Validation. Both the X and Y coordinates of the point on the curve
+ are sent, in the first and second half of the ECDH public value,
+ respectively. The ECDH result returns only the X coordinate, as
+ specified in SP 800-56A. Useful strategies for implementing ECC may
+ be found in [RFC6090].
+
+ The choice of the negotiated hash algorithm (Section 5.1.2) is
+ coupled to the choice of Key Agreement Type. If ECDH-384 (EC38) is
+ chosen as the key agreement, the negotiated hash algorithm MUST be
+ either SHA-384 or the corresponding SHA-3 successor.
+
+ The choice of AES key length is coupled to the choice of Key
+ Agreement Type. If EC38 is chosen as the key agreement, AES-256
+ (AES3) SHOULD be used but AES-192 MAY be used. If DH3k or EC25 is
+ chosen, any AES key size MAY be used. Note that SRTP as defined in
+ [RFC3711] only supports AES-128.
+
+ DH2k is intended to provide acceptable security for low power
+ applications, or for applications that require faster key
+ negotiations. NIST asserts in Table 4 of [NIST-SP800-131A] that DH-
+ 2048 is safe to use through 2013. The security of DH2k can be
+ augmented by implementing ZRTP's key continuity features
+ (Section 15.1). DH2k SHOULD use AES-128. If an implementor must use
+ slow hardware, DH2k should precede DH3k in the Hello message.
+
+ ECDH-521 SHOULD NOT be used, due to disruptive computational delays.
+ These delays may lead to exhaustion of the retransmission schedule,
+ unless both endpoints have very fast hardware. Note that ECDH-521 is
+ not part of NSA Suite B.
+
+ ZRTP also defines two non-DH modes, Multistream and Preshared, in
+ which the SRTP key is derived from a shared secret and some nonce
+ material.
+
+ The table below lists the pv length in words and DHPart1 and DHPart2
+ message length in words for each Key Agreement Type Block.
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 50]
+
+RFC 6189 ZRTP April 2011
+
+
+ Key Agreement | pv | message | Meaning
+ Type Block | words | words |
+ -----------------------------------------------------------
+ "DH3k" | 96 | 117 | DH mode with p=3072 bit prime
+ | | | per RFC 3526, Section 4.
+ -----------------------------------------------------------
+ "DH2k" | 64 | 85 | DH mode with p=2048 bit prime
+ | | | per RFC 3526, Section 3.
+ -----------------------------------------------------------
+ "EC25" | 16 | 37 | Elliptic Curve DH, P-256
+ | | | per RFC 5114, Section 2.6
+ -----------------------------------------------------------
+ "EC38" | 24 | 45 | Elliptic Curve DH, P-384
+ | | | per RFC 5114, Section 2.7
+ -----------------------------------------------------------
+ "EC52" | 33 | 54 | Elliptic Curve DH, P-521
+ | | | per RFC 5114, Section 2.8
+ | | | (deprecated - do not use)
+ -----------------------------------------------------------
+ "Prsh" | - | - | Preshared Non-DH mode
+ -----------------------------------------------------------
+ "Mult" | - | - | Multistream Non-DH mode
+ -----------------------------------------------------------
+
+ Table 5. Key Agreement Type Block Values
+
+5.1.6. SAS Type Block
+
+ The SAS Type determines how the SAS is rendered to the user so that
+ the user may verbally compare it with his partner over the voice
+ channel. This allows detection of a MiTM attack.
+
+ All ZRTP endpoints MUST support the base32 and MAY support the
+ base256 rendering schemes for the Short Authentication String, and
+ other SAS rendering schemes. See Section 4.5.2 for how the sasvalue
+ is computed and Section 7 for how the SAS is used.
+
+ SAS Type Block | Meaning
+ ---------------------------------------------------
+ "B32 " | Short Authentication String using
+ | base32 encoding
+ ---------------------------------------------------
+ "B256" | Short Authentication String using
+ | base256 encoding (PGP Word List)
+ ---------------------------------------------------
+
+ Table 6. SAS Type Block Values
+
+
+
+
+Zimmermann, et al. Informational [Page 51]
+
+RFC 6189 ZRTP April 2011
+
+
+ For the SAS Type of "B256", the most-significant (leftmost) 16 bits
+ of the 32-bit sasvalue are rendered in network byte order using the
+ PGP Word List [pgpwordlist] [Juola1][Juola2].
+
+ For the SAS Type of "B32 ", the most-significant (leftmost) 20 bits
+ of the 32-bit sasvalue are rendered as a form of base32 encoding.
+ The leftmost 20 bits of the sasvalue results in four base32
+ characters that are rendered, most-significant quintet first, to both
+ ZRTP endpoints. Here is a normative pseudocode implementation of the
+ base32 function:
+
+ char[4] base32(uint32 bits)
+ { int i, n, shift;
+ char result[4];
+ for (i=0,shift=27; i!=4; ++i,shift-=5)
+ { n = (bits>>shift) & 31;
+ result[i] = "ybndrfg8ejkmcpqxot1uwisza345h769"[n];
+ }
+ return result;
+ }
+
+ This base32 encoding scheme differs from RFC 4648, and was designed
+ (by Bryce Wilcox-O'Hearn) to represent bit sequences in a form that
+ is convenient for human users to manipulate with minimal ambiguity.
+ The unusually permuted character ordering was designed for other
+ applications that use bit sequences that do not end on quintet
+ boundaries.
+
+5.1.7. Signature Type Block
+
+ The Signature Type Block specifies what signature algorithm is used
+ to sign the SAS as discussed in Section 7.2. The 4-octet Signature
+ Type Block, along with the accompanying signature block, are OPTIONAL
+ and may be present in the Confirm message (Figure 10) or the SASrelay
+ message (Figure 16). The signature types are given in the table
+ below.
+
+ Signature | Meaning
+ Type Block |
+ ------------------------------------------------
+ "PGP " | OpenPGP Signature, per RFC 4880
+ |
+ ------------------------------------------------
+ "X509" | ECDSA, with X.509v3 cert
+ | per RFC 5759 and FIPS-186-3
+ ------------------------------------------------
+
+ Table 7. Signature Type Block Values
+
+
+
+Zimmermann, et al. Informational [Page 52]
+
+RFC 6189 ZRTP April 2011
+
+
+ Additional details on the signature and signing key format may be
+ found in Section 7.2. OpenPGP signatures (Signature Type "PGP ") are
+ discussed in Section 7.2.1. The ECDSA curves are over prime fields
+ only, drawn from Appendix D.1.2 of [FIPS-186-3]. X.509v3 ECDSA
+ Signatures (Signature Type "X509") are discussed in Section 7.2.2.
+
+5.2. Hello Message
+
+ The Hello message has the format shown in Figure 3.
+
+ All ZRTP messages begin with the preamble value 0x505a, then a 16-bit
+ length in 32-bit words. This length includes only the ZRTP message
+ (including the preamble and the length) but not the ZRTP packet
+ header or CRC. The 8-octet Message Type follows the length field.
+
+ Next, there is a 4-character string containing the version (ver) of
+ the ZRTP protocol, which is "1.10" for this specification. Next,
+ there is the Client Identifier string (cid), which is 4 words long
+ and identifies the vendor and release of the ZRTP software. The 256-
+ bit hash image H3 is defined in Section 9. The next parameter is the
+ ZID, the 96-bit-long unique identifier for the ZRTP endpoint, defined
+ in Section 4.9.
+
+ The next four bits include three flag bits:
+
+ o The Signature-capable flag (S) indicates this Hello message is
+ sent from a ZRTP endpoint which is able to parse and verify
+ digital signatures, as described in Section 7.2. If signatures
+ are not supported, the (S) flag MUST be set to zero.
+
+ o The MiTM flag (M) is a Boolean that is set to true if and only if
+ this Hello message is sent from a device, usually a PBX, that has
+ the capability to send an SASrelay message (Section 5.13).
+
+ o The Passive flag (P) is a Boolean normally set to false, and is
+ set to true if and only if this Hello message is sent from a
+ device that is configured to never send a Commit message
+ (Section 5.4). This would mean it cannot initiate secure
+ sessions, but may act as a responder.
+
+ The next 8 bits are unused and SHOULD be set to zero when sent and
+ MUST be ignored on receipt.
+
+ Next is a list of supported Hash algorithms, Cipher algorithms, SRTP
+ Auth Tag Types, Key Agreement Types, and SAS Types. The number of
+ listed algorithms are listed for each type: hc=hash count, cc=cipher
+ count, ac=auth tag count, kc=key agreement count, and sc=sas count.
+ The values for these algorithms are defined in Tables 2, 3, 4, 5, and
+
+
+
+Zimmermann, et al. Informational [Page 53]
+
+RFC 6189 ZRTP April 2011
+
+
+ 6. A count of zero means that only the mandatory-to-implement
+ algorithms are supported. Mandatory algorithms MAY be included in
+ the list. The order of the list indicates the preferences of the
+ endpoint. If a mandatory algorithm is not included in the list, it
+ is implicitly added to the end of the list for preference.
+
+ The 64-bit MAC at the end of the message is computed across the whole
+ message, not including the MAC, using the MAC algorithm defined in
+ Section 5.1.2.2. The MAC key is the sender's H2 (defined in
+ Section 9), and thus the MAC cannot be checked by the receiving party
+ until the sender's H2 value is known to the receiving party later in
+ the protocol.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 54]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Hello " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | version="1.10" (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Client Identifier (4 words) |
+ | |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H3 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ZID (3 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0|S|M|P| unused (zeros)| hc | cc | ac | kc | sc |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | hash algorithms (0 to 7 values) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | cipher algorithms (0 to 7 values) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auth tag types (0 to 7 values) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Key Agreement Types (0 to 7 values) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | SAS Types (0 to 7 values) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 3: Hello Message Format
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 55]
+
+RFC 6189 ZRTP April 2011
+
+
+5.3. HelloACK Message
+
+ The HelloACK message is used to stop retransmissions of a Hello
+ message. A HelloACK is sent regardless if the version number in the
+ Hello is supported or the algorithm list supported. The receipt of a
+ HelloACK stops retransmission of the Hello message. The format is
+ shown in the figure below. A Commit message may be sent in place of
+ a HelloACK by an Initiator, if a Commit message is ready to be sent
+ promptly.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="HelloACK" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 4: HelloACK Message Format
+
+5.4. Commit Message
+
+ The Commit message is sent to initiate the key agreement process
+ after both sides have received a Hello message, which means it can
+ only be sent after receiving both a Hello message and a HelloACK
+ message. There are three subtypes of Commit messages, whose formats
+ are shown in Figures 5, 6, and 7.
+
+ The Commit message contains the Message Type Block, then the 256-bit
+ hash image H2, which is defined in Section 9. The next parameter is
+ the initiator's ZID, the 96-bit-long unique identifier for the ZRTP
+ endpoint, which MUST have the same value as was used in the Hello
+ message.
+
+ Next, there is a list of algorithms selected by the initiator (hash,
+ cipher, auth tag type, key agreement, sas type). For a DH Commit,
+ the hash value hvi is a hash of the DHPart2 of the Initiator and the
+ Responder's Hello message, as explained in Section 4.4.1.1.
+
+ The 64-bit MAC at the end of the message is computed across the whole
+ message, not including the MAC, using the MAC algorithm defined in
+ Section 5.1.2.2. The MAC key is the sender's H1 (defined in
+ Section 9), and thus the MAC cannot be checked by the receiving party
+ until the sender's H1 value is known to the receiving party later in
+ the protocol.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 56]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=29 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Commit " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H2 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ZID (3 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | hash algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | cipher algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auth tag type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Key Agreement Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | SAS Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | hvi (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 5: DH Commit Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 57]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=25 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Commit " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H2 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ZID (3 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | hash algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | cipher algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auth tag type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Key Agreement Type = "Mult" |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | SAS Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | nonce (4 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6: Multistream Commit Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 58]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=27 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Commit " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H2 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ZID (3 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | hash algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | cipher algorithm |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auth tag type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Key Agreement Type = "Prsh" |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | SAS Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | nonce (4 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | keyID (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 7: Preshared Commit Message Format
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 59]
+
+RFC 6189 ZRTP April 2011
+
+
+5.5. DHPart1 Message
+
+ The DHPart1 message shown in Figure 8 begins the DH exchange. It is
+ sent by the Responder if a valid Commit message is received from the
+ Initiator. The length of the pvr value and the length of the DHPart1
+ message depends on the Key Agreement Type chosen. This information
+ is contained in the table in Section 5.1.5. Note that for both
+ Multistream and Preshared modes, no DHPart1 or DHPart2 message will
+ be sent.
+
+ The 256-bit hash image H1 is defined in Section 9.
+
+ The next four parameters are non-invertible hashes (computed in
+ Section 4.3.1) of potential shared secrets used in generating the
+ ZRTP secret s0. The first two, rs1IDr and rs2IDr, are the hashes of
+ the responder's two retained shared secrets, truncated to 64 bits.
+ Next, there is auxsecretIDr, a hash of the responder's auxsecret
+ (defined in Section 4.3), truncated to 64 bits. The last parameter
+ is a hash of the trusted MiTM PBX shared secret pbxsecret, defined in
+ Section 7.3.1.
+
+ The 64-bit MAC at the end of the message is computed across the whole
+ message, not including the MAC, using the MAC algorithm defined in
+ Section 5.1.2.2. The MAC key is the sender's H0 (defined in
+ Section 9), and thus the MAC cannot be checked by the receiving party
+ until the sender's H0 value is known to the receiving party later in
+ the protocol.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 60]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="DHPart1 " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H1 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | rs1IDr (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | rs2IDr (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auxsecretIDr (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | pbxsecretIDr (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | pvr (length depends on KA Type) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 8: DHPart1 Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 61]
+
+RFC 6189 ZRTP April 2011
+
+
+5.6. DHPart2 Message
+
+ The DHPart2 message, shown in Figure 9, completes the DH exchange.
+ It is sent by the Initiator if a valid DHPart1 message is received
+ from the Responder. The length of the pvi value and the length of
+ the DHPart2 message depends on the Key Agreement Type chosen. This
+ information is contained in the table in Section 5.1.5. Note that
+ for both Multistream and Preshared modes, no DHPart1 or DHPart2
+ message will be sent.
+
+ The 256-bit hash image H1 is defined in Section 9.
+
+ The next four parameters are non-invertible hashes (computed in
+ Section 4.3.1) of potential shared secrets used in generating the
+ ZRTP secret s0. The first two, rs1IDi and rs2IDi, are the hashes of
+ the initiator's two retained shared secrets, truncated to 64 bits.
+ Next, there is auxsecretIDi, a hash of the initiator's auxsecret
+ (defined in Section 4.3), truncated to 64 bits. The last parameter
+ is a hash of the trusted MiTM PBX shared secret pbxsecret, defined in
+ Section 7.3.1.
+
+ The 64-bit MAC at the end of the message is computed across the whole
+ message, not including the MAC, using the MAC algorithm defined in
+ Section 5.1.2.2. The MAC key is the sender's H0 (defined in
+ Section 9), and thus the MAC cannot be checked by the receiving party
+ until the sender's H0 value is known to the receiving party later in
+ the protocol.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 62]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="DHPart2 " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Hash image H1 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | rs1IDi (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | rs2IDi (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | auxsecretIDi (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | pbxsecretIDi (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | pvi (length depends on KA Type) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 9: DHPart2 Message Format
+
+5.7. Confirm1 and Confirm2 Messages
+
+ The Confirm1 message is sent by the Responder in response to a valid
+ DHPart2 message after the SRTP session key and parameters have been
+ negotiated. The Confirm2 message is sent by the Initiator in
+ response to a Confirm1 message. The format is shown in Figure 10.
+ The message contains the Message Type Block "Confirm1" or "Confirm2".
+ Next, there is the confirm_mac, a MAC computed over the encrypted
+ part of the message (shown enclosed by "====" in Figure 10). This
+ confirm_mac is keyed and computed according to Section 4.6. The next
+ 16 octets contain the CFB Initialization Vector. The rest of the
+ message is encrypted using CFB and protected by the confirm_mac.
+
+
+
+Zimmermann, et al. Informational [Page 63]
+
+RFC 6189 ZRTP April 2011
+
+
+ The first field inside the encrypted region is the hash preimage H0,
+ which is defined in detail in Section 9.
+
+ The next 15 bits are not used and SHOULD be set to zero when sent and
+ MUST be ignored when received in Confirm1 or Confirm2 messages.
+
+ The next 9 bits contain the signature length. If no SAS signature
+ (described in Section 7.2) is present, all bits are set to zero. The
+ signature length is in words and includes the signature type block.
+ If the calculated signature octet count is not a multiple of 4, zeros
+ are added to pad it out to a word boundary. If no signature is
+ present, the overall length of the Confirm1 or Confirm2 message will
+ be set to 19 words.
+
+ The next 8 bits are used for flags. Undefined flags are set to zero
+ and ignored. Four flags are currently defined. The PBX Enrollment
+ flag (E) is a Boolean bit defined in Section 7.3.1. The SAS Verified
+ flag (V) is a Boolean bit defined in Section 7.1. The Allow Clear
+ flag (A) is a Boolean bit defined in Section 4.7.2. The Disclosure
+ Flag (D) is a Boolean bit defined in Section 11. The cache
+ expiration interval is defined in Section 4.9.
+
+ If the signature length (in words) is non-zero, a signature type
+ block will be present along with a signature block. Next, there is
+ the signature block. The signature block includes the signature and
+ the key (or a link to the key) used to generate the signature
+ (Section 7.2).
+
+ CFB mode [NIST-SP800-38A] is applied with a feedback length of 128
+ bits, a full cipher block, and the final block is truncated to match
+ the exact length of the encrypted data. The CFB Initialization
+ Vector is a 128-bit random nonce. The block cipher algorithm and the
+ key size are the same as the negotiated block cipher (Section 5.1.3)
+ for media encryption. CFB is used to encrypt the part of the
+ Confirm1 message beginning after the CFB IV to the end of the message
+ (the encrypted region is enclosed by "====" in Figure 10).
+
+ The responder uses the zrtpkeyr to encrypt the Confirm1 message. The
+ initiator uses the zrtpkeyi to encrypt the Confirm2 message.
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 64]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Confirm1" or "Confirm2" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | confirm_mac (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | CFB Initialization Vector (4 words) |
+ | |
+ | |
+ +===============================================================+
+ | |
+ | Hash preimage H0 (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|E|V|A|D|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | cache expiration interval (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | optional signature type block (1 word if present) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | optional signature block (variable length) |
+ | . . . |
+ | |
+ | |
+ +===============================================================+
+
+ Figure 10: Confirm1 and Confirm2 Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 65]
+
+RFC 6189 ZRTP April 2011
+
+
+5.8. Conf2ACK Message
+
+ The Conf2ACK message is sent by the Responder in response to a valid
+ Confirm2 message. The message format for the Conf2ACK is shown in
+ the figure below. The receipt of a Conf2ACK stops retransmission of
+ the Confirm2 message. Note that the first SRTP media (with a valid
+ SRTP auth tag) from the responder also stops retransmission of the
+ Confirm2 message.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Conf2ACK" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+ Figure 11: Conf2ACK Message Format
+
+5.9. Error Message
+
+ The Error message is sent to terminate an in-process ZRTP key
+ agreement exchange due to an error. The format is shown in the
+ figure below. The use of the Error message is described in
+ Section 4.7.1.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=4 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Error " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Integer Error Code (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 12: Error Message Format
+
+ Defined hexadecimal values for the Error Code are listed in the table
+ below.
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 66]
+
+RFC 6189 ZRTP April 2011
+
+
+ Error Code | Meaning
+ -----------------------------------------------------------
+ 0x10 | Malformed packet (CRC OK, but wrong structure)
+ -----------------------------------------------------------
+ 0x20 | Critical software error
+ -----------------------------------------------------------
+ 0x30 | Unsupported ZRTP version
+ -----------------------------------------------------------
+ 0x40 | Hello components mismatch
+ -----------------------------------------------------------
+ 0x51 | Hash Type not supported
+ -----------------------------------------------------------
+ 0x52 | Cipher Type not supported
+ -----------------------------------------------------------
+ 0x53 | Public key exchange not supported
+ -----------------------------------------------------------
+ 0x54 | SRTP auth tag not supported
+ -----------------------------------------------------------
+ 0x55 | SAS rendering scheme not supported
+ -----------------------------------------------------------
+ 0x56 | No shared secret available, DH mode required
+ -----------------------------------------------------------
+ 0x61 | DH Error: bad pvi or pvr ( == 1, 0, or p-1)
+ -----------------------------------------------------------
+ 0x62 | DH Error: hvi != hashed data
+ -----------------------------------------------------------
+ 0x63 | Received relayed SAS from untrusted MiTM
+ -----------------------------------------------------------
+ 0x70 | Auth Error: Bad Confirm pkt MAC
+ -----------------------------------------------------------
+ 0x80 | Nonce reuse
+ -----------------------------------------------------------
+ 0x90 | Equal ZIDs in Hello
+ -----------------------------------------------------------
+ 0x91 | SSRC collision
+ -----------------------------------------------------------
+ 0xA0 | Service unavailable
+ -----------------------------------------------------------
+ 0xB0 | Protocol timeout error
+ -----------------------------------------------------------
+ 0x100 | GoClear message received, but not allowed
+ -----------------------------------------------------------
+
+ Table 8. ZRTP Error Codes
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 67]
+
+RFC 6189 ZRTP April 2011
+
+
+5.10. ErrorACK Message
+
+ The ErrorACK message is sent in response to an Error message. The
+ receipt of an ErrorACK stops retransmission of the Error message.
+ The format is shown in the figure below.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="ErrorACK" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 13: ErrorACK Message Format
+
+5.11. GoClear Message
+
+ Support for the GoClear message is OPTIONAL in the protocol, and it
+ is sent to switch from SRTP to RTP. The format is shown in the
+ figure below. The clear_mac is used to authenticate the GoClear
+ message so that bogus GoClear messages introduced by an attacker can
+ be detected and discarded. The use of GoClear is described in
+ Section 4.7.2.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=5 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="GoClear " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | clear_mac (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 14: GoClear Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 68]
+
+RFC 6189 ZRTP April 2011
+
+
+5.12. ClearACK Message
+
+ Support for the ClearACK message is OPTIONAL in the protocol, and it
+ is sent to acknowledge receipt of a GoClear. A ClearACK is only sent
+ if the clear_mac from the GoClear message is authenticated.
+ Otherwise, no response is returned. The format is shown in the
+ figure below.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="ClearACK" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 15: ClearACK Message Format
+
+5.13. SASrelay Message
+
+ The SASrelay message is sent by a trusted MiTM, most often a PBX. It
+ is not sent as a response to a packet, but is sent as a self-
+ initiated packet by the trusted MiTM (Section 7.3). It can only be
+ sent after the rest of the ZRTP key negotiations have completed,
+ after the Confirm messages and their ACKs. It can only be sent after
+ the trusted MiTM has finished key negotiations with the other party,
+ because it is the other party's SAS that is being relayed. It is
+ sent with retry logic until a RelayACK message (Section 5.14) is
+ received or the retry schedule has been exhausted.
+
+ If a device, usually a PBX, sends an SASrelay message, it MUST have
+ previously declared itself as a MiTM device by setting the MiTM (M)
+ flag in the Hello message (Section 5.2). If the receiver of the
+ SASrelay message did not previously receive a Hello message with the
+ MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered. A
+ RelayACK is still sent, but no Error message is sent.
+
+ The SASrelay message format is shown in Figure 16. The message
+ contains the Message Type Block "SASrelay". Next, there is a MAC
+ computed over the encrypted part of the message (shown enclosed by
+ "====" in Figure 16). This MAC is keyed the same way as the
+ confirm_mac in the Confirm messages (see Section 4.6). The next 16
+ octets contain the CFB Initialization Vector. The rest of the
+ message is encrypted using CFB and protected by the MAC.
+
+ The next 15 bits are not used and SHOULD be set to zero when sent,
+ and they MUST be ignored when received in SASrelay messages.
+
+
+
+Zimmermann, et al. Informational [Page 69]
+
+RFC 6189 ZRTP April 2011
+
+
+ The next 9 bits contain the signature length. The trusted MiTM MAY
+ compute a digital signature on the SAS hash, as described in
+ Section 7.2, using a persistent signing key owned by the trusted
+ MiTM. If no SAS signature is present, all bits are set to zero. The
+ signature length is in words and includes the signature type block.
+ If the calculated signature octet count is not a multiple of 4, zeros
+ are added to pad it out to a word boundary. If no signature block is
+ present, the overall length of the SASrelay message will be set to 19
+ words.
+
+ The next 8 bits are used for flags. Undefined flags are set to zero
+ and ignored. Three flags are currently defined. The Disclosure Flag
+ (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
+ is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
+ is a Boolean bit defined in Section 7.1. These flags are updated
+ values to the same flags provided earlier in the Confirm message, but
+ they are updated to reflect the new flag information relayed by the
+ PBX from the other party.
+
+ The next 32-bit word contains the SAS rendering scheme for the
+ relayed sashash, which will be the same rendering scheme used by the
+ other party on the other side of the trusted MiTM. Section 7.3
+ describes how the PBX determines whether the ZRTP client regards the
+ PBX as a trusted MiTM. If the PBX determines that the ZRTP client
+ trusts the PBX, the next 8 words contain the sashash relayed from the
+ other party. The first 32-bit word of the sashash contains the
+ sasvalue, which may be rendered to the user using the specified SAS
+ rendering scheme. If this SASrelay message is being sent to a ZRTP
+ client that does not trust this MiTM, the sashash will be ignored by
+ the recipient and should be set to zeros by the PBX.
+
+ If the signature length (in words) is non-zero, a signature type
+ block will be present along with a signature block. Next, there is
+ the signature block. The signature block includes the signature and
+ the key (or a link to the key) used to generate the signature
+ (Section 7.2).
+
+ CFB mode [NIST-SP800-38A] is applied with a feedback length of 128
+ bits, a full cipher block, and the final block is truncated to match
+ the exact length of the encrypted data. The CFB Initialization
+ Vector is a 128-bit random nonce. The block cipher algorithm and the
+ key size is same as the negotiated block cipher (Section 5.1.3) for
+ media encryption. CFB is used to encrypt the part of the SASrelay
+ message beginning after the CFB IV to the end of the message (the
+ encrypted region is enclosed by "====" in Figure 16).
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 70]
+
+RFC 6189 ZRTP April 2011
+
+
+ Depending on whether the trusted MiTM had taken the role of the
+ initiator or the responder during the ZRTP key negotiation, the
+ SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="SASrelay" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | MAC (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | CFB Initialization Vector (4 words) |
+ | |
+ | |
+ +===============================================================+
+ | Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|0|V|A|D|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | rendering scheme of relayed SAS (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Trusted MiTM relayed sashash (8 words) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | optional signature type block (1 word if present) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | optional signature block (variable length) |
+ | . . . |
+ | |
+ | |
+ +===============================================================+
+
+ Figure 16: SASrelay Message Format
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 71]
+
+RFC 6189 ZRTP April 2011
+
+
+5.14. RelayACK Message
+
+ The RelayACK message is sent in response to a valid SASrelay message.
+ The message format for the RelayACK is shown in the figure below.
+ The receipt of a RelayACK stops retransmission of the SASrelay
+ message.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="RelayACK" (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 17: RelayACK Message Format
+
+5.15. Ping Message
+
+ The Ping and PingACK messages are unrelated to the rest of the ZRTP
+ protocol. No ZRTP endpoint is required to generate a Ping message,
+ but every ZRTP endpoint MUST respond to a Ping message with a PingACK
+ message.
+
+ Although Ping and PingACK messages have no effect on the rest of the
+ ZRTP protocol, their inclusion in this specification simplifies the
+ design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably,
+ [Zfone]). It enables proxies to be designed that do not rely on
+ assistance from the signaling layer to map out the associations
+ between media streams and ZRTP endpoints.
+
+ Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping
+ message as a means to sort out which RTP media streams are connected
+ to particular ZRTP endpoints. Ping messages are generated only by
+ ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages
+ will be encountered. Ping retransmission behavior is discussed in
+ Section 6.
+
+ The Ping message (Figure 18) contains an "EndpointHash", defined in
+ Section 5.16.
+
+ The Ping message contains a version number that defines what version
+ of PingACK is requested. If that version number is supported by the
+ Ping responder, a PingACK with a format that matches that version
+ will be received. Otherwise, a PingACK with a lower version number
+ may be received.
+
+
+
+
+Zimmermann, et al. Informational [Page 72]
+
+RFC 6189 ZRTP April 2011
+
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=6 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="Ping " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | version="1.10" (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | EndpointHash (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 18: Ping Message Format
+
+5.16. PingACK Message
+
+ A PingACK message is sent only in response to a Ping. A ZRTP
+ endpoint MUST respond to a Ping with a PingACK message. The version
+ of PingACK requested is contained in the Ping message. If that
+ version number is supported, a PingACK with a format that matches
+ that version MUST be sent. Otherwise, if the version number of the
+ Ping is not supported, a PingACK SHOULD be sent in the format of the
+ highest supported version known to the Ping responder. Only version
+ "1.10" is supported in this specification.
+
+ The PingACK message carries its own 64-bit EndpointHash, distinct
+ from the EndpointHash of the other party's Ping message. It is
+ REQUIRED that it be highly improbable for two participants in a call
+ to have the same EndpointHash and that an EndpointHash maintains a
+ persistent value between calls. For a normal ZRTP endpoint, such as
+ a ZRTP-enabled VoIP client, the EndpointHash can be just the
+ truncated ZID. For a ZRTP endpoint such as a PBX that has multiple
+ endpoints behind it, the EndpointHash must be a distinct value for
+ each endpoint behind it. It is recommended that the EndpointHash be
+ a truncated hash of the ZID of the ZRTP endpoint concatenated with
+ something unique about the actual endpoint or phone behind the PBX.
+ This may be the SIP URI of the phone, the PBX extension number, or
+ the local IP address of the phone, whichever is more readily
+ available in the application environment:
+
+ EndpointHash = hash(ZID || SIP URI of the endpoint)
+
+ EndpointHash = hash(ZID || PBX extension number of the endpoint)
+
+ EndpointHash = hash(ZID || local IP address of the endpoint)
+
+
+
+
+Zimmermann, et al. Informational [Page 73]
+
+RFC 6189 ZRTP April 2011
+
+
+ Any of these formulae confer uniqueness for the simple case of
+ terminating the ZRTP connection at the VoIP client, or the more
+ complex case of a PBX terminating the ZRTP connection for multiple
+ VoIP phones in a conference call, all sharing the PBX's ZID, but with
+ separate IP addresses behind the PBX. There is no requirement for
+ the same hash function to be used by both parties.
+
+ The PingACK message contains the EndpointHash of the sender of the
+ PingACK as well as the EndpointHash of the sender of the Ping. The
+ Source Identifier (SSRC) received in the ZRTP header from the Ping
+ packet (Figure 2) is copied into the PingACK message body
+ (Figure 19). This SSRC is not the SSRC of the sender of the PingACK.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=9 words |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Message Type Block="PingACK " (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | version="1.10" (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | EndpointHash of PingACK Sender (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | EndpointHash of Received Ping (2 words) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Source Identifier (SSRC) of Received Ping (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 19: PingACK Message Format
+
+6. Retransmissions
+
+ ZRTP uses two retransmission timers T1 and T2. T1 is used for
+ retransmission of Hello messages, when the support of ZRTP by the
+ other endpoint may not be known. T2 is used in retransmissions of
+ all the other ZRTP messages.
+
+ All message retransmissions MUST be identical to the initial message
+ including nonces, public values, etc; otherwise, hashes of the
+ message sequences may not agree.
+
+ Practical experience has shown that RTP packet loss at the start of
+ an RTP session can be extremely high. Since the entire ZRTP message
+ exchange occurs during this period, the defined retransmission scheme
+
+
+
+Zimmermann, et al. Informational [Page 74]
+
+RFC 6189 ZRTP April 2011
+
+
+ is defined to be aggressive. Since ZRTP packets with the exception
+ of the DHPart1 and DHPart2 messages are small, this should have
+ minimal effect on overall bandwidth utilization of the media session.
+
+ ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
+ session as determined by the offer/answer exchange in the signaling
+ layer.
+
+ The Ping message (Section 5.15) may follow the same retransmission
+ schedule as the Hello message, but this is not required in this
+ specification. Ping message retransmission is subject to
+ application-specific ZRTP proxy heuristics.
+
+ Hello ZRTP messages are retransmitted at an interval that starts at
+ T1 seconds and doubles after every retransmission, capping at 200 ms.
+ T1 has a recommended initial value of 50 ms. A Hello message is
+ retransmitted 20 times before giving up, which means the entire retry
+ schedule for Hello messages is exhausted after 3.75 seconds (50 + 100
+ + 18*200 ms). Retransmission of a Hello ends upon receipt of a
+ HelloACK or Commit message.
+
+ The post-Hello ZRTP messages are retransmitted only by the session
+ initiator -- that is, only Commit, DHPart2, and Confirm2 are
+ retransmitted if the corresponding message from the responder,
+ DHPart1, Confirm1, and Conf2ACK, are not received. Note that the
+ Confirm2 message retransmission can also be stopped by receiving the
+ first SRTP media (with a valid SRTP auth tag) from the responder.
+
+ The GoClear, Error, and SASrelay messages may be initiated and
+ retransmitted by either party, and responded to by the other party,
+ regardless of which party is the overall session initiator. They are
+ retransmitted if the corresponding response message ClearACK,
+ ErrorACK, and RelayACK are not received.
+
+ Non-Hello (and non-Ping) ZRTP messages are retransmitted at an
+ interval that starts at T2 seconds and doubles after every
+ retransmission, capping at 1200 ms. T2 has a recommended initial
+ value of 150 ms. Each non-Hello message is retransmitted 10 times
+ before giving up, which means the entire retry schedule is exhausted
+ after 9.45 seconds (150 + 300 + 600 + 7*1200 ms). Only the initiator
+ performs retransmissions. Each message has a response message that
+ stops retransmissions, as shown in the table below. The higher
+ values of T2 means that retransmissions will likely occur only in the
+ event of packet loss.
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 75]
+
+RFC 6189 ZRTP April 2011
+
+
+ Message Acknowledgement Message
+ ------- -----------------------
+ Hello HelloACK or Commit
+ Commit DHPart1 or Confirm1
+ DHPart2 Confirm1
+ Confirm2 Conf2ACK or SRTP media
+ GoClear ClearACK
+ Error ErrorACK
+ SASrelay RelayACK
+ Ping PingACK
+
+ Table 9. Retransmitted ZRTP Messages and Responses
+
+ The retry schedule must handle not only packet loss, but also slow or
+ heavily loaded peers that need additional time to perform their DH
+ calculations. The following mitigations are recommended:
+
+ o Slow or heavily loaded ZRTP endpoints that are at risk of taking
+ too long to perform their DH calculation SHOULD use a HelloACK
+ message instead of a Commit message to reply to a Hello from the
+ other party.
+
+ o If a ZRTP endpoint has evidence that the other party is a ZRTP
+ endpoint, by receiving a Hello message or Ping message, or by
+ receiving a Hello Hash in the signaling layer, it SHOULD extend
+ its own Hello retry schedule to span at least 12 seconds of
+ retries. If this extended Hello retry schedule is exhausted
+ without receiving a HelloACK or Commit message, a late Commit
+ message from the peer SHOULD still be accepted.
+
+ These recommended retransmission intervals are designed for a typical
+ broadband Internet connection. In some high-latency communication
+ channels, such as those provided by some mobile phone environments or
+ geostationary satellites, a different retransmission schedule may be
+ used. The initial value for the T1 or T2 retransmission timer should
+ be increased to be no less than the round-trip time provided by the
+ communications channel. It should take into account the time
+ required to transmit the entire message and the entire reply, as well
+ as a reasonable time estimate to perform the DH calculation.
+
+ ZRTP has its own retransmission schedule because it is carried along
+ with RTP, usually over UDP. In unusual cases, RTP can run over a
+ non-UDP transport, such as TCP or DCCP, which provides its own
+ built-in retransmission mechanism. It may be hard for the ZRTP
+ endpoint to detect that TCP is being used if media relays are
+ involved. The ZRTP endpoint may be sending only UDP, but there may
+ be a media relay along the media path that converts from UDP to TCP
+ for part of the journey. Or, if the ZRTP endpoint is sending TCP,
+
+
+
+Zimmermann, et al. Informational [Page 76]
+
+RFC 6189 ZRTP April 2011
+
+
+ the media relay might be converting from TCP to UDP. There have been
+ empirical observations of this in the wild. In cases where TCP is
+ used, ZRTP and TCP might together generate some extra
+ retransmissions. It is tempting to avoid this effect by eliminating
+ the ZRTP retransmission schedule when connected to a TCP channel, but
+ that would risk failure of the protocol, because it may not be TCP
+ all the way to the remote ZRTP endpoint. It only takes a few packets
+ to complete a ZRTP exchange, so trying to optimize out the extra
+ retransmissions in that scenario is not worth the risk.
+
+ After receiving a Commit message, but before receiving a Confirm2
+ message, if a ZRTP responder receives no ZRTP messages for more than
+ 10 seconds, the responder MAY send a protocol timeout Error message
+ and terminate the ZRTP protocol.
+
+7. Short Authentication String
+
+ This section will discuss the implementation of the Short
+ Authentication String, or SAS in ZRTP. The SAS can be verbally
+ compared by the human users reading the string aloud, or it can be
+ compared by validating an OPTIONAL digital signature (described in
+ Section 7.2) exchanged in the Confirm1 or Confirm2 messages.
+
+ The use of hash commitment in the DH exchange (Section 4.4.1.1)
+ constrains the attacker to only one guess to generate the correct SAS
+ in his attack, which means the SAS can be quite short. A 16-bit SAS,
+ for example, provides the attacker only one chance out of 65536 of
+ not being detected. How the hash commitment enables the SAS to be so
+ short is explained in Section 4.4.1.1.
+
+ There is only one SAS value computed per call. That is the SAS value
+ for the first media stream established, which is calculated in
+ Section 4.5.2. This SAS applies to all media streams for the same
+ session.
+
+ The SAS SHOULD be rendered to the user for authentication. The
+ rendering of the SAS value through the user interface at both
+ endpoints depends on the SAS Type agreed upon in the Commit message.
+ See Section 5.1.6 for a description of how the SAS is rendered to the
+ user.
+
+ The SAS is not treated as a secret value, but it must be compared to
+ see if it matches at both ends of the communications channel. The
+ two users verbally compare it using their human voices, human ears,
+ and human judgement. If it doesn't match, it indicates the presence
+ of a MiTM attack.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 77]
+
+RFC 6189 ZRTP April 2011
+
+
+ It is worse than useless and absolutely unsafe to rely on a robot
+ voice from the remote endpoint to compare the SAS, because a robot
+ voice can be trivially forged by a MiTM. The SAS verbal comparison
+ can only be done with a real live human at the remote endpoint.
+
+7.1. SAS Verified Flag
+
+ The SAS Verified flag (V) is set based on the user indicating that
+ SAS comparison has been successfully performed. The SAS Verified
+ flag is exchanged securely in the Confirm1 and Confirm2 messages
+ (Figure 10) of the next session. In other words, each party sends
+ the SAS Verified flag from the previous session in the Confirm
+ message of the current session. It is perfectly reasonable to have a
+ ZRTP endpoint that never sets the SAS Verified flag, because it would
+ require adding complexity to the user interface to allow the user to
+ set it. The SAS Verified flag is not required to be set, but if it
+ is available to the client software, it allows for the possibility
+ that the client software could render to the user that the SAS verify
+ procedure was carried out in a previous session.
+
+ Regardless of whether there is a user interface element to allow the
+ user to set the SAS Verified flag, it is worth caching a shared
+ secret, because doing so reduces opportunities for an attacker in the
+ next call.
+
+ If at any time the users carry out the SAS comparison procedure, and
+ it actually fails to match, then this means there is a very
+ resourceful MiTM. If this is the first call, the MiTM was there on
+ the first call, which is impressive enough. If it happens in a later
+ call, it also means the MiTM must also know the cached shared secret,
+ because you could not have carried out any voice traffic at all
+ unless the session key was correctly computed and is also known to
+ the attacker. This implies the MiTM must have been present in all
+ the previous sessions, since the initial establishment of the first
+ shared secret. This is indeed a resourceful attacker. It also means
+ that if at any time he ceases his participation as a MiTM on one of
+ your calls, the protocol will detect that the cached shared secret is
+ no longer valid -- because it was really two different shared secrets
+ all along, one of them between Alice and the attacker, and the other
+ between the attacker and Bob. The continuity of the cached shared
+ secrets makes it possible for us to detect the MiTM when he inserts
+ himself into the ongoing relationship, as well as when he leaves.
+ Also, if the attacker tries to stay with a long lineage of calls, but
+ fails to execute a DH MiTM attack for even one missed call, he is
+ permanently excluded. He can no longer resynchronize with the chain
+ of cached shared secrets.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 78]
+
+RFC 6189 ZRTP April 2011
+
+
+ A user interface element (i.e., a checkbox or button) is needed to
+ allow the user to tell the software the SAS verify was successful,
+ causing the software to set the SAS Verified flag (V), which
+ (together with our cached shared secret) obviates the need to perform
+ the SAS procedure in the next call. An additional user interface
+ element can be provided to let the user tell the software he detected
+ an actual SAS mismatch, which indicates a MiTM attack. The software
+ can then take appropriate action, clearing the SAS Verified flag, and
+ erase the cached shared secret from this session. It is up to the
+ implementer to decide if this added user interface complexity is
+ warranted.
+
+ If the SAS matches, it means there is no MiTM, which also implies it
+ is now safe to trust a cached shared secret for later calls. If
+ inattentive users don't bother to check the SAS, it means we don't
+ know whether there is or is not a MiTM, so even if we do establish a
+ new cached shared secret, there is a risk that our potential attacker
+ may have a subsequent opportunity to continue inserting himself in
+ the call, until we finally get around to checking the SAS. If the
+ SAS matches, it means no attacker was present for any previous
+ session since we started propagating cached shared secrets, because
+ this session and all the previous sessions were also authenticated
+ with a continuous lineage of shared secrets.
+
+7.2. Signing the SAS
+
+ In most applications, it is desirable to avoid the added complexity
+ of a PKI-backed digital signature, which is why ZRTP is designed not
+ to require it. Nonetheless, in some applications, it may be hard to
+ arrange for two human users to verbally compare the SAS. Or, an
+ application may already be using an existing PKI and wants to use it
+ to augment ZRTP.
+
+ To handle these cases, ZRTP allows for an OPTIONAL signature feature,
+ which allows the SAS to be checked without human participation. The
+ SAS MAY be signed and the signature sent inside the Confirm1,
+ Confirm2 (Figure 10), or SASrelay (Figure 16) messages. The
+ signature type (Section 5.1.7), length of the signature, and the key
+ used to create the signature (or a link to it) are all sent along
+ with the signature. The signature is calculated across the entire
+ SAS hash result (sashash), from which the sasvalue was derived. The
+ signatures exchanged in the encrypted Confirm1, Confirm2, or SASrelay
+ messages MAY be used to authenticate the ZRTP exchange. A signature
+ may be sent only in the initial media stream in a DH or ECDH ZRTP
+ exchange, not in Multistream mode.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 79]
+
+RFC 6189 ZRTP April 2011
+
+
+ Although the signature is sent, the material that is signed, the
+ sashash, is not sent with it in the Confirm message, since both
+ parties have already independently calculated the sashash. That is
+ not the case for the SASrelay message, which must relay the sashash.
+ To avoid unnecessary signature calculations, a signature SHOULD NOT
+ be sent if the other ZRTP endpoint did not set the (S) flag in the
+ Hello message (Section 5.2).
+
+ Note that the choice of hash algorithm used in the digital signature
+ is independent of the hash used in the sashash. The sashash is
+ determined by the negotiated Hash Type (Section 5.1.2), while the
+ hash used by the digital signature is separately defined by the
+ digital signature algorithm. For example, the sashash may be based
+ on SHA-256, while the digital signature might use SHA-384, if an
+ ECDSA P-384 key is used.
+
+ If the sashash (which is always truncated to 256 bits) is shorter
+ than the signature hash, the security is not weakened because the
+ hash commitment precludes the attacker from searching for sashash
+ collisions.
+
+ ECDSA algorithms may be used with either OpenPGP-formatted keys, or
+ X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS
+ is signed, then the signature SHOULD be ECDSA, and SHOULD use the
+ same size curve as the ECDH exchange if an ECDSA key of that size is
+ available.
+
+ If a ZRTP endpoint supports incoming signatures (evidenced by setting
+ the (S) flag in the Hello message), it SHOULD be able to parse
+ signatures from the other endpoint in OpenPGP format and MUST be able
+ to parse them in X.509v3 format. If the incoming signature is in an
+ unsupported format, or the trust model does not lead to a trusted
+ introducer or a trusted certificate authority (CA), another
+ authentication method may be used if available, such as the SAS
+ compare, or a cached shared secret from a previous session. If none
+ of these methods are available, it is up to the ZRTP user agent and
+ the user to decide whether to proceed with the call, after the user
+ is informed.
+
+ Both ECDSA and DSA [FIPS-186-3] have a feature that allows most of
+ the signature calculation to be done in advance of the session,
+ reducing latency during call setup. This is useful for low-power
+ mobile handsets.
+
+ ECDSA is preferred because it has compact keys as well as compact
+ signatures. If the signature along with its public key certificate
+ are insufficiently compact, the Confirm message may become too long
+ for the maximum transmission unit (MTU) size, and UDP fragmentation
+
+
+
+Zimmermann, et al. Informational [Page 80]
+
+RFC 6189 ZRTP April 2011
+
+
+ may result. Some firewalls and NATs may discard fragmented UDP
+ packets, which would cause the ZRTP exchange to fail. It is
+ RECOMMENDED that a ZRTP endpoint avoid sending signatures if they
+ would cause UDP fragmentation. For a discussion on MTU size and PMTU
+ discovery, see [RFC1191] and [RFC1981].
+
+ From a packet-size perspective, ECDSA and DSA both produce equally
+ compact signatures for a given signature strength. DSA keys are much
+ bigger than ECDSA keys, but in the case of OpenPGP signatures, the
+ public key is not sent along with the signature.
+
+ All signatures generated MUST use only NIST-approved hash algorithms,
+ and MUST avoid using SHA1. This applies to both OpenPGP and X.509v3
+ signatures. NIST-approved hash algorithms are found in [FIPS-180-3]
+ or its SHA-3 successor. All ECDSA curves used throughout this spec
+ are over prime fields, drawn from Appendix D.1.2 of [FIPS-186-3].
+
+7.2.1. OpenPGP Signatures
+
+ If the SAS Signature Type (Section 5.1.7) specifies an OpenPGP
+ signature ("PGP "), the signature-related fields are arranged as
+ follows.
+
+ The first field after the 4-octet Signature Type Block is the OpenPGP
+ signature. The format of this signature and the algorithms that
+ create it are specified by [RFC4880]. The signature is comprised of
+ a complete OpenPGP version 4 signature in binary form (not Radix-64),
+ as specified in RFC 4880, Section 5.2.3, enclosed in the full OpenPGP
+ packet syntax. The length of the OpenPGP signature is parseable from
+ the signature, and depends on the type and length of the signing key.
+
+ If OpenPGP signatures are supported, an implementation SHOULD NOT
+ generate signatures using any other signature algorithm except DSA or
+ ECDSA (ECDSA is a reserved algorithm type in RFC 4880), but MAY
+ accept other signature types from the other party. DSA signatures
+ with keys shorter than 2048 bits or longer than 3072 bits MUST NOT be
+ generated.
+
+ Implementers should be aware that ECDSA signatures for OpenPGP are
+ expected to become available when the work in progress [ECC-OpenPGP]
+ becomes an RFC. Any use of ECDSA signatures in ZRTP SHOULD NOT
+ generate signatures using ECDSA key sizes other than P-224, P-256,
+ and P-384, as defined in [FIPS-186-3].
+
+ RFC 4880, Section 5.2.3.18, specifies a way to embed, in an OpenPGP
+ signature, a URI of the preferred key server. The URI should be
+ fully specified to obtain the public key of the signing key that
+ created the signature. This URI MUST be present. It is up to the
+
+
+
+Zimmermann, et al. Informational [Page 81]
+
+RFC 6189 ZRTP April 2011
+
+
+ recipient of the signature to obtain the public key of the signing
+ key and determine its validity status using the OpenPGP trust model
+ discussed in [RFC4880].
+
+ The contents of Figure 20 lie inside the encrypted region of the
+ Confirm message (Figure 10) or the SASrelay message (Figure 16).
+
+ The total length of all the material in Figure 20, including the key
+ server URI, must not exceed 511 32-bit words (2044 octets). This
+ length, in words, is stored in the signature length field in the
+ Confirm or SASrelay message containing the signature. It is
+ desirable to avoid UDP fragmentation, so the URI should be kept
+ short.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Signature Type Block = "PGP " (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | OpenPGP signature |
+ | (variable length) |
+ | . . . |
+ | |
+ +===============================================================+
+
+ Figure 20: OpenPGP Signature Format
+
+7.2.2. ECDSA Signatures with X.509v3 Certs
+
+ If the SAS Signature Type (Section 5.1.7) is "X509", the ECDSA
+ signature-related fields are arranged as follows.
+
+ The first field after the 4-octet Signature Type Block is the DER
+ encoded X.509v3 certificate (the signed public key) of the ECDSA
+ signing key that created the signature. The format of this
+ certificate is specified by the NSA's Suite B Certificate and CRL
+ Profile [RFC5759].
+
+ Following the X.509v3 certificate at the next word boundary is the
+ ECDSA signature itself. The size of this field depends on the size
+ and type of the public key in the aforementioned certificate. The
+ format of this signature and the algorithms that create it are
+ specified by [FIPS-186-3]. The signature is comprised of the ECDSA
+ signature output parameters (r, s) in binary form, concatenated, in
+ network byte order, with no truncation of leading zeros. The first
+ half of the signature is r and the second half is s. If ECDSA P-256
+
+
+
+
+Zimmermann, et al. Informational [Page 82]
+
+RFC 6189 ZRTP April 2011
+
+
+ is specified, the signature fills 16 words (64 octets), 32 octets
+ each for r and s. If ECDSA P-384 is specified, the signature fills
+ 24 words (96 octets), 48 octets each for r and s.
+
+ It is up to the recipient of the signature to use information in the
+ certificate and path discovery mechanisms to trace the chain back to
+ the root CA. It is recommended that end user certificates issued for
+ secure telephony should contain appropriate path discovery links to
+ facilitate this.
+
+ Figure 21 shows a certificate and an ECDSA signature. All this
+ material lies inside the encrypted region of the Confirm message
+ (Figure 10) or the SASrelay message (Figure 16).
+
+ The total length of all the material in Figure 21, including the
+ X.509v3 certificate, must not exceed 511 32-bit words (2044 octets).
+ This length, in words, is stored in the signature length field in the
+ Confirm or SASrelay message containing the signature. It is
+ desirable to avoid UDP fragmentation, so the certificate material
+ should be kept to a much smaller size than this. End user certs
+ issued for this purpose should minimize the size of extraneous
+ material such as legal notices.
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Signature Type Block = "X509" (1 word) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Signing key's X.509v3 certificate |
+ | (variable length) |
+ | . . . |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | ECDSA P-256 or P-384 signature |
+ | (16 words or 24 words) |
+ | . . . |
+ | |
+ +===============================================================+
+
+ Figure 21: X.509v3 ECDSA Signature Format
+
+7.2.3. Signing the SAS without a PKI
+
+ It's not strictly necessary to use a PKI to back the public key that
+ signs the SAS. For example, it is possible to use a self-signed
+ X.509v3 certificate or an OpenPGP key that is not signed by any other
+
+
+
+Zimmermann, et al. Informational [Page 83]
+
+RFC 6189 ZRTP April 2011
+
+
+ key. In this scenario, the same key continuity technique used by SSH
+ [RFC4251] may be used. The public key is cached locally the first
+ time it is encountered, and when the same public key is encountered
+ again in subsequent sessions, it's deemed not to be a MiTM attack.
+ If there is no MiTM attack in the first session, there cannot be a
+ MiTM attack in any subsequent session. This is exactly how SSH does
+ it.
+
+ Of course, the security rests on the assumption that the MiTM did not
+ attack in the first session. That assumption seems to work most of
+ the time in the SSH world. The user would have to be warned the
+ first time a public key is encountered, just as in SSH. If possible,
+ the SAS should be checked before the user consents to caching the new
+ public key. If the SAS matches in the first session, there is no
+ MiTM, and it's safe to cache the public key. If no SAS comparison is
+ possible, it's up to the user, or up to the application, to decide
+ whether to take a leap of faith and proceed. That's how SSH works
+ most of the time, because SSH users don't have the chance to verbally
+ compare an SAS with anyone.
+
+ For a phone that is SIP-registered to a PBX, it may be provisioned
+ with the public key of the PBX, using a trusted automated
+ provisioning process. Even without a PKI, the phone knows that the
+ public key is the correct one, since it was provisioned into the
+ phone by a trusted provisioning mechanism. This makes it easy for
+ the phone to access several automated services commonly offered by a
+ PBX, such as voice mail or a conference bridge, where there is no
+ human at the PBX to do a verbal SAS compare. The same provisioning
+ may be used to preload the pbxsecret into the phone, which is
+ discussed in Section 7.3.1.
+
+7.3. Relaying the SAS through a PBX
+
+ ZRTP is designed to use end-to-end encryption. The two parties'
+ verbal comparison of the short authentication string (SAS) depends on
+ this assumption. But in some PBX environments, such as Asterisk,
+ there are usage scenarios that have the PBX acting as a trusted MiTM,
+ which means there are two back-to-back ZRTP connections with separate
+ session keys and separate SASs.
+
+ For example, imagine that Bob has a ZRTP-enabled VoIP phone that has
+ been registered with his company's PBX, so that it is regarded as an
+ extension of the PBX. Alice, whose phone is not associated with the
+ PBX, might dial the PBX from the outside, and a ZRTP connection is
+ negotiated between her phone and the PBX. She then selects Bob's
+ extension from the company directory in the PBX. The PBX makes a
+ call to Bob's phone (which might be offsite, many miles away from the
+ PBX through the Internet) and a separate ZRTP connection is
+
+
+
+Zimmermann, et al. Informational [Page 84]
+
+RFC 6189 ZRTP April 2011
+
+
+ negotiated between the PBX and Bob's phone. The two ZRTP sessions
+ have different session keys and different SASs, which would render
+ the SAS useless for verbal comparison between Alice and Bob. They
+ might even mistakenly believe that a wiretapper is present because of
+ the SAS mismatch, causing undue alarm.
+
+ ZRTP has a mechanism for solving this problem by having the PBX relay
+ the Alice/PBX SAS to Bob, sending it through to Bob in a special
+ SASrelay message as defined in Section 5.13, which is sent after the
+ PBX/Bob ZRTP negotiation is complete, after the Confirm messages.
+ Only the PBX, acting as a special trusted MiTM (trusted by the
+ recipient of the SASrelay message), will relay the SAS. The SASrelay
+ message protects the relayed SAS from tampering via an included MAC,
+ similar to how the Confirm message is protected. Bob's ZRTP-enabled
+ phone accepts the relayed SAS for rendering only because Bob's phone
+ had previously been configured to trust the PBX. This special
+ trusted relationship with the PBX can be established through a
+ special security enrollment procedure (Section 7.3.1). After that
+ enrollment procedure, the PBX is treated by Bob as a special trusted
+ MiTM. This results in Alice's SAS being rendered to Bob, so that
+ Alice and Bob may verbally compare them and thus prevent a MiTM
+ attack by any other untrusted MiTM.
+
+ A real "bad-guy" MiTM cannot exploit this protocol feature to mount a
+ MiTM attack and relay Alice's SAS to Bob, because Bob has not
+ previously carried out a special registration ritual with the bad
+ guy. The relayed SAS would not be rendered by Bob's phone, because
+ it did not come from a trusted PBX. The recognition of the special
+ trust relationship is achieved with the prior establishment of a
+ special shared secret between Bob and his PBX, which is called
+ pbxsecret (defined in Section 7.3.1), also known as the trusted MiTM
+ key.
+
+ The trusted MiTM key can be stored in a special cache at the time of
+ the initial enrollment (which is carried out only once for Bob's
+ phone), and Bob's phone associates this key with the ZID of the PBX,
+ while the PBX associates it with the ZID of Bob's phone. After the
+ enrollment has established and stored this trusted MiTM key, it can
+ be detected during subsequent ZRTP session negotiations between the
+ PBX and Bob's phone, because the PBX and the phone MUST pass the hash
+ of the trusted MiTM key in the DH message. It is then used as part
+ of the key agreement to calculate s0.
+
+ The PBX can determine whether it is trusted by the ZRTP user agent of
+ a phone. The presence of a shared trusted MiTM key in the key
+ negotiation sequence indicates that the phone has been enrolled with
+ this PBX and therefore trusts it to act as a trusted MiTM. During a
+ key agreement with two other ZRTP endpoints, the PBX may have a
+
+
+
+Zimmermann, et al. Informational [Page 85]
+
+RFC 6189 ZRTP April 2011
+
+
+ shared trusted MiTM key with both endpoints, only one endpoint, or
+ neither endpoint. If the PBX has a shared trusted MiTM key with
+ neither endpoint, the PBX MUST NOT relay the SAS. If the PBX has a
+ shared trusted MiTM key with only one endpoint, the PBX MUST relay
+ the SAS from one party to the other by sending an SASrelay message to
+ the endpoint with which it shares a trusted MiTM key. If the PBX has
+ a (separate) shared trusted MiTM key with each of the endpoints, the
+ PBX MUST relay the SAS to only one endpoint, not both endpoints.
+
+ Note: In the case of a PBX sharing trusted MiTM keys with both
+ endpoints, it does not matter which endpoint receives the relayed
+ SAS as long as only one endpoint receives it.
+
+ The relayed SAS fields contain the SAS rendering type and the
+ complete sashash. The receiver absolutely MUST NOT render the
+ relayed SAS if it does not come from a specially trusted ZRTP
+ endpoint. The security of the ZRTP protocol depends on not rendering
+ a relayed SAS from an untrusted MiTM, because it may be relayed by a
+ MiTM attacker. See the SASrelay message definition (Figure 16) for
+ further details.
+
+ To ensure that both Alice and Bob will use the same SAS rendering
+ scheme after the keys are negotiated, the PBX also sends the SASrelay
+ message to the unenrolled party (which does not regard this PBX as a
+ trusted MiTM), conveying the SAS rendering scheme, but not the
+ sashash, which it sets to zero. The unenrolled party will ignore the
+ relayed SAS field, but will use the specified SAS rendering scheme.
+
+ It is possible to route a call through two ZRTP-enabled PBXs using
+ this scheme. Assume Alice is a ZRTP endpoint who trusts her local
+ PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX
+ in Biloxi. The call is routed from Alice to the Atlanta PBX to the
+ Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to
+ Alice because Alice is enrolled with Atlanta, and Biloxi would relay
+ the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi.
+ The two PBXs are not assumed to be enrolled with each other in this
+ example. Both Alice and Bob would view and verbally compare the same
+ relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM
+ nodes can be traversed with this relaying scheme. This behavior is
+ extended to two PBXs that are enrolled with each other, via this
+ rule: In the case of a PBX sharing trusted MiTM keys with both
+ endpoints (i.e., both enrolled with this PBX), one of which is
+ another PBX (evidenced by the M-flag) and one of which is a non-PBX,
+ the MiTM PBX must always relay the PBX-to-PBX SAS to the non-PBX
+ endpoint.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 86]
+
+RFC 6189 ZRTP April 2011
+
+
+ A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is
+ effectively delegating its own policy decisions of algorithm
+ negotiation to the PBX.
+
+ When a PBX is between two ZRTP endpoints and is terminating their
+ media streams at the PBX, the PBX presents its own ZID to the two
+ parties, eclipsing the ZIDs of the two parties from each other. For
+ example, if several different calls are routed through such a PBX to
+ several different ZRTP-enabled phones behind the PBX, only a single
+ ZID is presented to the calling party in every case -- the ZID of the
+ PBX itself.
+
+ The next section describes the initial enrollment procedure that
+ establishes a special shared secret, a trusted MiTM key, between a
+ PBX and a phone, so that the phone will learn to recognize the PBX as
+ a trusted MiTM.
+
+7.3.1. PBX Enrollment and the PBX Enrollment Flag
+
+ Both the PBX and the endpoint need to know when enrollment is taking
+ place. One way of doing this is to set up an enrollment extension on
+ the PBX that a newly configured endpoint would call and establish a
+ ZRTP session. The PBX would then play audio media that offers the
+ user an opportunity to configure his phone to trust this PBX as a
+ trusted MiTM. The PBX calculates and stores the trusted MiTM shared
+ secret in its cache and associates it with this phone, indexed by the
+ phone's ZID. The trusted MiTM PBX shared secret is derived from
+ ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this
+ manner:
+
+ pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr), 256)
+
+ The pbxsecret is calculated for the whole ZRTP session, not for each
+ stream within a session, thus the KDF Context field in this case does
+ not include any stream-specific nonce material.
+
+ The PBX signals the enrollment process by setting the PBX Enrollment
+ flag (E) in the Confirm message (Figure 10). This flag is used to
+ trigger the ZRTP endpoint's user interface to prompt the user to see
+ if it wants to trust this PBX and calculate and store the pbxsecret
+ in the cache. If the user decides to respond by activating the
+ appropriate user interface element (a menu item, checkbox, or
+ button), his ZRTP user agent calculates pbxsecret using the same
+ formula, and saves it in a special cache entry associated with this
+ PBX.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 87]
+
+RFC 6189 ZRTP April 2011
+
+
+ During a PBX enrollment, the GoClear features are disabled. If the
+ (E) flag is set by the PBX, the PBX MUST NOT set the Allow Clear (A)
+ flag. Thus, (E) implies not (A). If a received Confirm message has
+ the (E) flag set, the (A) flag MUST be disregarded and treated as
+ false.
+
+ If the user elects not to enroll, perhaps because he dialed a wrong
+ number or does not yet feel comfortable with this PBX, he can simply
+ hang up and not save the pbxsecret in his cache. The PBX will have
+ it saved in the PBX cache, but that will do no harm. The SASrelay
+ scheme does not depend on the PBX trusting the phone. It only
+ depends on the phone trusting the PBX. It is the phone (the user)
+ who is at risk if the PBX abuses its MiTM privileges.
+
+ An endpoint MUST NOT store the pbxsecret in the cache without
+ explicit user authorization.
+
+ After this enrollment process, the PBX and the ZRTP-enabled phone
+ both share a secret that enables the phone to recognize the PBX as a
+ trusted MiTM in future calls. This means that when a future call
+ from an outside ZRTP-enabled caller is relayed through the PBX to
+ this phone, the phone will render a relayed SAS from the PBX. If the
+ SASrelay message comes from a MiTM that does not know the pbxsecret,
+ the phone treats it as a bad-guy MiTM, and refuses to render the
+ relayed SAS. Regardless of which party initiates any future phone
+ calls through the PBX, the enrolled phone or the outside phone, the
+ PBX will relay the SAS to the enrolled phone.
+
+ This enrollment procedure is designed primarily for phones that are
+ already associated with the PBX -- enterprise phones that are
+ "behind" the PBX. It is not intended for the countless outside
+ phones that are not registered to this PBX's SIP server. It should
+ be regarded as part of the installation and provisioning process for
+ a new phone in the organization.
+
+ There are more streamlined methods to configure ZRTP user agents to
+ trust a PBX. In large scale deployments, the pbxsecret may be
+ configured into the phone by an automated provisioning process, which
+ may be less burdensome for the users and less error prone. This
+ specification does not require a manual enrollment process. Any
+ process that results in a pbxsecret to be computed and shared between
+ the PBX and the phone will suffice, as long as the user is made aware
+ that this puts the PBX in a position to wiretap the calls.
+
+ It is recommended that a ZRTP client not proceed with the PBX
+ enrollment procedure without evidence that a MiTM attack is not
+ taking place during the enrollment session. It would be especially
+ damaging if a MiTM tricks the client into enrolling with the wrong
+
+
+
+Zimmermann, et al. Informational [Page 88]
+
+RFC 6189 ZRTP April 2011
+
+
+ PBX. That would enable the malevolent MiTM to wiretap all future
+ calls without arousing suspicion, because he would appear to be
+ trusted.
+
+8. Signaling Interactions
+
+ This section discusses how ZRTP, SIP, and SDP work together.
+
+ Note that ZRTP may be implemented without coupling with the SIP
+ signaling. For example, ZRTP can be implemented as a "bump in the
+ wire" or as a "bump in the stack" in which RTP sent by the SIP User
+ Agent (UA) is converted to ZRTP. In these cases, the SIP UA will
+ have no knowledge of ZRTP. As a result, the signaling path discovery
+ mechanisms introduced in this section should not be definitive --
+ they are a hint. Despite the absence of an indication of ZRTP
+ support in an offer or answer, a ZRTP endpoint SHOULD still send
+ Hello messages.
+
+ ZRTP endpoints that have control over the signaling path include a
+ ZRTP SDP attributes in their SDP offers and answers. The ZRTP
+ attribute, a=zrtp-hash, is used to indicate support for ZRTP and to
+ convey a hash of the Hello message. The hash is computed according
+ to Section 8.1.
+
+ Aside from the advantages described in Section 8.1, there are a
+ number of potential uses for this attribute. It is useful when
+ signaling elements would like to know when ZRTP may be utilized by
+ endpoints. It is also useful if endpoints support multiple methods
+ of SRTP key management. The ZRTP attribute can be used to ensure
+ that these key management approaches work together instead of against
+ each other. For example, if only one endpoint supports ZRTP, but
+ both support another method to key SRTP, then the other method will
+ be used instead. When used in parallel, an SRTP secret carried in an
+ a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a
+ shared secret for the srtps computation defined in Section 8.2. The
+ ZRTP attribute is also used to signal to an intermediary ZRTP device
+ not to act as a ZRTP endpoint, as discussed in Section 10.
+
+ The a=zrtp-hash attribute can only be included in the SDP at the
+ media level since Hello messages sent in different media streams will
+ have unique hashes.
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 89]
+
+RFC 6189 ZRTP April 2011
+
+
+ The ABNF for the ZRTP attribute is as follows:
+
+ zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value
+
+ zrtp-version = token
+
+ zrtp-hash-value = 1*(HEXDIG)
+
+ Here's an example of the ZRTP attribute in an initial SDP offer or
+ answer used at the media level, using the <allOneLine> convention
+ defined in RFC 4475, Section 2.1 [RFC4475]:
+
+ v=0
+ o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com
+ s=
+ c=IN IP4 client.biloxi.example.com
+ t=0 0
+ m=audio 3456 RTP/AVP 97 33
+ a=rtpmap:97 iLBC/8000
+ a=rtpmap:33 no-op/8000
+ <allOneLine>
+ a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2
+ df881ae642c371ba46df
+ </allOneLine>
+
+ A mechanism for carrying this same zrtp-hash information in the
+ Jingle signaling protocol is defined in [XEP-0262].
+
+ It should be safe to send ZRTP messages even when there is no
+ evidence in the signaling that the other party supports it, because
+ ZRTP has been designed to be clearly different from RTP, having a
+ similar structure to STUN packets sent during an ICE exchange.
+
+8.1. Binding the Media Stream to the Signaling Layer via the Hello Hash
+
+ Tying the media stream to the signaling channel can help prevent a
+ third party from inserting false media packets. If the signaling
+ layer contains information that ties it to the media stream, false
+ media streams can be rejected.
+
+ To accomplish this, the entire Hello message (Figure 3) is hashed,
+ using the hash algorithm defined in Section 5.1.2.2. The ZRTP packet
+ framing from Figure 2 is not included in the hash. The resulting
+ hash image is made available without truncation to the signaling
+ layer, where it is transmitted as a hexadecimal value in the SIP
+ channel using the SDP attribute a=zrtp-hash, defined in this
+ specification. Assuming Section 5.1.2.2 defines a 256-bit hash
+ length, the a=zrtp-hash field in the SDP attribute carries 64
+
+
+
+Zimmermann, et al. Informational [Page 90]
+
+RFC 6189 ZRTP April 2011
+
+
+ hexadecimal digits. Each media stream (audio or video) will have a
+ separate Hello message, and thus will require a separate a=zrtp-hash
+ in an SDP attribute. The recipient of the SIP/SDP message can then
+ use this hash image to detect and reject false Hello messages in the
+ media channel, as well as identify which media stream is associated
+ with this SIP call. Each Hello message hashes uniquely, because it
+ contains the H3 field derived from a random nonce, defined in
+ Section 9.
+
+ The Hello Hash as an SDP attribute is not a REQUIRED feature, because
+ some ZRTP endpoints do not have the ability to add SDP attributes to
+ the signaling. For example, if ZRTP is implemented in a hardware
+ bump-in-the-wire device, it might only have the ability to modify the
+ media packets, not the SIP packets, especially if the SIP packets are
+ integrity protected and thus cannot be modified on the wire. If the
+ SDP has no hash image of the ZRTP Hello message, the recipient's ZRTP
+ user agent cannot check it, and thus will not be able to reject Hello
+ messages based on this hash.
+
+ After the Hello Hash is used to properly identify the ZRTP Hello
+ message as belonging to this particular SIP call, the rest of the
+ ZRTP message sequence is protected from false packet injection by
+ other protection mechanisms, such as the hash chaining mechanism
+ defined in Section 9.
+
+ An attacker who controls only the signaling layer, such as an
+ uncooperative VoIP service provider, may be able to deny service by
+ corrupting the hash of the Hello message in the SDP attribute, which
+ would force ZRTP to reject perfectly good Hello messages. If there
+ is reason to believe this is happening, the ZRTP endpoint MAY allow
+ Hello messages to be accepted that do not match the hash image in the
+ SDP attribute.
+
+ Even in the absence of SIP integrity protection, the inclusion of the
+ a=zrtp-hash SDP attribute, when coupled with the hash chaining
+ mechanism defined in Section 9, meets the R-ASSOC requirement in the
+ Media Security Requirements [RFC5479], which requires:
+
+ ...a mechanism for associating key management messages with both
+ the signaling traffic that initiated the session and with
+ protected media traffic. It is useful to associate key management
+ messages with call signaling messages, as this allows the SDP
+ offerer to avoid performing CPU-consuming operations (e.g.,
+ Diffie-Hellman or public key operations) with attackers that have
+ not seen the signaling messages.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 91]
+
+RFC 6189 ZRTP April 2011
+
+
+ The a=zrtp-hash SDP attribute becomes especially useful if the SDP is
+ integrity-protected end-to-end by SIP Identity [RFC4474] or better
+ still, Dan Wing's SIP Identity using Media Path [SIP-IDENTITY]. This
+ leads to an ability to stop MiTM attacks independent of ZRTP's SAS
+ mechanism, as explained in Section 8.1.1.
+
+8.1.1. Integrity-Protected Signaling Enables Integrity-Protected DH
+ Exchange
+
+ If and only if the signaling path and the SDP is protected by some
+ form of end-to-end integrity protection, such as one of the
+ abovementioned mechanisms, so that it can guarantee delivery of the
+ a=zrtp-hash attribute without any tampering by a third party, and if
+ there is good reason to trust the signaling layer to protect the
+ interests of the end user, it is possible to authenticate the key
+ exchange and prevent a MiTM attack. This can be done without
+ requiring the users to verbally compare the SAS, by using the hash
+ chaining mechanism defined in Section 9 to provide a series of MAC
+ keys that protect the entire ZRTP key exchange. Thus, an end-to-end
+ integrity-protected signaling layer automatically enables an
+ integrity-protected Diffie-Hellman exchange in ZRTP, which in turn
+ means immunity from a MiTM attack. Here's how it works.
+
+ The integrity-protected SIP SDP contains a hash commitment to the
+ entire Hello message. The Hello message contains H3, which provides
+ a hash commitment for the rest of the hash chain H0-H2 (Section 9).
+ The Hello message is protected by a 64-bit MAC, keyed by H2. The
+ Commit message is protected by a 64-bit MAC, keyed by H1. The
+ DHPart1 or DHPart2 messages are protected by a 64-bit MAC, keyed by
+ H0. The MAC protecting the Confirm messages is computed by a
+ different MAC key derived from the resulting key agreement. Each
+ message's MAC is checked when the MAC key is received in the next
+ message. If a bad MAC is discovered, it MUST be treated as a
+ security exception indicating a MiTM attack, perhaps by logging or
+ alerting the user, and MUST NOT be treated as a random error. Random
+ errors are already discovered and quietly rejected by bad CRCs
+ (Figure 2).
+
+ The Hello message must be assembled before any hash algorithms are
+ negotiated, so an implicit predetermined hash algorithm and MAC
+ algorithm (both defined in Section 5.1.2.2) must be used. All of the
+ aforementioned MACs keyed by the hashes in the aforementioned hash
+ chain MUST be computed with the MAC algorithm defined in
+ Section 5.1.2.2, with the MAC truncated to 64 bits.
+
+ The Media Security Requirements [RFC5479] R-EXISTING requirement can
+ be fully met by leveraging a certificate-backed PKI in the signaling
+ layer to integrity protect the delivery of the a=zrtp-hash SDP
+
+
+
+Zimmermann, et al. Informational [Page 92]
+
+RFC 6189 ZRTP April 2011
+
+
+ attribute. This would thereby protect ZRTP against a MiTM attack,
+ without requiring the user to check the SAS, without adding any
+ explicit signatures or signature keys to the ZRTP key exchange and
+ without any extra public key operations or extra packets.
+
+ Without an end-to-end integrity-protection mechanism in the signaling
+ layer to guarantee delivery of the a=zrtp-hash SDP attribute without
+ modification by a third party, these MACs alone will not prevent a
+ MiTM attack. In that case, ZRTP's built-in SAS mechanism will still
+ have to be used to authenticate the key exchange. At the time of
+ this writing, very few deployed VoIP clients offer a fully
+ implemented SIP stack that provides end-to-end integrity protection
+ for the delivery of SDP attributes. Also, end-to-end signaling
+ integrity becomes more problematic if E.164 numbers [RFC3824] are
+ used in SIP. Thus, real-world implementations of ZRTP endpoints will
+ continue to depend on SAS authentication for quite some time. Even
+ after there is widespread availability of SIP user agents that offer
+ integrity protected delivery of SDP attributes, many users will still
+ be faced with the fact that the signaling path may be controlled by
+ institutions that do not have the best interests of the end user in
+ mind. In those cases, SAS authentication will remain the gold
+ standard for the prudent user.
+
+ Even without SIP integrity protection, the Media Security
+ Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS
+ mechanism. Although ZRTP may benefit from an integrity-protected SIP
+ layer, it is fortunate that ZRTP's self-contained MiTM defenses do
+ not actually require an integrity-protected SIP layer. ZRTP can
+ bypass the delays and problems that SIP integrity faces, such as
+ E.164 number usage, and the complexity of building and maintaining a
+ PKI.
+
+ In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to-
+ end integrity protection in the SIP layer. Further, DTLS-SRTP must
+ bear the additional cost of a signature calculation of its own, in
+ addition to the signature calculation the SIP layer uses to achieve
+ its integrity protection. ZRTP needs no signature calculation of its
+ own to leverage the signature calculation carried out in the SIP
+ layer.
+
+8.2. Deriving the SRTP Secret (srtps) from the Signaling Layer
+
+ The shared secret calculations defined in Section 4.3 make use of the
+ SRTP secret (srtps), if it is provided by the signaling layer.
+
+ It is desirable for only one SRTP key negotiation protocol to be
+ used, and that protocol should be ZRTP. But in the event the
+ signaling layer negotiates its own SRTP master key and salt, using
+
+
+
+Zimmermann, et al. Informational [Page 93]
+
+RFC 6189 ZRTP April 2011
+
+
+ the SDP Security Descriptions (SDES [RFC4568]) or [RFC4567], it can
+ be passed from the signaling to the ZRTP layer and mixed into ZRTP's
+ own shared secret calculations, without compromising security by
+ creating a dependency on the signaling for media encryption.
+
+ ZRTP computes srtps from the SRTP master key and salt parameters
+ provided by the signaling layer in this manner, truncating the result
+ to 256 bits:
+
+ srtps = KDF(SRTP master key, "SRTP Secret", (ZIDi || ZIDr ||
+ SRTP master salt), 256)
+
+ It is expected that the srtps parameter will be rarely computed or
+ used in typical ZRTP endpoints, because it is likely and desirable
+ that ZRTP will be the sole means of negotiating SRTP keys, needing no
+ help from [RFC4568] or [RFC4567]. If srtps is computed, it will be
+ stored in the auxiliary shared secret auxsecret, defined in
+ Section 4.3 and used in Section 4.3.1.
+
+8.3. Codec Selection for Secure Media
+
+ Codec selection is negotiated in the signaling layer. If the
+ signaling layer determines that ZRTP is supported by both endpoints,
+ this should provide guidance in codec selection to avoid variable
+ bitrate (VBR) codecs that leak information.
+
+ When voice is compressed with a VBR codec, the packet lengths vary
+ depending on the types of sounds being compressed. This leaks a lot
+ of information about the content even if the packets are encrypted,
+ regardless of what encryption protocol is used [Wright1]. It is
+ RECOMMENDED that VBR codecs be avoided in encrypted calls. It is not
+ a problem if the codec adapts the bitrate to the available channel
+ bandwidth. The vulnerable codecs are the ones that change their
+ bitrate depending on the type of sound being compressed.
+
+ It also appears that voice activity detection (VAD) leaks information
+ about the content of the conversation, but to a lesser extent than
+ VBR. This effect can be mitigated by lengthening the VAD hangover
+ time by a random amount between 1 and 2 seconds, if this is feasible
+ in your application. Only short bursts of speech would benefit from
+ lengthening the VAD hangover time.
+
+ The security problems of VBR and VAD are addressed in detail by the
+ guidelines in [VBR-AUDIO]. It is RECOMMENDED that ZRTP endpoints
+ follow these guidelines.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 94]
+
+RFC 6189 ZRTP April 2011
+
+
+9. False ZRTP Packet Rejection
+
+ An attacker who is not in the media path may attempt to inject false
+ ZRTP protocol packets, possibly to effect a denial-of-service attack
+ or to inject his own media stream into the call. VoIP, by its
+ nature, invites various forms of denial-of-service attacks and
+ requires protocol features to reject such attacks. While bogus SRTP
+ packets may be easily rejected via the SRTP auth tag field, that can
+ only be applied after a key agreement is completed. During the ZRTP
+ key negotiation phase, other false packet rejection mechanisms are
+ needed. One such mechanism is the use of the total_hash in the final
+ shared secret calculation, but that can only detect false packets
+ after performing the computationally expensive Diffie-Hellman
+ calculation.
+
+ A lot of work has been done on the analysis of denial-of-service
+ attacks, especially from attackers who are not in the media path.
+ Such an attacker might inject false ZRTP packets to force a ZRTP
+ endpoint to engage in an endless series of pointless and expensive DH
+ calculations. To detect and reject false packets cheaply and rapidly
+ as soon as they are received, ZRTP uses a one-way hash chain, which
+ is a series of successive hash images. Before each session, the
+ following values are computed:
+
+ H0 = 256-bit random nonce (different for each party)
+
+ H1 = hash (H0)
+
+ H2 = hash (H1)
+
+ H3 = hash (H2)
+
+ This one-way hash chain MUST use the hash algorithm defined in
+ Section 5.1.2.2, truncated to 256 bits. Each 256-bit hash image is
+ the preimage of the next, and the sequence of images is sent in
+ reverse order in the ZRTP packet sequence. The hash image H3 is sent
+ in the Hello message, H2 is sent in the Commit message, H1 is sent in
+ the DHPart1 or DHPart2 messages, and H0 is sent in the Confirm1 or
+ Confirm2 messages. The initial random H0 nonces that each party
+ generates MUST be unpredictable to an attacker and unique within a
+ ZRTP session, which thereby forces the derived hash images H1-H3 to
+ also be unique and unpredictable.
+
+ The recipient checks if the packet has the correct hash preimage, by
+ hashing it and comparing the result with the hash image for the
+ preceding packet. Packets that contain an incorrect hash preimage
+ MUST NOT be used by the recipient, but they MAY be processed as
+ security exceptions, perhaps by logging or alerting the user. As
+
+
+
+Zimmermann, et al. Informational [Page 95]
+
+RFC 6189 ZRTP April 2011
+
+
+ long as these bogus packets are not used, and correct packets are
+ still being received, the protocol SHOULD be allowed to run to
+ completion, thereby rendering ineffective this denial-of-service
+ attack.
+
+ Note that since H2 is sent in the Commit message, and the initiator
+ does not receive a Commit message, the initiator computes the
+ responder's missing H2 by hashing the responder's H1. An analogous
+ interpolation is performed by both parties to handle the skipped
+ DHPart1 and DHPart2 messages in Preshared (Section 3.1.2) or
+ Multistream (Section 3.1.3) modes.
+
+ Because these hash images alone do not protect the rest of the
+ contents of the packet they reside in, this scheme assumes the
+ attacker cannot modify the packet contents from a legitimate party,
+ which is a reasonable assumption for an attacker who is not in the
+ media path. This covers an important range of denial-of-service
+ attacks. For dealing with the remaining set of attacks that involve
+ packet modification, other mechanisms are used, such as the
+ total_hash in the final shared secret calculation, and the hash
+ commitment in the Commit message.
+
+ Hello messages injected by an attacker may be detected and rejected
+ by the inclusion of a hash of the Hello message in the signaling, as
+ described in Section 8. This mechanism requires that each Hello
+ message be unique, and the inclusion of the H3 hash image meets that
+ requirement.
+
+ If and only if an integrity-protected signaling channel is available,
+ the MACs that are keyed by this hash chaining scheme can be used to
+ authenticate the entire ZRTP key exchange, and thereby prevent a MiTM
+ attack, without relying on the users verbally comparing the SAS. See
+ Section 8.1.1 for details.
+
+ Some ZRTP user agents allow the user to manually switch to clear mode
+ (via the GoClear message) in the middle of a secure call, and then
+ later initiate secure mode again. Many consumer client products will
+ omit this feature, but those that allow it may return to secure mode
+ again in the same media stream. Although the same chain of hash
+ images will be reused and thus rendered ineffective the second time,
+ no real harm is done because the new SRTP session keys will be
+ derived in part from a cached shared secret, which was safely
+ protected from the MiTM in the previous DH exchange earlier in the
+ same session.
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 96]
+
+RFC 6189 ZRTP April 2011
+
+
+10. Intermediary ZRTP Devices
+
+ This section discusses the operation of a ZRTP endpoint that is
+ actually an intermediary. For example, consider a device that
+ proxies both signaling and media between endpoints. There are three
+ possible ways in which such a device could support ZRTP.
+
+ An intermediary device can act transparently to the ZRTP protocol.
+ To do this, a device MUST pass non-RTP protocols multiplexed on the
+ same port as RTP (to allow ZRTP and STUN). This is the RECOMMENDED
+ behavior for intermediaries as ZRTP and SRTP are best when done end-
+ to-end.
+
+ An intermediary device could implement the ZRTP protocol and act as a
+ ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary
+ device. The intermediary could determine on a call-by-call basis
+ whether the endpoint behind it supports ZRTP based on the presence or
+ absence of the ZRTP SDP attribute flag (a=zrtp-hash). For non-ZRTP
+ endpoints, the intermediary device could act as the ZRTP endpoint
+ using its own ZID and cache. This approach SHOULD only be used when
+ there is some other security method protecting the confidentiality of
+ the media between the intermediary and the inside endpoint, such as
+ IPsec or physical security.
+
+ The third mode, which is NOT RECOMMENDED, is for the intermediary
+ device to attempt to back-to-back the ZRTP protocol. The only
+ exception to this case is where the intermediary device is a trusted
+ element providing services to one of the endpoints -- e.g., a Private
+ Branch Exchange or PBX. In this mode, the intermediary would attempt
+ to act as a ZRTP endpoint towards both endpoints of the media
+ session. This approach MUST NOT be used except as described in
+ Section 7.3 as it will always result in a detected MiTM attack and
+ will generate alarms on both endpoints and likely result in the
+ immediate termination of the session. The PBX MUST uses a single ZID
+ for all endpoints behind it.
+
+ In cases where centralized media mixing is taking place, the SAS will
+ not match when compared by the humans. This situation can sometimes
+ be known in the SIP signaling by the presence of the isfocus feature
+ tag [RFC4579]. As a result, when the isfocus feature tag is present,
+ the DH exchange can be authenticated by the mechanism defined in
+ Section 8.1.1 or by validating signatures (Section 7.2) in the
+ Confirm or SASrelay messages. For example, consider an audio
+ conference call with three participants Alice, Bob, and Carol hosted
+ on a conference bridge in Dallas. There will be three ZRTP encrypted
+ media streams, one encrypted stream between each participant and
+ Dallas. Each will have a different SAS. Each participant will be
+ able to validate their SAS with the conference bridge by using
+
+
+
+Zimmermann, et al. Informational [Page 97]
+
+RFC 6189 ZRTP April 2011
+
+
+ signatures optionally present in the Confirm messages (described in
+ Section 7.2). Or, if the signaling path has end-to-end integrity
+ protection, each DH exchange will have automatic MiTM protection by
+ using the mechanism in Section 8.1.1.
+
+ SIP feature tags can also be used to detect if a session is
+ established with an automaton such as an Interactive Voice Response
+ (IVR), voicemail system, or speech recognition system. The display
+ of SAS strings to users should be disabled in these cases.
+
+ It is possible that an intermediary device acting as a ZRTP endpoint
+ might still receive ZRTP Hello and other messages from the inside
+ endpoint. This could occur if there is another inline ZRTP device
+ that does not include the ZRTP SDP attribute flag. An intermediary
+ acting as a ZRTP endpoint receiving ZRTP Hello and other messages
+ from the inside endpoint MUST NOT pass these ZRTP messages.
+
+11. The ZRTP Disclosure Flag
+
+ There are no back doors defined in the ZRTP protocol specification.
+ The designers of ZRTP would like to discourage back doors in ZRTP-
+ enabled products. However, despite the lack of back doors in the
+ actual ZRTP protocol, it must be recognized that a ZRTP implementer
+ might still deliberately create a rogue ZRTP-enabled product that
+ implements a back door outside the scope of the ZRTP protocol. For
+ example, they could create a product that discloses the SRTP session
+ key generated using ZRTP out-of-band to a third party. They may even
+ have a legitimate business reason to do this for some customers.
+
+ For example, some environments have a need to monitor or record
+ calls, such as stock brokerage houses who want to discourage insider
+ trading, or special high-security environments with special needs to
+ monitor their own phone calls. We've all experienced automated
+ messages telling us that "This call may be monitored for quality
+ assurance". A ZRTP endpoint in such an environment might
+ unilaterally disclose the session key to someone monitoring the call.
+ ZRTP-enabled products that perform such out-of-band disclosures of
+ the session key can undermine public confidence in the ZRTP protocol,
+ unless we do everything we can in the protocol to alert the other
+ user that this is happening.
+
+ If one of the parties is using a product that is designed to disclose
+ their session key, ZRTP requires them to confess this fact to the
+ other party through a protocol message to the other party's ZRTP
+ client, which can properly alert that user, perhaps by rendering it
+ in a graphical user interface. The disclosing party does this by
+ sending a Disclosure flag (D) in Confirm1 and Confirm2 messages as
+ described in Section 5.7.
+
+
+
+Zimmermann, et al. Informational [Page 98]
+
+RFC 6189 ZRTP April 2011
+
+
+ Note that the intention here is to have the Disclosure flag identify
+ products that are designed to disclose their session keys, not to
+ identify which particular calls are compromised on a call-by-call
+ basis. This is an important legal distinction, because most
+ government sanctioned wiretap regulations require a VoIP service
+ provider to not reveal which particular calls are wiretapped. But
+ there is nothing illegal about revealing that a product is designed
+ to be wiretap-friendly. The ZRTP protocol mandates that such a
+ product "out" itself.
+
+ You might be using a ZRTP-enabled product with no back doors, but if
+ your own graphical user interface tells you the call is (mostly)
+ secure, except that the other party is using a product that is
+ designed in such a way that it may have disclosed the session key for
+ monitoring purposes, you might ask him what brand of secure telephone
+ he is using, and make a mental note not to purchase that brand
+ yourself. If we create a protocol environment that requires such
+ back-doored phones to confess their nature, word will spread quickly,
+ and the "invisible hand" of the free market will act. The free
+ market has effectively dealt with this in the past.
+
+ Of course, a ZRTP implementer can lie about his product having a back
+ door, but the ZRTP standard mandates that ZRTP-compliant products
+ MUST adhere to the requirement that a back door be confessed by
+ sending the Disclosure flag to the other party.
+
+ There will be inevitable comparisons to Steve Bellovin's 2003 April
+ fool joke, when he submitted RFC 3514 [RFC3514], which defined the
+ "Evil bit" in the IPv4 header, for packets with "evil intent". But
+ we submit that a similar idea can actually have some merit for
+ securing VoIP. Sure, one can always imagine that some implementer
+ will not be fazed by the rules and will lie, but they would have lied
+ anyway even without the Disclosure flag. There are good reasons to
+ believe that it will improve the overall percentage of
+ implementations that at least tell us if they put a back door in
+ their products, and may even get some of them to decide not to put in
+ a back door at all. From a civic hygiene perspective, we are better
+ off with having the Disclosure flag in the protocol.
+
+ If an endpoint stores or logs SRTP keys or information that can be
+ used to reconstruct or recover SRTP keys after they are no longer in
+ use (i.e., the session is active), or otherwise discloses or passes
+ SRTP keys or information that can be used to reconstruct or recover
+ SRTP keys to another application or device, the Disclosure flag D
+ MUST be set in the Confirm1 or Confirm2 message.
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 99]
+
+RFC 6189 ZRTP April 2011
+
+
+11.1. Guidelines on Proper Implementation of the Disclosure Flag
+
+ Some implementers have asked for guidance on implementing the
+ Disclosure flag. Some people have incorrectly thought that a
+ connection secured with ZRTP cannot be used in a call center, with
+ voluntary voice recording, or even with a voicemail system.
+ Similarly, some potential users of ZRTP have over considered the
+ protection that ZRTP can give them. These guidelines clarify both
+ concerns.
+
+ The ZRTP Disclosure flag only governs the ZRTP/SRTP stream itself.
+ It does not govern the underlying RTP media stream, nor the actual
+ media itself. Consequently, a PBX that uses ZRTP may provide
+ conference calls, call monitoring, call recording, voicemail, or
+ other PBX features and still say that it does not disclose the ZRTP
+ key material. A video system may provide DVR features and still say
+ that it does not disclose the ZRTP key material. The ZRTP Disclosure
+ flag, when not set, means only that the ZRTP cryptographic key
+ material stays within the bounds of the ZRTP subsystem.
+
+ If an application has a need to disclose the ZRTP cryptographic key
+ material, the easiest way to comply with the protocol is to set the
+ flag to the proper value. The next easiest way is to overestimate
+ disclosure. For example, a call center that commonly records calls
+ might choose to set the Disclosure flag even though all recording is
+ an analog recording of a call (and thus outside the ZRTP scope)
+ because it sets an expectation with clients that their calls might be
+ recorded.
+
+ Note also that the ZRTP Disclosure Flag does not require an
+ implementation to preclude hacking or malware. Malware that leaks
+ ZRTP cryptographic key material does not create a liability for the
+ implementer from non-compliance with the ZRTP specification.
+
+ A user of ZRTP should note that ZRTP is not a panacea against
+ unauthorized recording. ZRTP does not and cannot protect against an
+ untrustworthy partner who holds a microphone up to the speaker. It
+ does not protect against someone else being in the room. It does not
+ protect against analog wiretaps in the phone or in the room. It does
+ not mean your partner has not been hacked with spyware. It does not
+ mean that the software has no flaws. It means that the ZRTP
+ subsystem is not knowingly leaking ZRTP cryptographic key material.
+
+12. Mapping between ZID and AOR (SIP URI)
+
+ The role of the ZID in the management of the local cache of shared
+ secrets is explained in Section 4.9. A particular ZID is associated
+ with a particular ZRTP endpoint, typically a VoIP client. A single
+
+
+
+Zimmermann, et al. Informational [Page 100]
+
+RFC 6189 ZRTP April 2011
+
+
+ SIP URI (also known as an Address-of-Record, or AOR) may be hosted on
+ several different soft VoIP clients, desktop phones, and mobile
+ handsets, and each of them will have a different ZID. Further, a
+ single VoIP client may have several SIP URIs configured into its
+ profiles, but only one ZID. There is not a one-to-one mapping
+ between a ZID and a SIP URI. A single SIP URI may be associated with
+ several ZIDs, and a single ZID may be associated with several SIP
+ URIs on the same client.
+
+ Not only that, but ZRTP is independent of which signaling protocol is
+ used. It works equally well with SIP, Jingle, H.323, or any
+ proprietary signaling protocol. Thus, a ZRTP ZID has little to do
+ with SIP, per se, which means it has little to do with a SIP URI.
+
+ Even though a ZID is associated with a device, not a human, it is
+ often the case that a ZRTP endpoint is controlled mainly by a
+ particular human. For example, it may be a mobile phone. To get the
+ full benefit of the key continuity features, a local cache entry (and
+ thus a ZID) should be associated with some sort of name of the remote
+ party. That name could be a human name, or it could be made more
+ precise by specifying which ZRTP endpoint he's using. For example
+ "Jon Callas", or "Jon Callas on his iPhone", or "Jon on his iPad", or
+ "Alice on her office phone". These name strings can be stored in the
+ local cache, indexed by ZID, and may have been initially provided by
+ the local user by hand. Or the local cache entry may contain a
+ pointer to an entry in the local address book. When a secure session
+ is established, if a prior session has established a cache entry, and
+ the new session has a matching cache entry indexed by the same ZID,
+ and the SAS has been previously verified, the person's name stored in
+ that cache entry should be displayed.
+
+ If the remote ZID originates from a PBX, the displayed name would be
+ the name of that PBX, which might be the name of the company who owns
+ that PBX.
+
+ If it is desirable to associate some key material with a particular
+ AOR, digital signatures (Section 7.2) may be used, with public key
+ certificates that associate the signature key with an AOR. If more
+ than one ZRTP endpoint shares the same AOR, they may all use the same
+ signature key and provide the same public key certificate with their
+ signatures.
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 101]
+
+RFC 6189 ZRTP April 2011
+
+
+13. IANA Considerations
+
+ This specification defines a new SDP [RFC4566] attribute in
+ Section 8.
+
+ Contact name: Philip Zimmermann <prz@mit.edu>
+
+ Attribute name: "zrtp-hash"
+
+ Type of attribute: Media level
+
+ Subject to charset: Not
+
+ Purpose of attribute: The 'zrtp-hash' indicates that a UA supports
+ the ZRTP protocol and provides a hash of the
+ ZRTP Hello message. The ZRTP protocol
+ version number is also specified.
+
+ Allowed attribute values: Hex
+
+14. Media Security Requirements
+
+ This section discuses how ZRTP meets all RTP security requirements
+ discussed in the Media Security Requirements [RFC5479] document
+ without any dependencies on other protocols or extensions, unlike
+ DTLS-SRTP [RFC5764] which requires additional protocols and
+ mechanisms.
+
+ R-FORK-RETARGET is met since ZRTP is a media path key agreement
+ protocol.
+
+ R-DISTINCT is met since ZRTP uses ZIDs and allows multiple
+ independent ZRTP exchanges to proceed.
+
+ R-HERFP is met since ZRTP is a media path key agreement protocol.
+
+ R-REUSE is met using the Multistream and Preshared modes.
+
+ R-AVOID-CLIPPING is met since ZRTP is a media path key agreement
+ protocol.
+
+ R-RTP-CHECK is met since the ZRTP packet format does not pass the
+ RTP validity check.
+
+ R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and
+ responses (Section 8.1).
+
+ R-NEGOTIATE is met using the Commit message.
+
+
+
+Zimmermann, et al. Informational [Page 102]
+
+RFC 6189 ZRTP April 2011
+
+
+ R-PSTN is met since ZRTP can be implemented in Gateways.
+
+ R-PFS is met using ZRTP Diffie-Hellman key agreement methods.
+
+ R-COMPUTE is met using the Hello/Commit ZRTP exchange.
+
+ R-CERTS is met using the verbal comparison of the SAS.
+
+ R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all
+ relevant categories. The authors believe ZRTP is compliant with
+ [NIST-SP800-56A], [NIST-SP800-108], [FIPS-198-1], [FIPS-180-3],
+ [NIST-SP800-38A], [FIPS-197], and [NSA-Suite-B], which should meet
+ the FIPS-140 validation requirements set by [FIPS-140-2-Annex-A]
+ and [FIPS-140-2-Annex-D].
+
+ R-DOS is met since ZRTP does not introduce any new denial-of-
+ service attacks.
+
+ R-EXISTING is met since ZRTP can support the use of certificates
+ or keys.
+
+ R-AGILITY is met since the set of hash, cipher, SRTP
+ authentication tag type, key agreement method, SAS type, and
+ signature type can all be extended and negotiated.
+
+ R-DOWNGRADE is met since ZRTP has protection against downgrade
+ attacks.
+
+ R-PASS-MEDIA is met since ZRTP prevents a passive adversary with
+ access to the media path from gaining access to keying material
+ used to protect SRTP media packets.
+
+ R-PASS-SIG is met since ZRTP prevents a passive adversary with
+ access to the signaling path from gaining access to keying
+ material used to protect SRTP media packets.
+
+ R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs
+ and responses.
+
+ R-ID-BINDING is met using the a=zrtp-hash SDP attribute
+ (Section 8.1).
+
+ R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs
+ and responses.
+
+ R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and
+ hence best effort SRTP in every case.
+
+
+
+
+Zimmermann, et al. Informational [Page 103]
+
+RFC 6189 ZRTP April 2011
+
+
+ R-OTHER-SIGNALING is met since ZRTP can utilize modes in which
+ there is no dependency on the signaling path.
+
+ R-RECORDING is met using the ZRTP Disclosure flag.
+
+ R-TRANSCODER is met if the transcoder operates as a trusted MitM
+ (i.e., a PBX).
+
+ R-ALLOW-RTP is met due to ZRTP's best effort encryption.
+
+15. Security Considerations
+
+ This document is all about securely keying SRTP sessions. As such,
+ security is discussed in every section.
+
+ Most secure phones rely on a Diffie-Hellman exchange to agree on a
+ common session key. But since DH is susceptible to a MiTM attack, it
+ is common practice to provide a way to authenticate the DH exchange.
+ In some military systems, this is done by depending on digital
+ signatures backed by a centrally managed PKI. A decade of industry
+ experience has shown that deploying centrally managed PKIs can be a
+ painful and often futile experience. PKIs are just too messy and
+ require too much activation energy to get them started. Setting up a
+ PKI requires somebody to run it, which is not practical for an
+ equipment provider. A service provider, like a carrier, might
+ venture down this path, but even then you have to deal with cross-
+ carrier authentication, certificate revocation lists, and other
+ complexities. It is much simpler to avoid PKIs altogether,
+ especially when developing secure commercial products. It is
+ therefore more common for commercial secure phones in the PSTN world
+ to augment the DH exchange with a Short Authentication String (SAS)
+ combined with a hash commitment at the start of the key exchange, to
+ shorten the length of SAS material that must be read aloud. No PKI
+ is required for this approach to authenticating the DH exchange. The
+ AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec],
+ [PGPfone], and the GSMK CryptoPhone are all examples of products that
+ took this simpler lightweight approach. The main problem with this
+ approach is inattentive users who may not execute the voice
+ authentication procedure.
+
+ Some questions have been raised about voice spoofing during the short
+ authentication string (SAS) comparison. But it is a mistake to think
+ this is simply an exercise in voice impersonation (perhaps this could
+ be called the "Rich Little" attack). Although there are digital
+ signal processing techniques for changing a person's voice, that does
+ not mean a MiTM attacker can safely break into a phone conversation
+ and inject his own SAS at just the right moment. He doesn't know
+ exactly when or in what manner the users will choose to read aloud
+
+
+
+Zimmermann, et al. Informational [Page 104]
+
+RFC 6189 ZRTP April 2011
+
+
+ the SAS, or in what context they will bring it up or say it, or even
+ which of the two speakers will say it, or if indeed they both will
+ say it. In addition, some methods of rendering the SAS involve using
+ a list of words such as the PGP word list[Juola2], in a manner
+ analogous to how pilots use the NATO phonetic alphabet to convey
+ information. This can make it even more complicated for the
+ attacker, because these words can be worked into the conversation in
+ unpredictable ways. If the session also includes video (an
+ increasingly common usage scenario), the MiTM may be further deterred
+ by the difficulty of making the lips sync with the voice-spoofed SAS.
+ The PGP word list is designed to make each word phonetically
+ distinct, which also tends to create distinctive lip movements.
+ Remember that the attacker places a very high value on not being
+ detected, and if he makes a mistake, he doesn't get to do it over.
+
+ A question has been raised regarding the safety of the SAS procedure
+ for people who don't know each other's voices, because it may allow
+ an attack from a MiTM even if he lacks voice impersonation
+ capabilities. This is not as much of a problem as it seems, because
+ it isn't necessary that users recognize each other by their voice.
+ It is only necessary that they detect that the voice used for the SAS
+ procedure doesn't match the voice in the rest of the phone
+ conversation.
+
+ Special consideration must be given to secure phone calls with
+ automated systems that cannot perform a verbal SAS comparison between
+ two humans (e.g., a voice mail system). If a well-functioning PKI is
+ available to all parties, it is recommended that credentials be
+ provisioned at the automated system sufficient to use one of the
+ automatic MiTM detection mechanisms from Section 8.1.1 or
+ Section 7.2. Or rely on a previously established cached shared
+ secret (pbxsecret or rs1 or both), backed by a human-executed SAS
+ comparison during an initial call. Note that it is worse than
+ useless and absolutely unsafe to rely on a robot voice from the
+ remote endpoint to compare the SAS, because a robot voice can be
+ trivially forged by a MiTM. However, a robot voice may be safe to
+ use strictly locally for a different purpose. A ZRTP user agent may
+ render its locally computed SAS to the local user via a robot voice
+ if no visual display is available, provided the user can readily
+ determine that the robot voice is generated locally, not from the
+ remote endpoint.
+
+ A popular and field-proven approach to MiTM protection is used by SSH
+ (Secure Shell) [RFC4251], which Peter Gutmann likes to call the "baby
+ duck" security model. SSH establishes a relationship by exchanging
+ public keys in the initial session, when we assume no attacker is
+ present, and this makes it possible to authenticate all subsequent
+ sessions. A successful MiTM attacker has to have been present in all
+
+
+
+Zimmermann, et al. Informational [Page 105]
+
+RFC 6189 ZRTP April 2011
+
+
+ sessions all the way back to the first one, which is assumed to be
+ difficult for the attacker. ZRTP's key continuity features are
+ actually better than SSH, at least for VoIP, for reasons described in
+ Section 15.1. All this is accomplished without resorting to a
+ centrally managed PKI.
+
+ We use an analogous baby duck security model to authenticate the DH
+ exchange in ZRTP. We don't need to exchange persistent public keys,
+ we can simply cache a shared secret and re-use it to authenticate a
+ long series of DH exchanges for secure phone calls over a long period
+ of time. If we verbally compare just one SAS, and then cache a
+ shared secret for later calls to use for authentication, no new voice
+ authentication rituals need to be executed. We just have to remember
+ we did one already.
+
+ If one party ever loses this cached shared secret, it is no longer
+ available for authentication of DH exchanges. This cache mismatch
+ situation is easy to detect by the party that still has a surviving
+ shared secret cache entry. If it fails to match, either there is a
+ MiTM attack or one side has lost their shared secret cache entry.
+ The user agent that discovers the cache mismatch must alert the user
+ that a cache mismatch has been detected, and that he must do a verbal
+ comparison of the SAS to distinguish if the mismatch is because of a
+ MiTM attack or because of the other party losing her cache (normative
+ language is in Section 4.3.2). Voice confirmation is absolutely
+ essential in this situation. From that point on, the two parties
+ start over with a new cached shared secret. Then, they can go back
+ to omitting the voice authentication on later calls.
+
+ Precautions must be observed when using a trusted MiTM device such as
+ a trusted PBX, as described in Section 7.3. Make sure you really
+ trust that this PBX will never be compromised before establishing it
+ as a trusted MiTM, because it is in a position to wiretap calls for
+ any phone that trusts it. It is "licensed" to be in a position to
+ wiretap. You are safer to try to arrange the connection topology to
+ route the media directly between the two ZRTP peers, not through a
+ trusted PBX. Real end-to-end encryption is preferred.
+
+ The security of the SAS mechanism depends on the user verifying it
+ verbally with his peer at the other endpoint. There is some risk the
+ user will not be so diligent and may ignore the SAS. For a
+ discussion on how users become habituated to security warnings in the
+ PKI certificate world, see [Sunshine]. Part of the problems
+ discussed in that paper are from the habituation syndrome common to
+ most warning messages, and part of them are from the fact that users
+ simply don't understand trust models. Fortunately, ZRTP doesn't need
+ a trust model to use the SAS mechanism, so it's easier for the user
+ to grasp the idea of comparing the SAS verbally with the other party;
+
+
+
+Zimmermann, et al. Informational [Page 106]
+
+RFC 6189 ZRTP April 2011
+
+
+ it's easier than understanding a trust model, at least. Also, the
+ verbal comparison of the SAS gets both users involved, and they will
+ notice a mismatch of the SAS. Also, the ZRTP user agent will know
+ when the SAS has been previously verified because of the SAS verified
+ flag (V) (Section 7.1), and only ask the user to verify it when
+ needed. After it has been verified once, the key continuity features
+ make it unnecessary to verify it again.
+
+15.1. Self-Healing Key Continuity Feature
+
+ The key continuity features of ZRTP are analogous to those provided
+ by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH
+ caches public signature keys that never change, and uses a permanent
+ private signature key that must be guarded from disclosure. If
+ someone steals your SSH private signature key, they can impersonate
+ you in all future sessions and can mount a successful MiTM attack any
+ time they want.
+
+ ZRTP caches symmetric key material used to compute secret session
+ keys, and these values change with each session. If someone steals
+ your ZRTP shared secret cache, they only get one chance to mount a
+ MiTM attack, in the very next session. If they miss that chance, the
+ retained shared secret is refreshed with a new value, and the window
+ of vulnerability heals itself, which means they are locked out of any
+ future opportunities to mount a MiTM attack. This gives ZRTP a
+ "self-healing" feature if any cached key material is compromised.
+
+ A MiTM attacker must always be in the media path. This presents a
+ significant operational burden for the attacker in many VoIP usage
+ scenarios, because being in the media path for every call is often
+ harder than being in the signaling path. This will likely create
+ coverage gaps in the attacker's opportunities to mount a MiTM attack.
+ ZRTP's self-healing key continuity features are better than SSH at
+ exploiting any temporary gaps in MiTM attack opportunities. Thus,
+ ZRTP quickly recovers from any disclosure of cached key material.
+
+ In systems that use a persistent private signature key, such as SSH,
+ the stored signature key is usually protected from disclosure by
+ encryption that requires a user-supplied high-entropy passphrase.
+ This arrangement may be acceptable for a diligent user with a desktop
+ computer sitting in an office with a full ASCII keyboard. But it
+ would be prohibitively inconvenient and unsafe to type a high-entropy
+ passphrase on a mobile phone's numeric keypad while driving a car.
+ Users will reject any scheme that requires the use of a passphrase on
+ such a platform, which means mobile phones carry an elevated risk of
+ compromise of stored key material, and thus would especially benefit
+ from the self-healing aspects of ZRTP's key continuity features.
+
+
+
+
+Zimmermann, et al. Informational [Page 107]
+
+RFC 6189 ZRTP April 2011
+
+
+ The infamous Debian OpenSSL weak key vulnerability [dsa-1571]
+ (discovered and patched in May 2008) offers a real-world example of
+ why ZRTP's self-healing scheme is a good way to do key continuity.
+ The Debian bug resulted in the production of a lot of weak SSH (and
+ TLS/SSL) keys, which continued to compromise security even after the
+ bug had been patched. In contrast, ZRTP's key continuity scheme adds
+ new entropy to the cached key material with every call, so old
+ deficiencies in entropy are washed away with each new session.
+
+ It should be noted that the addition of shared secret entropy from
+ previous sessions can extend the strength of the new session key to
+ AES-256 levels, even if the new session uses Diffie-Hellman keys no
+ larger than DH-3072 or ECDH-256, provided the cached shared secrets
+ were initially established when the wiretapper was not present. This
+ is why AES-256 MAY be used with the smaller DH key sizes in
+ Section 5.1.5, despite the key strength comparisons in Table 2 of
+ [NIST-SP800-57-Part1].
+
+ Caching shared symmetric key material is also less CPU intensive
+ compared with using digital signatures, which may be important for
+ low-power mobile platforms.
+
+ Unlike the long-lived non-updated key material used by SSH, the
+ dynamically updated shared secrets of ZRTP may lose sync if
+ traditional backup/restore mechanisms are used. This limitation is a
+ consequence of the otherwise beneficial aspects of this approach to
+ key continuity, and it is partially mitigated by ZRTP's built-in
+ cache backup logic (Section 4.6.1).
+
+16. Acknowledgments
+
+ The authors would like to thank Bryce "Zooko" Wilcox-O'Hearn and
+ Colin Plumb for their contributions to the design of this protocol.
+ Also, thanks to Hal Finney, Viktor Krikun, Werner Dittmann, Dan Wing,
+ Sagar Pai, David McGrew, Colin Perkins, Dan Harkins, David Black, Tim
+ Polk, Richard Harris, Roni Even, Jon Peterson, and Robert Sparks for
+ their helpful comments and suggestions. Thanks to Lily Chen at NIST
+ for her assistance in ensuring compliance with NIST SP800-56A and
+ SP800-108.
+
+ The use of one-way hash chains to key HMACs in ZRTP is similar to
+ Adrian Perrig's TESLA protocol [TESLA].
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 108]
+
+RFC 6189 ZRTP April 2011
+
+
+17. References
+
+17.1. Normative References
+
+ [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
+ Hashing for Message Authentication", RFC 2104,
+ February 1997.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP)
+ Diffie-Hellman groups for Internet Key Exchange (IKE)",
+ RFC 3526, May 2003.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, July 2003.
+
+ [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, March 2004.
+
+ [RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA-
+ 224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512",
+ RFC 4231, December 2005.
+
+ [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, July 2006.
+
+ [RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R.
+ Thayer, "OpenPGP Message Format", RFC 4880, November 2007.
+
+ [RFC4960] Stewart, R., "Stream Control Transmission Protocol",
+ RFC 4960, September 2007.
+
+ [RFC5114] Lepinski, M. and S. Kent, "Additional Diffie-Hellman
+ Groups for Use with IETF Standards", RFC 5114,
+ January 2008.
+
+ [RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet,
+ "Requirements and Analysis of Media Security Management
+ Protocols", RFC 5479, April 2009.
+
+ [RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and
+ Certificate Revocation List (CRL) Profile", RFC 5759,
+ January 2010.
+
+
+
+
+Zimmermann, et al. Informational [Page 109]
+
+RFC 6189 ZRTP April 2011
+
+
+ [RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure
+ RTP", RFC 6188, March 2011.
+
+ [FIPS-140-2-Annex-A]
+ "Annex A: Approved Security Functions for FIPS PUB 140-2",
+ NIST FIPS PUB 140-2 Annex A, January 2011.
+
+ [FIPS-140-2-Annex-D]
+ "Annex D: Approved Key Establishment Techniques for FIPS
+ PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011.
+
+ [FIPS-180-3]
+ "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October
+ 2008.
+
+ [FIPS-186-3]
+ "Digital Signature Standard (DSS)", NIST FIPS PUB 186-
+ 3, June 2009.
+
+ [FIPS-197] "Advanced Encryption Standard (AES)", NIST FIPS PUB
+ 197, November 2001.
+
+ [FIPS-198-1]
+ "The Keyed-Hash Message Authentication Code (HMAC)", NIST
+ FIPS PUB 198-1, July 2008.
+
+ [NIST-SP800-38A]
+ Dworkin, M., "Recommendation for Block Cipher Modes of
+ Operation", NIST Special Publication 800-38A, 2001
+ Edition.
+
+ [NIST-SP800-56A]
+ Barker, E., Johnson, D., and M. Smid, "Recommendation for
+ Pair-Wise Key Establishment Schemes Using Discrete
+ Logarithm Cryptography", NIST Special Publication 800-
+ 56A Revision 1, March 2007.
+
+ [NIST-SP800-90]
+ Barker, E. and J. Kelsey, "Recommendation for Random
+ Number Generation Using Deterministic Random Bit
+ Generators", NIST Special Publication 800-90 (Revised),
+ March 2007.
+
+ [NIST-SP800-108]
+ Chen, L., "Recommendation for Key Derivation Using
+ Pseudorandom Functions", NIST Special Publication 800-
+ 108, October 2009.
+
+
+
+
+Zimmermann, et al. Informational [Page 110]
+
+RFC 6189 ZRTP April 2011
+
+
+ [NSA-Suite-B]
+ "NSA Suite B Cryptography", NSA Information Assurance
+ Directorate, NSA Suite B Cryptography.
+
+ [NSA-Suite-B-Guide-56A]
+ "Suite B Implementer's Guide to NIST SP 800-56A", Suite B
+ Implementer's Guide to NIST SP 800-56A, 28 July 2009.
+
+ [TwoFish] Schneier, B., Kelsey, J., Whiting, D., Hall, C., and N.
+ Ferguson, "Twofish: A 128-Bit Block Cipher", June 1998,
+ <http://www.schneier.com/paper-twofish-paper.html>.
+
+ [Skein] Ferguson, N., Lucks, S., Schneier, B., Whiting, D.,
+ Bellare, M., Kohno, T., Callas, J., and J. Walker, "The
+ Skein Hash Function Family, Version 1.3 - 1 Oct 2010", <ht
+ tp://www.skein-hash.info/sites/default/files/
+ skein1.3.pdf>.
+
+ [pgpwordlist]
+ "PGP Word List", December 2010, <http://en.wikipedia.org/
+ w/index.php?title=PGP_word_list&oldid=400752943>.
+
+17.2. Informative References
+
+ [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
+ November 1990.
+
+ [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
+ for IP version 6", RFC 1981, August 1996.
+
+ [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+ A., Peterson, J., Sparks, R., Handley, M., and E.
+ Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+ June 2002.
+
+ [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header",
+ RFC 3514, April 1 2003.
+
+ [RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using
+ E.164 numbers with the Session Initiation Protocol (SIP)",
+ RFC 3824, June 2004.
+
+ [RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness
+ Requirements for Security", BCP 106, RFC 4086, June 2005.
+
+ [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
+ Protocol Architecture", RFC 4251, January 2006.
+
+
+
+
+Zimmermann, et al. Informational [Page 111]
+
+RFC 6189 ZRTP April 2011
+
+
+ [RFC4474] Peterson, J. and C. Jennings, "Enhancements for
+ Authenticated Identity Management in the Session
+ Initiation Protocol (SIP)", RFC 4474, August 2006.
+
+ [RFC4475] Sparks, R., Hawrylyshen, A., Johnston, A., Rosenberg, J.,
+ and H. Schulzrinne, "Session Initiation Protocol (SIP)
+ Torture Test Messages", RFC 4475, May 2006.
+
+ [RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
+ Carrara, "Key Management Extensions for Session
+ Description Protocol (SDP) and Real Time Streaming
+ Protocol (RTSP)", RFC 4567, July 2006.
+
+ [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session
+ Description Protocol (SDP) Security Descriptions for Media
+ Streams", RFC 4568, July 2006.
+
+ [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol
+ (SIP) Call Control - Conferencing for User Agents",
+ BCP 119, RFC 4579, August 2006.
+
+ [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
+ January 2008.
+
+ [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
+ (ICE): A Protocol for Network Address Translator (NAT)
+ Traversal for Offer/Answer Protocols", RFC 5245,
+ April 2010.
+
+ [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
+ Security (DTLS) Extension to Establish Keys for the Secure
+ Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
+
+ [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
+ Key Derivation Function (HKDF)", RFC 5869, May 2010.
+
+ [RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic
+ Curve Cryptography Algorithms", RFC 6090, February 2011.
+
+ [SRTP-AES-GCM]
+ McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption
+ in Secure RTP (SRTP)", Work in Progress, January 2011.
+
+ [ECC-OpenPGP]
+ Jivsov, A., "ECC in OpenPGP", Work in Progress,
+ March 2011.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 112]
+
+RFC 6189 ZRTP April 2011
+
+
+ [VBR-AUDIO]
+ Perkins, C. and J. Valin, "Guidelines for the use of
+ Variable Bit Rate Audio with Secure RTP", Work
+ in Progress, December 2010.
+
+ [SIP-IDENTITY]
+ Wing, D. and H. Kaplan, "SIP Identity using Media Path",
+ Work in Progress, February 2008.
+
+ [NIST-SP800-57-Part1]
+ Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid,
+ "Recommendation for Key Management - Part 1: General
+ (Revised)", NIST Special Publication 800-57 - Part
+ 1 Revised March 2007.
+
+ [NIST-SP800-131A]
+ Barker, E. and A. Roginsky, "Recommendation for the
+ Transitioning of Cryptographic Algorithms and Key
+ Lengths", NIST Special Publication 800-131A January 2011.
+
+ [SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer
+ Security Resource Center Cryptographic Hash Project.
+
+ [Skein1] "The Skein Hash Function Family - Web site",
+ <http://www.skein-hash.info/>.
+
+ [XEP-0262] Saint-Andre, P., "Use of ZRTP in Jingle RTP Sessions", XSF
+ XEP 0262, August 2010.
+
+ [Ferguson] Ferguson, N. and B. Schneier, "Practical Cryptography",
+ Wiley Publishing, 2003.
+
+ [Juola1] Juola, P. and P. Zimmermann, "Whole-Word Phonetic
+ Distances and the PGPfone Alphabet", Proceedings of the
+ International Conference of Spoken Language Processing
+ (ICSLP-96), 1996.
+
+ [Juola2] Juola, P., "Isolated Word Confusion Metrics and the
+ PGPfone Alphabet", Proceedings of New Methods in Language
+ Processing, 1996.
+
+ [PGPfone] Zimmermann, P., "PGPfone", July 1996,
+ <http://philzimmermann.com/docs/pgpfone10b7.pdf>.
+
+ [Zfone] Zimmermann, P., "Zfone Project", 2006,
+ <http://www.philzimmermann.com/zfone>.
+
+
+
+
+
+Zimmermann, et al. Informational [Page 113]
+
+RFC 6189 ZRTP April 2011
+
+
+ [Byzantine]
+ "The Two Generals' Problem", March 2011, <http://
+ en.wikipedia.org/w/
+ index.php?title=Two_Generals%27_Problem&oldid=417855753>.
+
+ [TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The
+ TESLA Broadcast Authentication Protocol", October 2002, <h
+ ttp://www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/
+ tesla-cryptobytes.pdf>.
+
+ [comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices
+ Version 1.2", <http://www.comsec.com/vp1-protocol.pdf>.
+
+ [Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
+ Masson, "Spot me if you can: Uncovering spoken phrases in
+ encrypted VoIP conversations", Proceedings of the 2008
+ IEEE Symposium on Security and Privacy 2008,
+ <http://cs.jhu.edu/~cwright/oakland08.pdf>.
+
+ [Sunshine] Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and
+ L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning
+ Effectiveness", USENIX Security Symposium 2009,
+ <http://lorrie.cranor.org/pubs/sslwarnings.pdf>.
+
+ [dsa-1571] "Debian Security Advisory - OpenSSL predictable random
+ number generator", May 2008,
+ <http://www.debian.org/security/2008/dsa-1571>.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 114]
+
+RFC 6189 ZRTP April 2011
+
+
+Authors' Addresses
+
+ Philip Zimmermann
+ Zfone Project
+ Santa Cruz, California
+
+ EMail: prz@mit.edu
+ URI: http://philzimmermann.com
+
+
+ Alan Johnston (editor)
+ Avaya
+ St. Louis, MO 63124
+
+ EMail: alan.b.johnston@gmail.com
+
+
+ Jon Callas
+ Apple, Inc.
+
+ EMail: jon@callas.org
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zimmermann, et al. Informational [Page 115]
+