Secure RTP Frequently Asked Questions

Mark Baugher, David McGrew, Mats Näslund and Karl Norrman, October 27, 2005

Question Index

Encryption [4] [5] [14] [21] [24] [28] [38] [40]
Message authentication/integrity [3] [5] [14] [27] [29] [35]
Keys [1] [4] [5] [10] [14] [16] [22] [23] [25] [30] [31] [32] [33] [35] [39]
RTP [17] [19] [20] [21] [34]
IPsec [18] [22]
SRTP/SRTCP packets [13] [18] [29] [34]
SRTP/SRTCP header fields [6] [8] [11] [15] [19] [26] [29] [37]
Implementations [2] [7] [36]
Applications [9] [12] [42]
Resources [41]
Forward error correction (FEC) [9]

Questions and Answers

Q1. I don't understand how the key derivation works. It seems that if I derive a key every r:th packet as stated, I will get out of sync if I lose some packets?

A. In general, packet loss is not a problem for SRTP key derivation because the r:th packet is the packet with the r:th packet index: The receiver will extract the sequence number from the first SRTP packet that it receives, compute the packet index (see section 3.3.1 of RFC 3711), and compare the packet index modulo r to zero (see section 4.3.1 ibid.). The packet index is a 48-bit quantity that has the rollover counter (ROC) in the high-order 32 bits and the sequence number in the low-order 16 bits. If the ROC is synchronized between sender and receiver, there is no problem. There is a problem, however, if the ROC values diverge between sender and receiver. In fact, a mismatch of ROC values, other than a difference of plus or minus one, is a problem regardless of the setting of the SRTP key derivation rate as explained below.
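As a minimal sketch of the check described above (the function names are illustrative and are not taken from any SRTP library), the packet index is formed from the ROC and the sequence number, and the test against r is made on that index rather than on a count of received packets:

```python
def packet_index(roc: int, seq: int) -> int:
    """48-bit SRTP packet index: ROC in the high 32 bits, SEQ in the low 16."""
    if not (0 <= roc < 2**32 and 0 <= seq < 2**16):
        raise ValueError("ROC must fit in 32 bits and SEQ in 16 bits")
    return (roc << 16) | seq


def derivation_due(roc: int, seq: int, r: int) -> bool:
    """True when the packet index modulo r is zero.

    Assumption for this sketch: r == 0 means keys are derived once at
    setup and never again, so no packet triggers a re-derivation.
    """
    if r == 0:
        return False
    return packet_index(roc, seq) % r == 0
```

Because the result depends only on the index carried by the packet that did arrive, lost packets do not shift the derivation points.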

Q2. How big is an SRTP context?

A. In software, each AES-128 key takes about 160 bytes of information (that's the expanded key, 128 bits times ten rounds). The HMAC-SHA1 context will take up at least 20 bytes, but perhaps more. The AES algorithm requires about 4 kilobytes worth of tables, but this is not key-dependent data; only one instance of these tables needs to be maintained in software, independent of the number of keys that are stored. Note that more aggressive AES implementations have a considerably larger per-key storage requirement.

Q3. The SRTP authentication tag greatly reduces channel capacity. How dangerous is it to use null authentication for SRTP?

A. It is dangerous: Owing to the properties of SRTP's predefined ciphers, an attacker can deduce the keystream bits that encrypted the ciphertext whenever the underlying message plaintext is known. Armed with this knowledge, the attacker can change the ciphertext to decrypt to any plaintext that the attacker chooses. For voice calls, the threat might be small to non-existent, but in general the threat is substantial: IP telephony includes DTMF and other types of data that might include financial transactions or otherwise sensitive information. If the known plaintext were a credit card number, for example, an attacker might be able to alter the number; this attack is easily detectable with SRTP message authentication and undetectable without it. In many cases, it is not clear that the bearer channel will be used only for a media type that can safely use a zero-length tag, and some systems change the payload type during the call. Thus, a zero-length authentication tag is in general a dangerous option and should be avoided.

For a litany of potential problems with null authentication, see Section 9.5.1 of RFC 3711, Risks of Weak or Null Message Authentication.
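The attack can be illustrated with a short sketch. It assumes only that the cipher is additive (ciphertext = plaintext XOR keystream); the random bytes below are a stand-in for the real cipher keystream, and the message contents are made up for illustration:

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Sender side. The keystream is secret; os.urandom stands in for the cipher.
known_plaintext = b"amount=0000100"
keystream = os.urandom(len(known_plaintext))
ciphertext = xor_bytes(known_plaintext, keystream)

# Attacker side: sees the ciphertext and knows (or guesses) the plaintext.
recovered_keystream = xor_bytes(ciphertext, known_plaintext)
chosen_plaintext = b"amount=9999999"
forged_ciphertext = xor_bytes(chosen_plaintext, recovered_keystream)

# Receiver side: without an authentication tag the forgery decrypts cleanly.
decrypted = xor_bytes(forged_ciphertext, keystream)
```

With message authentication enabled, the forged packet would fail the tag check before decryption is ever attempted.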

Q4. I don't get the concept of "key stream prefix." As specified, it is always set to "null."

A. The keystream prefix is not used by any SRTP pre-defined message authentication code (MAC). The prefix is there to accommodate future MACs, especially ones based on universal hashing. These MACs are interesting because they offer good performance and provable security properties. See the references to Carter-Wegman hashing, or universal hashing, on http://www.cs.ut.ee/~helger/crypto/link/mac/, for example. The availability of the keystream prefix admits some significant optimizations for such MACs, which is why it was included.

Q5. What is the software overhead due to SRTP?

A. As a rough guide, here are the performance numbers for libSRTP on a 1.67 GHz PowerPC G4 processor, using the default crypto algorithms (AES-128 Counter Mode and HMAC-SHA1), for various packet sizes:

RTP payload length (bytes)        16     32     64     128    256    512    1024   2048

Throughput (megabits/second)
  encryption only                 183    213    223    238    243    245    248    249
  encryption and authentication   37.6   64.0   88.3   119    142    159    168    173
  authentication only             38.8   88.3   138.4  233    341    436    512    569

This information was produced by running test/srtp_driver -t.

It is interesting that HMAC-SHA1 is several times slower than AES for small packets, though it is over twice as fast for large packets. This is because the HMAC construction performs a significant amount of computation for each authentication tag that it produces, in addition to the computation that it does for each byte of input.

Q6. The use of implicit synchronization via ROC seems dangerous. Can senders and receivers lose ROC synchronization?

A. It is possible to lose ROC synchronization between sender and receiver(s), though it is not likely in practice, and practical steps can be taken to avoid it. A burst loss of 2¹⁶ packets or more will always break synchronization. For example, a conversational voice codec that sends 50 packets per second will have its ROC increment about every 22 minutes. A network with a burst of packet loss that long has problems other than ROC synchronization.
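The 22-minute figure follows directly from the size of the 16-bit sequence number space. A small worked calculation (the function name is illustrative):

```python
SEQ_SPACE = 2**16  # the RTP sequence number is 16 bits wide


def rollover_period_seconds(packets_per_second: float) -> float:
    """Seconds between ROC increments at a constant packet rate."""
    return SEQ_SPACE / packets_per_second


# 50 packets per second, as in the voice codec example above.
minutes = rollover_period_seconds(50) / 60
print(f"ROC increments roughly every {minutes:.1f} minutes")
```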

There is a higher sensitivity to loss at the very outset of an SRTP stream. If the sender's initial sequence number is close to the maximum value of 2¹⁶ - 1, and all packets are lost from the initial packet until the sequence number cycles back to zero, the sender will increment its ROC, but the receiver will not. The receiver cannot determine that the initial packets were lost and that sequence-number rollover has occurred. In this case, the receiver's ROC would be zero whereas the sender's ROC would be one, while their sequence numbers would be so close that the ROC-guessing algorithm could not detect this fact.

There is a simple solution to this problem: the SRTP sender should randomly select an initial sequence number that is always less than 2¹⁵. This ensures correct SRTP operation so long as fewer than 2¹⁵ initial packets are lost in succession, which is within the maximum tolerance of SRTP packet-index determination (see Appendix A and page 14, first paragraph of RFC 3711). An SRTP receiver should carefully implement the index-guessing algorithm. A naive implementation can unintentionally guess the value of 0xffffffffffffLL whenever the SEQ in the packet is greater than 2¹⁵ and the locally stored SEQ and ROC are zero. (This can happen when the implementation fails to treat those zero values as a special case.)
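The following sketch is modeled on the index-guessing pseudocode in Appendix A of RFC 3711. The use of None to mean "no packet received yet" is an assumption of this sketch, chosen to show one way of handling the special case; the function name is illustrative:

```python
from typing import Optional, Tuple


def guess_index(roc: int, s_l: Optional[int], seq: int) -> Tuple[int, int]:
    """Return (guessed ROC, 48-bit packet index) for a received SEQ.

    roc is the receiver's current rollover counter and s_l is the highest
    sequence number received so far, or None before the first packet
    (an assumed convention for this sketch).
    """
    if s_l is None:
        # First packet: take the signaled ROC as-is instead of guessing.
        return roc, (roc << 16) | seq
    if s_l < 32768:
        # SEQ far ahead of s_l means the packet predates the last rollover.
        v = (roc - 1) % 2**32 if seq - s_l > 32768 else roc
    else:
        # SEQ far behind s_l means the sender has rolled over already.
        v = (roc + 1) % 2**32 if s_l - 32768 > seq else roc
    return v, (v << 16) | seq
```

Calling guess_index(0, 0, 40000), that is, treating an uninitialized s_l of zero as a real value, yields a guessed ROC of 0xFFFFFFFF, which is the naive failure described above; guess_index(0, None, 40000) keeps the ROC at zero.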

When ROC synchronization is lost, the receiver will not be able to properly process the packets. If anti-replay protection is turned on, then the desynchronization will appear as a burst of replay check failures. Otherwise, if authentication is being checked, then it will appear as a burst of authentication failures. Otherwise, if encryption is being used, the desynchronization may not be detected by the SRTP layer, and the packets may be improperly decrypted.

Q7. Where can I get test packets for SRTP?

A. libSRTP uses reference packets in test/srtp_driver.c. Look in the function srtp_validate(), or run test/srtp_driver with the arguments -v -d driver to cause it to print out the reference packets while it runs the validation function. Adding the argument -d srtp will cause it to also print out SRTP information, including the master key and derived keys. Currently, there are no published SRTCP test packets.

Q8. Re-keying can be based on a <From, To> index concept or on an MKI. How do I choose? Do we need both?

A. To be clear, one never uses both the <From, To> and the MKI. Both features are optional, they perform an identical function, and they cannot be used together. It is likely true, moreover, that few applications have need for either feature. <From, To> and the MKI signal that the sender has started using a new master key, which has been established by key management along with its <From, To> or MKI value. For IP telephony and most security applications, there is no need to switch from one master key to another because the lifetime of a master key greatly exceeds the lifetime of an SRTP session. For continuous IPTV broadcast streams, however, the session might outlive the key lifetime when the session never terminates and carries very high packet-rate media such as high-definition video. Furthermore, it is prudent to automatically refresh a key after an extended period of time regardless of the cryptographic lifetime of the key. In addition to its security uses, moreover, some systems rotate keys as a technical protection measure. Key rotation is commonly done by cable and satellite conditional access systems: Scientific Atlanta's PowerVu systems can change the key ("control word") twice per second (see Figure 2 of the PowerVu white paper). When packets are small, <From, To> is much more efficient than the MKI. For large video packets, however, some vendors may choose to use the MKI to convey additional information (see the answer about MKI length below), and this is analogous to features found in some of the proprietary conditional access systems today. Most applications, however, do not need to use either <From, To> or an MKI since an SRTP master key lifetime may be as long as 2⁴⁸ SRTP packets.

Q9. How do packet-level Forward Error Correction (RTP FEC) and SRTP work together?

A. SRTP only defines the default order of the combined processing (first FEC, then SRTP, at sender side).

RFC 3711 does not include normative language about the case when the FEC stream is separated from the original RTP stream, which is one possibility in RFC 2733, and the latter uses encryption. For this case the separate FEC stream MUST be encrypted if the original RTP stream is encrypted. This is because an unencrypted FEC stream for a separate encrypted stream leaks information that violates the confidentiality provided by the encryption. RFC 2733 discusses this issue and says that the separate FEC stream SHOULD be encrypted. It is planned to change this SHOULD to a MUST in a future revision of the SRTP specification.

Additionally, there are constraints on how master keys can be shared in some FEC scenarios. SRTP mandates that different RTP sessions must use separate master keys, and the sharing of the master key within the same RTP session requires that SSRC values are unique (i.e. via external coordination). RFC 2733 is open in its definition of what a "separate stream" is. The following cases are both possible:

If the RTP stream and the FEC stream have different SSRCs, and are in the same RTP session, then they MAY share the same master key. Note, in case they share it, the FEC stream and the original RTP stream simply use the same session keys.

If the RTP stream and the FEC stream have the same SSRC, and are in different RTP sessions, they MUST use different master keys.

Q10. How can we be sure the sender and receiver use the same cryptographic transforms? For instance, what if the sender chooses counter mode encryption, and the receiver chooses f8?

A. Information such as the cipher, mode and key length are parameters that are typically established with the SRTP master key (see the answer on SRTP key management protocols below).

Q11. What is the length of the MKI?

A. RFC 3711 does not specify the length of the MKI field, though it does require that it be fixed for each SRTP session. Additionally, it does not set a bound on this length, though many regard a four-octet MKI as sufficient; this is the length of the IPsec ESP SPI.

The SDP Security Descriptions requires that the MKI be between zero and 128 bytes.

Q12. I would like to use SRTP for Digital Rights Management. Can I do that?

A. SRTP can be used as a component of a DRM system, though it is not in and of itself a DRM technology. SRTP is a security protocol and its security rests on the protection of keying material at all endpoints. This protection can only be assured if the human being at each endpoint has a vested interest to abide by policies that maintain the secrecy of cryptographic keys, or if the endpoint holds its keys inside a tamper-resistant module. Some users of DRM systems, however, are less motivated to abide by policies that apply to licensed data than they are to their personal data. For this reason, many if not most DRM systems cannot be considered to be security systems. Nonetheless, DRM systems use cryptographic protocols for encrypting streams and files as well as for establishing encryption keys (some systems even recognize the importance of message authentication). In this regard, SRTP is at least as good as any cryptographic protocol for adding confidentiality and message authentication to RTP streams. A system that manages rights for media streams, therefore, might choose to use SRTP to protect against the snooping of the media that are in transit through a network (for the encryption and message authentication of MPEG-4 files and streams, see the ISMAcryp specification for the Encryption and Authentication of MPEG-4 media).

Q13. Is there a max possible size of an SRTP packet?

A. The largest possible payload, when the default encryption transform of AES Counter Mode is used, is 2¹⁶ AES blocks, or 2²⁰ bytes (RFC 3711, pg. 21).

Q14. How does key derivation work when key_derivation_rate != 0?

A. The key derivation mechanism provides a means of periodically re-deriving keys based on the packet index. Due to packet loss or reorder it may not suffice for a receiver to only perform derivation when "index mod key_derivation_rate equals zero". This informative text in RFC 3711 describes the principle behind the index-based key derivation, but was not intended as mandating a particular implementation.

For a given packet index, the value of "index DIV key_derivation_rate" (integer division) determines which authentication and encryption keys in the sequence are to be used to process that packet. For example, if key_derivation_rate is four, then packets with indices 0 through 3 use the first authentication and encryption key set in the sequence, the packets with indices 4 through 7 use the second key set, and so on. (This example uses a very low key_derivation_rate just for illustration.) It may be desirable to store more than one key set at a time in order to compensate for packet reorder. Alternatively, in some scenarios, it may be acceptable to store only a single key set at a time, and drop any packets with indices that correspond to earlier key sets.

Note that a key_derivation_rate of zero is a special case, indicating that only a single key derivation is ever to be done.
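A minimal sketch of the example above, assuming that integer division of the packet index by key_derivation_rate selects the key set (which is what the "0 through 3" and "4 through 7" grouping implies). The function name is illustrative:

```python
def key_set_number(index: int, key_derivation_rate: int) -> int:
    """Zero-based position of the key set that protects a packet index.

    A key_derivation_rate of zero is the special case in which only one
    key set is ever derived, so every packet maps to key set 0.
    """
    if key_derivation_rate == 0:
        return 0
    return index // key_derivation_rate


# Indices 0..8 with a (deliberately tiny) rate of four.
print([key_set_number(i, 4) for i in range(9)])
```

A receiver that keeps the current and the previous key set can use this number to pick the right one for a reordered packet, or to drop packets that map to a key set it no longer holds.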

Q15. Is the MKI the same thing as the SPI used in IPsec?

A. The two features are very similar; so is the IPsec notion of a "security association" and the SRTP "cryptographic context." But the SPI identifies an IPsec security association whereas the MKI identifies a key within an SRTP cryptographic context. A crypto context can contain multiple keys; this feature allows for rapid key changes without churning the other data in the context, such as the ROC and the anti-replay data.

Q16. How do I provide keys and other information to SRTP endpoints?

A. This is out of scope of RFC 3711, though Section 8 does describe the interface to provide keys and other information to SRTP. This data can be provided via signaling, and can be expressed using the SDP Security Descriptions (if the signaling is cryptographically protected) or MIKEY (if the signaling is not protected). Alternatively, the SRTP data could be provided by a group key management protocol, such as GDOI. Unlike MIKEY and SDP Security Descriptions, GDOI is not encapsulated in the Session Description Protocol.

Q17. Is SRTP a separate protocol from RTP? Can I run the SRTP protection on a node separate from the one generating the RTP packets?

A. SRTP is not a separate protocol but a profile of RTP. SRTP's SAVP profile encapsulates RTP packets, encrypts the RTP payload, optionally adds a message authentication tag (message authentication is strongly recommended) and optionally adds an MKI. SRTP's SAVP profile accepts all of the RTP AVP profile's payload types. As with any RTP system, there can be an SRTP intermediate system that intercepts RTP packets and converts them to SRTP packets, or vice-versa.

Q18. Can I tunnel SRTP traffic in an SRTP-tunnel and get two layers of protection but with different end-points, like I can do with IPsec tunnels?

A. IPsec Tunnel Mode is the appropriate solution for tunneling transport data such as SRTP whenever confidentiality of endpoint IP addresses is needed. Use of SRTP as a tunneling mechanism will usually cause SRTCP streams to be sent for the encapsulating SRTP, which is redundant to the encapsulated SRTP and generally not needed. Although it may be possible to put two SRTP layers on top of each other, there is currently no way to signal this behavior when the two layers are terminated in the same host. There is no payload type defined for such a payload. All in all, this is probably a bad idea.

Q19. The MKI is optional. Does that mean it is only present in some of the packets?

A. No, the MKI is signaled for the SRTP session and is not signaled on a packet-by-packet basis. In a given SRTP session, it is either present in every packet, or not present at all. Also note that the MKI length is fixed for a particular session.

Q20. Does SRTP work through NATs?

A. Not necessarily, since SRTP inherits RTP's NAT problems.

However, it is worth noting that SRTP significantly improves the security of an unauthenticated NAT traversal mechanism, such as STUN (see RFC 3489, Section 12.3).

Q21. I heard some encryption algorithms need "padding," how does this padding relate to RTP padding?

A. RTP padding is used to align RTP header fields on a 16-bit boundary for 16-bit header fields or a 32-bit boundary for a 32-bit header field. Cryptographic padding is needed by block ciphers and many of their modes that require plaintext data to be a multiple of the block length of the particular block cipher (16 bytes in the case of 128-bit AES). When the plaintext is not as long as the cipher block, the difference is padded with some cipher-specific value. SRTP's predefined ciphers, AES-CM and AES-f8, do not suffer from this problem because they form a stream of blocks and use only as much of this key stream as there are bits in the plaintext.
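The difference in output length can be seen in a small sketch. Note the assumption: the keystream generator below is a toy built on SHA-256 so that the example runs with the standard library alone; it is NOT AES-CM and is only meant to show that a keystream is truncated to the plaintext length:

```python
import hashlib

BLOCK = 16  # bytes per block, matching the AES block size


def toy_keystream(key: bytes, nbytes: int) -> bytes:
    """Stand-in keystream (NOT AES-CM): one 16-byte block per counter value."""
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()[:BLOCK]
        counter += 1
    return out[:nbytes]  # use only as much keystream as there is plaintext


def stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; decryption is the same operation."""
    keystream = toy_keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def padded_length(nbytes: int) -> int:
    """Length after rounding up to a whole number of cipher blocks."""
    return -(-nbytes // BLOCK) * BLOCK
```

A 20-byte payload stays 20 bytes under stream_encrypt, whereas padded_length(20) is 32, which is what a mode that needs whole blocks would have to send.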

Q22. Can I use SRTP with MIKEY? In general, can I use SRTP with an arbitrary key management protocol such as IKEv2?

A. Please see the answer to Q16 above.

Q23. I don't see why the "salting key" is needed. It doesn't need to be kept secret, so how can it add to security? Isn't it just a waste of processor time and bandwidth?

A. In short, the motivation for the salt is to provide additional randomization to counter mode and f8. In the cipher block chaining (CBC) mode of operation, a random initialization vector is used for each message that is encrypted (note that this value is not kept secret either). Since the modes that SRTP uses do not use a random initialization vector, the salt was provided so that the protocol can provide a level of security commensurate with CBC. The use of the salting key helps to future-proof SRTP's predefined ciphers from attacks that will weaken its key strength below its key size; the salting key eliminates a potential threat that may worsen in the future as computer capacity increases according to "Moore's Law" and the state of the art in cryptanalysis improves.

For a description of the sort of attacks that are possible when counter mode is used without any randomness in the counter, see McGrew and Fluhrer, Attacks on Additive Encryption of Redundant Plaintext and Implications on Internet Security, The Proceedings of the Seventh Annual Workshop on Selected Areas in Cryptography (SAC 2000), Springer-Verlag, August, 2000, available online.
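To show where the salt enters the computation, here is a sketch of the AES-CM initial counter block as we read Section 4.1.1 of RFC 3711: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), where k_s is the 112-bit session salt and i is the packet index. The function name is illustrative:

```python
def aes_cm_iv(session_salt: bytes, ssrc: int, index: int) -> bytes:
    """128-bit AES-CM IV built from the salt, the SSRC and the packet index."""
    if len(session_salt) != 14:
        raise ValueError("the session salt is 112 bits (14 bytes)")
    if not (0 <= ssrc < 2**32 and 0 <= index < 2**48):
        raise ValueError("SSRC is 32 bits and the packet index is 48 bits")
    k_s = int.from_bytes(session_salt, "big")
    iv = (k_s << 16) ^ (ssrc << 64) ^ (index << 16)
    # The low 16 bits are left at zero for the per-block counter.
    return iv.to_bytes(16, "big")
```

Without the salt, the counter blocks would be fully predictable from the SSRC and index alone; with it, they are randomized per key even though the salt itself need not be secret.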

Q24. SRTP uses by default additive stream ciphers. In my application there's a lot of known plaintext (e.g. "comfort noise"). This will give an attacker access to parts of the key stream. Does that mean that he can break the encryption?

A. No. Modern ciphers such as AES are designed so that they provide strong confidentiality, even against attackers that know massive amounts of keystream.

There is a problem with null message authentication when there is known plaintext in the message, which is one of the reasons why authentication is recommended.

Another problem arises with key stream reuse of the additive stream cipher because the portion of the key stream that has been applied to known plaintext is revealed to the cryptanalyst, who can subsequently decrypt any ciphertext that is encrypted with that portion of the key stream. The solution to this problem is to never, never reuse the key stream. For this reason, the SRTP master key MUST be refreshed no later than 2⁴⁸ packets. Second, SRTP senders MUST NOT share the SRTP master key unless each sender is certain to have a unique, non-colliding SSRC. MIKEY assigns the SSRC to each sender, and SDP Security Descriptions assigns a unique master key for each sender.
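A short sketch of why reuse is fatal. It assumes only an additive cipher; random bytes stand in for the keystream and the two messages are invented for illustration:

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


keystream = os.urandom(16)  # stand-in for one segment of cipher keystream

p1 = b"comfort noise 01"    # plaintext the attacker already knows
p2 = b"secret speech 02"    # plaintext the attacker wants to read

# The same keystream segment is (wrongly) used for both packets, as would
# happen if two senders shared a master key and collided on SSRC and index.
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# Attacker: c1 XOR p1 reveals the keystream, which then decrypts c2.
recovered = xor_bytes(xor_bytes(c1, p1), c2)
```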

Q25. Is the keystream a stream of keys or what?

A. See the discussion above.

Q26. Why isn't the MKI integrity protected? It seems dangerous not to protect such an important parameter.

A. The MKI is implicitly authenticated since any change to the MKI will cause the session key in the SRTP cryptographic context to be misidentified by the receiver of the SRTP message. When this happens, the authentication check will fail. So the MKI is 'implicitly' authenticated, without explicitly being included in the authenticated portion of the packet.
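The effect can be sketched with the standard hmac module. This is a simplification: the key table, MKI values and function names are made up, and the real SRTP tag also covers the ROC, which is omitted here:

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 10  # bytes, i.e. an 80-bit truncated HMAC-SHA1 tag

# Hypothetical receiver-side table mapping MKI values to session auth keys.
keys_by_mki = {
    b"\x00\x00\x00\x01": b"A" * 20,
    b"\x00\x00\x00\x02": b"B" * 20,
}


def protect(mki: bytes, packet: bytes) -> Tuple[bytes, bytes, bytes]:
    """Return (packet, mki, tag); the MKI itself is not fed into the MAC."""
    tag = hmac.new(keys_by_mki[mki], packet, hashlib.sha1).digest()[:TAG_LEN]
    return packet, mki, tag


def verify(packet: bytes, mki: bytes, tag: bytes) -> bool:
    """Select the key by MKI, then check the tag under that key."""
    key = keys_by_mki.get(mki)
    if key is None:
        return False  # unknown MKI: no key to verify with
    expected = hmac.new(key, packet, hashlib.sha1).digest()[:TAG_LEN]
    return hmac.compare_digest(expected, tag)
```

If an attacker flips the MKI from 1 to 2 in transit, verify() selects the wrong key and the tag comparison fails, even though the MKI bytes were never part of the authenticated data.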

Q27. Why do you allow integrity to be turned off? And does a short 32-bit MAC offer any security?

A. At least one application, voice over IP, needs to operate over low-speed links and sometimes has extremely short packet sizes (e.g. a 10 or 20 byte payload for G.729) but has less need for message authentication (since the receiver is a decoder that is stateless, or nearly stateless). In other words, there is a tradeoff between increasing the packet size by 50% or more, and offering a service where each speaker authenticates the other, as is done on telephone networks today. When this type of service is assumed, the capacity of the media bearer is increased up to twofold. It's risky, however, to limit null authentication to just this particular application, as described above. Even a 32-bit tag, as opposed to an 80-bit tag, offers good integrity protection of SRTP packets: On average, an attacker will need to fabricate 2³¹ packets before any packets are accepted by the receiver (who would almost certainly recognize the attack well before such a success occurs).

Q28. If I turn integrity off, do I still have some integrity protection, thanks to the encryption?

A. No, please see above.

Q29. I would like tag lengths to vary from packet to packet. Can I do this?

A. Both the SRTP authentication tag length and the SRTCP authentication tag length must be fixed for each master key, though those values can differ; please read Section 4.2 of the spec. Of course, different master keys can be used for the different sources within a single SRTP session, and may be associated with different tag lengths.

Q30. All keys are derived from a single master key, is this secure enough?

A. Yes, so long as the master key from which all keys are derived is not used beyond its recommended lifetime of 2³¹ SRTCP packets or 2⁴⁸ RTP packets, whichever comes first.

Q31. Can you give me an idea how secure a 128-bit key really is?

A. A 128-bit key provides a very strong level of security, and it is expected that this level of security will be adequate for several decades. For instance, NIST Special Publication 800-57, Recommendation for Key Management – Part 1: General (August, 2005), says in Section 5.6.2 that:

A minimum of eighty bits of security shall be provided until 2010. Between 2011 and 2030, a minimum of 112 bits of security shall be provided. Thereafter, at least 128 bits of security shall be provided.

Similarly, the Yearly Report on Algorithms and Keysizes (2004) of the European Network of Excellence for Cryptology (ECRYPT) also finds 128-bit keys to be more than sufficient (see Section 5.4). So SRTP was designed for 2030 and beyond. Additionally, AES-128 represents the industry standard for strong encryption.

If you are worried that 128 bits sounds small in comparison with some other keys that you have worked with, you are perhaps making an unwitting comparison between symmetric cryptography, such as AES, and asymmetric cryptography, such as RSA. Asymmetric methods require larger keys to provide equivalent security levels; see Table 2, NIST SP 800-57, for example.

A block cipher with a 128-bit wide block, like AES, does leak some information when used in counter mode. However, SRTP limits the amount of data that is processed under a single key so that this information leakage is negligible. The amount of information leakage is determined by the birthday problem. On average, an adversary will need to see 2⁶⁴ blocks of ciphertext, and know the corresponding plaintext blocks, before any information is leaked. SRTP will never use an AES key on more than 2⁶⁴ blocks, as each key processes no more than 2⁴⁸ SRTP messages, which are no larger than 2¹⁶ blocks in length.
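The bound in the last sentence is plain arithmetic on the limits quoted in this FAQ, shown here as a quick check (the constant names are illustrative):

```python
MAX_PACKETS_PER_KEY = 2**48    # master key lifetime in SRTP packets
MAX_BLOCKS_PER_PACKET = 2**16  # largest AES-CM payload, in 16-byte blocks
BLOCK_BYTES = 16               # AES block size in bytes

# Upper bound on the AES blocks ever processed under one key.
max_blocks_per_key = MAX_PACKETS_PER_KEY * MAX_BLOCKS_PER_PACKET

# Largest payload in bytes for a single packet.
max_payload_bytes = MAX_BLOCKS_PER_PACKET * BLOCK_BYTES

print(max_blocks_per_key == 2**64, max_payload_bytes == 2**20)
```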

Q32. I am thinking of optimizing by using the same master key for several SRTP contexts, is that OK?

A. Yes, please read Section 8 of RFC 3711, which describes the restrictions for sharing a master key across cryptographic contexts and note that this sharing is restricted to cryptographic contexts that belong to the same SRTP session.

Q33. I have a group communication scenario, can the group members share a single master key?

A. Yes, but it is a risky thing to do given SRTP's pre-defined ciphers and requires that each group member have a unique SSRC value and that an SSRC collision is impossible. See above for further considerations.

Q34. I have a multicast scenario, won't all the overhead in processing (S)RTCP reports overwhelm the sender?

A. This is an RTCP consideration that is made only slightly worse with SRTCP message authentication and optional encryption. Please read Section 6.3.1 of RFC 3550.

Q35. How do I get source origin authentication when several group members share the same key?

A. Please refer to the SRTP-TESLA specification.

Q36. The RFC says that SRTP is a "bump-in-the-stack". Does this mean that I can plug an SRTP implementation between any RTP implementation and the transport layer?

A. It is certainly possible to implement SRTP in a separate layer, though there are some considerations. If a single master key is shared across multiple SRTP sources within an RTP session, then SSRC values must be generated externally to RTP. The SRTP layer could enforce this by translating between the SSRC value randomly chosen by RTP and the SSRC value provided to it. Of course, if there is no key sharing, then this problem does not occur.

In most cases, it is desirable to have coordination between the SRTP and RTP layers, for example, to have the RTP layer indicate which RTCP packets do not require encryption.

Q37. It is said that the replay window can be implemented as in RFC 2401, but that RFC only allows 32 bits in the window and SRTP requires at least 64 bits, is this a mistake?

A. No, SRTP defaults to a different (64-message) window size than IPsec (32-message). The larger window is more robust to loss, and the IETF AVT working group agreed that this was preferable for real-time media. Note that the SRTP default window size is configurable by key management.
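For readers who have not implemented one, here is a minimal sketch of a sliding replay window kept as a bitmap, with the 64-entry default mentioned above. The class and method names are illustrative; in a real receiver the window should be updated only after the packet has passed the authentication check:

```python
class ReplayWindow:
    """Sliding window over packet indices, stored as an integer bitmap."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = -1  # highest index accepted so far (-1: none yet)
        self.bitmap = 0    # bit k set means index (highest - k) was seen

    def check_and_update(self, index: int) -> bool:
        """Return True and record the index if it is new, else False."""
        if index > self.highest:
            # Slide the window forward and mark the new highest index.
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # older than the window: reject
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True
```

A larger size tolerates more reordering and loss before late packets fall off the back of the window, which is the robustness argument made above.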

Q38. Why is AES Counter Mode used by SRTP?

A. AES Counter Mode is the default cipher in SRTP because it requires zero packet expansion (i.e. there is no cryptographic padding), it processes out-of-order (non-sequential) data blocks efficiently, allows processing of data blocks in parallel, and it allows pre-computation of the keystream (see the NIST contribution).

Q39. In SRTCP key derivation, does "Replace the SRTP index by the 32-bit quantity: 0 || SRTCP index ..." mean that quantity should be padded on the left with 16 leading zeros?

A. Yes; the specification means to treat the SRTP index as an unsigned integer, in which case the shorter value would be implicitly padded on the left with zeros, so that its length matches that of the SRTP index.
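In code, the implicit padding is just a matter of serializing the integer at the width of the SRTP index. A minimal sketch (the function name is illustrative):

```python
def srtcp_index_as_srtp_index(srtcp_index: int) -> bytes:
    """Serialize a 31-bit SRTCP index as a 48-bit big-endian value.

    The leading zero bit makes it a 32-bit quantity, and the remaining
    16 high-order bits are the implicit zero padding on the left.
    """
    if not 0 <= srtcp_index < 2**31:
        raise ValueError("the SRTCP index is a 31-bit quantity")
    return srtcp_index.to_bytes(6, "big")


print(srtcp_index_as_srtp_index(1).hex())
```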

Q40. Why does SRTP have a NULL cipher?

A. The SRTP Null cipher is used when authentication is desired without encryption.

Q41. Where can I find an SRTP white paper, presentation, tutorial or other materials?

A. The Ericsson team has a presentation and wrote an article on SRTP that was published in a book on multimedia systems. Cisco has published a white paper on the general topic of security for IP telephony systems using SRTP.

Q42. What is the relationship between SRTP and ZRTP?

A. ZRTP is an extension to RTP that provides keys for SRTP. It is not a replacement for SRTP. In fact, the Zfone public beta uses libsrtp, the SRTP reference implementation. ZRTP adds an in-band key establishment protocol that uses a Diffie-Hellman key exchange to provide keys for SRTP; see the ZRTP Internet Draft for more information.
