When you place a VoIP call, you send an INVITE to the remote peer, who accepts or rejects it, and a “SIP” session is established. A SIP session can carry pretty much anything: text, voice, video, images, data. SIP only sets up and manages the session, so there’s no limit to what you can do with that ‘session’. The session description carried in the SIP messages (SDP) specifies the ‘type’ of media the session will contain. In the case of a VoIP call, the media is audio carried over “RTP”, the Real-time Transport Protocol. When you speak, the audio is encapsulated into small UDP packets which are transported from one endpoint to the other.
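To make the “audio in small UDP packets” idea concrete, here is a minimal sketch of the 12-byte fixed RTP header that precedes each chunk of audio. The function names are made up for illustration, and the sketch assumes no CSRC list, no header extension and no padding. Payload type 0 is G.711 µ-law, where 20 ms of audio at 8 kHz is 160 bytes.

```python
import struct

RTP_VERSION = 2  # the only RTP version in use


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prefix `payload` with the 12-byte fixed RTP header (hypothetical helper)."""
    byte0 = RTP_VERSION << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed header fields of an RTP packet (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A sender would hand each packet to a UDP socket with `sendto()`, bumping the sequence number by one and the timestamp by 160 for every 20 ms frame.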
In a typical VoIP call, you terminate the call to the PSTN, so you send an INVITE to your VoIP service provider. The VoIP service provider verifies that you have enough credit and the right allowance to make such a call. The provider then picks up one of its many PRI channels, dials the requested number, and the call is answered.
The audio in that scenario is proxied via a media proxy: the PRI channel is connected to software such as Asterisk, which relays the audio between you and the PSTN gateway.
The SIP server and the media proxy can be on two different servers, in two completely different locations. The SIP server is there to initiate the session; the media proxy is there to handle the RTP audio data.
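The reason signalling and media can live on different machines is that the SDP body inside the SIP message names the address and port where audio should be sent, independent of where the SIP message itself came from. A rough sketch of pulling that audio endpoint out of an SDP body follows; `audio_endpoint` is an illustrative name, and the parsing is deliberately simplified to a single audio stream.

```python
def audio_endpoint(sdp: str):
    """Return (address, port) for the first audio stream in an SDP body."""
    session_addr = None   # c= line before any m= line applies to all streams
    audio_addr = None     # c= line inside the audio section overrides it
    audio_port = None
    seen_media = False
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            seen_media = True
            in_audio = line.startswith("m=audio ")
            if in_audio and audio_port is None:
                # m=audio <port> <proto> <formats...>
                audio_port = int(line.split()[1].split("/")[0])
        elif line.startswith("c="):
            addr = line.split()[-1]  # c=IN IP4 <address>
            if not seen_media:
                session_addr = addr
            elif in_audio:
                audio_addr = addr
    if audio_port is None:
        raise ValueError("no audio stream in SDP")
    return (audio_addr or session_addr, audio_port)
```

If the SIP message arrives from one host while this function returns another, the second host is the media proxy (or the far endpoint) that will receive the RTP.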
The SIP server can still disconnect the session as well, by sending a BYE request to the calling device, which makes the device terminate the call and cut off the media stream.
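For illustration, a BYE is just a short text message with no body. The sketch below assembles one; `build_bye` and its parameters are hypothetical. In a real dialog the From, To and Call-ID values (including tags) must match those negotiated when the call was set up, and the CSeq number must be higher than the last one used in that direction.

```python
def build_bye(request_uri: str, via: str, from_hdr: str, to_hdr: str,
              call_id: str, cseq: int) -> str:
    """Assemble a minimal SIP BYE request as text (illustrative helper)."""
    lines = [
        f"BYE {request_uri} SIP/2.0",
        f"Via: {via}",
        "Max-Forwards: 70",
        f"From: {from_hdr}",
        f"To: {to_hdr}",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} BYE",
        "Content-Length: 0",
    ]
    # SIP lines end in CRLF; a blank line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

The receiving device answers with a 200 OK and stops sending RTP.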
When one places a call on hold, the device typically sends a SIP re-INVITE whose SDP marks the audio stream as ‘a=sendonly’ or ‘a=inactive’. This causes the session to be placed on hold; Asterisk, for example, plays hold music to the channel placed on hold. Picking the call back up restores the stream with another re-INVITE, and the audio is routed back through.
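The ‘a=inactive’ attribute lives in the SDP body of a SIP re-INVITE rather than in the RTP stream itself. As a small sketch of what a phone does to its session description when you press hold and resume, the helper below swaps the direction attribute; `set_direction` is an invented name, and it assumes an SDP body with a single audio stream.

```python
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def set_direction(sdp: str, direction: str) -> str:
    """Return a copy of `sdp` whose stream direction is `direction`."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    # Drop any existing direction attribute, keep everything else in order.
    kept = [line for line in sdp.splitlines()
            if not (line.startswith("a=") and line[2:] in DIRECTIONS)]
    # Appending after the last line places it in the (single) media section.
    kept.append(f"a={direction}")
    return "\r\n".join(kept) + "\r\n"
```

Calling `set_direction(sdp, "inactive")` gives the body for the hold re-INVITE, and `set_direction(sdp, "sendrecv")` gives the one that resumes the call.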
When you transfer a call, however, you don’t simply change the RTP stream. The change involves more SIP signalling, with another INVITE reaching the affected peer, and the sequence depends on whether the transfer is attended or unattended. In an attended transfer, the other user ends up talking with you first, before yet another INVITE is sent to complete the transfer. In an unattended transfer, they are simply sent an INVITE for the new session, with the person transferring the call removed.
To avoid issues with NAT firewalls, you should port forward the applicable RTP ports to the device and specify the public IP the device will communicate on. This allows RTP packets to be routed directly, and can avoid the ‘no audio’ or ‘call drop-out’ issues often experienced behind NAT.
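Why the public IP setting matters: a device behind NAT will otherwise advertise its private LAN address in the SDP, and the far end will send RTP to an address it cannot reach. The sketch below shows the idea by rewriting a private connection address to the public one. The function names are illustrative, it handles only IPv4 `c=` lines, and real devices do the equivalent internally through an “external IP” style setting.

```python
import ipaddress

# Private IPv4 ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(address: str) -> bool:
    """True if `address` is in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


def advertise_public_address(sdp: str, public_ip: str) -> str:
    """Replace private IPv4 addresses in c= lines with `public_ip`."""
    out = []
    for line in sdp.splitlines():
        parts = line.split()
        if (line.startswith("c=IN IP4 ") and len(parts) == 3
                and is_rfc1918(parts[2])):
            line = f"c=IN IP4 {public_ip}"
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

This only fixes the advertised address; the port forward is still needed so that packets arriving at the public IP on the RTP ports actually reach the device.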