Video Streaming Performance Metrics

Video traffic has become the dominant component of Internet and mobile network traffic in recent years, and that trend is expected to continue. Applications include video calls (e.g., FaceTime, Skype), video conferencing (e.g., Cisco’s Telepresence, Skype), video chat, and video streaming (e.g., YouTube, Netflix). Most of us have been dissatisfied with video applications over the network at one time or another. One source of dissatisfaction is the picture quality or resolution of the video itself (e.g., NTSC versus HD), which is mainly a matter of encoding. For a given picture quality, the other, equally important aspect is the delivery quality. We focus here on delivery quality for video streaming, and try to define a measure analogous to response time for data applications. Note that video streaming runs over TCP/IP, so the delivered bits should not be corrupted; delivery problems therefore show up as delays and pauses rather than as a damaged picture.

The metrics that matter for video delivery quality are the following:
1. Start time of the video after initiation
2. Number of pauses during the video
3. Start time of the first pause
4. Time to recover from each pause

The overall testing duration of the video stream (D) has to be stated along with the actual length of the video program (L). Although a particular video stream could be anywhere from 2 minutes to 2 hours long, the test duration can be shorter than the full program, though not much shorter than, say, 10 minutes for a 2-hour stream. It is rare that anyone would continue to watch a 2-hour video if the delivery quality is poor for a sustained period of 10 minutes or more.

The above metrics should also be stated along with the environment: the video source or provider, and the receiving device (laptop/PC, tablet, smartphone, Roku, Wii, etc.) with its hardware and software versions.

A composite quality parameter could be defined from the above four parameters by normalizing with the test duration or the full duration of the video. One simple way is to add up the start time and the combined duration of all pauses, add a fixed penalty of 1 extra minute for each pause counted, divide the total by the duration of the video stream, and subtract the result from 1. For example, a 10-minute video tested for its entire duration may have an initial start time of 1 minute and one pause of 1 minute somewhere in the middle. The video quality is then 1 - 1/10 - 1/10 - (1 x 1)/10 = 0.7, or 70%, where the three deductions are the start time, the pause duration, and the per-pause penalty. This does not yet use the third parameter, the start time of the first pause. One way to use it is to give more credit if the first pause occurs towards the end rather than right near the beginning.
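The arithmetic above can be written as a short Python sketch. The function name, the parameter names, and the clamping of the result at zero are our own assumptions for illustration; only the formula itself comes from the text.

```python
def delivery_quality(duration_min, start_delay_min, pause_durations_min,
                     per_pause_penalty_min=1.0):
    """Composite delivery quality as a fraction between 0 and 1.

    duration_min          -- test duration (or full video duration) in minutes
    start_delay_min       -- time from initiation until the video starts
    pause_durations_min   -- list with the length of each pause, in minutes
    per_pause_penalty_min -- extra deduction per pause (1 minute in the text)
    """
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    total_pause = sum(pause_durations_min)
    penalty = per_pause_penalty_min * len(pause_durations_min)
    lost = start_delay_min + total_pause + penalty
    # Assumption: clamp at 0 so a very poor session does not go negative.
    return max(0.0, 1.0 - lost / duration_min)


# Worked example from the text: 10-minute video, 1-minute start, one 1-minute pause.
print(round(delivery_quality(10, 1, [1]), 2))  # 0.7
```

Keeping the per-pause penalty as a parameter makes it easy to experiment with how heavily the number of pauses should count relative to their total length.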

If we can understand the impact of each of the above four parameters on the user's perception of video quality, it may turn out that the parameters have to be weighted non-uniformly. For example, a long starting delay may be less tolerable than a pause in the middle. Or a pause that occurs toward the end of the video stream may be less annoying than one that occurs near the beginning. We can come up with a more realistic score similar to the mean opinion score (MOS) used for voice quality; let us call it Video MOS, or VMOS.
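To make the idea of non-uniform weighting concrete, here is a minimal Python sketch. Everything numeric in it is a placeholder: the weights `w_start` and `w_pause`, the `late_discount` factor that softens pauses occurring late in the stream, and the linear mapping onto a 1-to-5 scale (the range conventionally used for voice MOS) are assumptions that would have to be calibrated against real user studies.

```python
def vmos_sketch(duration, start_delay, pauses,
                w_start=1.5, w_pause=1.0, late_discount=0.5):
    """Illustrative weighted score on a 1-to-5 scale (not a standard).

    duration    -- test duration, in any time unit
    start_delay -- start time of the video, same unit
    pauses      -- list of (pause_start_time, pause_length) tuples, same unit
    The three keyword arguments are hypothetical weights.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    penalty = w_start * start_delay
    for pause_start, pause_length in pauses:
        # Relative position of the pause within the test: 0 = start, 1 = end.
        position = min(max(pause_start / duration, 0.0), 1.0)
        # Later pauses are discounted, so they cost less than early ones.
        penalty += w_pause * pause_length * (1.0 - late_discount * position)
    quality = max(0.0, 1.0 - penalty / duration)
    # Assumed linear mapping from [0, 1] onto the 1-to-5 MOS range.
    return 1.0 + 4.0 * quality


early = vmos_sketch(10, 0, [(1, 1)])  # 1-minute pause near the beginning
late = vmos_sketch(10, 0, [(9, 1)])   # same pause near the end
print(early < late)  # True: the late pause is penalized less
```

With these placeholder weights, a 1-minute starting delay also lowers the score more than a 1-minute pause in the middle, which matches the first example in the paragraph above.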

Measuring VMOS may be a challenge, as it would take enormous testing time if done manually. Users themselves may not be able to measure those time intervals. Hence we would expect the device itself to collect the above parameters and provide the final VMOS score for each video clip. The VMOS score for each streaming session can be stored on the device itself or in the cloud for data mining.
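As a rough sketch of what such on-device collection might look like, the Python below derives the four metrics from a time-ordered log of player events. The event names (`play_requested`, `first_frame`, `stall_start`, `stall_end`) and the `DeliveryMetrics` record are hypothetical; a real player would expose its own events.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DeliveryMetrics:
    start_delay: float               # metric 1: start time after initiation
    pause_count: int                 # metric 2: number of completed pauses
    first_pause_at: Optional[float]  # metric 3: first pause, measured from playback start
    recovery_times: List[float]      # metric 4: time to recover from each pause


def collect_metrics(events: List[Tuple[float, str]]) -> DeliveryMetrics:
    """Derive the four metrics from (timestamp_seconds, event_name) tuples."""
    requested = playing = stall_began = first_stall = None
    recoveries: List[float] = []
    for ts, name in events:
        if name == "play_requested" and requested is None:
            requested = ts
        elif name == "first_frame" and playing is None:
            playing = ts
        elif name == "stall_start":
            stall_began = ts
            if first_stall is None:
                first_stall = ts
        elif name == "stall_end" and stall_began is not None:
            recoveries.append(ts - stall_began)
            stall_began = None  # a stall still open at the end is not counted
    if requested is None or playing is None:
        raise ValueError("log must contain play_requested and first_frame")
    first_pause_at = None if first_stall is None else first_stall - playing
    return DeliveryMetrics(playing - requested, len(recoveries),
                           first_pause_at, recoveries)


log = [(0.0, "play_requested"), (2.5, "first_frame"),
       (60.0, "stall_start"), (64.0, "stall_end"),
       (120.0, "stall_start"), (121.5, "stall_end")]
print(collect_metrics(log))
```

A record like this, together with the test duration and the device details, is what would be stored locally or uploaded for later analysis.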

A measure similar to VMOS can be used for audio streaming, especially audio streamed over mobile connections (e.g., Pandora or Bloomberg Radio).

Providing the highest VMOS will be important to all the stakeholders (content providers, network owners, and device manufacturers). The content providers (e.g., Netflix) may have a little more at stake and will hopefully take the initiative in defining a standardized VMOS for the industry.