Qasim Zafar


Category: Projects

Fast Video Compression With FFmpeg

As I mentioned previously, at Cricingif we are focusing on building a fast, efficient play-by-play highlights platform. One of the bottlenecks we have been facing is video compression: some of the programs we have been using for video and audio capture are clunky, slow, and CPU-intensive. After burning through a couple of laptop motherboards, I decided to give FFmpeg a go.

Here are the results of my experiment with video compression using FFmpeg. For the purpose of this experiment I used a pre-recorded video clip, but in our production environment at Cricingif we capture video and audio from live streams. The results are equally valid in that scenario as well.

I’ve taken an input video with the following specs:

| Format | Resolution | Bitrate (kbps) | Size (bytes) | Duration (s) |
|--------|------------|----------------|--------------|--------------|
| MTS    | 1920×1080  | 16394          | 52,199,424   | 25.51        |

For the record, the test machine is a Microsoft Surface Pro 4 with 8 GB of RAM, 256 GB of storage, and an Intel Core i7-6650U processor. Granted, this isn't the best machine for video encoding, but it's my work machine and it's what I'm experimenting on. I have run the tests on our production setups as well, and these results are illustrative of what I achieved there.

The first step is a simple MP4 encode using the H.264 codec:

ffmpeg.exe -i "Take 4.MTS" -vcodec libx264 -acodec aac test1.mp4
| Format | Resolution | Bitrate (kbps) | Size (bytes) | Encoding time (s) |
|--------|------------|----------------|--------------|-------------------|
| MTS    | 1920×1080  | 16394          | 52,199,424   | n/a               |
| mp4    | 1920×1080  | 10220          | 33,236,062   | 32                |

Thirty-two seconds to encode a 25-second video is far too slow; we cannot afford an encoding time longer than the video itself in our real-time application. For now, however, we will be streaming 640×480 video to our users due to the limited capability of Pakistan's network and device landscape. Let's try that:

ffmpeg.exe -i "Take 4.MTS" -vcodec libx264 -filter:v scale=640:-1 -acodec aac -pix_fmt yuv420p test3.mp4

Using this, the file size is now down to 3,504,438 bytes at 640×480, and the encoding time is now just 7.2 seconds. This works, but at about 134 KB/s (roughly 1.1 Mbit/s) the stream is still more than many of our users' connections can handle. Time to improve further.
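Those throughput numbers come straight from size over duration. A quick sanity check, using nothing but awk and the figures above:

```shell
# Average data rate of the 640x480 encode: size / duration.
size_bytes=3504438   # output file size from the test above
duration_s=25.51     # clip duration
awk -v s="$size_bytes" -v d="$duration_s" \
    'BEGIN { printf "%.0f KB/s (%.2f Mbit/s)\n", s/d/1024, s/d*8/1e6 }'
# -> 134 KB/s (1.10 Mbit/s)
```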

Raising the constant rate factor (CRF, default 23) allows lossier compression of the input stream while affecting the output only slightly. And since Cricingif videos are mainly consumed on mobile screens, the minor loss of quality is acceptable.

I also realized that the x264 encoder does not accept odd frame dimensions, so if the input video dimensions are slightly off it throws a "width or height not divisible by 2" error. So I've updated the command:
ffmpeg.exe -i "Take 4.MTS" -vcodec libx264 -crf 29 -filter:v scale="640:trunc(ow/a/2)*2" -acodec aac -pix_fmt yuv420p test3.mp4
| Format | Resolution | Bitrate (kbps) | Size (bytes) | Encoding time (s) |
|--------|------------|----------------|--------------|-------------------|
| MTS    | 1920×1080  | 16394          | 52,199,424   | n/a               |
| mp4    | 1920×1080  | 10220          | 33,236,062   | 32                |
| mp4    | 640×360    | 507            | 1,671,197    | 7                 |
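The trunc(ow/a/2)*2 expression simply computes the aspect-correct height for a 640-pixel-wide frame and rounds it down to an even number. A minimal sketch of that arithmetic in shell (the second set of input dimensions is purely illustrative):

```shell
# Reproduce scale="640:trunc(ow/a/2)*2": height for a 640-wide frame,
# rounded down to the nearest even number.
even_height () {
  awk -v iw="$1" -v ih="$2" -v ow=640 \
      'BEGIN { print int(ow / (iw / ih) / 2) * 2 }'
}
even_height 1920 1080   # clean 16:9 input  -> 360
even_height 1280 719    # awkward input     -> 358, never an odd 359
```

Recent FFmpeg builds also accept `scale=640:-2` as a shorthand that keeps the aspect ratio while forcing the height to be divisible by 2.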

A 7-second encoding time is a decent trade-off. Other options, such as the superfast preset or a fixed bitrate, do bring the encoding time down to around 5 seconds, but the trade-off is an increased file size. In its current state the stream needs about 64 KB/s, with only about 220 MB of data consumed per hour of streaming, something reasonably achievable for our users.
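The per-hour figure follows from the same size-over-duration arithmetic, this time using the CRF 29, 640×360 result:

```shell
# Data rate and hourly consumption for the 1,671,197-byte, 25.51 s encode.
awk -v s=1671197 -v d=25.51 \
    'BEGIN { printf "%.0f KB/s, about %.0f MB per hour\n", s/d/1024, s/d*3600/1048576 }'
# -> 64 KB/s, about 225 MB per hour
```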

One further test dropped the output resolution to 480×270. With this, a file size of just 1,222,874 bytes is achieved with an encoding time of only 6.2 seconds. At about 47 KB/s, only around 160 MB of data per hour is consumed. Good enough for now.

A detailed table of results for all experiments:

| File       | Resolution | Bitrate (kbps) | FPS | Size (bytes) | Encoding time (s) |
|------------|------------|----------------|-----|--------------|-------------------|
| Input file | 1920×1080  | 16394          | 25  | 52,199,424   | n/a               |
| Test 1     | 1920×1080  | 10220          | 25  | 33,236,062   | 32.01             |
| Test 2     | 1920×1080  | 10220          | 25  | 33,236,062   | 29.80             |
| Test 3     | 640×480    | 1071           | 25  | 3,504,438    | 7.17              |
| Test 4     | 640×360    | 20233          | 25  | 65,774,106   | 8.31              |
| Test 5     | 640×360    | 3283           | 25  | 10,690,916   | 7.68              |
| Test 6     | 640×360    | 507            | 25  | 1,671,197    | 7.07              |
| Test 7     | 640×360    | 1381           | 25  | 4,504,801    | 5.91              |
| Test 8     | 640×360    | 1490           | 25  | 4,865,814    | 6.06              |
| Test 9     | 480×270    | 369            | 25  | 1,222,874    | 6.24              |

For more on what I’ve done, do check out my resume
Get in touch with me at: qasimzafar AT outlook DOT com

Augmented Reality Eyewear

A couple of weeks ago, I had my first virtual reality experience with an Oculus Rift dev kit, and I can say with certainty that this class of technology is going to be a large part of our future. I'm especially intrigued by the idea of augmented reality for shoppers wishing to try out apparel and accessories, the goods that customers most want to try on physically before they buy.

I have fleshed out a working proof-of-concept that is very light and efficient, so that even low-powered devices such as smartphones and smart displays can successfully present a proper AR virtual dressing room experience.

The idea is to affix articles of clothing and accessories onto the user's body in a live camera feed, convincingly enough to be believable without stretching the user's imagination. Real-time articulated full-body pose detection is still a work in progress and would have to be coupled with a face tracker to get a usable detection, so for the sake of simplicity I focused my efforts on the head alone, choosing glasses as the proof-of-concept accessory.

The project involved four stages: Tracking the face of the subject, recovering the 3D pose of the face from a 2D image, aligning a 3D model of the eyewear to ‘fit’ the head, and then projecting the 3D eyewear back onto the 2D face.

[Image: AR face tracker fitting glasses to a detected face]


Visual Servoing of a Mine Detector Arm: Marwa

Marwa Autonomous Landmine Detection System

My senior-year capstone project was under the tutelage of Dr. Abubakar Muhammad at the Laboratory for Cyber Physical Systems at LUMS (CyPhyNets). The ultimate goal was to create a robust, cost-effective mine detector arm for the award-winning Marwa autonomous landmine detection platform.

Because standard minesweeping sensors must operate very close to the terrain, the primary requirement was for the arm to track the profile of the patch of ground in front of the robot as accurately as possible. The setup consisted of multiple modules which, combined, tracked the terrain in the sweep space in front of the robot to an accuracy of ±2 cm.
