Metal video processing for iOS and tvOS


Real-time video processing is a particular case of digital signal processing. Technologies such as Virtual Reality (VR) and Augmented Reality (AR) rely heavily on real-time video processing to extract semantic information from each video frame and use it for object detection and tracking, face recognition, and other computer vision techniques.

Processing video in real time on a mobile device is a fairly complex task, because of the limited resources available on smartphones and tablets, but you can achieve excellent results using the right techniques.

In this post, I'll show you how to process a video in real time using the Metal framework, leveraging the power of the GPU. In one of our previous posts, you can check the details on how to set up the Metal rendering pipeline and run compute shaders for image processing. Here, we're going to do something similar, but this time we'll process video frames.

AV Foundation

Before we proceed with the implementation of the video processing in Metal, let's take a quick look at the AV Foundation framework and the components we need to play a video. In a previous post, I demonstrated how to use AV Foundation to capture video with the iPhone or iPad camera. Here, we're going to use a different set of AV Foundation classes to read and play a video file on an iOS or tvOS device.

You can play a video on the iPhone or the Apple TV in different ways, but for the purpose of this post, I'm going to use the AVPlayer class, extract each video frame, and pass the frames to Metal for real-time processing on the GPU.

An AVPlayer is a controller object used to manage the playback and timing of a media asset. You can use an AVPlayer to play local and remote file-based media, such as video and audio files. Besides the standard controls to play, pause, change the playback rate, and seek to various points in time within the media's timeline, an AVPlayer object offers the possibility to access each single frame of a video asset through an AVPlayerItemVideoOutput object. This object returns a reference to a Core Video pixel buffer (an object of type CVPixelBuffer). Once you get the pixel buffer, you can convert it into a Metal texture and thus process it on the GPU.

Creating an AVPlayer is very simple. You can either use the file URL of the video or an AVPlayerItem object. So, to initialize an AVPlayer, you can use one of the following init methods:
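The two initializers look like this (`videoURL` is a placeholder for your own file URL):

```swift
import AVFoundation

let videoURL = URL(fileURLWithPath: "/path/to/video.mp4") // placeholder URL

// 1. Initialize directly from a URL.
let player = AVPlayer(url: videoURL)

// 2. Initialize from an AVPlayerItem.
let playerItem = AVPlayerItem(url: videoURL)
let playerFromItem = AVPlayer(playerItem: playerItem)
```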

An AVPlayerItem stores a reference to an AVAsset object, which represents the media to be played. An AVAsset is an abstract, immutable class used to model timed audiovisual media such as videos and audio. Since AVAsset is an abstract class, you cannot use it directly. Instead, you have to use one of the subclasses provided by the framework: an AVURLAsset, or an AVComposition (or its mutable subclass, AVMutableComposition). An AVURLAsset is a concrete subclass of AVAsset that you can use to initialize an asset from a local or remote URL. An AVComposition allows you to combine media data from multiple file-based sources in a custom temporal arrangement.

In this post, I'm going to use the AVURLAsset. The following snippet of source code highlights how to combine all these AV Foundation classes together:
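A minimal sketch of that chain, assuming a `video.mp4` file bundled with the app:

```swift
import AVFoundation

// Locate the video file in the app bundle (the file name is an assumption).
guard let url = Bundle.main.url(forResource: "video", withExtension: "mp4") else {
    fatalError("video.mp4 not found in the app bundle")
}

// URL → AVURLAsset → AVPlayerItem → AVPlayer
let asset = AVURLAsset(url: url)
let playerItem = AVPlayerItem(asset: asset)
let player = AVPlayer(playerItem: playerItem)
```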

To extract the frames from the video file while the player is playing, you can use an AVPlayerItemVideoOutput object. Once you get a video frame, you can use Metal to process it on the GPU. Let's now build a full example to demonstrate it.

Video Processor App

Create a new Xcode project. Choose an iOS Single View Application and name it VideoProcessor. Open the ViewController.swift file and import AVFoundation.

Since we need an AVPlayer, let's add the following property to the view controller:
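A sketch of the property (the name `player` is an assumption):

```swift
// The player that drives playback of the video asset.
private let player = AVPlayer()
```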

As discussed above, the player provides access to each single video frame through an AVPlayerItemVideoOutput object. So, let's add an additional property to the view controller:
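A sketch of that property, assuming the 32-bit BGRA pixel format described below:

```swift
// Video output configured to vend BGRA pixel buffers.
private lazy var videoOutput: AVPlayerItemVideoOutput = {
    let attributes: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_32BGRA)
    ]
    return AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
}()
```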

For this property, the attributes is a dictionary defining the pixel buffer format. Here, I'm declaring that each pixel is 32 bits, organized as 4 bytes (8 bits each) representing the Blue, Green, Red and Alpha channels. Then, I use the attributes dictionary to initialize the AVPlayerItemVideoOutput.

In the viewDidLoad() method, I load the video:
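A possible implementation, assuming the `player` and `videoOutput` properties sketched earlier and a bundled `video.mp4`:

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    // Load the bundled video (the file name is an assumption).
    guard let url = Bundle.main.url(forResource: "video", withExtension: "mp4") else { return }
    let playerItem = AVPlayerItem(asset: AVURLAsset(url: url))

    // Attach the video output so we can pull pixel buffers frame by frame.
    playerItem.add(videoOutput)
    player.replaceCurrentItem(with: playerItem)
}
```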

Since I want to read the video frame by frame, I need a timer that repeatedly asks the AVPlayerItemVideoOutput object to provide each frame. I'm going to use a display link. This is a special timer that fires at each screen refresh. So, add the following property to the view controller:
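A sketch of the display link property (the selector name `readBuffer(_:)` matches the method discussed below):

```swift
// Fires once per screen refresh; starts paused until playback begins.
private lazy var displayLink: CADisplayLink = {
    let displayLink = CADisplayLink(target: self, selector: #selector(readBuffer(_:)))
    displayLink.add(to: .main, forMode: .common)
    displayLink.isPaused = true
    return displayLink
}()
```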

On a 60 Hz display, the display link fires 60 times per second and executes the readBuffer(_:) method every ~16.6 ms. In this property, I pause the display link immediately. I'll restart it when I start playing the video in the viewDidAppear(_:) method:
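A possible viewDidAppear(_:), assuming the `player` and `displayLink` properties sketched earlier:

```swift
override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    // Resume the display link and start playback.
    displayLink.isPaused = false
    player.play()
}
```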

Now, let's implement the readBuffer(_:) method:
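A minimal sketch, assuming a `metalView` outlet exposing `pixelBuffer` and `inputTime` properties (both names are assumptions, introduced later in this post):

```swift
@objc private func readBuffer(_ sender: CADisplayLink) {
    // Ask the video output for the frame that will be on screen at the next vsync.
    let nextVSync = sender.timestamp + sender.duration
    let itemTime = videoOutput.itemTime(forHostTime: nextVSync)

    guard videoOutput.hasNewPixelBuffer(forItemTime: itemTime),
          let pixelBuffer = videoOutput.copyPixelBuffer(forItemTime: itemTime,
                                                        itemTimeForDisplay: nil) else { return }

    // Hand the frame and its timestamp to the Metal view for GPU processing.
    metalView.pixelBuffer = pixelBuffer
    metalView.inputTime = itemTime.seconds
}
```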

The readBuffer(_:) method executes every ~16.6 ms. It extracts a frame from the player item video output as a Core Video pixel buffer. I then take the pixel buffer and its timestamp and pass them to a Metal view (see later).

Now, I need to set up Metal. Open the Main.storyboard file and add a new view on top of the view controller's view. Expand this new view to completely cover the view controller's view. Then, add Auto Layout constraints so that it's pinned top, bottom, left and right to the view controller's view. After that, add a new Swift file to the project and name it MetalView. Edit this new file in the following way:
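The file starts out as a bare MTKView subclass; we'll fill it in over the next steps:

```swift
import MetalKit

class MetalView: MTKView {
    // Properties and rendering methods are added in the following steps.
}
```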

This is a subclass of a Metal view (MTKView, defined in the MetalKit framework). Open the storyboard again and set the class of the previously added view to MetalView. Add the following outlet to the view controller:
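The outlet (the name `metalView` is an assumption):

```swift
@IBOutlet weak var metalView: MetalView!
```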

and connect this outlet to the MetalView in the storyboard.

I'm going to set up the Metal view so that, when I launch the application, the display link defined in the view controller redraws the view, executing the MetalView draw(_:) method. First, let's add the draw(_:) method to the MetalView class:
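A sketch of draw(_:), assuming a `render(_:)` method that we implement later:

```swift
override func draw(_ rect: CGRect) {
    // Wrap each frame in its own autorelease pool so Core Video
    // objects are released promptly after every draw.
    autoreleasepool {
        if rect.width > 0 && rect.height > 0 {
            self.render(self)
        }
    }
}
```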

The draw(_:) method sets up its own autorelease pool and calls the render(_:) method.

This method will do most of the work. In the render(_:) method, we're going to convert the CVPixelBuffer obtained from the AVPlayerItemVideoOutput to a Metal texture and process it on the GPU. This step is essential, and the performance of your application depends strictly on how you perform this conversion. So, it is very important that you do it correctly.

Let's prepare the MetalView class. First, add the following imports to the MetalView class:
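The frameworks we need for the texture cache and rendering:

```swift
import MetalKit
import AVFoundation
import CoreVideo
```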

Then, add the following properties to the MetalView:
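A sketch of the properties (the names `pixelBuffer` and `inputTime` are assumptions, matching the view controller code earlier):

```swift
// The latest video frame; setting it triggers a redraw.
var pixelBuffer: CVPixelBuffer? {
    didSet {
        setNeedsDisplay()
    }
}

// Presentation timestamp of the current frame, in seconds.
var inputTime: Double = 0

// Core Video cache for fast CVPixelBuffer → MTLTexture conversion.
private var textureCache: CVMetalTextureCache?
private var commandQueue: MTLCommandQueue?
private var computePipelineState: MTLComputePipelineState?
```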

The first two properties represent the pixel buffer obtained in the readBuffer(_:) method of the view controller and the timestamp of each video frame, respectively. Every time the view controller sets the pixel buffer property, the property executes setNeedsDisplay(), which forces a redraw of the MetalView.

The textureCache property is a cache supplied by the Core Video framework to speed up the conversion of a Core Video pixel buffer to a Metal texture (see later). The commandQueue and the computePipelineState are two important components of the Metal pipeline. Please refer to that post for deeper details, or check the Apple documentation. In the same post, you will also find a helper extension for the MTLTexture protocol. I need the same extension here. So, add the following code to the MetalView class:
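A sketch of such an extension (the exact helper from the earlier post may differ; this is a common formulation that computes threadgroup sizes for dispatching a compute kernel over the whole texture):

```swift
extension MTLTexture {
    // Threads per threadgroup: an 8×8 tile is a reasonable default.
    func threadGroupCount() -> MTLSize {
        return MTLSize(width: 8, height: 8, depth: 1)
    }

    // Number of threadgroups needed to cover the full texture,
    // rounding up so edge pixels are included.
    func threadGroups() -> MTLSize {
        let groupCount = threadGroupCount()
        return MTLSize(width: (width + groupCount.width - 1) / groupCount.width,
                       height: (height + groupCount.height - 1) / groupCount.height,
                       depth: 1)
    }
}
```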

Now, let's write the important portion of the source code. First, I need an init method to initialize the MetalView. Then, I need to implement the render(_:) method.

Since the MetalView was set up in the storyboard, I can only initialize the view using the init(coder:) method. So, let's add the following init method to the MetalView class:
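A possible implementation, assuming the kernel function is named colorKernel (as stated later in this post):

```swift
required init(coder: NSCoder) {
    super.init(coder: coder)

    // Create the default Metal device for this view.
    device = MTLCreateSystemDefaultDevice()
    guard let device = device else { fatalError("Metal is not supported on this device") }

    // Create the command queue used to submit work to the GPU.
    commandQueue = device.makeCommandQueue()

    // Build the compute pipeline from the colorKernel function.
    if let library = device.makeDefaultLibrary(),
       let kernel = library.makeFunction(name: "colorKernel") {
        computePipelineState = try? device.makeComputePipelineState(function: kernel)
    }

    // Create the Core Video → Metal texture cache.
    CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

    // Allow the compute shader to write into the drawable's texture.
    framebufferOnly = false

    // Drive drawing manually from the display link via setNeedsDisplay().
    isPaused = true
    enableSetNeedsDisplay = true
}
```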

I added comments for each line of code. Check the Apple documentation for more details. The last step is the implementation of the render(_:) method:
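A sketch of render(_:), assuming the properties and the MTLTexture helper extension shown above:

```swift
private func render(_ view: MTKView) {
    guard let pixelBuffer = pixelBuffer,
          let textureCache = textureCache else { return }

    // Convert the CVPixelBuffer into a Metal texture through the cache.
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                              pixelBuffer, nil, .bgra8Unorm,
                                              width, height, 0, &cvTexture)
    guard let cvTexture = cvTexture,
          let inputTexture = CVMetalTextureGetTexture(cvTexture) else { return }

    // Encode a compute pass that writes into the drawable's texture.
    guard let drawable = view.currentDrawable,
          let pipelineState = computePipelineState,
          let commandBuffer = commandQueue?.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return }

    encoder.setComputePipelineState(pipelineState)
    encoder.setTexture(inputTexture, index: 0)       // source video frame
    encoder.setTexture(drawable.texture, index: 1)   // destination drawable

    // Pass the frame timestamp to the shader.
    var time = Float(inputTime)
    encoder.setBytes(&time, length: MemoryLayout<Float>.size, index: 0)

    encoder.dispatchThreadgroups(drawable.texture.threadGroups(),
                                 threadsPerThreadgroup: drawable.texture.threadGroupCount())
    encoder.endEncoding()

    commandBuffer.present(drawable)
    commandBuffer.commit()
}
```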

Now, Metal and AV Foundation are completely set up. We are only missing the Metal shader. In the init(coder:) method, I referenced a Metal function named colorKernel. So, add a new file to the project and name it ColorKernel.metal. Initially, I'm going to implement a pass-through shader to simply visualize the video as it is. So, edit the ColorKernel file in the following way:
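A minimal pass-through kernel, with a texture/buffer layout matching the encoder setup assumed earlier (input at texture 0, output at texture 1, time at buffer 0):

```metal
#include <metal_stdlib>
using namespace metal;

kernel void colorKernel(texture2d<float, access::read> inTexture [[texture(0)]],
                        texture2d<float, access::write> outTexture [[texture(1)]],
                        constant float &time [[buffer(0)]],
                        uint2 gid [[thread_position_in_grid]])
{
    // Pass-through: copy each input pixel to the output unchanged.
    float4 color = inTexture.read(gid);
    outTexture.write(color, gid);
}
```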

Let's run the project. You should see the video playing on the screen of your device.
Please note that we're currently not using the time input.

In my test, I played the following chunk of a video:

Let's now add some special effect to the video. How? Well, either you know how to create the effect using some math, or you can look for it on the Internet. There are different websites where people publish their creations. One of them is ShaderToy. All the shaders you will find there are based on OpenGL and the GLSL programming language. However, it's fairly simple to port them to the Metal Shading Language. I took one of the visual effects from ShaderToy, modified it a little bit, and ported it to the Metal Shading Language. Here is my new kernel function, or compute shader:
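The original ShaderToy port isn't reproduced here; as an illustrative stand-in, the following kernel applies a time-animated color wave (a classic ShaderToy-style palette trick) to each frame, finally making use of the time input:

```metal
#include <metal_stdlib>
using namespace metal;

kernel void colorKernel(texture2d<float, access::read> inTexture [[texture(0)]],
                        texture2d<float, access::write> outTexture [[texture(1)]],
                        constant float &time [[buffer(0)]],
                        uint2 gid [[thread_position_in_grid]])
{
    // Normalized pixel coordinates in [0, 1].
    float2 uv = float2(gid) / float2(inTexture.get_width(), inTexture.get_height());

    float4 color = inTexture.read(gid);

    // Animated cosine palette modulating the original frame.
    float3 wave = 0.5 + 0.5 * cos(time + uv.xyx * 6.28318 + float3(0.0, 2.0, 4.0));
    outTexture.write(float4(color.rgb * wave, color.a), gid);
}
```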

Once done, you can run the application again on your iOS device. The following video shows the final result of applying the new shader to the previous video:

Really cool, right?


Working with shaders is really fun. You can create really impressive effects and animations within your app. You can also combine shaders with SceneKit and create special animations that the Core Animation framework cannot provide.

In a very similar way, you can use shaders to process the video frames and extract semantic information to track objects or people in the video, and so on. There is a lot of math involved in shader programming, but it helps you appreciate all those formulas and functions you learned in school. Next time, I'll show you how to leverage your math knowledge to build really beautiful visual effects.

Have fun, and see you at WWDC 2017.


Geppy Parziale (@geppyp) is cofounder of InvasiveCode. He has developed many iOS applications and taught iOS development to many engineers around the world since 2008. He worked at Apple as an iOS and OS X engineer on the Core Recognition team. He has developed a number of iOS and OS X apps and frameworks for Apple, and many of his development projects are top-grossing iOS apps featured in the App Store. Geppy is an expert in computer vision and machine learning.





