Natural Language Processing and Speech Recognition in iOS

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) and Computational Linguistics (CL) concerned with the interactions between computers and human natural languages. NLP is related to the area of Human-Computer Interaction (HCI) and the ability of a computer program to understand human speech as it is spoken.

Speech Recognition (SR) is a sub-field of computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers.

The development of NLP and SR applications is challenging, because computers traditionally require humans to ‘speak’ to them in a programming language that is precise, unambiguous and highly structured. Let’s look here at what iOS 10 currently offers for these technologies.

Linguistic Tagger

The NSLinguisticTagger is a class of the Foundation framework. Introduced with iOS 5, this class can be used to segment natural-language text and tag it with information, such as parts of speech. It can also tag languages, scripts, stem forms of words, etc.

Combined with the new Speech framework (available in iOS 10), the linguistic tagger can analyze text recognized from live and prerecorded speech. The Speech framework itself supplies the transcriptions, alternative interpretations, and confidence levels.

To use the linguistic tagger, you create an instance of NSLinguisticTagger using the init(tagSchemes:options:) method. This initializer requires an array of linguistic tag schemes and a set of options (for example, to omit white space, to omit punctuation, or to join names).

The API provides many linguistic tag schemes: NSLinguisticTagSchemeTokenType, NSLinguisticTagSchemeLexicalClass, NSLinguisticTagSchemeNameType, NSLinguisticTagSchemeNameTypeOrLexicalClass, NSLinguisticTagSchemeLemma, NSLinguisticTagSchemeLanguage, and NSLinguisticTagSchemeScript. Each of these tag schemes provides different information about the elements of a sentence.

Let’s make a small example. Suppose we want to analyze the following sentence:

Do you know about the legendary iOS training in San Francisco provided by InvasiveCode?

Here is the source code:
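A minimal sketch of this kind of tagging follows. It is written against the later Swift spelling of the API (`.nameTypeOrLexicalClass` instead of the `NSLinguisticTagSchemeNameTypeOrLexicalClass` constant), and the choice of scheme, the options, and the `taggedTokens` name are assumptions made for illustration.

```swift
import Foundation

// The sentence analyzed in the text.
let question = "Do you know about the legendary iOS training in San Francisco provided by InvasiveCode?"

// Skip whitespace and punctuation, and ask the tagger to join
// multi-word names into a single token.
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]

// The initializer takes the options as a raw integer.
let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass],
                                options: Int(options.rawValue))
tagger.string = question

// Collect (token, tag) pairs; `taggedTokens` is an illustrative name.
var taggedTokens: [(token: String, tag: String)] = []
let fullRange = NSRange(location: 0, length: question.utf16.count)

tagger.enumerateTags(in: fullRange,
                     scheme: .nameTypeOrLexicalClass,
                     options: options) { tag, tokenRange, _, _ in
    // Tokens without a tag are skipped.
    guard let tag = tag else { return }
    let token = (question as NSString).substring(with: tokenRange)
    taggedTokens.append((token: token, tag: tag.rawValue))
}

for entry in taggedTokens {
    print("\(entry.token): \(entry.tag)")
}
```

Each printed line pairs a token with the raw value of its tag. The exact tags depend on the linguistic data available on the device.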

This source code generates the following result:

As you can see, the linguistic tagger generated a tag for each element of the sentence.

Which languages do you speak?

You can use the linguistic tagger with other spoken languages too. Indeed, the linguistic tagger recognizes the language of each part of a sentence.

In the following example, I input an Italian sentence:
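A sketch of language identification could look like the following. The Italian sentence is a hypothetical stand-in chosen for illustration, and the code uses the later Swift spelling of the language scheme (`.language`).

```swift
import Foundation

// Hypothetical Italian input, chosen only for illustration.
let sentence = "Mi piace molto programmare applicazioni per iPhone e iPad."

// A tagger configured with the language scheme only.
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = sentence

// Ask for the language tag of the text at character index 0.
// The token and sentence ranges are not needed here, so nil is passed.
let language = tagger.tag(at: 0,
                          scheme: .language,
                          tokenRange: nil,
                          sentenceRange: nil)?.rawValue

print(language ?? "undetermined")
```

The tag is a language code string. For very short or mixed-language input the tagger may be less certain, so it is worth handling the nil case.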

If you execute the above source code, the value of the constant language is “it”, since the sentence is in Italian.

Let’s look now at what iOS offers for speech recognition.

Speech Framework

Introduced in iOS 10, the Speech framework performs speech recognition by communicating with Apple’s servers or using an on-device speech recognizer, if available.

The speech recognizer is not available for every language. To find out if the speech recognizer is available for a specific spoken language, you can request the list of the supported languages using the class method supportedLocales() defined in the SFSpeechRecognizer class.
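As a sketch, the set returned by `supportedLocales()` can be searched for a given locale identifier. The helper names `normalizedIdentifier` and `isSpeechRecognitionSupported` are assumptions made for illustration. Identifiers are normalized because locale identifiers may use either "-" or "_" as a separator.

```swift
import Foundation
import Speech

// Lowercases an identifier and unifies its separator so that
// "en_US" and "en-US" compare equal. Illustrative helper.
func normalizedIdentifier(_ identifier: String) -> String {
    return identifier.replacingOccurrences(of: "_", with: "-").lowercased()
}

// Returns true when the Speech framework lists the identifier
// among its supported locales. Illustrative helper.
func isSpeechRecognitionSupported(for identifier: String) -> Bool {
    let wanted = normalizedIdentifier(identifier)
    return SFSpeechRecognizer.supportedLocales().contains { locale in
        normalizedIdentifier(locale.identifier) == wanted
    }
}

// Print every supported locale identifier, sorted for readability.
for identifier in SFSpeechRecognizer.supportedLocales().map({ $0.identifier }).sorted() {
    print(identifier)
}

print("en-US supported:", isSpeechRecognitionSupported(for: "en-US"))
```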

Because your app may need to connect to the Apple servers to perform recognition, it is essential that you respect the privacy of your users and treat their utterances as sensitive data. Hence, you must get the user’s explicit permission before you initiate speech recognition. As with other iOS frameworks (for example, Core Location), you request user permission by adding the NSSpeechRecognitionUsageDescription key to the app Info.plist and providing a sentence explaining to the user why your application needs to access the speech recognizer. After that, you request user authorization in your application using the class method requestAuthorization(_:). When this method is executed, the application presents an alert to the user requesting authorization to access the speech recognizer. If the user grants access to the recognizer, then you can use it.
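The authorization flow might be sketched as follows, assuming Swift 5 (for `@unknown default`). The names `authorizationMessage` and `requestSpeechAuthorization` are illustrative helpers, not part of the Speech framework.

```swift
import Foundation
import Speech

// Maps an authorization status to a short message for the user.
// Illustrative helper, not a framework API.
func authorizationMessage(for status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized:
        return "Speech recognition authorized"
    case .denied:
        return "User denied access to speech recognition"
    case .restricted:
        return "Speech recognition restricted on this device"
    case .notDetermined:
        return "Speech recognition not yet authorized"
    @unknown default:
        return "Unknown authorization status"
    }
}

// Asks the user for permission and reports the outcome on the main queue.
// The Info.plist must contain NSSpeechRecognitionUsageDescription
// before this function is called.
func requestSpeechAuthorization(completion: @escaping (Bool, String) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        // The handler is not guaranteed to run on the main queue,
        // so hop there before the caller touches any UI.
        DispatchQueue.main.async {
            completion(status == .authorized, authorizationMessage(for: status))
        }
    }
}
```

A view controller could call `requestSpeechAuthorization` from `viewDidLoad()` and use the Boolean to enable or disable its record button.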

Once the user grants your application permission to use the recognizer, you can create an instance of the SFSpeechRecognizer class and a speech recognition request (an instance of SFSpeechRecognitionRequest). You can create two types of requests: SFSpeechURLRecognitionRequest and SFSpeechAudioBufferRecognitionRequest. The first type performs the recognition of a prerecorded on-disk audio file. The second request type performs live audio recognition (using the iPhone or iPad microphone).

Before starting a speech recognition request, you should check if the speech recognizer is available using the isAvailable property of the SFSpeechRecognizer class. If the recognizer is available, you then pass the speech recognition request to the SFSpeechRecognizer instance using either the recognitionTask(with:delegate:) method or the recognitionTask(with:resultHandler:) method. Both methods return an SFSpeechRecognitionTask and start the speech recognition.

During the speech recognition you can use the speech recognition task to check the status of the recognition. The possible states are: starting, running, finishing, canceling, and completed. You can also cancel and finish the speech recognition tasks using the cancel() and the finish() methods. If you do not call finish() on the task, the task will go on. So, be sure to call this method when the audio source is exhausted.
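For a prerecorded file, the steps above can be sketched as one function, assuming Swift 5 (for `Result`). The names `transcribeFile` and `TranscriptionError` are hypothetical, and the caller supplies the file URL.

```swift
import Foundation
import Speech

// Illustrative error type for this sketch.
enum TranscriptionError: Error {
    case recognizerUnavailable
}

// Transcribes a prerecorded audio file and reports the final text.
// Returns the running task so the caller can cancel it, or nil when
// no recognizer is available for the requested locale.
@discardableResult
func transcribeFile(at fileURL: URL,
                    localeIdentifier: String = "en-US",
                    completion: @escaping (Result<String, Error>) -> Void) -> SFSpeechRecognitionTask? {
    // The initializer returns nil for unsupported locales, and
    // isAvailable reports whether recognition can start right now.
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)),
          recognizer.isAvailable else {
        completion(.failure(TranscriptionError.recognizerUnavailable))
        return nil
    }

    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    // Only the final transcription is of interest for a file.
    request.shouldReportPartialResults = false

    return recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
        } else if let result = result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}
```

The block-based variant is used here because a single final result is enough. The delegate-based variant suits cases where each stage of the recognition needs its own handling.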
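The states and the two ways of ending a task can be sketched as follows, assuming Swift 5. Both `describe` and `stop` are illustrative helpers.

```swift
import Speech

// Human-readable name for a task state. Illustrative helper.
func describe(_ state: SFSpeechRecognitionTaskState) -> String {
    switch state {
    case .starting:
        return "starting"
    case .running:
        return "running"
    case .finishing:
        return "finishing"
    case .canceling:
        return "canceling"
    case .completed:
        return "completed"
    @unknown default:
        return "unknown"
    }
}

// Ends a task that is still active. finish() stops accepting audio
// but lets pending results arrive; cancel() discards them.
func stop(_ task: SFSpeechRecognitionTask, keepingResults: Bool) {
    guard task.state == .starting || task.state == .running else { return }
    if keepingResults {
        task.finish()
    } else {
        task.cancel()
    }
}
```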

During the speech recognition you can use the SFSpeechRecognitionTaskDelegate protocol for fine-grained control of the speech recognition task. The protocol provides the following methods:
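A sketch of a delegate that implements each of the protocol's optional callbacks is shown here. The class name `LoggingTaskDelegate` and its `events` array are assumptions made for illustration.

```swift
import Foundation
import Speech

// Records every delegate callback as a line of text. Illustrative class.
final class LoggingTaskDelegate: NSObject, SFSpeechRecognitionTaskDelegate {
    private(set) var events: [String] = []

    // Speech was first detected in the audio.
    func speechRecognitionDidDetectSpeech(_ task: SFSpeechRecognitionTask) {
        events.append("detected speech")
    }

    // A tentative (partial) transcription is available.
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didHypothesizeTranscription transcription: SFTranscription) {
        events.append("hypothesis: \(transcription.formattedString)")
    }

    // The final result, including alternative transcriptions, is available.
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
        events.append("final: \(recognitionResult.bestTranscription.formattedString)")
    }

    // The task no longer accepts audio but may still deliver results.
    func speechRecognitionTaskFinishedReadingAudio(_ task: SFSpeechRecognitionTask) {
        events.append("finished reading audio")
    }

    // The task was cancelled.
    func speechRecognitionTaskWasCancelled(_ task: SFSpeechRecognitionTask) {
        events.append("cancelled")
    }

    // Recognition of all requested audio ended, successfully or not.
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask,
                               didFinishSuccessfully successfully: Bool) {
        events.append(successfully ? "succeeded" : "failed")
    }
}
```

An instance would be passed to recognitionTask(with:delegate:). Keeping your own reference to the delegate for the lifetime of the task is a safe default.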

Since speech recognition is a network-based service, some limits are enforced by Apple. In this way, the service can remain freely available to all apps. Individual devices may be limited in the number of recognitions that can be performed per day, and an individual app may be throttled globally, based on the number of requests it makes per day. For these reasons, your application must be prepared to handle the failures caused by reaching the speech recognition limits.

Integrate NLP and Speech Recognition

Let’s start integrating the speech recognition framework in a new app named YourSpeech. Create a new single-view application. Open the Info.plist file and add the “Privacy - Speech Recognition Usage Description” or the NSSpeechRecognitionUsageDescription key. Then, provide a sentence explaining to the users how they can use speech recognition in your app. Since we are going to use the microphone, we also need to ask permission to access the microphone. So, also add “Privacy - Microphone Usage Description” or the key NSMicrophoneUsageDescription and provide a sentence explaining why your app wants to access the microphone.

Open the ViewController.swift file and add import Speech to import the Speech module. In the same view controller, let’s add the following property:

Here, I instantiate the SFSpeechRecognizer for American English. Then, I set the view controller as the delegate of the speech recognizer. If the speech recognizer cannot be initialized, then nil is returned. Also add the SFSpeechRecognizerDelegate protocol next to the class name:

Let’s also add the following outlet for a button that we will use to start the voice recording:

In the viewDidLoad() method, you can add the following lines of code to print the spoken languages supported by the speech recognizer:

Then, let’s check the user authorization status. So, add this source code to the viewDidLoad() method:

If the user grants permission, you don't have to request it again. After the user grants your app permission to perform speech recognition, create a speech recognition request.

Let’s add the following property to the view controller:

The audio engine will manage the recording and the microphone.

The startRecordingButton will execute the following action:

In this action method, I check if the audio engine is running. If it is running, I stop it and tell the recognition request that the audio ended. Then, I disable the startRecordingButton and set its title to “Stopping”. If the audio engine is not running, I call the method startRecording() (see below) and set the title of the startRecordingButton to “Stop recording”.

The recognitionRequest is a property of the view controller:

As explained before, this is one of the types of recognition requests the Speech framework can perform. Before defining the startRecording() method, let’s add a new property to the view controller:

This property defines the speech recognition task. The startRecording() method does most of the job:

You will need to add the following properties to the view controller:

Additionally, add a text view to the view controller in the storyboard, then add the following outlet and connect it to the text view:
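Putting the pieces of this section together, the view controller could be sketched as below. This is a reconstruction under assumptions, written in later Swift syntax (Swift 5): the action name `startRecordingTapped`, the `RecordingError` type, the tap buffer size, the audio session settings, and the “Start recording” title are illustrative choices.

```swift
import UIKit
import Speech
import AVFoundation

// Illustrative error type for this sketch.
enum RecordingError: Error { case recognizerUnavailable }

class ViewController: UIViewController, SFSpeechRecognizerDelegate {
    @IBOutlet var textView: UITextView!
    @IBOutlet var startRecordingButton: UIButton!

    // nil if a recognizer cannot be created for this locale.
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    override func viewDidLoad() {
        super.viewDidLoad()
        speechRecognizer?.delegate = self
        startRecordingButton.isEnabled = false
        SFSpeechRecognizer.requestAuthorization { status in
            // Update the UI on the main queue.
            DispatchQueue.main.async {
                self.startRecordingButton.isEnabled = (status == .authorized)
            }
        }
    }

    @IBAction func startRecordingTapped(_ sender: UIButton) {
        if audioEngine.isRunning {
            // Stop capturing and tell the request that the audio ended.
            audioEngine.stop()
            recognitionRequest?.endAudio()
            startRecordingButton.isEnabled = false
            startRecordingButton.setTitle("Stopping", for: .disabled)
        } else {
            do {
                try startRecording()
                startRecordingButton.setTitle("Stop recording", for: .normal)
            } catch {
                textView.text = "Recording could not start: \(error.localizedDescription)"
            }
        }
    }

    private func startRecording() throws {
        guard let recognizer = speechRecognizer, recognizer.isAvailable else {
            throw RecordingError.recognizerUnavailable
        }
        // Cancel any task left over from a previous recording.
        recognitionTask?.cancel()
        recognitionTask = nil

        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.record, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        recognitionRequest = request

        let inputNode = audioEngine.inputNode
        recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
            guard let self = self else { return }
            if let result = result {
                self.textView.text = result.bestTranscription.formattedString
            }
            // Tear everything down on error or when the final result arrives.
            if error != nil || result?.isFinal == true {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionRequest = nil
                self.recognitionTask = nil
                self.startRecordingButton.isEnabled = true
                self.startRecordingButton.setTitle("Start recording", for: .normal)
            }
        }

        // Feed microphone buffers into the recognition request.
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
    }

    // SFSpeechRecognizerDelegate: follow changes in recognizer availability.
    func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                          availabilityDidChange available: Bool) {
        startRecordingButton.isEnabled = available
    }
}
```

The transcription shown in the text view can then be handed to an NSLinguisticTagger to tag the recognized words.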


I showed you how to use the linguistic tagger to analyze text and how to perform speech recognition with the new Speech framework. You can combine these two functionalities with other iOS frameworks to obtain incredible results.

Happy coding!


Eva Diaz-Santana (@evdiasan) is cofounder of InvasiveCode. She has been developing iOS applications and teaching iOS development since 2008. Eva also worked at Apple as Cocoa Architect and UX designer. She is an expert in remote sensing and 3D reconstruction.


