Natural Language Processing

Natural Language Processing (NLP) is a field of artificial intelligence that gets computers to process and analyze large amounts of natural language data. I know, it’s a dry definition. Let’s try it this way: you write or speak to a computer, and it recognizes the language you’re using, and a lot more 🙂

Does Apple include NLP in iOS? Of course.

We will use a class called NSLinguisticTagger.

Apple says that NSLinguisticTagger provides a uniform interface to a variety of natural language processing functionality with support for many different languages and scripts. You can use this class to segment natural language text into paragraphs, sentences, or words, and tag information about those segments, such as part of speech, lexical class, lemma, script, and language.
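
As a rough illustration of the segmentation part of that description, here is a minimal sketch that splits text into sentences by repeatedly calling tokenRange(at:unit:) with the .sentence unit. It assumes iOS 11+ or macOS 10.13+ with Apple’s Foundation (for example, an Xcode playground); splitIntoSentences(_:) and sampleText are hypothetical names chosen for this example.

```swift
import Foundation

// Split text into sentences with NSLinguisticTagger (iOS 11+ / macOS 10.13+).
// Starting at the first character, ask for the sentence token that covers the
// current position, keep it, then continue from the end of that sentence.
func splitIntoSentences(_ text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let nsText = text as NSString
    var sentences: [String] = []
    var location = 0
    while location < nsText.length {
        let range = tagger.tokenRange(at: location, unit: .sentence)
        let end = NSMaxRange(range)
        // Guard against a missing, empty or backward range so the loop always advances.
        guard range.location != NSNotFound, end > location, end <= nsText.length else {
            location += 1
            continue
        }
        // Only take text from the current position onward, so nothing is repeated.
        let piece = NSRange(location: location, length: end - location)
        let sentence = nsText.substring(with: piece)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        if !sentence.isEmpty {
            sentences.append(sentence)
        }
        location = end
    }
    return sentences
}

let sampleText = "Voila! In view, a humble vaudevillian veteran. You may call me V."
for sentence in splitIntoSentences(sampleText) {
    print(sentence)
}
```

The loop works on NSString offsets because NSLinguisticTagger reports NSRange values measured in UTF-16 units.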

For testing purposes, I will use the famous speech from the movie V for Vendetta:

import Foundation

// A multi-line string literal (""") allows line breaks and unescaped quotes such as "vox populi".
let speech = """
But on this most auspicious of nights, permit me then, in lieu of the more commonplace soubriquet, to suggest the character of this dramatis persona. Voila! In view, a humble vaudevillian veteran, cast vicariously as both victim and villain by the vicissitudes of fate. This visage, no mere veneer of vanity, is a vestige of the "vox populi" now vacant, vanished. However, this valorous visitation of a bygone vexation stands vivified, and has vowed to vanquish these venal and virulent vermin, vanguarding vice and vouchsafing the violently vicious and voracious violation of volition.
The only verdict is vengeance; a vendetta, held as a votive not in vain, for the value and veracity of such shall one day vindicate the vigilant and the virtuous.
Verily this vichyssoise of verbiage veers most verbose, so let me simply add that it’s my very good honour to meet you and you may call me V.
"""

Now, let’s create our tagger:

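// One tagger configured with every tag scheme used in this post.
let tagger = NSLinguisticTagger(tagSchemes: [.tokenType, .language, .lexicalClass, .nameType, .lemma], options: 0)
// Skip punctuation and whitespace tokens, and report multi-word names as a single token.
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]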
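
To see what these options actually change, here is a small sketch that tags a short line with the .tokenType scheme twice: once with no options and once with punctuation and whitespace omitted (.joinNames only matters for names, so it is left out). It assumes iOS 11+ or macOS 10.13+ with Apple’s Foundation; tokenTypes(in:options:) and line are hypothetical names.

```swift
import Foundation

// Return the .tokenType tag (Word, Punctuation, Whitespace, ...) of every token,
// so the effect of NSLinguisticTagger.Options is easy to see.
func tokenTypes(in text: String, options: NSLinguisticTagger.Options) -> [(token: String, type: NSLinguisticTag)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    var result: [(token: String, type: NSLinguisticTag)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        result.append((token: (text as NSString).substring(with: tokenRange), type: tag))
    }
    return result
}

let line = "Voila! You may call me V."

// No options: punctuation and whitespace tokens are reported as well.
for item in tokenTypes(in: line, options: []) {
    print("\"\(item.token)\": \(item.type.rawValue)")
}

// With .omitPunctuation and .omitWhitespace only the words remain.
for item in tokenTypes(in: line, options: [.omitPunctuation, .omitWhitespace]) {
    print("\"\(item.token)\": \(item.type.rawValue)")
}
```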

Let’s go over these tag schemes:

  • .tokenType: Classifies tokens according to their broad type: word, punctuation, or whitespace.
  • .language: Supplies the language for a token, if one can be determined.
  • .lexicalClass: Classifies tokens according to class: part of speech, type of punctuation, or whitespace.
  • .nameType: Classifies tokens according to whether they are part of a named entity (personal name, organization name or place name).
  • .lemma: Supplies a stem form of a word token, if known. For example, the stem of the English word “reading” is “read”.
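
The tagger above registers .lemma, but the examples in this post never query it. Here is a minimal sketch of how that could look, assuming iOS 11+ or macOS 10.13+ with Apple’s Foundation; lemmas(for:) and readingSentence are hypothetical names. Tokens for which the tagger reports no lemma are simply skipped.

```swift
import Foundation

// Pair each word token with its lemma (stem form) using the .lemma scheme.
// Tokens for which the tagger reports no lemma (nil tag) are skipped.
func lemmas(for text: String) -> [(word: String, lemma: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
    var result: [(word: String, lemma: String)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, tokenRange, _ in
        // For this scheme the tag's raw value is the lemma itself.
        guard let lemma = tag?.rawValue else { return }
        let word = (text as NSString).substring(with: tokenRange)
        result.append((word: word, lemma: lemma))
    }
    return result
}

let readingSentence = "She was reading the books that her friends had written."
for pair in lemmas(for: readingSentence) {
    print("\(pair.word) -> \(pair.lemma)")
}
```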

Let’s see if we can find out the language of the speech:

func findLanguage(for text: String) {
    tagger.string = text
    // dominantLanguage is optional (String?), so unwrap it before printing.
    print("Language is: \(tagger.dominantLanguage ?? "unknown")")
}
findLanguage(for: speech)

Result:

Language is: en
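
If you only need the language, NSLinguisticTagger also has a class method, dominantLanguage(for:) (iOS 11+ / macOS 10.13+), that works without configuring a tagger instance. A minimal sketch, assuming Apple’s Foundation; samples is a hypothetical name, and the French line is made up for illustration.

```swift
import Foundation

// Guess the dominant language of each string with the class method
// NSLinguisticTagger.dominantLanguage(for:), without creating a tagger instance.
let samples = [
    "But on this most auspicious of nights, permit me then, in lieu of the more commonplace soubriquet, to suggest the character of this dramatis persona.",
    "Je suis très heureux de vous rencontrer, et vous pouvez m’appeler V."
]

for text in samples {
    // The result is an optional language code such as "en".
    let language = NSLinguisticTagger.dominantLanguage(for: text) ?? "unknown"
    print("\(language): \(text)")
}
```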

Tokens

func tokenizeText(for text: String) {
    tagger.string = text
    // NSLinguisticTagger works with NSRange, so measure the string in UTF-16 units.
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange)
        print(word)
    }
}
tokenizeText(for: speech)

Here is the beginning of the result:

But
on
this
most
auspicious
of
nights
permit
me
then
in
lieu
of
the
more
commonplace
soubriquet
to
suggest
the
character
of
this
dramatis
persona
Voila
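
Printing tokens is fine for a demo; to do more with them, you can collect them instead. The sketch below gathers the word tokens of one line from the speech and counts how often each lowercased word occurs. It assumes iOS 11+ or macOS 10.13+ with Apple’s Foundation; wordFrequencies(in:), verdictLine and counts are hypothetical names.

```swift
import Foundation

// Collect word tokens (rather than printing them) and count how often
// each lowercased word occurs.
func wordFrequencies(in text: String) -> [String: Int] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    var counts: [String: Int] = [:]
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType,
                         options: [.omitPunctuation, .omitWhitespace]) { _, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange).lowercased()
        counts[word, default: 0] += 1
    }
    return counts
}

let verdictLine = "The only verdict is vengeance; a vendetta, held as a votive not in vain"
let counts = wordFrequencies(in: verdictLine)

// Print the most frequent words first (ties come out in no particular order).
for (word, count) in counts.sorted(by: { $0.value > $1.value }).prefix(3) {
    print("\(word): \(count)")
}
```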

Identifying parts of speech

Let’s take every word in the speech and identify its part of speech.

func identifyParts(for text: String) {
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        if let tag = tag {
            let word = (text as NSString).substring(with: tokenRange)
            print("\(word): \(tag.rawValue)")
        }
    }
}
identifyParts(for: speech)

Results:

But: Conjunction
on: Preposition
this: Determiner
most: Adverb
auspicious: Adjective
of: Preposition
nights: Noun
permit: Verb
me: Pronoun
then: Adverb
in: Preposition
lieu: Noun
of: Preposition
the: Determiner
more: Adverb
commonplace: Noun
soubriquet: Noun
to: Particle
suggest: Verb
the: Determiner
character: Noun
of: Preposition
this: Determiner
dramatis: Noun
persona: Noun
Voila: Noun
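
Instead of printing one word per line, you might want all the nouns or all the verbs at once. Here is a small sketch that groups words by their .lexicalClass tag, assuming iOS 11+ or macOS 10.13+ with Apple’s Foundation; wordsByLexicalClass(in:), openingLine and groups are hypothetical names.

```swift
import Foundation

// Group the words of a text by lexical class (Noun, Verb, Adjective, ...).
func wordsByLexicalClass(in text: String) -> [NSLinguisticTag: [String]] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    var groups: [NSLinguisticTag: [String]] = [:]
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, tokenRange, _ in
        guard let tag = tag else { return }
        let word = (text as NSString).substring(with: tokenRange)
        groups[tag, default: []].append(word)
    }
    return groups
}

let openingLine = "But on this most auspicious of nights, permit me then"
let groups = wordsByLexicalClass(in: openingLine)
print("Nouns: \(groups[.noun] ?? [])")
print("Verbs: \(groups[.verb] ?? [])")
```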

Named Entity

Finally, let’s try to recognize named entities:

func recognizeNamedEntity(for text: String) {
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, stop in
        if let tag = tag, tags.contains(tag) {
            let name = (text as NSString).substring(with: tokenRange)
            print("\(name): \(tag.rawValue)")
        }
    }
}
recognizeNamedEntity(for: speech)

As for the results: well, I won’t give them to you, because you should fire up a playground and see them for yourself 😛
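
If you would rather collect the entities than print them, here is a variation on recognizeNamedEntity(for:) that returns (name, type) pairs. It is a sketch assuming iOS 11+ or macOS 10.13+ with Apple’s Foundation; namedEntities(in:) and newsLine are hypothetical, and what the tagger finds in the sample line is, again, for you to check in the playground. Per Apple’s description of .joinNames, a recognized multi-word name should come back as a single token.

```swift
import Foundation

// Return (name, type) pairs for personal, place and organization names.
func namedEntities(in text: String) -> [(name: String, type: NSLinguisticTag)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    // .joinNames reports a recognized multi-word name as one token.
    let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    let nameTags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
    var entities: [(name: String, type: NSLinguisticTag)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
        guard let tag = tag, nameTags.contains(tag) else { return }
        entities.append((name: (text as NSString).substring(with: tokenRange), type: tag))
    }
    return entities
}

// Hypothetical sample sentence (not from the speech).
let newsLine = "Evey Hammond works for the British Television Network in London."
for entity in namedEntities(in: newsLine) {
    print("\(entity.name): \(entity.type.rawValue)")
}
```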

It’s easy, right?

Have fun! Happy coding! Happy New Year!
Published in iOS Tutorials
