Exploring Literature with the Stanza NLP Package
Using Natural Language Processing to Analyze Text
The Stanford NLP Group has long been an active player in natural language processing, particularly through their well-known CoreNLP Java toolkit. Until recently, though, Stanford NLP was less well known in the Python community, which is a shame since many NLP practitioners work primarily in Python. But there’s good news! Stanford NLP’s Stanza Python library is coming into its own with the recent release of version 1.1.1!
The new Stanza version supports 66 different human languages (which is a big step forward, since NLP has long been very English-centric) and can carry out core NLP tasks like lemmatization and named entity recognition. Stanza is also customizable, which means that users can build their own pipelines and train their own models.
So, for all you Pythonistas out there, let’s take a look at Stanza and what it can do. We’ll start with a brief overview of core Stanza functionality and then we’ll use it to explore the characters in the classic novel, Moby Dick.
Pipeline
The Stanza Pipeline can be configured with a variety of options to select the language model, processors, etc. The language model must be downloaded before it can be used in a pipeline.
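As a rough illustration of the download-then-configure order described above, here is a minimal sketch. The helper names `pipeline_kwargs` and `build_pipeline` are made up for this example and are not part of Stanza; the processor list is just one plausible choice, and the sketch assumes Stanza is installed and the machine can reach the model server.

```python
def pipeline_kwargs(lang="en", processors=("tokenize", "pos", "lemma", "ner")):
    """Build keyword arguments for stanza.Pipeline.

    Hypothetical helper: Stanza accepts processors as a comma-separated
    string, so the tuple is joined here.
    """
    return {"lang": lang, "processors": ",".join(processors)}


def build_pipeline(**kwargs):
    """Download the language model, then construct a pipeline that uses it."""
    # Imported lazily so pipeline_kwargs() can be used without Stanza installed.
    import stanza

    config = pipeline_kwargs(**kwargs)
    # The model has to be on disk before the pipeline can load it.
    stanza.download(config["lang"], processors=config["processors"])
    return stanza.Pipeline(**config)
```

With this in place, `nlp = build_pipeline()` would give an English pipeline, and `doc = nlp("Call me Ishmael.")` would return an annotated document whose named entities can be read from `doc.ents` (each with `.text` and `.type`). Passing a different `lang` code or a shorter processor tuple is how the same sketch would select another language model or a lighter pipeline.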