There’s a strong buzz around large language models (LLMs) like GPT-3 for building new NLP products. But there’s a big question: what kinds of products actually make sense?
LLMs aren’t good enough to use straight out of the box, and may never be. But they can be powerful for users willing to iterate on their outputs. In other words, they’re pretty great for writers.
There’s one kind of writing that matters above the rest right now: writing software. Like software, great writing offers great leverage when plugged into the right channels, but average writing ability is quite common while average software skills are not.
If software skills are hard to come by, natural language processing (NLP) skills are exceptionally rare. Contrast that with the enormous unrealized potential of NLP in our world. Consider how every aspect of our pre-metaverse Internet involves text, or how the lifeblood of every human organization on Earth flows in text form, whether that’s emails, Slack messages, or documents. And where there is text, there is knowledge.
If the usefulness of knowledge feels abstract, think about that one email you can’t seem to find, or how often you can Google something and find exactly what you need. If the computer is a bicycle for the mind, search is sticking a twin-turbo engine on it. Search is important, and good search requires good knowledge.
The reality of today is that knowledge locked away in text is largely untapped because NLP is some of the most challenging software to write. We’ll dig into why that is and how LLMs could change that.
Why NLP Software is Hard
The value of text data is gated by our ability to summarize it. Whether that’s extracting information (facts, events, people), surfacing the most insightful parts of a document, or answering questions over a collection of documents, it all boils down to summarization of some kind.
But NLP software is hard to write precisely because language is so rich and expressive. Writing down rules in code is very brittle, and while machine learning is more adaptive, it’s also very expensive and time-consuming. LLMs offer a potentially easier way to interface with text when building NLP apps.
Let’s consider this example sentence from a medical note written by a doctor:
ROS: no nausea, rash, arthralagias, fever / chills, urinary symptoms
We want to extract the presence or explicit absence of the patient’s symptoms.
Detecting lists of symptoms can be tricky—perhaps finding comma-separated entities that match a pre-existing list of symptoms could work. But that’s very brittle (what about other symbols for separating lists, like semicolons?), and while a named-entity recognition model could do a better job, it takes real work to integrate your list, build label sets or rules for it, train it, and analyze its performance. Extracting entities isn’t easy, and it takes significant work.
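To see just how brittle, here’s a minimal sketch of the comma-splitting approach; the symptom lexicon is an assumption for illustration:

```python
# A deliberately brittle sketch of the rule described above;
# SYMPTOM_LEXICON is an illustrative assumption, not a real resource.
SYMPTOM_LEXICON = {"nausea", "rash", "arthralgias", "fever", "chills"}

text = "ROS: no nausea, rash, arthralagias, fever / chills, urinary symptoms"
body = text.split(":", 1)[1]

matches = [
    chunk.strip()
    for chunk in body.split(",")  # breaks on semicolons, "and", etc.
    if chunk.strip() in SYMPTOM_LEXICON
]

print(matches)
# ['rash']: "no nausea", the misspelling, and "fever / chills"
# all slip through.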
Then comes the pesky “no”. Is it negating just “nausea” or the whole list? Ideally a modifier like “no” would appear before each entity to resolve the ambiguity, but that’s not the case here, so a simple “does ‘no’ precede the entity” rule won’t work. Perhaps the linear structure of the sentence is the problem, and a richer structure based on grammar can help.
Maybe we can traverse this dependency tree to write a rule:

[Image: dependency parse of the example sentence]
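As a minimal sketch of that traversal idea with spaCy (the en_core_web_sm pipeline and the propagation rule are assumptions, not a production recipe):

```python
# Sketch: attach a leading "no" to every conjunct in a coordinated list.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("no nausea, rash, arthralagias, fever / chills, urinary symptoms")

negated = set()
for token in doc:
    if token.lower_ == "no":
        # "no" typically attaches to the first item; propagate it
        # to every conjunct in the coordinated list.
        head = token.head
        negated.add(head.text)
        negated.update(c.text for c in head.conjuncts)

print(negated)  # ideally {"nausea", "rash", "arthralagias", ...}
```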
The first step is to understand all of those labels. And that’s not even the hard part: knowing which rules can be effective comes through challenging, hard-won experience. For instance, observe how much the parse tree changes just by adding “or” before “urinary symptoms” at the end:

[Image: dependency parse after adding “or” before “urinary symptoms”]
At this point we’re pretty far beyond what most engineers or data scientists know about NLP.
Maybe we can just try ML. What does it take? Hire some medical annotators to write down a structured list of symptoms for each note. Good medical annotators are very expensive and hard to find, but it’s doable. Then train an off-the-shelf model to do the same. That’s not easy either and requires extensive infrastructure, but the knowledge is out there.
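To make “a structured list of symptoms for each note” concrete, here is one hypothetical shape the labels could take (the schema is an assumption, not a standard):

```python
# A hypothetical annotation target for this task; every field name
# here is an illustrative assumption.
annotation = {
    "note_id": "note-001",
    "symptoms": [
        {"name": "nausea", "present": False},
        {"name": "rash", "present": False},
        {"name": "urinary symptoms", "present": False},
    ],
}
```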
But then your only control mechanism when the model doesn’t work well enough is to collect more data (tuning the model is generally unproductive). It’s hard to know which data to label, it’s expensive to label it, testing is labor-intensive and boring, and iteration is very slow.
Seems like we’re stuck between a rock (classical NLP) and a hard place (ML).
LLMs Offer New Capabilities
If the example above were a little more structured, we could imagine building simpler rules to extract information.
For example, if the note were in this form:
Review of systems:
* No nausea
* No rash
* No arthralagias (joint pains)
* No fever or chills
* No urinary symptoms
It would be a lot easier to work with. Just split on the newlines, remove asterisks, match entities against a list, and detect negations.
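As a minimal sketch (the parsing conventions here are assumptions based on the output above):

```python
# Sketch: parse the LLM-normalized note with simple string rules.
text = """Review of systems:
* No nausea
* No rash
* No arthralagias (joint pains)
* No fever or chills
* No urinary symptoms"""

findings = {}
for line in text.splitlines():
    line = line.lstrip("* ").strip()
    negated = line.lower().startswith("no ")
    symptom = line[3:] if negated else line
    if symptom and ":" not in symptom:  # skip the header line
        findings[symptom] = not negated

print(findings)
# {'nausea': False, 'rash': False, 'arthralagias (joint pains)': False, ...}
```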
Well, the above was actually generated using OpenAI’s text-davinci-edit-001 model with the following prompt:
Expand abbreviations
No slashes
Clarify negations
Turn into bullet list
(You can try this yourself here.)
There’s a fair amount of variability with that prompt, but trying a simpler one, like ROS -> Review of Systems, leads to very consistent results.
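For reference, here’s roughly what that call looks like, a minimal sketch assuming the legacy (pre-1.0) OpenAI Python library’s edits endpoint:

```python
# Sketch of the edit call behind the example above.
import openai

openai.api_key = "sk-..."  # your API key

response = openai.Edit.create(
    model="text-davinci-edit-001",
    input="ROS: no nausea, rash, arthralagias, fever / chills, urinary symptoms",
    instruction="Expand abbreviations\nNo slashes\nClarify negations\nTurn into bullet list",
)
print(response["choices"][0]["text"])
```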
Let’s run with that idea. Imagine creating a pipeline of transformations using an LLM with natural language prompts (like we just did) as a way to bootstrap an NLP app. It may contain errors, but that’s not new. Whether it’s LLMs, ML systems, or classical NLP, these transformations will require systematic and continuous testing, evaluation, error analysis, and monitoring for the entire life of the app.
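A hypothetical version of such a pipeline, reusing the same edits endpoint (the prompt sequence and the run_pipeline helper are illustrative assumptions):

```python
# Sketch: chain prompt-driven transformations into one pipeline.
import openai

PROMPTS = [
    "Expand abbreviations",
    "Clarify negations",
    "Turn into bullet list",
]

def run_pipeline(note: str) -> str:
    # Apply each transformation in sequence, feeding the output of
    # one step in as the input of the next.
    for instruction in PROMPTS:
        note = openai.Edit.create(
            model="text-davinci-edit-001",
            input=note,
            instruction=instruction,
        )["choices"][0]["text"]
    return note
```

Each stage stays independently testable, which matters given the evaluation and monitoring burden described above.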
(Sidebar: data infrastructure for NLP is severely lacking too, and that’s not a problem LLMs can solve. But I’ll address that another time.)
Notice how in the LLM case you don’t need deep software or scientific skills to build that NLP pipeline. Working with LLMs is akin to learning to use Google search effectively.
A New Day for NLP
As an NLP practitioner, I often look at other data-intensive fields for inspiration. Fraud detection and product analytics, for example, are mature enough to have dedicated technical practitioners: fraud analysts, data analysts, and analytics engineers are new roles introduced to serve the growing demand. These roles straddle the boundaries of product management, data science, software engineering, and domain expertise (such as consumer financial transactions), and each replaces what would otherwise be a larger interdisciplinary team with a single person.
Can we accomplish the same in NLP?
Today the sheer complexity of building NLP apps means a few expensive successes and a great many failures. These projects often require large teams of labelers, product managers, domain experts, NLP and ML scientists, data and ML engineers, and even the odd infrastructure or full-stack engineer.
In other words, NLP projects are some of the most expensive in all of software. Pulling the team together alone is a $2m+/year expenditure.
If developing NLP apps in a new way with LLMs can be made to work, we have the potential to leapfrog much of that cost and complexity. It won’t be perfect in the beginning, but neither is the old way.
And if it does work, we may see a new technical role emerge: the NLP analyst, a role that won’t require a PhD in linguistics or expert ML engineering skills. The macroeconomic consequences of that would be extraordinary.
(If you find this future compelling, let’s chat.)