What Are the Limitations of Large Language Models (LLMs)?

As you explore the exciting world of AI and begin interacting with Large Language Models (LLMs) like ChatGPT and Claude, it’s essential to understand both their remarkable capabilities and their inherent limitations. While these AI assistants can engage in impressively human-like conversations, generate creative writing, answer questions, and help with all sorts of tasks, they are not all-knowing oracles or infallible robots. In this article, we’ll walk you through the key limitations of LLMs so you can use them confidently and responsibly in your day-to-day work.

Note: this is the second in a series of guides on Prompt Engineering. In future articles, we will cover Prompting Techniques, How to Evaluate and Optimize Prompts, Walkthrough of Different LLM Settings, and How to Prepare Data for LLMs.

Computational constraints – LLMs can’t process everything at once

When you’re talking to LLMs, it’s important to understand that despite their impressive language abilities, they do have computational limits on how much text they can process at once. Specifically, most LLMs have a maximum number of “tokens” they can handle in a single input or output.

Tokens are how LLMs measure text – they’re kind of like words, but technically they’re subword units that roughly correspond to word fragments or characters. On average, 1 token is about 4 characters or 0.75 words.
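If you want to see this mapping for yourself, you can count tokens locally. Here is a minimal sketch using OpenAI’s open-source tiktoken library; note that other providers use their own tokenizers, so treat these counts as approximate for models like Claude or Gemini.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-3.5 Turbo and GPT-4
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are how LLMs measure text."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
print(f"~{len(text) / len(tokens):.1f} characters per token")
```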

Here are the typical max token limits for some popular LLMs:

Developer  | Model            | Context length
OpenAI     | GPT-3.5 Turbo    | 16k tokens
OpenAI     | GPT-4 Turbo      | 128k tokens
Anthropic  | Claude 3 Haiku   | 200k tokens
Anthropic  | Claude 3 Sonnet  | 200k tokens
Anthropic  | Claude 3 Opus    | 200k tokens
Google     | Gemini Pro       | 128k tokens
Google     | Gemini 1.5       | 128k or 1M tokens

Data accurate as of March 2024.

So what does this mean in practice? 

Basically, if you try to paste a long article or multi-page document into an LLM prompt, you’ll likely get an error message saying you’ve exceeded the max token limit. The LLM simply can’t hold that much text in its “working memory” at once. To work within LLM token constraints:

  • Break long text into smaller chunks and feed them in sequentially (see the sketch after this list)
  • Summarize or paraphrase long passages to their key points
  • Specify a max output length to prevent the LLM from running on
  • Focus prompts on specific sections vs. full documents
  • Use special “retrieval” techniques to have LLMs scan and pull from large datasets
  • Leverage document embeddings and vector databases for smarter text search
  • Wait patiently as bigger and better LLMs are developed that can handle more tokens!
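For the first tip, here is a minimal chunking helper in plain Python. It counts characters as a rough proxy for tokens (1 token is about 4 characters), and the chunk size and overlap are arbitrary placeholders to tune per model.

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks that fit within a context window."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + max_chars
        # Prefer to break on a paragraph boundary so chunks stay coherent
        if end < len(text):
            boundary = text.rfind("\n\n", start, end)
            if boundary > start:
                end = boundary
        chunks.append(text[start:end])
        # Overlap consecutive chunks slightly so context isn't lost at the seams
        start = max(end - overlap, start + 1)
    return chunks
```

You would then feed each chunk to the model in turn, optionally asking it to summarize each one and then summarizing the summaries.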

Ultimately, while LLMs are getting better at handling longer contexts, if you’re working with a lot of text, you’ll need to get clever with how you extract, compress and feed in the most relevant bits to stay within the limits and get the best results.

Hallucinations – sometimes LLMs make stuff up

One quirk of LLMs is that they can sometimes “hallucinate” – meaning they generate text that seems realistic and plausible but is actually inaccurate, misleading, or nonsensical.

For example, an LLM might invent historical “facts”, misrepresent scientific concepts, or make up biographical details about real people.

This happens because LLMs learn by ingesting enormous amounts of online data which inevitably includes errors, biases, and outdated info. They then statistically replicate the patterns they observe in this messy data, which can lead to them confidently asserting falsehoods.

So when you’re using an LLM, it’s wise to take its outputs with a grain of salt, especially for important topics. Don’t blindly trust everything it says. Always:

  • Cross-check important claims against authoritative sources
  • Ask follow-up questions to probe the AI’s certainty
  • Rely more on LLMs for subjective or creative content vs factual info
  • If an output seems fishy, ask the LLM for its sources or evidence
  • Prompt the LLM to double-check its own outputs for accuracy (see the sketch after this list)
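To illustrate that last tip, here is a sketch of a two-pass “generate, then verify” pattern using OpenAI’s Python client. The model name, question, and critique wording are placeholders; adapt them to your own setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder; use any chat model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Pass 1: get an answer
answer = ask("When was the first transatlantic telegraph cable completed?")

# Pass 2: ask the model to critique its own answer
critique = ask(
    "Review the following answer for factual errors and list any claims "
    f"that should be verified against authoritative sources:\n\n{answer}"
)
print(critique)
```

The second pass won’t catch everything (the model can hallucinate in its critique too), but it often surfaces shaky claims worth checking.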

By keeping a critical eye and verifying key claims, you can harness the creative power of LLMs while avoiding being misled by their occasional hallucinations.

Limited knowledge – LLMs can’t update their knowledge base

Another thing to keep in mind is that LLMs are ultimately a snapshot of the world’s knowledge at a specific time of their training. They aren’t natively connected to the internet and can’t automatically learn about current events. So their knowledge can become stale or inaccurate over time.

For example, let’s say you ask an LLM about COVID-19 vaccination rates. If its training data is from early 2021, it won’t be able to give you up-to-date statistics. Or if you ask it to recommend a laptop, it might suggest models that have been discontinued.

Here are the training data cut-off dates of some popular LLMs:

Developer  | Model            | Training data cut-off
OpenAI     | GPT-3.5 Turbo    | Sep 2021
OpenAI     | GPT-4 Turbo      | Dec 2023
Anthropic  | Claude 3 Haiku   | Aug 2023
Anthropic  | Claude 3 Sonnet  | Aug 2023
Anthropic  | Claude 3 Opus    | Aug 2023
Google     | Gemini Pro       | Early 2023
Google     | Gemini 1.5       | Early 2023

Data accurate as of March 2024.

When using an LLM for knowledge work, keep these tips in mind:

  • Cross-reference its claims with the latest data
  • Rely more on LLMs for evergreen vs rapidly changing topics
  • Prompt the LLM with the current date for time-sensitive queries (see the sketch after this list)
  • Fine-tune LLMs on the latest data for applications requiring up-to-date info
  • Pair LLM outputs with human curation and fact-checking
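For the date tip above, something as simple as prepending the current date to your prompt can help. A minimal sketch (the instruction wording is just one possible phrasing):

```python
from datetime import date

def time_aware_prompt(question: str) -> str:
    """Prepend today's date so the model can reason about recency."""
    today = date.today().isoformat()
    return (
        f"Today's date is {today}. Your training data has a cut-off date, so if "
        "this question depends on recent events, say so explicitly rather than "
        f"guessing.\n\nQuestion: {question}"
    )

print(time_aware_prompt("What is the latest version of Python?"))
```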

With the right strategies, you can still get tremendous value from LLMs even if their knowledge isn’t always cutting-edge. Just be mindful of their training date and supplement their outputs with the latest intel.

Lack of long-term memory and learning

Building on the constraints we’ve covered so far, another significant limitation of current LLMs is their lack of long-term memory and learning. Unlike humans, who continuously learn and build on their knowledge over time, LLMs generally treat each conversation or task as a standalone interaction. They don’t automatically retain information from previous chats or learn from new data in real time.

Here’s a typical scenario: Let’s say you’re chatting with an LLM and you share some personal details like your name, hobbies, and favorite books. If you start a new conversation later on, the LLM won’t remember those details or your previous interactions. It’s like hitting the reset button each time.

This “forgetting” happens because LLMs are essentially stateless inference machines. They make predictions based on their pre-trained knowledge, but they’re not dynamically updating their underlying models with each interaction. Some key implications of this:

  • No personalization: LLMs can’t learn your individual preferences or communication style over time to provide a tailored experience.
  • No knowledge accumulation: LLMs can’t synthesize information across multiple conversations to build up a richer understanding of a topic.
  • No contextual awareness: LLMs may lose the thread of a conversation if it spans multiple sessions, leading to repetition or contradictions.
  • No real-time learning: LLMs can’t automatically incorporate new information in real-time, so their knowledge can become stale or outdated.

However, researchers and developers are actively working on ways to simulate and approximate long-term learning in LLMs.
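One common workaround is for the application, rather than the model, to carry the memory: store the conversation history (or a running summary once it grows long) and resend it with every request. A minimal sketch of that pattern, where ask_model is a placeholder for any real LLM API call:

```python
def ask_model(messages: list[dict]) -> str:
    """Placeholder for a real LLM API call, e.g. a chat completions request."""
    ...

# The model itself stays stateless; the application supplies the state.
history: list[dict] = []

def chat_turn(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = ask_model(history)  # send the full history with every request
    history.append({"role": "assistant", "content": reply})
    return reply
```

This is essentially what chat interfaces do behind the scenes, and why very long conversations eventually run into the token limits discussed earlier.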

So while true long-term learning remains an open challenge, there are ways to partially mitigate this limitation and create more stateful, personalized experiences with LLMs. As the technology evolves, we may see LLMs that can more faithfully simulate the incremental knowledge accumulation and learning that comes naturally to humans. For now, it’s important to ground your expectations and craft your prompting strategies with this constraint in mind.

Limited reasoning – LLMs struggle with complex multi-step problems

While LLMs can produce very coherent and fluent writing, they often struggle with tasks that require complex logical reasoning, multi-step problem-solving, or quantitative analysis.

This is because, fundamentally, LLMs work based on statistical word associations rather than robust causal models or rich knowledge representations of the world. They typically struggle with:

  • Solving complex math or word problems with multiple steps
  • Proving theorems or writing rigorous logical proofs
  • Explaining the detailed mechanisms behind scientific concepts
  • Strategic planning or forecasting long chains of causes and effects

If you work in an area that depends heavily on this kind of analytical rigor, LLMs should generally be used to assist and augment human intelligence rather than to replace human oversight and critical thinking. You can make LLMs work for you by:

  • Breaking down complex problems into simpler sub-steps to prompt the LLM (sketched after this list)
  • Providing the LLM with examples of the reasoning process you want it to follow
  • Prompting the LLM to explain its logic and show its work
  • Cross-checking the LLM’s reasoning against other sources and your own judgment
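To illustrate the first two tips, here is a before-and-after prompt sketch. The scenario and numbers are invented purely for illustration:

```python
# Instead of one monolithic question...
naive_prompt = "Is it cheaper to lease or buy this car over 5 years?"

# ...spell out the sub-steps and ask the model to show its work at each one.
decomposed_prompt = """Work through this step by step, showing your arithmetic:

1. Total lease cost: $400/month for 60 months, plus a $2,000 fee.
2. Total purchase cost: $28,000 price minus $12,000 resale value after 5 years.
3. Compare the two totals and state which option is cheaper and by how much.
"""
```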

At the end of the day, LLMs are incredible tools for brainstorming and ideation. But for rigorous reasoning and analysis, pair them with human oversight and other specialized tools to get the best results.

Inconsistency – LLMs can contradict themselves

LLMs can give conflicting outputs for very similar prompts – or even contradict themselves within the same response! This happens because LLMs make probabilistic predictions based on subtle patterns in their prompt and training data. They don’t have strict logical consistency.

For example, you might ask an LLM “What year did World War 2 end?” twice, and get “1945” the first time and “1946” the second time. Or within the same output, an LLM might say an event happened in “1969” in one sentence and “1968” in another.

You can manage this by:

  • Prompting the LLM multiple times and looking for consensus in the outputs (see the sketch after this list)
  • Breaking down complex queries into a series of smaller, simpler prompts
  • Adjusting randomness settings to find the right diversity vs consistency
  • Critically examining outputs for self-contradictions before accepting them
  • Asking the LLM to check its own consistency and revise contradictory statements
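For the first tip, a simple “self-consistency” pattern is to sample the same prompt several times and keep the majority answer. A sketch, where ask_model again stands in for any LLM call with some randomness (temperature above 0):

```python
from collections import Counter

def majority_answer(prompt: str, ask_model, n: int = 5) -> str:
    """Sample the same prompt n times and return the most common answer."""
    answers = [ask_model(prompt).strip() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    print(f"{count}/{n} samples agreed on: {best!r}")
    return best
```

This works best for short, factual answers where exact matches are meaningful; for longer outputs you would compare the substance rather than the raw strings.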

Ultimately, the key is to be aware of this limitation and not blindly accept everything an LLM says as gospel. With the right prompt engineering and human oversight, you can still get great value from LLMs even if they occasionally contradict themselves.

Lack of true understanding – LLMs don’t really “get” subtext

LLMs don’t really understand language the same way humans do. They are very skilled at statistically mimicking the patterns of human communication. But they lack the rich contextual knowledge, commonsense reasoning, and theory of mind that allows humans to fluently interpret subtext, tone, analogies, sarcasm, and implicit meanings.

For example, if you sarcastically said to an LLM “Well that’s just great!”, intending to convey frustration, the LLM might incorrectly interpret it as positive sentiment and respond with a chipper “I’m glad you’re happy!”. Or if you make an analogy or reference that isn’t in its training data, e.g. “That’s like finding a needle in a blueberry pie”, it will get very confused.

When you’re communicating with an LLM, try to:

  • Be direct and literal, avoid heavily relying on implication or subtext
  • Provide ample context, don’t assume unstated knowledge
  • Avoid highly esoteric analogies, idioms, or cultural references
  • Clarify ambiguous statements (for both your and the LLM’s benefit!)
  • Prompt the LLM to ask for clarification if something seems confusing
  • Anthropomorphize LLMs less: don’t expect human-level social reasoning
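To make this concrete, compare a vague, subtext-heavy request with an explicit, context-rich rewrite (both invented for illustration):

```python
vague_prompt = "Well that's just great. Can you fix it?"

explicit_prompt = (
    "I'm frustrated: the email draft you wrote earlier is too formal for my "
    "audience. Please rewrite it in a casual, friendly tone for a team of "
    "software engineers, keeping it under 150 words."
)
```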

Essentially, by communicating in a more explicit, context-rich way, you’ll have much more reliable and helpful conversations with AI assistants.

We don’t think this will always be the case: as LLMs advance, they’ll gradually get better at handling subtext and ambiguity. But for now, meet them halfway with clear, direct communication.

Difficulty with certain linguistic elements

While LLMs have achieved remarkable fluency in understanding and generating human-like text, they can still struggle with certain finer points of language. Things like complex grammar, syntax, punctuation, and figurative expressions can sometimes trip them up, leading to outputs that sound a bit “off” to a native speaker. Here are some common linguistic rough spots for LLMs:

  • Unusual syntax or word order: Sentences with atypical structure, such as inverted clauses or stylized prose, can confuse LLMs. Think Yoda-speak: “Much to learn, you still have”.
  • Proper punctuation and capitalization: LLMs don’t always nail the finer points of punctuation, especially with less common marks like semicolons, em dashes, or ellipses. They may also inconsistently capitalize proper nouns.
  • Figurative or non-literal language: Idioms, metaphors, sarcasm, and other expressions that mean something different than their literal phrasing can be hard for LLMs to parse. “It’s raining cats and dogs” could lead to some… interesting interpretations.
  • Linguistic humor and wordplay: Puns, jokes, and creative misspellings based on subtle linguistic quirks often fall flat for LLMs. Why can’t a nose be 12 inches long? Because then it’d be a foot!

Now, these limitations aren’t universal or absolute. As LLMs continue to improve and train on more diverse datasets, they’re getting better at handling linguistic edge cases. And some models already perform better than others, depending on their specific architectures and training approaches.

But if you’re aiming for pixel-perfect, publication-ready prose from an LLM, it’s still a good idea to review and refine the outputs with human eyes.

Bias and stereotyping – LLMs can perpetuate prejudices

Safety is a hot topic for AI and LLMs: some users want no restrictions, while others want all of them. Either way, LLMs can replicate harmful biases and stereotypes that exist in their training data and in broader society. Since these models learn from human-created content on the internet, they can inadvertently perpetuate prejudiced or discriminatory views in their own outputs.

Final word

LLMs are incredible tools that will increasingly reshape how we learn, create, and work. However, they are not magic crystal balls or infallible digital sages.

The best way to conceptualize LLMs is as knowledgeable but naive assistants. They have ingested a vast amount of information, but don’t always know how to apply it effectively or consistently to real-world situations. With human judgment and critical thinking as the guide rails, LLMs can be incredible augmentations to our intelligence. We just have to understand where they shine and where they struggle.

While these limitations of LLMs can seem daunting, the good news is there are active efforts to address them and you can adopt practical strategies to get the most out of your AI tools:

  1. Think critically and fact-check: Don’t just blindly accept LLM outputs as ground truth, especially on important topics. Probe their responses, check claims against authoritative sources, and apply your own judgment.
  2. Provide context and be specific: The clearer and more contextual information you can provide in your prompts, the more relevant and reliable the LLM’s outputs will generally be. Don’t assume unstated knowledge.
  3. Prompt creatively and across perspectives: To counteract inconsistency and bias, try prompting the LLM from multiple angles on the same topic. Look for patterns and triangulate towards more balanced views.
  4. Adapt prompts to LLM strengths: Focus on using LLMs for queries that play to their strengths, like open-ended brainstorming, and avoid overtaxing them with huge walls of text or complex reasoning tasks.

Hopefully this overview gives you a confident foundation to start exploring and experimenting with LLMs yourself. Embrace their amazing potential, but always keep a watchful, critical eye. The future of human-AI collaboration is bright – and by mastering LLMs’ quirks, you can be at the forefront of this exciting frontier.
