Unmasking the Digital Penman: An Introduction to Forensic Linguistics

Written by Blogger

August 6, 2025

In the world of digital forensics, piecing together digital breadcrumbs is critical for understanding events, identifying perpetrators, and building cases. But what if the evidence wasn’t a timestamp or an IP address, but the very words used in an email, a chat log, or a social media post?

On the evening of January 15th, 1999, Miriam Illes was murdered in her home in Williamsport, Pennsylvania. Miriam and her husband, Dr. Richard Illes, had been going through a divorce, and Richard immediately became a suspect due to his questioning police about evidence found at the crime scene. However, he wasn’t arrested because he had a strong alibi.

Not long after the murder, Dr. Illes’ attorney (likely his divorce attorney) received an anonymous letter signed “Soldier of Equality, Soldier of God, Soldier of Death.” The letter stated that Dr. Illes was not the murderer and contained linguistic features such as misspellings (including the misspelling of Dr. Illes’ name) and missing punctuation, suggesting an author of low literacy or little formal education. However, a few months later, Dr. Illes’ attorney received a second letter from the same author. In this letter, the author admitted to having disguised his or her writing in the first letter and claimed to have fooled the police into thinking he was unintelligent. This new letter contained only one misspelling and one instance of missing punctuation, but hints of disguise were still recognizable.

Eventually, through further investigations, police determined that Dr. Illes himself had written both letters in an attempt to deflect blame. With this and other evidence, they were able to convict him of his wife’s murder (Leung, 2005).

What is Forensic Linguistics?

When I tell people I’m a linguist, their first follow-up question is often, “How many languages do you speak?” While I do enjoy learning and speaking foreign languages, my day-to-day work is far more akin to being a “dialect detective.” As a linguist, I’m deeply interested in how language works – from regional variations in vocabulary (think “coke” vs. “soda” vs. “pop”) to the rapid evolution of slang, or even how marketing language influences consumer behavior. What’s crucial to understand is that your language habits – how you speak and write – are deeply influenced by where you’re from, your daily activities, and the people you interact with.

Every text message you send, every email you compose, every social media post you publish – these all document your individual language patterns. This means that I spend hours sifting through digital writings like emails, texts, forum posts, or even comments, looking for linguistic characteristics that can help identify or profile an author, or even clarify the intent behind a communication. Beyond criminal investigations, forensic linguistics offers valuable insights in areas like intellectual property disputes (primarily trademark and copyright issues related to language), marketing analysis, and even determining the true meaning of ambiguous contracts.
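For readers who like to see the idea in code, here is a minimal sketch of how a few surface-level writing habits could be counted and compared between two texts. It is only an illustration, not the method used in actual casework: the word list, the two punctuation features, and the function names `style_profile` and `cosine_similarity` are all assumptions made for this example, and a similarity score like this would never be enough on its own to attribute authorship.

```python
import math
import re
from collections import Counter

# Hypothetical feature list: a handful of common English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "i", "it", "but"]


def style_profile(text):
    """Return relative function-word frequencies plus two punctuation rates."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    counts = Counter(words)
    profile = [counts[w] / total for w in FUNCTION_WORDS]
    # Punctuation habits, expressed per word: commas and exclamation marks.
    profile.append(text.count(",") / total)
    profile.append(text.count("!") / total)
    return profile


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as "no similarity"
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Invented sample sentences, used only to exercise the functions.
    known = "I went to the store, and I saw that the door was open."
    questioned = "I walked to the office, and I knew that the light was on."
    score = cosine_similarity(style_profile(known), style_profile(questioned))
    print(round(score, 3))
```

A score near 1 only says the two texts use these particular features in similar proportions. With samples as short as the ones in most cases, that number is fragile, and interpreting it still takes a linguist who knows the context.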

But Can’t AI Do This Task?

In our increasingly AI-driven world, it’s tempting to think that artificial intelligence could easily take over tasks like linguistic author identification. However, when it comes to the nuanced world of forensic linguistics, human expertise remains indispensable. Relying solely on AI for linguistic analysis in digital forensics can be problematic:

● Limited Data, Infinite Nuances: Forensic cases often involve a limited quantity of writing or speaking samples. AI models, while powerful with vast datasets, struggle to make accurate assessments when data is scarce. The sheer complexity and subtle nuances of human language are nearly infinite, making it incredibly difficult for AI to truly “read between the lines.”

● Lack of Contextual Understanding: AI models frequently miss crucial contextual cues, cultural references, and subtle stylistic variations that human linguists pick up on. This leads to incomplete analyses, where vital details are overlooked. Imagine trying to understand the intent behind a sarcastic comment without understanding the relationship between the communicators – that’s a challenge for AI.

● Bias and Black Boxes: AI models are trained on existing data, and if that data contains biases, the model’s output will reflect those biases. Even more concerning for legal proceedings, many AI models operate as “black boxes,” meaning it’s impossible to explain how they arrived at a particular conclusion. This lack of transparency makes it incredibly difficult to present and defend AI-generated findings to a jury.

● Risk of Misinterpretation: AI reports can be overly complex, vague, or even misleading if not interpreted by an expert. Without a human linguist to contextualize and explain the results, there’s a risk of misinterpreting the findings, which can have serious consequences in a legal setting.

While AI tools can be valuable for analyzing large datasets and identifying patterns, they are best used as support tools for a human linguist, not as a replacement. A forensic linguist, with years of specialized training, possesses the critical thinking skills, contextual awareness, and nuanced understanding of language necessary to accurately distinguish individuals through their communication patterns and clearly explain those findings in a courtroom.

Connecting the Dots

So, the next time you’re faced with a digital forensic puzzle, remember that the words themselves can hold powerful clues. Forensic linguistics offers a unique lens through which to analyze digital communications, providing insights into authorship, intent, and meaning that complement other digital forensic techniques.

About the Author

Sarah Carlson is a forensic linguist based in Washington, D.C. Since completing extensive training as a forensic linguist at Georgetown University, she has consulted in various cases of unknown authorship as well as threat assessment. For more information about forensic linguistics, visit www.langscope.com.
