Reducing AI-Slop with Logit Bias
In an internal experiment, outputs using logit bias were judged "less AI-like" than the baseline 74.5% of the time over 353 blind A/B evaluations.
Logit bias is a vocabulary knob. You push specific tokens down (ban) or up (boost) during generation.
It works on tokens, not phrases. Phrase-level bans can unwittingly penalize common words and distort the whole output.
Start small. Boosting more than 0.1 usually causes noticeable changes. Strong bans can cause weird distributions fast.
Logit bias can backfire. Without a repetition penalty, some generations got stuck in loops.
Logit bias is great at modifying the vocabulary of an LLM output.
I ran an internal experiment testing whether applying targeted logit bias during generation can make GPT-4o-mini outputs sound less like typical AI text.
The evaluator (GPT-5.2) saw both outputs in randomized A/B order and had to pick which sounded more AI-generated.
Across 353 blind A/B comparisons, the evaluator labeled the baseline as more AI-like in 263/353 (74.5%) of the cases.
This doesn't mean that "we beat AI detection", though.
Instead, it means: pushing certain tokens down and others up changed the feel of the writing enough that an evaluator noticed it consistently.
The core takeaway for me was simple. The AI has a bunch of phrases or words that it loves to use.
If you read a lot of AI writing, you can tell that the overuse of those specific words and phrases is a big part of what gives it away.
Logit bias lets you push against that, but only if you respect what it is: token-level control.
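If it helps to see the mechanics, here's a toy sketch of what a logit bias actually does. The five-word vocabulary and the numbers are made up, but the mechanism is the real one: a per-token offset is added to the model's raw scores before they're turned into probabilities.
import math
# Toy vocabulary and raw model scores (logits) - purely illustrative numbers
vocab  = ["the", "furthermore", "honestly", "robust", "yoga"]
logits = [2.1, 1.8, 0.4, 1.2, 0.9]
# A bias is added per token: a large negative value effectively bans a token,
# a small positive value nudges it up
bias   = [0.0, -100.0, 5.0, -50.0, 0.0]
biased = [l + b for l, b in zip(logits, bias)]
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
for tok, before, after in zip(vocab, softmax(logits), softmax(biased)):
    print(f"{tok:12s} {before:.3f} -> {after:.3f}")
# "furthermore" and "robust" drop to ~0; "honestly" now dominates the distribution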
A good use of logit bias is to remove the idiosyncrasies of AI-writing.
Things like meta-commentary, overly even formatting, and stiff transitions.
In one example, we used a prompt like: "Write a product description of a yoga mat called Flow Pro Medium."
The baseline drifted into polished template behavior even when the system prompt said "be anti-AI." It had heavy, markdown-style headings throughout.
The logit bias version came out without headers and felt more conversational. It also used first-person more. That difference alone was enough for the evaluator to treat one as "more AI-ish" than the other.
When logit bias backfires, it doesn’t look like cute typos.
It looks like obvious failure and looping, because the model runs out of tokens it can use to continue.
For example:
If you would like my thoughts back in real-time to you in that real back and forth in to become to you to you to you to you to you to you to you to you to you to you to you to you to you to you to you, let my back to you to you to you to you to you to you to you to you to you to you to you to you to you to you to you to back to you in real back to you to back to you to you to you to you to you in my in my in my in my in my in my
This is mechanical failure: if we push too hard, or we ban the wrong tokens, we force the model into a corner.
There are a few things we can do to avoid it:
- Add repetition penalty to generation parameters
- Add regex checkers for looping
- Avoid banning the most common 1,000 or 10,000 words in your language.
Here's how to add repetition penalty:
# Assumes `client` and `logit_bias` are already defined (see the full example at the end)
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[...],
    logit_bias=logit_bias,
    frequency_penalty=0.3,  # Penalize repeated tokens (0.0 to 2.0)
    presence_penalty=0.1    # Encourage topic diversity (0.0 to 2.0)
)
So, you're interested in implementing it? Here's my minimum viable setup.
First, always start with an eval. You want to ensure what you're doing is helping you, not hindering you.
Then create two lists: a ban list and a boost list. These can be simple text files managed through Git and pulled in at runtime or compile time.
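For example, if the lists live in plain text files with one word or phrase per line (the filenames below are just placeholders), loading them takes a few lines:
from pathlib import Path
def load_list(path: str) -> list[str]:
    """Read one entry per line, skipping blank lines and # comments."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip() and not line.startswith("#")]
banlist = load_list("banlist.txt")     # placeholder filename
biaslist = load_list("boostlist.txt")  # placeholder filename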
During runtime, you tokenize those words or phrases for your model, and pass those tokens plus bias strengths into the API.
Finally, confirm you've improved things by re-running your eval.
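Here's a rough sketch of that eval loop, using an LLM judge in randomized A/B order like the experiment above. The judge model name, prompt wording, and helper names are placeholders, not my exact setup.
import random
from openai import OpenAI
client = OpenAI()
def generate(prompt: str, logit_bias: dict | None = None) -> str:
    kwargs = {"logit_bias": logit_bias} if logit_bias else {}
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return resp.choices[0].message.content
def judge_more_ai_like(text_a: str, text_b: str) -> str:
    """Ask a judge model which text sounds more AI-generated. Returns 'A' or 'B'."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in judge model; use whichever judge you trust
        messages=[{
            "role": "user",
            "content": (
                "Which of these sounds more AI-generated? Answer with A or B only.\n\n"
                f"A:\n{text_a}\n\nB:\n{text_b}"
            ),
        }],
    )
    return resp.choices[0].message.content.strip().upper()[:1]
def run_eval(prompts: list[str], logit_bias: dict) -> float:
    """Fraction of prompts where the judge flags the baseline as more AI-like."""
    baseline_flagged = 0
    for prompt in prompts:
        baseline, biased = generate(prompt), generate(prompt, logit_bias)
        # Randomize A/B order so the judge can't exploit position
        if random.random() < 0.5:
            verdict, baseline_slot = judge_more_ai_like(baseline, biased), "A"
        else:
            verdict, baseline_slot = judge_more_ai_like(biased, baseline), "B"
        baseline_flagged += (verdict == baseline_slot)
    return baseline_flagged / len(prompts)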
Here's how tokenization works in practice:
import tiktoken

def get_phrase_token_sequences(phrase: str, encoding) -> list[tuple[int, ...]]:
    """
    Generate token sequences for a phrase in different contexts.
    Returns a list of token sequences (tuples) that represent the phrase
    with a leading space and capitalization variants.
    """
    sequences = []
    # Original phrase
    tokens = tuple(encoding.encode(phrase))
    if tokens:
        sequences.append(tokens)
    # With leading space (common in mid-sentence)
    tokens = tuple(encoding.encode(f" {phrase}"))
    if tokens:
        sequences.append(tokens)
    # Capitalized variants (for sentence start)
    tokens = tuple(encoding.encode(phrase.capitalize()))
    if tokens:
        sequences.append(tokens)
    tokens = tuple(encoding.encode(f" {phrase.capitalize()}"))
    if tokens:
        sequences.append(tokens)
    # Remove duplicates while preserving order
    seen = set()
    unique_sequences = []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            unique_sequences.append(seq)
    return unique_sequences

# Example usage
encoding = tiktoken.get_encoding("o200k_base")
sequences = get_phrase_token_sequences("furthermore", encoding)
# Returns: [(24963,), (4034, 6903), ...] - different tokenizations
Example of tokenization of words and phrases
And here's how to pass the logit bias to the API:
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Build your logit_bias dictionary (token_id as string: bias value as int)
logit_bias = {
    "24963": -100,  # Totally ban "furthermore"
    "4034": -50,    # Discourage " furthermore" (with space)
    "12389": 5,     # Boost "honestly"
    "6789": 5,      # Boost " honestly" (with space)
}

# Pass it to the API
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a product description..."}
    ],
    temperature=1.0,
    logit_bias=logit_bias  # Add this parameter
)
Example of how to pass logit bias using the OpenAI API
Here's a simple checklist:
- Have I set up a baseline evaluation?
  - Either a qualitative evaluation through a human evaluator, or an LLM judge.
- Have I checked that my ban / boost lists avoid the top 1,000 or 10,000 most common words / tokens in my language?
  - This is how you avoid accidentally banning "the", "with", "us", etc., and getting stuck in a loop.
- Have I started with small bias strengths?
  - Typically, more than 0.1 will mean that you're going to see a lot of changes when boosting.
  - If you're banning, you can be stronger. You can even ban a token fully (-100) if you don't want it to show up, but be careful you're not banning common tokens.
- Does my test keep everything the same except the logit bias?
  - Keep everything else the same: prompts, tools, everything.
  - The only thing that changes is your ban / boost list and your weights.
- Have I added guardrails for common mechanical failures?
  - Add a repetition penalty.
  - Add a regex checker for loops.
The contents of the banlist and boostlist matter, but the real lesson is why they work.
In my short experiment, I aimed to penalize "school essay transitions" and boost "I'm speaking from experience" markers.
Here's how I structured the ban and boost lists in code:
# Words or phrases to discourage (-50 bias)
banlist = [
    # Stiff transitions
    "Additionally",
    "Furthermore",
    "Moreover",
    "In addition",
    "Consequently",
    # Corporate buzzwords
    "leverage",
    "robust",
    "seamless",
    "cutting-edge",
    "game-changer",
    "innovative",
    "transformative",
    # Filler adverbs
    "ultimately",
    "essentially",
    "particularly",
    "specifically",
    # Filler phrases
    "deep dive",
    "it's worth noting",
    "as mentioned",
    "moving forward",
    # Punctuation patterns
    "—",  # em dash
]

# Words or phrases to encourage (+5 bias)
biaslist = [
    # Personal markers
    "I feel",
    "I think",
    "I believe",
    "I would say",
    "in my opinion",
    "personally",
    # Authenticity markers
    "honestly",
    "frankly",
    "to be honest",
    "here's the thing",
    # Colloquial expressions
    "crazy",
    "literally",
    "actually",
]
By removing those stiff transitions we make the output less formally structured.
By removing empty adverbs and adding personal markers, we improve authenticity and engagement, creating a consistent voice.
One word of caution though:
If you force "conversational" too hard in the wrong genre, you can get a tone shift that is jarring to readers.
So you want your ban / boost lists to be domain-specific.
A final tip: aim for 'keyhole surgery' when applying logit bias.
Logit bias works on tokens, not phrases.
What do I mean?
If we try to ban a phrase like "the thing is", it will internally be broken up into tokens like ["the", "thing", "is"].
By putting a negative bias on those, you're less banning the phrase, and more crippling the lexicon that the model needs to speak normally.
That's how you end up with garble and repetition loops.
You banned too many words, the model ran out of working vocabulary, and it got stuck.
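You can see this for yourself with tiktoken and the same o200k_base encoding used earlier. The exact pieces depend on the tokenizer, but for common phrases they are almost always very common tokens.
import tiktoken
encoding = tiktoken.get_encoding("o200k_base")
for token_id in encoding.encode(" the thing is"):
    print(token_id, repr(encoding.decode([token_id])))
# The phrase splits into everyday pieces (roughly " the", " thing", " is"),
# so banning those token IDs penalizes a huge slice of ordinary English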
Two practical fixes for this:
- Keep a list of the 10,000 most common English words and ensure you don't ban any of their tokens. An alternative is a post-processing step that removes those tokens from your ban / boost list.
- An alternative to removing tokens is dilution, where you apply a weighted multiplier based on how commonly a word or token shows up in your language. That is: load a large dataset in your language, break it into tokens, do a frequency analysis, and dilute the bias on the most common tokens (there's a sketch of this after the filtering example below).
- Avoid using logit bias entirely to remove verbatim phrases. If you want to remove "hum with delight", "knowing smile" or "deep dive"… use a different system. This can be a lightweight post evaluator with regex, or a finetuned smaller LLM evaluator. Alternatively, you can attempt a generate, evaluate, and revise pattern instead.
Here's how to filter common words from your ban list:
def filter_common_words(banlist: list[str], encoding,
                        common_words: set[str]) -> list[str]:
    """
    Remove words from banlist that are in the common words set.
    This prevents accidentally banning essential vocabulary.
    """
    filtered = []
    for phrase in banlist:
        # Check if the phrase is a single common word
        if phrase.lower() in common_words:
            print(f"Skipping common word: {phrase}")
            continue
        # For multi-word phrases, check if all words are common
        words = phrase.lower().split()
        if all(word in common_words for word in words):
            print(f"Skipping common phrase: {phrase}")
            continue
        filtered.append(phrase)
    return filtered

# Load common words (you can use NLTK or a custom list)
# Example: most common 10,000 English words
common_words = {
    "the", "be", "to", "of", "and", "a", "in", "that", "have",
    "i", "it", "for", "not", "on", "with", "he", "as", "you",
    # ... (load from a file or package)
}

# Filter before building logit_bias
filtered_banlist = filter_common_words(banlist, encoding, common_words)
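And here's a rough sketch of the dilution idea. The corpus file and the 0.1% threshold are placeholders you'd tune against your own data.
from collections import Counter
import tiktoken
encoding = tiktoken.get_encoding("o200k_base")
def token_frequencies(corpus_text: str) -> Counter:
    """Token frequency counts over a reference corpus in your language."""
    return Counter(encoding.encode(corpus_text))
def diluted_bias(base_bias: float, token_id: int, freqs: Counter) -> float:
    """Scale a bias toward zero the more common the token is in the corpus."""
    total = sum(freqs.values())
    rate = freqs.get(token_id, 0) / total if total else 0.0
    # Illustrative choice: tokens making up more than 0.1% of the corpus lose
    # their bias entirely; rarer tokens keep most of it
    dilution = min(1.0, rate / 0.001)
    return base_bias * (1.0 - dilution)
# Usage sketch ("corpus.txt" is a placeholder for a large text sample):
# freqs = token_frequencies(open("corpus.txt", encoding="utf-8").read())
# logit_bias = {str(t): int(diluted_bias(-50, t, freqs)) for t in encoding.encode(" furthermore")}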
So, if you generate a lot of content and you're seeing "AI slop" tells like headers, stiff transitions, and empty adverbs, try this:
- Pick one domain (business emails, memos, product descriptions).
- Create a tiny banlist of single words like "furthermore" and "moreover."
- Create a tiny boostlist of single words like "honestly" and "frankly."
- Run a baseline versus biased A/B on 10 prompts you already use.
- Add one hard rule: if you see repetition loops or garble, back off immediately and stop banning anything that could be common vocabulary.
Here's a minimal, runnable example you can test right now:
from openai import OpenAI
import tiktoken
import re
import os
# Initialize
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
encoding = tiktoken.get_encoding("o200k_base")
# Define your lists
banlist = [
    "Furthermore", "Moreover", "Additionally", "Consequently",
    "essentially", "particularly", "specifically",
    "leverage", "robust", "game-changer", "deep dive",
]
biaslist = [
    "I think", "I believe", "honestly", "frankly",
    "personally", "in my opinion",
]
def get_phrase_token_sequences(phrase: str, encoding) -> list[tuple[int, ...]]:
    """Get token sequences for a phrase with variants."""
    sequences = []
    for variant in [phrase, f" {phrase}", phrase.capitalize(), f" {phrase.capitalize()}"]:
        tokens = tuple(encoding.encode(variant))
        if tokens and tokens not in sequences:
            sequences.append(tokens)
    return sequences

def build_logit_bias_dict(banlist, biaslist, encoding):
    """Build logit_bias dictionary from ban/boost lists."""
    logit_bias = {}
    for phrase in banlist:
        for seq in get_phrase_token_sequences(phrase, encoding):
            if seq:
                # Only bias the first token of each sequence: logit bias works
                # per token, not per phrase
                logit_bias[str(seq[0])] = -50
    for phrase in biaslist:
        for seq in get_phrase_token_sequences(phrase, encoding):
            if seq:
                logit_bias[str(seq[0])] = 5
    # Limit to 300 tokens (OpenAI restriction)
    return dict(list(logit_bias.items())[:300])
def detect_repetition_loop(text: str) -> bool:
    """Detect repetition loops: the same word repeated five or more times in a row."""
    pattern = r'\b(\w+)(?:\s+\1){4,}'
    return bool(re.search(pattern, text, re.IGNORECASE))
# Build the logit_bias dictionary
logit_bias = build_logit_bias_dict(banlist, biaslist, encoding)

# Generate with logit bias
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a product description for a yoga mat."}
    ],
    temperature=1.0,
    logit_bias=logit_bias,
    frequency_penalty=0.3,  # Add repetition penalty
)

result = completion.choices[0].message.content

# Check for loops
if detect_repetition_loop(result):
    print("Warning: Repetition detected!")
else:
    print(result)