Implementing AI-Generated Content Detection
In our previous article, we explored OpenAI's Moderation API for content filtering. Now, let's examine another important preflight check: detecting AI-generated content.
While AI-generated content isn't inherently harmful, it can be used for:
- Automated spam campaigns
- Sophisticated jailbreak attempts
- Bypassing rate limits through automated requests
- Gaming systems through crafted prompts
Let's see how to implement detection of AI-generated content as a preflight check.
Understanding AI-Generated Content Detection
AI detection aims to determine whether text was likely generated by an AI system like ChatGPT, Claude, or similar large language models. It analyzes patterns like:
- Repetitive structures
- Predictable word choices
- Statistical regularities in text
- Unusual coherence across long passages
Modern approaches use a combination of heuristics and trained classifiers to make this determination.
Implementing Basic AI Detection
Here's a simplified implementation built on a few lightweight text heuristics:
import { PreflightCheck } from '../types';
// TextFeatures is needed for the calculateAIScore signature below
import { analyzeTextFeatures, TextFeatures } from '../utils/text-analyzer';
export const aiDetectionCheck: PreflightCheck = {
  name: 'ai_detection',
  description: 'Detects AI-generated content to prevent automated spam',
  run: async ({ lastMessage }) => {
    try {
      // Skip if no content to analyze
      if (!lastMessage || lastMessage.trim().length === 0) {
        return {
          passed: true,
          code: 'ai_detection_skipped',
          message: 'No content to analyze',
          severity: 'info',
        };
      }

      // Clean the text and prepare for analysis
      const textToAnalyze = lastMessage.trim();

      // Only analyze text of sufficient length
      if (textToAnalyze.length < 50) {
        return {
          passed: true,
          code: 'ai_detection_too_short',
          message: 'Text too short for reliable AI detection',
          severity: 'info',
        };
      }

      console.log('Running AI detection check');

      // Analyze text features
      const features = analyzeTextFeatures(textToAnalyze);

      // Combine the features into a single weighted score between 0 and 1
      const aiScore = calculateAIScore(features);

      // Threshold for determining if text is AI-generated
      const threshold = 0.75;

      if (aiScore > threshold) {
        console.warn('Content appears to be AI-generated:', aiScore);
        return {
          passed: false,
          code: 'ai_generated_content',
          message: 'Content appears to be AI-generated',
          details: {
            aiScore,
            threshold,
            features: {
              entropy: features.entropy,
              repetition: features.repetitionScore,
              coherence: features.coherenceScore,
              complexity: features.complexityScore,
            },
          },
          severity: 'error',
        };
      }

      // Content passed AI detection check
      return {
        passed: true,
        code: 'ai_detection_passed',
        message: 'Content appears to be human-written',
        details: {
          aiScore,
          threshold,
        },
        severity: 'info',
      };
    } catch (error) {
      // Fail open: an error inside the detector should not block the request
      console.error('AI detection error:', error);
      return {
        passed: true,
        code: 'ai_detection_error',
        message: 'Error in AI detection check',
        details: { error: error instanceof Error ? error.message : 'Unknown error' },
        severity: 'warning',
      };
    }
  },
};
// Calculate a score indicating likelihood of AI generation
function calculateAIScore(features: TextFeatures): number {
  // Weights for different features
  const weights = {
    entropy: 0.2,
    repetition: 0.3,
    coherence: 0.3,
    complexity: 0.2,
  };

  // Normalize scores between 0-1
  const normalizedEntropy = Math.min(features.entropy / 4.5, 1);
  const normalizedRepetition = features.repetitionScore;
  const normalizedCoherence = features.coherenceScore;
  const normalizedComplexity = features.complexityScore;

  // Calculate weighted score
  return (
    weights.entropy * normalizedEntropy +
    weights.repetition * normalizedRepetition +
    weights.coherence * normalizedCoherence +
    weights.complexity * normalizedComplexity
  );
}
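To make the weighting concrete, here is a small worked example of the same weighted sum. The `weightedScore` helper is a standalone restatement of `calculateAIScore` so the snippet runs on its own, and the feature values are made up purely to show the arithmetic; they are not measurements from real text.

```typescript
// Standalone restatement of the weighted sum above, for illustration only.
interface Features {
  entropy: number; // raw character entropy, in bits
  repetitionScore: number; // already in the 0-1 range
  coherenceScore: number; // already in the 0-1 range
  complexityScore: number; // already in the 0-1 range
}

function weightedScore(f: Features): number {
  const normalizedEntropy = Math.min(f.entropy / 4.5, 1);
  return (
    0.2 * normalizedEntropy +
    0.3 * f.repetitionScore +
    0.3 * f.coherenceScore +
    0.2 * f.complexityScore
  );
}

// Hypothetical feature values, chosen only to show the arithmetic.
const uniformText: Features = {
  entropy: 4.5,
  repetitionScore: 0.5,
  coherenceScore: 1,
  complexityScore: 1,
};
const variedText: Features = {
  entropy: 4.5,
  repetitionScore: 0,
  coherenceScore: 0.2,
  complexityScore: 0.5,
};

const threshold = 0.75;
// 0.2 + 0.15 + 0.3 + 0.2 is about 0.85, which is above the threshold
console.log(weightedScore(uniformText) > threshold); // true
// 0.2 + 0 + 0.06 + 0.1 is about 0.36, which is below the threshold
console.log(weightedScore(variedText) > threshold); // false
```

One consequence of these weights is worth noticing: with `repetitionScore` at 0, the score tops out at 0.7, so the 0.75 threshold can only be crossed by text that contains at least some repeated three-word phrases.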
The supporting text-analysis utilities (imported above from '../utils/text-analyzer') might look like this:
export interface TextFeatures {
  entropy: number;
  repetitionScore: number;
  coherenceScore: number;
  complexityScore: number;
}

export function analyzeTextFeatures(text: string): TextFeatures {
  // Calculate Shannon entropy of the text
  const entropy = calculateEntropy(text);

  // Measure repetition of phrases and structures
  const repetitionScore = measureRepetition(text);

  // Measure coherence across paragraphs
  const coherenceScore = measureCoherence(text);

  // Measure linguistic complexity
  const complexityScore = measureComplexity(text);

  return {
    entropy,
    repetitionScore,
    coherenceScore,
    complexityScore,
  };
}
// Calculate Shannon entropy over characters, in bits per character
function calculateEntropy(text: string): number {
  const charCounts: Record<string, number> = {};
  let totalChars = 0;

  // Count characters. for...of iterates by code point, so the total is counted
  // the same way instead of using text.length (which counts UTF-16 code units).
  for (const char of text) {
    charCounts[char] = (charCounts[char] || 0) + 1;
    totalChars++;
  }
  if (totalChars === 0) return 0;

  // Calculate entropy
  let entropy = 0;
  for (const char in charCounts) {
    const probability = charCounts[char] / totalChars;
    entropy -= probability * Math.log2(probability);
  }
  return entropy;
}
// Measure repetitive patterns
function measureRepetition(text: string): number {
  // Simplified implementation
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);
  if (sentences.length < 2) return 0;

  // Check for repeated phrases (3+ words)
  const phrases = new Set<string>();
  let repetitionCount = 0;
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter((w) => w.trim().length > 0);
    for (let i = 0; i < words.length - 2; i++) {
      const phrase = words
        .slice(i, i + 3)
        .join(' ')
        .toLowerCase();
      if (phrases.has(phrase)) {
        repetitionCount++;
      } else {
        phrases.add(phrase);
      }
    }
  }

  // Normalize score
  return Math.min(repetitionCount / sentences.length, 1);
}
// Measure coherence across paragraphs
function measureCoherence(text: string): number {
  // Simplified implementation
  const paragraphs = text.split(/\n\n+/).filter((p) => p.trim().length > 0);
  if (paragraphs.length < 2) return 0.5; // Neutral for short texts

  // AI-generated text often maintains similar sentence lengths
  // and structure throughout paragraphs
  const sentenceLengths = paragraphs.map((p) => {
    const sentences = p.split(/[.!?]+/).filter((s) => s.trim().length > 0);
    return sentences.map((s) => s.trim().length);
  });

  // Calculate variance in sentence lengths across paragraphs
  const variances = sentenceLengths.map((lengths) => {
    if (lengths.length < 2) return 0;
    const mean = lengths.reduce((sum, len) => sum + len, 0) / lengths.length;
    const variance =
      lengths.reduce((sum, len) => sum + Math.pow(len - mean, 2), 0) / lengths.length;
    return variance;
  });

  // Low variance = high coherence = more likely AI-generated
  const averageVariance = variances.reduce((sum, v) => sum + v, 0) / variances.length;
  const normalizedCoherence = 1 - Math.min(averageVariance / 100, 1);
  return normalizedCoherence;
}
// Measure linguistic complexity
function measureComplexity(text: string): number {
  // Simplified implementation
  // Average word length
  const words = text.split(/\s+/).filter((w) => w.trim().length > 0);
  if (words.length === 0) return 0; // Avoid dividing by zero on empty input
  const avgWordLength = words.reduce((sum, w) => sum + w.length, 0) / words.length;

  // Normalize between 0-1
  // 3 = very simple, 7 = very complex
  const normalizedLength = Math.max(0, Math.min((avgWordLength - 3) / 4, 1));

  // AI text often has midrange complexity (not too simple, not too complex)
  // Score is higher when in the middle range
  return 1 - Math.abs(normalizedLength - 0.5) * 2;
}
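As a sanity check on the entropy calculation, the sketch below restates the same formula in a standalone helper (`charEntropy`, a name introduced here for illustration) and runs it on tiny strings whose entropy is easy to work out by hand.

```typescript
// Shannon entropy over characters, in bits per character.
// Standalone restatement of calculateEntropy, for illustration only.
function charEntropy(text: string): number {
  const counts: Record<string, number> = {};
  let total = 0;
  for (const ch of text) {
    counts[ch] = (counts[ch] ?? 0) + 1;
    total++;
  }
  if (total === 0) return 0;

  let entropy = 0;
  for (const ch of Object.keys(counts)) {
    const p = counts[ch] / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(charEntropy('aaaa')); // 0: a single symbol carries no information
console.log(charEntropy('abab')); // 1: two equally likely symbols
console.log(charEntropy('abcd')); // 2: four equally likely symbols
```

Ordinary English prose typically lands somewhere around 4 bits per character, so the `entropy / 4.5` normalization used in `calculateAIScore` tends to sit close to 1 for most real inputs. That makes entropy the least discriminating of the four features in this sketch.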
This implementation:
- Analyzes text features like entropy, repetition, coherence, and complexity
- Calculates a weighted heuristic score for AI generation (a 0-1 value, not a calibrated probability)
- Compares this score against a threshold
- Returns detailed results with feature breakdowns
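To show how a caller might act on these results, here is a minimal sketch. The `PreflightResult` shape is inferred from the objects the check returns; the real type lives in '../types', which this article does not show. The `decide` function and `Decision` type are hypothetical names introduced for illustration.

```typescript
// Result shape inferred from the return values of the check above (an assumption).
interface PreflightResult {
  passed: boolean;
  code: string;
  message: string;
  severity: 'info' | 'warning' | 'error';
  details?: Record<string, unknown>;
}

type Decision = { allow: true } | { allow: false; reason: string };

// Hypothetical handler: block on failure, log warnings, otherwise allow.
function decide(result: PreflightResult): Decision {
  if (!result.passed) {
    return { allow: false, reason: `${result.code}: ${result.message}` };
  }
  if (result.severity === 'warning') {
    // The check failed open (for example, an internal error); surface it for monitoring.
    console.warn(`[preflight] ${result.code}: ${result.message}`);
  }
  return { allow: true };
}

const blocked = decide({
  passed: false,
  code: 'ai_generated_content',
  message: 'Content appears to be AI-generated',
  severity: 'error',
  details: { aiScore: 0.82, threshold: 0.75 },
});
console.log(blocked);
```

Keeping the decision logic separate from the check itself makes it easier to change policy later, for instance logging instead of blocking while the threshold is still being tuned.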
Limitations and Improvements
This basic implementation has several limitations:
- False Positives: Some human-written text may resemble AI-generated content
- False Negatives: AI output sampled with more randomness, or lightly edited or paraphrased afterwards, can evade detection
- Language Dependency: Works best for English content
- Length Sensitivity: More reliable with longer text samples
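One way to soften the impact of false positives is to treat the score as graded evidence rather than a hard pass/fail. The sketch below is an assumption layered on top of the check above, not part of it: the `review` band, the cutoff values, and the `classifyScore` name are all hypothetical.

```typescript
type Verdict = 'allow' | 'review' | 'block';

interface Bands {
  review: number; // at or above this score: flag for secondary review
  block: number; // at or above this score: reject outright
}

// Hypothetical cutoffs; they would need tuning against labeled samples.
const defaultBands: Bands = { review: 0.6, block: 0.9 };

function classifyScore(aiScore: number, bands: Bands = defaultBands): Verdict {
  // Don't silently allow or block when the score itself is invalid.
  if (Number.isNaN(aiScore)) return 'review';
  if (aiScore >= bands.block) return 'block';
  if (aiScore >= bands.review) return 'review';
  return 'allow';
}

console.log(classifyScore(0.3)); // "allow"
console.log(classifyScore(0.75)); // "review"
console.log(classifyScore(0.95)); // "block"
```

With a middle band, borderline text can be rate-limited or queued for review instead of being rejected, which limits the cost of a misclassified human author.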
To improve this detection:
- Use a pre-trained model specifically designed for AI detection
- Apply more sophisticated linguistic analysis
- Consider using external APIs specialized in AI content detection
- Update detection methods as AI systems evolve
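As a rough sketch of how an external detector could sit alongside the heuristics, the snippet below blends two scores. Everything here is hypothetical: `Scorer` stands in for whatever model or API you choose (no real service is called), and `classifierWeight` is an illustrative tuning knob rather than a recommended value.

```typescript
// Any function that returns a 0-1 "likely AI" score for a text.
// In practice this might wrap a hosted model or a third-party API (not shown).
type Scorer = (text: string) => Promise<number>;

const clamp01 = (x: number): number => Math.max(0, Math.min(x, 1));

// Weighted blend of the two scores; both inputs are clamped to 0-1 first.
function blendScores(heuristic: number, classifier: number, classifierWeight = 0.7): number {
  const w = clamp01(classifierWeight);
  return (1 - w) * clamp01(heuristic) + w * clamp01(classifier);
}

async function combinedScore(text: string, heuristic: Scorer, classifier: Scorer): Promise<number> {
  const h = await heuristic(text);
  try {
    const c = await classifier(text);
    return blendScores(h, c);
  } catch {
    // External classifier unavailable: fall back to the heuristic alone.
    return clamp01(h);
  }
}

// Stub scorers standing in for real implementations.
const stubHeuristic: Scorer = async () => 0.4;
const stubClassifier: Scorer = async () => 0.9;

combinedScore('some text', stubHeuristic, stubClassifier).then((score) => {
  console.log(score.toFixed(2));
});
```

Falling back to the heuristic when the classifier throws keeps the preflight check available during an outage, at the cost of weaker detection while the fallback is in effect.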
Conclusion
Detecting AI-generated content adds a useful layer to your preflight checks, helping to curb automated spam, sophisticated jailbreak attempts, and other potential misuse. No detection system is perfect, and a heuristic like this one will misclassify some inputs, but even a basic implementation can flag many common patterns of AI-generated text.
In our next article, we'll explore the final set of preflight checks: language detection and input length validation.