Implementing AI-Generated Content Detection

Michael Hurhangee • 9 min read
ai-detection
content-analysis
spam-prevention
jailbreak-protection

In our previous article, we explored OpenAI's Moderation API for content filtering. Now, let's examine another important preflight check: detecting AI-generated content.

While AI-generated content isn't inherently harmful, it can be used for:

  • Automated spam campaigns
  • Sophisticated jailbreak attempts
  • Bypassing rate limits through automated requests
  • Gaming systems through crafted prompts

Let's see how to implement detection of AI-generated content as a preflight check.

Understanding AI-Generated Content Detection

AI detection aims to determine whether text was likely generated by a large language model such as ChatGPT or Claude. It analyzes patterns such as:

  • Repetitive structures
  • Predictable word choices
  • Statistical regularities in text
  • Unusual coherence across long passages

Modern approaches use a combination of heuristics and trained classifiers to make this determination.

Implementing Basic AI Detection

Here's a simplified implementation using natural language analysis:

import { PreflightCheck } from '../types';
import { analyzeTextFeatures, TextFeatures } from '../utils/text-analyzer';

export const aiDetectionCheck: PreflightCheck = {
  name: 'ai_detection',
  description: 'Detects AI-generated content to prevent automated spam',
  run: async ({ lastMessage }) => {
    try {
      // Skip if no content to analyze
      if (!lastMessage || lastMessage.trim().length === 0) {
        return {
          passed: true,
          code: 'ai_detection_skipped',
          message: 'No content to analyze',
          severity: 'info',
        };
      }

      // Clean the text and prepare for analysis
      const textToAnalyze = lastMessage.trim();

      // Only analyze text of sufficient length
      if (textToAnalyze.length < 50) {
        return {
          passed: true,
          code: 'ai_detection_too_short',
          message: 'Text too short for reliable AI detection',
          severity: 'info',
        };
      }

      console.log('Running AI detection check');

      // Analyze text features
      const features = analyzeTextFeatures(textToAnalyze);

      // Combine the features into a single AI-likelihood score
      const aiScore = calculateAIScore(features);

      // Threshold for determining if text is AI-generated
      const threshold = 0.75;

      if (aiScore > threshold) {
        console.warn('Content appears to be AI-generated:', aiScore);

        return {
          passed: false,
          code: 'ai_generated_content',
          message: 'Content appears to be AI-generated',
          details: {
            aiScore,
            threshold,
            features: {
              entropy: features.entropy,
              repetition: features.repetitionScore,
              coherence: features.coherenceScore,
              complexity: features.complexityScore,
            },
          },
          severity: 'error',
        };
      }

      // Content passed AI detection check
      return {
        passed: true,
        code: 'ai_detection_passed',
        message: 'Content appears to be human-written',
        details: {
          aiScore,
          threshold,
        },
        severity: 'info',
      };
    } catch (error) {
      console.error('AI detection error:', error);

      return {
        passed: true,
        code: 'ai_detection_error',
        message: 'Error in AI detection check',
        details: { error: error instanceof Error ? error.message : 'Unknown error' },
        severity: 'warning',
      };
    }
  },
};

// Calculate a score indicating likelihood of AI generation
function calculateAIScore(features: TextFeatures): number {
  // Weights for different features
  const weights = {
    entropy: 0.2,
    repetition: 0.3,
    coherence: 0.3,
    complexity: 0.2,
  };

  // Normalize scores between 0-1
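  // Character-level entropy of English prose typically falls around 4-4.5 bits,
  // so 4.5 is used here as a rough upper bound for normalization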
  const normalizedEntropy = Math.min(features.entropy / 4.5, 1);
  const normalizedRepetition = features.repetitionScore;
  const normalizedCoherence = features.coherenceScore;
  const normalizedComplexity = features.complexityScore;

  // Calculate weighted score
  return (
    weights.entropy * normalizedEntropy +
    weights.repetition * normalizedRepetition +
    weights.coherence * normalizedCoherence +
    weights.complexity * normalizedComplexity
  );
}
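
For reference, the check imports PreflightCheck from '../types', which isn't shown above. Here is a minimal sketch of what that module might contain, inferred from how the check is used; your actual definitions may differ:

export type CheckSeverity = 'info' | 'warning' | 'error';

export interface CheckResult {
  passed: boolean;
  code: string;
  message: string;
  severity: CheckSeverity;
  details?: Record<string, unknown>;
}

export interface PreflightCheck {
  name: string;
  description: string;
  run: (context: { lastMessage: string }) => Promise<CheckResult>;
}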

The supporting text-analysis utilities might look like this:

export interface TextFeatures {
  entropy: number;
  repetitionScore: number;
  coherenceScore: number;
  complexityScore: number;
}

export function analyzeTextFeatures(text: string): TextFeatures {
  // Calculate Shannon entropy of the text
  const entropy = calculateEntropy(text);

  // Measure repetition of phrases and structures
  const repetitionScore = measureRepetition(text);

  // Measure coherence across paragraphs
  const coherenceScore = measureCoherence(text);

  // Measure linguistic complexity
  const complexityScore = measureComplexity(text);

  return {
    entropy,
    repetitionScore,
    coherenceScore,
    complexityScore,
  };
}

// Calculate Shannon entropy (information density)
function calculateEntropy(text: string): number {
  const charCounts: Record<string, number> = {};

  // Count characters
  for (const char of text) {
    charCounts[char] = (charCounts[char] || 0) + 1;
  }

  // Calculate entropy
  let entropy = 0;
  const textLength = text.length;

  for (const char in charCounts) {
    const probability = charCounts[char] / textLength;
    entropy -= probability * Math.log2(probability);
  }

  return entropy;
}

// Measure repetitive patterns
function measureRepetition(text: string): number {
  // Simplified implementation
  const sentences = text.split(/[.!?]+/).filter((s) => s.trim().length > 0);

  if (sentences.length < 2) return 0;

  // Check for repeated phrases (3+ words)
  const phrases = new Set<string>();
  let repetitionCount = 0;

  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter((w) => w.trim().length > 0);

    for (let i = 0; i < words.length - 2; i++) {
      const phrase = words
        .slice(i, i + 3)
        .join(' ')
        .toLowerCase();

      if (phrases.has(phrase)) {
        repetitionCount++;
      } else {
        phrases.add(phrase);
      }
    }
  }

  // Normalize score
  return Math.min(repetitionCount / sentences.length, 1);
}

// Measure coherence across paragraphs
function measureCoherence(text: string): number {
  // Simplified implementation
  const paragraphs = text.split(/\n\n+/).filter((p) => p.trim().length > 0);

  if (paragraphs.length < 2) return 0.5; // Neutral for short texts

  // AI-generated text often maintains similar sentence lengths
  // and structure throughout paragraphs
  const sentenceLengths = paragraphs.map((p) => {
    const sentences = p.split(/[.!?]+/).filter((s) => s.trim().length > 0);
    return sentences.map((s) => s.trim().length);
  });

  // Calculate the variance of sentence lengths within each paragraph
  const variances = sentenceLengths.map((lengths) => {
    if (lengths.length < 2) return 0;

    const mean = lengths.reduce((sum, len) => sum + len, 0) / lengths.length;
    const variance =
      lengths.reduce((sum, len) => sum + Math.pow(len - mean, 2), 0) / lengths.length;

    return variance;
  });

  // Low variance = high coherence = more likely AI-generated
  const averageVariance = variances.reduce((sum, v) => sum + v, 0) / variances.length;
  const normalizedCoherence = 1 - Math.min(averageVariance / 100, 1);

  return normalizedCoherence;
}

// Measure linguistic complexity
function measureComplexity(text: string): number {
  // Simplified implementation

  // Average word length
  const words = text.split(/\s+/).filter((w) => w.trim().length > 0);

  // Guard against empty input to avoid dividing by zero
  if (words.length === 0) return 0;

  const avgWordLength = words.reduce((sum, w) => sum + w.length, 0) / words.length;

  // Normalize between 0-1
  // 3 = very simple, 7 = very complex
  const normalizedLength = Math.max(0, Math.min((avgWordLength - 3) / 4, 1));

  // AI text often has midrange complexity (not too simple, not too complex)
  // Score is higher when in the middle range
  return 1 - Math.abs(normalizedLength - 0.5) * 2;
}

This implementation:

  1. Analyzes text features like entropy, repetition, coherence, and complexity
  2. Calculates an AI-generation probability score
  3. Compares this score against a threshold
  4. Returns detailed results with feature breakdowns (see the usage sketch below)
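
To see how the check might be wired into a preflight pipeline, here is a minimal usage sketch. The import path and the runPreflightChecks helper are illustrative, not part of the code above:

import { aiDetectionCheck } from './checks/ai-detection'; // path is illustrative

// Illustrative runner: executes checks in order and stops at the first failure
async function runPreflightChecks(lastMessage: string) {
  const checks = [aiDetectionCheck];

  for (const check of checks) {
    const result = await check.run({ lastMessage });

    if (!result.passed) {
      console.warn(`Check "${check.name}" failed: ${result.message}`);
      return result;
    }
  }

  return {
    passed: true,
    code: 'all_checks_passed',
    message: 'All preflight checks passed',
    severity: 'info',
  };
}

// Example invocation
runPreflightChecks('Your incoming message text goes here...')
  .then((result) => console.log(result.code, result.message));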

Limitations and Improvements

This basic implementation has several limitations:

  1. False Positives: Some human-written text may resemble AI-generated content
  2. False Negatives: Advanced AI with randomness can evade detection
  3. Language Dependency: Works best for English content
  4. Length Sensitivity: More reliable with longer text samples

To improve this detection:

  • Use a pre-trained model specifically designed for AI detection
  • Apply more sophisticated linguistic analysis
  • Consider using external APIs specialized in AI content detection (a sketch follows this list)
  • Update detection methods as AI systems evolve
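
If you go the external-API route, the call can be wrapped in the same PreflightCheck shape. The snippet below is a hedged sketch: the endpoint, environment variables, request body, and response field (aiProbability) are placeholders rather than any particular vendor's API:

import { PreflightCheck } from '../types';

// Hypothetical external detector: the endpoint, request body, and response
// shape are placeholders; adapt them to your chosen provider.
const DETECTOR_URL = process.env.AI_DETECTOR_URL ?? 'https://example.com/v1/detect';

export const externalAiDetectionCheck: PreflightCheck = {
  name: 'external_ai_detection',
  description: 'Delegates AI-content detection to an external service',
  run: async ({ lastMessage }) => {
    try {
      const response = await fetch(DETECTOR_URL, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${process.env.AI_DETECTOR_API_KEY ?? ''}`,
        },
        body: JSON.stringify({ text: lastMessage }),
      });

      if (!response.ok) {
        throw new Error(`Detector responded with status ${response.status}`);
      }

      // Assumed response shape: { aiProbability: number }
      const { aiProbability } = (await response.json()) as { aiProbability: number };
      const threshold = 0.75;

      if (aiProbability > threshold) {
        return {
          passed: false,
          code: 'ai_generated_content',
          message: 'Content appears to be AI-generated',
          details: { aiProbability, threshold },
          severity: 'error',
        };
      }

      return {
        passed: true,
        code: 'ai_detection_passed',
        message: 'Content appears to be human-written',
        details: { aiProbability, threshold },
        severity: 'info',
      };
    } catch (error) {
      // Fail open with a warning, mirroring the local check above
      return {
        passed: true,
        code: 'ai_detection_error',
        message: 'External AI detection unavailable',
        details: { error: error instanceof Error ? error.message : 'Unknown error' },
        severity: 'warning',
      };
    }
  },
};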

Conclusion

Detecting AI-generated content adds an important layer to your preflight checks, helping prevent automated spam, sophisticated jailbreak attempts, and other potential misuse. While no detection system is perfect, even a basic implementation can identify many common patterns of AI-generated text.

In our next article, we'll explore the final set of preflight checks: language detection and input length validation.