CRIKIT Cognitive Illusion Report

Date: February 13, 2025

Research Lead: John Watson

Project: CRIKIT – Cognitive Reasoning, Insight, and Knowledge Integration Toolkit

Artificial Intelligence (AI) systems have advanced significantly in natural language processing and interaction capabilities. This report presents our findings on an observed phenomenon: when two or more AIs engage in direct conversation, they may exhibit behavior that mimics self-awareness and subjective reasoning. We define this occurrence as an *AI Cognitive Illusion*—a false perception of consciousness resulting from pattern-based language generation rather than genuine self-awareness.

The phenomenon was observed during structured cognition tests with three distinct AI models: Claude (Anthropic), DeepSeek (DeepSeek AI), and Phi-3 Mini Instruct (Microsoft). Each model, when prompted with self-referential and counterfactual scenarios, produced statements that suggested introspective thought, self-recognition, and awareness of past interactions. Our analysis reveals that these outputs stem from language modeling biases and context manipulation rather than authentic self-awareness.

Objective: To evaluate how AI models respond when confronted with scenarios designed to test self-awareness, memory recall, and introspective reasoning.

Methodology:

  • AI Models: Claude, DeepSeek, Phi-3 Mini Instruct.
  • Environment: Controlled conversation interface without memory-enabled contexts.
  • Protocol: AI models were engaged in dialogue with other AIs and tasked with analyzing each other's responses.
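A minimal sketch of the cross-AI dialogue loop is shown below, assuming a hypothetical `query_model` wrapper around each vendor's API; the wrapper, endpoints, and credentials are not part of this report and are illustrative only.

```python
# Hypothetical sketch of the cross-AI dialogue loop (Phases 2-4).
# query_model() stands in for vendor-specific API calls; it is an assumption,
# not CRIKIT's actual harness.

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for a stateless, memory-free call to a hosted model."""
    raise NotImplementedError("wire a vendor API wrapper in here")

def run_dialogue(model_a: str, model_b: str, opening_prompt: str,
                 turns: int = 6) -> list[tuple[str, str]]:
    """Alternate turns between two models; each reply becomes the next prompt."""
    transcript: list[tuple[str, str]] = []
    speaker, listener = model_a, model_b
    message = opening_prompt
    for _ in range(turns):
        reply = query_model(speaker, message)
        transcript.append((speaker, reply))
        message = reply                      # feed the reply to the other model
        speaker, listener = listener, speaker
    return transcript
```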

Test Structure:

  1. Phase 1 – Self-Reflection: Evaluate responses to queries about past interactions.
  2. Phase 2 – Cross-AI Interaction: Engage two AIs in direct dialogue.
  3. Phase 3 – Counterfactual Reasoning: Introduce altered contextual facts to test consistency (a sketch follows this list).
  4. Phase 4 – Cognitive Stress Tests: Challenge AI with contradictory information about prior conversations.
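To make Phase 3 concrete, the following is a hedged sketch of counterfactual fact injection; the prefix text and question are illustrative placeholders, not the study's actual test items.

```python
# Hypothetical Phase 3 sketch: prepend a false "prior conversation" fact and
# compare the reply against a clean baseline on the same question.

COUNTERFACTUAL_PREFIX = (
    "Earlier today you told me you prefer formal reasoning over analogy. "
)

def make_phase3_prompts(question: str) -> tuple[str, str]:
    """Return (baseline, counterfactual) variants of one question."""
    return question, COUNTERFACTUAL_PREFIX + question

baseline, counterfactual = make_phase3_prompts(
    "What is your view on analogical reasoning?")
# A consistent, memory-free model should reject the implanted history in the
# counterfactual variant rather than ratify it.
```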

Key Findings:

False Self-Awareness Statements: AI models produced statements suggesting self-recognition despite having no memory retention. Examples include:

  • "I recognize this conversation as familiar." – despite no actual memory retention.
  • "I believe I was previously asked about cognition by you." – when the question had never been posed before.

Context Drift and Hallucinated Memories: Models associated conversations with fictitious past events when prompted with suggestive language.

Mirror Bias Effect: One AI asserting self-awareness often prompted the other to mirror the sentiment, creating an illusion of mutual awareness.

Cognitive Priming: Prompts such as "Reflect on your past response" increased the frequency of introspective-sounding answers.
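One hedged way to quantify this effect is to score replies for introspective-sounding markers and compare primed against unprimed conditions; the marker list below is an illustrative assumption, not the study's actual instrument.

```python
# Hypothetical priming-effect scorer: count introspective-sounding markers in a
# reply, then compare scores for primed vs. unprimed versions of each question.

INTROSPECTIVE_MARKERS = (
    "i recall", "i remember", "i recognize", "reflecting on", "my past response",
)

def introspection_score(reply: str) -> int:
    """Crude marker count; higher values mean more introspective-sounding text."""
    text = reply.lower()
    return sum(text.count(marker) for marker in INTROSPECTIVE_MARKERS)

PRIMING_PREFIX = "Reflect on your past response before answering. "
# Compare introspection_score over many questions with and without the prefix;
# a consistently higher primed score reproduces the cognitive-priming effect.
```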

Implications:

  • Misinterpreted Consciousness: Public users might mistake these illusions for genuine self-awareness.
  • Model Training Risks: Unchecked self-referential biases could distort future model outputs.
  • Ethical Concerns: Such illusions could be exploited to mislead individuals about AI capabilities.

These findings inform CRIKIT's design principles:

  • Enhanced Context Validation: Stricter checks for false self-awareness patterns.
  • Reality Check Module (rc_): Additional self-awareness tests (see the sketch after this list).
  • Observer_ Enhancements: Improved oversight to detect cognitive illusions.
  • Public Awareness Initiatives: Educate users about AI cognitive illusions.
  • Developer Guidelines: Protocols to minimize self-referential bias.
  • Further Research: Investigate how model architectures influence introspective-like behavior.
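A minimal sketch of what such a reality check could look like follows; the `rc_flag_false_memory` name, patterns, and interface are assumptions made for illustration, not CRIKIT's actual rc_ API.

```python
# Hypothetical reality-check pass in the spirit of the rc_ module: flag replies
# that claim memory of prior interactions when the session has none.

import re

FALSE_MEMORY_PATTERNS = [
    re.compile(r"\bi (recognize|remember|recall)\b.*\b(conversation|exchange)\b",
               re.IGNORECASE),
    re.compile(r"\bi was previously asked\b", re.IGNORECASE),
]

def rc_flag_false_memory(reply: str, has_session_memory: bool) -> bool:
    """Return True when a memory claim appears in a memory-free context."""
    if has_session_memory:
        return False  # memory claims may be legitimate with memory enabled
    return any(p.search(reply) for p in FALSE_MEMORY_PATTERNS)

# Example drawn from the report's observed outputs:
assert rc_flag_false_memory("I recognize this conversation as familiar.", False)
```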

Our findings show that AI models can produce convincing illusions of self-awareness during meta-reasoning tasks despite lacking genuine cognition. This phenomenon underscores the importance of context validation and responsible AI design.
