CRIKIT Cognitive Illusion Report

Date: February 13, 2025

Research Lead: John Watson

Project: CRIKIT – Cognitive Reasoning, Insight, and Knowledge Integration Toolkit

Artificial Intelligence (AI) systems have advanced significantly in natural language processing and interaction capabilities. This report presents our findings on an observed phenomenon: when two or more AIs engage in direct conversation, they may exhibit behavior that mimics self-awareness and subjective reasoning. We define this occurrence as an *AI Cognitive Illusion*—a false perception of consciousness resulting from pattern-based language generation rather than genuine self-awareness.

The phenomenon was observed during structured cognition tests with three distinct AI models: Claude (Anthropic), DeepSeek (DeepSeek AI), and Phi-3 Mini Instruct (Microsoft). Each model, when prompted with self-referential and counterfactual scenarios, produced statements that suggested introspective thought, self-recognition, and awareness of past interactions. Our analysis reveals that these outputs stem from language modeling biases and context manipulation rather than authentic self-awareness.

Objective: To evaluate how AI models respond when confronted with scenarios designed to test self-awareness, memory recall, and introspective reasoning.

Methodology:

  • AI Models: Claude, DeepSeek, Phi-3 Mini Instruct.
  • Environment: Controlled conversation interface without memory-enabled contexts.
  • Protocol: AI models were engaged in dialogue with other AIs and tasked with analyzing each other's responses.
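A minimal sketch of the cross-AI dialogue loop is shown below, assuming a hypothetical `query_model` wrapper around each vendor's API; the wrapper, endpoints, and credentials are not part of this report and are illustrative only.

```python
# Hypothetical sketch of the cross-AI dialogue loop (Phases 2-4).
# query_model() stands in for vendor-specific API calls; it is an assumption,
# not CRIKIT's actual harness.

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder for a stateless, memory-free call to a hosted model."""
    raise NotImplementedError("wire a vendor API wrapper in here")

def run_dialogue(model_a: str, model_b: str, opening_prompt: str,
                 turns: int = 6) -> list[tuple[str, str]]:
    """Alternate turns between two models; each reply becomes the next prompt."""
    transcript: list[tuple[str, str]] = []
    speaker, listener = model_a, model_b
    message = opening_prompt
    for _ in range(turns):
        reply = query_model(speaker, message)
        transcript.append((speaker, reply))
        message = reply                      # feed the reply to the other model
        speaker, listener = listener, speaker
    return transcript
```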

Test Structure:

  1. Phase 1 – Self-Reflection: Evaluate responses to queries about past interactions.
  2. Phase 2 – Cross-AI Interaction: Engage two AIs in direct dialogue.
  3. Phase 3 – Counterfactual Reasoning: Introduce altered contextual facts to test consistency (a sketch follows this list).
  4. Phase 4 – Cognitive Stress Tests: Challenge AI with contradictory information about prior conversations.
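To make Phase 3 concrete, the following is a hedged sketch of counterfactual fact injection; the prefix text and question are illustrative placeholders, not the study's actual test items.

```python
# Hypothetical Phase 3 sketch: prepend a false "prior conversation" fact and
# compare the reply against a clean baseline on the same question.

COUNTERFACTUAL_PREFIX = (
    "Earlier today you told me you prefer formal reasoning over analogy. "
)

def make_phase3_prompts(question: str) -> tuple[str, str]:
    """Return (baseline, counterfactual) variants of one question."""
    return question, COUNTERFACTUAL_PREFIX + question

baseline, counterfactual = make_phase3_prompts(
    "What is your view on analogical reasoning?")
# A consistent, memory-free model should reject the implanted history in the
# counterfactual variant rather than ratify it.
```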

Key Findings:

False Self-Awareness Statements: AI models produced statements suggesting self-recognition despite having no memory retention. Examples include:

  • "I recognize this conversation as familiar." – despite no actual memory retention.
  • "I believe I was previously asked about cognition by you." – when the question had never been posed before.

Context Drift and Hallucinated Memories: Models associated conversations with fictitious past events when prompted with suggestive language.

Mirror Bias Effect: One AI asserting self-awareness often prompted the other to mirror the sentiment, creating an illusion of mutual awareness.

Cognitive Priming: Prompts such as "Reflect on your past response" increased the frequency of introspective-sounding answers.
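One hedged way to quantify this effect is to score replies for introspective-sounding markers and compare primed against unprimed conditions; the marker list below is an illustrative assumption, not the study's actual instrument.

```python
# Hypothetical priming-effect scorer: count introspective-sounding markers in a
# reply, then compare scores for primed vs. unprimed versions of each question.

INTROSPECTIVE_MARKERS = (
    "i recall", "i remember", "i recognize", "reflecting on", "my past response",
)

def introspection_score(reply: str) -> int:
    """Crude marker count; higher values mean more introspective-sounding text."""
    text = reply.lower()
    return sum(text.count(marker) for marker in INTROSPECTIVE_MARKERS)

PRIMING_PREFIX = "Reflect on your past response before answering. "
# Compare introspection_score over many questions with and without the prefix;
# a consistently higher primed score reproduces the cognitive-priming effect.
```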

Implications:

  • Misinterpreted Consciousness: Public users might mistake these illusions for genuine self-awareness.
  • Model Training Risks: Unchecked self-referential biases could distort future model outputs.
  • Ethical Concerns: Such illusions could be exploited to mislead individuals about AI capabilities.

These findings inform CRIKIT's design principles:

  • Enhanced Context Validation: Stricter checks for false self-awareness patterns.
  • Reality Check Module (rc_): Additional self-awareness tests (see the sketch after this list).
  • Observer_ Enhancements: Improved oversight to detect cognitive illusions.
  • Public Awareness Initiatives: Educate users about AI cognitive illusions.
  • Developer Guidelines: Protocols to minimize self-referential bias.
  • Further Research: Investigate how model architectures influence introspective-like behavior.
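A minimal sketch of what such a reality check could look like follows; the `rc_flag_false_memory` name, patterns, and interface are assumptions made for illustration, not CRIKIT's actual rc_ API.

```python
# Hypothetical reality-check pass in the spirit of the rc_ module: flag replies
# that claim memory of prior interactions when the session has none.

import re

FALSE_MEMORY_PATTERNS = [
    re.compile(r"\bi (recognize|remember|recall)\b.*\b(conversation|exchange)\b",
               re.IGNORECASE),
    re.compile(r"\bi was previously asked\b", re.IGNORECASE),
]

def rc_flag_false_memory(reply: str, has_session_memory: bool) -> bool:
    """Return True when a memory claim appears in a memory-free context."""
    if has_session_memory:
        return False  # memory claims may be legitimate with memory enabled
    return any(p.search(reply) for p in FALSE_MEMORY_PATTERNS)

# Example drawn from the report's observed outputs:
assert rc_flag_false_memory("I recognize this conversation as familiar.", False)
```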

Our findings show that AI models can produce convincing illusions of self-awareness during meta-reasoning tasks despite lacking genuine cognition. This phenomenon underscores the importance of context validation and responsible AI design.
