LLM's Fingerprint at Birth: How Random Initialization Shapes Persistent Token Preferences
Speaker: 胡天阳 (香港中文大学(深圳))
Time: 2025-10-16, 14:00–15:00
Venue: 四元厅
Abstract: Transformers form the core architecture of today’s large language models (LLMs). While most of their remarkable capabilities emerge through large-scale training, we find that transformers at random initialization already exhibit surprisingly strong structural biases. In particular, untrained models show extreme next-token preferences that persist throughout their lifecycle. These initialization-born biases serve as intrinsic fingerprints of LLMs: unique, reproducible, and determined solely by the random seed. This talk will first present this phenomenon and then explain its origin. Two forces jointly drive the collapse of token representations: the MLP blocks induce inter-sequence representation concentration, while self-attention introduces intra-sequence concentration. Together, these effects align hidden representations along a seed-dependent direction, giving rise to the observed extreme preference. Beyond fingerprinting, this mechanism reveals fundamental architectural inductive biases in transformers and offers new perspectives on improving pre-training stability and mitigating the attention-sink phenomenon.
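The intra-sequence concentration effect attributed to self-attention can be illustrated with a toy NumPy sketch (this is not the speaker's code; all dimensions, layer counts, and the row-normalization stand-in for LayerNorm are illustrative assumptions). Stacking randomly initialized attention layers, each of which mixes token representations through a positive row-stochastic attention matrix, drives the token vectors within a sequence toward a common, seed-dependent direction:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def collapse_direction(seed, n_tokens=8, d=32, n_layers=20):
    """Push random token embeddings through stacked, untrained
    self-attention layers and return the final hidden states."""
    rng = np.random.default_rng(seed)
    H = rng.normal(size=(n_tokens, d))
    H /= np.linalg.norm(H, axis=1, keepdims=True)
    for _ in range(n_layers):
        Wq, Wk, Wv = (rng.normal(0, d**-0.5, size=(d, d)) for _ in range(3))
        # Row-stochastic attention: each output row is a convex
        # combination of value rows, which contracts the rows together.
        A = softmax((H @ Wq) @ (H @ Wk).T / np.sqrt(d))
        H = A @ (H @ Wv)
        # Crude stand-in for LayerNorm: keep rows on the unit sphere.
        H /= np.linalg.norm(H, axis=1, keepdims=True)
    return H

def mean_pairwise_cos(H):
    """Average cosine similarity between distinct token representations."""
    G = H @ H.T
    mask = ~np.eye(len(H), dtype=bool)
    return G[mask].mean()

H = collapse_direction(seed=0)
print("mean pairwise cosine after 20 layers:", round(mean_pairwise_cos(H), 4))
```

Running this, the token representations end up nearly parallel (mean pairwise cosine close to 1), and rerunning with a different seed yields a different collapsed direction, consistent with the seed-dependent alignment the abstract describes. The MLP-driven inter-sequence concentration is not modeled here.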
Bio: 胡天阳 is an Assistant Professor in the School of Data Science at 香港中文大学(深圳). His research lies at the intersection of artificial intelligence and statistics, spanning statistical machine learning, trustworthy AI, feature representation learning, and deep generative models. By uncovering the underlying mechanisms of AI models, his work aims to provide theoretical guidance for designing more effective algorithms.
