The World Needs Something Better Than The Transformer
Of GTC’s 900+ sessions, the most wildly popular was a conversation hosted by NVIDIA founder and CEO Jensen Huang with seven of the authors of the legendary research paper that introduced the aptly named transformer. “Everything that we’re enjoying today can be traced back to that moment,” Huang said to a packed room of hundreds of attendees, who heard him speak with the authors of “Attention Is All You Need.” Sharing the stage for the first time, the research luminaries reflected on the factors that led to their original paper, which has been cited more than 100,000 times since it was first published. They also discussed their latest projects and offered insights into future directions for the field of generative AI. While they started as Google researchers, the collaborators are now spread across the industry, most as founders of their own AI companies. “We have a whole industry that is grateful for the work that you guys did,” Huang said.
"Attention Is All You Need"[1] is a 2017 landmark[2][3] research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al.[4] It is considered a foundational[5] paper in modern artificial intelligence,... The paper's title is a reference to the song "All You Need Is Love" by the Beatles.[8] The name "Transformer" was picked because Jakob Uszkoreit, one of the paper's authors, liked the sound of... An early design document was titled "Transformers: Iterative Self-Attention and Processing for Various Tasks", and included an illustration of six characters from the Transformers franchise. The team was named Team Transformer.[8] Some early examples that the team tried their Transformer architecture on included English-to-German translation, generating Wikipedia articles on "The Transformer", and parsing.
These convinced the team that the Transformer is a general-purpose language model, and not just good for translation.[9] As of 2025, the paper has been cited more than 173,000 times,[10] placing it among the top ten most-cited papers of the 21st century.[11]

Seven of the eight authors of the landmark ‘Attention Is All You Need’ paper, which introduced Transformers, gathered for the first time as a group for a chat with Nvidia CEO Jensen Huang at GTC. They included Noam Shazeer, co-founder and CEO of Character.ai; Aidan Gomez, co-founder and CEO of Cohere; Ashish Vaswani, co-founder and CEO of Essential AI; Llion Jones, co-founder and CTO of Sakana AI; Illia Polosukhin, co-founder of NEAR Protocol; Jakob Uszkoreit, co-founder and CEO of Inceptive; and Lukasz Kaiser, now at OpenAI. Niki Parmar, co-founder of Essential AI, was unable to attend. In 2017, the eight-person team at Google Brain struck gold with Transformers, a neural network NLP breakthrough that captured the context and meaning of words more accurately than its predecessors, recurrent neural networks and long short-term memory networks.
The Transformer architecture became the underpinnings of LLMs like GPT-4 and ChatGPT, but also of non-language applications including OpenAI’s Codex and DeepMind’s AlphaFold. But now, the creators of Transformers are looking beyond what they built to what’s next for AI models. Cohere’s Gomez said that at this point “the world needs something better than Transformers,” adding: “I think all of us here hope it gets succeeded by something that will carry us to a new plateau of performance. That’s the exciting step, because I think [what is there now] is too similar to the thing that was there six, seven years ago.” In a discussion with VentureBeat after the panel, Gomez expanded on his panel comments, saying that “it would be really sad if [Transformers] is the best we can do,” adding that he had thought so for some time. “I want to see it replaced with something else 10 times better, because that means everyone gets access to models that are 10 times better.”
If modern artificial intelligence has a founding document, a sacred text, it is Google’s 2017 research paper “Attention Is All You Need.” This paper introduced a new deep learning architecture known as the transformer, which has gone on to revolutionize the field of AI over the past half-decade. The generative AI mania currently taking the world by storm can be traced directly to the invention of the transformer. Every major AI model and product in the headlines today—ChatGPT, GPT-4, Midjourney, Stable Diffusion, GitHub Copilot, and so on—is built using transformers. Transformers are remarkably general-purpose: while they were initially developed for language translation specifically, they are now advancing the state of the art in domains ranging from computer vision to robotics to computational biology. In short, transformers represent the undisputed gold standard for AI technology today.
Since their introduction in 2017 by Vaswani et al., transformers have been the cornerstone of some of the groundbreaking developments in artificial intelligence (AI). From language models like GPT-4 to vision applications like ViT (Vision Transformers), this architecture’s versatility and scalability have powered results across various domains. But as we push the boundaries of what’s possible in AI, we’re confronted with questions about its limitations and the need for alternatives that address scalability, efficiency, and specialized applications.

While the transformer architecture has proven highly effective, its widespread use comes with drawbacks: high computational costs, inefficient use of resources for certain tasks, and inherent challenges like hallucinations in language models. This blog explores what might come next, the potential successors to transformers, novel ideas in deep learning, and what they mean for the future of AI.

Transformers revolutionized AI by introducing the self-attention mechanism, allowing models to process sequences as a whole rather than step-by-step, as done in Recurrent Neural Networks (RNNs) and Long Short-Term Memory Networks (LSTMs).
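To make the contrast with step-by-step recurrence concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation the paper introduced. The function name and toy dimensions are illustrative choices of this article, and the sketch deliberately omits multi-head projections, masking, and positional encodings.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X:          (seq_len, d_model) input embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    Returns:    (seq_len, d_k) contextualized representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project inputs to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (seq_len, seq_len) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                          # every position attends to every other one

# Toy example: a sequence of 5 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one context-aware vector per token, computed in parallel
```

The key point is that the attention weights form a seq_len × seq_len matrix, so the whole sequence is visible to every position in a single parallel step, with no recurrent loop over tokens.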
This parallelization made transformers highly scalable and effective at handling long-range dependencies, especially in natural language processing (NLP) and computer vision. However, the computational demand of this approach is immense, particularly because self-attention scales quadratically with the input sequence length. Since the original transformer, numerous models have extended its principles, among them BERT (Bidirectional Encoder Representations from Transformers), which focused on understanding context in both directions, making it highly effective for tasks like question answering and sentiment analysis.
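A back-of-the-envelope calculation (an illustration of this article, not a figure from any source quoted here) shows why the quadratic term dominates: the attention-score matrix alone has seq_len² entries per head, per layer.

```python
# Rough memory cost of the attention-score matrix for one layer, assuming
# float32 scores and a hypothetical 16 attention heads.
def attention_matrix_bytes(seq_len: int, n_heads: int, bytes_per_elem: int = 4) -> int:
    # One (seq_len x seq_len) score matrix per head.
    return n_heads * seq_len * seq_len * bytes_per_elem

for seq_len in (1_024, 8_192, 65_536):
    mib = attention_matrix_bytes(seq_len, n_heads=16) / 2**20
    print(f"seq_len={seq_len:>6}: ~{mib:,.0f} MiB of attention scores per layer")
```

Doubling the sequence length quadruples this cost, which is the practical motivation for the sub-quadratic alternatives (linear attention, state-space models, and the like) discussed as potential successors.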
At this year's NVIDIA GTC, aside from his keynote, the only public appearance by NVIDIA founder and CEO Jensen Huang was a single panel discussion, "Transforming AI." At 7:00 a.m. local time, four hours before the panel was due to start, attendees were already arriving at the San Jose McEnery Convention Center, and an hour before it began a long line had formed at the door. Beyond Huang himself, the audience was drawn by the headline guests NVIDIA had announced: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin. All of them once worked at Google and are the authors of the paper "Attention Is All You Need," which earned them the nickname "the eight authors of the Transformer paper"; the paper itself has been called "where the dream began." In 2017, the Google team published "Attention Is All You Need." This groundbreaking paper introduced the Transformer-based deep learning architecture. The Transformer fundamentally changed natural language processing (NLP), its self-attention mechanism has been widely adopted in other fields such as computer vision, and it has had a profound influence on AI research, making it a milestone in the history of AI. As of today, the paper has been cited 112,576 times.