A look under the hood of transformers, the engine driving AI model evolution

Last updated: February 16, 2025 1:08 am
Published February 16, 2025


Today, virtually every cutting-edge AI product and model uses a transformer architecture. Large language models (LLMs) such as GPT-4o, LLaMA, Gemini and Claude are all transformer-based, and other AI applications such as text-to-speech, automatic speech recognition, image generation and text-to-video models rely on transformers as their underlying technology.

With the hype around AI unlikely to slow down anytime soon, it's time to give transformers their due, which is why I'd like to explain a little about how they work, why they're so important for the growth of scalable solutions and why they're the backbone of LLMs.

Transformers are more than meets the eye

In brief, a transformer is a neural network architecture designed to model sequences of data, making it ideal for tasks such as language translation, sentence completion and automatic speech recognition. Transformers have become the dominant architecture for many of these sequence modeling tasks because the underlying attention mechanism can be easily parallelized, allowing for massive scale when training and performing inference.

Originally introduced in the 2017 paper "Attention Is All You Need" from researchers at Google, the transformer was presented as an encoder-decoder architecture specifically designed for language translation. The following year, Google released bidirectional encoder representations from transformers (BERT), which could be considered one of the first LLMs, although it is small by today's standards.


Since then, and especially accelerated by the arrival of GPT models from OpenAI, the trend has been to train bigger and bigger models with more data, more parameters and longer context windows.

To facilitate this evolution, there have been many innovations, such as: more advanced GPU hardware and better software for multi-GPU training; techniques like quantization and mixture of experts (MoE) for reducing memory consumption; new optimizers for training, like Shampoo and AdamW; and techniques for efficiently computing attention, like FlashAttention and KV caching. The trend will likely continue for the foreseeable future.
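One of those optimizations, KV caching, exploits the fact that during autoregressive decoding the keys and values of already-processed tokens never change, so they can be computed once and reused at every step. A toy NumPy sketch (the dimensions and identity projections are arbitrary simplifications; real models apply learned projection matrices):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
d = 8
seq = rng.normal(size=(6, d))   # embeddings of 6 already-generated tokens
new = rng.normal(size=(1, d))   # embedding of the newest token

# Keys/values for old tokens come from the cache; only the new row is computed.
K_cache, V_cache = seq, seq     # identity projections keep the toy simple
K = np.vstack([K_cache, new])
V = np.vstack([V_cache, new])
out_cached = softmax(new @ K.T / np.sqrt(d)) @ V

# Recomputing attention from scratch gives the same answer for the new token.
full = np.vstack([seq, new])
out_full = softmax(full @ full.T / np.sqrt(d)) @ full
print(np.allclose(out_cached[0], out_full[-1]))  # True
```

The cached path does work proportional to one query row rather than the whole sequence, which is why decoding speed stays manageable as context grows.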

The importance of self-attention in transformers

Depending on the application, a transformer model follows an encoder-decoder architecture. The encoder component learns a vector representation of data that can then be used for downstream tasks like classification and sentiment analysis. The decoder component takes a vector or latent representation of the text or image and uses it to generate new text, making it useful for tasks like sentence completion and summarization. For this reason, many familiar state-of-the-art models, such as the GPT family, are decoder only.
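A decoder-only model generates text one token at a time, feeding each prediction back in as input. A minimal sketch of that autoregressive loop, using a hypothetical toy model in place of a real transformer:

```python
def generate(model, tokens, n_new):
    # Autoregressive decoding: each step conditions on everything so far.
    for _ in range(n_new):
        logits = model(tokens)  # per-position scores over the vocabulary
        # Greedy decoding: pick the highest-scoring next token.
        next_token = max(range(len(logits[-1])), key=logits[-1].__getitem__)
        tokens = tokens + [next_token]
    return tokens

def toy_model(tokens):
    # Stand-in "model" over a 10-token vocabulary: always predicts
    # (last token + 1) mod 10. A real model would run the transformer stack.
    logits = [[0.0] * 10 for _ in tokens]
    logits[-1][(tokens[-1] + 1) % 10] = 1.0
    return logits

print(generate(toy_model, [3], 4))  # [3, 4, 5, 6, 7]
```

The same loop structure drives real LLM inference; only the model call and the sampling strategy (greedy here, often temperature or nucleus sampling in practice) change.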

Encoder-decoder models combine both components, making them useful for translation and other sequence-to-sequence tasks. For both encoder and decoder architectures, the core component is the attention layer, as this is what allows a model to retain context from words that appear much earlier in the text.

Attention comes in two flavors: self-attention and cross-attention. Self-attention captures relationships between words within the same sequence, while cross-attention captures relationships between words across two different sequences. Cross-attention connects the encoder and decoder components in a model during translation; for example, it allows the English word "strawberry" to relate to the French word "fraise." Mathematically, both self-attention and cross-attention are different forms of matrix multiplication, which can be done extremely efficiently on a GPU.
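To make that concrete, scaled dot-product attention reduces to two matrix multiplications around a softmax. A minimal NumPy sketch (shapes and projections simplified; real layers add learned weight matrices and multiple heads):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))  # 5 tokens, 16-dim embeddings
# Self-attention: Q, K and V all come from the same sequence.
out = attention(x, x, x)
print(out.shape)  # (5, 16)
# Cross-attention would instead take Q from one sequence (the decoder)
# and K, V from another (the encoder output).
```

Because the heavy lifting is just `Q @ K.T` and `weights @ V`, every token's output can be computed in parallel, which is exactly what makes transformers so GPU-friendly.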


Thanks to the attention layer, transformers can better capture relationships between words separated by long stretches of text, whereas earlier models such as recurrent neural networks (RNNs) and long short-term memory (LSTM) models lose track of the context of words from earlier in the text.

The future of models

Currently, transformers are the dominant architecture for many use cases that require LLMs, and they benefit from the most research and development. Although this does not seem likely to change anytime soon, one different class of model that has gained interest recently is state-space models (SSMs) such as Mamba. These highly efficient models can handle very long sequences of data, while transformers are limited by a context window.

For me, the most exciting applications of transformer models are multimodal models. OpenAI's GPT-4o, for instance, is capable of handling text, audio and images, and other providers are starting to follow. Multimodal applications are very diverse, ranging from video captioning to voice cloning to image segmentation (and more). They also present an opportunity to make AI more accessible to people with disabilities. For example, a blind person could be greatly served by the ability to interact through the voice and audio components of a multimodal application.

It's an exciting space with plenty of potential to uncover new use cases. But do remember that, at least for the foreseeable future, these applications are largely underpinned by the transformer architecture.


Terrence Alsup is a senior data scientist at Finastra.
