Yandex researchers develop new methods for compressing large language models, cutting AI deployment costs by up to 8 times

worldnewsnetwork
Last updated: July 29, 2024 12:00 am

VMPL
Bangalore (Karnataka) [India], July 29: The Yandex Research team, in collaboration with researchers from IST Austria, Neural Magic, and KAUST, has developed two compression methods for large language models: Additive Quantization for Language Models (AQLM) and PV-Tuning. Combined, the methods can reduce model size by up to 8 times while preserving 95% of response quality, cutting the resources needed to run large language models. The research article detailing the approach has been featured at the International Conference on Machine Learning (ICML), currently underway in Vienna, Austria.
Key features of AQLM and PV-Tuning
AQLM leverages additive quantization, a technique traditionally used for information retrieval, for LLM compression. The resulting method preserves, and can even improve, model accuracy under extreme compression, significantly reducing memory consumption and making it possible to deploy LLMs on everyday devices such as home computers and smartphones.
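To make the idea concrete, here is a minimal sketch of additive quantization on a single group of weights. It illustrates the general technique only, not the AQLM codebase: each weight group is approximated by a sum of codewords drawn from small shared codebooks, so only integer codes and the codebooks need to be stored.

```python
import numpy as np

# Illustrative additive quantization of one weight group (not the AQLM code).
# A group of g weights is approximated by the sum of M codewords,
# one chosen from each of M codebooks holding K vectors each.
g, M, K = 8, 2, 256                      # group size, number of codebooks, codebook size

rng = np.random.default_rng(0)
codebooks = rng.normal(size=(M, K, g))   # learned offline in the real method
weights = rng.normal(size=g)             # one group of original full-precision weights

# Greedy encoding: for each codebook, pick the codeword that best reduces the residual.
residual, codes = weights.copy(), []
for m in range(M):
    idx = int(np.argmin(((residual - codebooks[m]) ** 2).sum(axis=1)))
    codes.append(idx)
    residual -= codebooks[m, idx]

# Decoding needs only M small integer codes per group plus the shared codebooks.
approx = sum(codebooks[m, codes[m]] for m in range(M))
bits_per_weight = M * np.log2(K) / g     # here: 2 * 8 / 8 = 2 bits per weight
print(codes, bits_per_weight, np.abs(weights - approx).mean())
```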
PV-Tuning addresses errors that may arise during the model compression process. When combined, AQLM and PV-Tuning deliver optimal results — compact models capable of providing high-quality responses even on limited computing resources.
Method evaluation and recognition
The effectiveness of the methods was rigorously assessed using popular open-source models such as Llama 2, Llama 3, and Mistral. Researchers compressed these large language models and evaluated answer quality against English-language benchmarks, WikiText2 and C4, maintaining 95% answer quality as the models were compressed by 8 times.
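For readers who want to run this kind of measurement themselves, below is a hedged sketch of a standard WikiText2 perplexity evaluation using Hugging Face transformers and datasets. The release does not specify the exact harness Yandex used, and the model ID here is a placeholder.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-model-here"  # placeholder: any causal LM on the Hugging Face Hub
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto",
                                             device_map="auto").eval()

# Concatenate the WikiText2 test split into one long token stream.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

# Score non-overlapping windows; passing labels == inputs makes the model
# compute the shifted next-token cross-entropy internally.
max_len, nll, n_tok = 2048, 0.0, 0
for start in range(0, ids.size(1) - 1, max_len):
    chunk = ids[:, start : start + max_len].to(model.device)
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss
    nll += loss.item() * (chunk.size(1) - 1)
    n_tok += chunk.size(1) - 1

print(f"WikiText2 perplexity: {torch.exp(torch.tensor(nll / n_tok)).item():.2f}")
```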
Who can benefit from AQLM and PV-Tuning
The new methods offer substantial resource savings for companies involved in developing and deploying proprietary language models and open-source LLMs. For instance, the Llama 2 model with 13 billion parameters, post-compression, can now run on just 1 GPU instead of 4, reducing hardware costs by up to 8 times. This means that startups, individual researchers, and LLM enthusiasts can run advanced LLMs such as Llama on their everyday computers.
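The hardware claim follows from simple bit-width arithmetic. As a back-of-the-envelope illustration (the numbers below are weight-memory estimates of our own, not figures from the release):

```python
params = 13e9                     # Llama 2 13B parameters
fp16_gb = params * 16 / 8 / 1e9   # 16 bits/param -> ~26 GB for the weights alone
aqlm_gb = params * 2 / 8 / 1e9    # ~2 bits/param after compression -> ~3.3 GB

print(f"fp16 weights: {fp16_gb:.1f} GB")
print(f"compressed weights: {aqlm_gb:.1f} GB ({fp16_gb / aqlm_gb:.0f}x smaller)")
# ~26 GB of weights (plus activations and KV cache) spills across several
# consumer GPUs, while ~3 GB fits comfortably on a single card.
```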
Exploring new LLM applications
AQLM and PV-Tuning make it possible to deploy models offline on devices with limited computing resources, enabling new use cases for smartphones, smart speakers, and more. With advanced LLMs integrated into them, users can use text and image generation, voice assistance, personalized recommendations, and even real-time language translation without needing an active internet connection.
Moreover, models compressed using the methods can operate up to 4 times faster, as they require fewer computations.
Implementation and access
Developers and researchers worldwide can already use AQLM and PV-Tuning, which are available on GitHub. Demo materials provided by the authors offer guidance for effectively training compressed LLMs for various applications. Additionally, developers can download popular open-source models that have already been compressed using the methods.
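As a hedged example of what this looks like in practice, the sketch below loads one of the pre-compressed checkpoints through Hugging Face transformers, which integrates AQLM support via the authors' aqlm package. The repository ID is the one published on the Hub at the time of writing; verify it against the project's GitHub page before use.

```python
# pip install transformers aqlm[gpu]   (aqlm provides the inference kernels)
from transformers import AutoModelForCausalLM, AutoTokenizer

# A 2-bit AQLM-compressed Llama 2 7B checkpoint published by the authors;
# check the project's GitHub page for the current list of repo IDs.
model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto",
                                             device_map="auto")

prompt = "Model compression matters because"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device),
                     max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```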
ICML highlight
A scientific article by Yandex Research on the AQLM compression method has been featured at ICML, one of the world’s most prestigious machine learning conferences. Co-authored with researchers from IST Austria and experts from AI startup Neural Magic, the work marks a significant advance in LLM compression technology.
About Yandex
Yandex is a global technology company that builds intelligent products and services powered by machine learning. The company aims to help consumers and businesses better navigate the online and offline world. Since 1997, Yandex has been delivering world-class, locally relevant search and information services and has also developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers across the globe.
For reference [additional details for media & journalists]
Deploying large language models (LLMs) on consumer hardware is challenging due to the inherent trade-off between model size and computational efficiency. Compression methods, such as quantization, have offered partial solutions, but often compromise model performance.
To address this challenge, researchers from Yandex Research, IST Austria, KAUST, and Neural Magic developed two compression methods: Additive Quantization for Language Models (AQLM) and PV-Tuning. AQLM reduces the bit count per model parameter to 2-3 bits while preserving or even enhancing model accuracy, particularly in extreme compression scenarios. PV-Tuning is a representation-agnostic framework that generalizes and improves upon existing fine-tuning strategies.
AQLM’s key innovations include learned additive quantization of weight matrices, which adapts to input variability, and joint optimization of codebook parameters across layer blocks. This dual strategy enables AQLM to outperform other compression techniques, setting new benchmarks in the field.
AQLM’s practicality is demonstrated by its implementations on GPU and CPU architectures, making it suitable for real-world applications. Comparative analysis shows that AQLM can achieve extreme compression without compromising model performance, as evidenced by its superior results in metrics like model perplexity and accuracy in zero-shot tasks.
PV-Tuning provides convergence guarantees in restricted cases and has been shown to outperform previous methods when used for 1-2 bit vector quantization on highly performant models such as Llama and Mistral. By leveraging PV-Tuning, the researchers achieved the first Pareto-optimal quantization of Llama 2 models at 2 bits per parameter.
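The alternating continuous/discrete idea can be illustrated on a toy problem. The sketch below is a loose conceptual analogue of that alternation (gradient updates to continuous codebook parameters, interleaved with re-assignment of a small subset of discrete codes), not the authors' algorithm; all names and sizes are illustrative.

```python
import numpy as np

# Toy illustration of alternating continuous/discrete optimization,
# loosely analogous to PV-Tuning's P-step / V-step (not the real method).
rng = np.random.default_rng(1)
target = rng.normal(size=256)                  # weights we want to approximate
codebook = np.array([-1.0, 1.0])               # continuous parameters (trainable)
codes = rng.integers(0, 2, size=target.size)   # discrete assignments (trainable)

lr = 0.1
for step in range(200):
    approx = codebook[codes]
    # Continuous step: gradient descent on the codebook, codes held fixed.
    for k in range(codebook.size):
        mask = codes == k
        if mask.any():
            codebook[k] -= lr * 2 * (approx[mask] - target[mask]).mean()
    # Discrete step: re-assign only a small random subset of codes to their
    # best codeword; updating everything at once can destabilize training.
    subset = rng.choice(target.size, size=16, replace=False)
    codes[subset] = np.argmin((target[subset, None] - codebook[None, :]) ** 2, axis=1)

print("final MSE:", ((codebook[codes] - target) ** 2).mean())
```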
The effectiveness of the methods was rigorously assessed using popular open-source models such as Llama 2, Mistral, and Mixtral. Researchers compressed these large language models and evaluated answer quality against English-language benchmarks, WikiText2 and C4, maintaining 95% answer quality as the models were compressed by 8 times.

* The closer the average accuracy of answers in tests is to that of the original model, the better the new methods preserve answer quality. The accompanying figures show the combined results of the two methods, which compress the models by 8 times on average.
(ADVERTORIAL DISCLAIMER: The above press release has been provided by VMPL. ANI will not be responsible in any way for the content of the same)
