Gemma 4 models use a training trick to reduce memory footprint

Presentation timeline for Gemma 4 QAT models.

TL;DR

Gemma 4 models are now available for download with quantization-aware training (QAT), which reduces the size and memory footprint of the models.
Thanks to QAT, these open-source models retain quality better than those using post-training quantization (PTQ).
The QAT-optimized Gemma 4 models are available in five sizes: Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B and Gemma 4 31B.

After Google introduced the laptop-grade Gemma 4 12B model earlier this week, the company released new Gemma 4 model checkpoints trained with quantization-aware training. Quantization reduces the amount of memory required to run lightweight models. The standard method is post-training quantization (PTQ), which quantizes the model after training but can degrade performance. According to a Google blog post, the latest Gemma 4 versions use quantization-aware training (QAT) instead to reduce quality loss and speed up decoding.
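To make the idea concrete, here is a minimal sketch of symmetric quantization in Python. It is not Google's implementation: the function names, the sample weights, and the 4-bit width are all assumptions chosen for illustration. PTQ applies the quantize step once to a finished model, while QAT, as commonly described, runs the quantize-then-dequantize round trip during training so the model learns to tolerate the rounding error.

```python
def quantize(weights, bits=4):
    """Symmetric per-tensor quantization: map floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]


def fake_quantize(weights, bits=4):
    """Quantize, then dequantize: the round trip a QAT forward pass simulates."""
    q, scale = quantize(weights, bits)
    return dequantize(q, scale)


# Hypothetical weights, used only to show the rounding error.
weights = [0.82, -0.31, 0.05, -1.40, 0.67]
q, scale = quantize(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("integer levels:", q)
print("max rounding error:", max(errors))
```

Each restored weight differs from the original by at most half a quantization step. That error is what PTQ leaves the model to absorb after the fact, and what QAT exposes it to during training.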

Google says that incorporating quantization into the training process produces checkpoints that perform better than models quantized with PTQ. The compressed models run well on phones and laptops thanks to a special mobile quantization scheme. It involves pre-calculated parameters, 2-bit compression of certain parts of the model, a dictionary list, and short-term memory compression. For the user, the result is a smaller model that consumes less system memory.


Multiple model sizes are available with QAT optimization, including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B and Gemma 4 31B. The smallest versions, such as the text-only Gemma 4 E2B model, require less than one gigabyte of memory to run. With such modest resource requirements, these small Gemma 4 checkpoints are ideal for running on phones.

Google has shared the approximate memory requirements for loading the new Gemma 4 QAT models at each size:

Memory requirements of Gemma 4 model sizes.
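For a rough sense of why lower precision matters, the weights-only footprint can be estimated as parameter count times bits per weight. The sketch below is a back-of-the-envelope calculation, not Google's published figures: it assumes "12B" means 12 billion parameters, the function name is hypothetical, and it ignores activations, the cache used during decoding, and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Weights-only estimate in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "12B" is read as 12 billion parameters.
params = 12e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight storage to a quarter. Actual requirements depend on the format and on which parts of the model are quantized.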

Four different formats of Gemma 4 QAT models are available for download: unquantized QAT checkpoints, GPT-Generated Unified Format (GGUF), mobile-optimized, and Compressed Tensors. According to Google, these models “maintain similar quality to bfloat16, while dramatically reducing the memory requirements for loading the model.”

After downloading the Gemma 4 QAT model weights, users can run the checkpoints on their phones, laptops or desktops. You can find the mobile and desktop models on Hugging Face as well as in LM Studio.
