Vector Quantization in Data Compression Using Python

Tether is shipping TurboQuant KV-cache quantization with Vulkan support into its QVAC SDK

Tether successfully integrated Google’s TurboQuant into the inference engine of its local AI framework, QVAC. It is the ...

IEEE

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

Abstract: Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Among many techniques, data ...

Running an LLM at Long Context Means Your Model's Memory Fills Up Before the Conversation Gets Interesting.

The technical name for this bottleneck is the KV cache, a data structure that stores the model's working memory for every token it has processed. At 32,000 tokens of context in an 8-billion-parameter ...

IEEE

PQTable: Fast Exact Asymmetric Distance Neighbor Search for Product Quantization Using Hash Tables

Abstract: We propose the product quantization table (PQTable), a product quantization-based hash table that is fast and requires neither parameter tuning nor training steps. The PQTable produces ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results