Abstract: Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter count of VLMs incurs substantial memory and computation overhead, which poses significant ...
Abstract: Structured pruning and quantization are fundamental techniques for reducing the size of deep neural networks (DNNs) and are typically applied independently. Applying these techniques ...