Abstract: To address the computational cost challenges of the Vision Transformer (ViT) model, an adaptive token pruning strategy based on Closeness-Centrality is proposed. The ...
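The truncated abstract names the technique but not its construction, so the following is only a minimal sketch of the general idea: score tokens by closeness centrality over an attention-derived graph and keep the highest-scoring ones. The graph construction (averaged pairwise attention turned into edge distances), the keep_ratio parameter, and the function name prune_tokens_by_closeness are all assumptions for illustration, not the paper's actual method.

```python
import torch
import networkx as nx

def prune_tokens_by_closeness(tokens, attn, keep_ratio=0.7):
    """Hypothetical sketch: keep ViT tokens with high closeness centrality.

    tokens: (N, D) token embeddings for one image.
    attn:   (N, N) attention weights, e.g. averaged over heads (assumption).
    """
    n = tokens.shape[0]
    g = nx.Graph()
    g.add_nodes_from(range(n))
    # Assumption: stronger mutual attention means a "closer" pair of tokens,
    # so the edge distance is the inverse of the symmetrized attention weight.
    for i in range(n):
        for j in range(i + 1, n):
            w = float(attn[i, j] + attn[j, i]) / 2.0
            if w > 0:
                g.add_edge(i, j, distance=1.0 / (w + 1e-6))
    # Closeness centrality over the distance-weighted token graph.
    cc = nx.closeness_centrality(g, distance="distance")
    scores = torch.tensor([cc[i] for i in range(n)])
    # Adaptive pruning reduced here to a fixed keep ratio for simplicity.
    keep = max(1, int(keep_ratio * n))
    idx = torch.topk(scores, keep).indices.sort().values
    return tokens[idx], idx
```

In this sketch the pruning decision is per image and per layer; a real implementation would batch the computation and avoid the explicit networkx graph for speed.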
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language (2022): Audio, Continuous
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (2021): ...