News
[2025.05] Our Patchification Scaling Laws paper has been accepted by ICML 2025. We find that a pixel is worth a token!
[2025.02] Our Adventurer and Mamba-Reg papers have been accepted by CVPR 2025!
[2024.07] Our SCLIP paper has been accepted by ECCV 2024. It's an elegant way to extract dense CLIP features without training!
[2023.08] Started my PhD at JHU!
[2023.01] One paper accepted by ICLR 2023!
[2022.07] Joined the MSRA NLC group as a research intern!
[2022.07] One paper accepted by ECCV 2022!
...
|
|
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Feng Wang,
Yaodong Yu,
Wei Shao,
Yuyin Zhou,
Alan Yuille,
Cihang Xie
ICML 2025 | arXiv | code
We introduce Patchification Scaling Laws, showing that the patch size of Vision Transformers
and recurrent models can be scaled down to 1x1 with consistently improved predictive performance.
We challenge the notion that "an image is worth 256 tokens": in fact, it is worth 50,176 tokens and even more.
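As a quick back-of-the-envelope illustration of these token counts (assuming a standard 224x224 input, which the blurb above does not state explicitly):

```python
# Token count of a ViT-style patch embedding: an H x W image split into
# non-overlapping p x p patches yields (H // p) * (W // p) tokens.
def num_tokens(h: int, w: int, p: int) -> int:
    return (h // p) * (w // p)

for p in (16, 14, 8, 4, 2, 1):
    print(f"patch {p:>2}x{p:<2} -> {num_tokens(224, 224, p):>6} tokens")
# patch 14x14 ->    256 tokens ("an image is worth 256 tokens")
# patch  1x1  ->  50176 tokens (a pixel is worth a token)
```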
|
|
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang,
Jieru Mei,
Alan Yuille
ECCV 2024 | arXiv | code
We present a zero-shot semantic segmentation model called SCLIP (Segmentation-adapted CLIP model),
which leverages our newly proposed correlative self-attention mechanism and allows training-free
adaptation to semantic segmentation tasks with CLIP.
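A minimal single-head sketch of the correlative self-attention idea is given below; the tensor shapes, scaling factor, and the drop-in usage note are illustrative assumptions, and the exact multi-head formulation is in the paper and code.

```python
import torch
import torch.nn.functional as F

def correlative_self_attention(q, k, v, scale):
    """Sketch of correlative self-attention for dense inference (single head).

    Instead of the usual softmax(q @ k^T) @ v, attention scores are built from
    query-query and key-key correlations, so each token attends to tokens that
    are similar to itself. Shapes: q, k, v are (B, N, d)."""
    attn = (F.softmax(q @ q.transpose(-2, -1) * scale, dim=-1)
            + F.softmax(k @ k.transpose(-2, -1) * scale, dim=-1))
    return attn @ v

# Illustrative usage: swap this in for the last attention of CLIP's image
# encoder at inference time; no extra training is involved.
B, N, d = 2, 196, 64
q, k, v = (torch.randn(B, N, d) for _ in range(3))
out = correlative_self_attention(q, k, v, scale=d ** -0.5)   # (B, N, d)
```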
|
|
Causal Image Modeling for Efficient Visual Understanding
Feng Wang,
Timing Yang,
Yaodong Yu,
Sucheng Ren,
Guoyizhe Wei,
Angtian Wang,
Wei Shao,
Yuyin Zhou,
Alan Yuille,
Cihang Xie
CVPR 2025 | arXiv | code
We present a comprehensive analysis of causal image modeling and introduce the
Adventurer series of models, which treat images as sequences of patch tokens
and employ uni-directional language models to learn visual representations.
This modeling paradigm allows us to process images in a recurrent formulation with
linear complexity in sequence length, which effectively addresses
the memory and computation explosion posed by high-resolution and fine-grained images.
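A toy sketch of this causal image modeling paradigm is shown below; a GRU stands in for the actual Mamba-style Adventurer blocks, and the layer sizes are made-up defaults. The point is the uni-directional, linear-cost scan over the patch-token sequence, not the specific recurrent block.

```python
import torch
import torch.nn as nn

class CausalImageModelSketch(nn.Module):
    """Images become a 1-D sequence of patch tokens (raster order) processed
    by a uni-directional recurrent model with cost linear in the token count."""

    def __init__(self, patch=16, dim=256, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.rnn = nn.GRU(dim, dim, num_layers=2, batch_first=True)  # stand-in block
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                            # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                 # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim)
        hidden, _ = self.rnn(tokens)                 # causal scan, O(N)
        return self.head(hidden[:, -1])              # predict from the last token

logits = CausalImageModelSketch()(torch.randn(2, 3, 224, 224))  # (2, 1000)
```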
|
|
Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang,
Jiahao Wang,
Sucheng Ren,
Guoyizhe Wei,
Jieru Mei,
Wei Shao,
Yuyin Zhou,
Alan Yuille,
Cihang Xie
CVPR 2025 | arXiv | code
Similar to Vision Transformers, Vision Mamba exhibits artifacts in its feature maps: high-norm tokens that emerge in low-information background areas of images, and they appear much more severe in Vision Mamba.
We address this by introducing register tokens into the input sequence.
|
|
CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation
Feng Wang,
Huiyu Wang,
Chen Wei,
Alan Yuille,
Wei Shen
ECCV 2022 | arXiv | code
We propose a dense (pixel-wise) self-supervised contrastive learning method called CP2,
which learns both image- and pixel-level representations.
By finetuning CP2-pretrained models on PASCAL VOC, we obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S.
|
|
Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang,
Manling Li,
Xudong Lin,
Hairong Lv,
Alexander G. Schwing,
Heng Ji
ICLR 2023 | arXiv
We propose a novel vision-language model called Decomposed Feature Prompting (DeFo for short),
which decouples the language inputs from the classes to be inferred
and learns to extract detailed visual features with textual prompts.
|
|
Dual Prompt Tuning for Domain-Aware Federated Learning
Guoyizhe Wei, Feng Wang, Anshul Shah, Rama Chellappa
Preprint, under review | arXiv
We address the challenge of domain shift in vision-language inference by applying
prompt learning to both the image and text encoders of CLIP,
which facilitates domain adaptation over decentralized, non-IID data.
|
|
Boost Neural Networks by Checkpoints
Feng Wang,
Guoyizhe Wei,
Qiao Liu,
Jinxiang Ou,
Xian Wei,
Hairong Lv
NeurIPS 2021 | arXiv
We propose a novel checkpoint ensemble method called Checkpoint Boosted Neural Networks (CBNN),
in which a boosting scheme accelerates model convergence and maximizes checkpoint diversity.
Its effectiveness is also supported by a theoretical proof.
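The exact boosting weights and the theoretical analysis are in the paper; the sketch below only illustrates the general idea of up-weighting currently misclassified samples between checkpoints (to encourage checkpoint diversity) and averaging the saved checkpoints at test time. The loader that also yields sample indices and the uniform ensemble weights are simplifying assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def boosted_checkpoint_training(model, loader, optimizer,
                                num_checkpoints=5, epochs_per_ckpt=2, boost=2.0):
    """Generic checkpoint-boosting sketch (not the exact CBNN algorithm):
    samples the current model gets wrong are up-weighted before the next
    checkpoint, pushing later checkpoints away from earlier ones."""
    checkpoints = []
    weights = torch.ones(len(loader.dataset))          # per-sample loss weights
    for _ in range(num_checkpoints):
        for _ in range(epochs_per_ckpt):
            for x, y, idx in loader:                    # loader also yields sample indices
                loss = (F.cross_entropy(model(x), y, reduction="none")
                        * weights[idx]).mean()
                optimizer.zero_grad(); loss.backward(); optimizer.step()
        checkpoints.append(copy.deepcopy(model).eval())
        with torch.no_grad():                           # up-weight misclassified samples
            for x, y, idx in loader:
                wrong = model(x).argmax(dim=1) != y
                weights[idx] = torch.where(wrong, weights[idx] * boost, weights[idx])
    return checkpoints

def ensemble_predict(checkpoints, x):
    """Average the checkpoints' softmax outputs (uniform weights for simplicity)."""
    with torch.no_grad():
        return torch.stack([F.softmax(m(x), dim=1) for m in checkpoints]).mean(0)
```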
|
Last update: Apr. 2025 | Template