Feng Wang

I am a second-year PhD student at Johns Hopkins University, where I am fortunate to be advised by Bloomberg Distinguished Professor Alan L. Yuille.

Before that, I was an M.S. student at Tsinghua University, where I worked under the guidance of Prof. Hairong Lv. I also had a wonderful time interning at Microsoft Research and UIUC.

My current research interest lies at the intersection of computer vision and natural language processing, in particular visual architectures and vision-language understanding.

I'm looking for research intern positions for Spring and/or Summer 2025. Feel free to contact me if my background aligns well with your directions.

wangf3014 [at] gmail [dot] com / GitHub / Google Scholar

News

[2024.07] Our SCLIP paper was accepted to ECCV 2024. It's an elegant way to extract dense CLIP features without training!

[2023.08] Came to JHU and started my PhD life!

[2023.01] One paper accepted to ICLR 2023!

[2022.07] Joined the MSRA NLC group for an internship!

[2022.07] One paper accepted to ECCV 2022!

...

Publications


SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang, Jieru Mei, Alan Yuille
ECCV 2024 | arXiv | code
We present a zero-shot semantic segmentation model called SCLIP (Segmentation-adapted CLIP model), which leverages our newly proposed correlative self-attention mechanism and allows training-free adaptation to semantic segmentation tasks with CLIP.


Causal Image Modeling for Efficient Visual Understanding
Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
Preprint, under review | arXiv | code
We present a comprehensive analysis of causal image modeling and introduce the Adventurer series of models, in which we treat images as sequences of patch tokens and employ uni-directional language models to learn visual representations. This modeling paradigm allows us to process images in a recurrent formulation with linear complexity relative to the sequence length, which effectively addresses the memory and computation explosion caused by high-resolution and fine-grained images.


Mamba-Reg: Vision Mamba Also Needs Registers
Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie
Preprint, under review | arXiv | code
We find that artifacts like those observed in Vision Transformers are also present in the feature maps of Vision Mamba. These artifacts, which correspond to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba.


CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation
Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen
ECCV, 2022 | arXiv | code
We propose a dense (pixel-wise) self-supervised contrastive learning method called CP2, which facilitates both image- and pixel-level representations. We obtain 78.6% mIoU with a ResNet-50 and 79.5% with a ViT-S by finetuning CP2 pretrained models on PASCAL VOC.


Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander G. Schwing, Heng Ji
ICLR, 2023 | arXiv
We propose a novel vision-language model called Decomposed Feature Prompting (DeFo for short), which decouples the language inputs from the classes to be inferred and learns to extract detailed visual features with textual prompts.


Dual Prompt Tuning for Domain-Aware Federated Learning
Guoyizhe Wei, Feng Wang, Anshul Shah, Rama Chellappa
Preprint, under review | arXiv
We address the challenges of domain shift in vision-language inference by leveraging the technique of prompt learning for both the image and text encoders in CLIP, which facilitates domain adaptation over decentralized and non-iid data.


Boost Neural Networks by Checkpoints
Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv
NeurIPS, 2021 | arXiv
We propose a novel checkpoint ensemble method called Checkpoint Boosted Neural Networks (CBNN), in which a boosting scheme is used to accelerate model convergence and maximize checkpoint diversity. The performance gains are supported by a theoretical proof.


Gradient Boosting Forest: a Two-Stage Ensemble Method Enabling Federated Learning of GBDTs
Feng Wang, Jinxiang Ou, Hairong Lv
ICONIP, 2021 | paper
We propose a novel GBDT model that extends each single decision tree of GBDT to an ensemble of trees trained on different data splits. Our method allows decentralized training and achieves more robust performance.

Last update: Dec. 2023