Publications
* indicates equal contribution
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai*, Yong Liu*, Yifei Han, Haoji Zhang, Yansong Tang
arXiv Preprint, 2024
[Paper]
[Code]
We propose a training-free method that enhances CLIP's segmentation performance through self-calibration without introducing new parameters or relying on additional backbones.
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu*, Sule Bai*, Guanbin Li, Yitong Wang, Yansong Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Paper]
[Code]
We propose an open-vocabulary segmentation (OVS) method that calibrates the in-vocabulary, domain-biased embedding space with the generalized contextual prior of CLIP.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang*, Sule Bai*, Guangyi Chen, Lei Chen, Jiwen Lu, Junle Wang, Yansong Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Paper]
[Code]
We investigate a new problem called narrative action evaluation (NAE) and propose a prompt-guided multimodal interaction framework for it.