Current large vision-language models (LVLMs) have achieved remarkable progress, yet significant uncertainty remains about their ability to accurately apprehend visual details, that is, to perform detailed captioning.
Existing VQA hallucination benchmarks fall short of precisely evaluating hallucinations within detailed captions. To address this, we employ concept matching and coverage as tools for assessing hallucinations in detailed captions. Currently, CCEval is primarily geared toward evaluating object-existence hallucination.
Please check out our [evaluation code]. HallE-Switch controls hallucination/imagination through a single continuous parameter.
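To illustrate the idea of a single continuous control, here is a minimal, hypothetical sketch (not the actual HallE-Switch implementation): a scalar `eps` linearly blends between a "grounded" and an "imaginative" logit vector, so `eps = 0` reproduces grounded decoding and larger values admit more imagined content. All names and the blending form are assumptions for illustration only; see the paper for the real intervention.

```python
def switch_logits(grounded, imaginative, eps):
    """Hypothetical sketch: blend two per-token logit vectors with a
    continuous control eps in [0, 1].
    eps = 0 -> fully grounded; eps = 1 -> fully imaginative.
    (Illustration only; the real HallE-Switch mechanism differs.)"""
    eps = max(0.0, min(1.0, eps))  # clamp the control to [0, 1]
    return [(1.0 - eps) * g + eps * i for g, i in zip(grounded, imaginative)]

# At eps = 0 the output equals the grounded logits exactly.
print(switch_logits([2.0, 0.5], [0.0, 3.0], 0.0))  # [2.0, 0.5]
```

A continuous (rather than binary) control lets the same model be dialed anywhere between strictly grounded description and freer, more speculative captioning.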
@misc{zhai2023halleswitch,
title={HallE-Switch: Rethinking and Controlling Object Existence Hallucinations in Large Vision Language Models for Detailed Caption},
author={Bohan Zhai and Shijia Yang and Xiangchen Zhao and Chenfeng Xu and Sheng Shen and Dongdi Zhao and Kurt Keutzer and Manling Li and Tan Yan and Xiangjun Fan},
year={2023},
eprint={2310.01779},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This website is adapted from LLaVA, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Usage and License Notices: The data, code, and checkpoints are intended and licensed for research use only. They are also restricted to uses that comply with the license agreements of CLIP, LLaMA, Vicuna, and GPT-4. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained on the dataset should not be used outside of research purposes.
Related Links: [REACT] [GLIGEN] [Computer Vision in the Wild (CVinW)] [Instruction Tuning with GPT-4]