Walid Bousselham

I'm a PhD student at the University of Bonn, advised by Prof. Hilde Kuehne. I'm also participating in the MIT-IBM Watson Sight and Sound Project.

My primary research area is deep learning for multimodal models. In particular, I am interested in zero-shot adaptation of pretrained models through their emergent behaviors.

Prior to this, I completed a Master of Engineering in Applied Mathematics at ENSTA Paris in France and a Master of Science in Statistics and Applied Probability at the National University of Singapore (NUS).

Email  /  Scholar  /  Twitter  /  GitHub


🔥 News

05.2024   I will spend summer 2024 at MIT CSAIL as a visiting scholar, working with Hendrik Strobelt and Angie Boggust.

05.2024   I gave a talk at "Cohere For AI - Community Talks" about our latest work, LeGrad, a collaboration with MIT & IBM Research.

03.2024   Our paper Grounding Everything: Emerging Localization Properties in Vision-Language Transformers was accepted at CVPR 2024!

01.2024   I gave an interview to Computer Vision News magazine, which features our recent paper "Grounding Everything". [Link to the interview]

01.2024   I will be attending the BMVA Symposium on Vision and Language, presenting our recent paper Grounding Everything as both an oral and a poster.


🔬 Featured Research

MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Walid Bousselham, Sofian Chaybouti, Christian Rupprecht, Vittorio Ferrari, Hilde Kuehne
arXiv, 2024
Project Page / Code / arXiv

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne
arXiv, 2024
Project Page / Code / arXiv / Demo

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne
CVPR, 2024
Code / arXiv / Demo

Learning Situation Hyper-Graphs for Video Question Answering
Aisha Urooj, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
CVPR, 2023
Code / arXiv

Efficient Self-Ensemble for Semantic Segmentation
Walid Bousselham, Guillaume Thibault, Lucas Pagano, Archana Machireddy, Joe Gray, Young Hwan Chang, Xubo Song
BMVC, 2022
Code / arXiv / video


🛠️ Open-source Libraries

MaskInversion
A library for generating localized embeddings from CLIP-like models via optimization of explainability maps (idea sketched below).

pip install maskinversion_torch

GitHub / PyPI
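
To give a sense of the idea behind the library, here is a toy PyTorch sketch of the underlying principle. It is not the package's actual API; all names, shapes, and the stand-in explainability map below are illustrative.

import torch

# Toy sketch of the MaskInversion idea (NOT the library's API): optimize a
# learnable query embedding so that the explainability map it induces on
# frozen patch features matches a given target mask.

def explainability_map(query, patch_tokens):
    # Illustrative stand-in: per-patch similarity to the query.
    return torch.sigmoid(patch_tokens @ query)

patch_tokens = torch.randn(196, 512)           # frozen ViT patch features (14x14 grid)
target_mask = (torch.rand(196) > 0.5).float()  # binary mask over the patch grid

query = torch.randn(512, requires_grad=True)   # the localized embedding to learn
optimizer = torch.optim.Adam([query], lr=1e-2)

for _ in range(100):
    optimizer.zero_grad()
    heatmap = explainability_map(query, patch_tokens)
    loss = torch.nn.functional.binary_cross_entropy(heatmap, target_mask)
    loss.backward()
    optimizer.step()

# After optimization, `query` acts as an embedding localized to the masked region.

For the real entry points and a working example, see the GitHub README.
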
LeGrad
An explainability method for Vision Transformers that, given a text prompt, generates a heatmap localizing the parts of the image most important for the model to recognize the prompt (gradient idea sketched below).

pip install legrad_torch

GitHub / PyPI
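
As a rough illustration of the gradient-based idea, here is a simplified conceptual sketch, not the library's implementation; the mean pooling and tensor shapes are assumptions made for the example.

import torch

# Conceptual sketch of a LeGrad-style explainability map (simplified):
# measure how sensitive the image-text similarity is to each patch feature.

patch_tokens = torch.randn(1, 196, 512, requires_grad=True)  # ViT patch features
text_embedding = torch.randn(512)                            # encoded text prompt

# Toy image embedding: mean-pooled patch features (stand-in for the real pooling).
image_embedding = patch_tokens.mean(dim=1).squeeze(0)
similarity = torch.nn.functional.cosine_similarity(image_embedding, text_embedding, dim=0)

# Back-propagate the similarity and read off a per-patch sensitivity heatmap.
similarity.backward()
heatmap = patch_tokens.grad.norm(dim=-1).reshape(14, 14)  # 14x14 patch grid

The actual method aggregates sensitivities over the feature-formation process across layers; see the GitHub README and the demo for real usage.
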
GEM (Grounding Everything Method)
A library for exploring emerging localization properties in Vision-Language Transformers (mechanism sketched below).

pip install gem_torch

GitHub / PyPI
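
To give the flavor of the training-free mechanism, here is a simplified conceptual sketch, not the package's API; the shapes and the single projection are assumptions made for the example.

import torch

# Conceptual sketch of GEM's self-self attention idea (simplified): let
# tokens attend to tokens of the same projection (e.g., value-value
# attention) instead of query-key attention; this groups similar patches
# and sharpens localization without any training.

values = torch.randn(196, 512)  # value projections of ViT patch tokens

attn = torch.softmax(values @ values.T / values.shape[-1] ** 0.5, dim=-1)
grouped = attn @ values  # grouped patch tokens

# Comparing `grouped` tokens with text embeddings yields per-patch
# similarity maps usable for zero-shot open-vocabulary localization.

See the GitHub README for the actual API and supported pretrained models.
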

Design and source code borrowed from Jon Barron's website.