Start Date

5-12-2025 12:00 PM

End Date

5-12-2025 1:00 PM

Description

We present a parameter-efficient FiLM-based framework for visual saliency ranking that operates on a frozen ResNet backbone. Rather than fine-tuning the convolutional weights, the framework uses lightweight conditioning embeddings to modulate four intermediate backbone stages through Feature-wise Linear Modulation (FiLM), enabling controlled scaling and shifting of both channel-wise and spatial feature responses. The resulting multi-scale FiLM-modulated features are fused to produce instance-level saliency rank predictions. Our study shows that conditioning-based modulation can effectively reshape backbone representations while preserving parameter efficiency, offering a flexible alternative to full backbone tuning for fine-grained saliency reasoning.
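As a rough illustration of the modulation step described above, the following pure-Python sketch applies FiLM to a single feature map stored as nested lists indexed [channel][row][col]. The abstract does not say how the conditioning embeddings are mapped to FiLM parameters or how channel and spatial modulation are combined, so these are assumptions: `film_params` uses a hypothetical linear projection with a `1 +` offset on gamma (so all-zero weights leave the frozen features unchanged), and channel-wise modulation is applied before spatial modulation. All function names are illustrative and are not taken from the authors' implementation.

```python
from typing import List, Tuple

# Feature map indexed as x[channel][row][col] (a stand-in for a C x H x W tensor).
FeatureMap = List[List[List[float]]]


def film_params(
    embedding: List[float],
    w_gamma: List[List[float]],
    w_beta: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical mapping from a conditioning embedding to per-channel (gamma, beta).

    Assumption: gamma = 1 + W_gamma @ e and beta = W_beta @ e, so all-zero
    weights reduce FiLM to the identity on the frozen backbone features.
    """
    gamma = [1.0 + sum(w * e for w, e in zip(row, embedding)) for row in w_gamma]
    beta = [sum(w * e for w, e in zip(row, embedding)) for row in w_beta]
    return gamma, beta


def film_channel(x: FeatureMap, gamma: List[float], beta: List[float]) -> FeatureMap:
    """Channel-wise FiLM: every value v in channel c becomes gamma[c] * v + beta[c]."""
    return [
        [[gamma[c] * v + beta[c] for v in row] for row in channel]
        for c, channel in enumerate(x)
    ]


def film_spatial(
    x: FeatureMap, gamma: List[List[float]], beta: List[List[float]]
) -> FeatureMap:
    """Spatial FiLM: one (gamma, beta) pair per (row, col), shared across all channels."""
    return [
        [
            [gamma[i][j] * v + beta[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for channel in x
    ]


def modulate_stage(
    x: FeatureMap,
    channel_params: Tuple[List[float], List[float]],
    spatial_params: Tuple[List[List[float]], List[List[float]]],
) -> FeatureMap:
    """Assumed ordering: channel-wise modulation first, then spatial modulation."""
    return film_spatial(film_channel(x, *channel_params), *spatial_params)


if __name__ == "__main__":
    x = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels on a 1 x 2 spatial grid
    print(film_channel(x, [2.0, 0.5], [1.0, -1.0]))
    # [[[3.0, 5.0]], [[0.5, 1.0]]]
```

In the framework described above, a step like this would presumably be repeated at each of the four intermediate stages of the frozen backbone before the multi-scale features are fused; the `1 +` parameterization is one common way to start training from the unmodified backbone, not a detail confirmed by the abstract.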

Dec 5th, 12:00 PM to Dec 5th, 1:00 PM

Exploring Conditioning Strategies for FiLM-Based Feature Modulation in Saliency Ranking