Start Date
5-12-2025 12:00 PM
End Date
5-12-2025 1:00 PM
Description
We present a parameter-efficient FiLM-based framework for visual saliency ranking that operates on a frozen ResNet backbone. Instead of fine-tuning convolutional weights, lightweight conditioning embeddings modulate four intermediate backbone stages through Feature-wise Linear Modulation (FiLM), enabling controlled scaling and shifting of both channel and spatial feature responses. The resulting multi-scale FiLM-modulated features are fused to produce instance-level saliency rank predictions. Our study shows that conditioning-based modulation can effectively reshape backbone representations while preserving parameter efficiency, providing a flexible alternative to full backbone tuning for fine-grained saliency reasoning.
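As a rough illustration of the mechanism described above, the following sketch shows three pieces. It applies channel-wise FiLM (y = gamma * x + beta) to frozen backbone stage outputs. It treats small conditioning embeddings as the only trainable state. It then fuses the modulated stages by pooling each one inside an instance mask, concatenating the per-stage descriptors, and ordering instances by a linear score. Everything here is an assumption made for illustration, not the authors' implementation: the names `film`, `FiLMConditioner`, `resize_mask`, `masked_mean`, and `rank_instances`, the identity-centred parametrisation gamma = 1 + W_g e and beta = W_b e, mask-based pooling, and the linear scoring head are all hypothetical. The spatial modulation the abstract also mentions is not shown. Nested lists stand in for tensors so the sketch runs without a deep-learning framework.

```python
import random
from typing import List, Sequence, Tuple

# Nested lists stand in for tensors: a feature map is C channels of H x W floats.
FeatureMap = List[List[List[float]]]
Mask = List[List[int]]


def film(x: FeatureMap, gamma: Sequence[float], beta: Sequence[float]) -> FeatureMap:
    """Channel-wise FiLM: y[c][i][j] = gamma[c] * x[c][i][j] + beta[c]."""
    if not (len(gamma) == len(beta) == len(x)):
        raise ValueError("need one (gamma, beta) pair per channel")
    return [[[g * v + b for v in row] for row in ch] for ch, g, b in zip(x, gamma, beta)]


class FiLMConditioner:
    """Hypothetical per-stage conditioner: a learnable embedding e projected to
    gamma = 1 + W_g e and beta = W_b e. Zero-initialised projections make the
    modulation the identity, so training starts from the frozen backbone."""

    def __init__(self, embed_dim: int, channels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.embedding = [rng.gauss(0.0, 1.0) for _ in range(embed_dim)]
        self.w_gamma = [[0.0] * embed_dim for _ in range(channels)]
        self.w_beta = [[0.0] * embed_dim for _ in range(channels)]

    def num_params(self) -> int:
        # Trainable state only: the embedding plus the two C x d projections.
        d, c = len(self.embedding), len(self.w_gamma)
        return d + 2 * c * d

    def __call__(self) -> Tuple[List[float], List[float]]:
        e = self.embedding
        gamma = [1.0 + sum(w * v for w, v in zip(row, e)) for row in self.w_gamma]
        beta = [sum(w * v for w, v in zip(row, e)) for row in self.w_beta]
        return gamma, beta


def resize_mask(mask: Mask, h: int, w: int) -> Mask:
    """Nearest-neighbour resize of a binary instance mask to a stage's resolution."""
    big_h, big_w = len(mask), len(mask[0])
    return [[mask[i * big_h // h][j * big_w // w] for j in range(w)] for i in range(h)]


def masked_mean(fmap: FeatureMap, mask: Mask) -> List[float]:
    """Average each channel over the pixels where mask == 1 (one instance)."""
    area = sum(sum(row) for row in mask)
    if area == 0:
        return [0.0] * len(fmap)
    return [
        sum(v * m for row, mrow in zip(ch, mask) for v, m in zip(row, mrow)) / area
        for ch in fmap
    ]


def rank_instances(
    stages: Sequence[FeatureMap],
    conditioners: Sequence[FiLMConditioner],
    masks: Sequence[Mask],
    head: Sequence[float],
) -> List[int]:
    """Return instance indices ordered from most to least salient.

    `stages` are outputs of the frozen backbone (e.g. four ResNet stages);
    `head` is a hypothetical linear scorer over the concatenated descriptors.
    """
    # Modulate each frozen stage once with its own (gamma, beta).
    modulated = []
    for fmap, cond in zip(stages, conditioners):
        gamma, beta = cond()
        modulated.append(film(fmap, gamma, beta))
    scores = []
    for mask in masks:
        descriptor: List[float] = []
        for fmap in modulated:  # multi-scale fusion by concatenation
            small = resize_mask(mask, len(fmap[0]), len(fmap[0][0]))
            descriptor.extend(masked_mean(fmap, small))
        scores.append(sum(w * f for w, f in zip(head, descriptor)))
    return sorted(range(len(masks)), key=lambda k: -scores[k])
```

With zero-initialised projections the modulation is the identity, so optimisation would begin from the unmodified backbone features. The trainable budget per stage in this sketch is d + 2Cd for embedding size d and C channels; for example, d = 8 and C = 2048 gives 32,776 parameters, which is small next to the millions of convolutional weights in a ResNet backbone.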
Exploring Conditioning Strategies for FiLM-Based Feature Modulation in Saliency Ranking