Towards Unified and Effective Domain Generalization

1 Multimedia Lab, The Chinese University of Hong Kong
2 OpenGVLab, Shanghai AI Laboratory 3 UC Berkeley 4 Tencent AI Lab 5 Tsinghua University
* Equal Contribution · Corresponding Authors · Project Lead

A comparison between existing methods and UniDG (Ours) on the accuracy averaged across the PACS, VLCS, OfficeHome, and TerraInc datasets.

UniDG brings an average improvement of +5.4% to 12 backbones ranging from 1.89M to 303M parameters.

Abstract

We propose UniDG, a novel and Unified framework for Domain Generalization that can significantly enhance the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to fine-tune models during the inference stage, which saves the cost of iterative training. Specifically, we encourage models to learn the distribution of the test data in an unsupervised manner and impose a penalty on the update step of the model parameters. The penalty term effectively mitigates catastrophic forgetting, since we would like to maximally preserve the valuable knowledge in the original model. Empirically, across 12 visual backbones, including CNN-, MLP-, and Transformer-based models ranging from 1.89M to 303M parameters, UniDG shows an average accuracy improvement of +5.4% on DomainBed.


Method

In this paper, we propose a novel method, named Marginal Generalization, to update the encoder for Test-Time Adaptation (TTA).

Intuitively, Marginal Generalization lets the encoder learn representations of the target data within a certain distance of the representations obtained by the initial model. This design brings two benefits: 1) while the encoder f′(·) adapts to the novel data, it always refers to the original model f(·) and keeps its representations within a distance σ of the original ones, so the pretrained source knowledge is preserved and catastrophic forgetting is avoided; 2) as we keep updating the encoder via entropy minimization on the test data, it cooperates better with the classifier and yields more discriminative features on the target domain. In addition, the features extracted by the updated encoder can be utilized by multiple TTA mechanisms. For example, by naturally combining Marginal Generalization with a memory bank (Wu et al., 2018), we propose the Differentiable Memory Bank, which outperforms traditional memory-bank methods because it filters and stores differentiable features.
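To make this concrete, below is a minimal PyTorch sketch of one test-time adaptation step under Marginal Generalization. The names (marginal_tta_step, sigma, lam) are ours for illustration, and the hinge-style penalty is one plausible reading of the margin constraint; the exact loss in the paper may differ.

import torch
import torch.nn.functional as F

def marginal_tta_step(encoder, frozen_encoder, classifier, x, optimizer,
                      sigma=1.0, lam=1.0):
    # encoder:        adapting copy f'(.) (trainable)
    # frozen_encoder: original source encoder f(.) (kept frozen)
    # classifier:     classification head g(.)
    feats = encoder(x)                      # f'(x), differentiable
    with torch.no_grad():
        anchor = frozen_encoder(x)          # f(x), source reference

    # Entropy minimization on unlabeled test data.
    probs = classifier(feats).softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

    # Marginal penalty: only penalize representations that drift
    # farther than sigma from the source features, preserving
    # pretrained knowledge and mitigating catastrophic forgetting.
    drift = (feats - anchor).pow(2).sum(dim=1).sqrt()
    penalty = F.relu(drift - sigma).mean()

    loss = entropy + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Returning the (non-detached) features would allow a
    # differentiable memory bank to filter and store them
    # with gradients attached, as described above.
    return feats, loss.item()

In practice, frozen_encoder would be a deep copy of the source model with requires_grad_(False), while encoder and the classifier receive small per-batch updates on the unlabeled test stream.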


Experiment

We conduct experiments on the DomainBed benchmark, covering the PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet datasets. We evaluate UniDG over 3 parallel trials with different random seeds and report the mean and standard error of the classification accuracy (%) on the 5 datasets.
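As a reference for how the reported numbers aggregate, a trivial sketch of the mean and standard error over trials follows; the accuracy values here are hypothetical placeholders, not results from the paper.

import statistics

accs = [79.2, 79.8, 79.8]  # hypothetical per-seed accuracies (%)
mean = statistics.mean(accs)
stderr = statistics.stdev(accs) / len(accs) ** 0.5  # sample std / sqrt(n)
print(f"{mean:.1f} +/- {stderr:.1f}")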

UniDG achieves strong performance on domain generalization tasks. Table 1 shows the performance of existing advanced DG approaches with different pre-training methods. Impressively, with only ImageNet pre-training, UniDG outperforms the CAR-FT model with CLIP pre-training by 1.1% in average accuracy (79.6% vs. 78.5%). On TerraIncognita, whose domain shifts are complex, UniDG reaches 62.4% accuracy, outperforming CAR-FT by 0.5%.

UniDG remarkably outperforms all existing test-time methods, including the state-of-the-art TAST (Jang & Chung, 2023). Specifically, as shown in Table 2, we choose ResNet-18 and ResNet-50 as backbones and use average accuracy as the metric to evaluate several test-time methods. UniDG achieves an average accuracy of 67.2% with ResNet-18 on the VLCS, PACS, OfficeHome, and TerraInc datasets.

Accuracy accumulation curves on VLCS. UniDG outperforms the base ERM model by about 5% in accuracy. Note that we randomly select 10 different trial seeds for a more reliable comparison.

BibTeX

If you find our work useful, please cite our paper. The BibTeX entry is provided below:
@article{zhang2023unified,
  title={Towards Unified and Effective Domain Generalization},
  author={Yiyuan Zhang and Kaixiong Gong and Xiaohan Ding and Kaipeng Zhang and Fangrui Lv and Kurt Keutzer and Xiangyu Yue},
  year={2023},
  eprint={2310.10008},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}