https://arxiv.org/abs/2304.01933 (LLM-Adapters) shows that the best-performing adapter-based parameter-efficient fine-tuning (PEFT) method depends on the language model being fine-tuned:
E.g., LoRA is the best adapter for LLaMA-7B, while the Series Adapter (S-adapter) is the best adapter for BLOOM-7.1B.
Why does the best-performing adapter-based PEFT method depend on the language model being fine-tuned?
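For context, here is a minimal NumPy sketch of the two adapter families being compared, applied to a single frozen linear layer. The dimensions, initializations, and function names are illustrative, not taken from the paper; the point is only the structural difference: LoRA adds a trainable low-rank update to the frozen weight, while a series adapter inserts a small bottleneck MLP (with a residual connection) after the layer's output.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hidden size and adapter bottleneck / LoRA rank (illustrative)

W = rng.standard_normal((d, d))   # frozen pretrained weight
x = rng.standard_normal(d)        # a single activation vector

# LoRA: learn a low-rank update B @ A added to the frozen weight.
A = np.zeros((r, d))              # zero-init so the update starts at zero
B = rng.standard_normal((d, r))

def lora_forward(x):
    return W @ x + B @ (A @ x)

# Series adapter: a small bottleneck MLP inserted in series after the
# frozen layer, with a residual connection around it.
W_down = rng.standard_normal((r, d)) * 0.01
W_up = np.zeros((d, r))           # zero-init so the adapter starts as identity

def series_adapter_forward(x):
    h = W @ x                     # frozen layer output
    return h + W_up @ np.maximum(W_down @ h, 0.0)

# With these initializations, both start out equal to the frozen output.
assert np.allclose(lora_forward(x), W @ x)
assert np.allclose(series_adapter_forward(x), W @ x)
```

Since the two insert trainable capacity at different points (in parallel with the weight vs. in series after it), it is plausible that which one wins interacts with the host model's architecture and activation statistics, which is part of what the question is asking about.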
