Personalized Daily Arxiv Papers 11/28/2024

Total relevant papers: 2

Paper selection prompt and criteria at the bottom

Table of contents with paper titles:

$H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs Authors: Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Zachary Yahn, Ling Liu
Curriculum Demonstration Selection for In-Context Learning Authors: Duc Anh Vu, Nguyen Tran Cong Duy, Xiaobao Wu, Hoang Minh Nhat, Du Mingzhe, Nguyen Thanh Thong, Anh Tuan Luu

0. $H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs

ArXiv ID: 2411.17792 Authors: Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Zachary Yahn, Ling Liu

Abstract: arXiv:2411.17792v1 Announce Type: new Abstract: Alignment of pretrained LLMs using instruction-based datasets is critical for creating fine-tuned models that reflect human preference. A growing number of alignment-based fine-tuning algorithms and benchmarks emerged recently, fueling the efforts on effective alignments of pre-trained LLMs to ensure helpful, harmless, and honest answers from both open-source and closed-source LLMs. This paper tackles this problem by developing an alignment fusion approach, coined as $H^3$Fusion, with three unique characteristics. First, $H^3$Fusion ensembles multiple individually aligned LLMs to create a final fine-tuned alignment model with enhanced capabilities beyond those of individual models, delivering robust alignment through promoting helpful, harmless, honest fusion. Second, $H^3$Fusion leverages the mixture-of-experts (MoE) methodology in two steps. We first freeze the multi-head attention weights of each individual model while tuning the FFN layer during alignment fusion. Then we merge the aligned model weights with an expert router according to the type of input instruction and dynamically select a subset of experts that are best suited for producing the output response. Finally, we boost the performance of the resulting $H^3$3Fusion model by introducing gating loss and regularization terms. The former penalizes the selection errors of the expert-router, and the latter mediates the expert weights drifting during fine-tuning and dynamically adjusts the fusion behavior of the resulting model by canalizing the activations on the experts. Extensive evaluations on three benchmark datasets show that $H^3$3Fusion is more helpful, less harmful, and more honest from two aspects: it outperforms each individually aligned model by $11.37%$, and it provides stronger robustness compared to the state-of-the-art LLM ensemble approaches by $13.77%$. Code is available at github.com/sftekin/h3fusion.

Comment: Matches criterion 1 as it discusses alignment of pretrained LLMs using instruction-based datasets for fine-tuning. Relevance: 5 Novelty: 4

1. Curriculum Demonstration Selection for In-Context Learning

ArXiv ID: 2411.18126 Authors: Duc Anh Vu, Nguyen Tran Cong Duy, Xiaobao Wu, Hoang Minh Nhat, Du Mingzhe, Nguyen Thanh Thong, Anh Tuan Luu

Abstract: arXiv:2411.18126v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown strong in-context learning (ICL) abilities with a few demonstrations. However, one critical challenge is how to select demonstrations to elicit the full potential of LLMs. In this paper, we propose Curriculum Demonstration Selection (CDS), a novel demonstration selection method for ICL. Instead of merely using similarity, CDS additionally partitions samples by their complexity measurements. Following curriculum learning, CDS then selects demonstrations from easy to difficult. Thus the selected demonstrations cover a wide range of difficulty levels, enabling LLMs to learn from varied complexities within the training set. Experiments demonstrate that our CDS consistently outperforms baseline methods, achieving notable improvements across nine LLMs on three benchmarks. Moreover, CDS proves especially effective in enhancing LLM performance in solving challenging problems.

Comment: Not a match to the specified criteria. Relevance: 3 Novelty: 3

Paper selection prompt

New methodological improvements to RLHF or instruction-following which are specific fine-tuning steps that are taken to make language models better at following user instructions across a range of tasks.
- Relevant: papers that discuss specific methods like RLHF, or instruction-tuning datasets, improving these methods, or analyzing them. Usually these papers will explicitly mention RLHF, instruction-following or instruction-tuning.
- Not relevant: papers about adaptation to some task. Simply following instructions or inputs are not sufficient.
Shows new powerful test set contamination or membership inference methods for language models. Test set contamination is the phenomenon where a language model observes a benchmark dataset during pretraining.
- Relevant: test statistics that can detect contamination of benchmarks in language models. statistics that can provide guarantees are more interesting. membership inference methods that are general enough to apply to language models are also relevant.
- Not relevant: any papers that do not consider language models, or that do not consider test set contamination.
Shows a significant advance in the performance of diffusion language models.
- Relevant: papers that study language models that are also diffusion models. Continuous diffusions are even more relevant, while discrete diffusions are less so.
- Not relevant: papers about image diffusions like DALL-E or Stable Diffusion, or papers that do not explicitly mention language models or applications to text.
Describes new paradigms to evaluating open-ended text generation. Evaluating the outputs of language models is hard, especially in open-ended settings like for chatbots.
- Relevant: papers that fundamentally rethink language model evaluation -- especially by accounting for subjectivity or using adversaries.
- Not relevant: specific evaluations for specific tasks, identifying new properties or flaws of language models, or simply collecting new data.
Conducts surveys or provides data into real-world usage and safety properties of language models.
- Relevant: papers that create new datasets or surveys on real-world usage of language models.
- Not relevant: papers that apply language models to new real-world tasks.
Studies 'scaling laws' in the context of neural networks. Scaling laws refer to the very clear power-law relationship between the size or computational power used to train a model and the performance of that model.
- Relevant: theoretical or conceptual explanation behind scaling laws for language models.
- Not relevant: papers that have experiments at different model scales (but do not explicitly fit a scaling law) or papers that mention scaling laws, but the scaling laws are not the central subject of the paper

In suggesting papers to your friend, remember that he enjoys papers on statistical machine learning, and generative modeling in natural language processing. Your friend also likes learning about surprising empirical results in language models, as well as clever statistical tricks. He does not want to read papers that are about primarily applications of methods to specific domains.