Collections
Collections including paper arxiv:2507.11097
- GuardReasoner: Towards Reasoning-based LLM Safeguards
  Paper • 2501.18492 • Published • 88
- Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
  Paper • 2412.19512 • Published • 9
- Course-Correction: Safety Alignment Using Synthetic Preferences
  Paper • 2407.16637 • Published • 26
- Refusal in Language Models Is Mediated by a Single Direction
  Paper • 2406.11717 • Published • 4

- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 77
- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 28
- LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
  Paper • 2405.00732 • Published • 121
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 31

- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
  Paper • 2502.05163 • Published • 23
- CRANE: Reasoning with constrained LLM generation
  Paper • 2502.09061 • Published • 21
- Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
  Paper • 2502.15799 • Published • 7
- AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
  Paper • 2502.16776 • Published • 6

- Human-like Episodic Memory for Infinite Context LLMs
  Paper • 2407.09450 • Published • 62
- MUSCLE: A Model Update Strategy for Compatible LLM Evolution
  Paper • 2407.09435 • Published • 23
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
  Paper • 2407.09121 • Published • 6
- ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
  Paper • 2407.14482 • Published • 26