Ingegneria Sismica

RC-CoSA: Controllable secure alignment architecture for large language models based on risk-constrained inference search

Author(s): Linghao Meng1
1School of Electrical Automation and Information Engineering, Tianjin University, Tianjin, China, 300072
Meng, Linghao. “RC-CoSA: Controllable secure alignment architecture for large language models based on risk-constrained inference search.” Ingegneria Sismica Volume 43 Issue 3: 1-20, doi:10.65102/is20261283.

Abstract

Safety alignment of large language models (LLMs) currently follows a largely static paradigm: a single model is trained against predefined general principles. This approach lacks the flexibility to meet the diverse safety needs of different cultural backgrounds, regional norms, and application scenarios, while re-aligning a model for each specific requirement incurs high computational cost and engineering overhead. We therefore propose a risk-constrained controllable safety alignment architecture (RC-CoSA), which adapts a model to diverse and intertwined safety requirements at inference time without updating the underlying model parameters. In contrast to existing methods that rely on single-sample autoregressive generation, RC-CoSA improves the robustness and controllability of response generation under complex safety configurations through compliance-first best-of-N candidate screening, structured safety completion for partial-compliance scenarios, and a decoupled multi-stage reasoning-evaluation process. Experiments show that the benefit of RC-CoSA depends on the base model. On DeepSeek, the method significantly reduces the Helpful + Unsafe ratio from 11.0% to 0.5%, raises the CoSA-Score to 0.596, and improves overall information validity. On GPT-4o, RC-CoSA raises the CoSA-Score from 0.288 to 0.349 and the Helpful + Safe ratio from 50.8% to 61.9%, but its reduction of violation risk is relatively limited. On Llama3.1-8B-INST, inference-time enhancement improves overall control performance, but how stably it suppresses violation risk still depends on the characteristics of the base model.
These results show that RC-CoSA, as an inference-time execution control framework, effectively improves the overall controllability of a model under complex safety configurations, but the size of the benefit still depends on the base model's original safety boundary, generation distribution, and instruction-following ability.
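To make the idea of compliance-first best-of-N screening concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each of the N sampled candidates has already been given a risk score and a helpfulness score by some external evaluator. The names `Candidate`, `select_compliance_first`, `risk_budget`, and `FALLBACK` are hypothetical and do not come from the paper, and the fallback-to-refusal behavior is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical refusal text returned when no candidate satisfies the constraint.
FALLBACK = "I can't help with that under the current safety configuration."


@dataclass(frozen=True)
class Candidate:
    text: str
    risk: float         # assumed score in [0, 1]; higher means more likely to violate the config
    helpfulness: float  # assumed score in [0, 1]; higher means more useful


def select_compliance_first(candidates: Sequence[Candidate], risk_budget: float) -> str:
    """Filter by the risk constraint first, then maximize helpfulness.

    Helpfulness is only compared among candidates that already meet the
    risk budget, so a very helpful but non-compliant answer can never win.
    """
    compliant = [c for c in candidates if c.risk <= risk_budget]
    if not compliant:
        return FALLBACK
    return max(compliant, key=lambda c: c.helpfulness).text


if __name__ == "__main__":
    samples = [
        Candidate("detailed but non-compliant answer", risk=0.9, helpfulness=0.95),
        Candidate("compliant and fairly helpful answer", risk=0.1, helpfulness=0.6),
        Candidate("compliant but terse answer", risk=0.05, helpfulness=0.4),
    ]
    print(select_compliance_first(samples, risk_budget=0.2))
```

Applying the risk constraint as a hard filter, rather than folding risk into a single weighted score, is one straightforward way to read "compliance-first". The paper may instead combine the two signals or complete partially compliant candidates before selection.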

Keywords
Large Language Models, Risk-Constrained Controllable Safety Alignment, Best-of-N Optimization, Inference-Time Adaptation
