On May 28, 2025, at the invitation of Jianyu Niu, Research Assistant Professor at the Research Institute of Trustworthy Autonomous Systems (RITAS) at Southern University of Science and Technology, Xiang Zheng, a postdoctoral researcher from City University of Hong Kong, gave an academic talk titled "Reinforcement Learning-Based Adversarial Evaluation and Defense Enhancement for Large Language Models (LLMs)" in Room 443B of the South Tower of the College of Engineering.
Figure 1 Xiang Zheng presents an academic report
LLMs are now widely applied in customer service, law, healthcare, and other fields, owing to their capabilities in understanding, reasoning, programming, planning, and decision-making. As their performance improves, ensuring LLM security has become a critical concern. Against this background, Xiang Zheng approached the topic from the perspective of security assessment and introduced the technical frameworks, lightweight tools, and defense solutions for LLM security assessment to the faculty and students present.
Reinforcement learning-based adversarial evaluation and defense enhancement for LLMs involves three steps: probing the fault-tolerance boundary of an LLM under malicious input through adversarial testing, such as modifying prompts and injecting noise; using reinforcement learning algorithms to optimize attack strategies and discover vulnerabilities; and designing defense mechanisms to improve model robustness. Xiang Zheng introduced a series of recent related work, including CALM, a curiosity-driven black-box auditing framework for LLMs; BlueSuffix, a black-box defense mechanism that helps vision-language models (VLMs) resist jailbreak attacks; and ROSE, a multi-dimensional systemic security evaluation that is closer to real-world scenarios.
Figure 2 Technical Architecture of BlueSuffix
Afterwards, faculty and students discussed the core topics with the speaker in light of their own research directions. Xiang Zheng answered their questions in detail, covering technical implementation, experimental verification, and industry application. The event concluded successfully amid lively academic discussion.