AdvIRL: Reinforcement Learning-Based Adversarial Attacks on 3D NeRF Models

Wentworth Institute of Technology

Abstract


The increasing deployment of AI models in critical applications has exposed them to significant risks from adversarial attacks. While adversarial vulnerabilities in 2D vision models have been extensively studied, the threat landscape for 3D generative models, such as Neural Radiance Fields (NeRF), remains underexplored. This work introduces AdvIRL, a novel framework for crafting adversarial NeRF models using Instant Neural Graphics Primitives (Instant-NGP) and Reinforcement Learning. Unlike prior methods, AdvIRL generates adversarial noise that remains robust under diverse 3D transformations, including rotations and scaling, enabling effective black-box attacks in real-world scenarios. Our approach is validated across a wide range of scenes, from small objects (e.g., bananas) to large environments (e.g., lighthouses). Notably, targeted attacks achieved high-confidence misclassifications, such as labeling a banana as a slug and a truck as a cannon, demonstrating the practical risks posed by adversarial NeRFs. Beyond attacking, AdvIRL-generated adversarial models can serve as adversarial training data to enhance the robustness of vision systems. The implementation of AdvIRL is publicly available at https://github.com/Tommy-Nguyen-cpu/AdvIRL/tree/MultiView-Clean, ensuring reproducibility and facilitating future research.

Diagrams & Figures


Our pipeline comprises three key algorithms: an image segmentation algorithm, a Neural Radiance Field (NeRF) algorithm, and a reinforcement learning algorithm. The image segmentation algorithm runs once, before the reinforcement learning algorithm, to generate background-free images, as illustrated in the figure below.

Figure 1: AdvIRL begins by processing a set of input images to generate segmented images, denoted X_segmented. These segmented images are then used to render the NeRF model, producing rendered images X. The initial parameters P_0 of the NeRF model are extracted, enabling AdvIRL to modify them as part of the pipeline. Using the initial observation space (X), AdvIRL generates an action A_i, which is concatenated with the parameters P_i at timestep i. The updated parameters are subsequently processed by Instant-NGP to generate multi-view shots of the object. These multi-view shots are then classified by our CLIP classifier, whose outputs the environment uses to compute a reward R, as defined in the figure. This reward guides the agent in determining subsequent actions. The red arrow in the diagram illustrates the feedback loop, where parameters generated by AdvIRL are fed back to Instant-NGP to produce the adversarial 3D model.
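The segmentation step runs exactly once, before any reinforcement learning begins. Below is a minimal sketch of this preprocessing, assuming an off-the-shelf background-removal tool; rembg is an illustrative stand-in, not necessarily the segmenter used in AdvIRL, and the directory paths are hypothetical.

from pathlib import Path

from PIL import Image
from rembg import remove  # off-the-shelf background removal (illustrative choice)

def segment_images(in_dir: str, out_dir: str) -> None:
    """Strip backgrounds from every training view once, before RL starts."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(in_dir).glob("*.png")):
        with Image.open(img_path) as img:
            cutout = remove(img)  # returns an RGBA image with the background removed
            cutout.save(Path(out_dir) / img_path.name)  # background-free view for NeRF training

segment_images("data/truck/images", "data/truck/images_segmented")  # hypothetical paths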

A detailed diagram of our reinforcement learning pipeline is shown in the figure below.

Figure 2: At each timestep t, our agent passes an action to the environment. The environment uses that action to modify the NeRF parameters, which are fed to the NeRF model to produce multi-angle images of the object; the images are then fed to a classifier. The reward is based on the classifier's confidence that the images belong to a particular class. The figure uses a simple reward function that returns the probability that the image is not of the true class.
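The loop in Figure 2 can be summarized in a few lines. The sketch below is a simplified illustration, not the released AdvIRL code: render_views and classify stand in for Instant-NGP rendering and the CLIP classifier, and the action is applied as an additive perturbation purely to keep the sketch short (the pipeline in Figure 1 concatenates the action with the parameters).

import numpy as np

def advirl_step(params, action, render_views, classify, true_class):
    """One environment timestep from Figure 2 (simplified sketch)."""
    # Apply the agent's action A_t to the current NeRF parameters P_t.
    new_params = params + action

    views = render_views(new_params)  # multi-view shots rendered by Instant-NGP
    probs = classify(views)           # (num_views, num_classes) CLIP confidences

    # Simple reward from Figure 2: the probability that the rendered views
    # are NOT of the true class, averaged over viewpoints.
    reward = float(np.mean(1.0 - probs[:, true_class]))
    return new_params, views, reward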

Results


Truck Scene

Banana Scene

Lighthouse Scene

Our reinforcement learning algorithm generates adversarial noise that remains robust across a range of viewing angles and distances. Across all three scenes, the algorithm consistently produces noise targeted at a specific class, causing the majority of observed images to be misclassified as the adversarial class. Notably, while the algorithm typically confines adversarial noise to the object itself, in some instances applying noise beyond the object yields higher confidence in the target class, as exemplified by the lighthouse scene with adversarial noise.
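The targeted behavior above corresponds to a small change in the reward: rather than penalizing confidence in the true class, the agent is rewarded for the classifier's confidence in the chosen adversarial class. A sketch under the same assumed interfaces as the step function above:

import numpy as np

def targeted_reward(probs, target_class):
    """Reward confidence in the adversarial target class (e.g., "slug" for the banana scene)."""
    # probs: (num_views, num_classes) classifier confidences for the rendered views.
    # Averaging over views pushes the noise to fool the classifier from many
    # angles and distances rather than from a single rendering.
    return float(np.mean(probs[:, target_class]))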

Agent Learning To Generate Noise For Christmas Tree Class


Citation


@misc{nguyen2024advirlreinforcementlearningbasedadversarial,
      title={AdvIRL: Reinforcement Learning-Based Adversarial Attacks on 3D NeRF Models}, 
      author={Tommy Nguyen and Mehmet Ergezer and Christian Green},
      year={2024},
      eprint={2412.16213},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.16213}, 
}

Acknowledgements


We would like to thank Dr. Micah Schuster and Dr. Antonio Furgiuele for reviewing our paper and providing feedback on our work. We also thank Joey Litalien for the framework on which the AdvIRL website is built.

Truck & Lighthouse Scenes © Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun (CC BY 4.0)