NSA and CISA Urge Organizations to Protect Data Used in AI Models


New artificial intelligence (AI) data security guidance encourages organizations to secure the data used by AI applications, an important consideration as AI use skyrockets while security lags. The guidance was issued jointly by the National Security Agency (NSA) and its Artificial Intelligence Security Center (AISC), the Cybersecurity and Infrastructure Security Agency (CISA), and the FBI.

While AI-powered tools and machine learning systems, such as large language models (LLMs), help automate processes and enable organizations to work faster, security is often an afterthought. Not only is security lacking, but many organizations also underestimate the risks, which are plentiful.

To that end, the federal agencies are working with international partners to document best practices for securing the sensitive and critical data used to train and operate AI systems. The guidance aims to address three main areas of AI data security risk: the data supply chain, maliciously modified or “poisoned” data, and data drift. Left unaddressed, these risk areas could allow attackers to manipulate data and degrade model performance. Organizations using AI in their operations must employ security measures throughout the life cycle, from AI development through deployment.

10 Best Practices to Follow

Organizations are urged to track data provenance, verify and maintain data integrity during storage and transport, protect the data and store it securely, and conduct ongoing data security risk assessments. The list is intended to help system owners “better protect the data used to build and operate their AI-based systems, whether running on premises or in the cloud.”

  1. Source reliable data and track data provenance: Draw on trusted, reliable, and accurate data sources for training and operating AI systems. Implement provenance tracking and incorporate a secure provenance database to help identify sources of maliciously modified data and confirm the data has not been manipulated (a minimal hashing sketch follows this list).
  2. Verify and maintain data integrity during storage and transport: Data integrity is essential to preserving data trustworthiness; verify it, for example, by comparing cryptographic hashes or checksums of the data before and after storage or transport.
  3. Employ digital signatures to authenticate data revisions: Digital signatures make tampering detectable, and original versions of the data should be cryptographically signed. Organizations should adopt quantum-resistant digital signature standards (a sign-and-verify sketch follows this list).
  4. Leverage trusted infrastructure: Create a trusted computing environment based on zero-trust architecture to isolate sensitive operations and keep data in secure enclaves. Trusted environments are essential for securing AI applications both on premises and in the cloud.
  5. Classify data and use access controls: Categorize data using a classification system based on sensitivity and required protection measures. Classify the output of AI systems at the same level as the input data rather than creating a separate set of guardrails for it.
  6. Encrypt data: This includes securing data at rest, in transit, and during processing. AES-256 encryption is the de facto industry standard and is considered resistant to quantum computing threats. For data in transit, use protocols such as TLS with AES-256 or post-quantum encryption (an AES-256-GCM sketch follows this list).
  7. Store data securely: Store data in certified storage devices that comply with NIST FIPS 140-3, ensuring that the cryptographic modules used to encrypt the data provide strong protection against advanced intrusion attempts.
  8. Consider privacy-preserving techniques: Data depersonalization techniques, such as data masking, replace sensitive data with other pieces of information so that the data can be used without exposing the sensitive details. When possible, use data masking to facilitate AI model training and development. Differential privacy is a framework that provides a mathematical guarantee quantifying the level of privacy of a dataset or query (both masking and differential privacy are sketched after this list). Decentralized learning techniques, such as federated learning, permit AI system training over multiple local datasets, with limited sharing of data among local instances.
  9. Delete data securely: Erase data using a secure deletion method, such as cryptographic erase, block erase, or data overwrite, before repurposing or decommissioning any systems.
  10. Conduct ongoing data security risk assessments: Continuously improve data security measures to keep pace with evolving threats and vulnerabilities, learn from security incidents, stay up to date with emerging technologies, and maintain a robust security posture. Industry-standard frameworks include NIST SP 800-37r2, the Risk Management Framework (RMF), and NIST AI 100-1, the Artificial Intelligence Risk Management Framework.
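
To make the first two practices concrete, here is a minimal Python sketch, not taken from the guidance itself: it streams each dataset file through SHA-256, appends the hash to a simple JSON-lines provenance log, and re-checks the hash after storage or transport. The field names (source, retrieved_at) and the log format are assumptions for illustration.

```python
# Minimal sketch (not from the guidance): SHA-256 hashes as both a
# provenance record and an integrity check for dataset files.
import hashlib
import json
import time
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream the file through SHA-256 so large datasets never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_provenance(path: Path, source: str, log: Path) -> None:
    """Append an entry tying the file's hash to where the data came from."""
    entry = {
        "file": str(path),
        "sha256": sha256_file(path),
        "source": source,  # e.g., vendor name or internal pipeline (assumed field)
        "retrieved_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    with log.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def verify_integrity(path: Path, expected_sha256: str) -> bool:
    """Re-hash after storage or transport and compare against the log."""
    return sha256_file(path) == expected_sha256
```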
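
Item 3's sign-and-verify workflow can be sketched with the widely used Python cryptography package. One hedge: Ed25519 below is a classical scheme, chosen because library support is ubiquitous; the quantum-resistant standards the guidance points toward, such as NIST's ML-DSA (FIPS 204), are only beginning to appear in mainstream libraries.

```python
# Sign-and-verify sketch using Ed25519 from the `cryptography` package.
# Ed25519 is classical; the guidance recommends moving to quantum-resistant
# standards (e.g., ML-DSA / FIPS 204) as library support matures.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# A hypothetical dataset revision, signed at publication time.
dataset_revision = b"curated training set, revision 42"
signature = private_key.sign(dataset_revision)

try:
    # Any holder of the public key can confirm the revision is untampered.
    public_key.verify(signature, dataset_revision)
    print("signature valid: revision is authentic")
except InvalidSignature:
    print("signature check failed: data may have been modified")
```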
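
For item 6, here is a minimal sketch of AES-256 in GCM mode, again using the cryptography package. GCM is one reasonable authenticated mode, not something the guidance mandates, and real deployments would keep the key in a KMS or HSM rather than in application memory.

```python
# AES-256-GCM sketch via the `cryptography` package. Key handling is
# simplified for illustration; production keys belong in a KMS or HSM.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 256-bit key, per the guidance
aesgcm = AESGCM(key)

plaintext = b"sensitive training record"
nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key
ciphertext = aesgcm.encrypt(nonce, plaintext, None)

# GCM authenticates as well as encrypts: decryption raises InvalidTag
# if the ciphertext was altered in storage or transit.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```

Discarding that key beyond recovery is also the idea behind the cryptographic erase mentioned in item 9: without the key, the ciphertext is unrecoverable.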
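
Finally, two of the privacy-preserving techniques in item 8 are compact enough to illustrate. The functions below are textbook sketches rather than code from the guidance: mask_email is a toy masking rule, and dp_count applies the Laplace mechanism to a count query.

```python
# Illustrative privacy-preserving techniques: field masking and the
# Laplace mechanism for an epsilon-differentially-private count.
import re

import numpy as np

def mask_email(text: str) -> str:
    """Data masking: swap email addresses for a placeholder so the record
    stays usable for training without exposing the sensitive detail."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "<EMAIL>", text)

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    """A count query has sensitivity 1, so adding Laplace(0, 1/epsilon)
    noise yields an epsilon-differentially-private answer."""
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

print(mask_email("contact jane.doe@example.com for dataset access"))
print(dp_count(1204, epsilon=0.5))  # noisy count; varies on each call
```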

Can Organizations Implement the Guidance?

AI poses many risks when introduced into the supply chain. Organizations feed copious amounts of data into AI models during training, which can leave room for error, introduce vulnerabilities, and lead to data leaks. Organizations must therefore understand the risks and how third parties use the data. The best practices around digital signatures, data integrity, and data provenance help address those risks.

While it’s encouraging that the NSA released guidance to bolster data security, which is foundational to building trustworthy AI systems, implementing the practices at scale can be costly and resource-intensive for organizations, says Dr. Margaret Cunningham, director of security and AI strategy at Darktrace.

“Without stronger incentives or accountability, some organizations may be hesitant to make the necessary investments,” she says. “As the industry matures, it will be important to align incentives and expectations to ensure responsible practices are not just recommended but realistically adopted.”

On the other hand, most of the NSA and CISA guidance is achievable today with modest investment and focus, says Kinnaird McQuade, chief security architect at BeyondTrust. Other elements, like quantum-safe cryptography, are more aspirational and likely to land in future budget cycles, he adds.

“The bigger blind spot for most CISOs is LLM-specific risk — particularly data poisoning that can twist models long before anyone notices,” McQuade says. “Controlling privileged access to training data, enforcing least privilege for both human and nonhuman identities, and continuously monitoring for anomalous behavior are all practical, achievable steps organizations can take today.”