Defend your generative AI web applications and data
AWS Shield Advanced, AWS Firewall Manager, and AWS WAF with its AWS WAF Bot Control feature
Understanding Data and Model Lineage
Source citation - identifying the source of the data, which helps assess the reliability and trustworthiness of the output
Datasets
Databases
Other sources
Documenting data origins - recording the data's place of origin, which helps in understanding the potential biases, limitations, or quality issues that might be present in the training data
Details about the data collection process
The methods used to curate and clean the data
Any preprocessing or transformations applied to the data
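The items listed under documenting data origins can be captured in a small structured record kept next to the dataset. The following is a minimal sketch, not an AWS API: the class name DataOriginRecord, its field names, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DataOriginRecord:
    """Hypothetical record of where a training dataset came from."""
    dataset_name: str
    source: str                 # a dataset, database, or other source
    collection_process: str     # details about how the data was collected
    curation_methods: list = field(default_factory=list)   # curation and cleaning steps
    transformations: list = field(default_factory=list)    # preprocessing applied
    known_limitations: list = field(default_factory=list)  # biases or quality issues


# Illustrative values only; none of these come from a real dataset.
record = DataOriginRecord(
    dataset_name="support-tickets-2023",
    source="internal ticketing database export",
    collection_process="nightly batch export of closed tickets",
    curation_methods=["removed duplicate tickets", "dropped empty messages"],
    transformations=["lowercased text", "masked email addresses"],
    known_limitations=["English-language tickets only"],
)

# Serialize the record so it can be stored alongside the dataset.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Keeping the limitations in the same record as the collection and cleaning details means a reviewer sees potential biases at the same time as the data's origin.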
Tools and techniques
Data lineage
Cataloging
Model cards
Amazon SageMaker Model Cards
Provide guidance on how a model should be used.
Support audit activities with detailed descriptions of model training and performance.
Communicate how a model is intended to support business goals.
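As a rough illustration of the three purposes above, a model card can be thought of as a structured document with a section for each. The sketch below uses a plain dictionary with hypothetical field names; it does not follow the Amazon SageMaker Model Cards schema, and the model name and metric values are made up.

```python
import json


def build_model_card(name, intended_use, training_details, metrics, business_goal):
    """Assemble a plain-dictionary model card (hypothetical field names)."""
    if not intended_use:
        # A card without usage guidance misses its first purpose.
        raise ValueError("a model card should state how the model is meant to be used")
    return {
        "model_name": name,
        "intended_use": intended_use,          # guidance on how the model should be used
        "training_details": training_details,  # supports audit activities
        "performance_metrics": metrics,        # supports audit activities
        "business_goal": business_goal,        # how the model supports business goals
    }


# Illustrative values only.
card = build_model_card(
    name="ticket-classifier-v1",
    intended_use="Route incoming support tickets; not for automated account actions.",
    training_details={"dataset": "support-tickets-2023", "algorithm": "gradient-boosted trees"},
    metrics={"accuracy": 0.91},
    business_goal="Reduce average ticket routing time.",
)

card_json = json.dumps(card, indent=2, sort_keys=True)
print(card_json)
```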
Data engineering lifecycle
Data engineering automation and access control
Pipeline automation is an important part of modern data-centric architecture design.
You can use AWS Glue workflows to create a pipeline.
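A minimal sketch of a two-step AWS Glue workflow follows. It assumes the glue argument behaves like boto3.client("glue") and that the two Glue jobs already exist; the workflow, trigger, and job names are hypothetical. The RecordingClient class is a stand-in added here so the example can be run as a dry run without AWS credentials.

```python
def create_two_step_workflow(glue, workflow_name, first_job, second_job):
    """Create a Glue workflow that runs first_job, then second_job on success.

    `glue` is expected to behave like boto3.client("glue").
    """
    glue.create_workflow(Name=workflow_name)

    # On-demand trigger: starts the first job when the workflow is started.
    glue.create_trigger(
        Name=f"{workflow_name}-start",
        WorkflowName=workflow_name,
        Type="ON_DEMAND",
        Actions=[{"JobName": first_job}],
    )

    # Conditional trigger: starts the second job once the first one succeeds.
    glue.create_trigger(
        Name=f"{workflow_name}-then",
        WorkflowName=workflow_name,
        Type="CONDITIONAL",
        StartOnCreation=True,
        Predicate={
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": first_job, "State": "SUCCEEDED"}
            ]
        },
        Actions=[{"JobName": second_job}],
    )
    return workflow_name


class RecordingClient:
    """Stand-in that records calls instead of contacting AWS (dry run only)."""

    def __init__(self):
        self.calls = []

    def create_workflow(self, **kwargs):
        self.calls.append(("create_workflow", kwargs))

    def create_trigger(self, **kwargs):
        self.calls.append(("create_trigger", kwargs))


dry_run = RecordingClient()
create_two_step_workflow(dry_run, "nightly-etl", "extract-job", "clean-job")
for call_name, params in dry_run.calls:
    print(call_name, params["Name"])
```

To run this against a real account, pass boto3.client("glue") instead of the recording stand-in; the caller then needs permissions to create Glue workflows and triggers.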
Data collection
Amazon Kinesis, AWS Database Migration Service (AWS DMS), and AWS Glue
Data preparation and cleaning
One of the most important, yet most time-consuming, stages of the data lifecycle.
For large workloads that have a variety of data, use Amazon EMR or AWS Glue.
Data quality checks
AWS Glue DataBrew and AWS Glue Data Quality
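To make the idea of a data quality check concrete, here is a small plain-Python sketch of two common rules: completeness of a column and uniqueness of a key. It does not use AWS Glue DataBrew or AWS Glue Data Quality; the function names, the 0.95 threshold, and the sample rows are assumptions made for illustration.

```python
def completeness(rows, column):
    """Fraction of rows where `column` is present and not None or empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def is_unique(rows, column):
    """True when no non-missing value of `column` appears more than once."""
    values = [row[column] for row in rows if row.get(column) not in (None, "")]
    return len(values) == len(set(values))


def run_checks(rows, required_column, key_column, min_completeness=0.95):
    """Evaluate both rules and report whether the dataset passes overall."""
    score = completeness(rows, required_column)
    unique = is_unique(rows, key_column)
    return {
        "completeness": score,
        "unique_key": unique,
        "passed": score >= min_completeness and unique,
    }


# Illustrative rows: one of four email values is missing.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
    {"id": 4, "email": "d@example.com"},
]

report = run_checks(rows, required_column="email", key_column="id")
print(report)
```

With these sample rows the key is unique, but completeness of the email column falls below the threshold, so the overall check fails.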
Data visualization and analysis
Amazon QuickSight - to create graphs or charts.
Amazon Neptune - for graph database operations and visualization.
Infrastructure as code (IaC) deployment
AWS CloudFormation
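As a sketch of what infrastructure as code looks like in practice, the following builds a small AWS CloudFormation template as a Python dictionary and prints it as JSON. The choice of resource (a versioned, encrypted Amazon S3 bucket for raw pipeline data), the logical ID RawDataBucket, and the function name are assumptions made for illustration.

```python
import json


def build_raw_data_template(bucket_logical_id="RawDataBucket"):
    """Return a CloudFormation template (as a dict) declaring one S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Versioned, encrypted S3 bucket for raw pipeline data (illustrative)",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Keep prior object versions so accidental overwrites can be undone.
                    "VersioningConfiguration": {"Status": "Enabled"},
                    # Encrypt objects at rest with S3-managed keys by default.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                        ]
                    },
                },
            }
        },
    }


template = build_raw_data_template()
template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template is plain text, it can be kept in version control and reviewed like any other code before it is deployed as a stack.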
Monitoring and debugging
Amazon CloudWatch
Best Practices for Secure Data Engineering
Assessing data quality
Implementing privacy-enhancing technologies
Data access control
Data integrity
AWS Privacy Reference Architecture - guidelines to assist in the design and implementation of privacy-supporting controls