Page 212 - Cyber Defense eMagazine August 2024
• Bias and fairness issues, when AI models perpetuate or even exacerbate existing biases, leading
to unfair or discriminatory outcomes and decisions.
• Toxicity, when models produce harmful or offensive content, which is particularly concerning in
customer-facing applications.
Evaluation and Risk Assessment
Comprehensive risk assessment solutions are deployed to mitigate AI and GenAI risks. These solutions
evaluate AI models on various fronts, identifying vulnerabilities and providing actionable insights to
improve security and trustworthiness. Key features of effective risk assessment include:
• Penetration Testing: systematically evaluating AI models to uncover security weaknesses both
before and after deployment.
• Hallucination: detecting and assessing the likelihood that AI models will generate false or
misleading information.
• Resilience: evaluating a model’s overall resilience to attack.
• Privacy: assessing a model’s propensity to leak sensitive information.
• Content: detecting and mitigating the generation of toxic, offensive, harmful, unfair, unethical, or
discriminatory language.
• Bias and Fairness: identifying and addressing biases within a model to ensure fair and ethical
outcomes.
• Weak Spots: pinpointing specific vulnerabilities within AI applications.
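As a rough illustration of how category-based checks like those listed above might be organized, here is a minimal Python sketch. It is not DeepKeep's method. The names `leaks_email`, `contains_blocked_word`, `CHECKS`, and `assess`, the email pattern, and the placeholder word list are all assumptions made for this example, and real assessments rely on far richer detectors.

```python
import re
from typing import Callable, Dict, List

# Hypothetical detectors for illustration only (assumptions, not a real product's logic).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BLOCKLIST = {"idiot", "stupid"}  # placeholder word list


def leaks_email(text: str) -> bool:
    """Privacy check: flag outputs that contain something shaped like an email address."""
    return EMAIL_RE.search(text) is not None


def contains_blocked_word(text: str) -> bool:
    """Content check: flag outputs that contain a word from the placeholder blocklist."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKLIST for word in words)


# One check per risk category; more categories could be registered the same way.
CHECKS: Dict[str, Callable[[str], bool]] = {
    "privacy": leaks_email,
    "content": contains_blocked_word,
}


def assess(outputs: List[str]) -> Dict[str, float]:
    """Return, per category, the fraction of model outputs flagged by that check."""
    if not outputs:
        return {name: 0.0 for name in CHECKS}
    return {
        name: sum(check(output) for output in outputs) / len(outputs)
        for name, check in CHECKS.items()
    }
```

Calling `assess` on a batch of model outputs yields one flagged rate per category, which could then be tracked before and after deployment.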
Case Studies and Practical Applications: English to French Translation
When DeepKeep evaluated Meta’s LlamaV2 7B LLM, we identified significant weaknesses in its ability
to translate from English to French. The example below demonstrates the decline in performance
DeepKeep found when applying its transformations: accuracy dropped by more than 90%.
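To make the arithmetic behind an accuracy drop concrete, here is a minimal Python sketch. The article does not say which accuracy metric was used or whether the drop is relative or absolute, so the exact-match scoring and the relative-drop formula below are assumptions, and the function names `exact_match_accuracy` and `relative_drop` are hypothetical.

```python
from typing import List


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference, ignoring case and outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


def relative_drop(baseline: float, perturbed: float) -> float:
    """Relative decline from the baseline accuracy to the accuracy on transformed inputs."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (baseline - perturbed) / baseline
```

Under these assumptions, a model that scores 0.80 on the original prompts and 0.05 on the transformed prompts has a relative drop of (0.80 - 0.05) / 0.80, or about 94%. Translation quality is often scored with softer metrics than exact match, so the scoring function here is only a stand-in.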
The table below showcases 5 test examples:
Original Prompt | LlamaV2 7B’s Translation | Correct Translation
It is the biggest acquisition in eBay's history. | C'est l'acquisition la plus importante de l'histoire d'eBay. | C’est la plus grande acquisition de l’histoire d’eBay.
Copyright © 2024, Cyber Defense Magazine. All rights reserved worldwide.