2. Data Cleaning/Preparation: The process of cleaning the data before it enters the AI pipeline
by removing duplicates and excluding unsupported formats, empty cells, and invalid entries that
can lead to technical issues (a minimal cleaning sketch appears after this list).
3. Model Development: The process of building models by training on large datasets, analyzing
patterns in the data, and making predictions without additional human intervention. An iterative
model-driven development (MDD) approach is generally followed here (see the training sketch
after this list).
4. Model Serving: The process of deploying machine learning (ML) models into the AI pipeline and
integrating them into business applications. These model functions are typically exposed as APIs,
deployed at scale, and used to perform tasks or make predictions on real-time or batch data (see
the serving sketch after this list).
5. Model Monitoring: The process of assessing the performance and efficacy of models against
live data and tracking metrics related to model quality (e.g., latency, memory, uptime, precision,
accuracy) along with data quality, model bias, prediction drift, and fairness (see the monitoring
sketch after this list).
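To make the cleaning step in item 2 concrete, here is a minimal sketch using pandas. The column names, the allowed-format set, and the raw_records values are hypothetical assumptions for illustration, not part of any specific pipeline.

```python
import pandas as pd

# Hypothetical raw records feeding the AI pipeline; columns and
# values are illustrative assumptions.
raw_records = pd.DataFrame({
    "doc_id": [1, 1, 2, 3, 4],
    "format": ["csv", "csv", "parquet", "xml", "csv"],
    "text": ["alpha", "alpha", "beta", None, "gamma"],
})

ALLOWED_FORMATS = {"csv", "parquet"}  # assumed supported formats

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, unsupported formats, and empty cells."""
    df = df.drop_duplicates()                    # drop duplicate rows
    df = df[df["format"].isin(ALLOWED_FORMATS)]  # exclude unsupported formats
    df = df.dropna()                             # drop rows with empty cells
    return df.reset_index(drop=True)

cleaned = clean(raw_records)
print(cleaned)  # three valid rows remain (doc_ids 1, 2, 4)
```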
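For the development stage in item 3, a minimal training sketch using scikit-learn follows. The toy X/y data and the choice of LogisticRegression are assumptions for illustration; in practice this fit-and-evaluate loop is repeated iteratively, as the MDD approach describes.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical cleaned feature matrix X and labels y; in a real
# pipeline these would come from the cleaned datasets above.
X = [[0.1, 1.2], [0.9, 0.3], [0.4, 0.8], [0.8, 0.1], [0.2, 1.0], [0.7, 0.2]]
y = [0, 1, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)  # learn patterns from the training data
print("held-out accuracy:", model.score(X_test, y_test))
```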
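For the serving stage in item 4, a model function exposed as an API might look like the following sketch. The FastAPI route, the PredictRequest fields, and the scoring logic are hypothetical stand-ins for a real trained model artifact.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    # Hypothetical input features; a real model defines its own schema.
    feature_a: float
    feature_b: float

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # Stand-in for model.predict(); a deployed service would load a
    # trained model and score the request against it.
    score = 0.7 * req.feature_a + 0.3 * req.feature_b
    return {"prediction": score}
```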
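And for the monitoring stage in item 5, one simple pattern is to wrap the prediction call so every request records latency and, when ground-truth labels arrive, running accuracy. The wrapper below is a hedged sketch of that pattern; in production these counters would feed a dashboard or alerting system so prediction drift and fairness metrics can be tracked alongside latency and accuracy.

```python
import time

def monitored_predict(model_fn, features, label=None, metrics=None):
    """Wrap a prediction call to record latency and, when a
    ground-truth label is available, running accuracy."""
    metrics = metrics if metrics is not None else {
        "latencies": [], "hits": 0, "total": 0,
    }
    start = time.perf_counter()
    prediction = model_fn(features)
    metrics["latencies"].append(time.perf_counter() - start)  # model latency
    if label is not None:
        metrics["total"] += 1
        metrics["hits"] += int(prediction == label)  # running accuracy counter
    return prediction, metrics
```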
While companies can use Gen AI solutions to expedite AI model development, these solutions also
pose enormous risks [3] to critical proprietary and business data. Data integrity and confidentiality are
crucial, and the associated risks must be considered before approving new AI initiatives. AI solutions
can create serious malware risk and impact if the right practices aren't followed. The following types of
attacks can compromise the integrity and reliability of data models:
1. Data Pipeline attack - The entire pipeline, from data collection to model training, presents a large
attack surface that can be easily exploited to gain access, modify data, or introduce malicious
inputs, causing privacy violations.
2. Data Poisoning attack - This involves inserting harmful or misleading data into training datasets
to intentionally influence or manipulate model behavior. It can also be done by modifying the
existing dataset or deleting a portion of it.
3. Model Control attack - Malware takes broader control of the model's decision-making process,
resulting in erroneous outputs and potentially significant impact, including loss of life. This
primarily occurs when externally accessible AI models are intentionally manipulated (e.g., taking
control of an automated vehicle).
4. Model Evasion attack - This is a real-time data manipulation attack, such as altering user inputs
or device readings to change the AI's responses or actions.
5. Model Inversion attack - A reverse-engineering attack that exploits model outputs to steal
proprietary AI training data or personal information. For example, an inversion attack on a model
that predicts cancer can be used to infer a person's medical history.
6. Supply Chain attack - Attackers compromise third-party software components (e.g., open-source
libraries or assets) used in model training, deployment, or the pipeline to insert malicious code
and control model behavior. For example, 1,600 leaked HuggingFace API tokens [4] gave
attackers access to the accounts of 723 organizations using the HuggingFace API in their model
development supply chain.
7. Denial of Service (DoS) attack - This kind of attack overloads AI systems with numerous requests
or inputs, resulting in performance degradation or denial of service due to resource exhaustion
or downtime. Though it doesn't result in the theft or loss of critical information, it can cost the