2.  Data Cleaning/Preparation: The process of cleaning the data before it enters the AI pipeline. This is done by removing duplicates and excluding unsupported formats, empty cells, and invalid entries that can lead to technical issues (a minimal cleaning sketch appears after this list).
3.  Model Development: The process of building models by training them on large datasets, analyzing patterns in the data, and making predictions without additional human intervention. An iterative model-driven development (MDD) approach is generally followed here (see the training-loop sketch after this list).
4.  Model Serving: The process of deploying machine learning (ML) models into the AI pipeline and integrating them into business applications. These model functions, typically exposed as APIs, are deployed at scale and used to perform tasks or make predictions on real-time or batch data (a serving sketch also follows this list).
5.  Model Monitoring: The process of assessing the performance and efficacy of models against live data and tracking metrics related to model quality (e.g., latency, memory, uptime, precision, accuracy) along with data quality, model bias, prediction drift, and fairness (see the monitoring sketch after this list).
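To make step 2 concrete, below is a minimal data-cleaning sketch in Python using pandas. The column names (feature_a, feature_b, label) and the supported label set are hypothetical placeholders; the checks should be adapted to the actual schema.

import pandas as pd

def clean_training_data(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Remove exact duplicate rows that would bias training.
    df = df.drop_duplicates()

    # Drop rows with empty cells in required fields.
    df = df.dropna(subset=["feature_a", "feature_b", "label"])

    # Exclude invalid entries, e.g., labels outside the supported set.
    df = df[df["label"].isin({"benign", "malicious"})]

    return df.reset_index(drop=True)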
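For step 3, the iterative train-and-evaluate loop might look like the following scikit-learn sketch. A synthetic dataset stands in for real training data, and the candidate configurations are illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

best_score, best_model = 0.0, None
for n_estimators in (50, 100, 200):   # iterate over candidate configurations
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_score, best_model = score, model

print(f"best configuration scored {best_score:.3f} on the held-out split")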
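For step 4, a minimal serving sketch using FastAPI is shown below. The model file name (model.pkl) and the flat feature-vector input format are assumptions for illustration.

import pickle
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
with open("model.pkl", "rb") as f:   # previously trained and saved model
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    # Wrap the single feature vector in a batch of one for the model.
    prediction = model.predict([features.values])[0]
    return {"prediction": int(prediction)}

# Assuming this file is named serving.py, it could be launched with:
#   uvicorn serving:app --port 8000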
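For step 5, the sketch below illustrates lightweight monitoring of latency and rolling accuracy. In production these metrics would typically feed a dedicated monitoring stack, and ground-truth labels often arrive with a delay.

import time
from collections import deque

latencies = deque(maxlen=1000)   # recent request latencies (seconds)
outcomes = deque(maxlen=1000)    # 1 if a prediction matched ground truth

def monitored_predict(model, x, ground_truth=None):
    start = time.perf_counter()
    y_hat = model.predict([x])[0]
    latencies.append(time.perf_counter() - start)
    if ground_truth is not None:   # labels may arrive later in practice
        outcomes.append(int(y_hat == ground_truth))
    return y_hat

def health_report():
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    accuracy = sum(outcomes) / len(outcomes) if outcomes else None
    return {"avg_latency_s": avg_latency, "rolling_accuracy": accuracy}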

While companies can use Gen AI solutions to expedite AI model development, doing so also poses enormous risks [3] to critical proprietary and business data. Data integrity and confidentiality are crucial, and the associated risks must be considered before approving new AI initiatives. AI solutions can create serious malware risk and impact if the right practices aren't followed. The following types of attacks can compromise the integrity and reliability of data models:

1.  Data Pipeline attack - The entire pipeline, from data collection to model training, presents a large attack surface that can be exploited to gain access, modify data, or introduce malicious inputs, causing privacy violations.
2.  Data Poisoning attack - This involves inserting harmful or misleading data into training datasets to intentionally influence or manipulate the model's operation. It can also be done by modifying the existing dataset or deleting a portion of it (a small label-flipping demonstration appears after this list).
3.  Model Control attack - Malware takes broader control of the model's decision-making process, producing erroneous outputs with potentially severe real-world consequences, including loss of life. This primarily occurs when externally accessible AI models are intentionally manipulated (e.g., taking control of an automated vehicle).
4.  Model Evasion attack - This is a real-time data-manipulation assault, such as altering user inputs or device readings to change the AI's responses or actions (see the evasion sketch after this list).
5.  Model Inversion attack - A reverse-engineering attack that exploits model outputs to steal proprietary AI training data or personal information. For example, an inversion attack on a cancer-prediction model can be used to infer a person's medical history.
6.  Supply Chain attack - Attackers compromise third-party software components (e.g., open-source libraries or assets) used in model training, deployment, or the pipeline to insert malicious code and control model behavior. For example, 1,600 leaked HuggingFace API tokens [4] allowed hackers to access the accounts of 723 organizations that used the HuggingFace API in their model development supply chain (an integrity-check sketch follows this list).
7.  Denial of Service (DoS) attack - This attack overloads AI systems with numerous requests or inputs, causing performance degradation or denial of service through resource downtime or exhaustion. Though it doesn't result in the theft or loss of critical information, it can cost the organization significant downtime and recovery effort (a rate-limiting sketch follows this list).
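As a small demonstration of data poisoning (attack 2), the sketch below flips a fraction of training labels in a synthetic dataset and shows how test accuracy degrades; the data and the flip fractions are illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

def accuracy_with_flips(flip_fraction: float) -> float:
    rng = np.random.default_rng(1)
    y_poisoned = y_tr.copy()
    idx = rng.choice(len(y_tr), size=int(flip_fraction * len(y_tr)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # flip the 0/1 labels at these rows
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
    return model.score(X_te, y_te)          # evaluate on clean test data

for frac in (0.0, 0.1, 0.3):
    print(f"flipped {frac:.0%} of labels -> test accuracy {accuracy_with_flips(frac):.3f}")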
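To illustrate model evasion (attack 4), the following sketch applies a small signed-gradient (FGSM-style) perturbation intended to flip a linear classifier's prediction; the step size is illustrative and may need tuning for a given sample.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
model = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0]
w = model.coef_[0]
direction = -1 if model.predict([x])[0] == 1 else 1   # push the score toward the other class
x_adv = x + 0.5 * direction * np.sign(w)              # signed-gradient step on the input

print("original prediction:", model.predict([x])[0])
print("evasion prediction: ", model.predict([x_adv])[0])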
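One basic defense against supply chain attacks (attack 6) is verifying every third-party artifact against a pinned digest before use. The sketch below checks a downloaded model file's SHA-256; the file name and digest value are placeholders.

import hashlib

PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def verify_artifact(path: str) -> None:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):   # hash in chunks
            digest.update(chunk)
    if digest.hexdigest() != PINNED_SHA256:
        raise RuntimeError(f"artifact {path} failed integrity check")

verify_artifact("model.bin")   # raises unless the file matches the pinned digest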
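As a partial mitigation for request-flood DoS (attack 7), the sketch below implements a simple token-bucket rate limiter that could sit in front of an inference endpoint; the rate and capacity values are illustrative.

import time

class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: int):
        self.rate, self.capacity = rate_per_s, capacity
        self.tokens, self.updated = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # reject: this client is over its request budget

bucket = TokenBucket(rate_per_s=10, capacity=20)
if not bucket.allow():
    print("429 Too Many Requests")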