






The New Security Rules Of Hadoop


Tackling Data 2.0
Jeremy Stieglitz, VP Products, Dataguise



The advent of Hadoop has introduced new possibilities in data science for analytics, audit,
fraud detection, and real-time prevention for cybersecurity teams. Hadoop’s flexibility and
speed mean that more logging and event data can be processed in more ways. Yet with this
massive data influx, it can be difficult to account for all of an organization’s sensitive data.
Details such as network configurations, addresses, user names, passwords, and location data can
represent leak and breach risks if they are not secured. And with more unstructured or
semi-structured data obscuring the identity and location of sensitive data, protecting data and
managing risk in Hadoop have become even more challenging. This article highlights some of
the business and technology drivers for new security data management in Hadoop, and proposes
sensitive data discovery technologies to drive data-centric protection that addresses them.

Data 2.0 Forces New Security Thinking

Hadoop is much more than just a very large database. With big data, enterprises now have
access to more data coming from a wide range of sources, such as social networks, and are
intent on better leveraging the value of this disjointed information across multiple applications.
This endeavor has created a major challenge in handling the volume, velocity, and variety of
structured and unstructured data. In terms of handling the data itself, there are three
primary differences between traditional data repositories and Hadoop: type, access, and
process.

1 – Type

Hadoop provides the ability to work with structured, semi-structured, and unstructured data.
Massive scalability and this huge array of new data types have created a situation in
which enterprises can no longer guarantee that data is appropriately classified, cleaned, and
trustworthy, or that its provenance is known.


In particular, businesses are increasingly combining web clickstream and application logging
data (data historically too noisy or too voluminous to process in data warehouses) with
relational data or customer profiling data traditionally kept in the data warehouse
to drive new business insights.

The presence of this “gray data” poses entirely new security challenges because the
classification and/or location of potentially sensitive data is not known. Unlike database and
data warehouse data models, where sensitive data can be catalogued and known in a more or
less static data model, no rigid structure or sensitive data identification exists in gray data.
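
To make the idea concrete, the following is a minimal sketch of what discovering sensitive data in unclassified log lines could look like. It is not the article's or any vendor's method: the pattern set, the function names scan_line and scan_lines, and the sample log lines are all hypothetical, and plain regular expressions stand in for the far richer detection a real discovery tool would need.

```python
import re

# Hypothetical patterns for illustration only. A real discovery tool would
# add validation, context checks, and many more data types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "password": re.compile(r"(?i)\bpassword\s*[=:]\s*\S+"),
}


def scan_line(line):
    """Return a sorted list of the sensitive-data categories found in one line."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(line))


def scan_lines(lines):
    """Map 1-based line numbers to categories, skipping lines with no hits."""
    findings = {}
    for number, line in enumerate(lines, start=1):
        hits = scan_line(line)
        if hits:
            findings[number] = hits
    return findings


# Made-up log lines with no schema telling us which fields are sensitive.
sample_log = [
    "2015-06-01 12:00:01 GET /index.html 200",
    "2015-06-01 12:00:02 login user=alice@example.com from 10.0.0.7",
    "2015-06-01 12:00:03 debug password=hunter2 retry=1",
]
findings = scan_lines(sample_log)
```

Because nothing in the log format marks a field as sensitive, the scan has to inspect content line by line, which is the situation the gray data description above implies.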


34 Cyber Warnings E-Magazine – June 2015 Edition
Copyright © Cyber Defense Magazine, All rights reserved worldwide
