2 – Access
Enterprises today are more reliant on data than they have ever been. Decisions regarding
attacks, forensics, product performance and operational characteristics are strongly guided by
data. At the same time, data access and analytics are no longer limited to dedicated PhDs and
cyber security veterans. Now business and technical users across finance, marketing, supply
chain, engineering, project management, and sales operations want to analyze and improve
decision making based on data mining and analytics. To maximize this promise of Big Data,
businesses are making efforts to provide wide availability to data, and integrate and consolidate
information silos throughout the enterprise. New tools also provide dramatically easier
business-user access to data, in the form of sophisticated visualization programs and
expansions to everyday business productivity tools such as Microsoft Excel that allow non-
developers to run queries and ask natural-language questions of data. This expanded access and
data consolidation has inevitably decentralized data usage and expanded the risk management and
data governance requirements for IT. Enterprises now need to enforce greater control
over how this data is made available, to curtail risks of misuse within this larger context of data
democratization and open access to enterprise-wide data analytics.
3 – Process
Data processing in Hadoop is fundamentally different. Unlike data warehouse or database
systems, which store data and then load it into applications and queries, queries and applications
are now brought to the data: computation happens at the data level or on real-time data
streams. Most significantly, with real-time streaming, rather than the traditional
model of storing, indexing, and querying data, there is no data “at rest” per se; the data is
used, analyzed, and re-purposed before it is ever stored. The security impact of this shift is large.
The familiar data encryption strategies built around locking data “at rest” or “in flight” break
down in a Hadoop environment. The more accurate data protection question may become: how do
you encrypt and protect data in process?
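The idea of protecting data in process, rather than only at rest, can be sketched with a toy example. The following Python sketch masks sensitive values as records stream through a pipeline, before anything is ever written to storage; the field layout and the SSN-like pattern are assumptions made purely for illustration, not any specific product's approach:

```python
import re

# Assumed, illustrative pattern for SSN-like values in free-text records.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_in_process(record_stream):
    """Yield records with SSN-like values masked while still in process,
    so the sensitive value never reaches storage."""
    for record in record_stream:
        yield SSN_PATTERN.sub("***-**-****", record)

incoming = [
    "order=1001 customer_ssn=123-45-6789 total=99.50",
    "order=1002 total=12.00",
]
for clean in mask_in_process(incoming):
    print(clean)
# prints:
# order=1001 customer_ssn=***-**-**** total=99.50
# order=1002 total=12.00
```

Because the masking happens inside the stream itself, the question of encrypting the data "at rest" never arises for the masked fields.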
Tackling Data 2.0 with Data-Centric Discovery and Protection
By now, we’ve hopefully drummed home just how different data in Hadoop is. It’s vast,
varied, and vague. It can be unpredictable, it defies order, and it’s very tough to govern. Not
surprisingly, data breaches abound in this Wild West-like environment. They abound
because traditional security tools have not adapted to the complexities of big data.
Traditional security measures are simply layered on top of newer big data technologies. This
approach does not work because security must be designed in at a fundamental level.
Firewalls, for instance, can protect sensitive information from external malicious
access through IP and port checks without any underlying knowledge of the data they
guard; once that perimeter is breached, an attacker can access or steal any data in the
Hadoop cluster. Similarly, an Access Control List (ACL) will not restrict user access if a
data block has never been deemed sensitive. In the Hadoop data model, that data block may be
unstructured text or a stream of data that must be analyzed at the content level in order
to be classified as sensitive.
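A minimal sketch of that content-level classification step might look like the following. The regex patterns and labels here are assumptions chosen for illustration; real discovery tools use far richer detection than two patterns:

```python
import re

# Assumed, illustrative patterns for two kinds of sensitive content.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def classify_block(text):
    """Return the set of sensitive-data types detected in a raw,
    unstructured text block by inspecting its content."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}

block = "free-text notes: card 4111 1111 1111 1111 on file"
labels = classify_block(block)
print(sorted(labels))
# prints: ['credit_card']
```

Only once a block has been labeled this way can an access control decision meaningfully restrict who may read it.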
35 Cyber Warnings E-Magazine – June 2015 Edition
Copyright © Cyber Defense Magazine, All rights reserved worldwide