Posts Tagged ‘compliance’

How to secure a Hadoop data lake with EMC Isilon

Kirsten Gantenbein

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division
Kirsten Gantenbein
Kirsten Gantenbein

Apache™ Hadoop®, open-source software for analyzing huge amounts of data, is a powerful tool for companies that want to analyze information for valuable insights.

Hadoop redefines how data is stored and processed. A key advantage of Hadoop is that it enables analytics on any type of data. Some organizations are beginning to build data lakes—essentially large repositories for unstructured data—on the Hadoop Distributed File System (HDFS) so they can easily store data collected from a variety of sources, and then run compute jobs on data in its original file format. There’s no need to load data into the HDFS for analysis, saving data scientists time and money. They can then survey their Hadoop data lake and discover big data intelligence to drive their business.

However, the Hadoop data lake also presents challenges for organizations that want to protect sensitive information stored in these data repositories. For example, organizations might need to follow internal enterprise security policies or external compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act (SOX). A Hadoop data lake is difficult to secure because HDFS was neither designed nor intended to be an enterprise-class file system. It is a complex, distributed file system of many client computers with a dual purpose: data storage and computational analysis. HDFS has many nodes, each of which presents a point of access to the entire system. Layers of security can be added to a Hadoop data lake, but managing each layer adds to complexity and overhead.

Best of both worlds

The EMC® Isilon® scale-out data lake offers the best of both worlds for organizations using Hadoop: enterprise-level security and easy implementation of Hadoop for data analytics.securing a hadoop data lake

The new white paper, Security and Compliance for Scale-Out Hadoop Data Lakes, describes how Hadoop data is stored on Isilon scale-out network-attached storage (NAS), and how the OneFS® operating system helps to secure that data.

An Isilon cluster separates data from compute clients in which the Isilon cluster becomes the HDFS file system. All data is stored on an Isilon cluster and secured by using access control lists, access zones, self-encrypting drives, and other security features. OneFS implements the server-side operations of HDFS as a native protocol. Therefore, Hadoop clients access data on the cluster through HDFS and standard protocols such as SMB and NFS.

For more information about how Hadoop is implemented on an Isilon cluster, see EMC Isilon Scale-Out NAS for In-Place Hadoop Data Analytics.

Isilon security capabilities

OneFS can facilitate your efforts to comply with regulations such as HIPAA, SOC, SEC 17a-4, the Federal Information Security Management Act (FISMA), and the Payment Card Industry Data Security Standard (PCI DSS). The table below summarizes some of the challenges of securing a Hadoop data lake, and how the capabilities of an Isilon cluster can help to address these issues. For full descriptions of these capabilities, see Security and Compliance for Scale-Out Hadoop Data Lakes.

 Hadoop data lakes: security challenges and Isilon capabilities

Security challenges Isilon capabilities Description
A Hadoop data lake can contain sensitive data—intellectual property, confidential customer information, and company records. Any client connected to the data lake can access or alter this sensitive data.
  • Compliance mode and write-once, read-many (WORM) storage
  • Auditing
The SEC 17a-4 regulation requires that data is protected from malicious, accidental, or premature alteration. Isilon SmartLock™ is a OneFS feature that locks down directories through WORM storage. Use compliance mode only for scenarios where you need to comply with SEC 17a-4 regulations. In addition, auditing can help detect fraud, unauthorized access attempts, or other threats to security.
ACL policies help to ensure compliance. However, clients may be connecting to the Hadoop cluster by using different protocols, such as NFS or HTTP.
  • Authentication and cross-protocol permissions
OneFS authenticates users and groups connecting to the cluster through different protocols by using POSIX mode bits, NTFS, and ACL policies. By managing ACL policies in OneFS, you can address compliance requirements for environments that mix NFS, SMB, and HDFS.
Applying restricted access to directories and files in HDFS requires adding layers to your file system.
  • Role-based access control for system administration (RBAC)
  • Identity management
  • User mapping
  • Access zones
The PCI DSS Requirement 7.1.2 specifies that access must be restricted to privileged user IDs. RBAC, a OneFS feature, lets you manage administrative access by role, and assign privileges to a role. You can associate one user with one ID through identity management and user mapping, and then assign that ID to a role. In OneFS, access zones are a virtual security context in which OneFS connects to directory services, authenticates users, and controls access to a segment of the file system.
FISMA and HIPAA and other compliance regulations might require protection for data at rest. Encryption of data at rest Isilon self-encrypting drives are FIPS 140-2 Level 3 validated. The drives automatically apply AES-256 encryption to all data stored in the drives without requiring additional equipment. You can enable a WORM state on directories for data at rest.

To learn how to implement Hadoop on your Isilon cluster, see 7 best practices for setting up Hadoop on an EMC Isilon cluster.

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, contact isi.knowledge@emc.com. To provide documentation feedback or request new content, contact isicontent@emc.com.

 

[display_rating_result]

EMC Isilon SmartLock: An overview and demonstration

Kirsten Gantenbein

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division
Kirsten Gantenbein
Kirsten Gantenbein

EMC Isilon supports multiple data protection strategies for your cluster environment. Last month, we featured a video about EMC Isilon SyncIQ™, which uses snapshot technology to replicate changed blocks of data. If you want to apply Write Once Read Many (WORM) data protection to files on your cluster, consider EMC Isilon SmartLock™. SmartLock is Isilon’s licensed software-based approach to WORM data protection and retention.

We have two videos to help you become more familiar with SmartLock. The first video, “Data Protection with EMC Isilon SmartLock,” provides an overview about SmartLock features and functionality. The second video, “Technical Demo: EMC Isilon SmartLock,” provides a demonstration of several procedures in the OneFS command-line interface, such as creating SmartLock directories and setting a default retention period.

SmartLock overview

When you license SmartLock for your cluster, you can have a mix of files with normal protection and files with WORM (or SmartLock) protection. With SmartLock, you designate SmartLock directories and select files in those directories to commit to SmartLock status. During the commit process, files are assigned a retention period: a period. During this retention period, files cannot be modified or deleted.

SmartLock operates in either Enterprise or Compliance mode. Compliance mode enables you to meet specific regulatory compliance requirements, such as SEC 17a-4 requirements. If you run SmartLock in Compliance mode, the entire cluster operates in Compliance mode and cannot be reverted to Enterprise mode.

In the following video, EMC Isilon Solutions Architect Russ Stevenson covers these concepts in more detail and answers the following frequently asked questions:

  • What is SmartLock?
  • What are the options for setting up and configuring SmartLock?
  • How does the privileged delete function work?
  • How do I commit a file to SmartLock status?
  • What is SmartLock compliance mode?

SmartLock technical demo

If you’re already familiar with SmartLock and want to learn more about some common commands, the following video offers demonstrations of these procedures:

  • Initial configuration
  • Default retention period
  • File commitment and properties
  • Explicit retention period
  • Retention date override
  • Privileged delete

For questions and feedback about this blog or videos, email isi.knowledge@emc.com. To provide documentation feedback or request new content, email isicontent@emc.com.