Posts Tagged ‘data protection’

EMC Isilon InsightIQ 3.1 is now available!

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

The latest release of EMC® Isilon® InsightIQ includes new and enhanced reports that help you become a rock star at managing space on your cluster.

New file system reports

The following new reports are available to help you manage cluster capacity, deduplication, and quotas in OneFS. For useful tips about these reports, refer to the InsightIQ 3.1 User Guide.

Usable capacity reporting
Do you often wonder how much free space is on your cluster when accounting for the space that is being used to protect your data? The usable capacity report is an excellent resource that helps you prevent your cluster from reaching capacity. The report anticipates how much protection overhead you might need in addition to capacity that is already reserved for snapshots and virtual hot spares. Essentially, the report breaks down an estimate of how much capacity can be used for storing data and how much capacity can be reserved for protecting your data. Keep in mind that this report only provides estimates.

Usable capacity report in InsightIQ 3.1
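To make the idea behind the estimate concrete, here is a minimal Python sketch of the kind of arithmetic involved. It is only an illustration: the function name, the parameters, and the flat protection overhead ratio are assumptions made for this example, and InsightIQ's actual calculation may differ.

```python
def estimate_usable_capacity(total_tb, used_tb, snapshot_reserve_tb,
                             vhs_reserve_tb, protection_overhead_ratio):
    """Estimate how much new user data still fits on a cluster, in TB.

    All inputs are hypothetical illustration values, not InsightIQ fields.
    used_tb is the raw space already consumed (data plus its protection).
    protection_overhead_ratio is the assumed fraction of newly written
    raw space that goes to protection rather than to user data.
    """
    if not 0 <= protection_overhead_ratio < 1:
        raise ValueError("protection_overhead_ratio must be in [0, 1)")
    # Raw space left once existing data and reserved areas are subtracted.
    raw_free = total_tb - used_tb - snapshot_reserve_tb - vhs_reserve_tb
    raw_free = max(raw_free, 0.0)
    # Only part of the remaining raw space can hold user data.
    usable = raw_free * (1 - protection_overhead_ratio)
    return {
        "raw_free_tb": raw_free,
        "usable_tb": usable,
        "protection_tb": raw_free - usable,
    }


estimate = estimate_usable_capacity(
    total_tb=100.0, used_tb=60.0, snapshot_reserve_tb=5.0,
    vhs_reserve_tb=5.0, protection_overhead_ratio=0.25)
print(estimate)
```

As with the report itself, the result is an estimate: the real overhead depends on the protection settings applied to the data you write.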

Deduplication reporting

Running a deduplication job in OneFS by using the SmartDedupe® software module creates free space on your cluster. In OneFS, you can assess the amount of disk space you’ll save before you start a deduplication job. You can also do this in InsightIQ 3.1. However, InsightIQ also lets you view historical and current information about how much space is saved by deduplication over a specific range of time.

Two sections from the deduplication report in InsightIQ 3.1. Historical deduplication job information is not cumulative.

Quota reporting
The quota report enables you to simplify quota management in OneFS. This report displays information about quotas created through the SmartQuotas® software module. You can view quotas that are assigned to specific directories, the limits defined by those quotas, and the amount of data stored in the directories that those quotas are applied to. This information can help you compare the data usage of a directory to the quota limits over time, and predict when a directory is likely to reach its quota limit.

Two sections from the quota report in InsightIQ 3.1. Click on a directory to view quota limit usage over time. Historical data is generated by quota reports in OneFS.
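If you export usage history, the same kind of prediction can be sketched by hand. The following Python example fits a straight line to a few usage samples and extrapolates to the quota limit. The function name and the sample data are hypothetical, and a linear trend is an assumption; this is not how InsightIQ itself is documented to forecast usage.

```python
def days_until_quota_limit(samples, limit_gb):
    """Extrapolate when usage reaches limit_gb with a least-squares line.

    samples is a list of (day_number, usage_gb) pairs ordered by day.
    Returns the number of days after the last sample, or None if usage
    is flat or shrinking. The names and units are illustrative only.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_day = sum(day for day, _ in samples) / n
    mean_use = sum(use for _, use in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    # Slope of the fitted line: growth in GB per day.
    slope = sum((day - mean_day) * (use - mean_use)
                for day, use in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_use - slope * mean_day
    day_at_limit = (limit_gb - intercept) / slope
    last_day = samples[-1][0]
    return max(day_at_limit - last_day, 0.0)


# Hypothetical weekly usage samples for one directory, in GB.
history = [(0, 100.0), (7, 114.0), (14, 128.0), (21, 142.0)]
remaining = days_until_quota_limit(history, limit_gb=200.0)
print(remaining)
```

A straight-line fit is the simplest possible model. Directories with bursty growth would need more samples or a different model before the forecast is worth acting on.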

Enhanced reporting

InsightIQ 3.1 includes enhancements to cache reporting and to the exporting capabilities of file system analytics reports. For example, you can now view information about L3 cache usage in performance reports and download the data from file system analytics reports to CSV files.

Upgrading or installing InsightIQ 3.1

If you want to install this release, review the InsightIQ 3.1 Installation Guide for requirements and procedures. If you want to upgrade to this release, first explore your upgrade options covered in the Isilon Supportability and Compatibility Guide, and then perform the procedure provided in the InsightIQ 3.1 Installation Guide.

For more information about all the features, fixes, and changes in functionality in this release, refer to the InsightIQ 3.1 Release Notes. For information about using InsightIQ to monitor your cluster, refer to the InsightIQ 3.1 User Guide.

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, contact us at isi.knowledge@emc.com. To provide documentation feedback or request new content, contact isicontent@emc.com.

How to secure a Hadoop data lake with EMC Isilon

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

Apache™ Hadoop®, open-source software for analyzing huge amounts of data, is a powerful tool for companies that want to analyze information for valuable insights.

Hadoop redefines how data is stored and processed. A key advantage of Hadoop is that it enables analytics on any type of data. Some organizations are beginning to build data lakes—essentially large repositories for unstructured data—on the Hadoop Distributed File System (HDFS) so they can easily store data collected from a variety of sources, and then run compute jobs on data in its original file format. There’s no need to load data into the HDFS for analysis, saving data scientists time and money. They can then survey their Hadoop data lake and discover big data intelligence to drive their business.

However, the Hadoop data lake also presents challenges for organizations that want to protect sensitive information stored in these data repositories. For example, organizations might need to follow internal enterprise security policies or external compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act (SOX). A Hadoop data lake is difficult to secure because HDFS was neither designed nor intended to be an enterprise-class file system. It is a complex, distributed file system of many client computers with a dual purpose: data storage and computational analysis. HDFS has many nodes, each of which presents a point of access to the entire system. Layers of security can be added to a Hadoop data lake, but managing each layer adds to complexity and overhead.

Best of both worlds

The EMC® Isilon® scale-out data lake offers the best of both worlds for organizations using Hadoop: enterprise-level security and easy implementation of Hadoop for data analytics.

The new white paper, Security and Compliance for Scale-Out Hadoop Data Lakes, describes how Hadoop data is stored on Isilon scale-out network-attached storage (NAS), and how the OneFS® operating system helps to secure that data.

An Isilon cluster separates data from compute clients: the Isilon cluster becomes the HDFS file system. All data is stored on the Isilon cluster and secured by using access control lists, access zones, self-encrypting drives, and other security features. OneFS implements the server-side operations of HDFS as a native protocol. Therefore, Hadoop clients access data on the cluster through HDFS and standard protocols such as SMB and NFS.

For more information about how Hadoop is implemented on an Isilon cluster, see EMC Isilon Scale-Out NAS for In-Place Hadoop Data Analytics.

Isilon security capabilities

OneFS can facilitate your efforts to comply with regulations such as HIPAA, SOX, SEC 17a-4, the Federal Information Security Management Act (FISMA), and the Payment Card Industry Data Security Standard (PCI DSS). The table below summarizes some of the challenges of securing a Hadoop data lake, and how the capabilities of an Isilon cluster can help to address these issues. For full descriptions of these capabilities, see Security and Compliance for Scale-Out Hadoop Data Lakes.

Hadoop data lakes: security challenges and Isilon capabilities

Security challenge: A Hadoop data lake can contain sensitive data: intellectual property, confidential customer information, and company records. Any client connected to the data lake can access or alter this sensitive data.
Isilon capabilities:
  • Compliance mode and write-once, read-many (WORM) storage
  • Auditing
Description: The SEC 17a-4 regulation requires that data is protected from malicious, accidental, or premature alteration. Isilon SmartLock™ is a OneFS feature that locks down directories through WORM storage. Use compliance mode only for scenarios where you need to comply with SEC 17a-4 regulations. In addition, auditing can help detect fraud, unauthorized access attempts, or other threats to security.

Security challenge: ACL policies help to ensure compliance. However, clients may be connecting to the Hadoop cluster by using different protocols, such as NFS or HTTP.
Isilon capabilities:
  • Authentication and cross-protocol permissions
Description: OneFS authenticates users and groups connecting to the cluster through different protocols by using POSIX mode bits, NTFS, and ACL policies. By managing ACL policies in OneFS, you can address compliance requirements for environments that mix NFS, SMB, and HDFS.

Security challenge: Applying restricted access to directories and files in HDFS requires adding layers to your file system.
Isilon capabilities:
  • Role-based access control for system administration (RBAC)
  • Identity management
  • User mapping
  • Access zones
Description: The PCI DSS Requirement 7.1.2 specifies that access must be restricted to privileged user IDs. RBAC, a OneFS feature, lets you manage administrative access by role, and assign privileges to a role. You can associate one user with one ID through identity management and user mapping, and then assign that ID to a role. In OneFS, access zones are a virtual security context in which OneFS connects to directory services, authenticates users, and controls access to a segment of the file system.

Security challenge: FISMA, HIPAA, and other compliance regulations might require protection for data at rest.
Isilon capabilities:
  • Encryption of data at rest
Description: Isilon self-encrypting drives are FIPS 140-2 Level 3 validated. The drives automatically apply AES-256 encryption to all data stored in the drives without requiring additional equipment. You can enable a WORM state on directories for data at rest.

To learn how to implement Hadoop on your Isilon cluster, see 7 best practices for setting up Hadoop on an EMC Isilon cluster.


EMC Isilon SnapshotIQ: An overview and best practices

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

If you want to capture a moment in time with a camera, you snap a picture. When you want to capture the data on your cluster at a moment in time with the EMC® Isilon® OneFS® operating system, you take a snapshot.

EMC Isilon SnapshotIQ™ is a licensed software module that lets you create new snapshots and manage snapshot schedules. In this blog post, you’ll learn about SnapshotIQ basics and best practices.

SnapshotIQ overview

A snapshot is taken at the directory level. The snapshot maintains an image of the data that existed in that directory at the moment the snapshot was created, even if the data later changes. Taking a snapshot is an instantaneous operation. Rather than create a redundant copy of the data blocks, snapshots use pointers to reference current blocks on the cluster. Because of this, snapshots do not consume additional disk space unless the data referenced by the snapshot is modified. If the files are modified, the snapshot stores read-only copies of the original blocks.

Image provided by Patrick Kreuch
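The pointer-based behavior described above can be sketched with a toy model in Python. Everything here (the Directory and Snapshot classes, the block IDs, the method names) is hypothetical and exists only for illustration; it is not how OneFS lays out snapshots on disk.

```python
class Snapshot:
    """Point-in-time view that references live blocks until they change."""

    def __init__(self, live_blocks):
        self._live = live_blocks             # a reference, not a copy
        self._block_ids = set(live_blocks)   # blocks that existed at creation
        self._preserved = {}                 # originals saved on first change

    def preserve(self, block_id, original):
        # Keep only the first original; later overwrites do not matter.
        if block_id in self._block_ids and block_id not in self._preserved:
            self._preserved[block_id] = original

    def read(self, block_id):
        if block_id not in self._block_ids:
            raise KeyError(block_id)
        if block_id in self._preserved:
            return self._preserved[block_id]
        return self._live[block_id]

    def extra_blocks(self):
        """Number of blocks this snapshot stores on its own."""
        return len(self._preserved)


class Directory:
    """Toy directory that maps block IDs to data."""

    def __init__(self):
        self.blocks = {}
        self.snapshots = []

    def take_snapshot(self):
        snap = Snapshot(self.blocks)
        self.snapshots.append(snap)
        return snap

    def write(self, block_id, data):
        # Before overwriting, hand the original block to existing snapshots.
        if block_id in self.blocks:
            for snap in self.snapshots:
                snap.preserve(block_id, self.blocks[block_id])
        self.blocks[block_id] = data


directory = Directory()
directory.write("a", b"v1")
directory.write("b", b"v1")
snap = directory.take_snapshot()
space_before_change = snap.extra_blocks()   # nothing copied yet
directory.write("a", b"v2")                 # modify data after the snapshot
print(snap.read("a"), directory.blocks["a"], snap.extra_blocks())
```

In this model the snapshot costs nothing when it is taken, and it starts to hold its own blocks only when live data is overwritten, which mirrors the space behavior described in the paragraph above.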

Snapshots are the foundation for data protection strategies in OneFS. Snapshots are also used by the EMC Isilon SyncIQ™ software module to replicate a consistent point-in-time image of a directory from one cluster to another.

Watch the following video, “Data Protection and Disaster Recovery with Isilon SnapshotIQ,” to learn how to manage snapshots with the SnapshotIQ module. EMC Isilon Senior Solutions Architect, Chris Klosterman, answers the following frequently asked questions:

  • What is a snapshot and why do I need it?
  • How does SnapshotIQ work?
  • Where does OneFS store snapshots?
  • What are some example schedules?
  • How is data restored?
  • When do OneFS snapshots expire and how is the snapshot space reclaimed?
  • Can I modify data in a snapshot?

SnapshotIQ best practices

Managing a large number of snapshots can become challenging. Consider the following best practices to improve snapshot management and avoid cluster performance degradation.

  • Do not create more than 1,000 snapshots of a single directory.
  • Consider the depth of the directory path when creating snapshots. If the path is too high on the directory tree, it will cost more cluster resources to modify data referenced by the snapshot. If the path is too deep, you may need to create more snapshot schedules, which can be difficult to manage.
  • Create an alias name for your snapshot schedules in the OneFS web administration interface. Use the alias name to help you look up the most recent snapshot generated from a schedule.
  • Do not disable the snapshot delete job in the OneFS Job Engine.

For additional best practices and details about SnapshotIQ, see the “Snapshots” section in the OneFS Administration Guide.

If you have questions or feedback about this blog or these videos, email isi.knowledge@emc.com. To provide documentation feedback or request new content, email isicontent@emc.com.

EMC Isilon SmartLock: An overview and demonstration

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

EMC Isilon supports multiple data protection strategies for your cluster environment. Last month, we featured a video about EMC Isilon SyncIQ™, which uses snapshot technology to replicate changed blocks of data. If you want to apply Write Once Read Many (WORM) data protection to files on your cluster, consider EMC Isilon SmartLock™. SmartLock is Isilon’s licensed software-based approach to WORM data protection and retention.

We have two videos to help you become more familiar with SmartLock. The first video, “Data Protection with EMC Isilon SmartLock,” provides an overview of SmartLock features and functionality. The second video, “Technical Demo: EMC Isilon SmartLock,” demonstrates several procedures in the OneFS command-line interface, such as creating SmartLock directories and setting a default retention period.

SmartLock overview

When you license SmartLock for your cluster, you can have a mix of files with normal protection and files with WORM (or SmartLock) protection. With SmartLock, you designate SmartLock directories and select files in those directories to commit to SmartLock status. During the commit process, files are assigned a retention period. During this retention period, files cannot be modified or deleted.
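The commit-and-retain behavior can be pictured with a small Python sketch. The WormFile class, its methods, and the RetentionError exception are hypothetical names invented for this example and are not part of OneFS or SmartLock. The sketch models only what happens during the retention period; behavior after the period ends is simplified.

```python
from datetime import datetime, timedelta


class RetentionError(Exception):
    """Raised when a committed file is changed during its retention period."""


class WormFile:
    """Toy file that can be committed to a WORM state (illustration only)."""

    def __init__(self, data):
        self.data = data
        self.retain_until = None   # None means the file is not committed
        self.deleted = False

    def commit(self, now, retention):
        # Committing assigns the retention period.
        self.retain_until = now + retention

    def _check_allowed(self, now):
        if self.retain_until is not None and now < self.retain_until:
            raise RetentionError(f"file is retained until {self.retain_until}")

    def modify(self, data, now):
        self._check_allowed(now)
        self.data = data

    def delete(self, now):
        self._check_allowed(now)
        self.deleted = True


committed_at = datetime(2015, 1, 1)
record = WormFile(b"quarterly report")
record.commit(now=committed_at, retention=timedelta(days=365))

try:
    record.delete(now=committed_at + timedelta(days=30))
except RetentionError as err:
    print("blocked:", err)
```

Passing the current time in explicitly keeps the example deterministic. A real system would rely on a trusted clock rather than a caller-supplied value.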

SmartLock operates in either Enterprise or Compliance mode. Compliance mode enables you to meet specific regulatory compliance requirements, such as SEC 17a-4 requirements. If you run SmartLock in Compliance mode, the entire cluster operates in Compliance mode and cannot be reverted to Enterprise mode.

In the following video, EMC Isilon Solutions Architect Russ Stevenson covers these concepts in more detail and answers the following frequently asked questions:

  • What is SmartLock?
  • What are the options for setting up and configuring SmartLock?
  • How does the privileged delete function work?
  • How do I commit a file to SmartLock status?
  • What is SmartLock compliance mode?

SmartLock technical demo

If you’re already familiar with SmartLock and want to learn more about some common commands, the following video offers demonstrations of these procedures:

  • Initial configuration
  • Default retention period
  • File commitment and properties
  • Explicit retention period
  • Retention date override
  • Privileged delete

For questions and feedback about this blog or videos, email isi.knowledge@emc.com. To provide documentation feedback or request new content, email isicontent@emc.com.