Archive for March, 2014

Pick your protocol: Multiprotocol file access in EMC Isilon OneFS

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

You’re rushing to meet a project deadline, and you need to update some related files that are stored on an EMC® Isilon® cluster. You’re working on a Linux computer, and you’re connected to the cluster over the Network File System (NFS) protocol. You need to access files in a directory that your coworker, who uses a Windows computer, created while connected to the same cluster over the Server Message Block (SMB) protocol. Thanks to the Isilon OneFS® operating system, you can seamlessly access your coworker’s files even though you are doing so through a very different protocol.

Multiple protocol support is a necessity in today’s IT organizations, which comprise a mix of Windows and UNIX/Linux operating environments. OneFS is designed to provide users with unified access to data on an Isilon cluster using a mix of common protocols, such as SMB, NFS, HTTP, and Hadoop Distributed File System (HDFS). For a full list of supported protocols, see the OneFS administration guides or “EMC Isilon Multiprotocol Data Access with a Unified Security Model”.

So how does OneFS support a multiprotocol environment? What are the steps a system administrator needs to take to set up multiprotocol access in OneFS?

We have two videos that cover the basics and provide recommendations for setting up multiprotocol access in OneFS. The first video, “File Access Basics in an Isilon OneFS Multi-Protocol Environments,” provides a whiteboard overview of this topic. The second video, “Technical Demo: Multi-Protocol File Access Using EMC Isilon OneFS,” provides a demonstration of common multiprotocol commands and tasks.

File access basics and AIMA in OneFS

Supporting a mix of protocols requires supporting a mix of user identities and file permissions. This requirement can leave system administrators with several considerations when configuring OneFS.

Before discussing how OneFS handles multiprotocol file access, let’s first review how two operating environments, Windows and UNIX/Linux, authorize access to files. In a Windows environment, users are identified by unique security identifiers (SIDs). Files and directories are secured through an Access Control List (ACL). In a UNIX environment, users and groups are identified through user identifiers (UIDs) and group identifiers (GIDs), respectively. Files are secured using POSIX mode bits.
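
To make the UNIX side concrete, here is a minimal Python sketch of how POSIX mode bits encode permissions and how a simplified read check might walk the owner, group, and other bits. The function names and the sample UID and GID values are illustrative assumptions, and the check deliberately ignores details such as the root user and ACLs.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string for a POSIX mode value."""
    return stat.filemode(mode)


def can_read(mode, file_uid, file_gid, uid, gids):
    """Simplified POSIX read check: owner bits first, then group, then other."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# A regular file with mode 0640: owner read/write, group read, nothing for others.
example_mode = stat.S_IFREG | 0o640
print(describe_mode(example_mode))  # -rw-r-----

# Hypothetical identities: the file is owned by UID 1000 and GID 100.
print(can_read(example_mode, 1000, 100, uid=2000, gids={100}))  # True (group member)
print(can_read(example_mode, 1000, 100, uid=2000, gids={200}))  # False (other)
```

A Windows ACL, by contrast, is a list of entries that each grant or deny specific rights to a specific SID, so it can express rules that three sets of read, write, and execute bits cannot.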

OneFS uses Authentication, Identity Management, and Authorization (AIMA) to assign the right permissions and identifiers to users (and groups) no matter which protocols they use to connect to the cluster. To securely support NFS and SMB clients, OneFS does three things:

  • Connects to directory services, such as Microsoft Active Directory (AD) and Lightweight Directory Access Protocol (LDAP), which provide a security database of user and group accounts along with their information
  • Authenticates users and groups
  • Controls access to directories and files

When a user connects to an Isilon cluster, OneFS scans Active Directory and LDAP for the user’s identifiers. Once the user is authenticated, OneFS creates an access token for the user. OneFS then maps the user’s account (known as “user mapping” in OneFS) in one directory service to another. This single access token is the key to authorizing the user so they can access files that are stored and created on the cluster using different protocols.

For example, if a user, Mike, accesses a file share through SMB, OneFS will scan Active Directory and find an SID for him. If OneFS does not find any UIDs or GIDs associated with Mike via LDAP, OneFS will generate a UID and GID for him and save them to Mike’s access token, so he can access files created by NFS users.
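
The Mike example can be sketched in a few lines of Python. This is a toy model, not OneFS code: the `AccessToken` class, the `build_token` function, the dictionaries standing in for Active Directory and LDAP, the sample SID, and the starting value for generated IDs are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class AccessToken:
    """Toy stand-in for an access token that carries both identity types."""
    username: str
    sid: str         # Windows-style security identifier
    uid: int         # UNIX-style user identifier
    gid: int         # UNIX-style primary group identifier
    generated: bool  # True if the UID and GID were generated, not looked up


def build_token(username, ad_sids, ldap_ids, id_source):
    """Build a token from a SID lookup plus a UID/GID lookup or generated IDs.

    ad_sids maps user names to SIDs, ldap_ids maps user names to (uid, gid)
    pairs, and id_source is an iterator that yields unused numeric IDs.
    """
    sid = ad_sids[username]
    found = ldap_ids.get(username)
    if found is not None:
        uid, gid = found
        return AccessToken(username, sid, uid, gid, generated=False)
    # No UNIX identity was found, so generate a UID and a GID for the user.
    uid = next(id_source)
    gid = next(id_source)
    return AccessToken(username, sid, uid, gid, generated=True)


# Hypothetical data: Mike exists in Active Directory but not in LDAP.
id_source = count(1_000_000)  # assumed starting point for generated IDs
token = build_token("mike", {"mike": "S-1-5-21-1-2-3-1001"}, {}, id_source)
print(token.uid, token.gid, token.generated)  # 1000000 1000001 True
```

Because the token carries the SID, the UID, and the GID together, the same token can be checked against either an ACL or POSIX mode bits.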

The same type of mapping occurs for file permissions. If a file was created through SMB, it will be assigned an ACL to control who can access the file. OneFS will create equivalent POSIX mode bits for this file. File permissions can be saved to the Isilon cluster on disk in one of three modes: native, UNIX, or SID. For more information about each mode, and about AIMA and user mapping, read the “Identities, Access Tokens, and the Isilon OneFS User Mapping Service” white paper.
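
As a rough illustration of deriving mode bits from an ACL, the Python sketch below collapses a very simplified list of allow entries into an approximate mode value. It is not the algorithm OneFS uses. The trustee names, the right names, and the `approximate_mode` function are assumptions made for this example, and real ACLs (with deny entries, inheritance, and many trustees) carry more information than mode bits can express.

```python
# Permission bit values for one rwx triplet.
RIGHT_BITS = {"read": 4, "write": 2, "execute": 1}

# Bit offsets of the owner, group, and other triplets within a mode value.
TRUSTEE_SHIFT = {"owner": 6, "group": 3, "everyone": 0}


def approximate_mode(allow_entries):
    """Collapse (trustee, rights) allow entries into approximate mode bits.

    Entries for trustees other than owner, group, or everyone are skipped,
    because mode bits have no slot for additional named users or groups.
    """
    mode = 0
    for trustee, rights in allow_entries:
        shift = TRUSTEE_SHIFT.get(trustee)
        if shift is None:
            continue
        for right in rights:
            mode |= RIGHT_BITS.get(right, 0) << shift
    return mode


acl = [
    ("owner", {"read", "write"}),
    ("group", {"read"}),
    ("alice", {"read", "write"}),  # named user: lost in the approximation
]
print(oct(approximate_mode(acl)))  # 0o640
```

The entry for the named user disappears in the result, which shows why an ACL and its mode-bit equivalent are not always interchangeable.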

This is a brief summary of how multiprotocol file access works in OneFS. Watch the following video, “File Access Basics in an Isilon OneFS Multi-Protocol Environments,” for more information and recommendations for configuring multiprotocol access in OneFS. In this video, Principal Solutions Architect Amol Choukekar answers the following frequently asked questions:

  • What are multiprotocol basics?
  • How do Windows and UNIX clients differ when they access files on OneFS?
  • How does OneFS handle user and group identities?
  • How does OneFS store file permissions in a multiprotocol environment?
  • How do clients access files that were created using a different protocol?
  • How does OneFS manage file permissions?
  • What if user names are not similar across authentication providers?

How to configure multiprotocol support in OneFS

You can manage user identity mapping and file permissions using the OneFS command-line interface and OneFS web administration interface. Watch the following video, “Technical Demo: Multi-Protocol File Access Using EMC Isilon OneFS” for demonstrations of the following tasks:

  • Review configured authentication providers
  • Review an access token for a user
  • Review existing identity mappings stored on the cluster
  • Delete existing identity mappings
  • Review ACL policies on the cluster
  • Create a user mapping rule for joining different user names

This video also offers the following demonstrations:

  • File access between Windows and UNIX
  • Creation of a synthetic ACL, which dynamically maps UNIX permissions to Windows rights
  • File permissions management

For more information about implementing multiprotocol in OneFS, contact your account representative. If you have feedback about this blog or these videos, send an email to isi-knowledge@emc.com. If you have a request for new documentation, send an email to isicontent@emc.com.

7 best practices for setting up Hadoop on an EMC Isilon cluster

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

If you’re considering adding an Apache™ Hadoop® workflow to your EMC® Isilon® cluster, you’re probably wondering how to set it up. The new white paper “EMC Isilon Best Practices for Hadoop Data Storage” provides useful information for deploying Hadoop in your Isilon cluster environment.

The white paper also introduces the unique approach that Isilon took to Hadoop deployments. In a typical Hadoop deployment, large unstructured data sets are ingested from storage repositories to a Hadoop cluster based on the Hadoop distributed file system (HDFS). Data is mapped to the Hadoop DataNodes of the cluster and a single NameNode controls the metadata. The MapReduce software framework manages jobs for data analysis. MapReduce and HDFS use the same hardware resources for both data analysis and storage. Analysis results are then stored in HDFS or exported to other infrastructures.

Traditional Hadoop Deployment

In an EMC Isilon Hadoop deployment, the HDFS is integrated as a protocol into the Isilon distributed OneFS® operating system. This approach gives users direct access through the HDFS to data stored on the Isilon cluster using standard protocols such as SMB, NFS, HTTP, and FTP. MapReduce processing and data storage are separated, allowing you to independently scale compute and data storage resources as needed.

EMC Isilon Hadoop Deployment

Every node in the Isilon cluster acts as both a NameNode and a DataNode. Compute clients running MapReduce jobs can connect to any node in the cluster. Hadoop users can access data analysis results through standard protocols without needing to export them.

To learn more about the benefits of Hadoop on Isilon scale-out network attached storage (NAS), read “Hadoop on EMC Isilon Scale-Out NAS” and “EMC Isilon Scale-Out NAS for In-Place Hadoop Data Analytics.”

Best practices for deploying Hadoop to your Isilon cluster

You can connect Apache Hadoop or an enterprise-friendly Hadoop distribution, such as Pivotal HD or Cloudera, to your Isilon cluster.

First, you’ll need to turn on the HDFS protocol in OneFS. Contact your account representative to complete this step. Next, follow these best practices:

  1. Review the EMC Hadoop Starter Kit (HSK) 2.0. The HSK provides step-by-step guides on how to connect a Hadoop distribution to your Isilon cluster. HSK guides are available for Apache Hadoop, Pivotal HD, Cloudera, and Hortonworks. A video demonstration for Pivotal HD is also available.
  2. Find your Isilon cluster’s optimal point to help determine the number of nodes that will best serve your Hadoop workflow and compute grid. The optimal point is the point at which the cluster scales in processing MapReduce jobs and reduces run times relative to other systems running the same workload. Contact your account representative to help you determine this information.
  3. Create directories and set permissions. OneFS controls access to directories and files with POSIX mode bits and access control lists (ACLs). Make sure directories and files are set up with the correct permissions to ensure that your Hadoop users can access their files.
  4. Don’t run NameNode and DataNode services on clients. Because the Isilon cluster acts as the NameNode and DataNodes for the HDFS, these services should only run on the cluster and not on compute clients. On compute clients, you should only run MapReduce processes.
  5. Increase the HDFS block size from the default 64 MB to 128 MB to optimize performance. Boosting the block size lets Isilon nodes read and write HDFS data in larger blocks. The result is an increase in performance of MapReduce jobs.
  6. Store intermediate job results on an Isilon cluster. A Hadoop client typically stores its intermediate map results locally, so the amount of local storage available on a client affects its ability to run jobs. Storing map results on the cluster can help performance and scalability.
  7. Consult the Isilon best practices white paper for additional tips. You can find more details about some of these best practices in “EMC Isilon Best Practices for Hadoop Data Storage.” You can also find additional tips for tuning OneFS for HDFS operations, using EMC Isilon SmartConnect™ for HDFS, aligning datasets with storage pools, and securing HDFS connections with Kerberos.
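
The arithmetic behind practice 5 is simple: doubling the block size halves the number of blocks a file is divided into. The Python sketch below shows this for a hypothetical 1 TiB file. The file size and the `block_count` helper are assumptions for illustration, and the actual performance gain depends on your workload.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4


def block_count(file_size_bytes, block_size_bytes):
    """Return how many blocks are needed to hold a file (ceiling division)."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    return -(-file_size_bytes // block_size_bytes)


for block_mib in (64, 128):
    print(block_mib, "MB blocks:", block_count(TIB, block_mib * MIB))
# 64 MB blocks: 16384
# 128 MB blocks: 8192
```

Because MapReduce input splits commonly follow block boundaries, fewer blocks usually means fewer map tasks to schedule and fewer, larger reads and writes per job.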

If you have questions related to Hadoop and your Isilon environment, contact your account representative. If you have documentation feedback or want to request new content, email isicontent@emc.com.

Top 20 EMC Isilon support documents in February 2014

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

One of the goals of this blog is to share the most useful EMC Isilon support-related content that we have to offer. In this post, we’re highlighting 20 of the most viewed knowledgebase (KB) articles and product documents from the month of February.

We hope these documents will help you to quickly find an answer to a common question or resolve an issue.

Top 10 KB articles

To access these KB articles, log in to the EMC Online Support site. Articles in bold are new to the top 10 list this month.

  1. How to download OneFS 7.1.0.1 (172492)
  2. Best practices for NFS client settings (90041)
  3. OneFS 7.0.2 SMB Rollup Patch (172623)
  4. How to create a bootable image of OneFS on a USB flash drive (16691)
  5. How to reset a node to factory defaults (16696)
  6. Patches available for Isilon OneFS (88358)
  7. How to connect to the management port of a node (16744)
  8. OneFS 6.5.5 SMB Rollup Patch (172742)
  9. How to reimage a node using a USB flash drive (16582)
  10. Troubleshooting performance issues (88844)

Top 10 product documents

To access these PDF documents, log in to the EMC Online Support site. Documents in bold are new to the top 10 list this month.

  1. OneFS 7.1 CLI Administration Guide
  2. Isilon Supportability and Compatibility Guide
  3. Current Isilon Software Releases
  4. OneFS 7.1.0 MR Release Notes
  5. OneFS 7.0.2.5 Release Notes
  6. OneFS 7.0.2 Administration Guide
  7. OneFS 7.1 Web Administration Guide
  8. Current Patches for Isilon OneFS 7.0
  9. OneFS 7.0.1 Administration Guide
  10. OneFS 7.0.2 Command Reference

If you have questions or feedback about this blog, send an email to isi.knowledge@emc.com. To provide documentation feedback or request new content, send an email to isicontent@emc.com.

How to keep your EMC Isilon cluster from reaching capacity

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

It’s important to maintain enough free space on your EMC® Isilon® cluster to ensure that data is protected and workflows are not disrupted. At a minimum, keep one node’s worth of free space available in case you need to protect data on a failing drive.

When your Isilon cluster fills up to more than 90% capacity, cluster performance is affected. Several issues can occur when your cluster fills up to 98% capacity, such as substantially slower performance, failed file operations, the inability to write or delete data, and the potential for data loss. It might take several days to resolve these issues. If you have a full cluster, nearly full cluster, or need assistance with maintaining enough free space, contact EMC Isilon Technical Support.
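
The thresholds above can be turned into a simple check. The following Python sketch classifies a cluster’s fill level using the 90% and 98% figures from this post and also tests the one-node reserve guideline. The function name, the status labels, and the sample numbers are assumptions for illustration; how you obtain the used and total values depends on your monitoring setup.

```python
from typing import NamedTuple


class CapacityStatus(NamedTuple):
    level: str              # "ok", "warning", or "critical"
    has_node_reserve: bool  # True if free space covers one node's capacity


def check_capacity(used, total, largest_node):
    """Classify cluster usage; all arguments must use the same unit."""
    if total <= 0 or not 0 <= used <= total:
        raise ValueError("expected 0 <= used <= total and total > 0")
    # Integer comparisons avoid floating-point rounding at the thresholds.
    if used * 100 >= total * 98:
        level = "critical"   # 98% or more: failed operations become likely
    elif used * 100 > total * 90:
        level = "warning"    # more than 90%: performance is affected
    else:
        level = "ok"
    return CapacityStatus(level, (total - used) >= largest_node)


# Hypothetical cluster: four 36 TB nodes (144 TB total) with 131 TB used.
print(check_capacity(131, 144, 36))
# CapacityStatus(level='warning', has_node_reserve=False)
```

In this example the cluster is already past 90% full, and its 13 TB of free space is well short of the 36 TB needed to cover one node.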

Fortunately, there are several best practices you can follow to help prevent your Isilon cluster from becoming too full. These are detailed in the “Best Practices Guide for Maintaining Enough Free Space on Isilon Clusters and Pools” (requires login to the EMC Online Support site). Some of these best practices are summarized in this blog post.

Monitoring cluster capacity

To prevent your cluster from becoming too full, monitor your cluster capacity. There are several ways to do this. For example, you can configure email event notification rules in the EMC Isilon OneFS® operating system to notify you when your cluster is reaching capacity. Watch the video “How to Set Up Email Notifications in OneFS When a Cluster Reaches Capacity” for a demonstration of this procedure.

Another way to monitor cluster capacity is to use EMC Isilon InsightIQ™ software. If you have InsightIQ licensed on your cluster, you can run FSAnalyze jobs in OneFS to create data for InsightIQ’s file system analytics tools. You can then use InsightIQ’s Dashboard and Performance Reporting to monitor cluster capacity. For example, Performance Reports enable you to view information about the activity of the nodes, networks, clients, disks, and more. The Storage Capacity section of a performance report displays the used and total storage capacity for the monitored cluster over time (Figure 1).

Figure 1: The Storage Capacity section of a Performance Report in InsightIQ 3.0.

For more information about InsightIQ Performance Reports, see the InsightIQ User Guides, which can be found on the EMC Online Support site.

To learn about additional ways to monitor cluster capacity, such as using SmartQuotas, read “Best Practices Guide for Maintaining Enough Free Space on Isilon Clusters and Pools.”

More best practices

Follow these additional tips to maintain enough free space on your cluster:

  • Manage your data
    Regularly delete data that is rarely accessed or used.
  • Manage Snapshots
    Snapshots, which are used for data protection in OneFS, continue to consume space after they are no longer needed. Read the best practices guide for recommendations about managing snapshots, or read the blog post “EMC Isilon SnapshotIQ: An overview and best practices.”
  • Make sure all nodes in a node pool or disk pool are compatible
    If you have a node pool that contains a mix of different node capacities, you can receive “cluster full” errors even if only the smallest node in your node pool reaches capacity. To avoid this scenario, ensure that nodes in each node pool or disk pool are of compatible types. Read the best practices guide for information about node compatibility and for a procedure to verify that all nodes in each node pool are compatible.
  • Enable Virtual Hot Spare
    Virtual Hot Spare (VHS) keeps space in reserve in case you need to move data off of a failing drive (smartfail). VHS is enabled by default. For more information about VHS, read the knowledgebase article, “OneFS: How to enable and configure Virtual Hot Spare (VHS) (88964)” (requires login to the EMC Online Support site).
  • Enable Spillover
    Spillover allows data that is being sent to a full pool to be diverted to an alternate pool. If you have licensed EMC Isilon SmartPools™ software, you can designate a spillover location. For more information about SmartPools, read the OneFS Web Administration Guide.
  • Add nodes
    If you want to scale out your storage to add more free space, contact your sales representative.

If you have questions or feedback about this blog or the video described in it, send an email to isi.knowledge@emc.com. To provide documentation feedback or request new content, send an email to isicontent@emc.com.