Posts Tagged ‘white paper’

Isilon Ask the Expert Forum happening now

Risa Galant

Principal Technical Writer at EMC Isilon Storage Division

The Ask the Expert forum on the EMC Isilon Community featuring the Isilon Information Experience team is happening now until August 7.

This is a great opportunity to ask questions, exchange ideas, and share opinions about Isilon technical content with the people who create it, including product documentation, release notes, videos, white papers, and more. Check it out here. See you there!

Ask the Expert forum about EMC Isilon technical content on July 27

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

Do you have an opinion about the technical content that EMC Isilon publishes? The EMC Isilon Information Experience team—who generates documentation, release notes, videos, white papers, and more—wants to hear from you.

Let us know how we’re doing. RSVP for our Ask the Expert event on Isilon Product Community, starting July 27, 2015, and continuing through August 7. During this event, you can submit your questions, opinions, and ideas to a forum discussion thread. The Isilon Information Experience team will post answers in the same thread.

What is the “Ask the Expert” forum?

Ask the Expert (ATE) events are regularly scheduled forums that cover many topics and products. Previous ATE events include Scale-out Data Lakes and SMB Protocol Support. In this special session, content professionals, including our Director of Information Experience, our blogger and social media lead, and several content developers, will answer questions we receive from you.

You can ask us about anything related to our technical content, such as:

  • How can I be notified about the latest Isilon content?
  • How do you decide what content to publish?
  • How do I share my idea for a great paper/blog/article with you?
  • What is an Info Hub and why should I care?

What’s in it for you?

The EMC Isilon Information Experience team will post a summary of our ATE session findings. It will contain a roadmap for when you might expect to see the changes you request, if we can accommodate them, and an honest answer if we cannot.

For years, the global economy has been in transit from goods, to information, to knowledge. In particular, the need for trust grows as customers interact with content more often through more digital platforms and channels. Knowledge is now currency AND product. We recognize that our first contact with you may be through content, and we need to build trust through content.

The best way we can build trust with you is to exchange ideas, and the EMC Isilon Ask the Expert event on technical content is a great way to start the conversation. We hope to talk to you soon!

Visit the RSVP page for more details about this event. If you’re interested in more ATE forums, visit the Isilon Community or ECN event page for upcoming events.

Multitenancy for Hadoop data on an EMC Isilon cluster

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

The process of analyzing big data within big organizations can be complicated. There can be many data sets to analyze, some of which are stored in silos or contain secure information. And there can be many different Hadoop users accessing these data sets, each with different permissions and credentials. So how can organizations effectively manage multiple data sets and Hadoop users?

In EMC® Isilon® OneFS®, you can take advantage of multitenancy to tackle this issue. Multitenancy creates secure, separate namespaces on a shared infrastructure so that different Hadoop users (or tenants) can connect to an Isilon cluster, run Hadoop jobs concurrently, and consolidate their Hadoop workflows onto a single cluster. OneFS 7.2 supports several Hadoop distributions and HDFS 2.2, 2.3, and 2.4. The OneFS HDFS implementation also works with Ambari for management and monitoring, Kerberos authentication, and Kerberos impersonation.

The white paper, “EMC Isilon Multitenancy for Hadoop Big Data Analytics,” highlights how to set up access zones for multitenancy and manage Hadoop data in an Isilon cluster.

How Hadoop works in Isilon

The Apache Hadoop analytics platform comprises the Hadoop Distributed File System, or HDFS, a storage system for vast amounts of data, and MapReduce, a processing paradigm for data-intensive computational analysis.

EMC Isilon serves as the file system for Hadoop clients. This enables Hadoop clients to directly access their datasets on the Isilon storage system and run data analysis jobs on their compute clients. OneFS implements server-side operations of the HDFS protocol on each node in the Isilon cluster to handle calls to the NameNode and to manage read/write requests to DataNodes.
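
As a rough illustration of what pointing a Hadoop client at the cluster can look like, the sketch below generates a minimal core-site.xml whose fs.defaultFS property (a standard Hadoop client setting) names the Isilon cluster as the file system. The host name and port 8020 are assumptions chosen for illustration and do not come from the white paper; check the guide for your Hadoop distribution for the values it expects.

```python
import xml.etree.ElementTree as ET


def build_core_site(cluster_host: str, port: int = 8020) -> str:
    """Return a minimal core-site.xml that points fs.defaultFS at cluster_host.

    The default port is an assumption for illustration only.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_host}:{port}"
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical host name for the Isilon cluster; replace it with your own.
print(build_core_site("isilon.example.com"))
```

With a setting like this in place, the compute clients send their NameNode and DataNode requests to the Isilon cluster instead of to a local HDFS installation.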

EMC Isilon Hadoop Deployment

To configure an Isilon cluster for Hadoop, you first need to activate an HDFS license in OneFS. Contact your account team for more information. Then visit our EMC Hadoop Starter Kits to learn how to deploy multiple Hadoop distributions, such as Pivotal, Cloudera, or Hortonworks, on your Isilon cluster.

Access zones for multitenancy

Access zones lay the foundation for multitenancy in OneFS. Access zones provide a virtual security context that segregates tenants and creates a virtual region that isolates data sets. Each access zone encapsulates a namespace, HDFS directory, directory services, authentication, and auditing. An access zone also isolates system connections for further security.
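
To make the isolation idea concrete, here is a minimal conceptual sketch of an access zone as a record with its own namespace root, HDFS root directory, and directory services. It is a simplified model, not OneFS code: the class, the field names, the zone names, and the paths are all hypothetical, and real access zones enforce far more than a path check.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Tuple


@dataclass(frozen=True)
class AccessZone:
    """Hypothetical model of an access zone (names are illustrative only)."""
    name: str
    base_path: PurePosixPath        # root of the zone's namespace
    hdfs_root: PurePosixPath        # HDFS root directory inside the zone
    auth_providers: Tuple[str, ...]  # directory services used by the zone


def is_inside_zone(zone: AccessZone, path: str) -> bool:
    """Return True if path is the zone's base path or lies beneath it."""
    target = PurePosixPath(path)
    return target == zone.base_path or zone.base_path in target.parents


finance = AccessZone(
    name="finance",
    base_path=PurePosixPath("/ifs/finance"),
    hdfs_root=PurePosixPath("/ifs/finance/hadoop"),
    auth_providers=("ad:finance.example.com",),
)

print(is_inside_zone(finance, "/ifs/finance/hadoop/jobs"))  # True
print(is_inside_zone(finance, "/ifs/research/data"))        # False
```

In this model, a tenant that connects through the finance zone can only resolve paths under that zone's base path, which is the essence of how separate tenants share one cluster without seeing each other's data.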

The following procedures for managing and securing data sets are covered in “EMC Isilon Multitenancy for Hadoop Big Data Analytics.”

  • Provide multiprotocol support – Learn how you can store data by using existing workflows on your Isilon cluster and access it through SMB, NFS, OpenStack Swift, and HDFS protocols, instead of running HDFS copy operations to move data to Hadoop clients.
  • Manage different data sets – Learn how you can use SmartPools for managing different data sets based on customized policies.
  • Associate network resources with access zones – Understand how virtual racking works in Isilon and how you can configure SmartConnect in OneFS to manage connections to data on your Isilon cluster.
  • Secure access zones – Review how role-based access control and directory services with access zones in OneFS are used to authenticate users assigned to each zone.

Hadoop information hubs

You can find a rich array of information about Isilon and Hadoop. Visit our online Isilon Community on the EMC Community Network for InfoHubs, which serve as a single location for all of our Hadoop-related content. The Hadoop InfoHub contains links to general information about Isilon and Hadoop. The Cloudera with Isilon InfoHub contains links to information about deploying the Cloudera distribution for Isilon.

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, contact us at isi.knowledge@emc.com. To provide documentation feedback or request new content, contact isicontent@emc.com.

How to secure a Hadoop data lake with EMC Isilon

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

Apache™ Hadoop®, open-source software for analyzing huge amounts of data, is a powerful tool for companies that want to analyze information for valuable insights.

Hadoop redefines how data is stored and processed. A key advantage of Hadoop is that it enables analytics on any type of data. Some organizations are beginning to build data lakes—essentially large repositories for unstructured data—on the Hadoop Distributed File System (HDFS) so they can easily store data collected from a variety of sources, and then run compute jobs on data in its original file format. There’s no need to load data into the HDFS for analysis, saving data scientists time and money. They can then survey their Hadoop data lake and discover big data intelligence to drive their business.

However, the Hadoop data lake also presents challenges for organizations that want to protect sensitive information stored in these data repositories. For example, organizations might need to follow internal enterprise security policies or external compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act (SOX). A Hadoop data lake is difficult to secure because HDFS was neither designed nor intended to be an enterprise-class file system. It is a complex, distributed file system of many client computers with a dual purpose: data storage and computational analysis. HDFS has many nodes, each of which presents a point of access to the entire system. Layers of security can be added to a Hadoop data lake, but managing each layer adds to complexity and overhead.

Best of both worlds

The EMC® Isilon® scale-out data lake offers the best of both worlds for organizations using Hadoop: enterprise-level security and easy implementation of Hadoop for data analytics.

The new white paper, Security and Compliance for Scale-Out Hadoop Data Lakes, describes how Hadoop data is stored on Isilon scale-out network-attached storage (NAS), and how the OneFS® operating system helps to secure that data.

In an Isilon deployment, data is separated from the compute clients, and the Isilon cluster becomes the HDFS file system. All data is stored on the Isilon cluster and secured by using access control lists, access zones, self-encrypting drives, and other security features. OneFS implements the server-side operations of HDFS as a native protocol. Therefore, Hadoop clients access data on the cluster through HDFS and standard protocols such as SMB and NFS.

For more information about how Hadoop is implemented on an Isilon cluster, see EMC Isilon Scale-Out NAS for In-Place Hadoop Data Analytics.

Isilon security capabilities

OneFS can facilitate your efforts to comply with regulations such as HIPAA, SOC, SEC 17a-4, the Federal Information Security Management Act (FISMA), and the Payment Card Industry Data Security Standard (PCI DSS). The table below summarizes some of the challenges of securing a Hadoop data lake, and how the capabilities of an Isilon cluster can help to address these issues. For full descriptions of these capabilities, see Security and Compliance for Scale-Out Hadoop Data Lakes.

Hadoop data lakes: security challenges and Isilon capabilities

Security challenge: A Hadoop data lake can contain sensitive data: intellectual property, confidential customer information, and company records. Any client connected to the data lake can access or alter this sensitive data.
Isilon capabilities:
  • Compliance mode and write-once, read-many (WORM) storage
  • Auditing
Description: The SEC 17a-4 regulation requires that data is protected from malicious, accidental, or premature alteration. Isilon SmartLock™ is a OneFS feature that locks down directories through WORM storage. Use compliance mode only for scenarios where you need to comply with SEC 17a-4 regulations. In addition, auditing can help detect fraud, unauthorized access attempts, or other threats to security.

Security challenge: ACL policies help to ensure compliance. However, clients may be connecting to the Hadoop cluster by using different protocols, such as NFS or HTTP.
Isilon capabilities:
  • Authentication and cross-protocol permissions
Description: OneFS authenticates users and groups connecting to the cluster through different protocols by using POSIX mode bits, NTFS, and ACL policies. By managing ACL policies in OneFS, you can address compliance requirements for environments that mix NFS, SMB, and HDFS.

Security challenge: Applying restricted access to directories and files in HDFS requires adding layers to your file system.
Isilon capabilities:
  • Role-based access control for system administration (RBAC)
  • Identity management
  • User mapping
  • Access zones
Description: The PCI DSS Requirement 7.1.2 specifies that access must be restricted to privileged user IDs. RBAC, a OneFS feature, lets you manage administrative access by role, and assign privileges to a role. You can associate one user with one ID through identity management and user mapping, and then assign that ID to a role. In OneFS, access zones are a virtual security context in which OneFS connects to directory services, authenticates users, and controls access to a segment of the file system.

Security challenge: FISMA and HIPAA and other compliance regulations might require protection for data at rest.
Isilon capabilities:
  • Encryption of data at rest
Description: Isilon self-encrypting drives are FIPS 140-2 Level 3 validated. The drives automatically apply AES-256 encryption to all data stored in the drives without requiring additional equipment. You can enable a WORM state on directories for data at rest.

To learn how to implement Hadoop on your Isilon cluster, see 7 best practices for setting up Hadoop on an EMC Isilon cluster.

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, contact isi.knowledge@emc.com. To provide documentation feedback or request new content, contact isicontent@emc.com.

 

How EMC Isilon storage improves performance for EDA workflows

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

To develop the chips that go inside advanced technologies, such as smartphones and personal computers, engineers often rely on electronic design automation (EDA) software tools for chip design and testing.

As EDA projects and designs increase in complexity, the amount of project data increases as well. Similar to most industries, the EDA industry is facing challenges with managing the exponential growth of unstructured data while optimizing performance and storage efficiency.

The new technical white paper, “EMC Isilon NAS: Performance at Scale for Electronic Design Automation,” highlights how Isilon scale-out network attached storage (NAS) can alleviate the bottlenecks and inefficient use of storage space for EDA workflows running on traditional storage systems. The primary audience for this white paper includes engineers and executives working in the EDA industry. However, anyone who uses workflows requiring high levels of concurrently running jobs may also find this white paper useful.

For example, during the frontend phase of the EDA digital design workflow, EDA applications read and compile millions of small source files to build and simulate chip designs. Jobs are typically run concurrently against a deep and wide directory structure, which creates a large amount of metadata overhead and high CPU usage on the storage system. This white paper illustrates how Isilon scale-out storage is more effective than traditional data storage at alleviating workflow performance issues, such as:

  • Metadata access: Using a centralized metadata server can become a bottleneck. On average, the operations in a typical EDA workflow are 65 percent metadata access, 20 percent writes, and 15 percent data reads. Isilon uses a distributed metadata architecture and can store all metadata on solid-state drives (SSDs), reducing the latency for metadata operations when running concurrent jobs. For more information about EMC® Isilon® OneFS® SSD caching, refer to the white paper, “EMC Isilon OneFS SmartFlash: File System Caching Infrastructure.”
  • Run times for concurrent jobs: All nodes in an Isilon cluster work in parallel. OneFS automatically distributes jobs using SmartConnect™ to each node instead of running all the jobs against a single controller or requiring the manual distribution of jobs to controllers. Isilon recommends that you work with an Isilon representative to determine the number of nodes that will best serve your workflow.
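
To get a feel for why metadata handling dominates, the short calculation below applies the 65/20/15 percentages quoted above to a workload. The total of one million operations is an arbitrary figure chosen for illustration, not a measurement.

```python
# Operation mix for a typical EDA workflow, in percent (figures from the post).
OPERATION_MIX = {"metadata access": 65, "writes": 20, "data reads": 15}


def split_operations(total_ops: int) -> dict:
    """Split a total operation count according to OPERATION_MIX."""
    return {kind: total_ops * pct // 100 for kind, pct in OPERATION_MIX.items()}


# Hypothetical workload of one million storage operations.
for kind, ops in split_operations(1_000_000).items():
    print(f"{kind}: {ops:,}")
```

For this example workload, 650,000 of the operations never touch file data at all, which is why placing metadata on SSDs and distributing it across nodes matters so much for EDA run times.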

You can learn more about Isilon scale-out NAS architecture, storage efficiency, and data management by referring to “EMC Isilon NAS: Performance at Scale for Electronic Design Automation.”

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, send an email to isi.knowledge@emc.com. To provide documentation feedback or request new content, send an email to isicontent@emc.com.

Find Isilon help content on the EMC Isilon online community

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

You now have a fast and easy option for downloading EMC® Isilon® help content: visit the EMC Isilon online community.

Recently, the Isilon Community was enhanced to bring together Isilon-related content, including discussions and documentation, in one place and make it searchable on popular search engines. Here is a list of OneFS and Isilon documentation you can access immediately (you may have to log in to your EMC Community Network [ECN] account to access some of the documents).

Document title: OneFS Web Administration Guide
Overview: A comprehensive guide to administering your cluster from the web administration interface.

Document title: OneFS CLI Administration Guide*
Overview: A comprehensive guide to administering your cluster from the command-line interface.
OneFS version: OneFS 7.1

Document title: Isilon Site Preparation and Planning*
Overview: Learn about node specifications, switches and cables, networking topology, and other site installation topics.
OneFS version: All versions

Document title: EMC Isilon Upgrade Planning and Preparation
Overview: Includes steps for planning, completing, and troubleshooting a OneFS upgrade.

Document title: Backup and Recovery Guide
Overview: Learn about methods for backing up your cluster, such as SyncIQ and NDMP.
OneFS version: OneFS 7.1

Document title: Current Isilon Software Releases
Overview: Learn which releases are available for Isilon OneFS, OneFS software modules, and Isilon firmware packages.
OneFS version: All versions

* Requires an EMC Community Network (ECN) account

Search for technical white papers

The EMC Isilon Community also features technical white papers. White papers typically describe a solution to a specific problem or scenario. For example, you can download technical white papers on topics such as Hadoop implementation, data migration, electronic design automation, and multiprotocol security. To search for these white papers and additional Isilon documentation, go to the Content tab of the Isilon Community, click on the Document icon, and filter by the “documentation” tag.

How to find more help content

The EMC Isilon Community is a good source for Isilon-related content. However, additional Isilon help documentation is available only on the EMC Online Support site, including:

  • Knowledgebase articles
  • EMC Technical Advisories
  • Software downloads (except the OneFS 7.1.0.1 simulator, which is available for download on the EMC Isilon Community)
  • Help documentation for all OneFS versions

To begin your search, first log in to the EMC Online Support site, go to the OneFS product page, and enter your search terms. Watch the video, How to find content on the EMC Online Support site, for more information.

Start a conversation

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, send an email to isi.knowledge@emc.com. To provide documentation feedback or request new content, send an email to isicontent@emc.com.

Creating SMB shares with expansion variables in EMC Isilon OneFS

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

To make it easy for users in your organization to connect to a home directory through a Windows client, you can create an SMB share in EMC® Isilon® OneFS®. The share specifies configurable permissions, performance, and security settings for each individual user. Managing SMB shares in OneFS 6.5 through 7.1 can be done manually for each user, or dynamically for a large number of users. To create an SMB share or home directory, you can take advantage of these approaches:

  • Create unique SMB shares for user home directories
    • Dynamically create a unique share for each user home directory
    • Manually create a unique share for each user home directory
  • Create a common SMB share for user home directories
    • Dynamically create user home directories in a common share
    • Manually create user home directories in a common share

Each one of these approaches is highlighted in the new white paper, “Managing SMB shares and user home directories in OneFS 6.5 and later.”

How to dynamically create an SMB share using expansion variables

One of the approaches, as described in “Managing SMB shares and user home directories in OneFS 6.5 and later,” is to dynamically create SMB shares and home directories for new users. Instead of creating per-user SMB shares, you can create a single share that includes expansion variables, such as %U for the user name. For example, when a new user logs in through Active Directory, OneFS automatically creates a unique SMB share and directory for that user.
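
The idea behind the %U variable can be sketched in a few lines. The function below is only an illustration of the substitution, not OneFS code, and the user names are made up; it covers only the %U variable described in this post.

```python
def expand_share_path(template: str, user_name: str) -> str:
    """Replace the %U expansion variable with the connecting user's name."""
    return template.replace("%U", user_name)


# One share definition resolves to a different home directory per user.
for user in ("jsmith", "mlee"):  # hypothetical user names
    print(expand_share_path("/ifs/home/%U", user))
```

A single share definition therefore serves every user, and each user lands in a directory named after their own account.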

To dynamically create unique SMB shares using name expansion variables, follow these steps:

In OneFS 7.0 and OneFS 7.1

To take full advantage of expansion variables in SMB shares, you should be running OneFS 7.0.2.9 or later, or OneFS 7.1.0.2 or later.

  1. Log in to the OneFS web administration interface.
  2. Click Protocols > Windows Sharing (SMB) > SMB Shares > Add a Share.
  3. Type a share name (for example, Home) and optional description (for example, User Home Directories).
  4. In the Directory to Be Shared box, type /ifs/home/%U. If you store home directories in another location, specify that location instead.
  5. Click Apply Windows Default ACLs.
  6. Select the Allow Variable Expansion check box.
  7. Select the Auto-Create Directories check box.
  8. Click Create.

Dynamically create SMB share and home directories using expansion variables.

In OneFS 6.5

  1. Log in to the OneFS web administration interface.
  2. Click File Sharing > SMB > Add Share.
  3. Type a share name (for example, Home) and description (for example, User Home Directories).
  4. In the Directory to share box, type /ifs/home/%U. If you store home directories in another location, specify that location instead.
  5. Click Apply Windows default ACLs.
  6. Select the Allow Username Expansion check box.
  7. Select the Automatically Create User Directory check box.
  8. Click Submit.

More information about SMB and home directories in OneFS

For more information about expansion variables, see the “Create an SMB share” and “Home directory creation in a mixed environment” sections in the OneFS web administration guides. The administration guide also provides configuration information for accessing home directories through FTP or SSH.

Pick your protocol: Multiprotocol file access in EMC Isilon OneFS

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

You’re rushing to meet a project deadline, and you need to update some related files that are stored on an EMC® Isilon® cluster. You’re working on a Linux computer, and you’re connected to the cluster over a Network File System (NFS) protocol. You need to access files in a directory that your coworker, who uses a Windows computer, created when they were connected to the same cluster over a Server Message Block (SMB) protocol. Thanks to the Isilon OneFS® operating system, you can seamlessly access your coworker’s files even though you are doing so through a very different protocol.

Multiple protocol support is a necessity in today’s IT organizations, which comprise a mix of Windows and UNIX/Linux operating environments. OneFS is designed to provide users with unified access to data on an Isilon cluster using a mix of common protocols, such as SMB, NFS, HTTP, and Hadoop Distributed File System (HDFS). For a full list of supported protocols, see the OneFS administration guides or “EMC Isilon Multiprotocol Data Access with a Unified Security Model.”

So how does OneFS support a multiprotocol environment? What are the steps a system administrator needs to take to set up multiprotocol access in OneFS?

We have two videos that cover the basics and provide recommendations for setting up multiprotocol access in OneFS. The first video, “File Access Basics in an Isilon OneFS Multi-Protocol Environments,” provides a whiteboard overview of this topic. The second video, “Technical Demo: Multi-Protocol File Access Using EMC Isilon OneFS,” provides a demonstration of common multiprotocol commands and tasks.

File access basics and AIMA in OneFS

Supporting a mix of protocols requires supporting a mix of user identities and file permissions. This requirement can leave system administrators with several considerations when configuring OneFS.

Before discussing how OneFS handles multiprotocol file access, let’s first review how two operating environments, Windows and UNIX/Linux, authorize access to files. In a Windows environment, users are identified based on unique security identifiers (SIDs). Files or directories are secured through an Access Control List (ACL). In a UNIX environment, users and groups are identified through user identifiers (UIDs) and group identifiers (GIDs), respectively. Files are secured using POSIX mode bits.
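
If you have not worked with POSIX mode bits before, this small example uses Python’s standard stat module to decode them. The mode values are arbitrary examples and have nothing to do with any particular Isilon cluster.

```python
import stat


def describe_mode(mode: int) -> str:
    """Render the permission bits of a regular file the way `ls -l` does."""
    return stat.filemode(stat.S_IFREG | mode)


mode = 0o640  # example: owner read/write, group read, others nothing
print(describe_mode(mode))
print("group can read:", bool(mode & stat.S_IRGRP))
print("others can write:", bool(mode & stat.S_IWOTH))
```

Nine bits cover three classes (owner, group, others) with three rights each, which is a much coarser model than a Windows ACL and is why a translation step is needed in a mixed environment.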

OneFS uses Authentication, Identity Management, and Authorization (AIMA) to assign the right permissions and identifiers to users (and groups) no matter which protocols they use to connect to the cluster. To securely support NFS and SMB clients, OneFS does three things:

  • Connects to directory services, such as Microsoft Active Directory (AD) and Lightweight Directory Access Protocol (LDAP), which provide a security database of user and group accounts along with their information
  • Authenticates users and groups
  • Controls access to directories and files

When a user connects to an Isilon cluster, OneFS scans Active Directory and LDAP for the user’s identifiers. Once the user is authenticated, OneFS creates an access token for the user. OneFS then maps the user’s account in one directory service to the user’s account in another, a process known as “user mapping” in OneFS. This single access token is the key to authorizing the user so they can access files that are stored and created on the cluster using different protocols.

For example, if a user, Mike, accesses a file share through SMB, OneFS will scan Active Directory and find an SID for him. If OneFS does not find any UIDs or GIDs associated with Mike via LDAP, OneFS will generate a UID and GID for him and save them to Mike’s access token, so he can access files created by NFS users.
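
Mike’s case can be sketched as a small lookup routine. Everything here is hypothetical: the AccessToken class, the dictionaries standing in for Active Directory and LDAP, the SID, and the starting value for generated IDs are assumptions made for illustration and do not reflect how OneFS implements the process internally.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class AccessToken:
    """Hypothetical token holding both Windows and UNIX identifiers."""
    user: str
    sid: str
    uid: int
    gid: int


def build_access_token(
    user: str,
    ad_sids: Dict[str, str],
    ldap_ids: Dict[str, Tuple[int, int]],
    id_allocator: Iterator[int],
) -> AccessToken:
    """Combine the SID from AD with a UID/GID from LDAP, generating them if absent."""
    sid = ad_sids[user]
    if user in ldap_ids:
        uid, gid = ldap_ids[user]
    else:
        # No UNIX identifiers found, so generate a UID and a GID.
        uid = next(id_allocator)
        gid = next(id_allocator)
    return AccessToken(user, sid, uid, gid)


ad_sids = {"mike": "S-1-5-21-1111-2222-3333-1104"}  # made-up SID
allocator = count(1_000_000)  # assumed starting value for generated IDs
print(build_access_token("mike", ad_sids, {}, allocator))
```

Because the token carries both kinds of identifier, the same user can be checked against an ACL on one file and against mode bits on another.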

The same type of mapping occurs for file permissions. If a file was created through SMB, it will be assigned an ACL to control who can access the file. OneFS will create equivalent POSIX mode bits for this file. File permissions can be saved to the Isilon cluster on disk in one of three modes: native, UNIX, or SID. For more information about each mode, and about AIMA and user mapping, read the “Identities, Access Tokens, and the Isilon OneFS User Mapping Service” white paper.

This is a brief summary of how multiprotocol file access works in OneFS. Watch the following video, “File Access Basics in an Isilon OneFS Multi-Protocol Environments,” for more information and recommendations for configuring multiprotocol access in OneFS. In this video, Principal Solutions Architect Amol Choukekar answers the following frequently asked questions:

  • What are multiprotocol basics?
  • How do Windows and UNIX clients differ when they access files on OneFS?
  • How does OneFS handle user and group identities?
  • How does OneFS store file permissions in a multiprotocol environment?
  • How do clients access files that were created using a different protocol?
  • How does OneFS manage file permissions?
  • What if user names are not similar across authentication providers?

How to configure multiprotocol support in OneFS

You can manage user identity mapping and file permissions using the OneFS command-line interface and OneFS web administration interface. Watch the following video, “Technical Demo: Multi-Protocol File Access Using EMC Isilon OneFS” for demonstrations of the following tasks:

  • Review configured authentication providers
  • Review an access token for a user
  • Review existing identity mappings stored on the cluster
  • Delete existing identity mappings
  • Review ACL policies on the cluster
  • Create a user mapping rule for joining different user names

This video also offers the following demonstrations:

  • File access between Windows and UNIX
  • Creation of a synthetic ACL, which dynamically maps UNIX permissions to Windows rights
  • File permissions management
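
To illustrate what mapping UNIX permissions to Windows rights means, the sketch below derives a list of allow entries from POSIX mode bits. It is a deliberately simplified model: the trustee names and the three coarse rights are assumptions, and the rights that OneFS actually presents in a synthetic ACL are more granular than this.

```python
import stat

# (trustee, read bit, write bit, execute bit) for each permission class.
_CLASSES = (
    ("owner", stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    ("group", stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    ("everyone", stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
)


def synthetic_acl(mode: int) -> list:
    """Translate POSIX mode bits into simplified (trustee, 'allow', rights) entries."""
    entries = []
    for trustee, read_bit, write_bit, exec_bit in _CLASSES:
        rights = [
            name
            for bit, name in ((read_bit, "read"), (write_bit, "write"), (exec_bit, "execute"))
            if mode & bit
        ]
        if rights:  # skip classes that have no permissions at all
            entries.append((trustee, "allow", rights))
    return entries


for entry in synthetic_acl(0o640):
    print(entry)
```

The entries are computed from the mode bits each time they are needed rather than stored alongside the file, which is what "dynamically" refers to in the list above.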

 

For more information about implementing multiprotocol in OneFS, contact your account representative. If you have feedback about this blog or these videos, send an email to isi-knowledge@emc.com. If you have a request for new documentation, send an email to isicontent@emc.com.

7 best practices for setting up Hadoop on an EMC Isilon cluster

Kirsten Gantenbein

Principal Content Strategist at EMC Isilon Storage Division

If you’re considering adding an Apache™ Hadoop® workflow to your EMC® Isilon® cluster, you’re probably wondering how to set it up. The new white paper “EMC Isilon Best Practices for Hadoop Data Storage” provides useful information for deploying Hadoop in your Isilon cluster environment.

The white paper also introduces the unique approach that Isilon took to Hadoop deployments. In a typical Hadoop deployment, large unstructured data sets are ingested from storage repositories to a Hadoop cluster based on the Hadoop distributed file system (HDFS). Data is mapped to the Hadoop DataNodes of the cluster and a single NameNode controls the metadata. The MapReduce software framework manages jobs for data analysis. MapReduce and HDFS use the same hardware resources for both data analysis and storage. Analysis results are then stored in HDFS or exported to other infrastructures.

Traditional Hadoop Deployment

In an EMC Isilon Hadoop deployment, the HDFS is integrated as a protocol into the Isilon distributed OneFS® operating system. This approach gives users direct access through the HDFS to data stored on the Isilon cluster using standard protocols such as SMB, NFS, HTTP, and FTP. MapReduce processing and data storage are separated, allowing you to independently scale compute and data storage resources as needed.

EMC Isilon Hadoop Deployment

Every node in the Isilon cluster acts as the NameNode and DataNode. Compute clients running MapReduce jobs can connect to any node in the cluster. Data analysis results can be accessed by Hadoop users through standard protocols without the need to export results.

To learn more about the benefits of Hadoop on Isilon scale-out network attached storage (NAS), read “Hadoop on EMC Isilon Scale-Out NAS” and “EMC Isilon Scale-Out NAS for In-Place Hadoop Data Analytics.”

Best practices for deploying Hadoop to your Isilon cluster

You can connect Apache Hadoop or an enterprise-friendly Hadoop distribution, such as Pivotal HD or Cloudera, to your Isilon cluster.

First, you’ll need to turn on the HDFS protocol in OneFS. Contact your account representative to complete this step. Next, follow these best practices:

  1. Review the EMC Hadoop Starter Kit 2.0. Visit the EMC Hadoop Starter Kit (HSK) 2.0 for step-by-step guides on how to connect a Hadoop distribution to your Isilon cluster. HSK guides are available for Apache Hadoop, Pivotal HD, Cloudera, and Hortonworks. A video demonstration for Pivotal HD is also available.
  2. Find your Isilon cluster’s optimal point to help determine the number of nodes that will best serve your Hadoop workflow and compute grid. The optimal point is the point at which the cluster scales in processing MapReduce jobs and reduces run times in relation to other systems for the same workload. Contact your account representative to help you determine this information.
  3. Create directories and set permissions. OneFS controls access to directories and files with POSIX mode bits and access control lists (ACLs). Make sure directories and files are set up with the correct permissions to ensure that your Hadoop users can access their files.
  4. Don’t run NameNode and DataNode services on clients. Because the Isilon cluster acts as the NameNode and DataNodes for the HDFS, these services should only run on the cluster and not on compute clients. On compute clients, you should only run MapReduce processes.
  5. Increase the HDFS block size from the default 64 MB to 128 MB to optimize performance. Boosting the block size lets Isilon nodes read and write HDFS data in larger blocks. The result is an increase in performance of MapReduce jobs.
  6. Store intermediate map results on an Isilon cluster. A Hadoop client typically stores its intermediate map results locally. The amount of local storage available on a client affects its ability to run jobs. Storing map results on the cluster can help performance and scalability.
  7. Consult the Isilon best practices white paper for additional tips. You can find more details about some of these best practices in “EMC Isilon Best Practices for Hadoop Data Storage.” You can also find additional tips for tuning OneFS for HDFS operations, using EMC Isilon SmartConnect™ for HDFS, aligning datasets with storage pools, and securing HDFS connections with Kerberos.
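
The effect of the block size recommendation in step 5 is easy to see with a little arithmetic. The sketch below counts how many HDFS blocks a file occupies at each block size; the 10 GB file is an arbitrary example, and the calculation says nothing about actual performance on your cluster.

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_mb: int) -> int:
    """Number of HDFS blocks needed for a file, rounding the last block up."""
    block_size_bytes = block_size_mb * MB
    return -(-file_size_bytes // block_size_bytes)  # integer ceiling division


file_size = 10 * 1024 * MB  # hypothetical 10 GB input file
for size_mb in (64, 128):
    print(f"{size_mb} MB blocks: {block_count(file_size, size_mb)}")
```

Doubling the block size halves the number of blocks for the same file, so each read or write request moves more data.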

 

If you have questions related to Hadoop and your Isilon environment, contact your account representative. If you have documentation feedback or want to request new content, email isicontent@emc.com.
