The Ask the Expert forum on the EMC Isilon Community featuring the Isilon Information Experience team is happening now until August 7.
This is a great opportunity to ask questions, exchange ideas, and share opinions about Isilon technical content with the people who create it, including product documentation, release notes, videos, white papers, and more. Check it out here. See you there!
Do you have an opinion about the technical content that EMC Isilon publishes? The EMC Isilon Information Experience team, which produces documentation, release notes, videos, white papers, and more, wants to hear from you.
Let us know how we’re doing. RSVP for our Ask the Expert event on Isilon Product Community, starting July 27, 2015 and continuing through August 7. During this event, you can submit your questions, opinions, and ideas to a forum discussion thread. Answers will be submitted by the Isilon Information Experience team.
What is the “Ask the Expert” forum?
Ask the Expert (ATE) events are regularly scheduled forums that cover many topics and products. Previous ATE events include Scale-out Data Lakes and SMB Protocol Support. In this special session, content professionals, including our Director of Information Experience, our blogger and social media lead, and several content developers, will answer the questions we receive from you.
You can ask us about anything related to our technical content, such as:
How can I be notified about the latest Isilon content?
How do you decide what content to publish?
How do I share my idea for a great paper/blog/article with you?
What is an Info Hub and why should I care?
What’s in it for you?
The EMC Isilon Information Experience team will post a summary of our ATE session findings. It will contain a roadmap for when you might expect to see the changes you request, if we can accommodate them, and an honest answer if we cannot.
For years, the global economy has been in transition from goods, to information, to knowledge. In particular, the need for trust grows as customers interact with content more often through more digital platforms and channels. Knowledge is now both currency and product. We recognize that our first contact with you may be through content, and we need to build trust through content.
The best way we can build trust with you is to exchange ideas, and the EMC Isilon Ask the Expert event on technical content is a great way to start the conversation. We hope to talk to you soon!
The process of analyzing big data within big organizations can be complicated. There can be many data sets to analyze, some of which are stored in silos or contain secure information. And there can be many different Hadoop users accessing these data sets, each with different permissions and credentials. So how can organizations effectively manage multiple data sets and Hadoop users?
In EMC® Isilon® OneFS®, you can take advantage of multitenancy to tackle this issue. Multitenancy creates secure, separate namespaces on a shared infrastructure so that different Hadoop users (or tenants) can connect to an Isilon cluster, run Hadoop jobs concurrently, and consolidate their Hadoop workflows onto a single cluster. OneFS 7.2 supports several Hadoop distributions and HDFS 2.2, 2.3, and 2.4. The OneFS HDFS implementation also works with Ambari for management and monitoring, Kerberos authentication, and Kerberos impersonation.
The Apache Hadoop analytics platform comprises the Hadoop Distributed File System, or HDFS, a storage system for vast amounts of data, and MapReduce, a processing paradigm for data-intensive computational analysis.
EMC Isilon serves as the file system for Hadoop clients. This enables Hadoop clients to directly access their datasets on the Isilon storage system and run data analysis jobs on their compute clients. OneFS implements server-side operations of the HDFS protocol on each node in the Isilon cluster to handle calls to the NameNode and to manage read/write requests to DataNodes.
To configure an Isilon cluster for Hadoop, you first need to activate an HDFS license in OneFS. Contact your account team for more information. Then visit our EMC Hadoop Starter Kits to learn how to deploy multiple Hadoop distributions, such as Pivotal, Cloudera, or Hortonworks, on your Isilon cluster.
Access zones for multitenancy
Access zones lay the foundation for multitenancy in OneFS. Access zones provide a virtual security context that segregates tenants and creates a virtual region that isolates data sets. Each access zone encapsulates a namespace, HDFS directory, directory services, authentication, and auditing. An access zone also isolates system connections for further security.
Provide multiprotocol support – Learn how you can store data by using existing workflows on your Isilon cluster and access it through SMB, NFS, OpenStack Swift, and HDFS protocols, instead of running HDFS copy operations to move data to Hadoop clients.
Manage different data sets – Learn how you can use SmartPools for managing different data sets based on customized policies.
Associate network resources with access zones – Understand how virtual racking works in Isilon and how you can configure SmartConnect in OneFS to manage connections to data on your Isilon cluster.
Secure access zones – Review how role-based access control and directory services with access zones in OneFS are used to authenticate users assigned to each zone.
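The isolation that access zones provide can be pictured with a minimal sketch. It assumes each zone is rooted at its own base directory; the AccessZone class, the zone names, and the /ifs/finance and /ifs/research paths are hypothetical and are not part of OneFS. The sketch only shows the idea that a tenant's requests are confined to the namespace of its own zone.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessZone:
    """Hypothetical model of a zone: a name plus the directory it is rooted at."""
    name: str
    base_dir: str

    def contains(self, path: str) -> bool:
        # Normalize first so "../" segments cannot escape the zone root.
        base = posixpath.normpath(self.base_dir)
        normalized = posixpath.normpath(path)
        # commonpath compares whole path components, so /ifs/financial
        # is not treated as being inside /ifs/finance.
        return posixpath.commonpath([base, normalized]) == base


finance = AccessZone("finance", "/ifs/finance")

print(finance.contains("/ifs/finance/hadoop/q3.csv"))      # True
print(finance.contains("/ifs/research/hadoop/genome.dat"))  # False
print(finance.contains("/ifs/finance/../research/secret"))  # False
```

In this model, a Hadoop job that runs for the finance tenant can only resolve paths under its own root, which is the property that lets several tenants share one cluster.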
Hadoop information hubs
You can find a rich array of information about Isilon and Hadoop. Visit our online Isilon Community on the EMC Community Network for InfoHubs, which serve as a single location for all of our Hadoop-related content. The Hadoop InfoHub contains links to general information about Isilon and Hadoop. The Cloudera with Isilon InfoHub contains links to information about deploying the Cloudera distribution for Isilon.
Apache™ Hadoop®, open-source software for analyzing huge amounts of data, is a powerful tool for companies that want to analyze information for valuable insights.
Hadoop redefines how data is stored and processed. A key advantage of Hadoop is that it enables analytics on any type of data. Some organizations are beginning to build data lakes—essentially large repositories for unstructured data—on the Hadoop Distributed File System (HDFS) so they can easily store data collected from a variety of sources, and then run compute jobs on data in its original file format. There’s no need to load data into the HDFS for analysis, saving data scientists time and money. They can then survey their Hadoop data lake and discover big data intelligence to drive their business.
However, the Hadoop data lake also presents challenges for organizations that want to protect sensitive information stored in these data repositories. For example, organizations might need to follow internal enterprise security policies or external compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act (SOX). A Hadoop data lake is difficult to secure because HDFS was neither designed nor intended to be an enterprise-class file system. It is a complex, distributed file system of many client computers with a dual purpose: data storage and computational analysis. HDFS has many nodes, each of which presents a point of access to the entire system. Layers of security can be added to a Hadoop data lake, but managing each layer adds to complexity and overhead.
Best of both worlds
The EMC® Isilon® scale-out data lake offers the best of both worlds for organizations using Hadoop: enterprise-level security and easy implementation of Hadoop for data analytics.
An Isilon cluster separates data from the compute clients, and the Isilon cluster becomes the HDFS file system. All data is stored on an Isilon cluster and secured by using access control lists, access zones, self-encrypting drives, and other security features. OneFS implements the server-side operations of HDFS as a native protocol. Therefore, Hadoop clients access data on the cluster through HDFS and standard protocols such as SMB and NFS.
OneFS can facilitate your efforts to comply with regulations such as HIPAA, SOC, SEC 17a-4, the Federal Information Security Management Act (FISMA), and the Payment Card Industry Data Security Standard (PCI DSS). The table below summarizes some of the challenges of securing a Hadoop data lake, and how the capabilities of an Isilon cluster can help to address these issues. For full descriptions of these capabilities, see Security and Compliance for Scale-Out Hadoop Data Lakes.
Hadoop data lakes: security challenges and Isilon capabilities

Security challenge: A Hadoop data lake can contain sensitive data, such as intellectual property, confidential customer information, and company records. Any client connected to the data lake can access or alter this sensitive data.
Isilon capability: Compliance mode and write-once, read-many (WORM) storage
Description: The SEC 17a-4 regulation requires that data is protected from malicious, accidental, or premature alteration. Isilon SmartLock™ is a OneFS feature that locks down directories through WORM storage. Use compliance mode only for scenarios where you need to comply with SEC 17a-4 regulations. In addition, auditing can help detect fraud, unauthorized access attempts, or other threats to security.

Security challenge: ACL policies help to ensure compliance. However, clients may be connecting to the Hadoop cluster by using different protocols, such as NFS or HTTP.
Isilon capability: Authentication and cross-protocol permissions
Description: OneFS authenticates users and groups connecting to the cluster through different protocols by using POSIX mode bits, NTFS, and ACL policies. By managing ACL policies in OneFS, you can address compliance requirements for environments that mix NFS, SMB, and HDFS.

Security challenge: Applying restricted access to directories and files in HDFS requires adding layers to your file system.
Isilon capability: Role-based access control for system administration (RBAC)
Description: The PCI DSS Requirement 7.1.2 specifies that access must be restricted to privileged user IDs. RBAC, a OneFS feature, lets you manage administrative access by role, and assign privileges to a role. You can associate one user with one ID through identity management and user mapping, and then assign that ID to a role. In OneFS, access zones are a virtual security context in which OneFS connects to directory services, authenticates users, and controls access to a segment of the file system.

Security challenge: FISMA and HIPAA and other compliance regulations might require protection for data at rest.
Isilon capability: Encryption of data at rest
Description: Isilon self-encrypting drives are FIPS 140-2 Level 3 validated. The drives automatically apply AES-256 encryption to all data stored in the drives without requiring additional equipment. You can enable a WORM state on directories for data at rest.
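To make the write-once, read-many idea from the table more concrete, here is a generic sketch of WORM semantics. It is not how SmartLock is implemented; the WormFile class, its methods, and the one-year retention period are assumptions chosen only to illustrate that committed data cannot be altered and cannot be deleted before its retention date.

```python
from datetime import datetime, timedelta


class WormFile:
    """Generic write-once, read-many model (not SmartLock's implementation)."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.retain_until = None  # None means the file is not committed yet

    def commit(self, retain_until: datetime) -> None:
        # Committing locks the content and records the retention date.
        self.retain_until = retain_until

    def write(self, data: bytes) -> None:
        if self.retain_until is not None:
            raise PermissionError("file is committed to a WORM state")
        self.data = data

    def can_delete(self, now: datetime) -> bool:
        # Deletion is allowed before commit or once retention has expired.
        return self.retain_until is None or now >= self.retain_until


now = datetime(2015, 7, 1)
record = WormFile()
record.write(b"trade confirmation")
record.commit(now + timedelta(days=365))  # hypothetical retention period

try:
    record.write(b"tampered")
except PermissionError as error:
    print("write rejected:", error)

print(record.can_delete(now))                        # False
print(record.can_delete(now + timedelta(days=366)))  # True
```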
To develop the chips that go inside advanced technologies, such as smartphones and personal computers, engineers often rely on electronic design automation (EDA) software tools for chip design and testing.
As EDA projects and designs increase in complexity, the amount of project data increases as well. Like most industries, the EDA industry faces the challenge of managing the exponential growth of unstructured data while optimizing performance and storage efficiency.
The new technical white paper, “EMC Isilon NAS: Performance at Scale for Electronic Design Automation,” highlights how Isilon scale-out network attached storage (NAS) can alleviate the bottlenecks and inefficient use of storage space for EDA workflows running on traditional storage systems. The primary audience for this white paper includes engineers and executives working in the EDA industry. However, anyone who uses workflows that require many concurrently running jobs may also find this white paper useful.
For example, during the frontend phase of the EDA digital design workflow, EDA applications read and compile millions of small source files to build and simulate chip designs. Jobs are typically run concurrently against a deep and wide directory structure, which creates a large amount of metadata overhead and high CPU usage on the storage system. This white paper illustrates how Isilon scale-out storage is more effective than traditional data storage at alleviating workflow performance issues, such as:
Metadata access: Using a centralized metadata server can become a bottleneck. On average, the operations in a typical EDA workflow are 65 percent metadata access, 20 percent writes, and 15 percent data reads. Isilon uses a distributed metadata architecture and can store all metadata on solid-state drives (SSDs), reducing the latency for metadata operations when running concurrent jobs. For more information about EMC® Isilon® OneFS® SSD caching, refer to the white paper, “EMC Isilon OneFS SmartFlash: File System Caching Infrastructure.”
Run times for concurrent jobs: All nodes in an Isilon cluster work in parallel. OneFS automatically distributes jobs using SmartConnect™ to each node instead of running all the jobs against a single controller or requiring the manual distribution of jobs to controllers. Isilon recommends that you work with an Isilon representative to determine the number of nodes that will best serve your workflow.
Recently, the Isilon Community was enhanced to bring together Isilon-related content, including discussions and documentation, in one place and to make it searchable on popular search engines. Here is a list of OneFS and Isilon documentation you can access immediately (you may have to log in to your EMC Community Network [ECN] account to access some of the documents).
OneFS Web Administration Guide
A comprehensive guide to administering your cluster from the web administration interface.
To make it easy for users in your organization to connect to a home directory through a Windows client, you can create an SMB share in EMC® Isilon® OneFS®. The share specifies configurable permissions, performance, and security settings for each individual user. Managing SMB shares in OneFS 6.5 through 7.1 can be done manually for each user, or dynamically for a large number of users. To create an SMB share or home directory, you can take advantage of these approaches:
Create unique SMB shares for user home directories
Dynamically create a unique share for each user home directory
Manually create a unique share for each user home directory
Create a common SMB share for user home directories
Dynamically create user home directories in a common share
Manually create user home directories in a common share
How to dynamically create an SMB share using expansion variables
One of the approaches, as described in “Managing SMB shares and user home directories in OneFS 6.5 and later,” is to dynamically create SMB shares and home directories for new users. Instead of creating per-user SMB shares, you can create a single share that includes expansion variables, such as %U for the user name. For example, when a new user logs in through Active Directory, OneFS automatically creates a unique SMB share and directory for that user.
To dynamically create unique SMB shares using name expansion variables, follow these steps:
In OneFS 7.0 and OneFS 7.1
To take full advantage of expansion variables in SMB shares, you should be running OneFS 184.108.40.206 and later, or OneFS 220.127.116.11 and later.
Log in to the OneFS web administration interface.
Click Protocols > Windows Sharing (SMB) > SMB Shares > Add a Share.
Type a share name (for example, Home) and optional description (for example, User Home Directories).
In the Directory to Be Shared box, type /ifs/home/%U. If you store home directories in another location, specify that location instead.
Click Apply Windows Default ACLs.
Select the Allow Variable Expansion check box.
Select the Auto-Create Directories check box.
In OneFS 6.5
Log in to the OneFS web administration interface.
Click File Sharing > SMB > Add Share.
Type a share name (for example, Home) and description (for example, User Home Directories).
In the Directory to share box, type /ifs/home/%U. If you store home directories in another location, specify that location instead.
Click Apply Windows default ACLs.
Select the Allow Username Expansion check box.
Select the Automatically Create User Directory check box.
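The effect of the steps above can be sketched in a few lines. This is an illustration only, not OneFS code: the function names are hypothetical, only the %U variable from this post is modeled, and a temporary directory stands in for /ifs so the example can run anywhere.

```python
import os
import tempfile


def expand_share_path(template: str, user_name: str) -> str:
    """Replace the %U expansion variable with the connecting user's name."""
    return template.replace("%U", user_name)


def connect(template: str, user_name: str, auto_create: bool = True) -> str:
    """Resolve the share path for a user and optionally create the directory."""
    path = expand_share_path(template, user_name)
    if auto_create:
        # Mirrors the idea of the auto-create check box: the home directory
        # appears the first time the user connects.
        os.makedirs(path, exist_ok=True)
    return path


print(expand_share_path("/ifs/home/%U", "jsmith"))  # /ifs/home/jsmith

with tempfile.TemporaryDirectory() as root:
    template = os.path.join(root, "home", "%U")
    home = connect(template, "jsmith")
    print(os.path.isdir(home))  # True
```

A single share definition therefore serves every user, and each user lands in a directory named after their own account.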
More information about SMB and home directories in OneFS
For more information about expansion variables, see the “Create an SMB share” and “Home directory creation in a mixed environment” sections in the OneFS web administration guides. The administration guide also provides configuration information for accessing home directories through FTP or SSH.
You’re rushing to meet a project deadline, and you need to update some related files that are stored on an EMC® Isilon® cluster. You’re working on a Linux computer, and you’re connected to the cluster over the Network File System (NFS) protocol. You need to access files in a directory that your coworker, who uses a Windows computer, created while connected to the same cluster over the Server Message Block (SMB) protocol. Thanks to the Isilon OneFS® operating system, you can seamlessly access your coworker’s files even though you are doing so through a very different protocol.
Supporting a mix of protocols requires supporting a mix of user identities and file permissions. This requirement can leave system administrators with several considerations when configuring OneFS.
Before discussing how OneFS handles multiprotocol file access, let’s first review how two operating environments, Windows and UNIX/Linux, authorize access to files. In a Windows environment, users are identified based on unique security identifiers (SIDs). Files or directories are secured through an Access Control List (ACL). In a UNIX environment, users and groups are identified through user identifiers (UIDs) and group identifiers (GIDs), respectively. Files are secured using POSIX mode bits.
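As a quick illustration of the UNIX side, the following sketch decodes POSIX mode bits with Python's standard stat module. The mode value and the permission_triplets helper are examples chosen for this post; nothing here is specific to OneFS.

```python
import stat


def permission_triplets(mode: int) -> dict:
    """Split a mode into owner, group, and other permission strings."""
    text = stat.filemode(mode)  # for example "-rw-r-----"
    return {"owner": text[1:4], "group": text[4:7], "other": text[7:10]}


# Regular file: owner can read and write, group can read, others get nothing.
mode = stat.S_IFREG | 0o640

print(stat.filemode(mode))        # -rw-r-----
print(permission_triplets(mode))  # {'owner': 'rw-', 'group': 'r--', 'other': '---'}
```

Mode bits express three fixed classes of users, while a Windows ACL can list any number of users and groups with individual rights, which is why the two models need to be mapped to each other.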
OneFS uses Authentication, Identity Management, and Authorization (AIMA) to assign the right permissions and identifiers to users (and groups) no matter which protocols they use to connect to the cluster. To securely support NFS and SMB clients, OneFS does three things:
Connects to directory services, such as Microsoft Active Directory (AD) and Lightweight Directory Access Protocol (LDAP), which provide a security database of user and group accounts along with their information
Authenticates users and groups
Controls access to directories and files
When a user connects to an Isilon cluster, OneFS scans Active Directory and LDAP for the user’s identifiers. Once the user is authenticated, OneFS creates an access token for the user. OneFS then maps the user’s account (known as “user mapping” in OneFS) in one directory service to another. This single access token is the key to authorizing the user so they can access files that are stored and created on the cluster using different protocols.
For example, if a user, Mike, accesses a file share through SMB, OneFS will scan Active Directory and find an SID for him. If OneFS does not find any UIDs or GIDs associated with Mike via LDAP, OneFS will generate a UID and GID for him and save them to Mike’s access token, so he can access files created by NFS users.
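The Mike example can be sketched as follows. This is a simplified model under stated assumptions, not the OneFS user mapping service: the dictionaries stand in for Active Directory and LDAP, the SID is a made-up value, and the starting number for generated IDs is an assumption used only for illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional, Tuple

# Hypothetical directory contents; real lookups would query AD and LDAP.
ACTIVE_DIRECTORY: Dict[str, str] = {"mike": "S-1-5-21-1111-2222-3333-1001"}
LDAP: Dict[str, Tuple[int, int]] = {}  # no UNIX identity recorded for mike

_generated_ids = count(1_000_000)  # assumed starting value for generated IDs


@dataclass
class AccessToken:
    user: str
    sid: Optional[str]
    uid: int
    gid: int
    ids_generated: bool


def build_token(user: str) -> AccessToken:
    """Collect a user's identifiers from both directories into one token."""
    sid = ACTIVE_DIRECTORY.get(user)
    unix_ids = LDAP.get(user)
    if unix_ids is None:
        # No UID or GID found, so generate a pair and keep it on the token.
        uid, gid = next(_generated_ids), next(_generated_ids)
        return AccessToken(user, sid, uid, gid, ids_generated=True)
    uid, gid = unix_ids
    return AccessToken(user, sid, uid, gid, ids_generated=False)


token = build_token("mike")
print(token.sid, token.uid, token.gid, token.ids_generated)
```

Because the token carries both the SID and a UID/GID pair, the same user can be authorized against files created over SMB or over NFS.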
The same type of mapping occurs for file permissions. If a file was created through SMB, it will be assigned an ACL to control who can access the file. OneFS will create equivalent POSIX mode bits for this file. File permissions can be saved to the Isilon cluster on disk in one of three modes: native, UNIX, or SID. For more information about each mode, and about AIMA and user mapping, read the “Identities, Access Tokens, and the Isilon OneFS User Mapping Service” white paper.
This is a brief summary of how multiprotocol file access works in OneFS. Watch the following video, “File Access Basics in an Isilon OneFS Multi-Protocol Environments,” for more information and recommendations for configuring multiprotocol access in OneFS. In this video, Principal Solutions Architect Amol Choukekar answers the following frequently asked questions:
What are multiprotocol basics?
How do Windows and UNIX clients differ when they access files on OneFS?
How does OneFS handle user and group identities?
How does OneFS store file permissions in a multiprotocol environment?
How do clients access files that were created using a different protocol?
How does OneFS manage file permissions?
What if user names are not similar across authentication providers?
Review existing identity mappings stored on the cluster
Delete existing identity mappings
Review ACL policies on the cluster
Create a user mapping rule for joining different user names
This video also offers the following demonstrations:
File access between Windows and UNIX
Creation of a synthetic ACL, which dynamically maps UNIX permissions to Windows rights
File permissions management
For more information about implementing multiprotocol in OneFS, contact your account representative. If you have feedback about this blog or these videos, send an email to firstname.lastname@example.org. If you have a request for new documentation, send an email to email@example.com.
If you’re considering adding an Apache™ Hadoop® workflow to your EMC® Isilon® cluster, you’re probably wondering how to set it up. The new white paper “EMC Isilon Best Practices for Hadoop Data Storage” provides useful information for deploying Hadoop in your Isilon cluster environment.
The white paper also introduces the unique approach that Isilon took to Hadoop deployments. In a typical Hadoop deployment, large unstructured data sets are ingested from storage repositories to a Hadoop cluster based on the Hadoop distributed file system (HDFS). Data is mapped to the Hadoop DataNodes of the cluster and a single NameNode controls the metadata. The MapReduce software framework manages jobs for data analysis. MapReduce and HDFS use the same hardware resources for both data analysis and storage. Analysis results are then stored in HDFS or exported to other infrastructures.
In an EMC Isilon Hadoop deployment, the HDFS is integrated as a protocol into the Isilon distributed OneFS® operating system. This approach gives users direct access through the HDFS to data stored on the Isilon cluster using standard protocols such as SMB, NFS, HTTP, and FTP. MapReduce processing and data storage are separated, allowing you to independently scale compute and data storage resources as needed.
Every node in the Isilon cluster acts as the NameNode and DataNode. Compute clients running MapReduce jobs can connect to any node in the cluster. Data analysis results can be accessed by Hadoop users through standard protocols without the need to export results.
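On the compute side, this usually means pointing Hadoop clients at the cluster instead of a dedicated NameNode host. The sketch below generates a core-site.xml fragment that sets the standard Hadoop 2.x property fs.defaultFS. The host name isilon.example.com and port 8020 are placeholder assumptions; use the values for your own cluster and follow the guidance for your Hadoop distribution.

```python
import xml.etree.ElementTree as ET


def core_site(default_fs: str) -> str:
    """Build a minimal core-site.xml body that sets fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


# Placeholder host and port, assumed for illustration only.
print(core_site("hdfs://isilon.example.com:8020"))
```

Because every node answers NameNode and DataNode requests, a single cluster name in this setting is enough for clients to reach the whole cluster.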
Find your Isilon cluster’s optimal point to help determine the number of nodes that will best serve your Hadoop workflow and compute grid. The optimal point is the point at which the cluster scales in processing MapReduce jobs and reduces run times in relation to other systems for the same workload. Contact your account representative to help you determine this information.
Create directories and set permissions. OneFS controls access to directories and files with POSIX mode bits and access control lists (ACLs). Make sure directories and files are set up with the correct permissions to ensure that your Hadoop users can access their files.
Don’t run NameNode and DataNode services on clients. Because the Isilon cluster acts as the NameNode and DataNodes for the HDFS, these services should only run on the cluster and not on compute clients. On compute clients, you should only run MapReduce processes.
Increase the HDFS block size from the default 64 MB to 128 MB to optimize performance. Boosting the block size lets Isilon nodes read and write HDFS data in larger blocks. The result is an increase in performance of MapReduce jobs.
Store intermediate jobs on an Isilon cluster. A Hadoop client typically stores its intermediate map results locally. The amount of local storage available on a client affects its ability to run jobs. Storing map results on the cluster can help performance and scalability.
Consult the Isilon best practices white paper for additional tips. You can find more details about some of these best practices in “EMC Isilon Best Practices for Hadoop Data Storage.” You can also find additional tips for tuning OneFS for HDFS operations, using EMC Isilon SmartConnect™ for HDFS, aligning datasets with storage pools, and securing HDFS connections with Kerberos.
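To see why the block size recommendation above matters, consider how many blocks a file is split into. The short calculation below uses a hypothetical 10 GiB input file; the file size is an assumption, and only the 64 MB and 128 MB block sizes come from the best practice.

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of HDFS blocks needed to hold a file (ceiling division)."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


file_size = 10 * 1024 * MB  # hypothetical 10 GiB input file

print(block_count(file_size, 64 * MB))   # 160
print(block_count(file_size, 128 * MB))  # 80
```

Doubling the block size halves the number of blocks in this example, so each read or write request moves more data and fewer requests are needed for the same file.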
If you have questions related to Hadoop and your Isilon environment, contact your account representative. If you have documentation feedback or want to request new content, email firstname.lastname@example.org.