Posts Tagged ‘cluster capacity’

Quick Start Lab Guide for adding capacity or performance in the EMC Isilon OneFS Simulator

Kirsten Gantenbein
Principal Content Strategist at EMC Isilon Storage Division

The EMC Isilon OneFS Simulator is a great resource for trying out OneFS on a virtual infrastructure. The OneFS Simulator is a free version of OneFS 7.2 that you can download for non-production purposes. In this simulated OneFS environment, you can get an idea of what it’s like to administer a full Isilon cluster installation.

After downloading and setting up the OneFS Simulator, take a look at our recently published Quick Start Lab Guide. This lab guide walks you through exercises for using the OneFS Simulator. The featured exercise in this guide helps you add capacity, CPU, and memory to your virtual EMC Isilon cluster by adding another node.

Leave feedback about this lab guide

This is the first lab guide for the OneFS Simulator that we’ve published. Please let us know what you think. If you like this guide, have feedback about the format, or suggestions for other quick start guides, please leave a comment or send an email to isicontent@emc.com.

Get help with OneFS Simulator setup

If you need help with the initial setup of the OneFS Simulator in your virtual environment, watch this video:

Start a conversation about Isilon content

Have a question or feedback about Isilon content? Visit the online EMC Isilon Community to start a discussion. If you have questions or feedback about this blog, or comments about the video specifically, contact us at isi.knowledge@emc.com. To provide documentation feedback or request new content, contact isicontent@emc.com.

Cluster capacity advice from an EMC Isilon expert

Kirsten Gantenbein
Principal Content Strategist at EMC Isilon Storage Division

Avoiding scenarios where your cluster reaches maximum capacity is crucial for making sure it runs properly. Our Best Practices for Maintaining Enough Free Space on Isilon Clusters and Pools guide contains information to help Isilon customers keep their clusters running smoothly.

However, there are common misperceptions about cluster capacity, such as the notion that it’s easy to delete data from a cluster that is 100 percent full. Another misunderstanding is the belief that using Virtual Hot Spare (VHS) to reserve space for smartfailing a drive is not always necessary.

To clarify these issues and other concerns about cluster capacity, I interviewed one of Isilon’s top experts on this topic, Bernie Case. Bernie is a Technical Support Engineer V in Global Services at Isilon, with many years of experience working with customers who experience maximum cluster capacity scenarios. He is also a contributing author to the Best Practices for Maintaining Enough Free Space on Isilon Clusters and Pools guide. In this blog post, Bernie answers questions about cluster capacity and provides advice and solutions.

Q: What are common scenarios in the field that lead to a cluster reaching capacity?

A: The typical scenario is increased data ingest, which can come from either a normal or an unexpected workflow. If you’re adding a new node or replacing nodes to add capacity, and it takes longer than expected, a normal workflow will continue to write data into the cluster, possibly causing the cluster to reach capacity. Another scenario is a drive or node failure on an already fairly full cluster. The failure requires a FlexProtect (or FlexProtectLin) job from the Job Engine to run and re-protect data, which interrupts normal SnapshotDelete jobs. [See EMC Isilon Job Engine to learn more about these jobs.] Finally, I’ve seen snapshot policies that create a volume of snapshots that takes a long time to delete even after snapshot expiration. [See Best Practices for Working with Snapshots for snapshot schedule tips.]

Q: What are common misperceptions about cluster capacity?

A: Some common misconceptions include:

  • 95 percent of a 1 PiB cluster still leaves about 50 TiB of space. That’s plenty for our workflow. We won’t fill that up.
  • Filling up one tier and relying on spillover to another tier won’t affect performance.
  • The SnapshotDelete job should be able to keep up with our snapshot creation rate.
  • Virtual Hot Spare (VHS) is not necessary in our workflow; we need that space for our workflow.
  • It’s still very easy to delete data when the cluster is 100 percent full.
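
To put the first misconception in numbers, here is a minimal sketch of the arithmetic. The ingest rates are hypothetical values chosen for illustration; they are not figures from Bernie or from the best practices guide.

```python
# Free space left on a cluster at a given utilization, and how long it
# lasts at a steady ingest rate. Ingest rates are illustrative assumptions.

TIB_PER_PIB = 1024


def free_space_tib(total_pib: float, percent_used: float) -> float:
    """Return the remaining free space in TiB."""
    return total_pib * TIB_PER_PIB * (100.0 - percent_used) / 100.0


def days_until_full(free_tib: float, ingest_tib_per_day: float) -> float:
    """Return the number of days until free space is exhausted."""
    if ingest_tib_per_day <= 0:
        raise ValueError("ingest rate must be positive")
    return free_tib / ingest_tib_per_day


free = free_space_tib(total_pib=1, percent_used=95)
for rate in (1, 5, 10):  # hypothetical TiB written per day
    print(f"{rate} TiB/day -> full in {days_until_full(free, rate):.1f} days")
```

At 95 percent used, a 1 PiB cluster has 51.2 TiB free. At an assumed 5 TiB of new data per day, that space is gone in roughly ten days, which may be less time than it takes to get new nodes on-site.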

Q: What are the ramifications of a full cluster?

A: When a cluster reaches full capacity, you’re dealing primarily with data-unavailable situations, where data can be read but not written. For example, a customer might be unable to run SyncIQ policies, because those policies write data into the root file system (/ifs). It also becomes impossible to make cluster configuration changes, because those configurations are stored within /ifs.

Finally, a remove (rm) command for deleting files may not function when a cluster is completely full, requiring support intervention.

Q: What should a customer do immediately if their cluster is approaching 90-95 percent capacity?

A: Do whatever you can to slow down the ingesting or retention of data, including moving data to other storage tiers or other clusters, or adjusting snapshot policies. To gain a little bit of temporary space, make sure that VHS is not disabled.

Call your EMC account team to prepare for more storage capacity. You should do this at around 80-85 percent capacity. It does take time to get those nodes on-site, and you don’t want any downtime.

VHS in SmartPools settings should always be enabled to set aside space for a drive failure. Keep the default of at least 1 virtual drive, with reserved space set to 0% of total storage. For more information on these default values, see KB 88964 on the EMC Online Support site.

Q: What are the most effective short-term solutions for managing or monitoring cluster capacity?

A: Quotas are an effective way to see real-time storage usage within a directory, particularly if you put directories in specific storage tiers or node pools. Leverage quotas wherever you can.

The TreeDelete job [in the Job Engine] can quickly delete data, but make sure that the data you’re deleting isn’t just going into a snapshot!

Q: What are the most effective long-term solutions to implement from the best practices guide?

A: Make sure you have event notifications properly configured, so that when jobs fail, or drives fail, you’ll know it and can take immediate action. In addition to notifications and alerts, you can use Simple Network Management Protocol (SNMP) to monitor cluster space, for an additional layer of protection.

InsightIQ and the FSAnalyze job [which the system runs to create data for InsightIQ’s file system analytics tools] can give great views into storage usage and change rate over time, particularly in terms of daily, weekly, or monthly data ingest.

Q: Is there anything you would like to add?

A: Cluster-full situations where the rm command doesn’t work are sometimes alarming. In a file system such as OneFS, a file deletion often requires a read-modify-write cycle for metadata structures, in addition to the usual unlinking and garbage collection that occurs within the file system. Getting out of that situation can be challenging and sometimes time-consuming. Resolving it requires a support call—and a remote session, which can be a big problem for private clusters.

Sometimes accidents happen or a node can fail, which can push a cluster to the limit of capacity thresholds. Incidents such as these can occasionally lead to data unavailability situations that can halt a customer’s workflow. Being ready to add capacity at 80-85 percent can prevent just this sort of situation.

EMC Isilon InsightIQ 3.1 is now available!

Kirsten Gantenbein
Principal Content Strategist at EMC Isilon Storage Division

The latest release of EMC® Isilon® InsightIQ includes new and enhanced reports that help you become a rock star at managing space on your cluster.

New file system reports

The following new reports are available to help you manage cluster capacity, deduplication, and quotas in OneFS. For useful tips about these reports, refer to the InsightIQ 3.1 User Guide.

Usable capacity reporting
Do you often wonder how much free space is on your cluster when accounting for the space that is being used to protect your data? The usable capacity report is an excellent resource that helps you prevent your cluster from reaching capacity. The report anticipates how much protection overhead you might need in addition to capacity that is already reserved for snapshots and virtual hot spares. Essentially, the report breaks down an estimate of how much capacity can be used for storing data and how much capacity can be reserved for protecting your data. Keep in mind that this report only provides estimates.

Usable capacity report in InsightIQ 3.1

Deduplication reporting

Running a deduplication job in OneFS by using the SmartDedupe® software module creates free space on your cluster. In OneFS, you can assess the amount of disk space you’ll save before you start a deduplication job. You can also do this in InsightIQ 3.1. However, InsightIQ also lets you view historical and current information about how much space is saved by deduplication over a specific range of time.

Two sections from the deduplication report in InsightIQ 3.1. Historical deduplication job information is not cumulative.

Quota reporting
The quota report enables you to simplify quota management in OneFS. This report displays information about quotas created through the SmartQuotas® software module. You can view quotas that are assigned to specific directories, the limits defined by those quotas, and the amount of data stored in the directories that those quotas are applied to. This information can help you compare the data usage of a directory to the quota limits over time, and predict when a directory is likely to reach its quota limit.

Two sections from the quota report in InsightIQ 3.1. Click on a directory to view quota limit usage over time. Historical data is generated by quota reports in OneFS.
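
The kind of forecast that the quota report supports can be sketched in a few lines. This is not how InsightIQ computes anything; it is an illustrative least-squares trend over hypothetical daily usage samples, and the function name, samples, and limit are all made up for the example.

```python
# Illustrative sketch: estimate when a directory will hit its quota limit
# by fitting a straight line to daily usage samples (ordinary least squares).
# The samples and the limit below are hypothetical.
from typing import Optional, Sequence


def days_until_limit(usage_tib: Sequence[float], limit_tib: float) -> Optional[float]:
    """Return days from the last sample until the trend line reaches the limit.

    usage_tib holds one sample per day, oldest first. Returns None when
    usage is flat or shrinking, and 0.0 if the latest sample is already
    at or above the limit.
    """
    n = len(usage_tib)
    if n < 2:
        raise ValueError("need at least two samples")
    if usage_tib[-1] >= limit_tib:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(usage_tib) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage_tib))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # TiB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (limit_tib - intercept) / slope
    return max(day_at_limit - (n - 1), 0.0)


samples = [40.0, 41.0, 42.0, 43.0, 44.0]  # hypothetical daily usage in TiB
print(days_until_limit(samples, limit_tib=50.0))
```

A straight-line fit is the simplest possible model. Real directories often grow in bursts, so treat the result as a rough early warning, not a prediction.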

Enhanced reporting

InsightIQ 3.1 includes enhancements to cache reporting and exporting capabilities on file system analytics reports. For example, you can now view information about L3 cache usage in performance reports and download the data from file system analytic reports to CSV files.

Upgrading or installing InsightIQ 3.1

If you want to install this release, review the InsightIQ 3.1 Installation Guide for requirements and procedures. If you want to upgrade to this release, first explore your upgrade options covered in the Isilon Supportability and Compatibility Guide, and then perform the procedure provided in the InsightIQ 3.1 Installation Guide.

For more information about all the features, fixes, and changes in functionality in this release, refer to the InsightIQ 3.1 Release Notes. For information about using InsightIQ to monitor your cluster, refer to the InsightIQ 3.1 User Guide.

How EMC Isilon InsightIQ helps keep your cluster running smoothly

Kirsten Gantenbein
Principal Content Strategist at EMC Isilon Storage Division

We highlighted a crucial cluster maintenance task in a recent blog post: monitoring cluster performance and storage capacity. Now, we’d like to take a closer look at InsightIQ™—a remarkably useful tool for monitoring your EMC® Isilon® cluster. As a licensed software module for the OneFS® operating system, InsightIQ makes it possible for you to monitor one or more clusters.

InsightIQ provides charts, graphs, and customized reports to help you understand how your cluster is performing. To learn how InsightIQ 3.0 works, watch the EMC Isilon InsightIQ Overview video, where Corporate Systems Engineer Jason Sturgeon describes the architecture of InsightIQ and shows you how to use InsightIQ reports.

How reports can help with monitoring

InsightIQ enables you to create customized reports to display data about clusters over specific periods of time. There are two general types of reports: performance reports and file system reports.

Performance reports

By monitoring cluster performance and data usage, you can help to ensure that your cluster runs smoothly. In the InsightIQ overview video, learn how to create a performance report to check cluster activity and a file system report to visualize data usage over time.

For example, if your cluster is running slowly, you can run a performance report and review the external network throughput rate. In the following figure, the performance report shows that one node is handling more network traffic than other nodes. With this information, you can redirect network traffic to other nodes and improve cluster performance.

A performance report in InsightIQ 3.0 that shows which node is handling the majority of network traffic.

File system reports

InsightIQ reports can also help you to monitor cluster data usage by generating a file system report. The video describes a use case where you can explore data growth in a directory over time. For example, after running a file system report you might notice that a directory is storing a large amount of data (see the following figure). If this data hasn’t been accessed for a long time, you can approach the directory owner to determine whether this data can be deleted to increase storage capacity.

A file system report in InsightIQ 3.0 that shows a directory storing a large amount of data.

For more information about reporting capabilities in InsightIQ, refer to the InsightIQ documentation on the EMC Online Support site (login required).

How to keep your EMC Isilon cluster from reaching capacity

Kirsten Gantenbein
Principal Content Strategist at EMC Isilon Storage Division

It’s important to maintain enough free space on your EMC® Isilon® cluster to ensure that data is protected and workflows are not disrupted. You should have at least one node’s worth of free space available in case you need to protect data on a failing drive.

When your Isilon cluster fills up to more than 90% capacity, cluster performance is affected. Several issues can occur when your cluster fills up to 98% capacity, such as substantially slower performance, failed file operations, the inability to write or delete data, and the potential for data loss. It might take several days to resolve these issues. If you have a full cluster, nearly full cluster, or need assistance with maintaining enough free space, contact EMC Isilon Technical Support.
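
The thresholds above lend themselves to a simple check. The following sketch maps a used-capacity percentage to the status this post describes. The function name and status strings are assumptions for illustration, and the used and total figures would come from whatever monitoring source you already have.

```python
# Illustrative sketch: classify cluster fullness using the 90% and 98%
# thresholds described in this post. Names and messages are hypothetical.

def capacity_status(used_bytes: int, total_bytes: int) -> str:
    """Return a short status string for the given usage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    percent_used = 100.0 * used_bytes / total_bytes
    if percent_used >= 98.0:
        return "critical: writes and deletes may fail; contact support"
    if percent_used > 90.0:
        return "warning: performance is affected; free up or add space"
    return "ok"


print(capacity_status(used_bytes=915, total_bytes=1000))
```

A check like this is only a convenience on top of the monitoring options described below; it does not replace event notifications or InsightIQ.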

Fortunately, there are several best practices you can follow to help prevent your Isilon cluster from becoming too full. These are detailed in the “Best Practices Guide for Maintaining Enough Free Space on Isilon Clusters and Pools” (requires login to the EMC Online Support site). Some of these best practices are summarized in this blog post.

Monitoring cluster capacity

To prevent your cluster from becoming too full, monitor your cluster capacity. There are several ways to do this. For example, you can configure email event notification rules in the EMC Isilon OneFS® operating system to notify you when your cluster is reaching capacity. Watch the video “How to Set Up Email Notifications in OneFS When a Cluster Reaches Capacity” for a demonstration of this procedure.

Another way to monitor cluster capacity is to use EMC Isilon InsightIQ™ software. If you have InsightIQ licensed on your cluster, you can run FSAnalyze jobs in OneFS to create data for InsightIQ’s file system analytics tools. You can then use InsightIQ’s Dashboard and Performance Reporting to monitor cluster capacity. For example, Performance Reports enable you to view information about the activity of the nodes, networks, clients, disks, and more. The Storage Capacity section of a performance report displays the used and total storage capacity for the monitored cluster over time (Figure 1).

Figure 1: The Storage Capacity section of a Performance Report in InsightIQ 3.0.

For more information about InsightIQ Performance Reports, see the InsightIQ User Guides, which can be found on the EMC Online Support site.

To learn about additional ways to monitor cluster capacity, such as using SmartQuotas, read “Best Practices Guide for Maintaining Enough Free Space on Isilon Clusters and Pools.”

More best practices

Follow these additional tips to maintain enough free space on your cluster:

  • Manage your data
    Regularly delete data that is rarely accessed or used.
  • Manage snapshots
    Snapshots, which are used for data protection in OneFS, can take up space if they are no longer needed. Read the best practices guide for tips about managing snapshots, or read the blog post “EMC Isilon SnapshotIQ: An overview and best practices.”
  • Make sure all nodes in a node pool or disk pool are compatible
    If you have a node pool that contains a mix of different node capacities, you can receive “cluster full” errors even if only the smallest node in your node pool reaches capacity. To avoid this scenario, ensure that nodes in each node pool or disk pool are of compatible types. Read the best practices guide for information about node compatibility and for a procedure to verify that all nodes in each node pool are compatible.
  • Enable Virtual Hot Spare
    Virtual Hot Spare (VHS) keeps space in reserve in case you need to move data off a failing drive (smartfail). VHS is enabled by default. For more information about VHS, read the knowledge base article, “OneFS: How to enable and configure Virtual Hot Spare (VHS) (88964)” (requires login to the EMC Online Support site).
  • Enable Spillover
    Spillover allows data that is being sent to a full pool to be diverted to an alternate pool. If you have licensed EMC Isilon SmartPools™ software, you can designate a spillover location. For more information about SmartPools, read the OneFS Web Administration Guide.
  • Add nodes
    If you want to scale out your storage to add more free space, contact your sales representative.
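
The node compatibility tip can be illustrated with some simple arithmetic. The sketch below assumes, purely for illustration, that data is spread evenly across every node in a pool. Actual OneFS data layout is more involved, and the node sizes are hypothetical.

```python
# Illustrative sketch: in a pool of mixed-size nodes, the smallest node
# fills first. ASSUMPTION: writes are spread evenly across all nodes,
# which simplifies how OneFS actually places data.
from typing import Sequence


def usable_before_first_full(node_capacities_tib: Sequence[float]) -> float:
    """Return how much data fits before the smallest node is full."""
    if not node_capacities_tib:
        raise ValueError("pool has no nodes")
    return len(node_capacities_tib) * min(node_capacities_tib)


pool = [36.0, 36.0, 36.0, 72.0]  # hypothetical node sizes in TiB
raw = sum(pool)
usable = usable_before_first_full(pool)
print(f"raw {raw:.0f} TiB, usable before first node fills {usable:.0f} TiB")
```

Under this simplified model, the extra capacity on the larger node goes unused once the smaller nodes are full, which is why a mixed pool can report “cluster full” while raw free space remains.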

If you have questions or feedback about this blog or the video described in it, send an email to isi.knowledge@emc.com. To provide documentation feedback or request new content, send an email to isicontent@emc.com.