Archive for the ‘Troubleshooting’ Category

Cool tool: the Isilon Self-Service Platform utility

Risa Galant
Principal Technical Writer at EMC Isilon Storage Division

Want to have more control over troubleshooting issues on your Isilon cluster? How about being able to resolve common issues yourself, bypassing the support queue? Well, you’re in luck! The Isilon Self-Service Platform utility (SSP) allows you to do just that. Use the SSP utility to perform first-line troubleshooting on your cluster. You’ll be using the same utility that EMC Isilon Technical Support Engineers and Field Representatives use to prevent and troubleshoot a wide range of known issues that occur on Isilon clusters.
Note: The SSP utility runs only on Windows platforms.

How the SSP utility works

The SSP utility analyzes a cluster log set: it does not run live on the cluster. You first collect the cluster log files (the *.tgz files), then run the SSP utility. The utility runs a series of checks on the cluster log files and generates a diagnostic report of the current health of the cluster. You can run checks for specific categories such as pre-upgrade or health check, or choose specific checks for a custom run.

After the analysis completes, the Overall Health screen presents a summary of the basic health of your cluster. It includes high-level details about the cluster and the names and status of the individual nodes on the cluster. Summary results are color-coded: red for critical, orange for needs attention, and green for okay. To help you resolve issues, the report also provides links to the relevant documentation.
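To make the color coding concrete, here is a minimal Python sketch of how per-check results might roll up into a single overall status. The check names and the "worst color wins" rule are assumptions made for illustration; the SSP utility's actual logic isn't documented here.

```python
# Hypothetical illustration only: the colors mirror the summary described
# above, but the roll-up rule is an assumption, not the SSP utility's logic.
SEVERITY = {"green": 0, "orange": 1, "red": 2}


def overall_status(check_results):
    """Return the most severe color among the per-check results."""
    if not check_results:
        return "green"  # assumption: nothing to report counts as okay
    return max(check_results.values(), key=SEVERITY.__getitem__)


# Made-up check names and results.
results = {"node_health": "green", "free_space": "orange", "nvram_battery": "green"}
print(overall_status(results))
```

With the sample data, the one orange result outranks the green ones, so the overall status is orange.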

You can run the SSP utility using its command line interface or the SSP GUI.

Getting the Self-Service Platform utility

Go to the Isilon Self-Service Platform Info Hub and download and unpack the SSP .zip file. The .zip file contains the SSP utility executable file as well as the user guide.

Collecting the cluster log files

Before running the SSP utility, you must collect the cluster log files. You can use the isi gather command or the OneFS WebUI to collect the files. After they’re collected, copy them to a convenient location that the SSP utility can access. When you run the SSP utility, you specify the location of the log files and a location for the results files, and you choose the checks that you want to run. The SSP utility places the generated results files for each test in a separate folder in the results location that you specified.
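If you like to double-check your inputs before launching the utility, a short script can confirm that the log folder really contains *.tgz files and that the results folder exists. This is only a sketch: the function names are made up for this example, and it doesn't call or depend on the SSP utility in any way.

```python
from pathlib import Path


def find_log_archives(log_dir):
    """Return the *.tgz files found directly in log_dir, sorted by name."""
    folder = Path(log_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Log folder not found: {folder}")
    return sorted(p for p in folder.glob("*.tgz") if p.is_file())


def prepare_results_dir(results_dir):
    """Create the results folder if it doesn't exist and return it as a Path."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

For example, call find_log_archives() on the folder where you copied the log set, and stop if the returned list is empty; then pass your chosen output path to prepare_results_dir().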

KB article 304468 explains how to collect the log files, complete with a video demonstration.

Running the SSP utility

To run the SSP utility, double-click the Isilon_Self-Service_Platform.exe file. The main screen appears:

SSP Utility Main Screen

Click the Help button in the upper left to view the SSP utility’s FAQ page.

Enter the log location, your service request (SR) number, and the output path for the generated diagnostic report. After you click Next, you can choose the checks to run. You’ll see a screen with tabs for each of the check categories: pre-upgrade, post-upgrade, health, firmware, and custom, as the following figure shows.

SSP Utility Pre-Upgrade Checks Screen

The checks include:

  • Cluster-level checks such as overall health of the cluster, patches installed, InfiniBand configuration, and upgrade service status
  • Cluster configuration checks such as whether or not ESRS is enabled, file sharing configuration, priority of all routable gateways, and whether any gateways share the same priority
  • Node-level checks such as boot flash drive problems, NVRAM battery status, mismatched nodes, node health, amount of free space, and uptime
  • Node status checks such as the DMI log, device errors, kernel open files, NIC status, netstat connections, and the cluster’s /var/crash partition
  • Disk level checks such as disk load for all disks in each node, node drive bay health, drive errors, and any errors reported in the idi.log file
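To give a feel for what one of these checks looks for, here is a minimal Python sketch of the "do any gateways share the same priority" test from the list above. The input format (address and priority pairs) and the sample addresses are invented for this example; this is not how the SSP utility reads or evaluates the cluster logs.

```python
from collections import defaultdict


def gateways_sharing_priority(gateways):
    """Map each priority used by more than one gateway to those gateways."""
    by_priority = defaultdict(list)
    for address, priority in gateways:
        by_priority[priority].append(address)
    # Keep only the priorities that appear on two or more gateways.
    return {p: addrs for p, addrs in by_priority.items() if len(addrs) > 1}


# Made-up sample data using documentation-range addresses, not real SSP input.
sample = [("192.0.2.1", 1), ("198.51.100.1", 2), ("203.0.113.1", 1)]
print(gateways_sharing_priority(sample))
```

With the sample data, priority 1 is reported as shared by two gateways, which is the kind of condition a check like this would flag for attention.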

You can select the checks to run based on category, such as pre-upgrade check, or use the Custom tab to select only those checks you’re interested in:

SSP utility Select Tests screen

After you select the checks you want, click the Run button. You’ll see a progress indicator at the bottom of the screen, and the SSP utility’s UI is disabled for the duration of the run.

The report results output structure

After the run completes, the SSP utility presents a summary of the results, similar to the following figure (with identifying information redacted).

Summary Screen

You’ll find the results in the folder you specified for the output. The generated results output file structure is similar to the following.

SSP Utility Results File Structure
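Because the results for each test land in a separate folder under the output location, a few lines of Python are enough to get a quick inventory of what a run produced. This sketch assumes only that one-folder-per-test layout; the function name is hypothetical, and the actual folder and file names depend on the checks you ran.

```python
from pathlib import Path


def list_results(results_dir):
    """Map each subfolder of results_dir to the sorted names of the files directly inside it."""
    root = Path(results_dir)
    return {
        sub.name: sorted(f.name for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

Point list_results() at the output path you entered on the main screen to see which folders were created and which files each one contains.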

Go get it!

The SSP utility is a great way to take charge of troubleshooting and performing preventative health checks to resolve or avoid common, known issues on your cluster. Check out the Isilon Self-Service Platform Info Hub for the latest download and documentation. And for information about collecting log files (complete with a video demonstration), see KB article 304468.

Let us know!

Let us know what you think. If you have feedback for us about this or any other Isilon technical content, email us at isicontent@emc.com. And thank you!

Check out the latest Customer Troubleshooting Guides

Risa Galant

Happy New Year! To celebrate, we’ve published a host of new EMC Isilon Customer Troubleshooting Guides. These guides provide step-by-step troubleshooting instructions to help you solve issues that may affect Isilon clusters, or to walk you through the steps needed to gather important data to help EMC Customer Support solve your problem quickly.

This latest set of guides covers topics including Isilon hardware, authentication and permissions, networking and SmartConnect configuration, and InsightIQ.

Visit our Customer Troubleshooting Info Hub for the latest list of published guides. More guides are coming soon!

The new guides are:

Check out these troubleshooting resources and let us know what you think. Email us at isicontent@emc.com with your feedback. And thank you!

Starting and stopping OneFS authentication processes

Risa Galant

OneFS authentication is implemented for many different storage environments, and authentication requirements can differ from setup to setup. Background authentication processes (daemons) add complexity to configuring authentication. Issues can and do arise. But help and guidance are readily available!

Check out Fragile Cogs: Starting and Stopping OneFS Authentication Processes on the Uptime Information Hub for information and advice about starting and stopping authentication daemons. (Hint: Follow the instructions or call EMC Isilon Technical Support first.)

Let us know

If you have feedback for us about this or any other Isilon technical content, email us at isicontent@emc.com. And thank you!

The self-encrypted drives erasure puzzle

Risa Galant

Self-Encrypted Drives (SEDs) provide hardware-level security for sensitive on-disk data. Data on SEDs is encrypted using a combination of an internal key and a drive access password. Using SEDs is simple. After performing the initial drive setup, you don’t have to do anything: SEDs handle data encryption and decryption automatically. If you want to access the data, you have to know the password. Without that password, the protected on-disk data is inaccessible.

But what if something goes wrong? Can you recover the encrypted data if the password or internal keys are lost or deleted? What if someone removes a SED from a powered-on node, or a SED becomes corrupted or is otherwise defective? What if business reasons require that you completely erase the drive? How do you safely go about doing that, and how do you verify the erasure?

To find answers to these questions, check out the Uptime Information Hub article Data erasure and SED drives: An overview and FAQ, available on the EMC Community Network’s Isilon community space. You’ll learn:

  • How SEDs work
  • What happens if a password is lost, a drive becomes defective, or someone tries to make off with the drive
  • How to erase a defective drive and how to erase all SEDs in a node or cluster
  • How to confirm that a SED has been erased
  • How long typical erasure operations take

And more.

Let us know what you think of the Data erasure and SED drives: An overview and FAQ article. If you have feedback for us about this or any other Isilon technical content, email us at isicontent@emc.com. And thank you!

Troubleshooting, anyone?

Risa Galant

Looking for Isilon troubleshooting information? We’ve got some great resources for you! Read on.

EMC Isilon Customer Troubleshooting Guides

We’re very pleased to announce the availability of the first set of new EMC Isilon Customer Troubleshooting Guides! These guides provide step-by-step troubleshooting instructions to help you solve issues that may affect Isilon clusters, or to walk you through the steps needed to gather important data to help EMC Customer Support solve your problem quickly.

The initial set of guides covers a wide range of topics including Isilon hardware, networking, protocols, upgrades, authentication, cluster configuration and administration, capacity, quotas, and more!

Visit the Customer Troubleshooting Guides Info Hub for the latest list of published guides. More guides are coming soon!

As we go to press, the following guides are available:

Uptime Info Hub: Best Practice Troubleshooting Information

You can also find best practice troubleshooting information on the Uptime Info Hub. Topics include:

  • Advanced troubleshooting of an Isilon cluster: a 7-part series covering everything from checking free disk space to managing protocol issues, permissions and access, and hardware events
  • OneFS L3 cache performance and best practices
  • Cluster relocation planning considerations
  • The benefits of upgrading to Target Code
  • Best practices for working with Snapshots

And much more.

Check out these troubleshooting resources and let us know what you think. Email us at isicontent@emc.com with your feedback. And thank you!