T400 CRC errors on shelf/node

fsprout
Posts: 7
Joined: Wed Apr 16, 2014 10:58 am

Re: T400 CRC errors on shelf/node

Post by fsprout »

Yes, we had looked at that as well. Unfortunately, the particular PD that failed wasn't showing more errors than the others on the same shelf. We even checked other shelves, and their error counts were higher still. At the time, we marked it as possible, but not probable, and started down the road of loop replacement.

It was good to do -- we fixed some fiber cables that had too tight a bend radius.

Frostie
BryanW
Posts: 71
Joined: Sat May 03, 2014 2:01 pm
Location: Dallas, TX

Re: T400 CRC errors on shelf/node

Post by BryanW »

FWIW - Any time I suspect a disk is acting up and not being spared out correctly, I use SystemReporter to check if any PDIDs show abnormal latency (servicetime in 3PAR speak). Usually you will see the latency rise out of step with the herd as the disk goes bad.
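For anyone who would rather script the "out of step with the herd" check than eyeball a graph, here is a minimal sketch. It assumes you have already exported the per-PD average service times (in ms) from SystemReporter into a dict by some means; the function name, the 3x-median factor, the 5 ms floor, and the sample numbers are all made up for illustration and are not part of SR.

```python
from statistics import median

def find_slow_pds(svctime_ms, factor=3.0, floor_ms=5.0):
    """Return the PD IDs whose service time is well above the rest.

    svctime_ms: dict mapping PD ID -> average service time in ms
    factor:     multiple of the median a PD must exceed (assumed value)
    floor_ms:   ignore PDs below this absolute latency (assumed value)
    """
    if not svctime_ms:
        return []
    herd = median(svctime_ms.values())  # the median is robust to one bad disk
    return sorted(
        pdid for pdid, ms in svctime_ms.items()
        if ms > floor_ms and ms > factor * herd
    )

# Hypothetical sample values, not taken from a real array
sample = {0: 8.1, 1: 7.9, 2: 8.4, 3: 41.7, 4: 8.0, 5: 7.6}
print(find_slow_pds(sample))  # prints [3]
```

The median is used instead of the mean so that the one misbehaving disk does not drag the baseline up with it. Tune the factor and floor to your own workload.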

I have seen the issue manifest as latency spikes during AO or tunesys runs. I have SR alerting set up to notify me if any VV on any system hits 100 ms of servicetime, with a limit count of 3.

Here is what it looks like when a SAS disk starts to call in late for work but the array doesn't notice:
Image

Here is the SR URL:
http://<YOURSYSTEMREPORTERURL>/cgi-bin/3par-rpts/inserv_perf.exe?reptype=vstime&compare=PDID&maxgraphs=16&comparesel=total_svctms&refresh=&begintsecs=&endtsecs=&txtfromselpdid=&selpdid=--All+PDIDs--&selnsp=&seldiskspeed=--All+Disk+Speeds--&seldisktype=--All+Disk+Types--&charttab=chart&chartlib=gdgraph&charttype=lines&graphx=&graphy=&timeform=Auto&graphlegpos=&report=pd_perf_time&category=hourly&selsys=<YOURARRAYNAME>
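
If you have several arrays, the URL above can be generated rather than hand-edited. This is just a sketch that reproduces the same query string with Python's standard library; the function name and the example host and array names are hypothetical, and the parameter list is copied from the URL above rather than from any SR documentation.

```python
from urllib.parse import urlencode

# Query parameters copied from the report URL above, in the same order.
# Empty strings are kept so the generated URL matches the original.
REPORT_PARAMS = [
    ("reptype", "vstime"), ("compare", "PDID"), ("maxgraphs", "16"),
    ("comparesel", "total_svctms"), ("refresh", ""), ("begintsecs", ""),
    ("endtsecs", ""), ("txtfromselpdid", ""), ("selpdid", "--All PDIDs--"),
    ("selnsp", ""), ("seldiskspeed", "--All Disk Speeds--"),
    ("seldisktype", "--All Disk Types--"), ("charttab", "chart"),
    ("chartlib", "gdgraph"), ("charttype", "lines"), ("graphx", ""),
    ("graphy", ""), ("timeform", "Auto"), ("graphlegpos", ""),
    ("report", "pd_perf_time"), ("category", "hourly"),
]

def pd_svctime_report_url(sr_host, array_name):
    """Build the per-PD service time report URL for one array."""
    params = REPORT_PARAMS + [("selsys", array_name)]
    # urlencode turns spaces into '+', as in the original URL
    return f"http://{sr_host}/cgi-bin/3par-rpts/inserv_perf.exe?{urlencode(params)}"

# Hypothetical host and array names
for name in ("array1", "array2"):
    print(pd_svctime_report_url("sr.example.com", name))
```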
Bryan W
Senior Architect/Manager of System Infrastructure, Dallas TX
https://www.linkedin.com/in/bryanlwhite