Bad PCIe cables can really mess up your lab and cause chaos in a data centre.
If you plug in a cable and nothing happens at all, you will probably notice that it did not link up. Other failures can be harder to detect, though.
Cables that break the spec in some minor way can work fine until a system upgrade, and then suddenly you start to see problems.
We used our new Cable Tester to decode and validate the identification data on cables in our lab and found some interesting issues.
Active cables for SAS and external PCIe contain an EEPROM on each end, which allows them to communicate basic details to the host system. The Quarch Cable Tester can decode and display these.
Here is a part of a decode. I have cut off the rest to hide the manufacturer and avoid embarrassing them!
This is a 16Gb/s, 4-lane, Mini Multilane HD cable. You would not guess that from the decode, though!
External PCIe 4 cables with incorrect speed rating
These cables specify the data rate they support in an EEPROM register, but the spec currently only goes up to 8Gb/s (Gen3). There is a reasonable assumption about what the ID bits will be for Gen4, but it's not official, and many Gen4 cables we have seen still report only 8Gb/s compatibility. If a later update to your system starts checking for Gen4 cable compatibility, it might cause an issue.
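To make the risk concrete, here is a minimal sketch of the kind of strict speed check a future host update might introduce. The `CableInfo` type, its `reported_max_gbps` field, and the `link_permitted` function are hypothetical names chosen for illustration; they are not taken from any spec or from the Cable Tester software.

```python
from dataclasses import dataclass


@dataclass
class CableInfo:
    # Hypothetical field: highest data rate the EEPROM claims, in Gb/s.
    reported_max_gbps: float


def link_permitted(cable: CableInfo, target_gbps: float, strict: bool) -> bool:
    """Return True if the host would allow the link at target_gbps."""
    if not strict:
        # Lenient host: ignores the EEPROM claim and relies on link training.
        return True
    # Strict host: trusts the EEPROM and refuses rates the cable does not claim.
    return cable.reported_max_gbps >= target_gbps


# A Gen4-capable cable whose EEPROM still reports the Gen3 rate.
gen4_cable = CableInfo(reported_max_gbps=8.0)
print(link_permitted(gen4_cable, 16.0, strict=False))  # True
print(link_permitted(gen4_cable, 16.0, strict=True))   # False
```

The same physical cable works today under the lenient policy and would be rejected the day the strict policy is switched on.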
External PCIe 4 cable, No cable ID
There is an EEPROM register specified for Cable ID, which is a primary way to identify what the cable claims to be. As seen above, a major manufacturer set this to 0 (Unknown) in their cable instead of the correct value identifying it as a “Shielded Mini Multilane HD 4X”. If your system does not check this, it will not matter… for now.
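A check for this is simple to sketch. The function below is an illustration only: `check_cable_id` is a hypothetical helper, the only value taken from this article is 0 meaning Unknown, and the expected ID must be looked up in the relevant spec rather than taken from the placeholder used here.

```python
UNKNOWN_CABLE_ID = 0  # as described in the text, 0 decodes as "Unknown"


def check_cable_id(reported_id: int, expected_id: int) -> str:
    """Classify the Cable ID value read from the EEPROM."""
    if reported_id == UNKNOWN_CABLE_ID:
        return "unknown"   # the cable does not say what it is
    if reported_id != expected_id:
        return "mismatch"  # the cable claims to be something else
    return "ok"


# Placeholder for illustration; take the real value from the spec.
EXAMPLE_EXPECTED_ID = 0x24

print(check_cable_id(0, EXAMPLE_EXPECTED_ID))  # unknown
```

Treating "unknown" separately from "mismatch" lets a host warn about an unprogrammed field without rejecting the cable outright.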
SAS / PCIe Spec conflict
One cable that caused us trouble recently populated the data fields for both SAS and PCIe. This meant we were unable to work out which interface the cable was designed for. It’s unclear whether this is a failure to follow the spec or just an unusual but valid implementation. The spec does not state this explicitly, so different vendors may make different choices, which can cause conflicts later.
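One way to surface this kind of ambiguity is to list every interface a cable claims and flag anything other than exactly one. This is a rough sketch that assumes the EEPROM has already been decoded into a dictionary; the keys `sas_compliance` and `pcie_compliance` are hypothetical names, not real register names.

```python
def claimed_interfaces(fields: dict[str, int]) -> list[str]:
    """Return the interfaces whose (hypothetical) compliance field is non-zero."""
    claimed = []
    if fields.get("sas_compliance", 0) != 0:
        claimed.append("SAS")
    if fields.get("pcie_compliance", 0) != 0:
        claimed.append("PCIe")
    return claimed


# Example values are made up for illustration.
decoded = {"sas_compliance": 0x01, "pcie_compliance": 0x02}
interfaces = claimed_interfaces(decoded)
if len(interfaces) != 1:
    print("Ambiguous cable, claims:", interfaces)
```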
Unusual characters in strings
Several cables we have tested appear to have unusual characters in strings, such as the vendor serial number. These are intended to be standard text characters for a user to read, but if they are not, anything parsing the data later may fail. We had a problem in our own software where we were creating log files named after the serial number of the cable. The ones with bad characters caused an exception, as we had generated an invalid file name.
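A defensive way to handle this is to sanitise the string before using it in a file name. The sketch below is not the fix we shipped; `safe_log_name` and its fallback name are hypothetical, and the allowed character set is a conservative assumption.

```python
import re


def safe_log_name(serial: str, fallback: str = "unknown_serial") -> str:
    """Build a log file name from a serial number that may contain bad characters."""
    # Replace anything outside a conservative ASCII set with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", serial)
    # Drop leading/trailing underscores and dots (e.g. from padding or garbage).
    cleaned = cleaned.strip("._")
    # Fall back to a fixed name if nothing usable is left.
    return f"{cleaned or fallback}.log"


print(safe_log_name("SN12/34:AB"))  # SN12_34_AB.log
print(safe_log_name("ABC123   "))   # ABC123.log
print(safe_log_name("\x00\xff"))    # unknown_serial.log
```

Note that two different raw serial numbers can map to the same cleaned name, so it is worth logging the raw bytes inside the file as well.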
All these cables passed link training, mapping, and BERT tests (testing for data corruption errors). They also show nice, clean eye diagrams. They would all work fine if you did not care about what they claimed to be. If your system reads and uses the EEPROM data in any way, there is a potential for problems.
Given that, I would suggest you verify the data the manufacturer has programmed into the EEPROM, to make certain the cable you buy today is not going to cause problems tomorrow!