APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-06-28 06:58 AM . Last Modified: 2024-03-18 01:04 AM
Has anyone experienced Lost Comms alarms on the AP8xxx series rack PDUs?
I recently suffered a network storm caused by a duplicate uplink between some APC switches that form the private LAN on my DCE server. Since then, a number of AP8xxx series PDUs have lost comms and are not recoverable. I have 1200 devices connected to this LAN and only some of the AP8xxx series devices failed to recover. There is nothing consistent about the failures.
We have a number of different failure scenarios, including lost IP config, dead LCD screens, blank LCD screens (lit but no words), lost component to card errors, lost phase to card errors and lost component to EEPROM errors.
We've reset the cards, formatted, re-applied IP configs, flashed firmware over serial and USB connections and replaced management cards.
I'd be interested to learn if anyone else has seen anything similar. It's currently under escalation with the LOB but nobody seems to have any concrete answers.
Posted: 2021-06-28 06:59 AM . Last Modified: 2024-03-18 01:03 AM
Hi Graham,
I wish I could do more to help from my position, but I just wanted to say I understand the concerns you're having. I know the local team is working with our team here to review the entire situation and provide a resolution, both short and long term.
Posted: 2021-06-28 06:58 AM . Last Modified: 2024-03-18 01:04 AM
Hi Angela,
Thanks for your latest update. I'll certainly be escalating the December release though.
The only numbers I can see on the NMCs removed from the 8853s are on the base PCB (641-0034B-Z-REV07) and the daughter board (640-2812A-Z Rev 03). The ribbon cable has a label marked OW4744.
The replacement NMCs have a daughter board Rev 04 and the ribbon cable is labelled OW4744A. The footprint is physically different, and if we plug a new card into an existing PDU we get multiple component errors.
thanks, Graham
Posted: 2021-06-28 06:58 AM . Last Modified: 2024-03-18 01:04 AM
Hello,
I've been actively involved in your specific escalation and just wanted to clarify a few points, in case you have not gotten a clear answer from the support team in your region. The remedy for this type of strange behavior during the network storm will be a firmware fix. In our testing we were able to replicate behavior similar to what you saw. I believe you will get more detail offline on the specific issues that are being addressed, if required.
When the firmware fix is available, the behavior will be as follows: the devices will periodically restart their network interfaces while the storm is present because of the heavy traffic going by, and the restarts should cease when the storm is removed. With the firmware fix, no reset of the unit in any form will be needed after the storm is removed; the unit will recover normally and resume operation.
I am not sure if this was communicated to you, but until a firmware fix is available, your best bet is disabling NTP (if you can) on the devices, since having NTP enabled seems to make the impact considerably worse on the current revision. With NTP disabled in our testing, the worst we saw was that you'd have to press the pinhole reset button, or log in via telnet/SSH and issue a reboot command, for the application to boot once the storm is removed. Disabling NTP can be done with a mass config change via DCE.
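To work out which units might still need the pinhole reset or a reboot command after a storm clears, something like the following rough Python sketch could help. It only checks whether each device accepts a TCP connection on its management port. The port (22, assuming SSH is enabled on the cards) and the example addresses are assumptions for illustration, not values from this case, and an open port does not by itself prove the application has fully booted.

```python
import socket
from typing import Iterable, List, Tuple


def is_reachable(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def split_by_reachability(
    hosts: Iterable[str], port: int = 22, timeout: float = 2.0
) -> Tuple[List[str], List[str]]:
    """Split hosts into (responding, silent) based on a TCP connect test."""
    responding: List[str] = []
    silent: List[str] = []
    for host in hosts:
        if is_reachable(host, port, timeout):
            responding.append(host)
        else:
            silent.append(host)
    return responding, silent


if __name__ == "__main__":
    # Hypothetical PDU addresses (documentation range); replace with your own.
    pdus = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    ok, need_attention = split_by_reachability(pdus)
    print("Responding:", ok)
    print("Candidates for manual reset or reboot:", need_attention)
```

Devices in the second list are only candidates; it is still worth confirming on the unit itself before resetting anything.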
Posted: 2021-06-28 06:58 AM . Last Modified: 2024-03-18 01:04 AM
Thanks Angela. I hadn't got the testing update, but am encouraged by it.
I disabled NTP a few weeks ago as suggested, thanks. Do you have a view on when the firmware fix will be available?
I still have 13 PDUs that are unrecoverable. The local Schneider guys have been back on site over the last few days and we've now discovered that the NMCs in the very latest APxxx series don't physically fit into the older ones, meaning I have to schedule downtime to replace entire PDUs. Not impressed, as the AP8xxx series were sold to us as having hot swappable management cards. Clearly not the case.
Once we've swapped the units out, I guess they'll send the faulty ones back to you guys for forensic analysis. It remains my view, given the myriad faults we've witnessed and the lack of consistency between them, that there is something inherently wrong with the 8000 series. We never had these issues on the 7000s and I have over 5000 deployed.
Posted: 2021-06-28 06:59 AM . Last Modified: 2024-03-18 01:04 AM
Hi again,
The release that includes that fix, along with the big move to v6.0.X (and all the fixes and enhancements that already includes), is currently scheduled for December.
I found your case and I only see the RMA for the PDUs. Do you know the part number on the new displays they were trying to put in these older PDUs? It might be a little too late, because I know they did not ask me directly about this, but there are three (3) display part numbers available because some are a little different. My first thought is that they were trying to force the incorrect display into the PDUs.
Here are the part numbers of the displays and what models they work with. I know you have AP8853.
Do you know which of those they tried to force in, if you have any lying around? Either way, the displays are not supposed to be swapped hot, which I know has been done to avoid scheduling downtime. Unfortunately, I wasn't aware they were sold to you as hot swappable; we definitely indicate "field replaceable", but with the requirement of powering down.
Anyway, I am sorry for all of the trouble with this. The AP8XXX and AP7XXX are different platforms in all ways (hardware, management card, etc.), so from our point of view, the fact that the AP7XXX does one thing does not mean the AP8XXX will behave the same way, but I understand your point of view as the customer. We will continue to research these issues and the behavior you saw to ensure the robustness of the product moving forward.
Posted: 2021-06-28 06:59 AM . Last Modified: 2024-03-18 01:03 AM
I can understand if you want to escalate the date. Maybe other options for an earlier date can be provided, but that could potentially impact the 6.0.6 release, which offers multi-user, firewall, and many other new features and enhancements.
I have a display too and it shows the same info you have. I am not seeing the 0G or 0J part numbers.
You don't still have the boxes, do you, or any other RMA numbers? I searched further and couldn't find any additional references on my end. Or were you taking displays from other available AP8853s rather than Schneider providing spare display parts?
I am still wondering if the wrong displays were used or were taken from other units because, as you noticed, some of the footprints are different with respect to the metal "chassis" the NMCs and display boards sit in.
Anyway, I know the PDUs have already arrived, so maybe this is not worth looking into further if you don't feel the need.