APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
I deploy AP8841 (and AP8858NA3) typically in tandem -- two per rack. I chain them together using the In/Out ports.
For several of these, the GUI reports:
Rack PDU 1: Remote RPDU 2 (SN: ZA1545023435) communication lost.
Rack PDU 1: Remote RPDU 2 (SN: ZA1526080808) communication lost.
Rack PDU 1: Remote RPDU 2 (SN: ZA1526080760) communication lost.
And for a fourth:
Rack PDU 2: Remote RPDU 1 (SN: ZA1602081055) communication lost.
What steps might I take to fix this?
- I've reviewed the cabling and claim that the (2) Cat6 jumpers correctly run from "In" to "Out"
- I have tried pressing the Reset button with a paper clip on the 'Slaved' device, no dice. However, I can (temporarily) give it an Ethernet connection, watch it acquire a DHCP address, and log on via the GUI with the default credentials
- All the NMC are running aos_640.bin and rpdu2g_640.bin
--sk
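When several PDUs report this alarm, it can help to pull the serial numbers out of the pasted alarm text so the affected units can be listed and compared. The following is only a minimal sketch: the regular expression is inferred from the four alarm lines quoted above, and the names `ALARM_RE` and `parse_alarms` are made up for illustration. It will not match other alarm wordings.

```python
import re

# Pattern inferred from the alarm lines quoted in the post above (an assumption,
# not an official APC log format).
ALARM_RE = re.compile(
    r"Rack PDU (?P<reporter>\d+): Remote RPDU (?P<remote>\d+) "
    r"\(SN: (?P<serial>[0-9A-Z]+)\) communication lost\."
)


def parse_alarms(text):
    """Return one dict per 'communication lost' alarm found in text."""
    return [
        {
            "reporter": int(m["reporter"]),  # PDU number raising the alarm
            "remote": int(m["remote"]),      # PDU number that went missing
            "serial": m["serial"],           # serial number of the missing PDU
        }
        for m in ALARM_RE.finditer(text)
    ]


alarms = """\
Rack PDU 1: Remote RPDU 2 (SN: ZA1545023435) communication lost.
Rack PDU 1: Remote RPDU 2 (SN: ZA1526080808) communication lost.
Rack PDU 1: Remote RPDU 2 (SN: ZA1526080760) communication lost.
Rack PDU 2: Remote RPDU 1 (SN: ZA1602081055) communication lost.
"""

for alarm in parse_alarms(alarms):
    print(alarm["serial"], "not seen by Rack PDU", alarm["reporter"])
```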
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
I notice that in all these cases, the Slaved device is an AP8858NA3. More generally, none of my AP8858NA3 are visible from their Masters.
--sk
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Hi Stuart,
Can you provide the full debug .tar file from your master(s) (we call it the host)? Instructions here if you need them -> http://www.apc.com/us/en/faqs/FA156131
Secondly, you talked about Cat6 terminators. Are these the terminators that we provided or that came pre-installed on your Rack PDUs? They have to be the APC terminators since they have a specific pinout. Using other types or brands of terminators could definitely cause this problem. I know you talked about the cabling, so to confirm for your PDU pairs in each rack: you have an APC terminator in the IN port of PDU #1, a standard Cat6 cable going from its OUT port to the IN port on PDU #2, and then a terminator in the OUT port of PDU #2, right?
You have these set up in pairs - is it always the same racks/guests that seem to lose comm? I made note so far you seemed to indicate the guests are seemingly all the AP8858NA3. How many total pairs do you have versus how many you're seeing problems with?
Have these been running for some time correctly and all of a sudden generated alarms, such as after a firmware upgrade or other event? Just curious when you noticed and if they ever worked and if something potentially changed somewhere to trigger this.
Can you give me an example screenshot of one of your "Group Status" screens from the web UI?
I might have more questions after some log review too..
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Terminology:
OK, so we call the PDU with the network cable inserted into it ... the Host. What do we call the other one? A 'Remote'?
Cabling:
PDU Master IN <--------------> OUT PDU Slave
PDU Master OUT <-------------> IN PDU Slave
where "<-------->" is a Cat6 patch cord
Flock:
I have (6) pairs of AP8841 which are working fine.
I have (1) pair of AP8841 + AP8858NA3, where the AP8858NA3 is the Slave and is not visible to the Master.
I have (3) pairs of AP8858NA3, in which the Slave is not visible to the Master.
Timing:
This is a new installation -- no history.
Debug
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Interesting. All my PDUs are wired identically, per my diagram above.
But ... you are suggesting that I don't need to build a ring, all I need is a single Cat6 cable from the Host to the Guest ... so long as the empty IN/OUT ports are terminated with these APC RJ-45 terminator thingies.
What is the part # on these terminator thingies? I'm guessing that my cable installers tossed them, so I'll be wanting more.
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Hi Stuart,
Here is some "light" reading on this Network Port Sharing functionality: http://www.apc.com/us/en/faqs/FA164436
The terminator part numbers are there for reference too, but you will want to base the quantity you order on how many groups you have, not how many PDUs. (Or maybe check first to see if you do have any of these around in any of the PDUs.) So plan for a quantity of two per group (which is a pair in your setup). Another key is to make sure the cables between units don't go over 10m, or else you can have comm problems. Yours are in the same rack so I don't think that is a factor here..
Our part number for quantity (2) of the terminators (shown in above image) is 0J-0W05545A. If for some reason we don't have those in stock, you can ask about 0W05545A, which is quantity (1). Worst case scenario, the older, bigger terminator (sticks out more from the RJ-45 port) is 0J-0W4161A/0W4161A for quantity (2) or (1) of those respectively.
I would definitely take a close look at all of the IN and OUT ports you are able to, because these terminators are very often missed and hard to see, so I'd be shocked if an installer actually noticed them and pulled them out - but maybe they also did the chaining configuration for you, so they touched the IN and OUT ports...
Let me know if you have any other questions on this.
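The wiring rules in this thread (a terminator in the first IN port, a cable from each OUT to the next IN, a terminator in the last OUT port, no cable over 10m, two terminators per group) can be written down as a small checker. This is a rough sketch only: the `Cable` class, `check_group`, `terminators_needed`, and the PDU labels are hypothetical names for illustration, and the code encodes nothing beyond the rules stated in the posts above.

```python
from dataclasses import dataclass

MAX_CABLE_M = 10  # cable length limit mentioned in the post above


@dataclass(frozen=True)
class Cable:
    src: str         # PDU whose OUT port the cable leaves
    dst: str         # PDU whose IN port the cable enters
    length_m: float  # cable length in metres


def check_group(chain, cables, terminated):
    """Return a list of wiring problems for one group (empty list = looks OK).

    chain:      PDU labels in daisy-chain order, e.g. ["host", "guest"]
    cables:     Cable objects, each running OUT(src) -> IN(dst)
    terminated: set of (pdu, port) pairs that hold an APC terminator
    """
    problems = []
    # Only the two open ends of the chain should be terminated.
    expected_terms = {(chain[0], "IN"), (chain[-1], "OUT")}
    for pdu, port in sorted(expected_terms - terminated):
        problems.append(f"missing terminator: {pdu} {port}")
    for pdu, port in sorted(terminated - expected_terms):
        problems.append(f"unexpected terminator: {pdu} {port}")
    # Each PDU's OUT should feed the next PDU's IN, and nothing else.
    expected_links = set(zip(chain, chain[1:]))
    actual_links = {(c.src, c.dst) for c in cables}
    for src, dst in sorted(expected_links - actual_links):
        problems.append(f"missing cable: {src} OUT -> {dst} IN")
    for src, dst in sorted(actual_links - expected_links):
        problems.append(f"extra cable (ring?): {src} OUT -> {dst} IN")
    for c in cables:
        if c.length_m > MAX_CABLE_M:
            problems.append(f"cable too long: {c.src} -> {c.dst} ({c.length_m} m)")
    return problems


def terminators_needed(groups):
    """Two terminators per group, regardless of how many PDUs are in it."""
    return 2 * groups


chain = ["host", "guest"]
good = check_group(
    chain,
    [Cable("host", "guest", 0.5)],
    {("host", "IN"), ("guest", "OUT")},
)
ring = check_group(
    chain,
    [Cable("host", "guest", 0.5), Cable("guest", "host", 0.5)],
    set(),
)
print(good)
print(ring)
```

The second call models the ring wiring described earlier in the thread (both IN/OUT pairs cabled, no terminators), which the checker reports as two missing terminators plus one extra cable.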
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
I forgot too - I don't think it is causing a problem, but just as an FYI, we have a firmware release one revision newer available: AOS/rpdu2g v6.4.4. When you update a host, it can push the update to the guest (assuming they are communicating). Once we get this issue fixed, we can look at that if you'd like to get them up to date.
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Ok, turns out my installers moved the APC RJ-45 termination plug to the Serial port -- so I have these in-hand.
I have re-wired (5) of my PDUs (the 4 sad ones plus one happy one) to look as follows:
PDU Host IN  --|         |-- OUT PDU Guest
PDU Host OUT <---------> IN  PDU Guest
Where '<--------->' is a Cat6 cable and '--|' (and '|--') are APC RJ-45 terminators.
I don't see a change -- the happy one remains happy; the sad ones remain sad (i.e. "Remote PDU (? -- no longer present) (SN: ZA1526080760) communication lost").
Would you like to see debug tar file from the 'sad' NMCs in this configuration?
--sk
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
OK, I've spent more time on this, here is my current status.
Three pairs of AP8858NA3 are functioning fine. Brain glitch on my end that I ever claimed otherwise.
However, the fourth pair of AP8858NA3 is still having trouble. The following Device Alarm: "Rack PDU 1: Remote RPDU (? - no longer present) (SN: ZA1526080760) communication lost."
I claim that this pair is wired correctly, i.e. wired per my previous post.
I attach a screen shot of the RPDU Group Status plus debug logs.
--sk
Posted: 2021-06-28 12:06 AM . Last Modified: 2024-03-18 11:40 PM
Hi Stuart,
Have you tried rebooting the management interface on the remote/guest PDU? You can do this by pressing and releasing the pinhole reset button on the PDU's front display. This will not affect the outlet status. If that doesn't work, we may need to connect over serial directly to the remote PDU that is not communicating and check its event log, because System-related events (those referring to the management interface itself) are not propagated to the event log of the host. If the remote/guest PDU is not working, it could have an issue specific to it that we may only see by connecting to it directly. All of this applies now that you have corrected the cabling.
The event log I can see only goes back to earlier this November, and it doesn't seem the pair was working then either.
You just have the network cable (the one that is connected to your actual ethernet network) connected to this PDU (SN ZA1510010179) that indicates it is the host, right? There is no ethernet connection on the remote PDU that is not showing up in the group status page?
But yeah, if it were me, I'd go over to this pair and see what is going on with the remote PDU's display, try to reboot it, see if any change on the host, and then if not, try to directly connect serially into the remote PDU and see what is going on in its event log (can issue the eventlog command in the CLI).
Posted: 2021-06-28 12:07 AM . Last Modified: 2024-03-18 11:40 PM
Hi Angela,
Thanx for sticking with me on this one -- and for emphasizing cabling. In fact, I have been fumbling the cable (OUT from the Host plugged into IN on the Guest), plus termination, fairly regularly.
At this point, my pairs of PDUs are behaving correctly, i.e. the Host can see the Guest.
Except for one pair of AP8841, for which the Host reports:
Rack PDU 2: NMC CAN bus off.
Rack PDU 2: Remote RPDU 1 (SN: 5A1516E01451) communication lost.
I have tried the following:
- Reset both Host & Guest using the Reset pinhole.
- Jack into the serial console on the Guest, resetToDef, reboot.
- Reverse Host & Guest (*and* I even remembered to reverse the IN / OUT cabling).
- Use the USB Key approach to upgrade the Guest's OS to 6.4.0 (both were running 6.4.0 for a while)
- Upgrade the Host to 6.4.4
Would you have another suggestion? Or would you encourage me to examine the serial cabling some more? 🙂
I upload a snippet of a console session with the Guest.
--sk
Posted: 2021-06-28 12:07 AM . Last Modified: 2024-03-18 11:39 PM
Hi Stuart,
You're welcome. I would like to help you get this resolved. And thank you for the log. You'll be an expert on this whole thing when we're done for sure
Here is what we have documented for CAN bus off error that I wrote up a bit ago: http://www.apc.com/us/en/faqs/FA173637
I personally have only seen this when, for example, the optional RJ-45 temp probe is accidentally connected to the CAN bus (IN or OUT port) and it detects an error and turns itself off, or something similar as mentioned in the article. I guess all of the suggestions are along the same lines - something on the CAN bus that shouldn't be there, or that is faulty and is generating errors, so the CAN bus turns itself off - temp probe, faulty/incorrect terminator, ethernet cable, etc.
I do recall this error not being able to be cleared without a management interface reboot in older firmware revs but I think they fixed that. Plus, you said you did a reboot.
I would say just triple check that, and after all of your cabling changes, make sure you did one last management interface reboot so the CAN bus can try to restart with the proper cabling in the IN/OUT ports. Or maybe try a different Cat 6 cable connecting them if you haven't already?
Can you confirm for me too - is the CAN bus off error following the one physical stick (possible issue with its display?) or is it following the PDU that is designated as the guest (based on your testing of swapping the unit designated as host and guest)?
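The swap test in the question above boils down to a simple comparison: note which unit shows the error and in which role, swap the host and guest roles, and see which of the two stayed the same. The sketch below is purely illustrative; `classify_swap` and the labels "PDU-A" and "PDU-B" are hypothetical, not real serial numbers or APC tooling.

```python
def classify_swap(before, after):
    """Decide what a fault follows after swapping host/guest roles.

    before, after: (unit_label, role) of the PDU showing the error,
    observed before and after the swap. role is "host" or "guest".
    """
    same_unit = before[0] == after[0]
    same_role = before[1] == after[1]
    if same_unit and not same_role:
        # The error moved with the hardware: suspect that unit (or its display).
        return "follows the physical unit"
    if same_role and not same_unit:
        # The error stayed with the position: suspect cabling or terminators.
        return "follows the role"
    # Nothing changed, or both changed: the swap tells us nothing.
    return "inconclusive"


# Hypothetical observations: PDU-A showed the error as guest, then as host.
print(classify_swap(("PDU-A", "guest"), ("PDU-A", "host")))
# Hypothetical observations: whichever unit is guest shows the error.
print(classify_swap(("PDU-A", "guest"), ("PDU-B", "guest")))
```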
Posted: 2021-06-28 12:07 AM . Last Modified: 2024-03-18 11:39 PM
OK Angela -- I'm in business.
I swapped parts with a working combination ... and I have tossed what I believe is a faulty terminator in the trash. All my Hosts are successfully talking to their Guests now.
To summarize my issues:
- Incorrect cable termination (ring instead of daisy-chain)
- More incorrect cable termination -- IN going to OUT and other variations
- One Host / Guest combination resisted the cabling correction ... I ended up using the USB approach to upgrade firmware (from 6.1.0 to 6.4.0) and then they started talking correctly
- One bad terminator
Everything is happily running 6.4.4 now, as well.
Thank you again for guiding me through all this.
--sk
Posted: 2021-06-28 12:07 AM . Last Modified: 2024-03-18 11:39 PM
Hi Stuart - glad you got it all sorted out now! You're very welcome for the help.