Posted: 2023-05-24 02:00 PM. Last Modified: 2023-05-24 02:02 PM
We have a fleet of 60-70 Vertiv UPSs of varying models, mostly GXT4s but also some GXT6s, with IS-UNITY-DP cards in them that we monitor over SNMPv3. Most of the Unity cards are on FW 18.104.22.168.
Basically, some of our UPSs show up fine in DCE with all sensors live, while other seemingly identical ones show a "communications lost" error with all sensors disconnected.
For example, in one comm room we have 2 identical GXT4s with the same line card, on the same firmware level, connected to the same edge router, and, as far as I can tell, with identical configurations, but one of them has been in a "communications failure" state since last month.
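One way to check the "identical configurations" assumption is to diff SNMP walk output captured from the working card and the failing card. The following is a minimal sketch, assuming the walks were saved as text in the usual Net-SNMP `OID = TYPE: value` form. The function names `parse_walk` and `diff_walks` and the sample values are hypothetical, not anything from DCE or Vertiv.

```python
def parse_walk(text):
    """Parse 'OID = TYPE: value' lines from saved snmpwalk output into a dict.

    Lines without ' = ' (for example, continuation lines of multi-line
    values) are skipped in this simplified sketch.
    """
    result = {}
    for line in text.splitlines():
        oid, sep, value = line.partition(" = ")
        if sep:
            result[oid.strip()] = value.strip()
    return result


def diff_walks(good_text, bad_text):
    """Return {oid: (good_value, bad_value)} for OIDs that differ or are missing."""
    good, bad = parse_walk(good_text), parse_walk(bad_text)
    return {
        oid: (good.get(oid), bad.get(oid))
        for oid in sorted(set(good) | set(bad))
        if good.get(oid) != bad.get(oid)
    }


if __name__ == "__main__":
    # Hypothetical sample output, not real device data.
    working = "SNMPv2-MIB::sysName.0 = STRING: ups-a\nSNMPv2-MIB::sysContact.0 = STRING: noc"
    failing = "SNMPv2-MIB::sysName.0 = STRING: ups-b\nSNMPv2-MIB::sysContact.0 = STRING: noc"
    for oid, (a, b) in diff_walks(working, failing).items():
        print(f"{oid}: working={a!r} failing={b!r}")
```

Counters and timers such as uptime will always differ between two units, so the interesting entries are the configuration values that were expected to match.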
I was also able to complete SNMPv3 walks from both the DCE server itself and my PC offsite. Pings and tracerts complete fine, and I'm told SNMP packets are not being blackholed.
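For anyone wanting to repeat the same walk against each UPS with identical parameters, here is a minimal sketch that builds a Net-SNMP `snmpwalk` command line for SNMPv3 at the authPriv security level. The host address, user name, passphrases, and the SHA/AES defaults are placeholder assumptions and need to match whatever is configured on the Unity card. The helper name `build_snmpv3_walk` is hypothetical.

```python
import shlex


def build_snmpv3_walk(host, user, auth_pass, priv_pass,
                      auth_proto="SHA", priv_proto="AES",
                      oid="1.3.6.1.2.1.1"):
    """Return the argv list for a Net-SNMP snmpwalk using SNMPv3 authPriv.

    The default OID is the MIB-2 'system' group, which is small and
    supported by practically every SNMP agent.
    """
    return [
        "snmpwalk", "-v3", "-l", "authPriv",
        "-u", user,
        "-a", auth_proto, "-A", auth_pass,
        "-x", priv_proto, "-X", priv_pass,
        host, oid,
    ]


if __name__ == "__main__":
    # Placeholder values only; substitute the real address and credentials.
    cmd = build_snmpv3_walk("192.0.2.10", "dce-user", "auth-secret", "priv-secret")
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine with Net-SNMP installed. Running it with exactly the credentials and protocols entered in DCE helps separate a credential mismatch from a problem inside DCE itself.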
Any advice on what to do here? Are there any more logs I can look at that might be helpful? I am pretty much at a loss now. APC support was of no help because "these are not APC devices"...
Posted: 2023-05-24 02:53 PM
I know you shouldn't have to do these things, but what happens when you:
Delete the device from DCE and add it again?
Factory reset the Webcard and reconfigure it?
Swap the Webcard from one UPS to another (if possible)? Does the problem follow the card?
Posted: 2023-05-24 07:40 PM
This should be supported as long as the DCE is under a support agreement. Multi-vendor support is right there in the brochure, so we'll live up to that where we can (e.g. if the root cause is ours). If you want to DM me a ticket number, or try again with your local support and point to this if needed, we should be able to help further.
Usually my go-to for issues like this is packet captures, which really need to go through support rather than being posted on a public forum. But there are a few easy things we can test, which can help nail down some v3-specific issues:
Posted: 2023-05-25 06:56 AM
I have not tried to delete / re-add the devices yet, but that would be a next step. I can get one of our techs to swap the line cards around (and pray it doesn't bork them both).
Posted: 2023-05-25 07:33 AM
Would there be a more appropriate support route I could try then? On the ticket I had with EcoStruxure support, they recommended I call the APC support number, and that's the route I tried.
I can reboot devices, but there is typically no change. The 8.2.3 devices we have tried upgrading seem to stay in the lost communications state whether I reboot them or manually try to scan them.
If I reboot an 8.0.3 device or put it in maintenance mode, it will usually show comms lost, then come back online as expected once it has rebooted.
However, I just tried that again on a machine I hadn't previously touched, and I never could get DCE to communicate with it again. I am now trying to remove it / re-add it and I'll see what happens. Failing that, I can get a tech to swap / reseat the line cards.
SNMPv1 isn't really an option long term because I'm told our security folks frown upon it... I'd have to investigate that further, though; it might be an option for testing.
No, device credentials should not have changed. Unfortunately I am also pretty new to StruxureWare, and I did not engineer our environment. I just administer it for now.