APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Hello,
I work for an organization that has about 126 UPS units of varying models, all using the AP9631 network management card for SNMP polling. Until recently they were running AOS firmware version 5.1.X. I have been bulk upgrading the management cards from 5.1.X to 6.0.6 using the firmware upgrade utility provided with the 6.0.6 release. After upgrading about 20 of the devices, I noticed that 4 of them have become very slow and occasionally stop responding to SNMP polling requests.
One example of the issue we are experiencing is with a Smart-UPS X 1500 (SMX1500RM2UNC) with an AP9631 management card running AOS v6.0.6. Prior to the upgrade the management card was fairly consistent in responding to ICMP requests (with a response time of a few ms), SNMP data was being polled consistently, and telnet sessions were responsive and fast. After the upgrade, however, the ICMP response times are all over the place (ranging from a few ms to over 1500 ms), and as a result our SNMP data isn't being graphed consistently (the card itself seems to be taxed to the point of becoming unresponsive). A telnet session to the management card is very laggy, with delayed output.
To troubleshoot, I formatted the card and reconfigured all of the settings by hand. I also disabled IPv6 (we only run IPv4 on the VLANs these devices are on), thinking that it might help, but it has not.
Since this version of AOS has been out since April, I was wondering whether anybody else has experienced issues of this nature and, if so, what steps can be taken to stop them. If there is anything I can provide to assist with troubleshooting, please let me know.
Thanks!
Chris
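For anyone wanting to put numbers on the erratic response times described above, here is a minimal sketch that summarizes round-trip times from saved `ping` output. It assumes the usual `time=1.23 ms` (Linux/macOS) or `time=3ms` / `time<1ms` (Windows) reply format; the function names and the 1000 ms spike threshold are illustrative choices, not anything from the NMC tooling.

```python
import re
import statistics

# Matches "time=1.23 ms" (Linux/macOS) and "time=3ms" / "time<1ms" (Windows).
# The "time 4005ms" total in the Linux summary line is not matched because
# the pattern requires "=" or "<" directly after "time".
_RTT = re.compile(r"time[=<]\s*([\d.]+)\s*ms")


def parse_rtts(ping_output):
    """Return every round-trip time (in ms) found in raw ping output."""
    return [float(m.group(1)) for m in _RTT.finditer(ping_output)]


def summarize(rtts, spike_ms=1000.0):
    """Summarize RTTs; spike_ms is an assumed threshold for a 'bad' reply."""
    if not rtts:
        raise ValueError("no round-trip times found")
    return {
        "count": len(rtts),
        "min_ms": min(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
        "spikes": sum(1 for r in rtts if r >= spike_ms),
    }
```

Comparing the summary from a card on 5.1.X against one on 6.0.6 over the same period gives a like-for-like view of how far the response times have drifted.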
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Angela,
Just thought I'd let you know that I upgraded all of our UPSs with AP9630/31s from the beta 6.3.3 code to 6.4.0. I did promise I would do that as soon as I saw that the code was available. 🙂 So far, it's working fine.
Thanks again for providing the beta to me a few months back!
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Hello Jason,
I can attach it here. Just make sure to format after downgrading as noted here -> Things To Consider When Upgrading or Downgrading a Network Management Card 2 (NMC2) Device between v.... Also, I expect to have a new firmware available (6.2.0) this week. Don't know if you prefer to wait to try the upgrade versus going backwards.
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Thanks Angela. If you don't mind, go ahead and attach. I'm willing to try 6.2.0 on one of the cards and see if it resolves the issues I'm seeing, but I would like to have the option to drop back if needed.
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Sure - I did attach the file though. Did it not show for you? I see it as an .exe that is zipped and attached to my last reply.
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
I'm afraid I only see a link to the FA article on downgrading--I must be missing something. Do I have to go somewhere else to see it?
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Make sure you're looking at the actual thread and not at your Inbox view. This link may help and should send you to the thread.
Re: AP9631 management card occassionally unresponsive after upgrade to aos v6.0.6
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Got it, thanks
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Is it possible that the monitoring software is polling different data points on the cards exhibiting the issue vs those without?
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Hi Chris,
This is not something I have come across. The first thing I know I'd want to see is the log dump from the web under About->Support->Generate then download logs.
Here are some other thoughts/comments/questions from me:
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
I personally have not upgraded any of the 100 or so UPS units (with AP9631s) I look after to version 6 or above. I really didn't like the new drop-down style GUI at all, and now that I hear about this sort of problem I am sitting even tighter at version 5.1.7 for now. Sorry this isn't helpful with your specific problem, but I hope it gets resolved!
Posted: 2021-07-01 04:32 AM . Last Modified: 2024-03-05 02:09 AM
Hi Angela,
Thanks for replying. I've attached the logs to this post - and below you'll find my answers to your questions:
The one example I am referring to in my initial post has happened at spontaneous times throughout the day. Using today as a sample - our system saw it "drop offline" around midnight...recovering a minute or two later. About 15 minutes later it dropped off again and recovered (again) about 2 minutes later. This happened again around 2am and about 2:15am (and according to our monitoring system it was unreachable for about 15 minutes before coming back). The periods when this happens do not appear to follow a pattern - sometimes the card will become unreachable at approximately 10 minute intervals, only to recover and then become unreachable 30 minutes later. Looking at our logs for today it has done this about 30 times so far.
This behavior has not been consistent (i.e. it does not appear to be occurring at regular intervals); however, it has been occurring since I upgraded the cards to v6.0.6. We did not have any other issues with these cards prior to upgrading to 6.0.6. Some of the cards are more eventful than others in that they seem to be having more issues (this could be related to the size of the VLAN that they are on though).
We have a mix of UPS models. I'm seeing this on both the SMX1500's and the SMX3000's.
Correct (and this is what baffles me the most). I had initially slated about 50 or so UPS units for upgrade the other day, but after about 20 of them we started to see these issues. So far our monitoring system has not notified us about any of the other UPS units that were upgraded. Our network is set up such that we have our network devices on their own small management VLAN to keep the management traffic isolated from user traffic. The units that are working are on VLANs similar to those of the units that are not functioning properly....
Sure....I followed the steps outlined at the bottom of document FA167693 on formatting the card. I used the "format" command from the cli prompt. After I rebooted the management card I was able to log into it using the default username/password combination and verified that all of my settings that were present on the card prior to the format were gone. What I did after this was complete was configure the card by manually entering all of the settings (rather than uploading the config.ini file that was saved from before).
I will certainly give this a try and report back the findings!
I would be more than happy to do this....I wanted to see if there was anything else I could try before downgrading (since I really do like the improvements that are present in 6.0.X)...
No, the configuration remained the same after the upgrade.
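One way to check whether the drop-offs described above follow any pattern is to list how long each outage lasted and how far apart the outages started. The sketch below assumes the monitoring system can export each outage as a (down, up) pair of timestamps; the function name and that input format are hypothetical.

```python
from datetime import datetime


def outage_stats(outages):
    """Given (down, up) datetime pairs sorted by start time, return
    (durations, gaps) in minutes: how long each outage lasted, and the
    time from the start of one outage to the start of the next."""
    durations = [(up - down).total_seconds() / 60 for down, up in outages]
    gaps = [
        (nxt[0] - cur[0]).total_seconds() / 60
        for cur, nxt in zip(outages, outages[1:])
    ]
    return durations, gaps
```

Roughly equal gaps would point to something periodic (a scheduled poll or a timer on the network), while widely scattered gaps fit the irregular behavior reported here.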
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:09 AM
Hi Chris,
Ok I got the log files. It seems like what you're saying though is you have your SNMP or ICMP logs you're referring to - I wanted to line up events from the NMC event log to what you were saying. I am only seeing data from 8am today and later and I don't see anything really abnormal (constant rebooting, errors, etc). I know you modified and removed some private information but one thing that I guess maybe stuck out a little from event.txt is what is constantly trying to access the FTP interface of the card? Something is logging in and out very frequently. Were there connectivity problems during that time period from what you know? Because that might be causing a problem possibly (even though it really shouldn't). Do any of the other cards have these FTP login/logout events?
And to confirm, you just use SNMPv1 for these devices? That is what I see in config.ini but one of the SNMPv3 settings was filled in so I didn't know if you switched.
Also, your file download did not include a dump.txt? I don't think it will give me any further answers but just curious.
Depending on your answer or thoughts on these comments, we can proceed with the crossover connection and/or the downgrade option for testing.
P.S. - ok, I am glad you found FA167693 just to verify a format was done properly rather than thinking we did it and it didn't work in reality.
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:09 AM
Hi Angela,
I've attached my reply as a text file because my post which I have been trying to post all day has been refused by the forums because it thinks I am trying to spam the forum. I've sent you a private message explaining my situation as well.
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:09 AM
Hi Chris,
Yes I responded to you but this works well too. We are combating spam problems lately so I apologize. Here is your reply. I am going to disable the blocking while I post this so we can all see it. I can't figure out what word it is myself.
The reason why you're only seeing data from 8am on Friday is because that was when I had formatted the management card and reprogrammed it. We were getting connectivity problems all night Thursday and so when I got to work Friday morning I started to format the cards I was having issues with. The FTP events you're seeing is my boss logging into the device. That was a one time thing (I've checked the logs this morning from over the weekend and have not seen any other FTP login attempts yet the connectivity issues still persist). I checked the other management cards' logs just to be sure and I am not seeing any other FTP events that are not ordinary.
As for the SNMP question - we use SNMPv2c for polling the UPS devices. We have SNMPv3 configured because there are some devices on our network that are polled via SNMPv3 so we have it configured for completeness sake. This has not caused any issues on any of the other cards that we have on the network that are not running this version of AOS.
The dump.txt file wasn't included in the download that was generated when I went to download all of the debug information from the card. Is there another method I should be using to get it?
Chris
Since no change over the weekend, I would say let's try either downgrading a card or crossover connection to one card - whatever is easier.
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Ok - I have downgraded the UPS I have been referring to for troubleshooting purposes back to v5.1.7 (following the process for upgrading the cards outlined in FA156047) and so far everything seems to be going ok. The response times are now a consistent 1-2ms as opposed to the erratic times witnessed before when the card was running v6.0.6. I've had it running safely for over 2.5 hours now (just to give you an idea of the situation before - our monitoring system was reporting that the management card was "down" due to the extremely high response times twice an hour) with no issues in response time at all.
I will continue to monitor the device throughout the night and report back any changes tomorrow morning. Now that we've downgraded the card how should we proceed?
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Hi Chris,
Did you also format after the downgrade? I presume so since you had read the other article.
I might need to discuss this with a team member but if I were working on it myself, I'd check it after overnight and if OK, maybe try to upgrade the card again to 6.0.6 to be confident this is always reproducible. If so, then I'll need to do more consulting on my end.
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Hi Angela,
I did format the card after downgrading - sorry for not mentioning that! After downgrading the card to v5.1.7 yesterday the card seemed to have stabilized. I went ahead and performed an upgrade this morning (without formatting the card - I just used the firmware upgrade utility for Windows) and now the card is acting just like it did before the downgrade process (i.e. the response times are now all over the place again).
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
While I investigate further, can you tell me the information under About->UPS in the v6.0.6 web UI? Please send that info for each of the cards showing the problem. I just need the UPS-specific SKUs/model numbers and the UPS firmware versions. I just want to check whether there is any pattern there.
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Here's the info on the one that I have been working with:
Model: Smart-UPS X 1500
SKU: SMX1500RM2UNC
Serial Number: AS0945130576
Firmware Revision: UPS 02.2 (ID11)
Manufacture Date: 11/04/2009
The other ones we have been experiencing issues with are listed below:
Model: Smart-UPS X 1500
SKU: SMX1500RM2UNC
Serial Number: AS0945130582
Firmware Revision: UPS 02.2 (ID11)
Manufacture Date: 11/04/2009
---------------------------
Model: Smart-UPS X 1500
SKU: SMX1500RM2UNC
Serial Number: AS1007122654
Firmware Revision: UPS 02.2 (ID11)
Manufacture Date: 02/12/2010
---------------------------
Model: Smart-UPS X 1500
SKU: SMX1500RM2UNC
Serial Number: AS1007330406
Firmware Revision: UPS 02.2 (ID11)
Manufacture Date: 02/13/2010
---------------------------
Model: Smart-UPS X 3000
SKU: SMX3000RMLV2UNC
Serial Number: IS1235006143
Firmware Revision: UPS 01.9 (ID10)
Manufacture Date: 08/25/2012
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Hi Chris,
Since we are not seeing this issue, we'll have to see what we can do to debug it.
Do you think you'd be able to do a crossover connection to one card temporarily as well, to see if anything changes once the network traffic is eliminated?
The other thought was that, because of major TCP/IP stack changes, perhaps there is a difference in how different types of activity are prioritized. That is a little confusing, since some cards do it and some do not. On this point, is there any way to temporarily disable SNMP (or maybe just SNMPv3) to see if anything changes, so we can determine whether this is related to the SNMP task?
If we don't get anywhere with this, I might need to pursue trying to replicate your card as much as possible to see if we can see the same behavior.
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Ok - I hooked up a crossover cable to the UPS from my laptop and the problem has disappeared. However when I hook it back up to the network the problem comes back.
I also disabled both SNMPv1/2c and SNMPv3 and I am still seeing the same issue. I thought it was an issue with the amount of traffic on the VLAN it was initially sitting on so I had moved it to a smaller VLAN that doesn't carry as much traffic (it is carrying about 4kbps of constant traffic). That didn't seem to help either.
I am going to see if I can put this device on its own VLAN (with no devices aside from itself and the router). In the meantime do you have any other suggestions?
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
I'll assume when you disabled SNMP, you let the card reboot to actually disable it, correct? (please excuse me but gotta ask these things to be sure based on my experiences!)
On the lower traffic VLAN, how long did it sit there? Event log didn't change at all, right? And no new events stick out at all (i.e. frequent reboots started, etc)?
The only other thing I can suggest is another format (maybe you already did this), which gives us a fresh default configuration. See if it slows down in that state and, assuming it does not, let's reintroduce your settings slowly to find which change brings the problem back. What do you think?
I also wonder whether this is a traffic problem specific to one VLAN: what happens if you move a troublesome card to a non-troublesome VLAN? (Not sure if that matters, or whether the problem even tracks with what is where.)
Lastly, I don't know if we can somehow utilize the firewall on the management card in some way to deal with this - I am thinking we enable it maybe and then there is a firewall "debug" log that shows the traffic going by real time (for testing purposes) and we can see if anything sticks out to us there and perhaps block certain types of traffic and see if that makes any difference?
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Haha - it's ok! I did reboot the card after I disabled SNMP (there was a banner at the top of the page that reminded me to do so for the changes to take effect).
I currently have it sitting on the lower-traffic VLAN right now and I've seen it drop off at least 2-3 times already. I have not seen any events in the event log that stick out (aside from the network interface rebooting from the changes I made that required the restart).
I will be more than happy to give the format command another go and see if that helps at all...at this point I'm willing to try anything! What I'll do is just configure NTP and the IP address and let it sit overnight to see what happens.
I thought that moving the management card to another, smaller VLAN would help but (at least at first glance) that doesn't seem to be solving the issue. I can take it one step further and place it on a temporary VLAN where it is the only device on the VLAN (aside from the router) but that ultimately would be impractical since this is operated in an enterprise environment and wouldn't be a viable solution (considering that these cards were having no problems before updating).
Instead of using the firewall utility on the card to see what traffic is passing by, I will hook up a laptop (with Wireshark running) and grab a packet capture on the port to see what is flying by. That might provide some insight (however, there is only about 4kbps of traffic flowing through the link, so that doesn't seem like it would make any difference, but who knows).
Either way I'll fill you in tomorrow morning and let you know what I found!
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Sounds good.
P.S. - I just encountered the problem with another reply on a different thread on here and it blocked it due to spam. It was something in the URL for a kbase that time...
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
lol - good to know that I am not the only one having those sorts of issues! Luckily I have not had any issues posting these past few days....
I seem to have noticed something interesting after we formatted the card yesterday and left SNMP unconfigured: the card did not cause any issues with our monitoring system at all overnight. I came in this morning and saw that it was running just fine. So I turned on SNMP again and sure enough it seems to have caused the card to start flaking out again. I turned it back off and it seems to have quieted down (although the response times aren't what they were on the previous release: I'm still seeing occasional spikes of over 1000ms, whereas with the last release the max I was seeing was a few hundred ms).
Currently I have SNMP turned on because we really need to be graphing our data and we can't really leave this turned off any longer (even if it will cause issues). That being said - do you still think we need to do a packet capture?
Chris
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
I am not sure - I still don't know how to explain why identical setups are not having the same problem if this is an obvious 6.0.6 issue. Can you think of anything else different as far as SNMP or network traffic goes that would potentially affect certain units and not others?
Is it an option to try SNMPv1 on this device to see if it does it versus SNMPv2c (which uses SNMPv3 set up in our card)?
Posted: 2021-07-01 04:33 AM . Last Modified: 2024-03-05 02:08 AM
Well, I was discussing this with some of my co-workers and the only thing that we can come up with is that some of our VLANs carry more traffic than others (however, the VLAN that the particular UPS we have been troubleshooting this week is on only sends about 4kbps worth of traffic to the port the UPS is hooked up to, which isn't really that much and shouldn't be causing behavior like this).
We have a packet capture running right now to see how the UPS is responding to ICMP requests as well as SNMP requests. I have cycled through turning on/off SNMP while I have had the ping running to see if the UPS is taking longer to respond to the requests. Currently I am only running SNMPv1 on the card (since I wanted to slowly add services back on and see how the card reacted). This is when the card started to freak out (so to speak).
Looking at the packet capture I can see that the UPS is actually taking a long time to respond to the ICMP requests, and it doesn't seem to be related to the card being polled by SNMP either (about one minute after the management card was polled for SNMP data I spotted an ICMP reply of about 1000ms).
If there's any other information you would like me to provide from the packet capture let me know!
Chris
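The delay measurement described above can be reproduced from a capture by pairing each ICMP echo request with its reply on the (identifier, sequence number) fields and subtracting the timestamps. The sketch below assumes the capture has already been exported to simple tuples of (timestamp in seconds, "request" or "reply", identifier, sequence); that record layout and the function name are hypothetical, not a Wireshark export format.

```python
def icmp_rtts(records):
    """Pair echo requests with replies by (identifier, sequence).

    records: iterable of (timestamp_s, kind, ident, seq) in capture order,
    where kind is "request" or "reply".
    Returns (rtts, unanswered): rtts is a list of (seq, rtt_ms) tuples and
    unanswered is a sorted list of (ident, seq) keys that never got a reply.
    """
    pending = {}  # (ident, seq) -> timestamp of the request
    rtts = []
    for ts, kind, ident, seq in records:
        key = (ident, seq)
        if kind == "request":
            pending[key] = ts
        elif kind == "reply" and key in pending:
            rtts.append((seq, (ts - pending.pop(key)) * 1000.0))
    return rtts, sorted(pending)
```

Lining the slow replies up against the timestamps of SNMP polls in the same capture shows whether the spikes coincide with polling or, as reported here, occur independently of it.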
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:08 AM
Hi,
Sorry to drag up an old thread but I am experiencing this issue on three AP9630 management cards, one running v6.1.1 and the other two running v6.2.0. None of our older cards show this issue.
I think I have narrowed down the cause to ARP processing on the APC management card. We have a significant number of Cisco wireless access points connected to the same subnet, and these send out gratuitous ARPs every 60 seconds. If I put an ACL on the switch port facing the UPS to block ARP packets on egress (obviously not a long term solution) then the problem goes away. Remove the ACL from the switch and the problem comes back. There are approx. 250 wireless access points on this subnet, each sending a gratuitous ARP once a minute, so the APC management card is receiving approx. 4 per second on average.
There doesn't seem to be any way to stop the Cisco access points from doing this and the previous version of AOS didn't seem to have a problem with it.
Mark
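To check a capture for the kind of traffic described above, a gratuitous ARP can be recognized by its sender and target protocol addresses being the same. The sketch below parses the 28-byte ARP payload for Ethernet/IPv4 (the RFC 826 layout) with the standard `struct` module and also works out the expected packet rate; the function names are illustrative, and the input is assumed to be the ARP payload with the Ethernet header already stripped.

```python
import struct

# RFC 826 ARP layout for Ethernet/IPv4: hardware type, protocol type,
# hardware length, protocol length, opcode, sender MAC, sender IP,
# target MAC, target IP (28 bytes, network byte order).
ARP_FMT = "!HHBBH6s4s6s4s"
ARP_LEN = struct.calcsize(ARP_FMT)


def is_gratuitous_arp(payload):
    """True if payload is an Ethernet/IPv4 ARP packet whose sender and
    target IP addresses are identical (an announcement of its own address)."""
    if len(payload) < ARP_LEN:
        return False
    htype, ptype, hlen, plen, _oper, _sha, spa, _tha, tpa = struct.unpack(
        ARP_FMT, payload[:ARP_LEN]
    )
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return False
    return spa == tpa


def expected_arp_rate(devices, interval_s):
    """Average packets per second if each device announces once per interval."""
    return devices / interval_s
```

With the figures from the post, `expected_arp_rate(250, 60)` comes to a little over 4 packets per second, which matches the estimate given above.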
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:08 AM
Hi Mark,
I have been working with the original poster on this thread/issue and we (APC/Schneider) believe we have found the issue affecting v6.X.X AOS. We essentially found a problem that shows when the NMC is hit with a high amount of network traffic. This fix will be available in the next version of AOS due out soon and then our various device applications will adopt it shortly thereafter.
If you are interested in trying a beta, understanding the implications that testing is not fully complete on our side, I can send you a link to try assuming you have a Smart-UPS device or Rack PDU 2G (AP8XXX). If you are interested, let me know and I'll private message you some details.
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:08 AM
Hi Angela,
Yes, I would be interested in trying the beta. I can test it on a Smart-UPS RT 1000 XL fitted with an AP9630 NMC.
Thanks,
Mark
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Mark,
I came back here this morning to specifically post that my issue appears to be resolved. I have been working with Angela over the last year or so trying to isolate the issue I was experiencing and we have finally come to a solution (which she already mentioned). So hopefully when they release the latest version (with the fixed code) this can be solved.
Angela is awesome.
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Thanks Chris! While yes, I am awesome, I will pass along this comment to the development team, who really did most of the work and investigation here. I was just persistent in making sure it happened. I am also ecstatic that we were able to pinpoint the problem, that we can put this issue behind us, and that our customers can benefit from the fixes.
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hello Angela, Chris, Mark... 🙂
Reviving this thread after about a month. I'm experiencing the same problem that Chris first wrote about. We are running the latest AOS available as of now, version 6.2.1. I am seeing high ping times and complete timeouts, and our InfraStruXure Central keeps reporting that various UPSs are down because it doesn't get a ping response. I work for a school district with 40 sites, each of which has anywhere from 1 UPS to over 20 (if the school has VoIP). Most sites have older NMC 1 cards (AP9617, or in some cases AP9619 with environmental monitoring) running AOS 3.7.2. Absolutely no problem whatsoever with those. Our newest school, though, which opened last year, has Smart-UPS 2200 rack-mount units, all of which have NMC 2 (AP9631). They shipped with AOS 6.0 and were upgraded a while back to 6.1.1. I was seeing the problem with 6.1.1, and I just upgraded them to 6.2.1 and still have the same problem.
Angela and Chris mentioned a fix. What was that? I would love to implement that if I can. Also, Mark mentioned an ACL on the Cisco switch ports to which the UPSs are connected. I'm not a Cisco guy, but I can get around the CLI if I know what commands to use. 🙂 (I'm not familiar with setting up ACLs.) This particular school also has 92 Aruba wireless access points. I don't know if they send out that same ARP stuff that the Cisco APs do as Mark mentioned, but it's quite possible. I haven't run a Wireshark capture yet, but that's next on my agenda. But I was just looking for some feedback.
Thanks!
-- Bryce
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Bryce,
The ultimate fix we will be providing publicly is a firmware fix that tweaks the behavior of the NMC when there is moderate to high network traffic from ARPs or whatever else is going on. I expect this in AOS v6.3.3 and higher for whatever firmware application you and other users have (Smart-UPS, Symmetra, etc.).
We have now confirmed with a few customers, including Chris and Mark, that the beta appears to work, which is why we posted that it will be resolved. I am trying to limit how many people get the beta, since it is just that: a "beta". But if you're in a pinch, I'll send it to you to test if you promise to keep it to yourself and to upgrade to the production build when it is available, probably in a month or two (I hope).
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hello Angela,
I would be happy to try the beta version and see how it works, and yes, when the production release comes out, I will most definitely upgrade. 🙂 Since the post to Chris where you mentioned the upgrade was back in September, I thought perhaps that upgrade had since been released and was version 6.2.1 (since you and he had talked about trying 6.2.0 and having it not help).
Do you want to send it to me via private message as well?
Thanks!
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
BTW, I ran a Wireshark capture on the port where the NMC is connected. I'm not a Wireshark expert by any means, but I do see a lot of ARP and mDNS traffic. I know this particular school uses a lot of MacBooks and iPads, and they have Apple TVs, which probably explains the mDNS traffic. There is a lot of broadcast traffic in general, and I see a lot of traffic with the description "Gratuitous ARP for aa.bb.cc.dd (reply)" where the source is "Apple_xx:yy:zz" (Wireshark resolves the first 3 octets of the MAC address to the vendor name and shows the last 3 octets as "xx:yy:zz").
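For anyone who wants to count these frames outside of Wireshark, here is a minimal Python sketch of how a gratuitous ARP can be recognized in a raw Ethernet frame. It assumes the common definition (an ARP packet whose sender IP equals its target IP), untagged Ethernet (no 802.1Q VLAN header) and IPv4. The helper names, the MAC address and the 192.0.2.x documentation addresses are made up for illustration, and getting raw frames out of a capture file is left out.

```python
import struct

ETHERTYPE_ARP = 0x0806


def is_gratuitous_arp(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame carries a gratuitous ARP for IPv4."""
    # 14-byte Ethernet header + 28-byte ARP payload for Ethernet/IPv4.
    if len(frame) < 42:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return False
    htype, ptype, hlen, plen, _oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return False
    sender_ip = frame[28:32]  # sender protocol address
    target_ip = frame[38:42]  # target protocol address
    return sender_ip == target_ip


def build_arp_frame(sender_mac: bytes, sender_ip: bytes,
                    target_mac: bytes, target_ip: bytes, oper: int = 2) -> bytes:
    """Build a broadcast ARP frame for testing (oper 1 = request, 2 = reply)."""
    eth = b"\xff" * 6 + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, oper)
    return eth + arp + sender_mac + sender_ip + target_mac + target_ip


# Hypothetical addresses, for illustration only.
mac = bytes.fromhex("020000000001")
ip = bytes([192, 0, 2, 10])
other_ip = bytes([192, 0, 2, 20])

frames = [
    build_arp_frame(mac, ip, b"\xff" * 6, ip),        # gratuitous: sender IP == target IP
    build_arp_frame(mac, ip, b"\x00" * 6, other_ip),  # ordinary ARP for another host
]
print(sum(is_gratuitous_arp(f) for f in frames), "gratuitous ARP frame(s)")
```

Dividing the count by the capture duration gives a per-second rate that can be compared across switch ports.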
Good ol' Apple, killin' my UPSs... heh 🙂
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Angela,
Good news! The beta fix worked for me as well. Using a simple ping test, I saw much better results. 35 packets, and only 1 dropped, and the rest all had sub-10ms response times. Thank you!!!
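To compare before and after a firmware change with something repeatable, a ping run can be summarized along these lines. This is only a sketch: the function name is made up, `None` stands for a dropped packet, and the round-trip times in the sample are invented to match the shape of the result above (35 packets, 1 dropped); they are not measured values.

```python
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize ping results; None marks a dropped packet."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_percent": 100.0 * lost / sent if sent else 0.0,
        "max_rtt_ms": max(replies) if replies else None,
    }


# Hypothetical sample: 35 probes, 1 dropped, made-up 4 ms round trips.
sample = [4.0] * 34 + [None]
summary = summarize_pings(sample)
print(f"{summary['lost']}/{summary['sent']} lost ({summary['loss_percent']:.1f}%)")
# prints "1/35 lost (2.9%)"
```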
Chris, you're right; Angela rocks. 🙂
-- Bryce
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Angela,
Is there any news on when the full release of the fixed firmware will be available? I have just bought a brand new SRT5KRMXLI which comes with an inbuilt AP9537SUM network management card running AOS 6.2.0. This is suffering the exact same problem. Unfortunately, I am unable to use the beta firmware that you provided earlier as that was only for AP9630/9631 cards.
If the full release is still some time away, is it possible to get hold of the beta version for an AP9537? Without the fix the UPS is basically unusable for us as most of the time it fails to respond to our monitoring systems.
Thanks
Mark
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Mark,
It is tentatively expected in December; I just saw someone mention that today. I have also heard that they made a few other changes under the hood, so it is now expected to be v6.4.0.
But to clarify: yes, you can load the same firmware I gave you onto the SRT UPS. AP9537SUM is just a [confusing] part number for a modified, embedded mini AP9631 that goes into some Smart-UPS models like SRT. The firmware I gave you will work. Essentially, it will work on anything that uses the sumx application on NMC2, so AP9630/31 and the embedded mini versions.
Posted: 2021-07-01 04:34 AM . Last Modified: 2024-03-05 02:07 AM
Hi Angela,
Just thought I'd let you know that I upgraded all of our UPSs with AP9630/31s from the beta 6.3.3 code to 6.4.0. I did promise I would do that as soon as I saw that the code was available. 🙂 So far, it's working fine.
Thanks again for providing the beta to me a few months back!