APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
Hello
I manage a fleet of APC UPS units across a number of sites, predominantly the older Smart UPS 3000 XLs, with some of the newer Smart UPS 3000 X. Previously I had standard network management cards in all the units, with no environmental monitoring, and was very happy with their performance and reliability. This year I purchased AP9631s with temperature probes to replace all of the existing cards. Initially we stopped the deployment because APC had a limit on the character length of the email address, and we were unable to change our address for organisational compliance reasons, so we only deployed 2 cards under the old o/s, which worked for everything except notifications via email. Now that the 'new' o/s is released we have pushed ahead and so far replaced 40 of the cards, about half of the fleet of remote sites. The o/s, whilst looking pretty, has given us nothing but trouble with the email notifications, and we are now starting to find that if you keep playing with the settings you get the equivalent of an APC blue screen of death, which politely says:
You are attempting to access an APC device
The application you are trying to load is incompatible with the current APC OS. Please verify the correct firmware is loaded
Once the cards get to this point it's like a point of no return. These cards 'worked' for weeks at a time until we tried to get the notification services running and tested. 3 of the cards are now at the point where it doesn't matter if I factory reset or reload the firmware (which completes without a problem): the card goes back to this screen. After a firmware reload it seems to work for around 30 seconds, where you see the main page once logged on but it says no UPS found, then it goes back to the APC crash screen. On the cards in the field that haven't had the major dummy spit, the notification services are erratic: a notification will arrive anywhere between 24 and 48 hours later than it should (e.g. the email says self test passed on 1/1/2014 at 8am, but the email servers don't get the email until 2/1/2014 at, say, 11am). So for fear of breaking all the other cards we are now scared of touching anything. Am I the only person out there having all these troubles, or have others faced the same issues and just put up and shut up?
Posted: 2021-07-01 05:02 AM . Last Modified: 2024-03-05 01:50 AM
Hi Jarrod,
I'll have to take a closer look at all of the helpful information and data you've provided when I am back in the office on Thursday of this week (Jan 2).
In the meantime, you can attach your .tar file to this post if in your reply box on the top right you select Use Advanced Editor. A file upload option will be available.
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
Good morning,
I took a look at the log files (thanks for sending), but whatever happened, I am not seeing the information I need in order to verify that the MX record caching issue is what is causing the reboot. I do believe that if you use the IP of the mail server instead of the DNS name, you can eliminate the issue, since the card should not need to look up the DNS record to send the mail.
Any updates on your end about what's working and not working?
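For anyone wanting to try the IP workaround, here is a minimal Python sketch for looking up the address to type into the card's SMTP server field. It is run from a workstation, not on the NMC, and it resolves the mail server's own hostname (not the domain's MX record). The function name `resolve_ipv4` and the host `mail.example.com` are placeholders, not anything from APC.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses for a hostname."""
    # Port 25 (SMTP) is only used to build the lookup; no connection is made.
    infos = socket.getaddrinfo(
        hostname, 25, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


# Hypothetical usage; replace the placeholder with your own mail server name:
# print(resolve_ipv4("mail.example.com"))
```

If the lookup returns more than one address, the card can only be pointed at one of them, so a later change on the mail side would need the card's setting updated by hand.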
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
For anyone who happens to come across this thread and has similar issues to what we have had: we applied AOS 6.2.0 across the fleet in the hope of it being fixed, but these issues are still not resolved. I would advise using another vendor if you need to depend upon notification services working stably. Maybe a future release will deal with this issue.
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
Thanks for the follow-up, Angela. I'm going to do a bit more testing at the site that I have been playing with. As I mentioned above, I changed the secondary UPS NMC to the IP address but the issue seemed to remain; however, the primary UPS, touch wood, seemed to be OK after the firmware reload and wipe of all settings. It might be that I have to do the same to the secondary to get the card stable again. I will do a bit more testing later today and let you know how it goes after I get some time to perform the reset.
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
An update on the units thus far with the testing:
For units that have never had email set up
- Using the IP of a mail server instead of the DNS name works straight away without additional problems, touch wood
Existing configured cards with DNS name
- Change to IP and save settings, then send test email: produces the original freeze issue
- Reboot card: same behaviour with freeze
- Log on via telnet and wipe config as posted previously, then put the IP for the mail server back in: seems to work without additional problems, touch wood
So for the moment the trial units are running stably with an IP address after we wipe the config and put the IP details back in. Also maybe worth mentioning is the behaviour where it's almost like the 'last' email in the queue, the one before the current one, gets sent off when trying to send one. For example, above, when I try to send a test email nothing happens; then I reboot the card and on reboot the test email comes through, instead of the UPS restart emails you would expect. Hope this info helps in resolving the problem.
Posted: 2021-07-01 05:00 AM . Last Modified: 2024-03-05 01:52 AM
I may have spoken too soon. Not sure if there is any relation to the main issue or not, but once I added a second recipient to the notifications list it all went pear-shaped again. At least it seems that if I remove recipient 2 and reboot the card the main notifications work; however, test recipient now freezes email again, so I'm resorting to running a UPS self test to make sure of email flow.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Any time you have an event happen and you're able to get back to it, let me see those logs. I am still confused about what the issue is and about the "freezing", so I hope the logs can report some reasoning or events to me. You can provide that .tar file again.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
To elaborate a little more on what I mean by the freezing: if I go to send a test email to recipient 1, the web page reports test in progress, the status never changes and never produces an error, and you don't get the email. It seems that the delay of anywhere from an hour to 48 hours is resolved by changing to an IP instead of a DNS name for the mail server. But if you have the issue above, you don't get any emails until another email is meant to come through. E.g. if I trigger a UPS self test I then get the test recipient email but not the UPS self test email, almost as if it's sending the previous one in the queue. I'm not sure the logs will be helpful as the card doesn't produce an error. I had a look at the contents of the .tar out of curiosity before sending it last time, so unless I can see an error event, is there any point?
Touch wood, the cards seem to be a little more stable for the moment if I use an IP for the mail server rather than a DNS name, only use 1 recipient in the email list, and don't send a test email unless I really need to. (We now verify emails are flowing by either initiating a self test or dropping power, as it's not acceptable for operations if we aren't getting up-to-the-minute notifications.)
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Hmm, I wonder if it is related to this at all (http://www.apc.com/support/index?page=content&country=US&lang=en&id=FA177701), even if it does not cause a stack failure and reboot. I am skeptical on that, though.
I've definitely done many test emails personally and have not experienced any freezing. I also have dealt with many configuration scenarios/escalations where we haven't noticed this type of issue either.
Can you give me any more details so I can try to replicate the setup, like whether your mail server is local, located on the same subnet as the card, etc.? I am not seeing this, unfortunately.
I agree: if you're not seeing anything in the logs, especially no reboots, the logs likely won't be all that helpful.
Now that I think of it, vhinckle - does this sound similar to what you were experiencing, with the email not being sent until another one happened? And is that still occurring?
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
After deleting the file mentioned above and reloading the firmware again, the card seems to be running for the moment. I'll look into the caching of the MX record to see if it temporarily fixes the problem. Thanks for the info.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Hi Jarrod,
I'm sure APC's official tech support will pipe up here, though I wanted to give you a quick reply.
Judging from your symptoms it sounds like:
- You're using AOS606
- You configured email notifications
- The card encountered a problem and rebooted (7 times)
- The card gives up and stops loading the application firmware
Under About->Support on a failing unit, there's a button "Generate Logs" which collects debug information to attach here so we can get a deeper look at the problem.
That said, this sounds like the issue reported here. If it's the same problem, APC is releasing a firmware update soon to address the cause. Until then, two workarounds are 1) making sure the mail server's MX record is cached by the DNS server, or 2) turning off email notifications. You can reset to factory defaults to unconfigure email notifications.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Thanks for the quick reply!
So with the About > Support logs, unless I can turn it on in telnet I don't think I can do it?
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Ah, I think that's unavailable without the application firmware running. However all the logs can be downloaded via FTP. You'll want "config.ini", "event.txt", "data.txt", and anything in the /dbg/ directory (more details here).
If you want to reset to defaults from the command line, enter "resetToDef -p keepip" and after that you should be able to load the application firmware back in without it failing after 30s. Well, at least until you configure email which I suspect is the problem here.
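For pulling those files off several cards, a rough Python sketch using the standard `ftplib` is below. The file names and the /dbg/ directory come from the post above; the assumption that the three named files sit in the FTP root, the placeholder address and credentials, and the helper name `fetch_logs` are mine, so adjust them to what your card actually shows.

```python
from ftplib import FTP, error_perm
from pathlib import Path

# File names taken from the post above; their location in the FTP root is an assumption.
LOG_FILES = ["config.ini", "event.txt", "data.txt"]
DEBUG_DIR = "/dbg"


def fetch_logs(ftp, dest: Path) -> list[str]:
    """Download the named logs plus everything in DEBUG_DIR; return what was saved."""
    dest.mkdir(parents=True, exist_ok=True)
    wanted = list(LOG_FILES)
    try:
        for name in ftp.nlst(DEBUG_DIR):
            # Some servers return bare names, others full paths.
            wanted.append(name if "/" in name else f"{DEBUG_DIR}/{name}")
    except error_perm:
        pass  # directory missing or empty on this card
    saved = []
    for remote in wanted:
        local = dest / Path(remote).name
        try:
            with open(local, "wb") as fh:
                ftp.retrbinary(f"RETR {remote}", fh.write)
            saved.append(remote)
        except error_perm:
            local.unlink(missing_ok=True)  # drop the empty file for a missing remote
    return saved


# Hypothetical usage (placeholder address and credentials):
# ftp = FTP("192.0.2.10"); ftp.login("user", "password")
# print(fetch_logs(ftp, Path("nmc-logs")))
```

Files that the card refuses to serve are skipped rather than aborting the run, so a partly broken card still yields whatever logs it has.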
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Well, I issued the factory reset command in telnet but it's the same as before: I get to log on and see the home screen, then when I tried to go to About and Support it crashed again. I even tried physically removing one of the cards and taking the battery out for a few minutes earlier in the week, with the same result. With loading the firmware back, should I be trying to send the firmware again, as it had the same behaviour before?
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Hi Jarrod,
I just came across FA175980 in the APC Knowledge Base (I'd link directly to it, but it'll immediately redirect to frame the page and fail). Reset to Default doesn't work in 606 -- that's actually my fault -- instead in the telnet command line interface do:
Then load in the app firmware. Hopefully that should do the trick.
Edit: Also, the battery powers the real-time clock but has no effect on stored settings.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
If what voidstar says does not work, worst case, you could downgrade to 5.1.7. Here are some helpful resources on the items you've encountered and on other things to be aware of with v6.0.6. It should be a one-stop shop for any identified issues, including downgrading if you need to: Network Management Card 2 Firmware v6.X.X FAQ/Upgrade Issues | FAQs | Schneider Electric US
I need to write a knowledge base article on the TCP/IP stack issue with the uncached MX record.
If you need the older firmware file, let me know. While 6.X.X offers some great features, I understand that several users cannot work around them for one reason or another and may choose to downgrade.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Fingers crossed on the MX record cache issue, as downgrading brings me back to the original problem with the length of the email address, which makes the notifications unusable in our environment. Will post back how it goes.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
OK, so a few observations thus far:
- Initial card where I applied the reset as mentioned above is now contactable
- I thought perhaps if MX caching is an issue, then a direct IP of the mail server would be a temporary option (correct me if I had the wrong way of thinking here?)
- Applied the IP address for the mail server and put in just my email address
- Tried to send test email (in the past all test emails either never arrived or took 1 - 2 days, and never showed an error)
- Test email went straight away
So with this card I will keep monitoring. Now the second card at the same site, in the DR location:
- Adjusted the mail server setting to be the IP address of the mail server
- Interestingly, once I logged on, all of a sudden 4 emails flew through that should have come over the past 2 days for different basic info events
- I got an email stating I had changed the mail server config
- Hit the test button for the email address and got the old freeze again and no test email, but touch wood the card hasn't crashed yet
- So I went to About and Support and tried to generate logs from IE 11 and Firefox, but neither seems to do anything and the downloaded file is zero bytes
Is there another way for me to get the logs for the second card to you, as it might help with analysing the issue?
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Actually, my mistake: I didn't wait long enough across the VPN for the log generation to respond in Firefox. I have a 20 KB tar file, is that correct? And if so, where do I send it?
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Also, if it helps, I rebooted the card mentioned above after changing to the mail server IP instead of its name and got some emails during reboot. Once I logged in I went to recipient test and it stuck on test in progress, but I got an email, almost like the one in the queue before, which was a startup one
- So I clicked test recipient and never got the test email
- Got the following instead @ 3:32pm when I clicked test
Name: My UPS
Location: My Location
Contact: Me
Usual Device info
Date: 26/12/2013
Time: 14:28:14
Code: 0x0344
Info: Environmental: Restored and the usual about the temperature sensor
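To put a number on how late a notification arrives, here is a small Python sketch that reads the Date and Time fields from a body shaped like the one above and subtracts them from the time the email arrived. The day/month/year format matches the dates quoted in this thread; the assumption that the 3:32pm arrival was on the same day, and the function name `notification_delay`, are mine.

```python
from datetime import datetime, timedelta


def notification_delay(body: str, received: datetime) -> timedelta:
    """Return how long after the logged event time the email was received."""
    fields = {}
    for line in body.splitlines():
        # Split on the first colon only, so "Time: 14:28:14" keeps its value whole.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # Assumes day/month/year dates, as used elsewhere in this thread.
    event = datetime.strptime(
        f"{fields['Date']} {fields['Time']}", "%d/%m/%Y %H:%M:%S"
    )
    return received - event


sample = """Date: 26/12/2013
Time: 14:28:14
Code: 0x0344"""

# Received time assumed to be 3:32pm on the same day as the event.
print(notification_delay(sample, datetime(2013, 12, 26, 15, 32)))  # 1:03:46
```

The received time has to come from the mail client or server, since the card only stamps the event time in the body.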
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
Wow... so, I'm not crazy! The email issue he's describing is exactly like what happened with the one card I upgraded to 6.0.6! Currently, though, my card is stable and NOT doing the weird delayed email thing, but I have also not been experimenting with it lately. If I remember right, it finally stabilized after I had cleared the config (i.e. deleted the original config.ini file), rebuilt it manually (using an uploaded config.ini file caused more problems than it was worth), and left as much on default settings as possible. Thinking about it now, the problem definitely seemed to get worse the more I played with the card, almost like it wasn't happy keeping up with notifications the more I played with the settings...
Fortunately, once I got the firmware properly on the card, I never had to deal with crashing from the card (I think there was an issue where the firmware didn't "take" the first time...but, it has been awhile).
Let me know if I can post anything that could help.
Posted: 2021-07-01 05:01 AM . Last Modified: 2024-03-05 01:51 AM
I would agree with vhinckle that the more you play with the settings for notifications the shakier it gets. The 'stable' pair of cards in the field had the configs completely wiped and firmware reloaded as mentioned. I then manually put the bare essentials back in and used an IP for the mail server, and so long as I don't add a second recipient or tweak too many notification settings it appears to work. To reproduce it, maybe add a second or third recipient, turn a few notifications on or off, then send a test email to them all, and it may happen. As for the link you sent, I would say that using an IP instead of a DNS name corrects the "application failed to load" full crash of the card; however, the notifications not arriving, or arriving out of sequence, might be something different.
In terms of mail server setup, the mail servers are offsite from the UPS cards, in a central facility. Bear in mind that notifications worked from all UPS units until we went to the new o/s on the new cards, and we have notifications from a multitude of vendors' equipment, from servers to switches to NAS units, that all function as expected. The mail servers are MS Exchange 2010. Let me know if you need more info.
Posted: 2021-07-01 05:02 AM . Last Modified: 2024-03-05 01:51 AM
I brought this to the attention of our development team, who said it sounded familiar. They believe this issue and other issues we've found relating to event logging and event configuration will be addressed in a future release. AOS 6.1.1 is due out soon but will only resolve two specific issues that don't directly relate to this. While that might be worth a try (as well as to fix the two critical issues we know for sure it is addressing), we might need to wait for the subsequent release scheduled after that. AOS 6.1.1 is due within a few weeks as it stands now, and I heard the following release was Q2, since it includes a larger quantity of bug fixes and enhancements. I don't have an official timetable right in front of me, though.
Let me know if either of you have questions.