APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-06-29 08:49 AM . Last Modified: 2024-03-12 02:45 AM
Hi, I have three Smart-UPS units with AP9631 network cards in my home: one for the server rack, one for the distribution switch, and one for my workstation.
We had a power failure, and all the UPSs ran out of power and shut down.
Nothing wrong there.
The problem is the email notifications: I only received notifications from two of the three UPSs.
There is a battery backup on the Verizon FiOS ONT, but it is only used for phone voice, not for data, so the internet was down while the power was out.
When power was restored, the UPSs came back to life; the one for the distribution switch came online first, as it had the lowest power draw.
Only two of the three UPSs sent email notifications when power and internet were restored; the one that turned on first never sent an email.
From the logs I can see that it tried many times to resolve smtp.gmail.com, and I assume it eventually gave up.
I could not find any retry settings.
What is the expected behavior for sending emails when there is a transient failure?
By transient I mean anything that can recover, as opposed to the server saying go away, e.g. the mailbox does not exist or is full.
Is there a way I can change the retry behavior, e.g. the retry count or the retry delay interval?
P.
Posted: 2021-06-29 08:49 AM . Last Modified: 2024-03-12 02:44 AM
Hi P,
Can you give me the log file dump from one Network Management Card that sent the email and from the one that did not? Instructions here -> How can I download Event, Data, Configuration, and Debug files from my Network Management Card? | FA...
I'll review those and look into the retry process once I validate how you have it configured, what firmware you're using, etc., which will be in the log files I asked for. At a minimum, I need event.txt, data.txt, and config.ini.
Posted: 2021-06-29 08:49 AM . Last Modified: 2024-03-12 02:44 AM
Support debug logs are here:
https://dl.dropboxusercontent.com/u/2182768/apc.email.zip
The "stairs" unit did not send its emails.
Thank you
Pieter
Posted: 2021-06-29 08:50 AM . Last Modified: 2024-03-12 02:44 AM
Thanks!
I got the logs if you want to disable that link.
The one difference I do see with the stairs unit is that it is the only Network Management Card that reports a "Network Management Card coldstarted." event, meaning the Network Management Card itself lost power. After that, I see it still could not resolve smtp.gmail.com. Do you know when it started sending emails again? I couldn't tell for sure whether you're receiving NTP update notifications via email, but it looked like you were.
Can you tell me how long the outage was and how much runtime each UPS had? Is your home access to the internet backed up by one of these units? If so, which? I was thinking that since stairs, according to what you said, was powered on first, perhaps it still could not reach the internet before access was restored, and by the time it wanted to send an email, the condition had already cleared and was no longer true. And which emails did you receive from the others? Meaning, which specific events did you expect to receive from stairs compared to what office and garage provided?
There are no retry settings a user can adjust, but I still have to research the number of retries and the logic in this area, which I hope to understand in the next few days.
Posted: 2021-06-29 08:50 AM . Last Modified: 2024-03-12 02:44 AM
The sequence would be something as follows:
The power went out and all the equipment and local network kept on running on UPS power.
The internet went out shortly after power failed; I am not sure how long after. The ONT has battery backup but prioritizes power for voice phone only, so it shuts down internet and TV.
The UPSs would try to send emails to report the failure, and fail, as there is no internet.
The UPSs ran out of battery and powered down; I assume this happened to all of them.
The "stairs" unit has the least load, and would have lasted the longest.
On power coming back on, the UPSs would power the equipment back up.
The UPSs will try to send emails to report power restored.
The router will take a couple minutes to boot.
The internet will come back online at some point.
If there is no email sending retry, then emails will be lost if the local router or internet mail server is not available at the time the UPS tries to send an email.
I assume there is some kind of retry as some of the messages were delivered.
Maybe the retry count is not high enough, or the retry delay is not long enough, to account for this common type of failure?
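For readers following along, the retry behavior being asked about can be sketched in Python. This is only an illustration of the general idea (retry on recoverable failures, give up immediately on a rejection by the server); it is not the NMC firmware's logic, and the names `TransientError`, `PermanentError`, and `send_with_retry` are hypothetical.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Recoverable failure, e.g. the DNS lookup failed or the network is down."""


class PermanentError(Exception):
    """The server rejected the message, e.g. the mailbox does not exist."""


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int,
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds and return the number of attempts used.

    A TransientError triggers a wait of delay_s and another attempt, until
    max_attempts is reached, at which point the last TransientError is
    re-raised. A PermanentError is never retried and propagates at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: the message is lost
            sleep(delay_s)
    raise AssertionError("unreachable")
```

With a router that needs a couple of minutes to boot, `max_attempts * delay_s` has to cover that window, otherwise the last attempt still happens before the mail server is reachable.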
Posted: 2021-06-29 08:50 AM . Last Modified: 2024-03-12 02:44 AM
Hi again,
I spent some time looking at this and discussing.
The events in the log say "Email: Could not resolve 'smtp.gmail.com'.", so this is an issue with DNS resolution, understandably. How DNS works and retries will factor into the expected sequence, because DNS retries and the email sending mechanism are separate.
I asked some additional questions about which events you received from the two units that did send (office and garage) vs. stairs, which did not. Do you have any feedback on that?
I noticed in another review of config.ini that the stairs unit seems to have a different email configuration on certain items than the other two units. That tells me the configuration has been touched, and I just wanted to make sure it is not a factor in what happened here. I can tell you about the DNS retries and email mechanism retries, but they may not be the root cause of why this happened. What I saw was that some events on the stairs unit have their delay and repeat interval disabled, so they'd only be sent once. Was this a change made by you at some point? I asked which specific events the two units that did send mail sent so we could verify the configuration for those same events on the stairs unit, to make sure they were not disabled and did not have their repeat interval modified or disabled, because then they'd only be tried once, and only tried again if the condition happened again. By default, the majority of events (beyond those that only happen once: user logged in, user logged out, etc.) are set to no delay and to repeat until the condition clears, every 1, 2, 5, or 10 minutes, depending on the severity.
What I know now is that the NMC's event repeater keeps track of event information. If the event is configured to repeat 10 times, it will send 10 separate events to the email task in the system, and the email task (where the NMC resolves the mail server's name) resolves the name each time. The NMC does not resolve once and cache the result, because the network could change.
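A rough Python model of the behavior described in this post may help: each repeat is a separate send with a fresh name lookup, and repeats stop once the condition clears. This is a sketch built only from the description here, not from NMC firmware; the function name and parameters are hypothetical, and it assumes that a send succeeds whenever DNS is reachable at that moment.

```python
from typing import List, Tuple


def simulate_event_emails(
    sends: int,
    interval_s: int,
    dns_up_at_s: int,
    condition_clears_at_s: int,
) -> List[Tuple[int, bool]]:
    """Return (time_s, delivered) for every email attempt of one event.

    sends                 total number of times the event is handed to the
                          email task (1 models an event set to "no repeat")
    interval_s            repeat interval between sends
    dns_up_at_s           time at which the mail server name becomes resolvable
    condition_clears_at_s time at which the event condition clears, after
                          which no further repeats are sent
    """
    attempts = []
    for n in range(sends):
        t = n * interval_s
        if t >= condition_clears_at_s:
            break  # condition no longer true, so the repeater stops
        delivered = t >= dns_up_at_s  # name is resolved afresh on every send
        attempts.append((t, delivered))
    return attempts
```

Two cases lose the email in this model: an event configured for a single send while DNS is still down, and a repeating event whose condition clears before DNS comes back.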
Posted: 2021-06-29 08:50 AM . Last Modified: 2024-03-12 02:44 AM
Hi, I do not recall making changes to the repeat value settings or to events.
The units were purchased at different times and all came with different original firmware versions; maybe that plays a role in what the defaults are.
Where do I edit the delay and repeat intervals? I do not recall ever seeing a menu item for that.
Or is there a way to compare and edit the settings files directly?
Other questions:
"Do you know when it started sending emails again?"
The stairs unit never sent an email about the power failure or restore; it sent email later for other events.
"Can you tell me how long the outage was and how much runtime each UPS had?"
Looking at the logs, it was just under an hour; the predicted runtimes reported now are 53 min, 23 min, and 1 hour 6 min.
"Is your home access to the internet backed up by one of these units?"
The switches and router yes, the Verizon FiOS ONT no.
"If so, which?"
Stairs backs main distribution switches, garage backs router.
"meaning, for which specific events did you expect to receive from stairs compared to what office and garage provided?"
I expected all units to send me emails as soon as power was restored and the internet became reachable.
Posted: 2021-06-29 08:50 AM . Last Modified: 2024-03-12 02:44 AM
I compared your config.ini files, but it is very tedious to do. I uploaded portions of them to my NMC so I could see them through the web interface.
The default configuration for most, if not all, NMC events is that once you add a recipient, it receives all events, repeated every 1, 2, 5, or 10 minutes until the condition clears. (Certain events, like I said, just happen once and don't have this type of configuration available, e.g. user logged in.) It has been this way for years and across all firmware versions. If you want to take a peek, the easiest place to do it nowadays is the web UI: go to Configuration->Notification->Event Actions->By events and select the heading "power events." It will bring you to a screen that shows three columns. If you see a bullet point in a column, such as the one for email, then at least one recipient is set up to receive that notification. Then you can click on individual events to see the configuration for each recipient. That is what I did, and I saw that some of the power events were configured for no repeat, so they would've been tried once. That is why I asked for the names of the events or the event codes, so I could check the configuration for those specific events on "Stairs."
You could reset the event configuration if you wanted, under Control->Network->Reset/Reboot, by selecting Reset Only Event Configuration, to put it back to the defaults on Stairs or any of the units.
I know you responded to some of the questions I asked, but the answers were still a little incomplete for what I need. For example, stairs started emailing at some point: what specific event did it first email you about after the issue? And for the last question, which specific events did you receive in email from the other units that stairs did not send? That way, we can check the logs and configuration at the event code level to see whether these event configurations on stairs were indeed different and that is why you did not receive them.
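Since comparing config.ini files by hand is tedious, a small script can list the differences between two units. The sketch below is generic INI tooling using Python's standard `configparser`; it assumes the downloaded config.ini parses as ordinary INI (sections in brackets, `key=value` lines, `;` comments), which should be checked against a real file. The helper names `load_ini` and `diff_ini` are hypothetical.

```python
import configparser
from typing import Dict, Tuple

MISSING = "<missing>"  # marker for a key that exists in only one file


def load_ini(text: str) -> configparser.ConfigParser:
    """Parse INI text leniently: duplicates allowed, no % interpolation."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, allow_no_value=True
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    return parser


def diff_ini(
    a: configparser.ConfigParser, b: configparser.ConfigParser
) -> Dict[Tuple[str, str], Tuple[object, object]]:
    """Map (section, key) to (value_in_a, value_in_b) wherever they differ."""
    diffs = {}
    for section in sorted(set(a.sections()) | set(b.sections())):
        keys_a = dict(a.items(section)) if a.has_section(section) else {}
        keys_b = dict(b.items(section)) if b.has_section(section) else {}
        for key in sorted(set(keys_a) | set(keys_b)):
            value_a = keys_a.get(key, MISSING)
            value_b = keys_b.get(key, MISSING)
            if value_a != value_b:
                diffs[(section, key)] = (value_a, value_b)
    return diffs
```

To use it, read each unit's file into a string (for example with `pathlib.Path(...).read_text()`), pass both through `load_ini`, and print the result of `diff_ini`. Per-unit settings such as host name and IP address will always differ, so the entries of interest are the ones in the event-related sections.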