APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum to share knowledge about installation and configuration for Data Center and Business Power UPSs, Accessories, Software, Services.
Posted: 2021-06-29 12:22 AM . Last Modified: 2024-03-14 12:50 AM
Background:
We have two APC RT6000s in a rack where every device they support has 2 power supplies, so UPS01 feeds all the left-side PS and UPS02 feeds all the right-side PS. We noticed that these 6000s were connected directly to one of the storage appliances via crossover cables. This appliance runs a specialized OS, so it can ping/monitor the UPS via SNMP, but it offers no telnet/www/ftp.
We didn't like that, so we were trying to change the 6000s' config to give them different IPs so we can manage them remotely. Over a weekend a while back we tried to do that, but we learned that the serial port on the back of these things needs a non-standard serial cable (looks to be p/n 940-0024), so we couldn't manage the UPS that way. We also connected the crossover cable to our laptops, and we could ping the IP but never telnet/ssh.
So we left them alone, plugged the crossover cables back into the storage appliance, and flew back to our offices (this was part of a bigger project).
The actual problem:
A week later, in reviewing the logs for errors on the storage appliance, we noticed that the NICs connected to these UPSs are going down every 24 minutes, and these messages started after we tried to reconfigure the UPSs. Both NICs go down at the same time and come back up 4-5 seconds later, then go down again about 24 minutes later (23min-55s to 23min-59s).
Sat May 15 23:15:56 EDT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Sat May 15 23:15:56 EDT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Sat May 15 23:16:01 EDT [netif.linkUp:info]: Ethernet e0d: Link up.
Sat May 15 23:16:01 EDT [netif.linkUp:info]: Ethernet e0c: Link up.
Sat May 15 23:39:55 EDT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Sat May 15 23:39:55 EDT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Sat May 15 23:40:00 EDT [netif.linkUp:info]: Ethernet e0d: Link up.
Sat May 15 23:40:00 EDT [netif.linkUp:info]: Ethernet e0c: Link up.
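For anyone who wants to check the timing on a longer log, here is a minimal Python sketch that pulls the drop interval and outage length out of lines in the format above. The names (`parse_events`, `summarize`, `ASSUMED_YEAR`) are made up for this example, and the year is only a placeholder because the log lines carry none; only time differences are used.

```python
import re
from datetime import datetime

# Matches lines like:
# Sat May 15 23:15:56 EDT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
LINE_RE = re.compile(
    r"^\w{3} (\w{3} +\d+ \d\d:\d\d:\d\d) \w+ "
    r"\[netif\.(linkDown|linkUp):info\]: Ethernet (\w+):"
)

# Assumption: the log lines have no year, so a placeholder is used for parsing.
ASSUMED_YEAR = 2000

LOG = """\
Sat May 15 23:15:56 EDT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Sat May 15 23:15:56 EDT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Sat May 15 23:16:01 EDT [netif.linkUp:info]: Ethernet e0d: Link up.
Sat May 15 23:16:01 EDT [netif.linkUp:info]: Ethernet e0c: Link up.
Sat May 15 23:39:55 EDT [netif.linkDown:info]: Ethernet e0d: Link down, check cable.
Sat May 15 23:39:55 EDT [netif.linkDown:info]: Ethernet e0c: Link down, check cable.
Sat May 15 23:40:00 EDT [netif.linkUp:info]: Ethernet e0d: Link up.
Sat May 15 23:40:00 EDT [netif.linkUp:info]: Ethernet e0c: Link up.
"""


def parse_events(text):
    """Return {interface: [(timestamp, "linkDown" | "linkUp"), ...]}."""
    events = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a link up/down message
        stamp, kind, nic = match.groups()
        when = datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
        events.setdefault(nic, []).append((when, kind))
    return events


def summarize(events):
    """Per interface: outage lengths and gaps between consecutive drops, in seconds."""
    summary = {}
    for nic, items in events.items():
        items = sorted(items)
        downs = [t for t, kind in items if kind == "linkDown"]
        ups = [t for t, kind in items if kind == "linkUp"]
        summary[nic] = {
            "outage_seconds": [(u - d).total_seconds() for d, u in zip(downs, ups)],
            "seconds_between_drops": [
                (b - a).total_seconds() for a, b in zip(downs, downs[1:])
            ],
        }
    return summary


summary = summarize(parse_events(LOG))
for nic in sorted(summary):
    print(nic, summary[nic])
```

Running it over a full day of messages would show whether the gap really stays within the 23min-55s to 23min-59s range and whether both interfaces always drop in the same second.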
At this point I opened a case with APC, but I haven't got anywhere because I have no ability to collect information since these devices are connected directly to the appliances and they don't telnet/www/ftp/ssh.
Right now, what I'm trying to find out is which is causing these messages. Are the interfaces on the UPS going down and is the appliance simply reporting it, or is the appliance sending something and making the interfaces shut down? I guess I'd like to know which is the cause/effect.
I'm pretty sure there's no power issue because everything in the rack can report a loss of power to its power supplies, and if anything lost power for so much as 1 second we'd be notified (and we've gotten the notifications before).
That's all the indications I have of this issue, and I have 2 conflicting biases:
1. This appliance's interfaces don't go down just every now and then. It takes something like a shutdown or disconnection to report this type of message. With this in mind, I thought the UPSs are going down every 24 minutes.
2. These UPSs aren't connected/linked to each other. There is no reason, that I can think of, that'd have them go down at the same second each time. When we tried to reconfigure these UPSs we tried on the first UPS, realized that we can't, and didn't touch the 2nd one. So, with this in mind, it looks like the filer's reporting the links to go down simultaneously for some reason.
Since I'm a UPS novice, I'd like to know if I'm missing some obvious troubleshooting step, or if anyone has seen something like this.
Posted: 2021-06-29 12:22 AM . Last Modified: 2024-03-14 12:50 AM
i understand that it's difficult to get any information from the devices, but any little thing would help - similarities between them, etc.
i assume that the network interfaces on these UPSs are the AP9619 network management cards? you can tell by looking at the back of the UPS and seeing what the network cable is plugged into.
is the device that's directly connected to the UPS just pinging it, and that's when it detects the link is down, or is it just going based off activity?
what type of SNMP activity is happening between the two devices? are you polling for status OIDs, etc?
it would really help to know the firmware version on the network management cards. if you have never updated the management cards, you could pull the card out of the UPS and find the sticker on it that indicates the APP and AOS. before removing the NMC, you are supposed to put the UPS into bypass via the switch on the back of the UPS, remove the card, check the sticker, put it back in, and then remember to take the UPS out of bypass. if you don't want to go to bypass (since that will pass utility power through to your equipment), you could pull the card out with minimal risk to check. before going to bypass, you'd also want to verify that the UPS has no red LEDs on the front of it and appears to be operating normally.
these are just a few things that come to mind to try and identify the cause. the logs from the network management card would really give us some insight if you could at least get one of them - we need to know if the link is going down because the management interface is rebooting or something. this would be indicated in the log. i don't suppose anyone is ever near the device to check the status/link LEDs when this 4-5 second link drop happens? we could tell if the card is rebooting from the LED status.
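if something on that link can be made to read the standard MIB-II sysUpTime value (OID 1.3.6.1.2.1.1.3.0, counted in hundredths of a second) from the card before and after a drop, a reboot of the management interface should show up as the uptime starting over. a rough Python sketch of that comparison follows. it does no SNMP itself, the function names are made up for illustration, the sample numbers are hypothetical and not taken from these units, and it assumes the card resets sysUpTime when its interface restarts.

```python
# sysUpTime.0 from MIB-II: time since the management agent last started,
# reported in TimeTicks (hundredths of a second).
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def ticks_to_seconds(ticks):
    """Convert SNMP TimeTicks to seconds."""
    return ticks / 100.0


def restarted_between(before_ticks, after_ticks):
    """True if uptime went backwards between two polls.

    Simplification: the 32-bit counter wrap (after roughly 497 days of
    uptime) is ignored, so a wrap would also be reported as a restart.
    """
    return after_ticks < before_ticks


# Hypothetical samples, NOT readings from the UPSs in this thread:
# one poll shortly before a link drop, one shortly after.
before = 143_000  # 1430 s of uptime
after = 600       # 6 s of uptime

if restarted_between(before, after):
    print(f"agent restarted; uptime is now {ticks_to_seconds(after):.0f} s")
else:
    print("uptime kept counting; the agent did not restart")
```

if the uptime keeps counting straight through a drop, the card itself probably isn't rebooting and the link is being reset some other way.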
i feel like we're missing something here whether it is a similarity in firmware, network settings, misconfiguration, etc but I have not seen anything like this before.
hope this helps get us started!
Posted: 2021-06-29 12:22 AM . Last Modified: 2024-03-14 12:50 AM
Hi,
Thanks for your reply.
i assume that the network interfaces on these UPSs are the AP9619 network management cards? you can tell that by looking at the back of the UPS and seeing where the network cable is plugged into.
-I can ask the onsite person who would look at this for me. Is there something specific I can mention to him to look for?
is the device that's directly connected to the UPS just pinging it, and that's when it detects the link is down, or is it just going based off activity?
-I know it’s not pinging periodically to check, but I’m not sure what makes it report its ports to be down. I suspect that it’s a link-level connection that ceases to exist.
what type of SNMP activity is happening between the two devices? are you polling for status OIDs, etc?
-Based on the spec I read earlier, the appliance uses SNMP traps to monitor UPS, but I’m not exactly sure what all happens there. I do know that those link down messages are not related to SNMP, though, but that’s all I know.
it would really help to know the firmware version on the network management cards. if you have never updated the management cards, you could pull the card out of the UPS and find the sticker on it that indicates the APP and AOS. before removing the NMC, you are supposed to put the UPS into bypass via the switch on the back of the UPS, remove the card, check the sticker, put it back in, and then remember to take the UPS out of bypass. if you don't want to go to bypass (since that will pass utility power through to your equipment), you could pull the card out with minimal risk to check. before going to bypass, you'd also want to verify that the UPS has no red LEDs on the front of it and appears to be operating normally.
-This is good to know, and I might have checked, had I known that. But we probably won’t do this without someone from IT present, and unfortunately this is a remote office with no IT staff.
i don't suppose anyone is ever near the device to check the status/link LEDs when this 4-5 second link drop happens? we could tell if the card is rebooting from the LED status.
-I can arrange for someone to observe the APC units around the time this happens, since we can predict fairly precisely when the interfaces go down. Is there a certain pattern that I can ask him to look for?
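To give the onsite person a time to stand in front of the rack, the next few drop windows can be worked out from the last logged drop and the 23min-55s to 23min-59s spacing I've seen so far. A small Python sketch of that arithmetic is below. `predict_windows` is a name I made up, the year in the timestamp is a placeholder since the log has none, and it assumes the spacing stays in that range, which is only what the log has shown up to now.

```python
from datetime import datetime, timedelta

# Shortest and longest gap between drops observed in the appliance log.
MIN_PERIOD = timedelta(minutes=23, seconds=55)
MAX_PERIOD = timedelta(minutes=23, seconds=59)


def predict_windows(last_down, count=3):
    """Return (earliest, latest) start times for the next `count` drops.

    The window widens with each step because the uncertainty in the
    period accumulates.
    """
    return [
        (last_down + n * MIN_PERIOD, last_down + n * MAX_PERIOD)
        for n in range(1, count + 1)
    ]


# Last drop in the log excerpt; the year is a placeholder.
last_down = datetime(2000, 5, 15, 23, 39, 55)

for earliest, latest in predict_windows(last_down):
    print(f"watch the LEDs between {earliest:%H:%M:%S} and {latest:%H:%M:%S}")
```

Each link stays down for only 4-5 seconds, so he would need to be watching from a little before the earliest time.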
Posted: 2021-06-29 12:22 AM . Last Modified: 2024-03-14 12:50 AM
Hi,
Thanks for your help. Your responses have been very helpful.
if you look at my screenshot of the faceplate, there is a link and status LED on the card's network jack. when the card has valid IP settings, the LED is solid green so i'd expect your status LED to be solid green. when the NMC's interface is rebooting, the status LED will rapidly flash - alternating between green and orange. i'd like to know if it starts flashing indicating a reboot or anything else. the other types of LED combos could be any combo of green or orange - flashing rapidly or slowly.
My guy observed that the LEDs, both of them, flash rapidly when the storage appliance's NICs report being down.
Having said that, it looks like I will get to go to the office after all, so I'm planning to take a serial cable I bought on APC's website (p/n 940-0024) to connect to the UPS. Aside from the cable and setting HyperTerminal to 2400/8/N/1/no flow control, it doesn't look like there's more to connecting to the unit. Is that accurate? I was reading the user guide and that appears to be all that's needed, and I can configure the NMC from there.
Posted: 2021-06-29 12:22 AM . Last Modified: 2024-03-14 12:50 AM
yes, as long as this is an SURT6000XLT or XLI, versus an SURTD6000RMXLP3U, you just connect the 940-0024 to the serial port on the UPS and configure the port settings to what you mentioned. press enter a few times and you should get the username and password prompt to access the NMC as long as you see LEDs on the cards. the default username/password is apc/apc.
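if you would rather script the connection than use HyperTerminal, the same 2400/8/N/1/no flow control settings can be written down for the third-party pyserial package, roughly as sketched below. this is an illustration under assumptions, not an APC tool: `console_settings` and `open_console` are made-up names, pyserial has to be installed separately, and the port name is whatever your laptop assigns to the 940-0024 cable.

```python
def console_settings(timeout=2.0):
    """Keyword arguments for serial.Serial matching 2400/8/N/1, no flow control."""
    return {
        "baudrate": 2400,
        "bytesize": 8,      # same value as serial.EIGHTBITS
        "parity": "N",      # same value as serial.PARITY_NONE
        "stopbits": 1,      # same value as serial.STOPBITS_ONE
        "xonxoff": False,   # no software flow control
        "rtscts": False,    # no RTS/CTS hardware flow control
        "dsrdtr": False,    # no DSR/DTR hardware flow control
        "timeout": timeout, # seconds to wait on reads
    }


def open_console(port):
    """Open the serial console and press Enter a few times to wake the prompt.

    `port` is an assumption supplied by the caller, for example "COM1" on
    Windows or "/dev/ttyUSB0" on Linux.
    """
    import serial  # third-party: pyserial

    link = serial.Serial(port, **console_settings())
    link.write(b"\r\r\r")  # a few Enter presses
    return link


if __name__ == "__main__":
    print(console_settings())
```

after `open_console(...)` returns, reading from the link should show the username prompt if the card is alive.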