APC UPS Data Center & Enterprise Solutions Forum
Schneider, APC support forum for sharing knowledge about the installation and configuration of Data Center and Business Power UPSs, accessories, software, and services.
Posted: 2024-02-23 03:09 AM. Last Modified: 2024-02-23 03:12 AM
Dear all,
This is what happened a few days ago.
We had a planned power outage lasting several hours and decided to use the occasion for another (battery) test.
There's a server rack powered by a Smart-UPS X, and all the "native" servers run their own PCNS. Everything works fine there.
Recently we added two standalone ESXi machines, each with its own PCNS running in a separate VM (VMware Virtual Appliance).
Here, too, everything looked good: dry runs in the past (unplugging the rack's power plug and waiting) went fine.
But what we hadn't considered:
If the network is cut off from the rest of the world, we lose DNS. And exactly that seems to be a problem for PCNS when it has to connect to the ESXi host.
As I understand it, this connection is made via HTTPS and the ESXi host's certificate is checked. For this, the ESXi's FQDN as written in its SSL certificate must be resolvable by PCNS, I guess. Is this correct?
If this 'handshake' fails, PCNS will not tell the ESXi hosts to shut down and, well... the UPS just powers down on empty batteries. Bad thing.
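One way to test the assumption about name resolution without cutting the power is to ask the system resolver directly, from the PCNS VM or from any Linux machine with the same resolver settings (assuming Python 3 is available there). The sketch below is illustrative only and is not part of PCNS. It uses `socket.getaddrinfo`, which on a typical Linux setup consults /etc/hosts as well as DNS. A failed lookup raises `socket.gaierror`, roughly the Python counterpart of the Java `UnknownHostException` in the log below.

```python
from __future__ import annotations

import socket


def resolve(host: str, port: int = 443) -> list[str]:
    """Return the sorted, de-duplicated addresses the system resolver yields for host.

    socket.getaddrinfo goes through the OS resolver, so on a typical Linux
    setup /etc/hosts is consulted as well as DNS. A failed lookup raises
    socket.gaierror.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def check_hosts(hosts: list[str]) -> dict[str, list[str] | None]:
    """Map each host to its resolved addresses, or to None if it cannot be resolved."""
    results: dict[str, list[str] | None] = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except socket.gaierror as exc:
            print(f"{host}: NOT resolvable ({exc})")
            results[host] = None
        else:
            print(f"{host}: {', '.join(results[host])}")
    return results
```

Calling `check_hosts(["esxi2.example.com"])` with the DNS server entries removed should report the name as not resolvable, and should succeed again once a matching /etc/hosts entry exists.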
We got these errors in PCNS's error.log (ESXi's hostname changed to esxi2...):
2024-02-19T07:48:31,620 ERROR Thread-78 com.vmware.vim25.ws.WSClient java.net.UnknownHostException: esxi2.example.com at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) - Exception caught while invoking method: CurrentTime
2024-02-19T07:48:31,622 ERROR Thread-78 com.apcc.m11.components.webserver.util.virtualization.vmware.VMWareConnection - validateESXiConnection() - host: esxi2.example.com, RemoteException occurred, attempting reconnection
2024-02-19T07:48:31,622 ERROR Thread-78 com.vmware.vim25.ws.WSClient java.net.UnknownHostException: esxi2.example.com at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) - Exception caught while invoking method: Logout
2024-02-19T07:48:31,688 ERROR pool-11-thread-1 com.vmware.vim25.ws.WSClient java.net.UnknownHostException: esxi2.example.com at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) - Exception caught while invoking method: RetrieveServiceContent
2024-02-19T07:48:31,689 ERROR pool-11-thread-1 com.apcc.m11.components.webserver.util.virtualization.vmware.ESXiConnectionCallable - call() - failed connecting to: esxi2.example.com, RemoteException: java.rmi.RemoteException: Exception caught trying to invoke method RetrieveServiceContent; nested exception is: java.net.UnknownHostException: esxi2.example.com
2024-02-19T07:48:31,690 ERROR Thread-78 com.apcc.m11.components.webserver.util.virtualization.vmware.VMWareConnection - processConnectionFailure() - failed connecting to host: esxi2.example.com - (RemoteException) Exception caught trying to invoke method RetrieveServiceContent; nested exception is: java.net.UnknownHostException: esxi2.example.com
The corresponding entries in EventLog.txt (IP address replaced with random numbers):
02/19/2024 07:42:50 UPS has switched to battery power. .3.5.1.5.4.1
02/19/2024 07:47:50 UPS critical event: <b>On Battery</b>. .3.4.9.9
02/19/2024 07:47:50 Shutdown sequence started on Host <b>101.43.76.13</b> in response to UPS critical event: <b>On Battery</b>. .3.4.9.9
02/19/2024 07:48:31 Cannot connect to Host. PowerChute will not be able to issue commands to the Host. .3.4.9.9
We see the same behavior with PCNS v5.0.0 and the slightly older v4.4.1. The ESXi hosts are running v7.0U3c and v7.0U3o.
The PCNS_VMware_User_Guide.pdf tells us:
DNS Configuration issues may prevent PowerChute from connecting to the host, e.g. a stale DNS record containing an invalid hostname/FQDN or IP address.
The following exception appears in the Error Log:
VI SDK invoke exception:java.net.UnknownHostException
To reproduce this (without crashing everything again), we removed the DNS server entries from the underlying Linux, rebooted (to clear any caches), and re-entered the ESXi host's IP address and login credentials.
Result: PCNS cannot contact the ESXi host.
As soon as we add an entry for the ESXi host to /etc/hosts, it works again, even without access to a DNS server.
But this can't be the final solution. Any thoughts? Something we missed or misunderstood? Why isn't everybody else running into the same problem?
PS: Another idea was to add the (public) IP address of the ESXi host as a Subject Alternative Name (SAN) to its SSL certificate. But that's a no-go for our CA.
Cheers
Posted: 2024-02-26 05:16 AM
You found the correct solution. If the DNS server is offline, domain names cannot be resolved. This affects any network connection that relies on name resolution. For example, disconnect DNS and, from any other VM, try to ping the ESXi host by its domain name: the ping will fail. Then repeat the ping using the ESXi host's IP address: it will succeed.
To resolve the issue, add the ESXi hostname and IP address to the /etc/hosts file.
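Since PCNS talks to the ESXi host over HTTPS (TCP port 443 by default) rather than ICMP, a TCP connection attempt is a closer analogue of what PCNS does than ping. The sketch below is an illustration, not a PCNS tool: it tries the same host once by name and once by IP. The address 192.0.2.12 used in the usage note is a hypothetical placeholder from the documentation range.

```python
from __future__ import annotations

import socket


def can_connect(target: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection to (target, port); target may be a name or an IP."""
    try:
        # create_connection() resolves names through getaddrinfo(), so an
        # unreachable DNS server surfaces here as socket.gaierror, which is a
        # subclass of OSError just like "connection refused" or a timeout.
        with socket.create_connection((target, port), timeout=timeout):
            return True
    except OSError:
        return False


def compare_name_and_ip(name: str, ip: str, port: int = 443) -> dict[str, bool]:
    """Report whether the host is reachable by name and by IP address."""
    return {"by_name": can_connect(name, port), "by_ip": can_connect(ip, port)}
```

With DNS disconnected, `compare_name_and_ip("esxi2.example.com", "192.0.2.12")` would be expected to return `by_name` as False and `by_ip` as True, mirroring the ping test.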
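A minimal sketch of preparing such entries, assuming you want each address (IPv4 and IPv6) mapped to the FQDN that PCNS tries to reach plus its short name. The helper names and the addresses 192.0.2.12 and 2001:db8::12 are hypothetical placeholders. The functions only build the new file contents and skip entries that are already present; writing the result back to /etc/hosts requires root and is left out.

```python
from __future__ import annotations

import ipaddress


def _normalize(addr: str) -> str | None:
    """Canonical text form of an IP address, or None if addr is not one."""
    try:
        return str(ipaddress.ip_address(addr))
    except ValueError:
        return None


def hosts_lines(fqdn: str, addresses: list[str]) -> list[str]:
    """Build /etc/hosts lines mapping each address to the FQDN and its short name."""
    short = fqdn.split(".", 1)[0]
    names = fqdn if short == fqdn else f"{fqdn} {short}"
    lines = []
    for addr in addresses:
        canon = _normalize(addr)
        if canon is None:
            raise ValueError(f"not an IP address: {addr!r}")
        lines.append(f"{canon}\t{names}")
    return lines


def merge_hosts(existing: str, fqdn: str, addresses: list[str]) -> str:
    """Return hosts-file text with entries for fqdn appended, skipping any already present."""
    mapped = set()
    for line in existing.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            canon = _normalize(fields[0])
            mapped.update((canon, name) for name in fields[1:])
    new = []
    for line in hosts_lines(fqdn, addresses):
        addr = line.split("\t", 1)[0]
        if (addr, fqdn) not in mapped:
            new.append(line)
            mapped.add((addr, fqdn))  # also guards against duplicate input addresses
    if not new:
        return existing
    sep = "\n" if existing and not existing.endswith("\n") else ""
    return existing + sep + "\n".join(new) + "\n"
```

Running `merge_hosts` a second time with the same arguments returns the text unchanged, so the step can be repeated safely.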
Posted: 2024-03-11 06:58 AM
Thanks, @BillP, for your reply.
On one hand I'm glad I found the problem and also its solution. But I still can't believe this is it.
I guess most UPS-protected installations running ESXi hosts don't have their DNS server "included" behind the same UPS. Will they all fail?!
While setting up the UPS and PCNS, we already planned to use only IP addresses so we would stay independent of any service outside this room. We even changed the VLAN configuration, because we knew that without the non-local VLAN routers some of the networks inside our rack would be cut off from each other.
But I didn't see any warning that PCNS for VMware is useless by default in an isolated rack. Anyone whose goal is to protect their VMs from crashing should be alerted to this.
I know that if I connect to a web server over HTTPS using its IP address instead of its hostname, I'll get a warning (SSL_ERROR_BAD_CERT_DOMAIN).
But I can still accept this "risk" and proceed. PCNS could optionally work the same way: either use a strict mode (check the certificate and hostname) or, especially when IP addresses are used instead of hostnames, do not rely on resolving the web server's hostname and ignore SSL_ERROR_BAD_CERT_DOMAIN.
For me, this could simply be a checkbox. That should be acceptable even to the paranoid.
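For comparison, TLS clients can also avoid DNS without giving up the hostname check: connect to a fixed IP address, but verify the certificate against a configured FQDN. This is a different technique from ignoring SSL_ERROR_BAD_CERT_DOMAIN, and nothing in this thread says PCNS supports it; the Python sketch below only illustrates the TLS mechanics. The function names, the placeholder IP 192.0.2.12, and the CA bundle path are hypothetical.

```python
from __future__ import annotations

import socket
import ssl


def strict_context(cafile: str | None = None) -> ssl.SSLContext:
    """Default client context: the certificate chain and the hostname are both verified."""
    return ssl.create_default_context(cafile=cafile)


def peer_cert_by_ip(ip: str, fqdn: str, port: int = 443,
                    cafile: str | None = None, timeout: float = 5.0) -> dict:
    """Connect to a fixed IP (no DNS lookup) but verify the certificate against fqdn.

    server_hostname is only used for SNI and for matching the certificate;
    it is never resolved. A name mismatch raises ssl.SSLCertVerificationError.
    """
    ctx = strict_context(cafile)
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=fqdn) as tls:
            return tls.getpeercert()
```

Usage would look like `peer_cert_by_ip("192.0.2.12", "esxi2.example.com", cafile="/path/to/internal-ca.pem")`. The certificate check stays strict, and the only thing pinned locally is the address, much like an /etc/hosts entry but scoped to one connection.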
So, to summarize:
Either put the ESXi addresses (IPv4 and IPv6) into /etc/hosts on the PCNS VM's OS (and I suspect many admins would appreciate a how-to for this).
Or additionally install PCNS in every VM, with all the drawbacks...
I understand this is a community-driven forum. As there is obviously no other solution for us PCNS users, the next step for me is to open a support ticket.
Or are any committed PCNS developers listening here already?
Bye
Posted: 2024-03-13 09:24 AM
We have a how-to for editing the hosts file; see knowledge base document FAQ000262541:
https://www.se.com/us/en/faqs/FAQ000262541/
There is also a document that explains how to correct the connection error, FAQ000265038:
https://www.se.com/us/en/faqs/FAQ000265038/
Also, we plan to add checks for common issues in a future release; these checks will direct users to the solution.