EcoStruxure IT forum
Schneider Electric support forum about installation and configuration for DCIM including EcoStruxure IT Expert, IT Advisor, Data Center Expert, and NetBotz
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:11 AM
Hi All,
I have a question about a DCO cluster and need your help. If we need to shut down both the master and slave DCO servers, what is the procedure?
I shut down server1 and server2 kept running, but I could not log in to the DCO client. After I rebooted server2, the client worked again. Why did this not work in a cluster environment?
Next, I shut down both DCO servers and turned them on a few hours later. The client could not log in and some error messages were shown in Webmin. Please check the attachment and advise what the reason is. Many thanks.
Best Regards
(CID:104175223)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li Wei
Here is the root cause:
On the 16th of November, node-248 crashed. This meant that on the 21st of November when node-243 was restarted, node-248 was not ready to accept connections.
The workaround:
When restarting the master node, it is important to make certain that the slave node is able to take over. This is mainly checked through the Webmin page. If the slave node is not able to take over, it should be restarted first and allowed to get into a good state before the master node is restarted.
I will send you a more detailed report in a direct mail.
Thanks
Jesper
(CID:105456803)
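The ordering rule in the workaround above can be written down as a small decision helper. This is only a sketch for illustration: the `NodeStatus` type, the `healthy` flag, and the function name are hypothetical, and the status values are assumed to be read manually from the Webmin status page rather than fetched through any DCO API. The example also assumes node-243 was the master and node-248 the slave, which the thread implies but does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    """Hypothetical summary of what an operator reads off the Webmin status page."""
    name: str
    role: str      # "master" or "slave"
    healthy: bool  # True if Webmin shows the node in a good state


def master_restart_plan(master: NodeStatus, slave: NodeStatus) -> List[str]:
    """Return the ordered manual steps for restarting the master node."""
    if master.role != "master" or slave.role != "slave":
        raise ValueError("expected one master node and one slave node")
    steps = []
    if not slave.healthy:
        # The slave cannot take over, so bring it back to a good state first.
        steps.append(f"restart {slave.name}")
        steps.append(f"wait until {slave.name} is in a good state in Webmin")
    steps.append(f"restart {master.name}")
    return steps


# Assumed roles: node-243 as master, node-248 as the crashed slave.
plan = master_restart_plan(
    NodeStatus("node-243", "master", True),
    NodeStatus("node-248", "slave", False),
)
for step in plan:
    print(step)
```

With a healthy slave the plan reduces to a single step, restarting the master.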
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:11 AM
Hi Chu Li,
It is not possible to say exactly what is wrong on the servers without the logs, so please collect the server logs from both servers.
Regarding the time difference: are you using NTP? If not, please try connecting both servers to an NTP server and see if this corrects the problem.
Thanks
Jesper
(CID:104175230)
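For background on what NTP corrects, the sketch below shows the standard NTP clock offset calculation from four timestamps, plus a parser for the two server-side timestamps in a 48-byte NTP reply. The function names and the symbols t0 to t3 are illustrative choices, not taken from DCO, and the code does not contact any server. It only shows how a time difference between a machine and its time source could be quantified.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32-bit seconds, 32-bit fraction) to Unix time."""
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def parse_server_timestamps(packet: bytes):
    """Return (t1, t2): the receive and transmit timestamps of an NTP reply."""
    if len(packet) < 48:
        raise ValueError("an NTP packet is at least 48 bytes long")
    # Bytes 32-39 hold the receive timestamp, bytes 40-47 the transmit timestamp.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    return ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f)


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimate how far the local clock is behind (+) or ahead of (-) the server.

    t0: request sent (local clock)      t1: request received (server clock)
    t2: reply sent (server clock)       t3: reply received (local clock)
    """
    return ((t1 - t0) + (t2 - t3)) / 2
```

A large offset on one node but not the other would suggest that the two servers disagree about the time, which is the situation the NTP suggestion above is meant to rule out.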
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:11 AM
Hi Jesper,
Thanks for the advice. I have downloaded the logs from both servers. Could you give me your email address so that I can send them to you?
We are not using NTP yet. I'll inform the customer and try it. Many thanks.
Best Regards
(CID:104175280)
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li,
I have sent you a link to a Box folder. Please upload the log files.
Thanks
Jesper
(CID:104175337)
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li,
Thanks for the client log files, but I will also need the server log files to investigate this issue. Is the server currently running or is it still down?
Thanks
Jesper
(CID:104863307)
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
How do I download the server log files? Yes, the server is currently running. Any suggestions?
Best Regards
(CID:104863317)
Posted: 2020-07-02 05:50 PM . Last Modified: 2024-04-09 02:10 AM
Hi,
Log in to the DCO client. Select Help>Download Log Files and save the server and client log files to a location of your choice. Log files are also available from the Webmin server management interface (StruxureWare Data Center Operation>Download server log files).
Just to make sure I understand your request completely:
1. You would like a description of how to shut down cluster nodes correctly.
2. You would like us to investigate the reason for the error message described in the request.
What did you do to get the server running again?
Thanks
Jesper
(CID:104863321)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
1. Yes, I'd like to know the procedure for startup and shutdown.
2. I want to know why the cluster failed when I shut down one of the two servers (operation also stopped on the other server). Is it related to the time difference between the two servers?
I followed the procedure that Jef provided me previously. Please refer to the SOP below:
1. Shut down/turn off the "slave node" (hint: go to webmin > StruxureWare DC Operation > Status to identify the slave node).
2. Reboot the master node.
3. Now turn on the "slave node" and let it boot.
Best Regards
(CID:104863328)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
I have just uploaded the server logs to Box. Please take a look and help me find the solution. Many thanks.
Best Regards
(CID:104863400)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li,
We have looked at the logs, but there appear to be several errors. To narrow down the problem, could you please let us know when the problem occurred? At what time were the users not able to log in? We would also like to know your timezone, since the server is running in the UTC timezone.
We are working on a new page on the EcoStruxure IT Help Center that describes how to turn off nodes in a cluster. It will be available shortly.
Thanks
Jesper
(CID:104863457)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
The customer asked us to shut down all servers for maintenance last Saturday, so I shut down one node (node1) in the morning. A few minutes later I found that I could not log in to the client. I checked the other node (node2) and operation had stopped. It worked again after I rebooted node2 (node1 was still offline).
In the afternoon I also shut down node2. After the maintenance was completed, I rebooted both nodes, but the application would not work, so I followed the SOP to reboot again:
1. Shut down/turn off the "slave node" (hint: go to webmin > StruxureWare DC Operation > Status to identify the slave node).
2. Reboot the master node.
3. Now turn on the "slave node" and let it boot.
I hope this helps. Thanks.
(CID:104863497)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li,
At what time (local time) did this happen?
(CID:104863502)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
I shut down one node (node1) in the morning (about 08:30 AM to 09:00 AM). A few minutes later I found that I could not log in to the client. I checked the other node (node2) and operation had stopped. It worked again after I rebooted node2 (node1 still offline, about 09:30 AM to 10:00 AM).
In the afternoon I also shut down node2 (about 01:30 PM to 2:00 PM). After the maintenance was completed, I turned on both nodes, but the application would not work (about 04:30 PM to 5:00 PM). The final reboot was at about 05:30 PM to 6:00 PM. The above is for your reference. Thank you.
By the way, DCO still shows an offline message from time to time, followed by a successful reconnect message a few minutes later each time. Does this also show in the server log? Is this a concern?
Best Regards
(CID:104863503)
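Since the server logs are in UTC, the local time windows reported above have to be shifted before they can be matched against log entries. The sketch below does that conversion with a fixed UTC offset. The thread does not state the date's year or the customer's timezone, so the date and the +8 hour offset used here are placeholders only, and the function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def local_window_to_utc(day: date, start: time, end: time, utc_offset_hours: float):
    """Convert a local time window on a given day to a pair of UTC datetimes."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    start_utc = datetime.combine(day, start, tzinfo=local_tz).astimezone(timezone.utc)
    end_utc = datetime.combine(day, end, tzinfo=local_tz).astimezone(timezone.utc)
    return start_utc, end_utc


# Placeholder values: neither the date nor the offset is given in the thread.
incident_day = date(2000, 1, 1)
assumed_offset_hours = 8

# Local time windows as reported in the post above.
windows = {
    "node1 shut down": (time(8, 30), time(9, 0)),
    "node2 rebooted": (time(9, 30), time(10, 0)),
    "node2 shut down": (time(13, 30), time(14, 0)),
    "both nodes turned on": (time(16, 30), time(17, 0)),
    "final reboot": (time(17, 30), time(18, 0)),
}

for label, (start, end) in windows.items():
    start_utc, end_utc = local_window_to_utc(incident_day, start, end, assumed_offset_hours)
    print(f"{label}: {start_utc:%Y-%m-%d %H:%M} to {end_utc:%H:%M} UTC")
```

A fixed offset is used instead of a named timezone to keep the example free of any timezone database dependency. For a region with daylight saving time, a named zone would be the safer choice.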
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
The customer found an error message saying that DCO can't receive data and alarms from the DCE server. Is this issue related to time synchronization? Can our DCE act as a time server, so that all DCO servers point to the DCE for time synchronization? Many thanks.
Best Regards
(CID:105456578)
Posted: 2020-07-02 05:51 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
Good day. Is there any update on this case? The communication lost message still appears very often and I don't know what is causing it. Do I need to provide you with the newest server log? Your help would be appreciated.
Best Regards
(CID:105456703)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li,
We are still investigating the logs to find the reason for the failed cluster shutdown. I hope to have the details very soon. We are also working on describing the correct procedure for shutting down the cluster.
I don't think the communication issues are related to the issues with the cluster. This sounds like a communication issue between the devices and the DCE server. Would it be possible for you to create a new EcoStruxure IT Help Center question on this issue, since it is not directly related to the cluster?
Thanks
Jesper
(CID:105456706)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
I really appreciate your analysis of this issue. Apart from waiting for the new version, is there any action we can take to prevent this issue from happening again? Many thanks.
Best Regards
(CID:105456959)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li Wei,
To prevent this issue in the future, the key thing is to ensure that both nodes in the cluster are running properly before shutting down any of the nodes. So check Webmin before turning off nodes.
Thanks
Jesper
(CID:105456997)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Chu Li Wei,
Under normal circumstances DCO nodes should not crash by themselves, but it is of course always recommended to monitor the operation of the server to ensure that it is running correctly. This is done in Webmin and in the log files. If this happens again, we would like to see the logs so we can analyze why the server crashed in the first place.
Thanks
Jesper
(CID:105457347)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Jesper,
Thank you for the kind explanation. I will keep monitoring it and get back to you if any issue occurs.
Best Regards
(CID:105457396)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:10 AM
Hi Li Wei,
If the customer doesn't have an NTP server, you can enable the DCE as an NTP server and point the NTP setting of both DCO servers to the DCE server.
Thanks,
Daniel
(CID:105456839)
Posted: 2020-07-02 05:52 PM . Last Modified: 2024-04-09 02:09 AM
Hi Jesper,
Yes, I know this, but I mean: is there any other way to prevent the DCO server from crashing by itself?
Best Regards
(CID:105457397)
Posted: 2020-07-02 05:52 PM . Last Modified: 2023-10-31 10:52 PM
This question is closed for comments. You're welcome to start a new topic if you have further comments on this issue.