It is possible, although very rare, for the DBServer process to freeze and stop responding to any client (ViewX shows the server as offline, the server icon on the desktop is grey, and other servers cannot connect to it). In these scenarios it is likely that a database thread has not released a database lock correctly, so any other database thread that requires that lock must wait indefinitely.
ClearSCADA 2015 R2 (v77) includes detection of a deadlocked Main server by the other hot-standby servers, as part of the polling between servers. In v78 a watchdog mechanism that allows detection and automated restarting of the affected server will be available.
Usually you need to kill the DBServer.exe process using Task Manager and manually start the service for the server to return to operation. Killing the DBServer.exe process is a severe action which will lose any configuration changes or value data not yet flushed to disk or to the Standby servers. It is also possible that the database became deadlocked midway through saving the database to disk, in which case the database may also be considered corrupt at startup.
Only 'you' know your system and how it behaves under typical usage; before killing a server which is/was Main it is worth discussing what was happening on the system that could have caused the problem. Standby servers will resync from the Main server when restarted, assuming the Main is operating correctly, so the risk in that case is a lot lower, although the urgency to recover a Standby server is usually lower too.
However, before you kill the process there are some steps to take so that enough information is gathered for technical support to diagnose the problem and provide a resolution.
Gather Database Log Files
As documented in Problem Reporting - Getting the Right Information you need to collect the database log files and snapshot files from the server that is experiencing issues. Ideally these should cover from at least an hour before the server froze to at least five minutes into the freeze. If files covering a longer period either side are available, include them as well, as more information may be better.
Additional logging, although this cannot be enabled if ClearSCADA is currently deadlocked:
'Read/Write Lock Diagnostics' is a ServerStatus and DBSnapshot logging feature that should be enabled (via 'ServerStatus | Database | Read/Write Lock Diagnostics' and right-click the column headings in the main pane). It adds a modest logging overhead and provides summary information about database lock usage.
Lock (LCK) database class logging is extremely useful if it was enabled before the freeze started. However, it is not enabled by default, and when enabled it generates a lot more logging, which reduces the time period covered by the log files. If this problem keeps occurring then on subsequent restarts ensure LCK logging is enabled via the ServerStatus program and, if possible, increase the number of DB log files so that coverage is sufficient.
If the server is in a Standby state, also collect the log files from the Main server for the same time period where possible, so that any information from the Main that could have caused the Standby server to freeze is captured for analysis.
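As a rough illustration of gathering files for the time window described above, the following Python sketch zips every file in a folder whose last-modified time falls between one hour before and five minutes after the freeze. The function name collect_logs, its parameters and the idea of selecting by modification time are assumptions made for this example and are not part of ClearSCADA; the log folder location is deliberately left as a parameter because it depends on the installation. Modification time is only an approximation of what a log file covers, so widen the window if in doubt.

```python
import zipfile
from datetime import datetime, timedelta
from pathlib import Path


def collect_logs(log_dir, freeze_time, out_zip,
                 before=timedelta(hours=1), after=timedelta(minutes=5)):
    """Zip files in log_dir last modified within
    [freeze_time - before, freeze_time + after].

    Hypothetical helper: names and defaults are illustrative only.
    Returns the list of file names that were added to the archive.
    """
    start = (freeze_time - before).timestamp()
    end = (freeze_time + after).timestamp()
    out_path = Path(out_zip).resolve()
    added = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(log_dir).iterdir()):
            # Skip sub-folders and the archive itself if it lives in log_dir.
            if not path.is_file() or path.resolve() == out_path:
                continue
            if start <= path.stat().st_mtime <= end:
                zf.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

The same function can be run on the Main server with the same freeze_time when the frozen server was a Standby, so both sets of files cover the same period.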
Generate a DBServer Minidump
To allow for a better understanding of what the DBServer process was doing at the time of the deadlock, a minidump needs to be generated for analysis. The following section discusses three options, ranging from simple (Task Manager) through to advanced (DebugDiag).
The size of the dump will depend on the tool used and will typically be the size of the virtual bytes of DBServer.exe, depending on the options selected.
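Because a dump can be as large as the virtual bytes of DBServer.exe, it is worth confirming that the target drive has room before starting. The Python sketch below is one way to do that check. The function name has_room_for_dump and the safety_factor margin are assumptions for this example; expected_dump_bytes is a figure you read yourself (for example from Task Manager or Performance Monitor), not something the script discovers.

```python
import shutil


def has_room_for_dump(target_dir, expected_dump_bytes, safety_factor=1.5):
    """Return True if target_dir's drive has enough free space for the dump.

    Hypothetical helper: safety_factor is an arbitrary margin that leaves
    headroom for a compressed copy and for normal server activity.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= expected_dump_bytes * safety_factor
```

For example, has_room_for_dump("D:\\Dumps", 8 * 1024**3) checks for room for an 8 GB dump plus the margin; the folder shown is only a placeholder.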
Using Task Manager
Producing a dump (MDMP or DMP) file can be achieved via the Windows Task Manager or, often more conveniently and successfully, via an external diagnostic tool created for the purpose. A 64-bit Windows operating system has two versions of Windows Task Manager, a 32-bit and a 64-bit version, and the correct one must be used for the target process being dumped.
Using Procdump
Procdump.exe is a command line application from SysInternals (http://technet.microsoft.com/en-us/sysinternals/default) that will dump a process' current details to a .mdmp file. This file can then be provided to technical support to aid in the analysis of the log files. You will also need psexec.exe, also from SysInternals, to be able to generate a minidump file for DBServer.exe.
The following procedure assumes you are comfortable with, and understand the use of, the Windows Command Prompt.
Running this procedure against a healthy process will cause it to lock up while the minidump is generated, which may take a few minutes. Do so at your own risk.
1. Copy procdump.exe and psexec.exe to a suitable location on your server. Ensure sufficient disk space exists to store a file the size of the current virtual memory usage of DBServer. Also install a suitable file compression tool if needed.
2. Open a Command Prompt. If you're running Windows Server 2008 R2 or Windows Server 2012 R2 then run it as administrator by right-clicking it and choosing the 'Run as Administrator' option.
3. Change directories to the location that you copied the procdump and psexec executables to.
4. Using psexec run a Command Prompt as the 'SYSTEM' user. This can be done with the following command: psexec -s cmd
5. If successful, you will likely be changed to the %WINDIR%\System32 folder. Change back to the directory containing the SysInternals executables.
6. Run the following command:
procdump -accepteula dbserver.exe normal.mdmp
This should quickly generate a small minidump file of a few MB in size. The -accepteula switch automatically accepts the EULA dialogue that would normally be shown to the user before the program runs. Because you are running as the SYSTEM account there is nowhere to show that dialogue, so without the switch the process will appear to hang. Refer to the Microsoft website for the EULA for this software, or temporarily run procdump.exe as your own user to have the dialogue shown.
7. Run the following command:
procdump -accepteula -ma dbserver.exe full.mdmp
This will generate a much larger minidump file, essentially the entire memory of DBServer.exe. Depending on how much memory DBServer.exe is using, this may take a long time.
8. Provide the normal.mdmp file, along with the log files, to technical support for analysis.
9. Recover the server to operational usage.
10. Compress the full.mdmp (it can be compressed down to about 5% of size using certain compression methods) and store it in a safe place should it be required for analysis.
Note that some company security policies may require full minidump files to be stored in a secure location, similar to ClearSCADA database backups, because a minidump may contain sensitive information such as IP addresses and user names that could be extracted by someone mining its contents.
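If no compression tool is installed on the server, the compression in step 10 can also be done with a short script. The Python sketch below streams the dump through the standard library's LZMA (.xz) compressor in chunks so that a multi-GB file is never loaded into memory at once. The function name compress_dump and the choice of LZMA are assumptions for this example; the ratio actually achieved depends on the contents of the dump and on the method used.

```python
import lzma
import shutil


def compress_dump(src, dst=None, chunk_size=1024 * 1024):
    """Compress src to an .xz file and return the output path.

    Hypothetical helper: copies in chunk_size pieces so memory use stays
    small regardless of the size of the dump.
    """
    dst = dst or f"{src}.xz"
    with open(src, "rb") as fin, lzma.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=chunk_size)
    return dst
```

For example, compress_dump("full.mdmp") writes full.mdmp.xz alongside the original, which is left in place; delete or move the original only once the compressed copy has been stored securely.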
Using DebugDiag
DebugDiag provides similar functionality to the above tools and includes more information within the minidump file, allowing a more detailed analysis.
Performance and Memory Problems - Using DebugDiag to track Memory Usage
In addition to permission to install the DebugDiag tool, consideration needs to be given to the storage of report files and the very large DMP files output. A DMP file will consist of the Virtual Memory (whole footprint of the process) and the extra information collected, likely to be many GB in size. How will these large DMP files be transferred?
Configuration / Execution
This description covers the approach of creating a rule for the DBServer process, so that the folder location of the reports and dump files can be configured. Using the simple approach of initiating memory analysis from the process list could result in a default location being used and that disk/partition being overrun with DMP file content and affecting the whole host's performance.
Open the collection tool from the Windows Start menu, under All Programs: