Issue
This is a detailed explanation of extended logging gaps, performance and load balancing
Product Line
Andover Continuum
Environment
- Continuum
- Windows XP
- Windows Server 2000
- Windows Server 2003
- Windows Server 2008
Cause
- System overloads
- Slow server and/or server crashes
- Server CPU Usage running at high percentage rates
- Low Physical Memory Available
- Gaps in Extended logging data
Resolution
This article discusses factors that need to be considered when extended logging is being used to log a large number of points. There are too many variables involved in the Extended logging engine to place specifics on the number of extended logs that can be used. Performance will be contingent on many hardware, software and configuration variables. It is best practice to make one configuration change at a time and wait for the results prior to making an additional change.
The SQL server version will affect extended logging performance
There are limitations to different versions of SQL server.
For instance, MSDE has a 2 Gig limit and if this is exceeded, it will cause corruption in the database and the database may have to be rebuilt. SQL Standard and Enterprise versions allow a certain number of processors and RAM to be utilized by SQL. This is a link to Microsoft’s web site that compares the versions.
http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx
PC hardware such as hard drive access times, RAM amount and speed as well and CPU speed and quantity of processors will directly affect extended logging performance. While always at a minimum following the hardware requirements in the Continuum Cyberstation installation manual, there are times when more processing power and memory are needed. Sites that are performing large amounts of extended logging are definite candidates to consider going beyond the recommended hardware requirements.
While 64-bit machines are not supported for running Cyberstation software [before v1.94], they can be used to increase performance on SQL server side. One benefit of using a 64-bit OS and SQL2005/2008 64-bit versions is increased memory capabilities and well as the ability to have several processors. Server hard drives with faster read/write capacities can also be improved. Sites that fall under some of the items discussed below should consider the benefits of faster servers to better handle their Lan site communications and data storage needs.
How many Cyberstations are being used?
As a general rule, we do not recommend using 1 Cyberstation for logging greater than 2000 points. This does not take into consideration the logging interval. 1 Cyberstation as a stand alone could be overwhelmed by 2000 points if the points are configured at small intervals with a large number of entries. In the case of a LAN site with several thousand extended log requests, more than 1 Cyberstation may be needed for load balancing.
What is load balancing?
Load balancing was implemented to divide extended logging work among Cyberstations based on the overall number of log entries per hour. Each Cyberstation will be logging on average approximately the same number of entries per hour, but the number of points logged by each workstation may be different. The algorithm looks at the timestamp of the log and if it’s not logged for a specific Cyberstation within a certain timeframe + the Global interval then it’s offered up to the rest of the Cyberstations (pool) to load balance equally.
How to identify gaps in data:
Run a report for extended log data and you notice that the regular interval for points do not show based on the timestamps.
Run the following SQL query to verify the data in the database
select * from extendedlog order by datetime desc --Selects all logs in descending order
Select top 5000000 * from extendedlog --Selects the top 5 million rows (adjustable)
When gaps are identified, a number of steps may be examined to eliminate the gaps.
- Offline controllers will buffer data as long as they can, but if left offline for long enough then data could be lost as it is first in first out. Normally extended log data will be requested from the controller by Cyberstation once it has come back online and the data will be filled in from the most recent to the oldest data.
- Check the Continuum error logs.
- If there is a large number of errors being processed, this can contribute to a draw on needed resources on the system. Resolve the issues in the Continuum error log to reduce the resource overhead.
- Run Acctrace.exe for a period of time. This can often give some additional information about extended logging that is not reported in the Continuum error logs.
- Determine how many Cyberstations are being used for load balancing for extended logging.
This can be done by verifying in the Cyberstation preferences setting.- ‘Download Extended Log data from controllers on the LAN is set to True
- To verify the number of Cyberstations being utilized in the database
- Run a query against the database.
- Select * from ExtendedLogReservation
- Observe the ReservationWSLo column, points are evenly assigned to a unique id
- Any new Cyber added to the pool of workstations for extended logging should be assigned within a couple minutes.
- Examine the Hard drive I/O subsystem
According to SQL server performance data, it is recommended to have multiple smaller drives on a hard drive array as opposed to fewer larger drives for better access times. - SQL settings
- Are any governor settings enabled? (MSDE has one by default)
- Determine the number of processors that are enabled on the SQL server itself. These need to be selected to be used.
- Is Boost SQL Server priority selected? If not, enable it.
- If a database has moved to another PC or an upgrade was done SP_CHANGEDBOWNER 'andover97' must be run or it can cause db problems including extended logging to no work at all or only partially.
- SQL profiler can be used to examine if there are issues within SQL that could use specific tuning. For example, certain queries or stored procedures could be isolated that have high CPU usage. Help for using SQL Profiler can be found on Microsoft’s web site or from SQL admin literature.
- Truncating tables
Regular maintenance of removing extended logs can increase performance by reducing the number of rows that the need to be read and inserted into the database.- Extended Log Automatic Purge Interval (days)
This is the number of days to keep extended log data in the database before purging it. Extended log data is stored in a single internal table, so purging the data after a specified number of days prevents this table from becoming too large. A value of 0 ensures that the data is never purged. - If performance is so bad that it is affecting normal operations it may be desirable to directly purge records from the database.
(It is best practice to perform a database backup prior to any delete query.)
Also, ensure there is a database backup if the data needs to be saved at various points in time. - This query DELETES all records greater than 3/10/08 14:43:00 from extended log table
delete from extendedlog where datetime >= ‘2008-03-10 14:43:00.000’
- Extended Log Automatic Purge Interval (days)
- Increase the interval that extended logs are retrieved in points with small interval times.
- A listview with the ExtLogEnable and ExtLogInterval columns added can be run for numerics, inputs and outputs to indicate which logs have an interval of zero or a small number for the interval.
- Increase the interval and reduce the amount to reduce the number of requests.
- Increase the general preference Extended Log Database Check Interval
- The interval at which Continuum will check the database for points that are enabled for automatic extended logging. It checks the extended logging table, and uploads the point data. The default is 5 minutes.
- Increase this Database Check Interval if the system is unable to process all extended logs requested in this timeframe.
- Extended Log Maximum LAN Buffer Interval (minutes)
- This is the maximum time interval, in minutes, that CyberStation may check each controller on a LAN network for new extended logging data. This determines how long Cyberstation will allow the controller log to accumulate new data before downloading it. This setting reduces a small amount of overhead when set to a higher value, and allows more up-to-date extended logs when set to a lower value.
- Examples: If this is set to 1 minute and a point is logged every 10 seconds then Cyberstation will wait one minute and download 6 entries for that point. If this is set to 1 minutes and a point is logged every hour, Cyberstation will only try to download the data every hour.
- Increase the Extended Log Database Check Interval and stagger Continuum startup
When it is necessary, enable logging on several Cyberstations for load balancing. If additional Cyberstations have been added and the system is still having trouble keeping up with logging demand, implement the following procedure.- Increase the Continuum General Preference Extended Log Database Check Interval to 60 minutes
- Stagger the start up times of the Cyberstations to 10 minutes apart.
Staggering the startups as well as extending the time should allow the load balancing to level off by enabling the load balancing engine to utilize all of the Cyberstation’s resources and disperse the processing. Adding several Cyberstations at the same time could potentially cause the load balancing engine to get into a race condition where it’s trying to reassign all the points at the same time while still logging data.