Geo SCADA Knowledge Base
Access vast amounts of technical know-how and pro tips from our community of Geo SCADA experts.
Link copied. Please paste this link to share this article on your social media post.
Originally published on Geo SCADA Knowledge Base by Anonymous user | June 09, 2021 05:27 PM
📖 Home Back
This page describes in detail how the Query Processor's join optimisation algorithm works. It is intended to help people construct queries that can be optimised efficiently and won't be rejected. Knowledge of SQL, particularly joins, is assumed. Some knowledge of the mathematical concept of permutations would be helpful.
The two main factors that affect how efficiently a join can be processed are the order of the tables in the join, and how each table is accessed (called its access path). The combination of join order and each table's access path is known as the access plan for the query. A table's access path is determined by its position in the table order, the join type and certain sub-clauses of the WHERE clause that reference the table. Note that a query that only selects from a single table is still classed as a join for the purposes of optimisation, though in this case the order of tables is irrelevant.
The basic algorithm for join optimisation is to try each possible permutation of table order, determine the access path of each table, then calculate an execution cost for that access plan. The access plan with the lowest cost is then used for execution.
There are five possible access paths for a table, which are (in order of preference):
An access path of Scan, which is the default path, is the most expensive as the Query Processor simply iterates around all rows in the table. A Lookup access path is preferable, as the Query Processor can go directly to the required row or rows using an indexed column. If the indexed column is a unique index, such as the Id of an object, the access path is Unique Lookup, otherwise it is a Non-Unique Lookup. If the value of the indexed column is specified in the query using a Constant Expressions, the access path is Lookup By Value, otherwise it is Lookup By Column. An index column has an associated cost which indicates the relative expense of lookup up a row using that column. A column with a lower cost is preferred to a column with a higher cost. For example, is is quicker to look up an object by Id than by FullName.
The following table lists the index columns for the various table types, with the lowest cost columns listed first:
Table Type | Unique Indices | Non-Unique Indices |
---|---|---|
Object | Id, FullName | ParentGroupId, Search Keys |
Aggregate | - | Id |
Alarm Condition | - | Id |
Event Journal | RecordId | Id, Source, FileId |
Point Historic | RecordId | Id |
Processed Historic | RecordId | Id |
Forecast Historic | RecordId | Id |
Data Grid | - | - |
Data Table | - | As configured |
Permits | Id, Name | - |
The access path of a table is determined by the sub-clauses that reference that table. A sub-clause is a simple condition that contains no AND operators, for example:
FullName LIKE '%Pump%'
RecordTime BETWEEN TIMESTAMP '2007-01-01 00:00:00' AND TIMESTAMP '2007-01-01 23:59:59'
Id = 123 OR Id = 456
In addition to an access path, a table also has a filter condition, consisting of any sub-clauses from the clause list that refer to that table and aren't used to determine the access path or as constraint columns. The filter condition is evaluated for each row of the join to determine if that row should be in the resultset. A sub-clause that references more than one table will only be assigned to one of those table's filter conditions, which table that is depends on the table order.
A constant expression is an expression that consists only of literals, parameter markers and constant functions such as CURRENT_TIMESTAMP, for example:
In the various expressions and conditions used throughout this page, Value1, Value2 etc. represent constant expressions.
The algorithm to determine the best access plan consists of the following steps:
The initial list of sub-clauses is built from the join specifiers and the WHERE clause:
Once the initial sub-clause list has been built, sub-clauses that in certain forms are transformed into one or more equivalent sub-clauses that are easier to use for determining access paths. This allows queries to be expressed more precisely, or to convey more semantic information to a human reader, and still be optimised efficiently. The possible transformations are:
If there are multiple sub-clauses of the form Column ''op'' Value, where ''op'' is one of comparison operators <, <=, > or >=, and Value is a literal value (not a parameter), then these are collated into a single < or <= sub-clause and/or a single > or >= sub-clause with the appropriate values.
Each sub-clause is categorised into one of five types, depending on its form and the number of tables referenced.
The permutations of the table order are determined by the number of consecutive initial inner joins. If the query contains an outer join, subsequent tables will not be reordered as outer joins are not commutative. Only the first five inner joins will be reordered (for a maximum of 120 permutations), subsequent tables will not be ordered. It is the responsibility of the person writing the query to ensure the query can be efficiently optimised if it contains outer joins or more than five tables. For example:
For each permutation of table order, the access and link sub-clauses are assigned to tables to determine the access paths, using the following rules:
The cost of an access plan is the product of the costs of each of the tables in the join. The cost of a table is determined by the table type and its access path. The cost of a table with a Unique Lookup access path is always 1. All other cost calculations are approximations that can be performed quickly:
Table Type | Scan Cost | Non-Unique Lookup Cost |
---|---|---|
Object | Object count / 100 | Average no. object per group, no. objects in search map |
Aggregate | Aggregate count / 100 | 1 |
Alarm Condition | Alarmable object count / 33 | 1 |
Event Journal | Total no. records | Average no. records per object |
Point Historic | - | Average no. records per point |
Processed Historic | - | Average no. records per point |
Forecast Historic | - | 100 |
Data Grid | No. rows in grid | - |
Data Table | No. rows in table | No. rows in table |
Permits | No. permits | - |
The chosen final access plan is the plan with the lowest cost. If multiple plans have the same cost, the chosen plan will be the first one that was calculated.
The cost of each access plan and the chosen final access plan can be found in the database log file:
06-DEC-2007 11:49:42.578 0F8C [SCX] 2:12 Prepare( IN: GetColumnEnums false )
06-DEC-2007 11:49:42.578 0F8C [SCX] 2:12 SQL: SELECT H.RecordId, H.RecordTime, H.Value FROM CDBHistoric AS H JOIN CDBPoint As P USING (Id)
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Join Permutation 0: CDBHISTORIC = Scan, CDBPOINT = Unique By Column; Cost = -1
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Join Permutation 1: CDBPOINT = Scan, CDBHISTORIC = Non-Unique By Column; Cost = 18.2
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Final Access Plan: CDBPOINT = Scan, CDBHISTORIC = Non-Unique By Column; Cost = 18.2
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 Prepare( OUT: hr 0 -> Ok., TableId 0x2 )
A cost of -1 indicates that the access was rejected because it contained a scan of a historic table.
Certain clauses that reference a single table may be chosen as constraint sub-clauses. A constraint sub-clause reduces the amount of data searched by an access path of Scan or Non-Unique Lookup. A sub-clause that is used for a constraint will not be used to determine a table's access path or added to a table's filter condition, even if it meets the requirements for an access sub-clause.
A constraint sub-clause must reference a constraint column (which may or may not be an indexed column) and must be in a specific form. The possible constraint columns and their required forms are:
Table Type | Constraint Columns | Required Form |
---|---|---|
Event Journal | Id, Source | Column = Value1 |
Event Journal | Id, Source | Column = Value1 OR Column = Value2 OR ... |
Event Journal | Source | Source LIKE Value1 |
Event Journal | Source | Source LIKE Value1 OR Source = Value2 |
Event Journal, Historic | RecordTime | RecordTime ''op'' Value1 (where ''op'' is any comparison operator other than <>) |
Multiple constraints constraints can be applied to the same column, and a table can have constraints on multiple constraints. Each constraint reduces the amount of data to be searched and can vastly improve the performance of a query.
A database index is an internal set of all the objects or aggregates in the database derived from a particular class. These are used to quickly iterate around all objects of a class, by both the Query Processor and other areas. Having an index for every class would be prohibitively expensive in terms of memory, so only certain strategic base classes have an index. When calculating costs for an object or aggregate table, the nearest base class with an index is used, for example when scanning CPointAlgManual, the index for CDBPoint is used. The classes that have indices are:
Link copied. Please paste this link to share this article on your social media post.
Create your free account or log in to subscribe to the board - and gain access to more than 10,000+ support articles along with insights from experts and peers.