Help
  • Explore Community
  • Get Started
  • Ask the Community
  • How-To & Best Practices
  • Contact Support
Notifications
Login / Register
Community
Community
Notifications
close
  • Forums
  • Knowledge Center
  • Events & Webinars
  • Ideas
  • Blogs
Help
Help
  • Explore Community
  • Get Started
  • Ask the Community
  • How-To & Best Practices
  • Contact Support
Login / Register
Sustainability
Sustainability

Join our "Ask Me About" community webinar on May 20th at 9 AM CET and 5 PM CET to explore cybersecurity and monitoring for Data Center and edge IT. Learn about market trends, cutting-edge technologies, and best practices from industry experts.
Register and secure your Critical IT infrastructure

SQL Join Optimization

Geo SCADA Knowledge Base

Access vast amounts of technical know-how and pro tips from our community of Geo SCADA experts.

cancel
Turn on suggestions
Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Home
  • Schneider Electric Community
  • Knowledge Center
  • Geo SCADA Knowledge Base
  • SQL Join Optimization
Invite a Co-worker
Send a co-worker an invite to the portal.Just enter their email address and we'll connect them to register. After joining, they will belong to the same company.
You have entered an invalid email address. Please re-enter the email address.
This co-worker has already been invited to the Exchange portal. Please invite another co-worker.
Please enter email address
Send Invite Cancel
Invitation Sent
Your invitation was sent.Thanks for sharing Exchange with your co-worker.
Send New Invite Close
Top Labels
Top Labels
  • Alphabetical
  • database 32
  • Web Server and Client 31
  • WebX 19
  • Request Form 18
  • Lists, Events & Alarms 16
  • ViewX 15
  • Application Programming 12
  • Setup 12
  • Telemetry 8
  • Events & Alarms 7
  • Lists 7
  • Mimic Graphics 7
  • Downloads 6
  • Support 5
  • IoT 5
  • SCADA 5
  • Geo SCADA Expert 5
  • Drivers and Communications 4
  • Security 4
  • DNP 3 3
  • IEC 61131-3 Logic 3
  • Trends and Historian 2
  • Virtual ViewX 2
  • Geo Scada 1
  • ClearSCADA 1
  • Templates and Instances 1
  • Releases 1
  • Maps and GIS 1
  • Mobile 1
  • Architectures 1
  • Tools & Resources 1
  • Privacy Policy 1
  • OPC-UA 1
  • Previous
  • 1 of 4
  • Next
Latest Blog Posts
  • OPC UA - Driver and Server
  • Requirements for Generating a Valid OPC UA Server Certificate
  • Load Events Using LoadRecord and LoadRecords
  • Geo SCADA Embedded Component Licenses
  • Geo SCADA 2023 Known Issues
Related Products
product field
Schneider Electric
EcoStruxure™ Geo SCADA Expert

Invite a Colleague

Found this content useful? Share it with a Colleague!

Invite a Colleague Invite
Anonymous user
Not applicable
‎2021-06-09 08:27 AM
0 Likes
0
1166
  • Bookmark
  • Subscribe
  • Email to a Friend
  • Printer Friendly Page
  • Report Inappropriate Content

Link copied. Please paste this link to share this article on your social media post.

‎2021-06-09 08:27 AM

SQL Join Optimization

Originally published on Geo SCADA Knowledge Base by Anonymous user | June 09, 2021 05:27 PM

📖 Home  Back  
This page describes in detail how the Query Processor's join optimisation algorithm works. It is intended to help people construct queries that can be optimised efficiently and won't be rejected. Knowledge of SQL, particularly joins, is assumed. Some knowledge of the mathematical concept of permutations would be helpful.

Overview

The two main factors that affect how efficiently a join can be processed are the order of the tables in the join, and how each table is accessed (called its access path). The combination of join order and each table's access path is known as the access plan for the query. A table's access path is determined by its position in the table order, the join type and certain sub-clauses of the WHERE clause that reference the table. Note that a query that only selects from a single table is still classed as a join for the purposes of optimisation, though in this case the order of tables is irrelevant.
The basic algorithm for join optimisation is to try each possible permutation of table order, determine the access path of each table, then calculate an execution cost for that access plan. The access plan with the lowest cost is then used for execution.

Access Paths

There are five possible access paths for a table, which are (in order of preference):

  • Unique Lookup By Value
  • Unique Lookup By Column
  • Non-Unique Lookup By Value
  • Non-Unique Lookup By Column
  • Scan


An access path of Scan, which is the default path, is the most expensive as the Query Processor simply iterates around all rows in the table. A Lookup access path is preferable, as the Query Processor can go directly to the required row or rows using an indexed column. If the indexed column is a unique index, such as the Id of an object, the access path is Unique Lookup, otherwise it is a Non-Unique Lookup. If the value of the indexed column is specified in the query using a Constant Expressions, the access path is Lookup By Value, otherwise it is Lookup By Column. An index column has an associated cost which indicates the relative expense of lookup up a row using that column. A column with a lower cost is preferred to a column with a higher cost. For example, is is quicker to look up an object by Id than by FullName.

The following table lists the index columns for the various table types, with the lowest cost columns listed first:

  Table Type   Unique Indices   Non-Unique Indices
  Object   Id, FullName   ParentGroupId, Search Keys
  Aggregate   -   Id
  Alarm Condition   -   Id
  Event Journal   RecordId   Id, Source, FileId
  Point Historic   RecordId   Id
  Processed Historic   RecordId   Id
  Forecast Historic   RecordId   Id
  Data Grid   -   -
  Data Table   -   As configured
  Permits   Id, Name   -

 

Sub-Clauses

The access path of a table is determined by the sub-clauses that reference that table. A sub-clause is a simple condition that contains no AND operators, for example:

FullName LIKE '%Pump%'

 

RecordTime BETWEEN TIMESTAMP '2007-01-01 00:00:00' AND TIMESTAMP '2007-01-01 23:59:59'

 

Id = 123 OR Id = 456

 

Filter Conditions

In addition to an access path, a table also has a filter condition, consisting of any sub-clauses from the clause list that refer to that table and aren't used to determine the access path or as constraint columns. The filter condition is evaluated for each row of the join to determine if that row should be in the resultset. A sub-clause that references more than one table will only be assigned to one of those table's filter conditions, which table that is depends on the table order.

Constant Expressions

A constant expression is an expression that consists only of literals, parameter markers and constant functions such as CURRENT_TIMESTAMP, for example:

  • 123
  • 1 + 2
  • TIMESTAMP '2007-01-01 00:00:00'
  • OPC'D'
  • CURRENT_TIMESTAMP
  • ?
  • ? || '%'


In the various expressions and conditions used throughout this page, Value1, Value2 etc. represent constant expressions.

Optimisation Algorithm

The algorithm to determine the best access plan consists of the following steps:

  • Build Sub-Clause List
    • Transform Sub-Clauses
    • Collate Comparisons
  • Determine Sub-Clause Types
  • Calculate Table Order Permutations
    • Determine Access Paths
    • Calculate Cost
  • Determine Final Access Plan

 

Build Sub-Clause List

The initial list of sub-clauses is built from the join specifiers and the WHERE clause:

  • For a join Table A NATURAL JOIN TableB a sub-clause of the form TableA.Column = TableB.Column is added for each column that is present in both TableA and TableB.

 

  • For a join TableA JOIN TableB ON ( Condition ), Condition is split into AND-seperated sub-clauses.

 

  • For a join TableA JOIN TableB USING ( ColumnList ), a sub-clause of the form TableA.Column = TableB.Column is added for each column in ColumnList.

 

  • The WHERE clause is also split into AND-seperated sub-clauses.

 

Transform Sub-Clauses

Once the initial sub-clause list has been built, sub-clauses that in certain forms are transformed into one or more equivalent sub-clauses that are easier to use for determining access paths. This allows queries to be expressed more precisely, or to convey more semantic information to a human reader, and still be optimised efficiently. The possible transformations are:

  • Column BETWEEN Value1 AND Value2 is transformed into Column >= Value1 and Column <= Value2

 

  • Column LIKE 'StringWithNoWildcards' is transformed into Column = 'StringWithNoWildcards'. A similar transformation exists for NOT LIKE.

 

  • Column IN ( Value1, Value2, ... ) is transformed into Column = Value1 OR Column = Value2 ...

 

  • Column = ANY ( Value1, Value2, ... ) is transformed into Column = Value1 OR Column = Value2 OR .... Similar transformations exist for <>, <, <=, > and =.

 

  • Column = ALL ( Value1, Value2, ... ) is transformed into Column = Value1 and Column = Value2 etc. Similar transformations exist for <>, <, <=, > and >=.

 

  • NOT Column = Value1 is transformed into Column <> Value1. Similar transformations exist for <>, <, <=, > and >=.

 

Collate Comparisons

If there are multiple sub-clauses of the form Column ''op'' Value, where ''op'' is one of comparison operators <, <=, > or >=, and Value is a literal value (not a parameter), then these are collated into a single < or <= sub-clause and/or a single > or >= sub-clause with the appropriate values.

Determine Sub-Clause Type

Each sub-clause is categorised into one of five types, depending on its form and the number of tables referenced.

  • A no-dependency sub-clause is a sub-clause that does not reference any table. These are of little use in a realistic query, but are syntactically valid. No-dependency sub-clauses are added to a special filter condition that is evaluated before the join is executed. If this condition evaluates to false, then the join is not executed and an empty resultset is returned.

 

  • A constraint sub-clause is a sub-clause that references a single table and can be used to reduce the amount of rows searched in the table, but does not determine the table's access path. See the main section on constraint sub-clauses for more details.

 

  • An access sub-clause is a sub-clause that refences one or two tables and is in the correct form to determine a table's access path. An access sub-clause that references one table must reference exactly one column, and an access sub-clause that references two table must reference exactly two columns. At least one of the columns reference must be an index column. The possible forms of an access sub-clause are: 
    • TableA.Column = Value1
    • TableA.Column = Value1 OR TableA.Column = Value2 OR ...
    • TableA.Column1 = TableB.Colum2

 

  • A filter sub-clause is a sub-clause that reference a single table and is not suitable as a constraint or access sub-clause. Filter-sub clauses are added to the tables Filter Conditions.

 

  • A link sub-clause is a sub-clause that references two table and is not suitable as an access sub-clause, or that references more than two tables.

 

Calculate Table Order Permutations

The permutations of the table order are determined by the number of consecutive initial inner joins. If the query contains an outer join, subsequent tables will not be reordered as outer joins are not commutative. Only the first five inner joins will be reordered (for a maximum of 120 permutations), subsequent tables will not be ordered. It is the responsibility of the person writing the query to ensure the query can be efficiently optimised if it contains outer joins or more than five tables. For example:

 

  • A JOIN B will result in 2 permutations (A, B; B, A).

 

  • A JOIN B JOIN C will result in 6 permutations (A, B, C; A, C, B; B, A, C; B, C, A; C, A, B; C, B, A).

 

  • A OUTER JOIN B will result in 1 permutations (A, B).

 

  • A JOIN B OUTER JOIN C will result in 2 permutations (A, B, C; B, A, C).

 

  • A JOIN B JOIN C JOIN D JOIN E JOIN F JOIN G will result in 120 permutations (A, B, C, D, E, F, G; A, B, C, E, D, F, G; etc.).

 

Determine Access Paths

For each permutation of table order, the access and link sub-clauses are assigned to tables to determine the access paths, using the following rules:

  • An access sub-clause that references a single table is assigned to that table.

 

  • An access sub-clause that references two tables is assigned to the right-most of those tables in the table order, if the column of that table referenced by the sub-clause is indexed. If the column referenced is not indexed, the sub-clause is added to the table's filter condition.

 

  • Only a single sub-clause can be used to determine a table's access path, so if another access sub-clause can improve the access path (e.g. by changing from Non-Unique Lookup By Value to Unique Lookup By Value), then the sub-clause used for the worse access path is added to the table's filter condition.

 

  • Link sub-clauses are added to the filter condition of the right-most table referenced by the sub-clause.

 

Calculate Cost

The cost of an access plan is the product of the costs of each of the tables in the join. The cost of a table is determined by the table type and its access path. The cost of a table with a Unique Lookup access path is always 1. All other cost calculations are approximations that can be performed quickly:

  Table Type   Scan Cost   Non-Unique Lookup Cost
  Object   Object count / 100   Average no. object per group, no. objects in search map
  Aggregate   Aggregate count / 100   1
  Alarm Condition   Alarmable object count / 33   1
  Event Journal   Total no. records   Average no. records per object
  Point Historic   -   Average no. records per point
  Processed Historic   -   Average no. records per point
  Forecast Historic   -   100
  Data Grid   No. rows in grid   -
  Data Table   No. rows in table   No. rows in table
  Permits   No. permits   -



  • The number of objects or aggregates in a particular table is approximated by using the nearest database index. This number is reduced by a factor of 100 to represent the fact that scanning objects is quicker than scanning the same number of historic records.

 

  • If an Id or Source constraint has been applied to an event journal scan, the cost is the average number of records per object multiple by the number of objects that match the constraint. If the constraint includes LIKEs, the number of objects that match each LIKE is approximated as 1000.

 

  • If a RecordTime constraint has been applied to an event journal, point historic or processed historic lookup, the cost is reduced by a factor of 100.

 

  • It is not possible to do a scan of a historic table i.e. retrieve all historic records for all objects. An access plan that contains a historic scan is rejected outright. If all possible access plans for a query include a historic scan, the query is rejected. Note that it is possible to each a similar effect using a query such as SELECT ... FROM CDBHistoric WHERE Id IN ( SELECT Id FROM CHistory ), though this is not recommended unless you can be certain the database does not contain large amounts of historic data.

 

  • If the cost of a table is calculated as 0 (e.g. there are no objects in that table), that table is not used to calculate the overall cost. This is to prevent an empty table causing an access plan with expensive tables at the beginning of the table order being chosen.

 

  • If there are no potential access sub-clauses that reference two tables, or there is only one permutation, then the join order specified in the the query is used. In this case, access and link sub-clauses are still assigned to tables to determine access paths, but the cost calculation is skipped.

 

Determine Final Access Plan

The chosen final access plan is the plan with the lowest cost. If multiple plans have the same cost, the chosen plan will be the first one that was calculated.
The cost of each access plan and the chosen final access plan can be found in the database log file:

06-DEC-2007 11:49:42.578 0F8C [SCX] 2:12 Prepare( IN: GetColumnEnums false )
06-DEC-2007 11:49:42.578 0F8C [SCX] 2:12 SQL: SELECT H.RecordId, H.RecordTime, H.Value FROM CDBHistoric AS H JOIN CDBPoint As P USING (Id)
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Join Permutation 0: CDBHISTORIC = Scan, CDBPOINT = Unique By Column; Cost = -1
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Join Permutation 1: CDBPOINT = Scan, CDBHISTORIC = Non-Unique By Column; Cost = 18.2
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 (PHIL::CQPAccess) Final Access Plan: CDBPOINT = Scan, CDBHISTORIC = Non-Unique By Column; Cost = 18.2
06-DEC-2007 11:49:42.593 0F8C [SCX] 2:12 Prepare( OUT: hr 0 -> Ok., TableId 0x2 )


A cost of -1 indicates that the access was rejected because it contained a scan of a historic table.

Constraint Sub-Clauses

Certain clauses that reference a single table may be chosen as constraint sub-clauses. A constraint sub-clause reduces the amount of data searched by an access path of Scan or Non-Unique Lookup. A sub-clause that is used for a constraint will not be used to determine a table's access path or added to a table's filter condition, even if it meets the requirements for an access sub-clause.
A constraint sub-clause must reference a constraint column (which may or may not be an indexed column) and must be in a specific form. The possible constraint columns and their required forms are:

  Table Type   Constraint Columns   Required Form
  Event Journal   Id, Source   Column = Value1
  Event Journal   Id, Source   Column = Value1 OR Column = Value2 OR ...
  Event Journal   Source   Source LIKE Value1
  Event Journal   Source   Source LIKE Value1 OR Source = Value2
  Event Journal, Historic   RecordTime   RecordTime ''op'' Value1 (where ''op'' is any comparison operator other than <>)


Multiple constraints constraints can be applied to the same column, and a table can have constraints on multiple constraints. Each constraint reduces the amount of data to be searched and can vastly improve the performance of a query.

Database Indices

A database index is an internal set of all the objects or aggregates in the database derived from a particular class. These are used to quickly iterate around all objects of a class, by both the Query Processor and other areas. Having an index for every class would be prohibitively expensive in terms of memory, so only certain strategic base classes have an index. When calculating costs for an object or aggregate table, the nearest base class with an index is used, for example when scanning CPointAlgManual, the index for CDBPoint is used. The classes that have indices are:

  • CAggregate
  • CAlarmObject
  • CDBObject
  • CDBPoint
  • CDBPointSource
  • CDBPulse
  • CDBServer
  • CDBUser
  • CFileObject
  • CGroup
  • CHisFiles
  • CHistory
  • CParam
  • CResettingAccumulator
  • CTemplate
  • CTemplateInstance
  • CVariable


Go: Home Back

Labels:
  • database

  • Events & Alarms

  • Lists

Author

Biography

Anonymous user

Link copied. Please paste this link to share this article on your social media post.

  • Back to Blog
  • Newer Article
  • Older Article
To The Top!

Forums

  • APC UPS Data Center Backup Solutions
  • EcoStruxure IT
  • EcoStruxure Geo SCADA Expert
  • Metering & Power Quality
  • Schneider Electric Wiser

Knowledge Center

Events & webinars

Ideas

Blogs

Get Started

  • Ask the Community
  • Community Guidelines
  • Community User Guide
  • How-To & Best Practice
  • Experts Leaderboard
  • Contact Support
Brand-Logo
Subscribing is a smart move!
You can subscribe to this board after you log in or create your free account.
Forum-Icon

Create your free account or log in to subscribe to the board - and gain access to more than 10,000+ support articles along with insights from experts and peers.

Register today for FREE

Register Now

Already have an account? Login

Terms & Conditions Privacy Notice Change your Cookie Settings © 2025 Schneider Electric

This is a heading

With achievable small steps, users progress and continually feel satisfaction in task accomplishment.

Usetiful Onboarding Checklist remembers the progress of every user, allowing them to take bite-sized journeys and continue where they left.

of