EcoStruxure Geo SCADA Expert Forum
Schneider Electric support forum about installation, configuration, integration and troubleshooting of EcoStruxure Geo SCADA Expert (ClearSCADA, ViewX, WebX).
Posted: 2019-10-25 02:41 PM . Last Modified: 2023-05-03 12:39 AM
>>Message imported from previous forum - Category:ClearSCADA Software<<
User: tfranklin, originally posted: 2018-10-23 22:45:24 Id:182
Quick question regarding history....
I'm running some tests on queries against history on a point, and I noticed that putting **ORDER BY** on a query makes it take quite a bit longer with large amounts of data. This makes sense, as ClearSCADA has to load all history for the point and then sort it.
**Example:**
SELECT TOP(100) ID,RECORDTIME,VALUEASREAL FROM CDBHISTORIC WHERE ID = 100 ORDER BY RECORDTIME ASC
The query above will take much longer than the query below when a large amount of data is returned:
SELECT TOP(100) ID,RECORDTIME,VALUEASREAL FROM CDBHISTORIC WHERE ID = 100
So my question -- if I omit the ORDER BY from the query, does it already bring back the records sorted in DESC order? Let's assume that I run a historical optimization nightly to ensure records are sorted properly.
Posted: 2019-10-25 02:42 PM . Last Modified: 2023-04-25 10:20 PM
>>Responses imported from previous forum
Reply From User: adamwoodland, posted: 2018-10-23 23:47:51
In theory, yes. It currently should return the data in the order it is stored in the file on disk, and the optimisation should have put the records in order (ignoring any new data).
It is probably considered bad practice though; if you need the data in a certain order you should explicitly state that order, as the implied order could change in the future.
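For example, to get the most recent 100 records with the order stated explicitly (using the same illustrative point ID as the original query):
SELECT TOP(100) ID,RECORDTIME,VALUEASREAL FROM CDBHISTORIC WHERE ID = 100 ORDER BY RECORDTIME DESC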
Reply From User: adamwoodland, posted: 2018-10-23 23:49:39
Also, how many records a minute do you have? The soft limit specified in ClearSCADA is one value per point every 15 seconds, and one of the reasons for it is to limit performance issues like the one you're seeing.
If you are storing data faster than once every 15 seconds, is there an option to log less frequently, either with a significant-change deadband or with historic compression?
Reply From User: tfranklin, posted: 2018-10-24 13:23:56
Yeah, all of the testing I'm doing is based on a worst-case, unrealistic scenario that gets asked about frequently. I used the scxcmd HISGEN command to generate data every 5 s for a few hundred points over a year. I ended up killing the process a good way in due to HDD limitations.
I'm querying against 1 point that has a large amount of data on it -- 12.5 million records to be exact (more than 1 year of data).
Using TOP(50000) + ORDER BY, the query takes ~1.5 minutes to run.
Using TOP(50000) without ORDER BY, the query takes 16 seconds to run.
What I'm really trying to find out is this: say there are 5 years of data at 15 s intervals. That's ~2.1 million records a year, or 10.5 million records over 5 years.
What's the most efficient way of querying those 10.5 million records without causing a large negative impact? What I've seen is that long-running queries will actually lock ClearSCADA completely and cause all drivers to become unresponsive.
Reply From User: geoffpatton, posted: 2018-10-24 13:38:16
Realistically, what would you be doing with the data? Using a historic view will speed things up, because an internal process will crunch the numbers (the Avg, Min, Max, or whatever) and the SQL layer has to handle fewer results.
Reply From User: tfranklin, posted: 2018-10-24 13:41:52
Moving raw history from server A to server B where IDs don't match.
For example, company A buys company B. Company A wants to bring B's history into its own system so it doesn't have to keep track of or maintain company B's system.
Reply From User: geoffpatton, posted: 2018-10-24 13:52:31
Then I would make the time frame for the queries smaller and export/import in small chunks.
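As a rough sketch, each chunk would then be a time-bounded query like the one below, repeated with the window advanced until the full range is covered. The point ID and month-long window are illustrative, and the {TS '...'} ODBC timestamp escape is assumed to be accepted here:
SELECT ID,RECORDTIME,VALUEASREAL FROM CDBHISTORIC WHERE ID = 100 AND RECORDTIME >= {TS '2018-01-01 00:00:00'} AND RECORDTIME < {TS '2018-02-01 00:00:00'} ORDER BY RECORDTIME ASC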
Reply From User: du5tin, posted: 2018-10-24 14:06:25
We have done this history transfer using a C# tool running outside of ClearSCADA. The app hooks into the old system with ODBC and into the new system with the Automation interface. It is a slow process, but much faster than moving the history by hand. The tool accepts a source ID, a destination ID, and a date-time range. And yes... performance is impacted enough that users notice. But if you do the transfer in smaller chunks during low-use periods, you should be able to get it done without too much pain.
I would ask why they want that back history. Is there actually any value there? Could they simply archive the old system to a virtual machine somewhere or back it up to tape for that one time they think they might need the pressure from Site XYZ two years from now?
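For anyone building a similar tool, the source-side read can be paged with a time cursor instead of one huge query. A sketch, where the page size, point ID, and resume timestamp are all illustrative and the {TS '...'} ODBC escape is again assumed:
SELECT TOP(10000) ID,RECORDTIME,VALUEASREAL FROM CDBHISTORIC WHERE ID = 100 AND RECORDTIME > {TS '2018-01-14 07:32:15'} ORDER BY RECORDTIME ASC
Run the first page from the start of the range (with >=), then reissue with the last RECORDTIME returned until fewer than 10,000 rows come back. Note that the strict > would skip any additional record sharing the exact boundary timestamp.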
Reply From User: emFLSPIE, posted: 2019-03-18 22:10:27
I am trying to query historic data (a few hours), and my timestamps are approximately two minutes apart. Where can I verify the rate at which the data is being saved, and increase it for more resolution?
Reply From User: adamwoodland, posted: 2019-03-19 04:23:37
The process and queries at https://community.se.com/t5/Geo-SCADA-Knowledge-Base/Firewall-Configuration-to-Allow-Client-Server-C... can probably help; just restrict them to the points you're interested in.