EcoStruxure Geo SCADA Expert Forum
Schneider Electric support forum about installation, configuration, integration and troubleshooting of EcoStruxure Geo SCADA Expert (ClearSCADA, ViewX, WebX).
Posted: 2024-11-23 05:18 AM
Hi all,
I'm working on a project which involves storing, gathering and querying data from a few sources (both within GeoSCADA and also external).
I'm at the tinkering stage and tossing up whether to use Data Tables, or to use an external SQL database (such as SQL Server Express) for the data storage.
Has anyone been down this road before? Any lessons learnt?
Is there a 'best practice' solution?
Seems to me, if I go with Data Tables the issue will be accessing and storing data from outside GeoSCADA.
If I go with an external DB, the issue will be accessing/storing data from inside GeoSCADA. The only way I understand to do this is using ODBC linked tables, and I'm not sure how reliable this is.
Any advice much appreciated.
Posted: 2024-11-25 07:25 AM
There's a lot of "it depends" when you start to design stuff like this, and there are some questions you'll need to consider.
In general, my recommendations are always that if you're going to keep it in a data table in Geo SCADA you want to ensure the table never exceeds 100MB. Going beyond that could cause issues with server synchronization. Typically we store as much in the GS data tables as we can until we see that growth might become an issue. We also have some utilities that are explicitly written to store data outside of GS when we know that size will become an issue. Avoid using Linked ODBC tables in logic. It's a death-wish.
I'm sure somebody will disagree with something above, but in general the answer is always that it depends. I'd consider the size and the rate at which you write to the tables to make a decision. If it's infrequent and < 100MB of data you're probably fine leaving it in a data table within GS.
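On the size point, if it helps, this is the sort of quick check we run to keep an eye on a table's growth before it gets anywhere near that 100MB mark. It's only a rough sketch in Python over the ClearSCADA ODBC driver, and the DSN, credentials and table name are all made up, so adapt it to your own system:

```python
# Rough sketch: count rows in a Geo SCADA data table over the ClearSCADA ODBC driver.
# The DSN, credentials and table name are placeholders.
import pyodbc

# "GeoSCADA" is a hypothetical DSN configured against the ClearSCADA ODBC driver.
conn = pyodbc.connect("DSN=GeoSCADA;UID=SomeUser;PWD=SomePassword", timeout=10)
cur = conn.cursor()

# "MYDATATABLE" stands in for whatever SQL name you gave the data table object;
# counting client-side avoids relying on aggregate support in the server's SQL dialect.
cur.execute("SELECT Id FROM MYDATATABLE")
row_count = len(cur.fetchall())
print(f"MYDATATABLE currently holds {row_count} rows")

conn.close()
```

Multiply the row count by a rough average row size and you get an early warning well before synchronization becomes a worry.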
Posted: 2024-11-27 03:58 PM
Thank you @tfranklin for this detailed answer. That's great information.
I've decided to go with Data Tables, and I have some logic which exports data and also trims the table to prevent it getting too large. I looked at ways to dynamically create new data tables (say every month or year) but that got too hard.
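In case it's useful to anyone else, the export-and-trim step is roughly the shape of the sketch below. I've written it here in Python over ODBC rather than pasting the actual logic program, and the DSN, table and column names are all placeholders:

```python
# Rough sketch of an export-then-trim pass; every name here is a placeholder.
import datetime
import pyodbc

CUTOFF_DAYS = 90
cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=CUTOFF_DAYS)
# Build a literal timestamp, since parameter markers aren't guaranteed by every ODBC driver.
cutoff_str = cutoff.strftime("%Y-%m-%d %H:%M:%S")
# { TS '...' } is the ODBC timestamp escape; "LogTime" is an assumed user-defined column.
where = f"WHERE LogTime < {{ TS '{cutoff_str}' }}"

# "GeoSCADA" DSN -> ClearSCADA ODBC driver; "Archive" DSN -> the external SQL Server.
geo = pyodbc.connect("DSN=GeoSCADA;UID=SomeUser;PWD=SomePassword",
                     timeout=10, autocommit=True)
archive = pyodbc.connect("DSN=Archive", timeout=10)

# 1. Copy rows older than the cutoff out of the data table.
rows = geo.cursor().execute(f"SELECT LogTime, Value FROM MYDATATABLE {where}").fetchall()

arc_cur = archive.cursor()
for log_time, value in rows:
    arc_cur.execute("INSERT INTO dbo.ArchiveTable (LogTime, Value) VALUES (?, ?)",
                    log_time, value)
archive.commit()

# 2. Only after the copy has committed, trim the data table
#    (data tables are the writable part of the Geo SCADA schema, so DELETE is allowed).
geo.cursor().execute(f"DELETE FROM MYDATATABLE {where}")

geo.close()
archive.close()
```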
I found this statement funny and intriguing:
Avoid using Linked ODBC tables in logic. It's a death-wish.
This was exactly my original plan 😁 Any particular reason why it's a big no-no?
Does this mean that CRUD operations on external DBs from Logic are not possible, and are only possible from client script using linked ODBC?
Thanks again!
Posted: 2024-11-27 08:02 PM
I don't disagree with any of the above really, I'll just chip in some other things to consider:
1. Who in the business owns the data? Quite often with non-TSV style data, even though it's used in SCADA, it isn't the team who looks after the SCADA that owns it, so why should they be the people who hold it and take responsibility for it, as well as maintaining who has permissions to it?
2. Do you have, or will you have, multiple Geo SCADA systems that need the same data? Having a single source of truth for the data would be better than duplication, and so that might drive where it's best to store it.
And yes, linked tables and logic are a bad combination due to the expectations logic has when it reads and writes the variables.
Posted: 2024-12-02 04:08 PM. Last Modified: 2024-12-02 04:18 PM
Any particular reason why it's a big no-no? Does this mean that CRUD operations on external DBs from Logic are not possible, and are only possible from client script using linked ODBC?
Basically, the way logic is designed to run is very quick, short executions: think reading inputs, running and writing outputs in sub-millisecond territory, and whilst there are some queues used it is a fairly linear process. This allows lots and lots and lots of logic to run in a single thread whilst maintaining an integrity that multi-threading just doesn't support (at least 25 years ago when it was first written*). There's some in-depth info I wrote many years ago with the then development manager of ClearSCADA at https://community.se.com/t5/Geo-SCADA-Knowledge-Base/Logic-Execution/ba-p/279061 which is still valid today.
Now the problem is that logic (structured text) was really the only server-side programming language we had, so people started using it for more than what it was originally designed for, and to allow that the developers added the ability to use SQL (%S and %D), which was frankly one of the most amazing additions to logic ever. But the problem then became that inputs (and outputs) changed from sub-millisecond to millisecond-plus, severely impacting scaling (think templates) and the speed of what logic can do. If you're querying the real-time, memory-resident tables and making the query efficient it works, but when you start querying the disk-based tables (CDBHistoric, CDBEventJournal) things start to cause some serious issues.
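To make that distinction concrete, these are the two categories of query I mean. I've run them over ODBC from a client here purely for illustration (in logic they'd go through %S/%D), and the DSN, point Id and column names are from memory, so check them against the schema on your own system:

```python
# Illustration only: a memory-resident query versus a disk-based one.
# The DSN, point Id and column names are assumptions; verify against your schema.
import pyodbc

conn = pyodbc.connect("DSN=GeoSCADA;UID=SomeUser;PWD=SomePassword", timeout=10)
cur = conn.cursor()

# Memory-resident: CDBPoint lives in memory, so a well-filtered query is cheap.
cur.execute("SELECT Id, FullName FROM CDBPoint WHERE FullName LIKE 'Site1.%'")
points = cur.fetchall()

# Disk-based: CDBHistoric is read from disk, so the same pattern gets expensive,
# especially as the time range (and therefore the number of rows) grows.
cur.execute("SELECT RecordTime, ValueAsReal FROM CDBHistoric "
            "WHERE Id = 1234 AND RecordTime >= { TS '2024-11-01 00:00:00' }")
history = cur.fetchall()

conn.close()
```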
And this is where linked tables come in. First off, the query goes outside DBServer, so there is a speed impact there; then you're querying a third-party database which may or may not be fast; and because logic has to wait for the results to come back it effectively stalls, and that means a lock on the database for an extended time which may or may not be able to be released temporarily. The NOCACHE keyword came around the same time period but for a different purpose (accumulation processing of batched data, because the queuing of logic inputs was causing a problem with the maths), but it turns out the move in the logic process from an early write lock to a late read lock also helps with the SQL side of things.
Compare that to scripting, which is client-side, where the database locks can be released easily and the impact of any stalls is much, much less.
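To put that another way: if you have to talk to an external database, do it from a client-side script or an external application, where a slow or missing server only stalls that script, and put timeouts on everything so it can't hang forever. A very rough Python sketch, with the DSN and table names made up:

```python
# Rough sketch: talk to the external database client-side, never from logic.
# The DSN, credentials and table name below are placeholders.
import pyodbc

try:
    # Login timeout of 5 seconds so a dead server fails fast instead of hanging.
    conn = pyodbc.connect("DSN=ExternalDB;UID=SomeUser;PWD=SomePassword", timeout=5)
    conn.timeout = 10  # per-query timeout in seconds, where the driver supports it

    cur = conn.cursor()
    cur.execute("SELECT TOP 10 LogTime, Value FROM dbo.SomeTable ORDER BY LogTime DESC")
    for log_time, value in cur.fetchall():
        print(log_time, value)
    conn.close()
except pyodbc.Error as exc:
    # A failure here only affects this script; the Geo SCADA server carries on untouched.
    print(f"External database unavailable: {exc}")
```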
And then to compound the problem: I used a couple of logic programs and linked tables about 15 years ago against Oracle, and as the logic ran once every 10 minutes and I made sure the query only took a millisecond or two, it wasn't identified as a problem and no one had been bitten, yet. Until one day the Oracle server disappeared, and when the query was issued, instead of the Oracle components (eventually) coming back saying it was offline, which would still have been an issue because of the timeout delay, the API call never replied to DBServer at all! It was a bug in the Oracle client software, but basically ClearSCADA sat there for 20 minutes with a lock on the database that couldn't be released, and so the system locked up. After 20 minutes the next logic program could run, and again 20 minutes of database lockup, after which the other program was scheduled to run... and you get the point. Oracle came back online, the problem went away and the system recovered.
The customer was quite nice about it really. At the time I felt very guilty when I ran through the logic process in my head and realised the problem, and I still do a little to this day. I should have thought it through more and identified the problems then, as I had the information, rather than just trying to solve a customer's problem.
So that is one of the issues that triggered the advice not to use logic and linked tables. Logic isn't really purposed for "slow" reads/writes whatever the source is, and you're at the mercy of third-party components impacting the availability of the system.
With the addition of Python into the server things will hopefully be improved, we'll see!
(*) When SCX6 (ClearSCADA) was originally written and I first got my hands on it in 2002 there was a Registry setting to enable multi-threaded logic execution. It appeared to work but was removed from the product pretty quickly, not sure why...
Posted: 2024-12-04 04:49 AM
Thanks Adam!! That insight was both useful and really interesting.
ps: don't feel guilty about the Oracle oversight. I think we can all fess up and say we've done something similar (or much worse 🙃).