Geo SCADA Knowledge Base
Originally published on Geo SCADA Knowledge Base by Anonymous user | June 10, 2021 04:11 AM
The Geo SCADA Expert help on recommended server configurations advises that you calculate the disk space you require based on the size of the records, and then multiply by a factor of two. This page explains the reasoning behind that advice.
NTFS clusters, and the inefficiency of storing data smaller than the cluster size, mean that more disk space is used than the raw record sizes alone would suggest. Applying a factor is a simple way to allow for that inefficiency.
By default the NTFS cluster size is 4 KB; the cluster is the unit in which the file system allocates storage. A cluster holds data from only one file, and the whole cluster is allocated to that file. This means that writing a single 32-byte record (the size of one historic record) actually consumes 4 KB of disk space.
You can test this on your own machine: create a .txt file containing just "A" and save it. The file properties in Explorer show both Size and Size on disk. Size on disk will be your cluster size, and as the file grows to many MB, Size on disk will always be a multiple of the cluster size. Save "AAAAAA" to the file and the file size increases, but Size on disk, and therefore the actual disk space used, is unchanged: the file keeps using its allocated cluster until it outgrows it, at which point a new cluster is allocated.
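If you prefer to script that check, here is a minimal Python sketch (Windows only; it uses the standard ctypes and os modules and the Win32 GetDiskFreeSpaceW call, and the C:\Temp path is just a placeholder). It reads the volume's cluster size and rounds the file's logical size up to whole clusters, which is how Size on disk is reported for an ordinary uncompressed file; note that very small files can be stored resident in the MFT, in which case Explorer may show 0 bytes on disk.

```python
import ctypes
import os

def cluster_size(volume_root: str) -> int:
    """Return the allocation unit (cluster) size in bytes for a volume such as 'C:\\'."""
    sectors_per_cluster = ctypes.c_ulong(0)
    bytes_per_sector = ctypes.c_ulong(0)
    free_clusters = ctypes.c_ulong(0)
    total_clusters = ctypes.c_ulong(0)
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(volume_root),
        ctypes.byref(sectors_per_cluster),
        ctypes.byref(bytes_per_sector),
        ctypes.byref(free_clusters),
        ctypes.byref(total_clusters),
    )
    if not ok:
        raise ctypes.WinError()
    return sectors_per_cluster.value * bytes_per_sector.value

def size_on_disk(path: str, cluster: int) -> int:
    """Logical size rounded up to whole clusters -- what Explorer reports as
    'Size on disk' for an ordinary uncompressed, non-sparse file."""
    logical = os.path.getsize(path)
    clusters = -(-logical // cluster)  # ceiling division
    return clusters * cluster

if __name__ == "__main__":
    volume = "C:\\"                           # assumption: the volume you want to check
    test_file = r"C:\Temp\cluster_test.txt"   # hypothetical test file location
    with open(test_file, "w") as f:
        f.write("A")                          # a 1-byte file...
    cluster = cluster_size(volume)
    print(f"Cluster size : {cluster} bytes")
    print(f"Logical size : {os.path.getsize(test_file)} bytes")
    print(f"Size on disk : {size_on_disk(test_file, cluster)} bytes")  # ...still occupies a whole cluster
```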
This leads to the second question: why double it (taking all of the historians into account)? Predicting actual disk usage from 32 bytes per record and whatever cluster size you use isn't straightforward. Since you rarely have a single record per file per week, and usually many more, each file spans multiple clusters and the overhead tends to work itself out (in practice it came to roughly 1.5 times the calculated size, though it depends on how much data per minute you store; the guide of 4 values per minute per point is not related to this behaviour and exists for other reasons). The factor of two is a guide from the product team, and every customer's system will be different.
Consider the default cluster size of 4 KB and historic values at 32 bytes per record.
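A minimal Python sketch of that arithmetic (the record counts are illustrative only; 40,320 corresponds to one point logging the guideline 4 values per minute for a week, i.e. 4 x 60 x 24 x 7) compares the raw record size with the space NTFS actually allocates in whole clusters:

```python
RECORD_SIZE = 32   # bytes per historic record
CLUSTER = 4096     # default NTFS cluster size in bytes

def allocated(records: int) -> int:
    """Space NTFS allocates for the file: the raw size rounded up to whole clusters."""
    raw = records * RECORD_SIZE
    clusters = max(1, -(-raw // CLUSTER))  # ceiling division, at least one cluster
    return clusters * CLUSTER

for records in (1, 10, 128, 1000, 10000, 40320):
    raw = records * RECORD_SIZE
    used = allocated(records)
    print(f"{records:>6} records/week: raw {raw:>9,} B  on disk {used:>9,} B  "
          f"efficiency {raw / used:6.1%}")
```

A single record occupies a whole 4 KB cluster (under 1% efficiency), whereas a file holding tens of thousands of records wastes at most part of its final cluster.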
So you can see that with more data, as the number of clusters allocated to the file increases, the efficiency (disk usage calculated from the raw record sizes versus the actual disk usage by NTFS) improves with each roll-over. The answer is not simply to reduce the cluster size: that introduces performance issues of its own and has to be balanced.
Regarding file fragmentation: the use of clusters, combined with the way data is streamed into a Geo SCADA system, can compound it. On an SSD this makes little difference, because random read I/O performs roughly on par with sequential reads. On an HDD, however, the cluster reserved for a point's weekly file fills slowly with that point's data. Once the cluster is full, a new cluster is allocated on disk, and because it takes time to fill a cluster on any system (especially at 32 bytes per historic record) the new cluster is very unlikely to sit next to the original one. Defragmentation essentially gathers those clusters and places them together on disk, significantly improving sequential read performance but making no difference to the cluster storage efficiency.
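To see why the clusters end up scattered, here is a deliberately simplified toy model in Python (this is not NTFS's real allocation algorithm, just an illustration of the effect): several weekly files receive interleaved 32-byte records, and the "disk" hands out the next free cluster to whichever file needs one.

```python
RECORDS_PER_CLUSTER = 4096 // 32   # 128 records fill one 4 KB cluster

def simulate(num_files: int = 8, records_per_file: int = 500):
    """Hand out clusters in the order they are requested while records for several
    weekly files arrive interleaved, and track which clusters each file receives."""
    next_free_cluster = 0
    clusters = {f: [] for f in range(num_files)}
    written = {f: 0 for f in range(num_files)}
    for _ in range(records_per_file):
        for f in range(num_files):                     # one record to each file in turn
            if written[f] % RECORDS_PER_CLUSTER == 0:  # current cluster full (or first write)
                clusters[f].append(next_free_cluster)
                next_free_cluster += 1
            written[f] += 1
    return clusters

for f, owned in simulate().items():
    print(f"file {f}: clusters {owned}")
# Each file's clusters end up 8 apart (one per file in the rotation),
# so none of them are contiguous on the simulated disk.
```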