
Corrupt Historic File Alarm

EcoStruxure Geo SCADA Expert Forum

Find out how SCADA systems and networks, like EcoStruxure Geo SCADA Expert, help industrial organizations maintain efficiency and process data for smarter decision making with IoT, RTU and PLC devices.

geoffpatton
Commander

Corrupt Historic File Alarm

One of my customers has a corrupt historic file alarm. I'll look into what's going on tomorrow, but I thought I'd see if anyone here had any advice on dealing with this. They have a hot standby pair and a permanent standby.

10 Replies
AdamWoodland
Commander

Re: Corrupt Historic File Alarm

The first question is really what the corruption is:

a. NTFS file system corruption
b. File write corruption
c. Application level corruption

For a., the solution can often be chkdsk /R, but the content of the file may also be corrupt. The standby may still have all the data, because data is streamed to the standby separately from whatever goes to disk, although a resync may have caused problems there. Note that chkdsk /R needs a reboot, so by the time the server comes back up the other server should be the Main, and it should recover that way too.

 

For b., this is possible with anti-virus, backup and similar tools, although it is very rare. Check that the size of the HRD file (the file size, not the size on disk) is a multiple of 32 bytes. If the file has been corrupted this way, it might not be.
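The size check can be scripted rather than done file by file. This is only a minimal Python sketch: the 32-byte record size comes from the post above, while the function name and the `*.HRD` search pattern are my own assumptions, and the folder to scan is whatever your history location is.

```python
from pathlib import Path

RECORD_SIZE = 32  # bytes per record, as stated in the post above


def find_odd_sized_files(folder, pattern="*.HRD", record_size=RECORD_SIZE):
    """Return (path, size) for files whose size is not a multiple of record_size.

    The "*.HRD" pattern is an assumption; adjust it to match your files.
    """
    odd = []
    for path in sorted(Path(folder).rglob(pattern)):
        size = path.stat().st_size  # logical file size, not size on disk
        if size % record_size != 0:
            odd.append((path, size))
    return odd
```

Calling `find_odd_sized_files(folder)` on a copy of the history folder returns an empty list if every file is a whole number of 32-byte records. A non-empty result only flags candidates for a closer look; it does not prove what caused the problem.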

 

For c., I've seen zeros written as the 32 bytes, although that was a long, long time ago. scx_cmd hisdump might show something, or you may need to break out a hex editor and check each 32 bytes for anything odd.
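If a hex editor feels like too much manual work, the "all zeros" case can be looked for with a short script. This is a sketch under assumptions: it only knows that records are 32 bytes (from the post above) and knows nothing about the record layout, so it can flag all-zero records but cannot judge anything else as "odd". The function name is hypothetical.

```python
RECORD_SIZE = 32  # bytes per record, as stated in the post above


def find_zero_records(path, record_size=RECORD_SIZE):
    """Return the indexes of records that consist entirely of zero bytes."""
    zero_record = bytes(record_size)
    hits = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(record_size)
            if not chunk:
                break
            # A short final chunk means the file size is not a multiple of
            # record_size; it is skipped here rather than compared.
            if len(chunk) == record_size and chunk == zero_record:
                hits.append(index)
            index += 1
    return hits
```

Run it against a copy of the suspect file rather than the live one. Multiplying an index by 32 gives the byte offset to jump to in a hex editor.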

 

For b. and c., support can also take a look, although they will probably need the DB logs covering the record being written too, which may be asking a lot depending on how far back the problem happened.

geoffpatton
Commander

Re: Corrupt Historic File Alarm

@AdamWoodland your comment about DB logs made me go ahead and take a quick look at when they got the alarm. It was about 12 hours ago, so I checked the Primary's DB log. The DB log only had the same description as the alarm in it. The good news is that just one file is corrupt, and that file is on the one running standby.

BevanWeiss
Spock

Re: Corrupt Historic File Alarm

I've never seen this caused by an actual NTFS disk failure.

 

The issues that I've seen cause this have been:

  1. No disk space in my Test VM (normally the GeoSCADA DB crashes not long after)
  2. Anti-virus (Sophos / Norton) configured to real-time scan the directories in question (this normally affects a single file, sometimes a couple of files, but it happens sporadically and normally at weird times of the day when you can't actively check it)
  3. Backup software which failed to apply a Shadow Copy backup and just did a bulk lock on the disk instead (this generally affects many more files than just a single one)

Lead Control Systems Engineer for Alliance Automation (VIC).
All opinions are my own and do not represent the opinions or policies of my employer, or of my cat..
geoffpatton
Commander

Re: Corrupt Historic File Alarm

ChkDsk said everything was fine and the file size was the same as on the running Primary, so it might have already resynced it.

I still deleted it though and restarted the Standby, so I know it was replaced by syncing from the Primary again.

 

They are running Sophos so that is likely the cause of the problem. First time in like 7 years to have a corrupt historic file alarm though.

 

Thanks for the tips guys. It was likely Sophos interfering, but I did not see any settings that would let me tell Sophos not to scan the history files. I let the server admin know, since I cannot play in his software anyway. I don't think he will change anything unless it happens again, since it has not happened in the past 7 or so years.

BevanWeiss
Spock

Re: Corrupt Historic File Alarm

I feel that anti-virus software is largely just a random number generator to determine which files it's going to mess with.

 

I consistently have issues with non-Microsoft anti-virus and ClearSCADA/GeoSCADA.

If it's not actual historic file corruption warnings (which are typically not corrupt files at all, just files locked for longer than ClearSCADA likes), then it's horrible performance because of scanning of ViewX cache files or log files.

 

On different software I've even had anti-virus cause crashes of PLC programming software. That's one of those annoying 'tech support' calls, where they say "Please sir to be turning off your anti-virus" and I say "Fine... but I really don't think it's the antivirus, it's clearly a bug in your software"... and of course the crashes immediately stopped when I disabled the anti-virus.

I still say it was a bug in their software... but also.. bloody anti-virus.


AdamWoodland
Commander

Re: Corrupt Historic File Alarm

The best way I've found to say with 99% confidence that "it is anti-virus", without running active things like procmon, is to look in the config or data folder. If anti-virus is interfering with file writes and file locks, then you'll see temp files persist there.

 

When the database saves, it first writes to temp files (the slow bit of the process), then deletes the old file version and renames the temp file. This causes problems with AV when exclusions aren't set, and given how many times this happens each minute, the AV usually catches something eventually. Normally you may see a temp file, but it goes within a couple of seconds, depending on how Windows Explorer is updating the file list.
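To make the save sequence concrete, here is a rough Python sketch of the same write-temp, delete-old, rename pattern. It is an illustration only, not the database's actual code; the `save_via_temp` name and the `.tmp` suffix are hypothetical.

```python
import os


def save_via_temp(path, data):
    """Write data to a temp file, then swap it in place of the old file."""
    tmp_path = path + ".tmp"  # hypothetical temp-file naming

    # Step 1: write the new content to the temp file (the slow part).
    with open(tmp_path, "wb") as f:
        f.write(data)

    # Step 2: delete the old version. On Windows this can raise
    # PermissionError if another process (e.g. a scanner) has it open.
    if os.path.exists(path):
        os.remove(path)

    # Step 3: rename the temp file to the real name.
    os.rename(tmp_path, path)
```

If step 2 or step 3 fails because something else is holding a file open, the temp file from step 1 is left behind, which matches the persistent temp files described above.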

 

The other 1% is when the database crashes mid database save, but then you'll see lots of files with the same timestamp rather than random files with random timestamps.
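The folder check described above can be sketched in Python. Everything specific here is an assumption on my part: the `*.tmp` pattern (I have not confirmed how the temp files are actually named), the 60-second age threshold, and the 5-second window used to decide whether the leftovers share one timestamp.

```python
import time
from pathlib import Path


def stale_temp_files(folder, pattern="*.tmp", min_age_seconds=60, now=None):
    """Return (path, mtime) for temp files older than min_age_seconds, oldest first.

    The "*.tmp" pattern is an assumption; adjust it to the real temp-file names.
    """
    now = time.time() if now is None else now
    stale = []
    for path in Path(folder).rglob(pattern):
        mtime = path.stat().st_mtime
        if now - mtime >= min_age_seconds:
            stale.append((path, mtime))
    return sorted(stale, key=lambda item: item[1])


def looks_like_single_event(stale, window_seconds=5):
    """True if two or more leftover files were all modified within one short window."""
    if len(stale) < 2:
        return False
    times = [mtime for _, mtime in stale]
    return max(times) - min(times) <= window_seconds
```

Leftover files clustered in one short window point more towards a crash mid-save, while leftovers scattered across random times point more towards something interfering with file locks. Treat the result as a hint to investigate, not a diagnosis.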

 

Modern AV solutions like Cylance and Crowdstrike bring a whole new way of working, and the nuances of their effect on the database are yet to be seen.

BevanWeiss
Spock

Re: Corrupt Historic File Alarm

First time I've heard of Cylance and Crowdstrike.  I'm not sure the IT world needs to have another software set with non-deterministic behaviour 😉  We already have Windows...

 

I can see it being a good replacement for 'the dog ate my homework'.. 'the anti-virus AI must have gotten confused and thought my thesis was a virus... crazy AI you know...'


AdamWoodland
Commander

Re: Corrupt Historic File Alarm

In theory it should be better. You can still set exceptions, but if an application has written similar data to the same folder 5 billion times, it should hopefully have learnt pretty quickly that this is 'normal' behaviour and let it continue.

BevanWeiss
Spock

Re: Corrupt Historic File Alarm

That's the problem I see though... the AI learning part, if it is not controlled.

 

For example alarms... for 10 years there might be a value of '0'... and then one day the alarm occurs, and it's a '1'.

Is the 'smart' AI going to consider that abnormal and prevent the writing?


AdamWoodland
Commander

Re: Corrupt Historic File Alarm

It should be behaviour based, so in theory it should be irrelevant what data you write; it is the why rather than the what. I'm not saying there won't be a problem, as it is still relatively new technology, but as I understand it that scenario shouldn't be one of the problems.

 

COM in scripting (e.g. Scripting.FileSystemObject and WScript.Shell) and changes to the DLLs used and referenced due to upgrades are areas with a higher likelihood of triggering something, as they would be a deviation from what could be normal/learnt behaviour.

 

Cylance, for example, is usually set to block PowerShell scripts, so anything like SYSTEM() calls might also be affected.