File Integrity with Checksums

This section describes how file integrity can be improved by using checksums to detect compromised files.

Each time a file is checked out, FCS computes a checksum for the file and compares it with the checksum that was recorded when the file was last checked in.

How FCS Uses Checksums

You can improve your system's file integrity by turning on the use of checksums. When a file is checked out, FCS generates a checksum for the file at runtime and compares it with the checksum sent in the checkout ticket for that file. If the checksums differ, the file has been modified or corrupted since it was last checked in. The file checkout fails, and FCS returns an error. The user should alert a system administrator, who can then resolve the problem.

Note that the data inconsistency is detected when a user retrieves the file at checkout, not when the data is actually corrupted.

The following steps describe how the checksum integrity check process works.


  1. When a file is checked in, FCS computes a checksum for it. The checksum is sent back to the MCS as part of the checkin receipt and stored for future reference.
  2. When the file is checked out, the MCS sends the recorded checksum as part of the checkout ticket.
  3. When FCS receives the ticket, it computes a checksum of the target file and compares it with the recorded checksum from the MCS. If the two checksums differ, the file has been modified since its last checkin; FCS returns an error and the checkout fails.
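
The three steps above can be sketched in a few lines of Python. This is an illustration only, not FCS or MCS code: the class names FileStore and MetadataServer are hypothetical stand-ins, and SHA-256 is an assumption, because this section does not name the checksum algorithm.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the recorded and runtime checksums differ."""


def compute_checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the documentation does not name the algorithm.
    return hashlib.sha256(data).hexdigest()


class FileStore:
    """Hypothetical stand-in for FCS: holds file bytes and computes checksums."""

    def __init__(self):
        self.files = {}

    def checkin(self, name: str, data: bytes) -> str:
        self.files[name] = data
        # Step 1: the checksum is returned as part of the checkin receipt.
        return compute_checksum(data)

    def checkout(self, name: str, recorded: str) -> bytes:
        data = self.files[name]
        runtime = compute_checksum(data)
        # Step 3: compare the runtime checksum with the one from the ticket.
        if runtime != recorded:
            raise ChecksumMismatch(
                f"db checksum is {recorded}, runtime checksum is {runtime}, "
                f"for file {name}"
            )
        return data


class MetadataServer:
    """Hypothetical stand-in for the MCS: records checksums from receipts."""

    def __init__(self, store: FileStore):
        self.store = store
        self.checksums = {}

    def checkin(self, name: str, data: bytes) -> None:
        self.checksums[name] = self.store.checkin(name, data)

    def checkout(self, name: str) -> bytes:
        # Step 2: the recorded checksum travels with the checkout ticket.
        return self.store.checkout(name, self.checksums[name])
```

In this sketch, a checkout succeeds as long as the stored bytes are unchanged. If the bytes held by FileStore are altered after checkin, the next checkout raises ChecksumMismatch.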

This figure shows where the checksums are generated, sent, and stored during the checkin process.



This figure shows where the checksums are generated and sent during the checkout process.



During file synchronization or a file copy, FCS automatically copies the checksums for files.

You can generate checksums for migrated data without having to check in the individual files. See Data Migration.

Checksum Activation

To activate or deactivate the usage of checksums, use the following MQL commands:

add store STORE_NAME checksum [ on | off ];

modify store STORE_NAME checksum [ on | off ]; 

When checksums are activated, FCS computes a checksum and sends it to the MCS when a file is checked in, and performs the checksum integrity check when a file is checked out. By default, checksums are off.

When checksums are activated, you can also set the checkout process to warn the user instead of failing with an error. In that case, if the checksum integrity check detects a corrupted file, the file checkout succeeds, the user is warned, and the invalid checksum is logged. To turn this behavior on or off, use:

add store STORE_NAME checksumwarnonly [ on | off ];

modify store STORE_NAME checksumwarnonly [ on | off ];

By default, checksumwarnonly is off.
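
The difference between the two modes can be illustrated with a minimal Python sketch. The function name checkout_with_policy, the logger name, and the use of SHA-256 are assumptions made for illustration; they are not part of FCS.

```python
import hashlib
import logging

logger = logging.getLogger("fcs.checksum")  # hypothetical logger name


def checkout_with_policy(data: bytes, recorded: str, warn_only: bool = False):
    """Return (data, warning); raise on a mismatch unless warn_only is set."""
    runtime = hashlib.sha256(data).hexdigest()  # algorithm is an assumption
    if runtime == recorded:
        return data, None
    message = f"db checksum is {recorded}, runtime checksum is {runtime}"
    if warn_only:
        # checksumwarnonly on: log the invalid checksum and still deliver the file.
        logger.warning(message)
        return data, message
    # checksumwarnonly off (the default): the checkout fails with an error.
    raise IOError(message)
```

With warn_only set, the caller still receives the file contents together with a warning message that it can show to the user.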

You can show the current value of the checksum options with the following MQL commands:

print store STORE_NAME checksum;

print store STORE_NAME checksumwarnonly; 

Corruption Detection

If a data inconsistency is detected during checkout, the system throws an FcsException, displays an error message to the user, and the checkout fails. The user should alert the system administrator.

throw new FcsException("HttpOutputHandler: File Checksum Error - db checksum is "+checksum+", runtime checksum is "+rtChecksum+", for file "+hashname);
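
An administrator who collects these messages from logs may want to extract the two checksums and the file name. The following Python sketch parses the message text shown above. It assumes that the checksum and file name values contain no spaces; the function name parse_checksum_error is hypothetical.

```python
import re

# The pattern mirrors the message text of the FcsException shown above.
# Assumption: the checksum and file name values contain no whitespace.
CHECKSUM_ERROR = re.compile(
    r"File Checksum Error - db checksum is (?P<db>\S+?), "
    r"runtime checksum is (?P<runtime>\S+?), for file (?P<file>\S+)"
)


def parse_checksum_error(line: str):
    """Return a dict with keys db, runtime, and file, or None if no match."""
    match = CHECKSUM_ERROR.search(line)
    return match.groupdict() if match else None
```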

Data Correction

The system administrator should correct the problem manually by checking in a non-corrupted copy of the file. This makes all older copies obsolete, whether corrupted or not, and starts again from the newly checked-in copy.

Data Migration

Normally, a checksum is created for a file when it is checked in. For migrated data that doesn't go through the checkin process, you can create checksums with MQL commands. Note that this migration method assumes the files are not already corrupted. You can run the validate command first to help detect any problems before creating the checksums. See Validation.

To calculate the checksum for all files owned by a business object, use:

rechecksum businessobject TNR;

To calculate checksums for a list of business objects, committing a transaction for every N business objects (the default is 10), use:

rechecksum businessobjectlist QUERY [commit N] [continue];

The continue option lets processing carry on with the next transaction when an error occurs; by default, the command quits.

To calculate checksums for all the files in a store, committing a transaction for every N files (the default is 10), use:

rechecksum store STORE_NAME [commit N] [continue]; 

The continue option lets processing carry on with the next transaction when an error occurs; by default, the command quits.
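
The interaction of commit N and continue can be pictured with the following Python sketch. It is a simulation, not MQL: the function process_in_batches is hypothetical, and discarding the whole batch when one item fails is an assumption about how a failed transaction behaves.

```python
def process_in_batches(items, handle, commit_every=10, continue_on_error=False):
    """Apply handle to items, one simulated transaction per commit_every items.

    Returns (committed, failed): results from committed batches, and a list of
    (batch, error) pairs for batches that failed.
    Assumption: a failing item discards its whole batch.
    """
    committed, failed = [], []
    for start in range(0, len(items), commit_every):
        batch = items[start:start + commit_every]
        try:
            results = [handle(item) for item in batch]
        except Exception as error:
            failed.append((batch, error))
            if not continue_on_error:
                break  # default behavior: quit at the first failing transaction
            continue  # continue option: carry on with the next transaction
        committed.extend(results)  # commit only after the whole batch succeeds
    return committed, failed
```

Under these assumptions, a smaller N limits how much work is lost when one file fails, at the cost of more transactions.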

A checksum will be calculated for a file only if it has not been previously calculated. The checksum will be based on an up-to-date copy of the file at an arbitrary location. The checksum value will then be propagated to all non-obsolete copies of the file.
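
The rules in the preceding paragraph can be sketched as follows. The FileCopy class and the rechecksum function are hypothetical Python stand-ins, SHA-256 is an assumed algorithm, and picking the first up-to-date copy is one arbitrary choice among the non-obsolete copies.

```python
import hashlib


class FileCopy:
    """Hypothetical record for one copy of a file at a store or location."""

    def __init__(self, location, data, obsolete=False, checksum=None):
        self.location = location
        self.data = data
        self.obsolete = obsolete
        self.checksum = checksum


def rechecksum(copies):
    """Fill in a missing checksum; return the value used, or None if unchanged."""
    current = [c for c in copies if not c.obsolete]
    if not current or any(c.checksum is not None for c in current):
        return None  # already calculated, or no up-to-date copy: leave as is
    # Any up-to-date copy will do; this sketch takes the first one.
    value = hashlib.sha256(current[0].data).hexdigest()  # algorithm assumed
    for copy in current:
        copy.checksum = value  # propagate to every non-obsolete copy
    return value
```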

The keyword rechecksum can be included in the MQL commands modify businessobject and modify businessobjectlist to recompute a checksum without checking the file in again.

Print Checksum

To show the current checksum value of a file, use:

print businessobject TNR select format.file.checksum;

Inventory Store

You can add the checksum value field to the inventory result. This is optional.

inventory store STORE_NAME [fcsdbchecksum];

Validation

To calculate the current checksum values of the files in a business object and compare them with the recorded values, use:

validate businessobject TNR fcsdbchecksum; 

To validate the checksum for a list of business objects, use:

validate businessobjectlist QUERY fcsdbchecksum;  

Note that the commands for validating checksums are expensive operations. For each file to be validated, a compute checksum request is sent to the corresponding FCS and the file is scanned to compute the current checksum.

Also, the validate checksum commands are read-only operations. If a new checksum is different, it will be reported to the output file, but not stored.
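
The read-only nature of validation can be illustrated with a short Python sketch. The function validate_checksums and its dictionary inputs are hypothetical, and SHA-256 is an assumed algorithm. The point is that mismatches are reported while the recorded values stay untouched.

```python
import hashlib


def validate_checksums(files, recorded):
    """Compare current checksums with recorded ones; return mismatch report lines.

    files maps a file name to its current bytes; recorded maps a file name to
    its recorded checksum. Neither mapping is modified.
    """
    report = []
    for name, data in sorted(files.items()):
        current = hashlib.sha256(data).hexdigest()  # algorithm is an assumption
        if recorded.get(name) != current:
            report.append(
                f"{name}: recorded {recorded.get(name)}, current {current}"
            )
    return report
```

Every file is read in full to compute its current checksum, which is why validation is expensive for large sets of files.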

Performance

The checksum is computed on the data as it is streamed, which slightly increases FCS checkin and checkout times. The increase is a reasonable tradeoff for the improved data integrity.
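
To show why a streamed computation adds little overhead, the following Python sketch hashes each chunk while it is being copied, so the file is read only once. The function copy_with_checksum, the chunk size, and SHA-256 are assumptions for illustration; this is not how FCS is necessarily implemented.

```python
import hashlib
import io


def copy_with_checksum(source, target, chunk_size=64 * 1024):
    """Copy source to target in chunks; return the checksum of the bytes copied."""
    digest = hashlib.sha256()  # algorithm is an assumption
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash the chunk while it is already in memory
        target.write(chunk)
    return digest.hexdigest()


# Example: io.BytesIO objects stand in for real file or network streams.
source, target = io.BytesIO(b"example payload"), io.BytesIO()
checksum = copy_with_checksum(source, target)
```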

Using the rechecksum command during migration can take a long time. For large stores and locations, running rechecksum store may be impractical. In that case, you can either use rechecksum buslist to migrate only active objects, or skip rechecksum entirely. If you skip it, only files that are newly checked in (with the checksum option on) will have their checksums verified on checkout.