by Tim Heagarty
October 16, 2007
[Ed. note: The author is a highly experienced computer and internet security expert, certified as an Information Systems Security Professional (CISSP) and a Certified Information Systems Auditor (CISA). In addition, the author is a Microsoft Certified Systems Engineer (MCSE).]
Introduction
Integrity refers to the wholeness or completeness of a person, an object, or in our case, information and information processing technology. Integrity by its presence also provides a level of assurance. You know that you can rely on a person of integrity or on information from a known, trusted source.
Obviously when integrity is lost, the individual or data can no longer be relied upon. They are not trustworthy and must be assumed to be suspect until integrity can once again be obtained and proven. The best decision support tools in the world become useless if the incoming data is not trustworthy. The data may as well not be there at all or just be random. Of course, the worst-case scenario is when information has been changed without intent or authorization and the alteration has not been detected at all. The possibly corrupt information may then be used for its intended purpose with possibly disastrous consequences.
Users of Web 2.0 services are familiar with these problems. The advent of Web 2.0 social networking properties that attract millions of users has also attracted a large contingent of malicious hackers, who take advantage of software that is quickly thrown together, without adequate focus on security vulnerabilities. This exposes the users to situations where their carefully designed online presence can be tampered with, and all of their hard work compromised. In some cases, it becomes impossible for a user to reassert control over their own online identity on a site. Tiny "customer service" organizations at start-ups typically don't have time to respond to hundreds of thousands of "I can't log into my account anymore" complaints.
Loss of the integrity of information directly impacts our previous topic of confidentiality. If, for example, the database of network user IDs has been compromised, then the authentication and authorization of credentials is suspect and cannot be relied upon. The network has to be closed to everyone until the authentication system can be restored (obviously a loss of availability, as well). Can you imagine AOL staying in business very long if all the member IDs were hacked? You couldn't be sure that someone logging in was the true holder of that ID and whether you should provide them your services or not.
Let's take a closer look at how data integrity is defined by various users and creators of information systems and how we can lose, regain, and maintain the integrity of our information and information systems.
Definition of Integrity
The US Federal Information Processing Standards (FIPS) Publication 199 defines integrity as:
"Guarding against improper information modification or destruction, and includes ensuring information non-repudiation and authenticity" [44 U.S.C., Sec. 3542]
A loss of integrity is the unauthorized modification or destruction of information.
Integrity of information and information processing technology is considered to be one of the modern pillars of information security. Other prime points of IT security are the confidentiality of information, which we looked at in the last article, and availability, which we will cover in a future article.
Integrity, of course, is not a new concept developed for information security. Integrity is traditionally used when referring to an individual's character or the way that they behave in certain situations. A person’s behavior is typically based on a set of principles and may be morally, legally, or ethically based. If all people followed a moral set of principles and never veered from them, we might not need a police force. But that’s not the real world for human conduct; nor is it realistic for information systems. There are simply too many ways, accidental and intentional, that we can compromise the integrity of information.
Integrity, for the purpose of this article, describes the requirement of maintaining accurate information and data. Maintaining integrity involves checking information in a number of ways and protecting it from intentional or accidental modification. We'll take a look at some of the methods used to protect data integrity, including hashing and encryption of data files and streams.
Intentional modification comes in the form of hacking just for the fun of it, malicious tampering, and even total destruction of information. The unauthorized changes may be for political gain, monetary reward, or may even be part of an information infrastructure attack by one nation on another.
Accidental modification of information includes everything from disk crashes and hardware faults such as memory errors to just plain bad programming that miscomputes results and then stores them as correct data. Noise on a communications circuit can certainly change the information flowing over the line. It is our job as IT professionals (or just plain good programmers) to be sure that all of our information stays correct and reliable.
When you write a new application or create a mashup of existing services, you imply that the information you provide will be accurate and have integrity. If you don't employ safeguards to ensure the integrity of the information that you provide, then you have violated the trust created between you and your client that we spoke of in the last article. It is even better if you give your client a way to confirm the integrity or correctness of the information or service that you provide.
Integrity also comes in different degrees of importance or value. The FIPS standard mentioned above separates the degrees of potential impact into high, moderate (medium), and low, based on the amount of damage that would be caused to an organization's operations and assets, and to individuals, if the data is corrupted.
News web sites like CNN.com and USATODAY.com regularly publish online opinion polls. The publishers, being journalists, always state that the polls are unscientific and may not represent an accurate cross-section of the audience. In other words, the poll data has little to no integrity, as the results can be easily manipulated. The value of the integrity in this case is low, because hopefully no one is going to make an important decision based on the results of a web poll. A medium value of integrity could be assigned to the content of the stories. Factual, accurate reporting is very important today, but with so many news sources it is much easier now to check facts and determine if a story is true. A high integrity value must be placed on information such as election results and medical records. If information about a patient's medical conditions or allergies is accidentally changed, the patient may be exposed to an incorrect drug. If election results are intentionally or accidentally tampered with, the fate of one or more nations may be changed forever.
Methods of Maintaining Integrity
Let's examine a few basic methods that can be used to maintain the integrity of our information.
Identification, Authentication, and Authorization
If you're paying attention, and I know that you are, you'll recognize these same points from the previous article on confidentiality. Not only do we care about who is receiving the information to maintain confidentiality, but we must know who we're dealing with to maintain integrity. We can't let just anybody have access (especially modify access) to our information. There's enough accidental damage going on that we don't want to add the threat of intentional damage by some unknown person as well.
Remember that we must identify the requestor by a credential. In our world, the credential is the AOL screen name. AOL goes to great lengths to guarantee to us that the screen names are unique and can be relied upon to represent a single entity. We determine the authenticity of the credential as soon as we present it to AOL and receive word of its validity. After we get the credential, we have to confirm that the individual is authorized to make the edit or change that they are asking to make. This one task will remediate the vast majority of accidental modification risk.
Confirming that a party has the authority to perform a particular edit goes a long way to confirming that they will do it correctly. In the case of AIM, the authenticating factor is something that the user knows: the password. Multi-factor authentication traditionally combines this with other factors, such as something that a person has. An Automated Teller Machine card (something they have) used together with a Personal Identification Number, or PIN (something they know), is the classic example.
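The separation between authentication and authorization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the User class, the "profile:edit" permission string, and the edit_profile function are hypothetical names invented for the example, and the authenticated flag stands in for whatever answer your identity provider actually returns.

```python
from dataclasses import dataclass, field


class NotAuthorized(Exception):
    """Raised when a user may not perform the requested action."""


@dataclass
class User:
    screen_name: str                 # the credential that identifies the user
    authenticated: bool = False      # assumed to be set from the identity provider's reply
    permissions: set = field(default_factory=set)  # hypothetical permission strings


def require_permission(user: User, action: str) -> None:
    """Refuse the action unless the user is both authenticated and authorized."""
    if not user.authenticated:
        raise NotAuthorized(f"{user.screen_name} has not been authenticated")
    if action not in user.permissions:
        raise NotAuthorized(f"{user.screen_name} may not perform '{action}'")


def edit_profile(user: User, profile: dict, new_bio: str) -> dict:
    """Return an edited copy of the profile, but only for an authorized user."""
    require_permission(user, "profile:edit")
    updated = dict(profile)  # work on a copy so the caller's dict is not mutated
    updated["bio"] = new_bio
    return updated
```

The point of the design is that the check runs before any data is touched, so an unauthorized request never reaches the code that modifies information.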
Data Storage Integrity
Most likely, when you gather information to present to a client in a mashup or other offering, you will be storing that information in a database of some sort. Whether you use a commercial database like Microsoft's SQL Server or IBM's DB2, or an open source product such as MySQL, or just a spreadsheet or flatfile, the data must still be accurate. There are a number of ways to assure integrity of this information.
A database may be corrupted at any one of a small number of points in the life of the data. Data entry, modification, and deletion are the main touch points to database information. Static storage, when the data simply sits in the file, is the state in which the data exists for the majority of its lifecycle, and it can still be corrupted at this time by hardware crashes and similar failures.
Data entry is when the information is first entered into the system. If you put restrictions on what the client can do while inputting information, you can reduce the number of opportunities for someone to accidentally or intentionally inject incorrect information into your service. If you're asking for a date of birth, then perform sanity checks on the value entered to make sure they're not zero years old or over 120 or so. Make sure that the date is really a valid date using a standard date-checking routine. Never ever just accept what the client types into your input fields. This is where SQL injection attacks and buffer overflows come from.
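As a rough sketch of the date-of-birth checks described above, the Python below parses the input with the standard library, rejects impossible or implausible dates, and stores the result with a parameterized query so that the typed text is never spliced into the SQL. The members table, the YYYY-MM-DD input format, and the function names are assumptions made for the example, not part of any particular service.

```python
import sqlite3
from datetime import date, datetime
from typing import Optional

MAX_AGE_YEARS = 120  # upper bound suggested in the text; tune for your service


def parse_date_of_birth(raw: str, today: Optional[date] = None) -> date:
    """Return a validated date of birth, or raise ValueError."""
    today = today or date.today()
    # strptime raises ValueError for impossible dates such as 2007-02-30.
    dob = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    if dob > today:
        raise ValueError("date of birth is in the future")
    # Completed years, subtracting one if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age < 1:
        raise ValueError("age must be at least one year")
    if age > MAX_AGE_YEARS:
        raise ValueError("age exceeds the plausible maximum")
    return dob


def store_member(conn: sqlite3.Connection, name: str, raw_dob: str,
                 today: Optional[date] = None) -> None:
    """Validate the input, then insert it using bound parameters."""
    dob = parse_date_of_birth(raw_dob, today)
    with conn:  # commit on success, roll back on error
        # The "?" placeholders keep user input out of the SQL text itself.
        conn.execute(
            "INSERT INTO members (name, dob) VALUES (?, ?)",
            (name, dob.isoformat()),
        )
```

Validation and parameter binding address different risks: the first keeps nonsensical values out of the database, and the second keeps hostile input from being interpreted as SQL.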
Data modification may, of course, happen multiple times over the lifecycle of the information. The key here is to assure that the person performing the modification is authorized to do so. Again, make sure that the change makes sense in the context of the information. Don't allow edits to change information so that it is outside your initial input restrictions. You should also audit all changes to your data. Keep a transaction record either in your database or in a separate log system so that you can look back after the fact and determine just what happened and who made it happen. Maintaining accurate and complete logs is one of the keys to the new Payment Card Industry (PCI) Data Security Standard (DSS) (www.pcisecuritystandards.org). If your solution involves working with debit or credit card data in any way, be sure that you understand the PCI DSS and what level of responsibility you have based on your card volume and other factors that you can read more about at the above site.
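One way to keep the kind of transaction record described above is to write the change and its audit entry in the same database transaction, so that neither can exist without the other. The sketch below uses Python's built-in sqlite3 module; the profiles and audit_log tables, their columns, and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the example tables (hypothetical schema) and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles (user_id TEXT PRIMARY KEY, bio TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " changed_at TEXT NOT NULL, changed_by TEXT NOT NULL,"
        " user_id TEXT NOT NULL, old_bio TEXT, new_bio TEXT)"
    )
    return conn


def update_bio(conn: sqlite3.Connection, changed_by: str,
               user_id: str, new_bio: str) -> None:
    """Apply the edit and record who changed what, as a single transaction."""
    with conn:  # both writes are committed together, or rolled back together
        row = conn.execute(
            "SELECT bio FROM profiles WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            raise KeyError(user_id)
        conn.execute(
            "UPDATE profiles SET bio = ? WHERE user_id = ?", (new_bio, user_id)
        )
        conn.execute(
            "INSERT INTO audit_log (changed_at, changed_by, user_id, old_bio, new_bio)"
            " VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), changed_by, user_id, row[0], new_bio),
        )
```

Recording the old value alongside the new one is what lets you reconstruct, after the fact, exactly what happened and who made it happen.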
When allowing a client to delete data, be sure that there is no way that they can perform mass deletion intentionally or accidentally by manipulating your input fields or even accidentally getting a "%" or "*" character into the input stream. Use the tried and true "Are you sure?" message boxes to give a person the opportunity to back out of a potentially damaging transaction. Of course, only administrators should have access to be able to delete information permanently.
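A minimal sketch of a guarded delete follows, assuming a hypothetical comments table keyed by an integer id. It insists on an explicit confirmation flag (the programmatic equivalent of the "Are you sure?" box), converts the identifier to an integer so that "%" or "*" can never reach the query, matches with "=" rather than a pattern, and rolls back if more than one row would be removed.

```python
import sqlite3


class DeletionNotConfirmed(Exception):
    """Raised when a delete is requested without explicit confirmation."""


def delete_comment(conn: sqlite3.Connection, owner: str,
                   comment_id: str, confirmed: bool = False) -> int:
    """Delete at most one comment owned by `owner`; return the rows removed."""
    if not confirmed:
        raise DeletionNotConfirmed("ask the user 'Are you sure?' first")
    # int() raises ValueError for '%', '*', or any other non-numeric input.
    numeric_id = int(comment_id)
    with conn:  # an exception inside this block rolls the delete back
        cursor = conn.execute(
            "DELETE FROM comments WHERE id = ? AND owner = ?",  # exact match, no LIKE
            (numeric_id, owner),
        )
        if cursor.rowcount > 1:
            raise RuntimeError("refusing to delete more than one row")
    return cursor.rowcount
```

Restricting the statement to an exact match on a typed key means a stray wildcard character is rejected as bad input instead of being interpreted as "everything".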
The risk of modification of data at rest can be mitigated by sound backup and restore procedures to recover data affected by a hardware failure or other malfunction. You can also reduce the risk of losing data to a disk failure by using RAID (Redundant Array of Inexpensive Disks) technology in your storage system. If one of the volumes in a redundant RAID set becomes damaged, it can be replaced and the data can be rebuilt, slowly, by the system itself. Be certain of where your information is stored and that you have procedures in place that will allow you to recover the information in a timeframe reasonable in comparison to how long it took to create the information in the first place.
Data Transmission Integrity
Information on the wire (or fiber these days) is susceptible to modification by static interference, equipment failures, line drops, or even interception and tampering. There are a few ways you can assure the integrity of your information including hashing and encryption (or a combination of both).
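Hashing is the process of computing a fixed-length value based on the content of the information in the stream. This value is tacked onto the data before it is transmitted. The receiving end of the conversation, your client, uses the same algorithm to perform the calculation again and compares the result to the value that was sent. If the two match, then the data has not been accidentally modified during the transmission process. Hash algorithms such as MD5 are freely available and can be easily added to your client, although MD5 is no longer considered collision-resistant and a newer algorithm such as SHA-256 is the safer choice. Keep in mind that a plain hash only guards against accidental changes, since anyone who tampers with the data in transit can also recompute the hash; detecting deliberate tampering requires a keyed hash or encryption.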
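The append-and-recompute scheme described above might look like the following in Python, using only the standard hashlib and hmac modules. Two choices here are assumptions of this sketch and not something the article prescribes: it uses SHA-256 in place of MD5, and it adds a keyed variant (HMAC) that presumes sender and receiver already share a secret key. The function names are invented for the example.

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def attach_digest(payload: bytes) -> bytes:
    """Sender side: tack the SHA-256 digest onto the end of the data."""
    return payload + hashlib.sha256(payload).digest()


def verify_digest(message: bytes) -> bytes:
    """Receiver side: recompute the digest and compare; return the payload."""
    if len(message) < DIGEST_SIZE:
        raise ValueError("message is too short to contain a digest")
    payload, received = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    if not hmac.compare_digest(hashlib.sha256(payload).digest(), received):
        raise ValueError("payload does not match its digest")
    return payload


def attach_mac(payload: bytes, key: bytes) -> bytes:
    """Keyed variant: only a holder of `key` can produce a matching tag."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify_mac(message: bytes, key: bytes) -> bytes:
    """Recompute the keyed tag with the shared key and compare."""
    if len(message) < DIGEST_SIZE:
        raise ValueError("message is too short to contain a tag")
    payload, received = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, received):
        raise ValueError("payload does not match its authentication tag")
    return payload
```

The plain digest catches line noise and similar accidents. The keyed version is the one to reach for when someone might alter the data on purpose, because an attacker without the key cannot produce a tag that verifies.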
Encryption can be used to hide the data as well as help ensure its integrity. The blocks of information to be transmitted are passed through an encryption algorithm and changed so that the original information is no longer viewable. If the data was modified during transmission, it will usually not decrypt to a usable form, which makes the modification apparent. Because that is not guaranteed for every kind of change, combining encryption with a hash gives more reliable detection. The receiving party would then notify the sending party that the received data was corrupted, and both parties could investigate what caused the loss of integrity.
Conclusion
Integrity is the second leg of the CIA triad and as such is vitally important to providing a usable, valuable service to our customers. Be careful of the initial entry of your information, and how you allow it to be manipulated and stored and ultimately destroyed. Be sure that only authorized individuals are allowed to perform these actions and even then keep an audit trail to follow up with if necessary. Today's internet protocols (e.g., TCP/IP) will perform some of the communication retransmission for you, but they cannot usually detect purposeful modification of the packet payloads. Use hashing or full encryption to assure yourself and your user that the information they are relying on from your service is full and complete and accurate, and you will live long and prosper.
References
- "Web 2.0 Confidentiality": The first article in this series
