Web Application Security Design Guidelines - Input / Data Validation
From Guidance Share
- J.D. Meier, Alex Mackman, Michael Dunner, Srinath Vasireddy, Ray Escamilla and Anandha Murukan
Contents |
Assume all input is malicious
Input validation starts with a fundamental supposition that all input is malicious until proven otherwise. Whether input comes from a service, a file share, a user, or a database, validate your input if the source is outside your trust boundary. For example, if you call an external Web service that returns strings, how do you know that malicious commands are not present? Also, if several applications write to a shared database, when you read data, how do you know whether it is safe?
References
- See Design Guidelines for Secure Web Applications at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetsec/html/THCMCh04.asp
Centralize your approach
Make your input validation strategy a core element of your application design. Consider a centralized approach to validation, for example, by using common validation and filtering code in shared libraries. This ensures that validation rules are applied consistently. It also reduces development effort and helps with future maintenance.
In many cases, individual fields require specific validation, for example, with specifically developed regular expressions. However, you can frequently factor out common routines to validate regularly used fields such as e-mail addresses, titles, names, postal addresses including ZIP or postal codes, and so on.
References
- See Design Guidelines for Secure Web Applications at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetsec/html/THCMCh04.asp
Do not rely on client-side validation
Server-side code should perform its own validation. What if an attacker bypasses your client, or shuts off your client-side script routines, for example, by disabling JavaScript? Use client-side validation to help reduce the number of round trips to the server but do not rely on it for security. This is an example of defense in depth.
References
- See Design Guidelines for Secure Web Applications at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetsec/html/THCMCh04.asp
Be careful with canonicalization issues
Data in canonical form is in its most standard or simplest form. Canonicalization is the process of converting data to its canonical form. File paths and URLs are particularly prone to canonicalization issues and many well-known exploits are a direct result of canonicalization bugs. For example, consider the following string that contains a file and path in its canonical form.
c:\temp\somefile.dat
The following strings could also represent the same file.
somefile.dat c:\temp\subdir\..\somefile.dat c:\ temp\ somefile.dat ..\somefile.dat c%3A%5Ctemp%5Csubdir%5C%2E%2E%5Csomefile.dat
In the last example, characters have been specified in hexadecimal form:
- %3A is the colon character.
- %5C is the backslash character.
- %2E is the dot character.
You should generally try to avoid designing applications that accept input file names from the user to avoid canonicalization issues. Consider alternative designs instead. For example, let the application determine the file name for the user.
If you do need to accept input file names, make sure they are strictly formed before making security decisions such as granting or denying access to the specified file.
For more information about how to handle file names and to perform file I/O in a secure manner, see the "File I/O" sections in Chapter 7, "Building Secure Assemblies," at http://msdn.microsoft.com/library/en-us/dnnetsec/html/THCMCh07.asp and Chapter 8, "Code Access Security in Practice." at http://msdn.microsoft.com/library/en-us/dnnetsec/html/THCMCh08.asp
References
- See Design Guidelines for Secure Web Applications at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetsec/html/THCMCh04.asp
Constrain, reject, and sanitize your input
The preferred approach to validating input is to constrain what you allow from the beginning. It is much easier to validate data for known valid types, patterns, and ranges than it is to validate data by looking for known bad characters. When you design your application, you know what your application expects. The range of valid data is generally a more finite set than potentially malicious input. However, for defense in depth you may also want to reject known bad input and then sanitize the input. The recommended strategy is shown in following figure
http://www.guidancelibrary.com/images/Design%20Guidelines/InputVal.gif
Figure : Input validation strategy: constrain, reject, and sanitize input
To create an effective input validation strategy, be aware of the following approaches and their tradeoffs:
- Constrain input.
- Validate data for type, length, format, and range.
- Reject known bad input.
- Sanitize input.
Constrain Input
Constraining input is about allowing good data. This is the preferred approach. The idea here is to define a filter of acceptable input by using type, length, format, and range. Define what is acceptable input for your application fields and enforce it. Reject everything else as bad data.
Constraining input may involve setting character sets on the server so that you can establish the canonical form of the input in a localized way.
Validate Data for Type, Length, Format, and Range
Use strong type checking on input data wherever possible, for example, in the classes used to manipulate and process the input data and in data access routines. For example, use parameterized stored procedures for data access to benefit from strong type checking of input fields.
String fields should also be length checked and in many cases checked for appropriate format. For example, ZIP codes, personal identification numbers, and so on have well defined formats that can be validated using regular expressions. Thorough checking is not only good programming practice; it makes it more difficult for an attacker to exploit your code. The attacker may get through your type check, but the length check may make executing his favorite attack more difficult.
Reject Known Bad Input
Deny "bad" data; although do not rely completely on this approach. This approach is generally less effective than using the "allow" approach described earlier and it is best used in combination. To deny bad data assumes your application knows all the variations of malicious input. Remember that there are multiple ways to represent characters. This is another reason why "allow" is the preferred approach.
While useful for applications that are already deployed and when you cannot afford to make significant changes, the "deny" approach is not as robust as the "allow" approach because bad data, such as patterns that can be used to identify common attacks, do not remain constant. Valid data remains constant while the range of bad data may change over time.
Sanitize Input
Sanitizing is about making potentially malicious data safe. It can be helpful when the range of input that is allowed cannot guarantee that the input is safe. This includes anything from stripping a null from the end of a user-supplied string to escaping out values so they are treated as literals.
Another common example of sanitizing input in Web applications is using URL encoding or HTML encoding to wrap data and treat it as literal text rather than executable script. HtmlEncode methods escape out HTML characters, and UrlEncode methods encode a URL so that it is a valid URI request.
References
- See Design Guidelines for Secure Web Applications at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetsec/html/THCMCh04.asp
