User Input: Sanitization and Validation

To protect your website from malicious attacks (and also simply to prevent weird errors for users), you need to sanitize and validate user input.

Sanitization

Sanitization means removing or neutralizing any potentially malicious content, such as a <script> tag in a user's comment that would otherwise run in the browser of anyone viewing the page.

Validation

Validation means checking that the data the user entered is correct for your use case (e.g. that an email address has a specific format). This is usually more about data correctness than security, but is still relevant here.
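As a minimal sketch of the email example, PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL can check the format. The validateEmail() helper name is hypothetical and only here for illustration; a real application may want stricter or looser rules.

```php
<?php
// Hypothetical helper: returns the trimmed address if it is well-formed, or null otherwise.
function validateEmail(string $input): ?string
{
    $candidate = trim($input);
    // filter_var() returns the filtered value on success and false on failure.
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

$accepted = validateEmail('  user@example.org ');
$rejected = validateEmail('not an email');
```

Here $accepted holds the trimmed address and $rejected is null, so the caller can show an error message instead of storing bad data.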

The importance of sanitizing and validating user input

All user input is, by definition, out of your control - and thus you cannot trust it. It might contain malicious scripts, or any number of other things you do not want it to contain.

There are two critical places where you need to be careful, though one of them is more of a bygone relic in modern development (at least for remotely competent developers).

SQL Injection

If you're at risk of this, you are doing something very, very wrong. Illidan Stormrage will be having a word with you.

So long as you do not use any string substitution in SQL queries that involve user input, and parameterize ALL inputs to a SQL query (e.g. using the ? syntax), then SQL injection is not a problem. In the past it was a problem due to developers doing things like this, using PHP as an example language:

// INSECURE: the user's input is concatenated straight into the SQL text.
$userInput = getUserInputFromHTMLForm();
$query = "SELECT column1 FROM some_table WHERE column2=\"" . $userInput . "\"";

Here, we're not parameterising our query - we're putting the user's input directly into the query string. As a result, the SQL engine has no way to differentiate between a value we intended to be used in the query and a command that forms part of the query itself. A malicious user can therefore submit input that ends the current query and executes another one you did not intend to execute - like dropping your entire database, destroying a bunch of data, or outputting sensitive info onto a page where you did not want it to be visible.
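To make the problem concrete, here is a small sketch that only builds the query string (no database involved) so you can see what the SQL engine would receive. The buildUnsafeQuery() helper is hypothetical and simply wraps the concatenation shown above.

```php
<?php
// Hypothetical helper that builds the query the unsafe way, for demonstration only.
function buildUnsafeQuery(string $userInput): string
{
    return "SELECT column1 FROM some_table WHERE column2=\"" . $userInput . "\"";
}

// What the developer expected.
echo buildUnsafeQuery('hamster'), PHP_EOL;
// SELECT column1 FROM some_table WHERE column2="hamster"

// What an attacker can submit instead.
echo buildUnsafeQuery('" OR "1"="1'), PHP_EOL;
// SELECT column1 FROM some_table WHERE column2="" OR "1"="1"
```

The second query's WHERE clause is always true, so it would return every row in the table rather than the one the developer had in mind.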

This is the scenario in the (in)famous XKCD comic "Exploits of a Mom", better known as Little Bobby Tables.

Instead, use a parameterized query and a prepared statement, like so (using the PDO library in PHP as an example):

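// $dsn, $username, $password and $options are your own connection settings.
$databaseConnection = new \PDO($dsn, $username, $password, $options);
$selectQuery = "SELECT column1 FROM some_table WHERE column2 = ?";
try {
	// The SQL text and the value are sent separately: prepare() takes the query, execute() binds the value.
	$selectStmt = $databaseConnection->prepare($selectQuery);
	$selectStmt->execute([$userInput]);
	$rows = $selectStmt->fetchAll(\PDO::FETCH_ASSOC);
} catch (\PDOException $e) {
	// Thrown when PDO::ATTR_ERRMODE is PDO::ERRMODE_EXCEPTION (the default since PHP 8.0).
	error_log($e->getMessage());
}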

This way, the SQL engine knows that whatever is passed as the value to compare against column2 is a parameter, and is not to be treated as SQL to be run as part of the command, no matter what it contains.
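The following sketch shows that behaviour end to end against an in-memory SQLite database. It assumes the pdo_sqlite extension is installed; the table contents and the findColumn1() helper are made up for the demonstration.

```php
<?php
// Assumes pdo_sqlite is available; the rows below are invented demo data.
$db = new \PDO('sqlite::memory:');
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE some_table (column1 TEXT, column2 TEXT)');
$db->exec("INSERT INTO some_table VALUES ('secret-a', 'alice'), ('secret-b', 'bob')");

// Hypothetical helper wrapping a prepared, parameterized SELECT.
function findColumn1(\PDO $db, string $userInput): array
{
    $stmt = $db->prepare('SELECT column1 FROM some_table WHERE column2 = ?');
    $stmt->execute([$userInput]);
    // FETCH_COLUMN returns a flat array of the first selected column.
    return $stmt->fetchAll(\PDO::FETCH_COLUMN);
}

$normal = findColumn1($db, 'alice');       // matches the one row for alice
$attack = findColumn1($db, '" OR "1"="1'); // compared as a literal string, matches nothing
```

The injection attempt is compared against column2 as an ordinary string, so it returns no rows instead of the whole table.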

Cross-Site Scripting (XSS)

The other security risk comes from not removing or encoding HTML tags in user input. Suppose you have a form that asks the user to write their name, then shows their name on the page in another box. What if the user writes something like this?

<script>alert('Your mother was a hamster');</script>

If you output this directly onto the page as HTML, it will not be shown as text but executed as a script - causing the user's browser to display a popup with the given message. Of course, that script could do anything - it could connect to another website or send a form submission, and on top of that it has access to the user's browsing session and any cookies for the website it is running on that are not marked HttpOnly. As far as the browser knows, this script came from the website itself, so it's trusted.

Preventing this means sanitising user input so that HTML-specific characters (such as <, >, & and quotes) are encoded as HTML entities, so that the browser displays them as plain characters instead of interpreting them as markup. You need to do this server-side when handling user input, and client-side before displaying anything that contains user input, to ensure malicious scripts are never executed. Doing it client-side only at the point where the user submits input is not useful, as the user can easily bypass the client-side scripts that do this and submit unsanitised data to your server anyway.
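On the server side, a minimal sketch of this encoding in PHP uses the built-in htmlspecialchars() function. The escapeHtml() wrapper name is hypothetical; the flags shown are one reasonable choice, not the only one.

```php
<?php
// Hypothetical helper: encode a value for output inside HTML text or a quoted attribute.
function escapeHtml(string $value): string
{
    // ENT_QUOTES encodes both quote styles; ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$name = "<script>alert('Your mother was a hamster');</script>";
echo '<p>Hello, ' . escapeHtml($name) . '</p>', PHP_EOL;
// <p>Hello, &lt;script&gt;alert(&#039;Your mother was a hamster&#039;);&lt;/script&gt;</p>
```

The browser renders the encoded string as visible text, so the visitor sees the literal script tag rather than a popup.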

The one time where it is useful to sanitise content client-side (from a security standpoint) is when outputting data onto a page. For example, when showing some user-submitted content retrieved from the database, using a sanitisation library on it is an extra measure to prevent any malicious code getting onto the page. While a user can override this like any other client-side script, there's no incentive to do so, as they are only messing with their own security and not that of other users.

Pitfalls

Server-side HTML sanitisation when using a library

Libraries that deal with user input, e.g. WYSIWYG rich text editors like TinyMCE, will often expect to receive unsanitised input and then sanitise it themselves before outputting content onto the page.

If you choose to additionally sanitise the input server-side, you can potentially break the library's functionality, as it will be unable to parse the user content correctly if some elements were removed. In these cases you need to sanitise the content server-side carefully, as WYSIWYG editors often produce a whole load of HTML that may fall foul of HTML sanitisation libraries.
