User Input: Sanitization and Validation

To protect your website from malicious attacks (and also simply to prevent weird errors for users), you need to sanitize and validate user input.

Sanitization

Sanitization means removing or neutralizing potentially malicious content in user input. A typical example is a comment containing a <script> tag that the attacker hopes will run, unnoticed, in other visitors' browsers.

Validation

Validation means checking that the data the user entered is correct for your use case (e.g. that an email address has a specific format). This is usually more about data correctness than security, but is still relevant here.
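As a rough illustration of validation, here is a minimal sketch using PHP's built-in filter_var() with the FILTER_VALIDATE_EMAIL filter. The validateEmail() helper is a hypothetical name made up for this example, and returning null for bad input is just one possible convention.

```php
<?php
// Hypothetical helper: returns the trimmed address if it passes PHP's
// built-in email format check, or null if it does not.
function validateEmail(string $input): ?string
{
    $trimmed = trim($input);
    // filter_var() returns the filtered value on success and false on failure.
    $result = filter_var($trimmed, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}
```

Calling validateEmail('not an email') gives null, so the calling code can reject the form submission and show the user a helpful message instead of storing bad data.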

The importance of sanitizing and validating user input

All user input is, by definition, out of your control - and thus you cannot trust it. It might contain malicious scripts, or any number of other things you do not want it to contain.

There are two critical places where you need to be careful, though one of them is more of a bygone relic in modern development terms (at least for remotely competent developers).

SQL Injection

If you're at risk of this, you are doing something very, very wrong. Illidan Stormrage will be having a word with you.

So long as you do not use any string substitution in SQL queries that involve user input, and parameterize ALL inputs to a SQL query (e.g. using the ? placeholder syntax), then SQL injection is not a problem. In the past it was a problem due to developers doing things like this, using PHP as an example language:

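// VULNERABLE - do not do this. getUserInputFromHTMLForm() stands in for however you read the form value.
$userInput = getUserInputFromHTMLForm();
// The user's input is concatenated straight into the SQL text.
$query = "SELECT column1 FROM some_table WHERE column2=\"" . $userInput . "\"";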

Here, we're not parameterising our query - we're putting the user's input directly into the query string. As a result, there is no way for the SQL engine we are using to differentiate between a value we intended to be used in the query, and other query syntax. A malicious user can therefore submit input that ends the current query and executes another query you did not intend to execute - like dropping your entire database, destroying a bunch of data, or exposing sensitive info on a page where it was never meant to be visible.
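To make the problem concrete, the sketch below reproduces the concatenation shown above and prints the resulting query text for a harmless value and for a crafted one. buildUnsafeQuery() and both sample inputs are hypothetical, chosen only for illustration. Nothing is sent to a database here.

```php
<?php
// Hypothetical helper that reproduces the unsafe concatenation shown above.
function buildUnsafeQuery(string $userInput): string
{
    return "SELECT column1 FROM some_table WHERE column2=\"" . $userInput . "\"";
}

// A harmless value yields the query the developer had in mind.
echo buildUnsafeQuery('widget'), PHP_EOL;

// A crafted value closes the quoted string early and appends its own condition.
$malicious = '" OR "1"="1';
echo buildUnsafeQuery($malicious), PHP_EOL;
```

The second query ends in column2="" OR "1"="1". On a database that accepts double-quoted strings, that condition is true for every row, so the attacker gets the whole table back rather than one matching record.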

Per an (in)famous XKCD comic...

Instead, use a parameterized query and a prepared statement, like so (using the PDO library in PHP as an example):

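// $dsn, $username, $password and $options are assumed to be defined elsewhere.
$databaseConnection = new \PDO($dsn, $username, $password, $options);
// Throw a PDOException on failure so the catch block below actually fires.
$databaseConnection->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
// The ? is a placeholder; the value is supplied separately from the SQL text.
$selectQuery = "SELECT column1 FROM some_table WHERE column2=?";
try {
	$selectStmt = $databaseConnection->prepare($selectQuery);
	$selectStmt->execute([$userInput]);
	$rows = $selectStmt->fetchAll(\PDO::FETCH_ASSOC);
} catch (\PDOException $e) {
	// Log the message only; dumping the whole exception can leak connection details.
	error_log($e->getMessage());
}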
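If you want to see the difference for yourself, here is a self-contained sketch that runs a parameterized query against a throwaway in-memory SQLite database. It assumes the pdo_sqlite extension is installed. The table contents and the findColumn1() helper are made up for this example.

```php
<?php
// Assumption: the pdo_sqlite extension is available. An in-memory database
// keeps the sketch self-contained and leaves nothing behind.
$db = new \PDO('sqlite::memory:');
$db->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE some_table (column1 TEXT, column2 TEXT)');
$db->exec("INSERT INTO some_table (column1, column2) VALUES ('first', 'a'), ('second', 'b')");

// Hypothetical helper: runs the parameterized SELECT and returns the
// matching column1 values as a flat array.
function findColumn1(\PDO $db, string $userInput): array
{
    $stmt = $db->prepare('SELECT column1 FROM some_table WHERE column2 = ?');
    $stmt->execute([$userInput]);
    return $stmt->fetchAll(\PDO::FETCH_COLUMN);
}

// Matches the single row whose column2 is 'a'.
$normal = findColumn1($db, 'a');

// The payload is compared as one literal string, so no rows match.
$attack = findColumn1($db, '" OR "1"="1');
```

Because the value travels separately from the SQL text, the quotes in the payload never get a chance to be read as query syntax.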

Cross-Site Scripting (XSS)

The other security risk is from not removing HTML tags from user input. Suppose you have a form that asks the user to write their name, then shows their name on the page in another box. What if the user writes something like this?

<script>alert('Your mother was a hamster');</script>

If you output this directly onto the page as HTML, it will not be shown as text but executed as a script - causing the user's browser to display a popup with the given message. Of course, that script could do anything - it could connect to another website, send some form submission, and on top of that it has access to the user's cookies and browsing session for the website it is running on. As far as the browser knows, this script came from the website itself, so it's trusted.

Preventing this means sanitising user input so that HTML-specific characters are encoded as HTML entities (for example, < becomes &lt;), so that the browser knows they are just characters to display and not markup to execute. You should do this server-side upon receiving user input, AND client-side upon displaying something that contains user input, to ensure malicious scripts are not executed.
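On the server side in PHP, one common way to do this encoding is the built-in htmlspecialchars() function. The sketch below is a minimal example, not the only approach. escapeForHtml() is a hypothetical wrapper name, and it assumes your pages are served as UTF-8.

```php
<?php
// Hypothetical helper: encodes HTML-special characters (<, >, &, and both
// quote styles) so the browser renders them as text, not markup.
// Assumption: the page is served as UTF-8.
function escapeForHtml(string $userInput): string
{
    return htmlspecialchars($userInput, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$name = "<script>alert('Your mother was a hamster');</script>";

// The script tag arrives in the browser as &lt;script&gt;... and is shown
// as plain text instead of being executed.
echo '<p>Hello, ' . escapeForHtml($name) . '</p>', PHP_EOL;
```

ENT_QUOTES matters if the value ever ends up inside an HTML attribute, since an unescaped quote there could close the attribute and let the attacker add their own.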