Input Validation: Necessary but Not Sufficient; It Doesn't Target the Fundamental Issue

I have reviewed several solutions for our fix the flag contests, contributed by seasoned developers and prominent CTF players.

What has been the most commonly adopted approach to address security vulnerabilities?

The answer is Input Validation.

This doesn’t come as a surprise, as the majority of security vulnerabilities stem from trusting untrusted inputs. Consequently, input validation appears to be the go-to solution. When you follow secure coding guidelines or prevention cheat sheets, input validation prominently features in most remediation strategies.

However, have you ever questioned what input validation truly entails? What should the input be validated against? Is there a well-defined criteria for input validation, or is it up to the developers’ discretion? Most importantly, does input validation effectively target the root cause of the vulnerability?

After reviewing solutions for several of our challenges, I’ve observed that, in many cases, input validation is not a sufficient solution, and the underlying vulnerability has not been adequately addressed.

Lets explore three patterns where input validation may seem to be the right choice:

  1. Input validation on a non-normalised (or canonical) data
  2. Validation of data that is later mixed with the code
  3. Range validation using fixed size data types

Below you can find example vulnerabilities showcasing each of the above areas where input validation is insufficient.

Path Traversal

Path traversal occurs when untrusted data is used to access a file or directory within the file system or cloud storage. Typically, it enables an adversary to read the content of arbitrary files. An example of path traversal in Python code is provided below…

# Path traversal in python
def read_file_content(request):
  try:
    if not request.GET["filename"].startswith("resources/"):
      return HttpResponseBadRequest()
    content = open(os.path.join("resources/" + request.GET["filename"]))
    return HttpResponse(content, status=200)
  except Exception as ex:
    return HttpResponse(str(ex), status=404)

Following an input validation approach, a common submission to mitigate path traversal involves checking if the filename contains unsafe characters like ../. Below is an example of a flawed patch that has been submitted.

# Bad security patch
def read_file_content(request):
  if "../" in request.GET["filename"]:
    return HttpResponseBadRequest()
  try:
    if not request.GET["filename"].startswith("resources/"):
      return HttpResponseBadRequest()
    content = open(os.path.join("resources/" + request.GET["filename"]))
    return HttpResponse(content, status=200)
  except Exception as ex:
    return HttpResponse(str(ex), status=404)

However, the security patch mentioned above does not tackle the root cause. It merely checks for a single edge case, specifically ../, which can lead to path traversal.

The root cause of path traversal is the lack of path canonicalization. In other words, we wrongly compare a relative path to an absolute path.

Instead of relying solely on input validation, our approach should involve converting the path to an absolute path first, followed by the validation check. Below is a sample security patch that addresses it.

# A good security patch
def read_file_content(request):
  try:
    path = os.path.join("/app/resources/", filename)
    if not os.path.normpath(path).startswith("/app/resources"):
      return HttpResponseBadRequest()
    content = open(path)
    return HttpResponse(content, status=200)
  except Exception as ex:
    return HttpResponse(str(ex), status=404)

Injection Vulnerability Class (SSTI, SQLi, XSS, Shell Injection, etc.)

Injection vulnerabilities occur when untrusted data is used to construct templates, database queries, portions of webpages, or system commands. They enable adversaries to execute arbitrary code on the backend server or, in the case of XSS, within a user’s browser. An example of a shell injection vulnerability in Python is provided below.

# Shell injection in python
def open(request):
  url = request.GET["url"]
  cmd = "ping -c 1 %s" % url
  p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  result = p.communicate()[0]
  return HttpResponse(f"[i] {url}\r\n{result}")

In the vulnerable code, an untrusted query parameter is obtained and appended to a shell command.

Following an input validation strategy, a common approach was to check if the url conforms to URL specifications.

# Bad security patch
def is_valid_url(url):
  pattern = re.compile(
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+
      (?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
  if (not bool(pattern.match(url))):
    	raise ValidationError(("bad domain"))

def open(request):
  url = is_valid_url(request.GET["url"])
  cmd = "ping -c 1 %s" % url
  p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
  result = p.communicate()[0]
  return HttpResponse(f"[i] {url}\r\n{result}")

However, the security patch mentioned above does not address the root cause. It merely checks if the URL is valid. If it is, execution proceeds; otherwise, an exception is thrown.

The root cause of injection vulnerabilities is in the treatment of data as code. Whenever we interpret data as code, we expose ourselves to injection vulnerabilities.

Rather than relying solely on input validation, we should employ a method that does not evaluate data as code. This method should segregate data from code and perform contextual escaping on the input. A prime example of this technique is a parameterized query. It segregates input for the database query and performs escaping, effectively addressing the root cause of SQL injection.

# A good security patch
def open(request):
  url = request.GET["url"]
  p = subprocess.Popen(["ping","-c","1",url], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  result = p.communicate()[0]
  return HttpResponse(f"[i] {url}\r\n{result}")

By removing shell=True and supplying the command as an array, we avoid constructing the shell command directly from data. This defensive pattern escapes shell metadata characters, preventing their execution.

Integer or Numeric Overflow

Integer or Numeric Overflow occurs when an arithmetic operation (or downcasting of a numeric value) results in a value that exceeds the representative range of the data type. Let’s examine an example of integer overflow in Go.

// Integer overflow in go
func add(a int32, b int32) int32 {
    return a + b
}

If the result of the arithmetic operation, i.e., a + b, exceeds the 32-bit integer range, the value wraps around.

The following weak patch checks if the result of the arithmetic can go beyond the int32 range. If it does, the program exits. This patch follows input validation strategy.

// Bad security patch
func add(a int32, b int32) int32 {
    if (int64(a)+int64(b) > math.MaxInt32) ||
   	 (int64(a)+int64(b) < math.MaxInt32) {
   	 panic("Integer overflow happened")
    }
    return a + b
}

I’ve previously elaborated on why this approach is insufficient. This method of patching introduces unnecessary calculations, data type conversions, and fails to tackle the root cause.

The root cause of integer overflow lies in the restriction of fixed data types and modulus arithmetic, leading to unexpected results in arithmetic operations. One approach to address this issue is by employing dynamic data types like BigInteger, although this comes at the cost of performance. A better alternative is to consider modulus arithmetic properties. For example, given two positive numbers, overflow happens when the result of the arithmetic is smaller than the input numbers.

// A good security patch when input is a positive number
func add(a int32, b int32) int32 {
    c := a + b
    if c < a || c < b {
   	 panic("Integer overflow happened")
    }
    return c
}

Wrap up

Input validation alone is not a comprehensive solution for addressing the root cause of certain vulnerability classes. It may prevent specific scenarios but does not eliminate the vulnerability entirely. If you are a security professional, it’s crucial not to overly rely on input validation. Instead, conduct thorough analysis to identify the root cause and offer clear recommendations that address the root cause.

If you are a developer, avoid implementing quick fixes for individual inputs that trigger security issues. Dive into the bug, identify its root cause, and refactor the code accordingly.

Here are links to the related challenges that I have referenced. Give them a try:

2 Likes

Hi, I think all three cases of using os.path.join are wrong. The point of that function is not to assemble your own path as string and give your own path separators, but to aid in joining the path segments with the canonical path separators of the target system.

PS: well that’s silly, your system has assigned me a username based on the part of my email address before the @ instead of using the username I signed up with.

Hey, exactly. os.path.join is just a path separator but it has been used mistakenly. In the final solution there is os.path.normpath for canonicalisation.

Thanks. I’ve updated the final solution to use , instead of +.