Not normalising before validation bypasses security checks

@cheetah_211018 @leopard_211018 Check this out: the commit "Add ap_normalize_path() to replace ap_getparents() (with options)." in Apache's GitHub repository is the cause of CVE-2021-41773 (see "CVE-2021-41773: Apache Path Traversal" on Censys).

As we discussed in the workshop, if we do not normalise the input or enforce charset safety before security parsing, our validation can be bypassed. The Apache change behind the path traversal bug above is a prime example of this. They did the right thing by implementing validation, but an encoded version of the path could bypass it.
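To make the ordering problem concrete, here is a minimal sketch in C. It is not Apache's code: the helpers `hex_value`, `percent_decode`, `has_dotdot_segment` and the two `accept_*` functions are hypothetical names invented for illustration, and the only assumption is that the path is later used in its decoded form.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit (caller guarantees it is one). */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    return tolower((unsigned char)ch) - 'a' + 10;
}

/* Hypothetical helper: decode every valid %XX escape from in into out.
 * out must hold at least strlen(in) + 1 bytes. */
static void percent_decode(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            *out++ = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Return true if any '/'-separated segment of path is exactly "..". */
static bool has_dotdot_segment(const char *path)
{
    const char *seg = path;
    while (*seg != '\0') {
        size_t len = strcspn(seg, "/");
        if (len == 2 && seg[0] == '.' && seg[1] == '.') return true;
        seg += len;
        if (*seg == '/') seg++;
    }
    return false;
}

/* Flawed order: validate the raw path, decode afterwards. */
static bool accept_validate_then_decode(const char *raw, char *out)
{
    if (has_dotdot_segment(raw)) return false;
    percent_decode(raw, out);
    return true;
}

/* Safer order: decode first, then validate what will actually be used. */
static bool accept_decode_then_validate(const char *raw, char *out)
{
    percent_decode(raw, out);
    return !has_dotdot_segment(out);
}
```

With the input `/cgi-bin/.%2e/%2e%2e/etc/passwd`, the first function sees no `..` segment in the raw bytes and accepts the path, which then decodes to `/cgi-bin/../../etc/passwd`. The second function rejects it because the check runs on the decoded form. The decode has to happen exactly once and before the check. A second decode later in the pipeline would reopen the same hole.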

Can you spot the bug?

while (path[l] != '\0') {
        /* RFC-3986 section 2.3:
         *  For consistency, percent-encoded octets in the ranges of
         *  ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D),
         *  period (%2E), underscore (%5F), or tilde (%7E) should [...]
         *  be decoded to their corresponding unreserved characters by
         *  URI normalizers.
         */
        if ((flags & AP_NORMALIZE_DECODE_UNRESERVED)
                && path[l] == '%' && apr_isxdigit(path[l + 1])
                                  && apr_isxdigit(path[l + 2])) {
            const char c = x2c(&path[l + 1]);
            if (apr_isalnum(c) || (c && strchr("-._~", c))) {
                /* Replace last char and fall through as the current
                 * read position */
                l += 2;
                path[l] = c;
            }
        }

Thanks to @raaqim for pointing me to this recent finding.