
I'm using URL-safe Base64 encoding to encode randomly generated byte arrays, but I have a problem with decoding. When I decode two different strings that are identical except for the last character, I get the same byte array. For example, both "dGVzdCBzdHJpbmr" and "dGVzdCBzdHJpbmq" decode to the same result:

Array(116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106)

For encoding and decoding I use java.util.Base64 like this:

// encoding...
Base64.getUrlEncoder().withoutPadding().encodeToString(myString.getBytes())
// decoding...
Base64.getUrlDecoder().decode(base64String)

What causes this collision? Can it also happen with characters other than the last one? And how can I fix it so that decoding returns a different byte array for each distinct string?

  • Are you sure the withoutPadding() option is a good idea? Commented Apr 29, 2015 at 10:44
  • Fundamentally, it doesn't matter if the same array can be encoded in two different ways. What matters is that if you take an array, encode it, then decode it, you get back the same array. Commented Apr 29, 2015 at 10:45
  • Which isn't to say that it's not really interesting that you have two strings that both come out the same. :-) Pretty sure it's just a matter of there being unused bits at the end (as Base64 encodes octets across byte boundaries), but I'd have to work out the bits to be sure. Commented Apr 29, 2015 at 10:51
  • @haraldK I used it to remove the trailing "=" chars at the end. Actually, I tried with padding and the result is the same. Commented Apr 29, 2015 at 10:52

3 Answers

10

The issue you are seeing is caused by the fact that the number of bytes in the result (11 bytes) doesn't completely fill the last char of the Base64-encoded string.

Remember that Base64 encodes a stream of 8-bit bytes as 6-bit chars. The resulting string therefore needs exactly 11 * 8 / 6 chars, or 14 2/3 chars. But you can't write partial chars, so the encoder emits 15. Only the first 4 bits (or 2/3) of the last char are significant. Its last two bits are not decoded. Thus all of:

dGVzdCBzdHJpbmo
dGVzdCBzdHJpbmp
dGVzdCBzdHJpbmq
dGVzdCBzdHJpbmr

decode to the same 11 bytes (116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106).
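To see this concretely, here is a minimal Java sketch. The class name TrailingBitsDemo, the VARIANTS constant and the helper methods are only illustrative names, not part of any library. It decodes the four strings with the same java.util.Base64 URL decoder used in the question and prints the 6-bit value of each final char. It relies on the decoder ignoring the two unused trailing bits, which is the behaviour reported in the question:

```java
import java.util.Arrays;
import java.util.Base64;

class TrailingBitsDemo {
    // URL-safe Base64 alphabet; a char's position is its 6-bit value.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // The four strings from the answer; they differ only in the last char.
    static final String[] VARIANTS = {
        "dGVzdCBzdHJpbmo", "dGVzdCBzdHJpbmp", "dGVzdCBzdHJpbmq", "dGVzdCBzdHJpbmr"
    };

    static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }

    // 6-bit value of the final char, as a zero-padded binary string.
    static String lastCharBits(String s) {
        int value = ALPHABET.indexOf(s.charAt(s.length() - 1));
        return String.format("%6s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        for (String v : VARIANTS) {
            // The first 4 bits are data; the last 2 bits are the unused remainder.
            System.out.println(v + "  last char bits: " + lastCharBits(v)
                + "  bytes: " + Arrays.toString(decode(v)));
        }
    }
}
```

The last chars o, p, q and r have the values 101000, 101001, 101010 and 101011. They share the leading 1010 and differ only in the final two bits, which carry no data.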

PS: Without padding, some decoders will try to decode the "last" byte as well, and you'll get a 12-byte result (with a different last byte). This is the reason for my comment asking whether the withoutPadding() option is a good idea. But your decoder seems to handle this.
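If you need every accepted string to map to a distinct byte array, one possible approach (a sketch, not something java.util.Base64 offers out of the box as far as I know) is to re-encode the decoded bytes and reject any input that differs from that canonical form. StrictBase64 and decodeCanonical are hypothetical names, and the check assumes the input is unpadded, as produced by getUrlEncoder().withoutPadding() in the question:

```java
import java.util.Base64;

class StrictBase64 {
    // Hypothetical helper: decode, then accept the input only if it is exactly
    // what the encoder would have produced for those bytes.
    static byte[] decodeCanonical(String s) {
        byte[] bytes = Base64.getUrlDecoder().decode(s);
        String canonical = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        if (!canonical.equals(s)) {
            // Non-zero trailing bits (or padding chars) end up here.
            throw new IllegalArgumentException("Non-canonical Base64 input: " + s);
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(decodeCanonical("dGVzdCBzdHJpbmo").length); // accepted
        try {
            decodeCanonical("dGVzdCBzdHJpbmr"); // same bytes, but trailing bits set
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this check only "dGVzdCBzdHJpbmo" is accepted among the four variants, because the encoder always writes the unused trailing bits as zero.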


0

Maybe this is simply how Base64 encoding and decoding work; see if this helps. Read the description below to learn how Base64 actually works. If the input strings differ at the end, the difference in the encoded values will probably show up in the same place.

0

The array you showed is the ASCII representation of "test strinj" (see http://www.unit-conversion.info/texttools/ascii/) and doesn't seem to be a Base64 representation of anything.

It seems you are analysing the wrong "result" array.
