davidfrey wrote in php

URL Encoding

I have an interesting problem that's a little verbose, so I'm going to use lj-cut for those of you who hate long entries. Basically, I have a problem with a string sent to my script by a software application that does not properly URL-encode it. On occasion the string contains a percent (%) symbol, and the percent plus the two characters after it are automatically decoded as a hex escape when they shouldn't be. I've solved the problem using regular expressions, which suits my needs just fine. I'm just curious what the rest of you PHP coders would have done. Your comments are appreciated.

Details
A software application is sending data to one of my scripts for usage tracking. The software is hard-coded on several thousand CDs that have already been distributed, so basically there is no way I can go to the source to correct the problem.

The data is sent via a GET request to my script, and the arguments contain username, domain, cdkey, etc. Unfortunately, neither username nor domain is validated during user entry, so I get some pretty interesting results. Here are some examples of username values (the username argument is a splice of the user and domain entries):
SAMPLE USERNAME POSSIBILITIES
Passed username value         | User entry (user)       | User entry (domain)
===============================================================================
user%domain.id@user@domain.id | username=user%domain.id | domain=user@domain.id
user%domain.id@domain.id      | username=user%domain.id | domain=domain.id
user@domain.id@domain.id      | username=user@domain.id | domain=domain.id
user@domain.id                | username=user           | domain=domain.id
Because these strings are not URL encoded, I'm stuck with a ticking time bomb of a bug. Most of the time the first two letters of the domain name don't form a valid hex pair, so the % is left alone. But if the script ever runs into %ca (e.g. username=hyperguy%caffeine.com), it is automatically decoded to Ê (giving username=hyperguyÊffeine.com).
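The failure is easy to reproduce outside the tracking script. The following is a minimal sketch, assuming parse_str() applies the same percent-decoding that PHP uses when it builds $_GET; the variable names $safe and $broken are only for illustration.

```php
<?php
// parse_str() percent-decodes values the same way the query string is decoded.
parse_str('username=user%domain.id@domain.id', $safe);
parse_str('username=hyperguy%caffeine.com', $broken);

// "%do" is not a valid hex pair, so the value comes through untouched.
var_dump($safe['username'] === 'user%domain.id@domain.id');

// "%ca" is a valid hex pair, so three characters collapse into the byte 0xCA,
// which displays as Ê in Latin-1.
var_dump($broken['username'] === "hyperguy\xCAffeine.com");
```

Both comparisons should come out true, which is why the bug only shows up for domains whose first two letters happen to be hex digits.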

Why are users typing % as a delimiter for their e-mail address?
Basically it's due to an e-mail client configuration quirk. The mail server requires the domain name of the e-mail address to be included in the username, since each domain on the server is assigned a Virtual IP, so providing the domain name is the only way to authenticate a user for that domain. Early versions of Netscape did not provide separate entries for outgoing and incoming mail servers, so the user entry was something like username@mailserver.domain.id. Since our server requires a fully qualified e-mail address, the entry had to use the percent (%) symbol as an alternative to the @ symbol, because Netscape used the @ symbol as the delimiter between the username and the incoming mail server (e.g. username%domain.id@mailserver.domain.id).

Okay, so that still doesn't explain why users are entering a % symbol in their address
Basically, the account reference sheet that is mailed to users has their e-mail username printed with the % symbol to avoid conflicts between Netscape and non-Netscape users. So they are probably not paying attention and simply typing what they see on the reference sheet.

My solution
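I grabbed the raw query string from $_SERVER['QUERY_STRING'], which has not been decoded yet, and used a regular expression to extract the passed value:
$rawquery = $_SERVER['QUERY_STRING'];
// Pull the raw, undecoded username out of the query string.
// [^&]* also matches when username is the last argument.
if (preg_match('/(?:^|&)username=([^&]*)/', $rawquery, $rawuser)) {
	$username = $rawuser[1];
	// First piece is the user, second is the domain; extra pieces are dropped.
	// preg_split() stands in for split(), which was removed in PHP 7.
	list($u, $d) = array_pad(preg_split('/[@%]/', $username), 2, '');
}
I repaired the username by splitting it on either @ or %. This way I figure I have the username and domain values as they should have been entered by the user to begin with.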
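For comparison, the same idea can be written without a regular expression for the extraction step, by splitting the raw query string on & and = and never decoding it. This is only a sketch: raw_query_args() and split_username() are hypothetical helper names, the cdkey value is made up, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: split a raw query string into name => value pairs
// without percent-decoding the values.
function raw_query_args(string $query): array
{
    $args = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue;
        }
        // A limit of 2 keeps any '=' inside the value intact.
        $parts = explode('=', $pair, 2);
        $args[$parts[0]] = $parts[1] ?? '';
    }
    return $args;
}

// Hypothetical helper: recover the user and domain the person meant to type.
// Only the first two pieces are kept; a missing domain becomes ''.
function split_username(string $raw): array
{
    $pieces = preg_split('/[@%]/', $raw);
    return [$pieces[0], $pieces[1] ?? ''];
}

// In the real script the input would be $_SERVER['QUERY_STRING'].
$args = raw_query_args('username=hyperguy%caffeine.com&cdkey=ABC123');
list($u, $d) = split_username($args['username']);
echo $u, ' / ', $d, "\n"; // prints "hyperguy / caffeine.com"
```

The other arguments stay undecoded too, so anything that was legitimately encoded would still need urldecode() applied field by field.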

Question
Is this a logical approach to solving this problem or is there a better way to go about it?