
Question:
<strong>Quick version:</strong> Do the names of parameters of "forms" being sent using the standard <em>multipart/form-data</em> encoding need to be encoded?
<strong>Longer version:</strong> The upload form on <a href="http://www.1fichier.com/en" rel="nofollow">1fichier.com</a> (a service to upload large files) uses the following to specify the file parameter to upload:
<input type="file" name="file[]" size="50" title="Select the files to upload" />
The name of the parameter is <strong>file[]</strong> (notice the brackets).
Using LiveHTTPHeaders I see that the parameter is sent like this (i.e. with brackets) when submitting the form in Firefox. However, for a <a href="https://launchpad.net/nautilus-image-manipulator" rel="nofollow">program</a> I'm writing in Python, I am using the <a href="http://atlee.ca/software/poster/" rel="nofollow">poster</a> module to be able to upload files using the standard <em>multipart/form-data</em> encoding. If I enter the parameter name with the brackets, it gets sent like this:
file%5B%5D
Internally, poster encodes the names of the parameters using this function:
def encode_and_quote(data):
    """If ``data`` is unicode, return urllib.quote_plus(data.encode("utf-8"))
    otherwise return urllib.quote_plus(data)"""
    if data is None:
        return None
    if isinstance(data, unicode):
        data = data.encode("utf-8")
    return urllib.quote_plus(data)
The <a href="http://docs.python.org/library/urllib.html#urllib.quote_plus" rel="nofollow">urllib.quote_plus</a> documentation says that this is only "required for quoting HTML form values when building up a query string to go into a URL". But here we're doing a POST, so the form values don't go in the URL.
So, do they still need to be encoded, or is it an error of poster to be doing this?
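For reference, the escaping described above can be reproduced without poster. This is only a minimal sketch: it uses Python 3's urllib.parse.quote_plus as a stand-in for the Python 2 urllib.quote_plus quoted above, and the variable names are illustrative, not taken from poster.

```python
from urllib.parse import quote_plus, unquote_plus

field_name = "file[]"

# quote_plus percent-encodes everything outside its small safe set,
# and the square brackets are not in that set.
encoded_name = quote_plus(field_name)
print(encoded_name)  # file%5B%5D

# Decoding restores the original name, but nothing in the form markup
# suggests the receiving server performs this step on field names.
decoded_name = unquote_plus(encoded_name)
print(decoded_name)  # file[]
```

So the name that reaches the server differs from the one in the form unless the server happens to URL-decode field names.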
Answer1:<a href="http://www.faqs.org/rfcs/rfc2388.html" rel="nofollow">RFC 2388</a> covers multipart/form-data submissions. Section 3 specifies that parameter names should be either ASCII or encoded as per <a href="http://www.faqs.org/rfcs/rfc2047.html" rel="nofollow">RFC 2047</a>.
So if your POST request is encoded as multipart/form-data (which poster is doing), then no, parameter names don't need to be encoded this way. I suggest filing a bug with the author (ahem...), he might be willing to fix it in a future release ;)
A workaround is to set your MultipartParam's name attribute directly, e.g.
p.name = 'file[]'
Answer2:Although in essence this question has been answered, I'm including some more details on how to dig through those RFCs.
<a href="http://tools.ietf.org/html/rfc2388#section-3" rel="nofollow">RFC 2388 section 3</a> states that a Content-Disposition header is required. Non-ASCII data should be encoded using <a href="http://tools.ietf.org/html/rfc2047" rel="nofollow">RFC 2047</a>, even though that <a href="https://stackoverflow.com/q/13649258/1468366" rel="nofollow">looks like a conflict</a>. <a href="http://tools.ietf.org/html/rfc2183#section-2" rel="nofollow">RFC 2183 section 2</a> describes the format of this Content-Disposition header. The name fits the general parameter rule of that grammar, which in turn refers to <a href="http://tools.ietf.org/html/rfc2045" rel="nofollow">RFC 2045</a> for its definition. <a href="http://tools.ietf.org/html/rfc2045" rel="nofollow">There in section 5.1</a> you find that the right-hand side of a parameter is either a token or a quoted-string. Neither production mentions any URL-encoded format for form names. But [ and ] are in tspecials, so they cannot be part of a token. So we get:
Content-Disposition: form-data; name="file[]" (correct)
Content-Disposition: form-data; name=file[] (invalid)
Content-Disposition: form-data; name="file%5B%5D" (wrong name)
Content-Disposition: form-data; name=file%5B%5D (wrong name)
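The token versus quoted-string distinction behind these four lines can be checked mechanically. The following is a rough sketch, not code from poster or from any RFC: is_token and form_data_disposition are hypothetical helper names, the tspecials set is copied from RFC 2045 section 5.1, and only ASCII names are considered.

```python
# tspecials as listed in RFC 2045 section 5.1.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value):
    """Return True if value could be written as an unquoted RFC 2045 token."""
    # A token is one or more printable ASCII characters, excluding
    # space, control characters and tspecials.
    return bool(value) and all(
        33 <= ord(ch) <= 126 and ch not in TSPECIALS for ch in value
    )


def form_data_disposition(name):
    """Build a Content-Disposition line with the name as a quoted-string.

    Hypothetical helper: ASCII names only. Backslash and double quote are
    backslash-escaped; nothing is percent-encoded.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return 'Content-Disposition: form-data; name="%s"' % escaped


print(is_token("file"))        # True
print(is_token("file[]"))      # False
print(is_token("file%5B%5D"))  # True
print(form_data_disposition("file[]"))
# Content-Disposition: form-data; name="file[]"
```

Note that file%5B%5D is a perfectly valid token; the problem with the last two lines above is not syntax but that they name a different field than the form does.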
One more note for non-ASCII file names: the <a href="http://www.w3.org/TR/2013/WD-html51-20130528/forms.html#multipart-form-data" rel="nofollow">current HTML 5 specification draft</a> requires not encoding them in a 7-bit safe manner, but instead transferring them in the encoding used throughout the request. <a href="https://stackoverflow.com/q/20591599/1468366" rel="nofollow">A question about non-ascii field names</a> is what brought me to look at this question of yours today.