@@ -24,7 +24,15 @@ following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
2424``rsync ``, ``rtsp ``, ``rtspu ``, ``sftp ``, ``shttp ``, ``sip ``, ``sips ``,
2525``snews ``, ``svn ``, ``svn+ssh ``, ``telnet ``, ``wais ``.
2626
27- The :mod: `urllib.parse ` module defines the following functions:
27+ The :mod: `urllib.parse ` module defines functions that fall into two broad
28+ categories: URL parsing and URL quoting. These are covered in detail in
29+ the following sections.
30+
31+ URL Parsing
32+ -----------
33+
34+ The URL parsing functions focus on splitting a URL string into its components,
35+ or on combining URL components into a URL string.
2836
2937.. function :: urlparse(urlstring, scheme='', allow_fragments=True)
3038
@@ -242,6 +250,161 @@ The :mod:`urllib.parse` module defines the following functions:
242250 string. If there is no fragment identifier in *url *, return *url * unmodified
243251 and an empty string.
244252
253+ The return value is actually an instance of a subclass of :class: `tuple `. This
254+ class has the following additional read-only convenience attributes:
255+
256+ +------------------+-------+-------------------------+----------------------+
257+ | Attribute        | Index | Value                   | Value if not present |
258+ +==================+=======+=========================+======================+
259+ | :attr:`url`      | 0     | URL with no fragment    | empty string         |
260+ +------------------+-------+-------------------------+----------------------+
261+ | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
262+ +------------------+-------+-------------------------+----------------------+
263+
264+ See section :ref: `urlparse-result-object ` for more information on the result
265+ object.
266+
267+ .. versionchanged:: 3.2
268+ Result is a structured object rather than a simple 2-tuple.
269+
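As a minimal sketch of the attributes in the table above, the following uses a hypothetical ``example.com`` URL (not taken from the patch) to show that the result can be read by name or unpacked like a plain 2-tuple:

```python
from urllib.parse import urldefrag

# Hypothetical URL chosen purely for illustration.
result = urldefrag("http://example.com/page#section")

# Named access, as listed in the attribute table.
url_by_name = result.url
fragment_by_name = result.fragment

# The result is still a tuple, so positional unpacking keeps working.
url_by_index, fragment_by_index = result
```

Code written against the older plain 2-tuple return value should therefore continue to work unchanged.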
270+
271+ Parsing ASCII Encoded Bytes
272+ ---------------------------
273+
274+ The URL parsing functions were originally designed to operate on character
275+ strings only. In practice, it is useful to be able to manipulate properly
276+ quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
277+ URL parsing functions in this module all operate on :class: `bytes ` and
278+ :class: `bytearray ` objects in addition to :class: `str ` objects.
279+
280+ If :class: `str ` data is passed in, the result will also contain only
281+ :class: `str ` data. If :class: `bytes ` or :class: `bytearray ` data is
282+ passed in, the result will contain only :class: `bytes ` data.
283+
284+ Attempting to mix :class:`str` data with :class:`bytes` or
285+ :class:`bytearray` in a single function call will result in a
286+ :exc:`TypeError` being raised, while attempting to pass in non-ASCII
287+ byte values will trigger :exc:`UnicodeDecodeError`.
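The rules above can be sketched as follows. The URLs are hypothetical, and :func:`urljoin` is used for the mixed-type case only because it takes two URL arguments; this is an illustration, not part of the patch:

```python
from urllib.parse import urljoin, urlsplit

# str in, str out; bytes in, bytes out.
text_result = urlsplit("http://example.com/docs?q=1")
bytes_result = urlsplit(b"http://example.com/docs?q=1")

# Mixing str and bytes arguments in one call is rejected.
try:
    urljoin("http://example.com/", b"docs")
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False

# Byte values outside the ASCII range cannot be decoded.
try:
    urlsplit(b"http://example.com/caf\xe9")
except UnicodeDecodeError:
    non_ascii_rejected = True
else:
    non_ascii_rejected = False
```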
288+
289+ To support easier conversion of result objects between :class: `str ` and
290+ :class: `bytes `, all return values from URL parsing functions provide
291+ either an :meth: `encode ` method (when the result contains :class: `str `
292+ data) or a :meth: `decode ` method (when the result contains :class: `bytes `
293+ data). The signatures of these methods match those of the corresponding
294+ :class: `str ` and :class: `bytes ` methods (except that the default encoding
295+ is ``'ascii' `` rather than ``'utf-8' ``). Each produces a value of a
296+ corresponding type that contains either :class: `bytes ` data (for
297+ :meth: `encode ` methods) or :class: `str ` data (for
298+ :meth: `decode ` methods).
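A short sketch of the conversion methods described above, using a hypothetical URL; with the default ``'ascii'`` encoding a round trip through :meth:`encode` and :meth:`decode` should give back an equal result object:

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

# Hypothetical URL chosen purely for illustration.
text_result = urlsplit("http://example.com/docs?q=1")

# encode() converts every component to bytes (default encoding: 'ascii').
bytes_result = text_result.encode()

# decode() reverses the conversion.
round_tripped = bytes_result.decode()
```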
299+
300+ Applications that need to operate on potentially improperly quoted URLs
301+ that may contain non-ASCII data will need to do their own decoding from
302+ bytes to characters before invoking the URL parsing functions.
303+
304+ The behaviour described in this section applies only to the URL parsing
305+ functions. The URL quoting functions use their own rules when producing
306+ or consuming byte sequences as detailed in the documentation of the
307+ individual URL quoting functions.
308+
309+ .. versionchanged:: 3.2
310+ URL parsing functions now accept ASCII encoded byte sequences.
311+
312+
313+ .. _urlparse-result-object :
314+
315+ Structured Parse Results
316+ ------------------------
317+
318+ The result objects from the :func:`urlparse`, :func:`urlsplit` and
319+ :func:`urldefrag` functions are subclasses of the :class:`tuple` type.
320+ These subclasses add the attributes listed in the documentation for
321+ those functions, the encoding and decoding support described in the
322+ previous section, as well as an additional method:
323+
324+ .. method :: urllib.parse.SplitResult.geturl()
325+
326+ Return the re-combined version of the original URL as a string. This may
327+ differ from the original URL in that the scheme may be normalized to lower
328+ case and empty components may be dropped. Specifically, empty parameters,
329+ queries, and fragment identifiers will be removed.
330+
331+ For :func: `urldefrag ` results, only empty fragment identifiers will be removed.
332+ For :func: `urlsplit ` and :func: `urlparse ` results, all noted changes will be
333+ made to the URL returned by this method.
334+
335+ The result of this method remains unchanged if passed back through the original
336+ parsing function:
337+
338+ >>> from urllib.parse import urlsplit
339+ >>> url = 'HTTP://www.Python.org/doc/#'
340+ >>> r1 = urlsplit(url)
341+ >>> r1.geturl()
342+ 'http://www.Python.org/doc/'
343+ >>> r2 = urlsplit(r1.geturl())
344+ >>> r2.geturl()
345+ 'http://www.Python.org/doc/'
346+
347+
348+ The following classes provide the implementations of the structured parse
349+ results when operating on :class: `str ` objects:
350+
351+ .. class :: DefragResult(url, fragment)
352+
353+ Concrete class for :func: `urldefrag ` results containing :class: `str `
354+ data. The :meth: `encode ` method returns a :class: `DefragResultBytes `
355+ instance.
356+
357+ .. versionadded :: 3.2
358+
359+ .. class :: ParseResult(scheme, netloc, path, params, query, fragment)
360+
361+ Concrete class for :func: `urlparse ` results containing :class: `str `
362+ data. The :meth: `encode ` method returns a :class: `ParseResultBytes `
363+ instance.
364+
365+ .. class :: SplitResult(scheme, netloc, path, query, fragment)
366+
367+ Concrete class for :func: `urlsplit ` results containing :class: `str `
368+ data. The :meth: `encode ` method returns a :class: `SplitResultBytes `
369+ instance.
370+
371+
372+ The following classes provide the implementations of the parse results when
373+ operating on :class: `bytes ` or :class: `bytearray ` objects:
374+
375+ .. class :: DefragResultBytes(url, fragment)
376+
377+ Concrete class for :func: `urldefrag ` results containing :class: `bytes `
378+ data. The :meth: `decode ` method returns a :class: `DefragResult `
379+ instance.
380+
381+ .. versionadded :: 3.2
382+
383+ .. class :: ParseResultBytes(scheme, netloc, path, params, query, fragment)
384+
385+ Concrete class for :func: `urlparse ` results containing :class: `bytes `
386+ data. The :meth: `decode ` method returns a :class: `ParseResult `
387+ instance.
388+
389+ .. versionadded :: 3.2
390+
391+ .. class :: SplitResultBytes(scheme, netloc, path, query, fragment)
392+
393+ Concrete class for :func: `urlsplit ` results containing :class: `bytes `
394+ data. The :meth: `decode ` method returns a :class: `SplitResult `
395+ instance.
396+
397+ .. versionadded :: 3.2
398+
399+
400+ URL Quoting
401+ -----------
402+
403+ The URL quoting functions focus on taking program data and making it safe
404+ for use as URL components by quoting special characters and appropriately
405+ encoding non-ASCII text. They also support reversing these operations to
406+ recreate the original data from the contents of a URL component if that
407+ task isn't already covered by the URL parsing functions above.
245408
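As a rough illustration of the quoting round trip described above (the sample strings are illustrative and not part of the patch):

```python
from urllib.parse import quote, quote_plus, unquote

original = "/El Niño/"

# '/' is left untouched because it is in the default *safe* set;
# the space and the non-ASCII character are percent-encoded as UTF-8.
quoted = quote(original)

# unquote() reverses the operation.
restored = unquote(quoted)

# quote_plus() targets query strings: spaces become '+'.
form_value = quote_plus("a b&c")
```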
246409.. function :: quote(string, safe='/', encoding=None, errors=None)
247410
@@ -322,8 +485,7 @@ The :mod:`urllib.parse` module defines the following functions:
322485 If it is a :class: `str `, unescaped non-ASCII characters in *string *
323486 are encoded into UTF-8 bytes.
324487
325- Example: ``unquote_to_bytes('a%26%EF') `` yields
326- ``b'a&\xef' ``.
488+ Example: ``unquote_to_bytes('a%26%EF') `` yields ``b'a&\xef' ``.
327489
328490
329491.. function :: urlencode(query, doseq=False, safe='', encoding=None, errors=None)
@@ -340,12 +502,13 @@ The :mod:`urllib.parse` module defines the following functions:
340502 the optional parameter *doseq* evaluates to *True*, individual
341503 ``key=value `` pairs separated by ``'&' `` are generated for each element of
342504 the value sequence for the key. The order of parameters in the encoded
343- string will match the order of parameter tuples in the sequence. This module
344- provides the functions :func: `parse_qs ` and :func: `parse_qsl ` which are used
345- to parse query strings into Python data structures.
505+ string will match the order of parameter tuples in the sequence.
346506
347507 When the *query* parameter is a :class:`str`, the *safe*, *encoding* and *errors*
348- parameters are sent the :func: `quote_plus ` for encoding.
508+ parameters are passed down to :func: `quote_plus ` for encoding.
509+
510+ To reverse this encoding process, :func: `parse_qs ` and :func: `parse_qsl ` are
511+ provided in this module to parse query strings into Python data structures.
349512
350513 .. versionchanged :: 3.2
351514 Query parameter supports bytes and string objects.
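A minimal sketch of encoding a query and reversing it with :func:`parse_qs` and :func:`parse_qsl`; the parameter names and values are made up for illustration:

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Hypothetical query parameters, given as a sequence of two-element tuples.
pairs = [("name", "Ada Lovelace"), ("lang", "en")]

# Order of the encoded parameters follows the order of the tuples.
encoded = urlencode(pairs)

# parse_qs() returns a dict mapping each key to a list of values;
# parse_qsl() returns a list of (key, value) pairs.
as_dict = parse_qs(encoded)
as_pairs = parse_qsl(encoded)

# With doseq=True, each element of a sequence value gets its own key=value pair.
repeated = urlencode({"tag": ["a", "b"]}, doseq=True)
```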
@@ -376,57 +539,3 @@ The :mod:`urllib.parse` module defines the following functions:
376539
377540 :rfc: `1738 ` - Uniform Resource Locators (URL)
378541 This specifies the formal syntax and semantics of absolute URLs.
379-
380-
381- .. _urlparse-result-object :
382-
383- Results of :func: `urlparse ` and :func: `urlsplit `
384- ------------------------------------------------
385-
386- The result objects from the :func: `urlparse ` and :func: `urlsplit ` functions are
387- subclasses of the :class: `tuple ` type. These subclasses add the attributes
388- described in those functions, as well as provide an additional method:
389-
390- .. method :: ParseResult.geturl()
391-
392- Return the re-combined version of the original URL as a string. This may differ
393- from the original URL in that the scheme will always be normalized to lower case
394- and empty components may be dropped. Specifically, empty parameters, queries,
395- and fragment identifiers will be removed.
396-
397- The result of this method is a fixpoint if passed back through the original
398- parsing function:
399-
400- >>> import urllib.parse
401- >>> url = 'HTTP://www.Python.org/doc/#'
402-
403- >>> r1 = urllib.parse.urlsplit(url)
404- >>> r1.geturl()
405- 'http://www.Python.org/doc/'
406-
407- >>> r2 = urllib.parse.urlsplit(r1.geturl())
408- >>> r2.geturl()
409- 'http://www.Python.org/doc/'
410-
411-
412- The following classes provide the implementations of the parse results:
413-
414- .. class :: BaseResult
415-
416- Base class for the concrete result classes. This provides most of the
417- attribute definitions. It does not provide a :meth: `geturl ` method. It is
418- derived from :class: `tuple `, but does not override the :meth: `__init__ ` or
419- :meth: `__new__ ` methods.
420-
421-
422- .. class :: ParseResult(scheme, netloc, path, params, query, fragment)
423-
424- Concrete class for :func: `urlparse ` results. The :meth: `__new__ ` method is
425- overridden to support checking that the right number of arguments are passed.
426-
427-
428- .. class :: SplitResult(scheme, netloc, path, query, fragment)
429-
430- Concrete class for :func: `urlsplit ` results. The :meth: `__new__ ` method is
431- overridden to support checking that the right number of arguments are passed.
432-