Skip to content

Catchall 'encoding needs to get un-crap-ified' ticket #350

@bitprophet

Description

@bitprophet

Background

This is a follow-on from #289 and also encompasses some or all of #241, #274, #262, #332, #242, #321, and #338. My goal is to make this general story better, ASAP, while leaving remaining work clearly defined (regardless of who ends up taking a swing at it).

I need to doublecheck some of those linked issues that I've not spent a lot of time on, but the bulk of discussion is in #289 and #274, much of it driven by @pfmoore who (as a Python 3 on Windows user) serves as a very competent & useful counterweight to myself (a default-Python-2 on Unix developer). Any solution that works for both of us is likely one that serves most users.

The basics that fall out of what's been said there so far:

  • Where & how bytes/strings are decoded/encoded, needs to be better defined/documented/contracted;
  • Encoding/decoding should be pushed as far out as possible, to the "edges" of the code, to limit how often it happens & becomes an issue;
    • But also centralized further - the code handling it needs to be clearly labeled & no faffing about should happen outside that one area.
  • The default behavior should encompass as many use cases as is reasonable to do, and should degrade as gracefully as it can otherwise (re: exceptions, vs replacements, vs ???);
    • But users should also be given the ability to work around that default cleanly & fully whenever it doesn't function for their use case.
    • Ideally, we'll balance these two things well instead of using "the user can twiddle the knobs we provided" as an excuse to let the default behavior suffer. But this is hard and time is short, so...

TODO

  • Triple check existing locations where we explicitly or implicitly encode/decode, and move them all somewhere centralized & obvious (without otherwise altering them).
    • Right now the only explicit mentions of (encode|decode)\( are a bare handful in runner.py, so there's likely others that are implicit instead.
    • Not sure what the "most obvious" place is, but given most of this is directly tied to run() and friends, it may still just live in the Runner class or runner.py generally.
  • Identify & implement additional tests (integration or regular) for use cases not represented by the funky_characters_in_stdout integration tests
    • Those tests should have caught some of the basic e.g. cat tree -> kaboom bugs that snuck in during 0.11.x, but did not.
    • I identified the basic reason why & have updated it so they fail again, in a branch.
    • But they only represent a single slice of the problem - look at the discussion in Fixed a number of encoding issues (in run context) #274, as well as the other top level issues above, and see what other tests can be added that currently fail.
    • Ideally at least some of these would go in the regular, not integration, suite.
  • Get those tests passing, either based off of Fixed a number of encoding issues (in run context) #274 or with something new
  • If not implied in above: probably makes sense to add more encoding config toggles (maybe up to "one for each stream") so users can override them granularly; the single encoding is nice and all, but won't work in cases where e.g. one's invoke-invoking terminal is Encoding A but the environment the subprocess is operating in is using Encoding B.
  • Handle, or surface into this or other tickets, any TODOs added in the branch left undone
  • See if @pfmoore has time for immediate feedback to those changes (sadly I wasn't able to give him a thumbs up for his offer of making more PRs on his end, back in Feb) just in case.
  • Re: comments below, update read_proc_output so we always expect bytes:
    • Enhance docstring(s) to make it clear that any file-like objects given explicitly, need to yield bytes
    • Remove the "if" statement
    • Update our test suites to use BytesIO instead of StringIO (& any related string literal changes, or maybe a wrapper if lazy).
  • Update changelog entry for any fully-closed tickets & their authors

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions