bpo-33684: parse failed for multibyte characters, encode will show in \xxx #7286

zhouronghua · 2018-05-31T14:45:42Z

When running this command on Windows (XP or Windows 7, the behavior is the same):
python -m json.tool xxx.txt xxx.json
and xxx.txt contains Chinese (or other multibyte characters):
if xxx.txt is encoded in ANSI, xxx.json will contain the Chinese characters escaped as \xxx, which makes them very hard to read;
if xxx.txt is encoded in UTF-8 (usually without a BOM), json.tool assumes the file is ANSI-encoded because there is no BOM, and decoding fails.

Since UTF-8 is now widely used, this change defaults to UTF-8 when automatic encoding detection fails.
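
The escaping described above can be reproduced with the standard `json` module alone. This is a minimal sketch, not the code from this PR; the `sample` object is a hypothetical stand-in for the contents of xxx.txt.

```python
import json

# Hypothetical sample data containing Chinese characters (not from the PR).
sample = {"name": "中文"}

# Default behaviour: ensure_ascii=True escapes every non-ASCII character
# as a \uXXXX sequence, which is valid JSON but hard to read.
escaped = json.dumps(sample)

# With ensure_ascii=False the characters are emitted unchanged.
readable = json.dumps(sample, ensure_ascii=False)

# Both strings decode back to the same object; only the text form differs.
same_object = json.loads(escaped) == json.loads(readable)
```

Because the readable form contains non-ASCII text, whatever writes it to disk has to use an encoding that can represent those characters, such as UTF-8.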

https://bugs.python.org/issue33684

the-knights-who-say-ni · 2018-05-31T14:45:45Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

Thanks again for your contribution and we look forward to looking at it!

pablogsal · 2018-06-14T13:29:38Z

Lib/json/tool.py

            raise SystemExit(e)
    with outfile:
-        json.dump(obj, outfile, sort_keys=sort_keys, indent=4)
+        outfile.write(json.dumps(obj,ensure_ascii=False,sort_keys=sort_keys, indent=4))


PEP 8: Please separate the arguments with a space after each comma:

outfile.write(json.dumps(obj, ensure_ascii=False, sort_keys=sort_keys, indent=4))
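
As a side note, `json.dump` accepts the same `ensure_ascii` argument, so the original `json.dump` call could also be kept. The sketch below compares the two forms using in-memory streams; `obj`, `sort_keys` and the `io.StringIO` buffers are illustrative assumptions, not the code in Lib/json/tool.py.

```python
import io
import json

obj = {"name": "中文", "id": 1}  # hypothetical sample data
sort_keys = True                 # stands in for the tool's option

# Form suggested in the review: serialize to a string, then write it.
via_dumps = io.StringIO()
via_dumps.write(json.dumps(obj, ensure_ascii=False, sort_keys=sort_keys, indent=4))

# Alternative: pass ensure_ascii=False directly to json.dump.
via_dump = io.StringIO()
json.dump(obj, via_dump, ensure_ascii=False, sort_keys=sort_keys, indent=4)

# Both buffers hold the same text.
identical = via_dumps.getvalue() == via_dump.getvalue()
```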

brettcannon · 2019-03-26T19:18:58Z

Thanks for the PR, but closing as the CLA has not been signed within the last month. If you do decide to sign the CLA we can re-open this PR.

zhou.ronghua added 2 commits May 29, 2018 21:59

default utf-8 for detect encode failed

7e9ebea

'utf8_encode'

afd9b44

the-knights-who-say-ni added the CLA not signed label May 31, 2018

bedevere-bot added the awaiting review label May 31, 2018

pablogsal requested changes Jun 14, 2018

bedevere-bot added awaiting core review and removed awaiting review labels Jun 14, 2018

brettcannon removed the CLA not signed label Mar 26, 2019

the-knights-who-say-ni added the CLA not signed label Mar 26, 2019

brettcannon closed this Mar 26, 2019
