-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
bpo-29505: Add fuzzing for re.compile, re.load and csv.reader #14255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7425844 to
bbe54f6
Compare
Modules/_xxtestfuzz/fuzzer.c
Outdated
| } | ||
| /* Use the first byte as a uint8_t specifying the index of the | ||
| regex to use */ | ||
| uint8_t idx = ((uint8_t*) data)[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe. Safer would be:
uint8_t idx = (uint8_t)(data[0]);
(at any rate, this is defined). But if you want an unsigned type that is guaranteed to hold all of the values of a char, then you want unsigned char instead of uint8_t.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, unsigned char makes way more sense, done.
Out of curiosity, why is accessing the element and then casting safer than casting the pointer and then accessing the element?
gpshead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, looks good. just some pedantry around NULL checks after CPython API calls even if you think they're unlikely to be taken.
Modules/_xxtestfuzz/fuzzer.c
Outdated
|
|
||
| /* Some random patterns used to test re.match. | ||
| Be careful not to add catostraphically slow regexes here, we want to | ||
| excercise the matchign code without causing timeouts.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matching
Modules/_xxtestfuzz/fuzzer.c
Outdated
| rv |= _run_fuzz(data, size, fuzz_json_loads); | ||
| #endif | ||
| #if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile) | ||
| /* Impore sre_compile.compile and sre.error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import
Modules/_xxtestfuzz/fuzzer.c
Outdated
| #if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_match) | ||
| /* Precompile all the regex patterns on the first run for faster fuzzing */ | ||
| if (compiled_patterns == NULL) { | ||
| PyObject* re_module = PyImport_ImportModule("re"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
being pedantic: check re_module for NULL.
Modules/_xxtestfuzz/fuzzer.c
Outdated
| #if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile) | ||
| /* Impore sre_compile.compile and sre.error */ | ||
| if (sre_compile_method == NULL) { | ||
| PyObject* sre_compile_module = PyImport_ImportModule("sre_compile"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pedantic: check sre_compile_module for NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same goes for all of your GetAttrString return values.
Modules/_xxtestfuzz/fuzzer.c
Outdated
| PyObject* sre_compile_module = PyImport_ImportModule("sre_compile"); | ||
| sre_compile_method = PyObject_GetAttrString(sre_compile_module, "compile"); | ||
|
|
||
| PyObject* sre_constants = PyImport_ImportModule("sre_constants"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL check
Modules/_xxtestfuzz/fuzzer.c
Outdated
| #if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_csv_reader) | ||
| /* Import csv and csv.Error */ | ||
| if (csv_module == NULL) { | ||
| csv_module = PyImport_ImportModule("csv"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL checks
| sre_compile_method, pattern_bytes, flags_obj, NULL); | ||
| /* Ignore ValueError as the fuzzer will more than likely | ||
| generate some invalid combination of flags */ | ||
| if (compiled == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this series if if's with PyErr_ExceptionMatches calls seems somewhat redundant (the comments for each match type are useful). It can also call PyErr_ExceptionMatches after PyErr_Clear has been called as these are sequential if's not else if's. perhaps write it as:
if (compiled == NULL) {
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy) ||
...) {
}
}
with the comments interspersed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the comments were primarily why I went with this style but I'll see if I can put them between the ORs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, kinda iffy about the formatting. Got any suggestions?
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy)
) {
PyErr_Clear();
}
}and
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy)) {
PyErr_Clear();
}
}both look really confusing, so I went with some alternative style.
|
Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8. |
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader (cherry picked from commit 5cbbbd7) Co-authored-by: Ammar Askar <[email protected]>
|
GH-14478 is a backport of this pull request to the 3.8 branch. |
|
GH-14479 is a backport of this pull request to the 3.7 branch. |
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader (cherry picked from commit 5cbbbd7) Co-authored-by: Ammar Askar <[email protected]>
…H-14255) Add more fuzz testing for re.compile, re.load and csv.reader (cherry picked from commit 5cbbbd7) Co-authored-by: Ammar Askar <[email protected]>
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader
@gpshead This should finish up the original bpo ticket, adds fuzzing for all the high-risk functions likely to be exposed to user input. Please let me know if I missed anything you intended to have fuzzed in the original ticket. Current list:
float(x)int(x)str(x)json.loads(x)re.compile(x)re.match(..., x)csv.reader(x)I would have liked to combine the
re.compileandre.matchtests but what ended up happening was we would generate some catastrophically slow regex like^(A+)*Band then"A" * 25will cause a timeout.https://bugs.python.org/issue29505