Improve kernel argument parsing#1476
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new, more robust kernel command line parser, kernel::Cmdline, which correctly handles quoted values. The new parser is used throughout the codebase, replacing the previous simpler implementation. The changes are well-tested and improve correctness. I've provided a few suggestions in kernel.rs to improve code conciseness and use more idiomatic Rust patterns.
e021bb8 to
f56e9d4
Compare
cgwalters
left a comment
There was a problem hiding this comment.
Looks sane just some style nits. Thanks for working on this!
crates/lib/src/kernel.rs
Outdated
| impl<'a, T: AsRef<[u8]> + ?Sized> From<&'a T> for Parameter<'a> { | ||
| fn from(input: &'a T) -> Self { | ||
| let input = input.as_ref(); | ||
| let equals = input.iter().position(|b| *b == EQUALS); |
There was a problem hiding this comment.
I think it's a lot cleaner to use split_once for this. There are several other examples in the code.
There was a problem hiding this comment.
split_once is nightly-only for slices though
https://doc.rust-lang.org/std/primitive.slice.html#method.split_once
There was a problem hiding this comment.
Ah yeah, annoying.
You know I think the bigger picture thing here is that almost everyone doing parsing of u8 is using a higher level library and so there's not much motivation to improve std here. There's a lot of those.
One most relevant for this use case is BStr which has https://docs.rs/bstr/latest/bstr/trait.ByteSlice.html#method.split_once_str
bstr is a pretty good fit for this, it's not in our depchain today but that's just because we don't use regex among other things.
OK as is but perhaps worth a comment like
// https://doc.rust-lang.org/std/primitive.slice.html#method.split_once is nightly only right now
// We might consider using bstr in the future too
or so?
|
Not a blocker but something to discuss I think raised by this that we will want to address is that we still have overlap/duplication with at least the kargs parsing in composefs-boot. So one option is to move this there (and also make it |
f56e9d4 to
d650114
Compare
This adds a new `kernel::Cmdline` struct, which is populated either via `Cmdline::from` (borrowed) or `Cmdline::from_proc` (owned). This attempts to follow the same behavior as the kernel, which is mostly covered in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227 The algorithm is basically: - Scan forward until you find the first unquoted isspace() byte. That's the end of the param. - If you encounter an `=` along the way, note where. That's where it will terminate the key and split for the value. Any future `=` are not treated as special. - The value can be quoted to allow spaces, but is unquoted only in as much as `"` is removed from the first or last byte. You can still have `"` in the middle of the value. This operates on `&[u8]` because the kernel does not enforce any particular encoding for the cmdline. Iterating using `Cmdline::iter()` will emit the `Parameter` type, which has helper methods `key_lossy()` and `value_lossy()` to convert potentially-non-UTF8 data into `String`s. Resolves: bootc-dev#1425 Signed-off-by: John Eckersberg <jeckersb@redhat.com>
d650114 to
cf7f150
Compare
This adds a new
kernel::Cmdlinestruct, which is populated eithervia
Cmdline::from(borrowed) orCmdline::from_proc(owned).This attempts to follow the same behavior as the kernel, which is
mostly covered in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/cmdline.c?id=e8d780dcd957d80725ad5dd00bab53b856429bc0#n227
The algorithm is basically:
byte. That's the end of the param.
=along the way, note where. That's where itwill terminate the key and split for the value. Any future
=arenot treated as special.
much as
"is removed from the first or last byte. You can stillhave
"in the middle of the value.This operates on
&[u8]because the kernel does not enforce anyparticular encoding for the cmdline. Iterating using
Cmdline::iter()will emit theParametertype, which has helpermethods
key_lossy()andvalue_lossy()to convertpotentially-non-UTF8 data into
Strings.Resolves: #1425
Signed-off-by: John Eckersberg jeckersb@redhat.com