[arrow-string]: add like::eq_ascii_ignore_case kernel#9871
Conversation
| /// let result = eq_ignore_ascii_case(&strings, &patterns).unwrap(); | ||
| /// assert_eq!(result, BooleanArray::from(vec![true, true, true, false])); | ||
| /// ``` | ||
| pub fn eq_ignore_ascii_case( |
There was a problem hiding this comment.
I'm hoping like is the correct module to add this function?
I added it here because I wanted to reuse a lot of the machinery that was already in place to invoke the Predicate, but I was hesitating because this does not have an equivalent SQL function, unlike the other functions/Operator variants in this module.
There was a problem hiding this comment.
I think this is fine
The other potential option would be in the eq module of arrow-ord -- but that doesn't have string stuff, so I think this is actually better
There was a problem hiding this comment.
Maybe we can change the module doc to be more general:
from
//! Provide SQL's LIKE operators for Arrow's string arrays
to something like:
//! String predicate kernels for Arrow arrays.
//!
//! Provides SQL `LIKE`/`ILIKE` kernels as well as related
//! string predicates such as `contains`, `starts_with`, `ends_with`, and
//! ASCII case-insensitive equality.
alamb
left a comment
There was a problem hiding this comment.
Thanks @albertlockett -- I have some comment / test suggestions but I also think we could do them as follow ons too
| /// let result = eq_ignore_ascii_case(&strings, &patterns).unwrap(); | ||
| /// assert_eq!(result, BooleanArray::from(vec![true, true, true, false])); | ||
| /// ``` | ||
| pub fn eq_ignore_ascii_case( |
There was a problem hiding this comment.
I think this is fine
The other potential option would be in the eq module of arrow-ord -- but that doesn't have string stuff, so I think this is actually better
| like_op(Op::Contains, left, right) | ||
| } | ||
|
|
||
| /// Perform equality check on two byte arrays using an ASCII case-insensitive match. |
There was a problem hiding this comment.
It can also be string / stringview / largestring arrays too I think
There was a problem hiding this comment.
good catch. fixed in 1f3b03c
"byte array" probably isn't the right wording here. I removed the word "byte" to avoid confusion, and the rust doc also lists the types.
Unrelated question - wondering if e should support large string view in these functions as well?
There was a problem hiding this comment.
Unrelated question - wondering if e should support large string view in these functions as well?
Eventually yes. That would be a good follow on PR / issue perhaps
|
|
||
| test_utf8!( | ||
| test_utf8_array_eq_ignore_ascii_case, | ||
| vec!["arrow", "arrow", "arrow", "parquet", "parquet"], |
There was a problem hiding this comment.
could we add a substring test too -- like comparing arrow to arro and verify they don't match
| /// let result = eq_ignore_ascii_case(&strings, &patterns).unwrap(); | ||
| /// assert_eq!(result, BooleanArray::from(vec![true, true, true, false])); | ||
| /// ``` | ||
| pub fn eq_ignore_ascii_case( |
There was a problem hiding this comment.
Maybe we can change the module doc to be more general:
from
//! Provide SQL's LIKE operators for Arrow's string arrays
to something like:
//! String predicate kernels for Arrow arrays.
//!
//! Provides SQL `LIKE`/`ILIKE` kernels as well as related
//! string predicates such as `contains`, `starts_with`, `ends_with`, and
//! ASCII case-insensitive equality.
alamb
left a comment
There was a problem hiding this comment.
Thanks again @albertlockett
# Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. --> - Closes apache#9870 # Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Users might wish to have a mechanism to do check if two strings are equal in a case-insensitive way. We already have some machinery to do this in the arrow-string crate (in the `predicate` module). This PR exposes a function so it can be invoked easily. # What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Adds a function `like::eq_ascii_ignore_case` to the arrow-string crate that performs a case-insensitive equals check on two string arrays. # Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? If this PR claims a performance improvement, please include evidence such as benchmark results. --> yes unit tests # Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. --> Yes this function is public
Which issue does this PR close?
Rationale for this change
Users might wish to have a mechanism to do check if two strings are equal in a case-insensitive way. We already have some machinery to do this in the arrow-string crate (in the
predicatemodule). This PR exposes a function so it can be invoked easily.What changes are included in this PR?
Adds a function
like::eq_ascii_ignore_caseto the arrow-string crate that performs a case-insensitive equals check on two string arrays.Are these changes tested?
yes unit tests
Are there any user-facing changes?
Yes this function is public