-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Allow validators to customize validation JSON schema #10094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying pydantic-docs with
|
| Latest commit: |
0193985
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://93a766ef.pydantic-docs.pages.dev |
| Branch Preview URL: | https://8208.pydantic-docs.pages.dev |
CodSpeed Performance ReportMerging #10094 will not alter performanceComparing Summary
|
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||||||||||||||||||||
sydney-runkle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work on this thus far. I'm still struggling with how we send over the input schema via metadata. Will dig around for an alternative idea, but your implementation seems like a good approach, if we end up going with this route.
tests/pyright/decorators.py
Outdated
| @field_validator('foo', mode='before', input_type=int) # `input_type` allowed here. | ||
| @classmethod | ||
| def valid_with_info(cls, value: Any, info: ValidationInfo) -> Any: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall we chatted about this recently on a call, but so that we have it documented on this PR:
Why can't we use the following for this case, and infer the input type from the method signature?
@field_validator('foo', mode='before')
@classmethod
def valid_with_info(cls, value: int, info: ValidationInfo) -> Any: ...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mention in the PR description:
want to enforce the validator functions to check for any type (by using typing.Any)
Could you elaborate on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in reality any value can be passed in, so the type hint won't reflect the actual possible values. I prefer to enforce that users use Any for the value type annotation, so that they make sure to actually check the allowed values. If we were to encourage users to annotate value to the allowed types:
@field_validator('foo', mode='before')
@classmethod
def valid_with_info(cls, value: int, info: ValidationInfo) -> Any: ...Users will likely forget to do the proper runtime checks on value and assume it is already an int. This is even more likely if they do something like:
@field_validator('foo', mode='before')
@classmethod
def valid_with_info(cls, value: int, info: ValidationInfo) -> Any:
if not isinstance(value, int): # This should be done by users
raise ValidationError(...) # Type checker error, marked as unreachable codeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @Viicos here, it's dangerous to annotate the value field with any type hint besides Any/object (and technically object is the safer thing to use as the type hint for this, and then do isinstance checks to narrow it), as it will mean that the type checker will assume that you can't get arbitrary stuff in the before validator, but you can.
In particular, I don't want to encourage users to try to annotate with anything besides that as a way to reduce the amount of code they write, as it will lead to problems. The JSON schema is useful for conveying what users should be sending, but shouldn't be in tension/conflict with actual correctness-oriented type checking.
(Relevant mypy issue mentioning stating explicitly that object is the type to use when you want the type checker to make no assumptions about the input value, and also not treat it as Any: python/mypy#3712)
|
One idea that I had was trying to take advantage of: pydantic/pydantic/_internal/_generate_schema.py Lines 2448 to 2458 in ed54f28
like we do in other areas of |
|
I guess, as you've mentioned above, the challenge comes from the fact that we don't need to represent the One thing that I might be a fan of is adding WDYT @davidhewitt? |
pydantic/_internal/_decorators.py
Outdated
| input_type: The input type of the function. This is only used to generate | ||
| the appropriate JSON Schema (in validation mode) and can only specified | ||
| when `mode` is either `'before'`, `'plain'` or `'wrap'`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this only affects the JSON schema and not any actual validation, I think the name of this argument should reflect that. Otherwise I think you might expect/have a hard time remembering if it validates the value into an int first.
I understand the point of this is to handle non-after modes, so it makes sense for it to not add any validation, but I think you have to think about it a bit hard to remember this. If the argument was just json_schema_input_type it's very clear what it means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all fairness, this is a good argument against my idea to add support for this to the core schema. We can chat about this in more detail on our next catchup 👍
|
Why wouldn't Can we take e.g. instead of @field_validator('a', mode='plain', input_type=int)
@classmethod
def validate_a(self, value):
return valuecan we have @field_validator('a', mode='plain')
@classmethod
def validate_a(self, value: int):
return value |
|
Hmm, interesting thought re enforcing @davidhewitt re your other question, see this thread: #10094 (comment) |
It seems to me that if |
|
Imo this complicates the logic of such validators. Taking the example of from typing import Annotated
from pydantic import BaseModel, BeforeValidator
def validate(value: str | int) -> int:
if isinstance(value, str):
value = int(value)
return value
class Model(BaseModel):
a: Annotated[int, BeforeValidator(validate)]if
But at the same time I can see how this can be useful |
|
I'd be in favor of this change, though perhaps we should add a config setting to support skipping validation on input type to preserve backwards compatibility? |
|
I agree it would add complexity, and probably for unions there would be a slight cost as the user basically has to make the same Upsides:
Downsides:
Performance might be better in some cases and worse in others... |
0eff620 to
791ba59
Compare
sydney-runkle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress here - I think with one more round of significant changes + another final review, this will be close to ready 👍. I've left some change requests + some nitpicks, as well as a review request from @adriangb, who helped design + implement the awesome annotated validator patterns.
I'm going to open a PR shortly with a simplification for some of the CoreMetadata stuff that you can hopefully pull in to make upcoming changes easier!
pydantic/functional_validators.py
Outdated
| input_schema = ( | ||
| None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding your question from above:
Decide if we want to default to Any for input_type (so that the JSON Schema for a model with a PlainValidator works out of the box).
I'd opt for yes - we basically infer this under the hood, so I'd opt into exposing this in JSON schema as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made Any the default only for PlainValidator. For the others, the current behavior is to generate a JSON Schema for the type annotation (i.e. in related tests, int). Overriding this with Any would be a breaking change. I updated the tests if you want to see how it behaves.
| @classmethod | ||
| def _from_decorator(cls, decorator: _decorators.Decorator[_decorators.FieldValidatorDecoratorInfo]) -> Self: | ||
| return cls(func=decorator.func) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adriangb, what do you think about this _from_decorator pattern? Sort of replaces the old pattern of apply_validators using _VALIDATOR_F_MATCH. I didn't help to write that section of the codebase, so wanted to get feedback from someone who might have some more thoughts / context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll note that this removes the need for _VALIDATOR_F_MATCH and apply_validators. They are kept for deprecated decorator code but will be removed in v3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I honestly don't recall any of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. I'm satisfied with this pattern, let's move forward here 👍
pydantic/json_schema.py
Outdated
| if self._mode == 'validation' and (input_schema := schema.get('metadata', {}).get('input_schema')): | ||
| return self.generate_inner(input_schema) | ||
|
|
||
| return self.generate_inner(schema['schema']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to use the CoreMetadataHandler thing here once we make the switch to support json_schema_input_type there -- perhaps we also want to name the attribute with json in the name for consistency + clarity
| if field_info.discriminator is not None: | ||
| schema = self._apply_annotations(source_type, annotations, transform_inner_schema=set_discriminator) | ||
| schema = self._apply_annotations( | ||
| source_type, annotations + validators_from_decorators, transform_inner_schema=set_discriminator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting pattern - I'm assuming, based on passing tests, that this preserves annotation order as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it does. Previously, annotations where applied and then the validators from decorators. Same happens now, as _apply_annotations will apply the annotations argument from "left to right".
|
What about having the following pattern for metadata handling? from typing_extensions import TypedDict, Generic, TypeVar, Literal, Required
ExtraT = TypeVar("ExtraT")
class CoreMetadata(TypedDict, Generic[ExtraT], total=False):
pydantic_js_functions: list[GetJsonSchemaFunction]
pydantic_js_annotation_functions: list[GetJsonSchemaFunction]
extra_metadata: ExtraT
class BeforeValidatorFunctionSchemaExtra(TypedDict, total=False):
input_schema: CoreSchema
class BeforeValidatorFunctionSchema(TypedDict, total=False):
type: Required[Literal['function-before']]
function: Required[ValidationFunction]
schema: Required[CoreSchema]
ref: str
metadata: CoreMetadata[BeforeValidatorFunctionSchemaExtra]
serialization: SerSchemaIn the meanwhile I'm still going to update this PR to have an explicit |
a56d7e6 to
35205c6
Compare
The annotated validators now have a `input_type` argument. Users can provide a type hint reflecting the allowed type for the validator. Unlike serializers where the type hint can be inferred from the return type annotation of the function, we don't allow this pattern in our case as we want to enforce the validator functions to check for any type (by using `typing.Any`).
…ema generation This reduces code duplication, as the only source of truth is now defined in the `__get_pydantic_core_schema__` methods. The `apply_validators` function is kept for deprecated decorators but will be removed at some point. Additionally, this allows `input_type` to be taken into account, without having to duplicate code even more.
Update documentation for the added user error code.
sydney-runkle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great - awesome work on this.
As mentioned elsewhere in this PR, I'm going to do a deeper dive into refactoring CoreMetadata and related logic after we get v2.9 out 🚀
|
Given this comment about no breaking changes in how we apply validators, I'm happy to merge here :). |
The error during `JsonSchema` generation for `PlainValidatorFunctionSchema` (#6345) is fixed in a newer version of pydantic. See: pydantic/pydantic#10140 (comment) See: pydantic/pydantic#10094 Updates `pydantic` from 2.7.1 to 2.10.6 Updates `pydantic-core` from 2.18.2 to 2.27.2 Updates `typing-extensions` from 4.11.0 4.12.2
The error during `JsonSchema` generation for `PlainValidatorFunctionSchema` (#6345) is fixed in a newer version of pydantic. See: pydantic/pydantic#10140 (comment) See: pydantic/pydantic#10094 Updates `pydantic` from 2.7.1 to 2.10.6 Updates `pydantic-core` from 2.18.2 to 2.27.2 Updates `typing-extensions` from 4.11.0 4.12.2

Fixes #8208
The annotated validators now have a
json_schema_input_typeargument.Users can provide a type hint reflecting the allowed type
for the validator. Unlike serializers where the type hint
can be inferred from the return type annotation of the
function, we don't allow this pattern in our case as we want
to enforce the validator functions to check for any type
(by using
typing.Any).For plain validators only, this argument defaults to
Any(for wrap and before validators, behavior remains unchanged
and the inner schema is being used).
Some internal refactoring related to application of validators
was also done to reduce code duplication.
Pyright tests were enhanced as well.
TODO:
field_validatoras well. This is going to be tricky as core schema from validators are created inapply_validators. This function has less flexibility as noGetCoreSchemaHandleris available, but we require one (or at least maybe an instance of the currentGenerateSchema) to callhandler.generate_schema(input_type).Anyforinput_type(so that the JSON Schema for a model with aPlainValidatorworks out of the box).Change Summary
Related issue number
Checklist