Skip to content

Conversation

@Viicos
Copy link
Member

@Viicos Viicos commented Aug 9, 2024

Fixes #8208

The annotated validators now have a json_schema_input_type argument.
Users can provide a type hint reflecting the allowed type
for the validator. Unlike serializers where the type hint
can be inferred from the return type annotation of the
function, we don't allow this pattern in our case as we want
to enforce the validator functions to check for any type
(by using typing.Any).

For plain validators only, this argument defaults to Any
(for wrap and before validators, behavior remains unchanged
and the inner schema is being used).

Some internal refactoring related to application of validators
was also done to reduce code duplication.

Pyright tests were enhanced as well.

TODO:

  • Update field_validator as well. This is going to be tricky as core schema from validators are created in apply_validators. This function has less flexibility as no GetCoreSchemaHandler is available, but we require one (or at least maybe an instance of the current GenerateSchema) to call handler.generate_schema(input_type).
  • Decide if we want to default to Any for input_type (so that the JSON Schema for a model with a PlainValidator works out of the box).

Change Summary

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Aug 9, 2024
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Aug 9, 2024

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 0193985
Status: ✅  Deploy successful!
Preview URL: https://93a766ef.pydantic-docs.pages.dev
Branch Preview URL: https://8208.pydantic-docs.pages.dev

View logs

@Viicos Viicos marked this pull request as draft August 9, 2024 20:35
@codspeed-hq
Copy link

codspeed-hq bot commented Aug 9, 2024

CodSpeed Performance Report

Merging #10094 will not alter performance

Comparing 8208 (0193985) with main (03a90cb)

Summary

✅ 17 untouched benchmarks

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2024

Coverage report

Image Image

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  functional_validators.py ImageImageImageImage
  json_schema.py ImageImageImageImage1014
  pydantic/_internal
  _core_metadata.py ImageImageImageImage
  _decorators.py ImageImageImageImage
  _generate_schema.py ImageImageImageImage
Project Total ImageImageImageImage 

This report was generated by python-coverage-comment-action

@Viicos Viicos marked this pull request as ready for review August 15, 2024 13:45
@sydney-runkle sydney-runkle added relnotes-change Used for changes to existing functionality which don't have a better categorization. and removed relnotes-fix Used for bugfixes. labels Aug 16, 2024
Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on this thus far. I'm still struggling with how we send over the input schema via metadata. Will dig around for an alternative idea, but your implementation seems like a good approach, if we end up going with this route.

Comment on lines 111 to 113
@field_validator('foo', mode='before', input_type=int) # `input_type` allowed here.
@classmethod
def valid_with_info(cls, value: Any, info: ValidationInfo) -> Any: ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall we chatted about this recently on a call, but so that we have it documented on this PR:

Why can't we use the following for this case, and infer the input type from the method signature?

@field_validator('foo', mode='before') 
    @classmethod
    def valid_with_info(cls, value: int, info: ValidationInfo) -> Any: ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention in the PR description:

want to enforce the validator functions to check for any type (by using typing.Any)

Could you elaborate on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in reality any value can be passed in, so the type hint won't reflect the actual possible values. I prefer to enforce that users use Any for the value type annotation, so that they make sure to actually check the allowed values. If we were to encourage users to annotate value to the allowed types:

@field_validator('foo', mode='before') 
    @classmethod
    def valid_with_info(cls, value: int, info: ValidationInfo) -> Any: ...

Users will likely forget to do the proper runtime checks on value and assume it is already an int. This is even more likely if they do something like:

@field_validator('foo', mode='before') 
    @classmethod
    def valid_with_info(cls, value: int, info: ValidationInfo) -> Any:
        if not isinstance(value, int):  # This should be done by users
            raise ValidationError(...)  # Type checker error, marked as unreachable code

Copy link
Contributor

@dmontagu dmontagu Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Viicos here, it's dangerous to annotate the value field with any type hint besides Any/object (and technically object is the safer thing to use as the type hint for this, and then do isinstance checks to narrow it), as it will mean that the type checker will assume that you can't get arbitrary stuff in the before validator, but you can.

In particular, I don't want to encourage users to try to annotate with anything besides that as a way to reduce the amount of code they write, as it will lead to problems. The JSON schema is useful for conveying what users should be sending, but shouldn't be in tension/conflict with actual correctness-oriented type checking.

(Relevant mypy issue mentioning stating explicitly that object is the type to use when you want the type checker to make no assumptions about the input value, and also not treat it as Any: python/mypy#3712)

@sydney-runkle
Copy link
Contributor

One idea that I had was trying to take advantage of:

def get_json_schema_update_func(
json_schema_update: JsonSchemaValue, json_schema_extra: JsonDict | typing.Callable[[JsonDict], None] | None
) -> GetJsonSchemaFunction:
def json_schema_update_func(
core_schema_or_field: CoreSchemaOrField, handler: GetJsonSchemaHandler
) -> JsonSchemaValue:
json_schema = {**handler(core_schema_or_field), **json_schema_update}
add_json_schema_extra(json_schema, json_schema_extra)
return json_schema
return json_schema_update_func

like we do in other areas of _generate_schema.py. But, that doesn't seem to support the kind of changes we need that are dependent on JSON schema mode

@sydney-runkle
Copy link
Contributor

I guess, as you've mentioned above, the challenge comes from the fact that we don't need to represent the input_schema in the core schema like we do for return_schema in serialization schemas, like:

https://github.com/pydantic/pydantic-core/blob/a6d0d631b529e5b4f38ccd71dd79af60a61516b8/python/pydantic_core/core_schema.py#L275-L282

One thing that I might be a fan of is adding input_schema to the core schema for validator functions. Even though it wouldn't be enforced in pydantic-core, it'd make things cleaner on the pydantic side of things.

WDYT @davidhewitt?

Comment on lines 59 to 61
input_type: The input type of the function. This is only used to generate
the appropriate JSON Schema (in validation mode) and can only specified
when `mode` is either `'before'`, `'plain'` or `'wrap'`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this only affects the JSON schema and not any actual validation, I think the name of this argument should reflect that. Otherwise I think you might expect/have a hard time remembering if it validates the value into an int first.

I understand the point of this is to handle non-after modes, so it makes sense for it to not add any validation, but I think you have to think about it a bit hard to remember this. If the argument was just json_schema_input_type it's very clear what it means.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all fairness, this is a good argument against my idea to add support for this to the core schema. We can chat about this in more detail on our next catchup 👍

@davidhewitt
Copy link
Contributor

Why wouldn't input_schema be enforced in pydantic-core? Should we make it so core can enforce it (and it defaults to any_schema() )?

Can we take input_type from function annotations?

e.g. instead of

        @field_validator('a', mode='plain', input_type=int)
         @classmethod
         def validate_a(self, value):
             return value

can we have

        @field_validator('a', mode='plain')
         @classmethod
         def validate_a(self, value: int):
             return value

@sydney-runkle
Copy link
Contributor

Hmm, interesting thought re enforcing input_type. That makes me more in favor of the change to support this in core schema.

@davidhewitt re your other question, see this thread: #10094 (comment)

@davidhewitt
Copy link
Contributor

@davidhewitt re your other question, see this thread: #10094 (comment)

It seems to me that if pydantic-core enforced input_type then that problem goes away (because the type hint would indeed be what pydantic-core would pass in).

@Viicos
Copy link
Member Author

Viicos commented Aug 19, 2024

Imo this complicates the logic of such validators. Taking the example of BeforeValidator:

from typing import Annotated

from pydantic import BaseModel, BeforeValidator

def validate(value: str | int) -> int:
    if isinstance(value, str):
        value = int(value)
    return value

class Model(BaseModel):
    a: Annotated[int, BeforeValidator(validate)]

if input_type is enforced during validation, it means:

  1. We validate the input value to be either int | str
  2. The validation function is called
  3. The returned value is validated against the source type: int

But at the same time I can see how this can be useful

@sydney-runkle
Copy link
Contributor

I'd be in favor of this change, though perhaps we should add a config setting to support skipping validation on input type to preserve backwards compatibility?

@davidhewitt
Copy link
Contributor

I agree it would add complexity, and probably for unions there would be a slight cost as the user basically has to make the same isinstance checks after we've already done them.

Upsides:

  • More consistent error messages for custom validator inputs (core would emit them instead of users)
  • Nice to be able to add type hints to these functions

Downsides:

  • Complexity
  • Backwards-compatibility concern?

Performance might be better in some cases and worse in others...

@sydney-runkle sydney-runkle mentioned this pull request Aug 20, 2024
19 tasks
@Viicos Viicos force-pushed the 8208 branch 3 times, most recently from 0eff620 to 791ba59 Compare August 20, 2024 19:29
Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress here - I think with one more round of significant changes + another final review, this will be close to ready 👍. I've left some change requests + some nitpicks, as well as a review request from @adriangb, who helped design + implement the awesome annotated validator patterns.

I'm going to open a PR shortly with a simplification for some of the CoreMetadata stuff that you can hopefully pull in to make upcoming changes easier!

Comment on lines 201 to 202
input_schema = (
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your question from above:

Decide if we want to default to Any for input_type (so that the JSON Schema for a model with a PlainValidator works out of the box).

I'd opt for yes - we basically infer this under the hood, so I'd opt into exposing this in JSON schema as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made Any the default only for PlainValidator. For the others, the current behavior is to generate a JSON Schema for the type annotation (i.e. in related tests, int). Overriding this with Any would be a breaking change. I updated the tests if you want to see how it behaves.

Comment on lines +82 to +85
@classmethod
def _from_decorator(cls, decorator: _decorators.Decorator[_decorators.FieldValidatorDecoratorInfo]) -> Self:
return cls(func=decorator.func)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangb, what do you think about this _from_decorator pattern? Sort of replaces the old pattern of apply_validators using _VALIDATOR_F_MATCH. I didn't help to write that section of the codebase, so wanted to get feedback from someone who might have some more thoughts / context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'll note that this removes the need for _VALIDATOR_F_MATCH and apply_validators. They are kept for deprecated decorator code but will be removed in v3).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I honestly don't recall any of this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I'm satisfied with this pattern, let's move forward here 👍

Comment on lines 984 to 988
if self._mode == 'validation' and (input_schema := schema.get('metadata', {}).get('input_schema')):
return self.generate_inner(input_schema)

return self.generate_inner(schema['schema'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to use the CoreMetadataHandler thing here once we make the switch to support json_schema_input_type there -- perhaps we also want to name the attribute with json in the name for consistency + clarity

if field_info.discriminator is not None:
schema = self._apply_annotations(source_type, annotations, transform_inner_schema=set_discriminator)
schema = self._apply_annotations(
source_type, annotations + validators_from_decorators, transform_inner_schema=set_discriminator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting pattern - I'm assuming, based on passing tests, that this preserves annotation order as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it does. Previously, annotations where applied and then the validators from decorators. Same happens now, as _apply_annotations will apply the annotations argument from "left to right".

@pydantic-hooky pydantic-hooky bot added awaiting author revision awaiting changes from the PR author labels Aug 20, 2024
@Viicos
Copy link
Member Author

Viicos commented Aug 21, 2024

What about having the following pattern for metadata handling?

from typing_extensions import TypedDict, Generic, TypeVar, Literal, Required

ExtraT = TypeVar("ExtraT")

class CoreMetadata(TypedDict, Generic[ExtraT], total=False):
    pydantic_js_functions: list[GetJsonSchemaFunction]
    pydantic_js_annotation_functions: list[GetJsonSchemaFunction]
    extra_metadata: ExtraT

class BeforeValidatorFunctionSchemaExtra(TypedDict, total=False):
    input_schema: CoreSchema

class BeforeValidatorFunctionSchema(TypedDict, total=False):
    type: Required[Literal['function-before']]
    function: Required[ValidationFunction]
    schema: Required[CoreSchema]
    ref: str
    metadata: CoreMetadata[BeforeValidatorFunctionSchemaExtra]
    serialization: SerSchema

In the meanwhile I'm still going to update this PR to have an explicit input_schema key on the metadata dict to not complicate things. But the suggested approach can be discussed for a future refactor.

@Viicos Viicos force-pushed the 8208 branch 2 times, most recently from a56d7e6 to 35205c6 Compare August 22, 2024 07:54
Viicos added 9 commits August 22, 2024 10:22
The annotated validators now have a `input_type` argument.
Users can provide a type hint reflecting the allowed type
for the validator. Unlike serializers where the type hint
can be inferred from the return type annotation of the
function, we don't allow this pattern in our case as we want
to enforce the validator functions to check for any type
(by using `typing.Any`).
…ema generation

This reduces code duplication, as the only source of truth is now
defined in the `__get_pydantic_core_schema__` methods.
The `apply_validators` function is kept for deprecated
decorators but will be removed at some point.

Additionally, this allows `input_type` to be taken into account,
without having to duplicate code even more.
Update documentation for the added user error code.
Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great - awesome work on this.

As mentioned elsewhere in this PR, I'm going to do a deeper dive into refactoring CoreMetadata and related logic after we get v2.9 out 🚀

@sydney-runkle
Copy link
Contributor

Given this comment about no breaking changes in how we apply validators, I'm happy to merge here :).

@Viicos Viicos merged commit 74c81a8 into main Aug 22, 2024
@Viicos Viicos deleted the 8208 branch August 22, 2024 13:43
@Viicos Viicos mentioned this pull request Dec 2, 2024
13 tasks
mmmarcos added a commit to Scille/parsec-cloud that referenced this pull request Mar 20, 2025
The error during `JsonSchema` generation for
`PlainValidatorFunctionSchema` (#6345) is fixed in a newer version of
pydantic.

See: pydantic/pydantic#10140 (comment)
See: pydantic/pydantic#10094

Updates `pydantic` from 2.7.1 to 2.10.6
Updates `pydantic-core` from 2.18.2 to 2.27.2
Updates `typing-extensions` from 4.11.0 4.12.2
github-merge-queue bot pushed a commit to Scille/parsec-cloud that referenced this pull request Mar 21, 2025
The error during `JsonSchema` generation for
`PlainValidatorFunctionSchema` (#6345) is fixed in a newer version of
pydantic.

See: pydantic/pydantic#10140 (comment)
See: pydantic/pydantic#10094

Updates `pydantic` from 2.7.1 to 2.10.6
Updates `pydantic-core` from 2.18.2 to 2.27.2
Updates `typing-extensions` from 4.11.0 4.12.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting author revision awaiting changes from the PR author relnotes-change Used for changes to existing functionality which don't have a better categorization.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better support for JSON schema for Callable validators

6 participants