Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies #32550
Conversation
Force-pushed from 4c2bbb4 to 16314f6.
I'm not sure who to tag about this, but I have added to the …

Hey everyone, @NielsRogge @amyeroberts @Narsil, is there anything missing from this PR so that it can be merged?
amyeroberts left a comment:
Thanks for all the investigation work and adding this logic!
Most comments are about simplifying the added logic and bringing it more in line with transformers. Overall looks great!
src/transformers/image_transforms.py (outdated):
The post-processing methods for image processors should follow a similar pattern to the tokenizers, i.e. accept and return the same array type (numpy arrays or torch tensors), e.g. as batch_decode does. No need to bother with TensorFlow tensors at the moment as we have so few TF vision models. We shouldn't be converting to PIL.Image.Image.
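As a rough sketch of that convention (the signature and field names below are assumptions for illustration, not the merged API), the idea is tensors in, tensors out:

```python
import torch

# Sketch only: mirror the tokenizer convention (e.g. batch_decode) of keeping
# the caller's array type: torch tensors in, torch tensors out, no PIL.Image.
def post_process_depth_estimation(self, outputs):
    predicted_depth = outputs.predicted_depth  # torch.Tensor, (batch, height, width)
    # any cropping/rescaling would happen here, still on tensors
    return [{"predicted_depth": depth} for depth in predicted_depth]
```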
Hey @ArthurZucker, regarding your question: initially the post-processing method returned a PIL image too; however, my understanding is that this functionality did not follow the "usual" transformers conventions. We can easily add it back though.
Force-pushed from 16314f6 to e727238.
@amyeroberts let me know if there's anything else that needs changing.
amyeroberts left a comment:
Thanks for the continued work on this!
Mostly nits on formatting and docs
Force-pushed from e727238 to 32a6b18.
@amyeroberts kind reminder
amyeroberts left a comment:
Looks great!
The only outstanding question is about the handling of different input image sizes.
Linking to the outstanding comment here, in case it's been lost in review, as it's always hard to find again on GitHub: #32550 (comment)
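For context, the question is about batches whose images have different original sizes. A per-image loop along these lines (a sketch; `target_sizes` as a list of `(height, width)` pairs is an assumption, not necessarily the merged API) avoids assuming one shared size:

```python
import torch

def resize_depth_per_image(predicted_depth, target_sizes):
    # Each image in the batch may come from a different original shape,
    # so resize the depth maps one by one instead of in a single call.
    resized = []
    for depth, size in zip(predicted_depth, target_sizes):
        # interpolate expects a (batch, channels, height, width) input
        depth = torch.nn.functional.interpolate(
            depth[None, None, ...], size=size, mode="bicubic", align_corners=False
        )[0, 0]
        resized.append(depth)
    # Returned as a list: differently sized maps cannot be stacked into one tensor.
    return resized
```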
Force-pushed from fca0507 to 672df15.
Hey @amyeroberts, I finished up the last pending comments; let me know if it seems ok.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hey @amyeroberts, in the docs I have added a code block and a link inside a "tip" block that did not render markdown assets as usual, as seen in the image. Do you have a good idea of how to fix this?
Force-pushed from 672df15 to 038187b.
@alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much.
amyeroberts left a comment:
Looks great - thanks for iterating on this!
Just two nits on the docs and fixing the rendering for the top section. Once done we're good to merge!
This isn't right - it's not returning a list of dictionaries
Same here
Hey @amyeroberts, I was thinking of doing something like this, if it makes sense:

```html
<Tip>
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> the ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
<pre><code class="language-Python">>>> with torch.no_grad():
...     outputs = model(pixel_values)
...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
>>> post_processed_output = image_processor.post_process_depth_estimation(
...     outputs,
...     source_sizes=[image.size[::-1]],
...     outputs_flipped=outputs_flipped,
... )
</code></pre>
</Tip>
```
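For readers unfamiliar with the flip trick referenced above: the prediction from the horizontally flipped image is flipped back and averaged with the original prediction. A minimal sketch of that step (not the library's exact code):

```python
import torch

def average_with_flipped(depth, depth_flipped):
    # depth and depth_flipped: (batch, height, width) predictions from the
    # original and horizontally flipped inputs. Flip the second back along
    # the width axis so the maps are pixel-aligned, then average.
    return (depth + torch.flip(depth_flipped, dims=[-1])) / 2
```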
You can always try and see how it renders :)
Force-pushed from c330e43 to 0f6c4c6.
Yeah, it seems to have done the trick @amyeroberts -- I think we're good!

@alex-bene Great! Could you resolve the conflicts? Once done I think we're good to merge! cc @qubvel
qubvel left a comment:
I'd like to add a comment regarding the target size. I believe the following format improves readability.
src/transformers/models/depth_anything/modeling_depth_anything.py (outdated; resolved)
Force-pushed from 0972c23 to 7981b09.
@amyeroberts @qubvel I think we're done, but let me know if there's anything else pending.
Force-pushed from 7981b09 to c93c415.
qubvel left a comment:
Thanks for the update!
@qubvel is it now ready to merge?

@qubvel @amyeroberts kind reminder

@alex-bene thanks for the ping! cc @ArthurZucker for final review
ArthurZucker left a comment:
Thanks! Left a small question but good otherwise!
```python
>>> depth = predicted_depth * 255 / predicted_depth.max()
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint8"))
```
While we are at it, is there a reason not to do this in the image processing (the * 255, divide by max, etc.)?
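For reference, the step being discussed is plain visualization scaling; if it were folded into a helper it might look like the sketch below (the `use_min_max` knob is hypothetical; the PR keeps this step on the user side):

```python
import numpy as np

def depth_to_uint8(predicted_depth, use_min_max=False):
    # Hypothetical helper: turn a depth tensor into a uint8 array for display.
    depth = predicted_depth.detach().cpu().numpy()
    if use_min_max:
        # normalize to [0, 1] using both extremes, as in the second snippet below
        depth = (depth - depth.min()) / (depth.max() - depth.min())
    else:
        # divide by max only, as in the snippet above
        depth = depth / depth.max()
    return (depth * 255).astype(np.uint8)
```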
Hey @ArthurZucker, check here please.
```python
depth = output["predicted_depth"].detach().cpu().numpy()
depth = (depth - depth.min()) / (depth.max() - depth.min())
depth = Image.fromarray((depth * 255).astype("uint8"))
```
same Q here!
Ok thanks, merging as you answered 🤗
Hi @alex-bene, thank you for the contribution! There are 2 failing tests (GLPN model) after this PR was merged:

```
FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'
FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation_fp16 - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'
```

Could you take a look? Thanks in advance.
Hey @ydshieh, that's because this is a new model that was not available when we initially added the post-processing interface for depth models. I'll try to make the necessary changes by ~tomorrow.
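Presumably the fix is just to give `GLPNImageProcessor` the same method; a hedged sketch, assuming GLPN exposes a DPT-style `predicted_depth` tensor:

```python
import torch

class GLPNImageProcessor:  # sketch of the missing piece, not the real class
    def post_process_depth_estimation(self, outputs, target_sizes=None):
        results = []
        for i, depth in enumerate(outputs.predicted_depth):
            if target_sizes is not None:
                # resize each map back to its requested (height, width)
                depth = torch.nn.functional.interpolate(
                    depth[None, None, ...], size=target_sizes[i],
                    mode="bicubic", align_corners=False,
                )[0, 0]
            results.append({"predicted_depth": depth})
        return results
```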
Squashed commit: Add `post_process_depth_estimation` to image processors and support ZoeDepth's inference intricacies (huggingface#32550)

* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declarations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates (Co-authored-by: Pavel Iakubovskii <[email protected]>)
* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8

Co-authored-by: Pavel Iakubovskii <[email protected]>


What does this PR do?
This PR adds a `post_process_depth_estimation` method to the image processors of depth estimation models, similar to the `post_process_semantic_segmentation` methods of the segmentation models. It also updates the depth estimation pipeline to use the new `post_process_depth_estimation` method. Lastly, it adds full support for ZoeDepth's special inference (dynamically padded input plus inference on a flipped copy of the input). A small update to the documentation is pending.

Fixes #30917 #32381
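A hedged end-to-end sketch of the intended usage (the checkpoint name is illustrative; the `source_sizes` keyword follows the ZoeDepth snippet earlier in the thread):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti")
model = AutoModelForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti")

image = Image.open("example.jpg")  # any local image
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Undo padding/resizing back to the original (height, width) of the source image.
post_processed = image_processor.post_process_depth_estimation(
    outputs, source_sizes=[image.size[::-1]]
)
```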
Who can review?
@NielsRogge @amyeroberts @Narsil