Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550)
Conversation
Force-pushed 4c2bbb4 to 16314f6
I'm not sure who to tag about this, but I have added to the

Hey everyone, @NielsRogge @amyeroberts @Narsil, is there anything missing from this PR so that it can be merged?
amyeroberts
left a comment
Thanks for all the investigation work and adding this logic!
Most comments are about simplifying the added logic and bringing it more in-line with transformers. Overall looks great!
The post-processing methods for image processors should follow a similar pattern to the tokenizers i.e. accept and return the same array type (numpy arrays or torch tensors) e.g. like here for batch_decode. No need to bother with tensorflow tensors at the moment as we have so few TF vision models. We shouldn't be converting to PIL.Image.Image
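A minimal sketch of what this convention looks like, assuming NumPy inputs (the function name and resize strategy here are illustrative, not the merged implementation): the method accepts a batched array, returns one dictionary per image, and never converts to `PIL.Image.Image`.

```python
import numpy as np

def post_process_depth_estimation(predicted_depth, target_sizes=None):
    """Hypothetical sketch: take a batched depth array and return one dict
    per image, preserving the array type of the input (no PIL conversion)."""
    results = []
    for i, depth in enumerate(predicted_depth):
        if target_sizes is not None:
            h, w = target_sizes[i]
            # Nearest-neighbor resize via index sampling, standing in for
            # the real bilinear interpolation an image processor would use.
            rows = np.linspace(0, depth.shape[0] - 1, h).round().astype(int)
            cols = np.linspace(0, depth.shape[1] - 1, w).round().astype(int)
            depth = depth[np.ix_(rows, cols)]
        results.append({"predicted_depth": depth})
    return results
```

The same shape of API would hold for torch tensors in, torch tensors out.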
Hey @ArthurZucker , regarding your question, initially the post-processing method returned a PIL image too, however, my understanding is that this functionality did not follow the "usual" transformers conventions. We can easily add it back though.
Force-pushed 16314f6 to e727238
@amyeroberts let me know if there's anything else that needs changing.
amyeroberts
left a comment
Thanks for the continued work on this!
Mostly nits on formatting and docs
Force-pushed e727238 to 32a6b18
@amyeroberts kind reminder
amyeroberts
left a comment
Looks great!
The only outstanding question is about the handling of different input image sizes.
Linking to the outstanding comment here, in case it's been lost in review, as it's always hard to find again in GitHub: #32550 (comment)
Force-pushed fca0507 to 672df15
Hey @amyeroberts , I finished up the last pending comments; let me know if it looks ok.
Hey @amyeroberts , in the docs, I have added a code block and a link inside a "tip" block that did not render markdown assets as usual, as seen in the image. Do you have a good idea of how to fix this?
Force-pushed 672df15 to 038187b
@alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much.
amyeroberts
left a comment
Looks great - thanks for iterating on this!
Just two nits on the docs and fixing the rendering for the top section. Once done we're good to merge!
This isn't right - it's not returning a list of dictionaries
Hey @amyeroberts , I was thinking of doing something like this if it makes sense:
<Tip>
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
<pre><code class="language-Python">>>> with torch.no_grad():
... outputs = model(pixel_values)
... outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
>>> post_processed_output = image_processor.post_process_depth_estimation(
... outputs,
... source_sizes=[image.size[::-1]],
... outputs_flipped=outputs_flipped,
... )
</code></pre>
</Tip>
You can always try and see how it renders :)
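The flip-then-average trick described in the tip can be sketched outside of transformers with plain NumPy (the helper name is hypothetical; the key detail, per ZoeDepth, is un-flipping the second prediction along the width axis before averaging):

```python
import numpy as np

def average_with_flipped(depth, depth_from_flipped_input):
    """Hypothetical sketch of the flip-averaging step.

    `depth_from_flipped_input` was predicted from a horizontally flipped
    image, so flip it back along the width axis before averaging.
    """
    return 0.5 * (depth + np.flip(depth_from_flipped_input, axis=-1))
```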
Force-pushed c330e43 to 0f6c4c6
Yeah, it seems to have done the trick @amyeroberts -- I think we're good!

@alex-bene Great! Could you resolve the conflicts? Once done I think we're good to merge! cc @qubvel
qubvel
left a comment
I'd like to add a comment regarding the target size. I believe the following format improves readability.
Force-pushed 0972c23 to 7981b09
@amyeroberts @qubvel I think we're done, but let me know if there's anything else pending.
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
…Depth's `post_process_depth_estimation`
Force-pushed 7981b09 to c93c415
@qubvel is it now ready to merge?

@qubvel @amyeroberts kind reminder

@alex-bene thanks for the ping! cc @ArthurZucker for final review
ArthurZucker
left a comment
Thanks! Left a small question but good otherwise!
>>> depth = predicted_depth * 255 / predicted_depth.max()
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint8"))
While we are at it, is there a reason not to do this in the image processing (multiply by 255, divide by max, etc.)?
depth = output["predicted_depth"].detach().cpu().numpy()
depth = (depth - depth.min()) / (depth.max() - depth.min())
depth = Image.fromarray((depth * 255).astype("uint8"))
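The min-max normalization in the suggested snippet can be isolated into a small helper (hypothetical name, not part of the PR), which maps any non-constant depth map onto the full 8-bit range:

```python
import numpy as np

def depth_to_uint8(depth):
    """Hypothetical helper: min-max normalize a depth map to [0, 1],
    then scale to the 8-bit range for visualization.

    Assumes depth.max() > depth.min(); a constant map would divide by zero.
    """
    depth = (depth - depth.min()) / (depth.max() - depth.min())
    return (depth * 255).astype("uint8")
```

Dividing by the range rather than just the max (as in the earlier `* 255 / max()` snippet) guarantees the output always spans 0..255 regardless of the depth offset.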
Ok thanks, merging as you answered 🤗
Hi @alex-bene Thank you for the contribution! There are 2 failing tests (GLPN model) after this PR was merged:

FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'
FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation_fp16 - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'

Could you take a look? Thanks in advance.
Hey @ydshieh , that's because this is a new model that was not available when we initially added the post-processing interface for depth models. I'll try to make the necessary changes by ~tomorrow.
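The failure above is just a missing attribute: the pipeline now calls `post_process_depth_estimation` on whatever image processor the model provides. A minimal stand-in (class name and body are illustrative, not the actual GLPN fix) shows the shape of what the pipeline expects:

```python
class GLPNImageProcessorStub:
    """Hypothetical stand-in: the depth-estimation pipeline only requires
    that the image processor expose `post_process_depth_estimation`."""

    def post_process_depth_estimation(self, outputs, target_sizes=None):
        # A real implementation would interpolate each predicted depth map
        # to its target size; this stub just wraps each prediction in the
        # per-image dict format the pipeline consumes.
        return [{"predicted_depth": depth} for depth in outputs]
```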
…Depth's inference intricacies (huggingface#32550)

* add colorize_depth and matplotlib availability check
* add post_process_depth_estimation for zoedepth + tests
* add post_process_depth_estimation for DPT + tests
* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth
* run `make fixup`
* fix import related error on tests
* fix more import related errors on test
* forgot some `torch` calls in declerations
* remove `torch` call in zoedepth tests that caused error
* updated docs for depth estimation
* small fix for `colorize` input/output types
* remove `colorize_depth`, fix various names, remove matplotlib dependency
* fix formatting
* run fixup
* different images for test
* update examples in `forward` functions
* fixed broken links
* fix output types for docs
* possible format fix inside `<Tip>`
* Readability related updates

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Readability related update
* cleanup after merge
* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`
* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>


What does this PR do?
This PR adds a `post_process_depth_estimation` method to the image processors of depth estimation models, similar to the `post_process_semantic_segmentation` methods of the segmentation models. It also updates the depth estimation pipeline to use the new `post_process_depth_estimation` method. Lastly, it adds full support for ZoeDepth's special inference (dynamically padded input + inference on the flipped input). A small update to the documentation is pending.

Fixes #30917 #32381
Who can review?
@NielsRogge @amyeroberts @Narsil