
Conversation

@alex-bene
Contributor

@alex-bene alex-bene commented Aug 9, 2024

What does this PR do?

This PR adds a post_process_depth_estimation method to the image processors of depth estimation models, mirroring the post_process_semantic_segmentation methods of the segmentation models. It also updates the depth estimation pipeline to use the new post_process_depth_estimation method. Lastly, it adds full support for ZoeDepth's special inference procedure (dynamically padded input plus inference on a flipped copy of the input). A small update to the documentation is pending.
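The flipped-inference trick can be illustrated standalone. This is a minimal numpy sketch, not the transformers implementation; the function name is hypothetical, and it assumes the model returns a dense depth map. ZoeDepth runs the model on the image and on its horizontal flip, flips the second prediction back, and averages the two.

```python
import numpy as np

def average_flipped_predictions(depth: np.ndarray, depth_of_flipped: np.ndarray) -> np.ndarray:
    """Average a depth map with the prediction obtained on the horizontally
    flipped input, after flipping that prediction back (ZoeDepth-style)."""
    return 0.5 * (depth + np.flip(depth_of_flipped, axis=-1))

# Toy check: if the model were perfectly flip-equivariant, the prediction on
# the flipped input is the mirror of the original, and averaging is a no-op.
depth = np.array([[1.0, 2.0, 3.0]])
depth_of_flipped = np.flip(depth, axis=-1)
print(average_flipped_predictions(depth, depth_of_flipped))  # [[1. 2. 3.]]
```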

Fixes #30917 #32381

Before submitting

Who can review?

@NielsRogge @amyeroberts @Narsil

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch 2 times, most recently from 4c2bbb4 to 16314f6 on August 13, 2024 15:04
@alex-bene
Contributor Author

I'm not sure who to tag about this, but I have added to the image_transforms.py file the function colorize_depth, which, given a depth prediction, generates a colorized PIL or NumPy image. If this is not the correct place for this function, let me know. @NielsRogge

@alex-bene
Contributor Author

Hey everyone, @NielsRogge @amyeroberts @Narsil, is there anything missing from this PR so that it can be merged?

@NielsRogge NielsRogge requested a review from amyeroberts August 16, 2024 09:39
Contributor

@amyeroberts amyeroberts left a comment

Thanks for all the investigation work and adding this logic!

Most comments are about simplifying the added logic and bringing it more in-line with transformers. Overall looks great!

Contributor

The post-processing methods for image processors should follow a similar pattern to the tokenizers, i.e. accept and return the same array type (NumPy arrays or torch tensors), e.g. like here for batch_decode. No need to bother with TensorFlow tensors at the moment, as we have so few TF vision models. We shouldn't be converting to PIL.Image.Image.
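A hypothetical sketch of that pattern follows. The function name and the resize strategy are assumptions for illustration, not the merged implementation: it takes a batch of predicted depth tensors, resizes each back to its source size, and returns torch tensors rather than PIL images, mirroring the type-preserving contract of batch_decode.

```python
import torch
import torch.nn.functional as F

def post_process_depth_sketch(predicted_depth: torch.Tensor, target_sizes: list) -> list:
    """Resize each (H, W) depth map in a (B, H, W) batch back to its source
    (height, width); tensors in, tensors out."""
    results = []
    for depth, (height, width) in zip(predicted_depth, target_sizes):
        # interpolate expects a 4D (N, C, H, W) input for bicubic mode
        resized = F.interpolate(
            depth[None, None], size=(height, width), mode="bicubic", align_corners=False
        )
        results.append(resized.squeeze(0).squeeze(0))
    return results

batch = torch.rand(2, 48, 64)
resized = post_process_depth_sketch(batch, [(96, 128), (100, 100)])
print([tuple(r.shape) for r in resized])  # [(96, 128), (100, 100)]
```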

Contributor Author

Hey @ArthurZucker, regarding your question: initially, the post-processing method returned a PIL image too; however, my understanding is that this functionality did not follow the usual transformers conventions. We can easily add it back, though.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 16314f6 to e727238 on August 27, 2024 18:36
@alex-bene
Contributor Author

@amyeroberts let me know if there's anything else that needs changing.

Contributor

@amyeroberts amyeroberts left a comment

Thanks for the continued work on this!

Mostly nits on formatting and docs

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from e727238 to 32a6b18 on September 5, 2024 17:52
@alex-bene
Contributor Author

@amyeroberts kind reminder

Contributor

@amyeroberts amyeroberts left a comment

Looks great!

The only outstanding question is about the handling of different input image sizes.

@amyeroberts
Contributor

Linking to the outstanding comment here, in case it's been lost in review, as it's always hard to find again on GitHub: #32550 (comment)

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from fca0507 to 672df15 on September 23, 2024 17:45
@alex-bene
Contributor Author

Hey @amyeroberts, I finished up the last pending comments; let me know if it looks ok.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@alex-bene
Contributor Author

Hey @amyeroberts, in the docs I have added a code block and a link inside a "tip" block, and they did not render as usual, as seen in the image. Do you have a good idea of how to fix this?

[screenshot: tip block not rendering markdown]

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 672df15 to 038187b on September 24, 2024 17:05
@amyeroberts
Contributor

@alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much

Contributor

@amyeroberts amyeroberts left a comment

Looks great - thanks for iterating on this!

Just two nits on the docs and fixing the rendering for the top section. Once done we're good to merge!

Comment on lines 492 to 493
Contributor

This isn't right - it's not returning a list of dictionaries

Comment on lines 482 to 483
Contributor

Same here

@alex-bene
Contributor Author

> @alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much

Hey @amyeroberts , I was thinking of doing something like this if it makes sense:

<Tip>
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
<pre><code class="language-Python">&gt;&gt;&gt; with torch.no_grad():   
...     outputs = model(pixel_values)
...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
&gt;&gt;&gt; post_processed_output = image_processor.post_process_depth_estimation(
...     outputs,
...     source_sizes=[image.size[::-1]],
...     outputs_flipped=outputs_flipped,
... )
</code></pre>
</Tip>

@amyeroberts
Contributor

> Hey @amyeroberts, I was thinking of doing something like this if it makes sense: [the Tip proposal above]

You can always try and see how it renders :)

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from c330e43 to 0f6c4c6 on September 27, 2024 18:39
@alex-bene
Contributor Author

Yeah, it seems to have done the trick @amyeroberts -- I think we're good!

[screenshot: tip block rendering correctly]

@amyeroberts
Contributor

@alex-bene Great! Could you resolve the conflicts? Once done I think we're good to merge!

cc @qubvel

Contributor

@qubvel qubvel left a comment

I'd like to add a comment regarding the target size. I believe the following format improves readability.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch 3 times, most recently from 0972c23 to 7981b09 on October 2, 2024 14:54
@alex-bene
Contributor Author

@amyeroberts @qubvel I think we're done, but let me know if there's anything else pending.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 7981b09 to c93c415 on October 2, 2024 17:53
Contributor

@qubvel qubvel left a comment

Thanks for the update!

@alex-bene
Contributor Author

@qubvel is it now ready to merge?

@alex-bene
Contributor Author

@qubvel @amyeroberts kind reminder

@qubvel
Contributor

qubvel commented Oct 9, 2024

@alex-bene thanks for the ping!

cc @ArthurZucker for final review

@qubvel qubvel requested a review from ArthurZucker October 9, 2024 17:55
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks! Left a small question but good otherwise!

Comment on lines +1350 to +1352
>>> depth = predicted_depth * 255 / predicted_depth.max()
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint8"))
Collaborator

While we are at it, is there a reason not to do this in the image processing (multiply by 255, divide by max, etc.)?
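For reference, the scaling under discussion in the forward-example snippet can be reproduced standalone; this sketch uses numpy in place of the torch tensor, and the function name is made up for illustration.

```python
import numpy as np

def depth_to_uint8_maxscale(predicted_depth) -> np.ndarray:
    """Scale raw depth by 255 / max and cast to uint8, as the snippet does
    before building a PIL image from the array."""
    depth = np.asarray(predicted_depth, dtype=np.float32)
    return (depth * 255 / depth.max()).astype("uint8")

print(depth_to_uint8_maxscale([[1.0, 2.0, 4.0]]))  # [[ 63 127 255]]
```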

Contributor Author

Hey @ArthurZucker, please check here.

Comment on lines +116 to +118
depth = output["predicted_depth"].detach().cpu().numpy()
depth = (depth - depth.min()) / (depth.max() - depth.min())
depth = Image.fromarray((depth * 255).astype("uint8"))
Collaborator

same Q here!
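The pipeline-doc variant quoted in this thread min-max normalizes instead of dividing by the max; a standalone numpy sketch (hypothetical helper name, numpy standing in for the torch tensor):

```python
import numpy as np

def depth_to_uint8_minmax(predicted_depth) -> np.ndarray:
    """Min-max normalize a raw depth map to [0, 1], then scale to uint8 in
    [0, 255] for visualization, as in the snippet above."""
    depth = np.asarray(predicted_depth, dtype=np.float32)
    depth = (depth - depth.min()) / (depth.max() - depth.min())
    return (depth * 255).astype("uint8")

d8 = depth_to_uint8_minmax([[0.5, 1.0], [1.5, 2.0]])
print(d8.min(), d8.max())  # 0 255
```

Unlike the divide-by-max variant, this always uses the full 0-255 range regardless of the minimum depth value.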

@ArthurZucker
Collaborator

Ok thanks, merging as you answered 🤗

@ArthurZucker ArthurZucker merged commit c31a6ff into huggingface:main Oct 22, 2024
@ydshieh
Collaborator

ydshieh commented Oct 22, 2024

Hi @alex-bene, thank you for the contribution!

There are two failing tests after this PR was merged to main:

(GLPN model)

FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'

FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation_fp16 - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'

Could you take a look? Thanks in advance.

job run page

@alex-bene
Contributor Author

Hey @ydshieh, that's because GLPN is a new model that was not available when we initially added the post-processing interface for depth models. I'll try to make the necessary changes by tomorrow.

@alex-bene
Contributor Author

Hey @ydshieh, the fix for this is in #34413.

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…Depth's inference intricacies (huggingface#32550)

* add colorize_depth and matplotlib availability check

* add post_process_depth_estimation for zoedepth + tests

* add post_process_depth_estimation for DPT + tests

* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth

* run `make fixup`

* fix import related error on tests

* fix more import related errors on test

* forgot some `torch` calls in declarations

* remove `torch` call in zoedepth tests that caused error

* updated docs for depth estimation

* small fix for `colorize` input/output types

* remove `colorize_depth`, fix various names, remove matplotlib dependency

* fix formatting

* run fixup

* different images for test

* update examples in `forward` functions

* fixed broken links

* fix output types for docs

* possible format fix inside `<Tip>`

* Readability related updates

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Readability related update

* cleanup after merge

* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`

* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>

Successfully merging this pull request may close these issues:

Add post_process_depth_estimation to image processors