
Conversation

@alex-bene
Contributor

@alex-bene alex-bene commented Aug 9, 2024

What does this PR do?

This PR adds a post_process_depth_estimation method to the image processors of depth estimation models, mirroring the post_process_semantic_segmentation methods of the segmentation models. It also updates the depth estimation pipeline to use the new post_process_depth_estimation method. Lastly, it adds full support for ZoeDepth's special inference procedure (dynamically padded input plus inference on a flipped copy of the input). A small update to the documentation is pending.
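The flipped-inference trick can be illustrated standalone. This is a minimal numpy sketch, not the transformers implementation; the function name is hypothetical, and it assumes the model returns a dense depth map. ZoeDepth runs the model on the image and on its horizontal flip, flips the second prediction back, and averages the two.

```python
import numpy as np

def average_flipped_predictions(depth: np.ndarray, depth_of_flipped: np.ndarray) -> np.ndarray:
    """Average a depth map with the prediction obtained on the horizontally
    flipped input, after flipping that prediction back (ZoeDepth-style)."""
    return 0.5 * (depth + np.flip(depth_of_flipped, axis=-1))

# Toy check: if the model were perfectly flip-equivariant, the prediction on
# the flipped input is the mirror of the original, and averaging is a no-op.
depth = np.array([[1.0, 2.0, 3.0]])
depth_of_flipped = np.flip(depth, axis=-1)
print(average_flipped_predictions(depth, depth_of_flipped))  # [[1. 2. 3.]]
```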

Fixes #30917 #32381

Before submitting

Who can review?

@NielsRogge @amyeroberts @Narsil

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch 2 times, most recently from 4c2bbb4 to 16314f6 on August 13, 2024 15:04
@alex-bene
Contributor Author

I'm not sure who to tag about this, but I have added to the image_transforms.py file the function colorize_depth, which, given a depth prediction, generates a colorized PIL or NumPy image. If this is not the correct place for this function, let me know. @NielsRogge

@alex-bene
Contributor Author

Hey everyone, @NielsRogge @amyeroberts @Narsil, is there anything missing from this PR so that it can be merged?

@NielsRogge NielsRogge requested a review from amyeroberts August 16, 2024 09:39
Contributor

@amyeroberts amyeroberts left a comment

Thanks for all the investigation work and adding this logic!

Most comments are about simplifying the added logic and bringing it more in-line with transformers. Overall looks great!

Contributor

The post-processing methods for image processors should follow a similar pattern to the tokenizers, i.e. accept and return the same array type (NumPy arrays or torch tensors), e.g. like here for batch_decode. No need to bother with TensorFlow tensors at the moment, as we have so few TF vision models. We shouldn't be converting to PIL.Image.Image.
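A hypothetical sketch of that pattern follows. The function name and the resize strategy are assumptions for illustration, not the merged implementation: it takes a batch of predicted depth tensors, resizes each back to its source size, and returns torch tensors rather than PIL images, mirroring the type-preserving contract of batch_decode.

```python
import torch
import torch.nn.functional as F

def post_process_depth_sketch(predicted_depth: torch.Tensor, target_sizes: list) -> list:
    """Resize each (H, W) depth map in a (B, H, W) batch back to its source
    (height, width); tensors in, tensors out."""
    results = []
    for depth, (height, width) in zip(predicted_depth, target_sizes):
        # interpolate expects a 4D (N, C, H, W) input for bicubic mode
        resized = F.interpolate(
            depth[None, None], size=(height, width), mode="bicubic", align_corners=False
        )
        results.append(resized.squeeze(0).squeeze(0))
    return results

batch = torch.rand(2, 48, 64)
resized = post_process_depth_sketch(batch, [(96, 128), (100, 100)])
print([tuple(r.shape) for r in resized])  # [(96, 128), (100, 100)]
```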

Contributor Author

Hey @ArthurZucker, regarding your question: initially, the post-processing method returned a PIL image too; however, my understanding is that this functionality did not follow the usual transformers conventions. We can easily add it back, though.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 16314f6 to e727238 on August 27, 2024 18:36
@alex-bene
Contributor Author

@amyeroberts let me know if there's anything else that needs changing.

Contributor

@amyeroberts amyeroberts left a comment

Thanks for the continued work on this!

Mostly nits on formatting and docs

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from e727238 to 32a6b18 on September 5, 2024 17:52
@alex-bene
Contributor Author

@amyeroberts kind reminder

Contributor

@amyeroberts amyeroberts left a comment

Looks great!

The only outstanding question is about the handling of different input image sizes.

@amyeroberts
Contributor

Linking to the outstanding comment here, in case it's been lost in review, as it's always hard to find again on GitHub: #32550 (comment)

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from fca0507 to 672df15 on September 23, 2024 17:45
@alex-bene
Contributor Author

Hey @amyeroberts, I finished up the last pending comments; let me know if it looks ok.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@alex-bene
Contributor Author

Hey @amyeroberts, in the docs I have added a code block and a link inside a "tip" block, and they did not render as usual, as seen in the image. Do you have a good idea of how to fix this?

[screenshot: tip block not rendering markdown]

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 672df15 to 038187b on September 24, 2024 17:05
@amyeroberts
Contributor

@alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much

Contributor

@amyeroberts amyeroberts left a comment

Looks great - thanks for iterating on this!

Just two nits on the docs and fixing the rendering for the top section. Once done we're good to merge!

Comment on lines 492 to 493
Contributor

This isn't right - it's not returning a list of dictionaries

Comment on lines 482 to 483
Contributor

Same here

@alex-bene
Contributor Author

> @alex-bene Tbh, I'm not sure. I'd suggest just removing the tip tags altogether -- the section follows on directly from the part above, so I don't think this will impact the message in the docs much

Hey @amyeroberts , I was thinking of doing something like this if it makes sense:

<Tip>
<p>In the <a href="https://github.com/isl-org/ZoeDepth/blob/edb6daf45458569e24f50250ef1ed08c015f17a7/zoedepth/models/depth_model.py#L131">original implementation</a> ZoeDepth model performs inference on both the original and flipped images and averages out the results. The <code>post_process_depth_estimation</code> function can handle this for us by passing the flipped outputs to the optional <code>outputs_flipped</code> argument:</p>
<pre><code class="language-Python">&gt;&gt;&gt; with torch.no_grad():   
...     outputs = model(pixel_values)
...     outputs_flipped = model(pixel_values=torch.flip(inputs.pixel_values, dims=[3]))
&gt;&gt;&gt; post_processed_output = image_processor.post_process_depth_estimation(
...     outputs,
...     source_sizes=[image.size[::-1]],
...     outputs_flipped=outputs_flipped,
... )
</code></pre>
</Tip>

@amyeroberts
Contributor

> Hey @amyeroberts, I was thinking of doing something like this if it makes sense: [the Tip proposal above]

You can always try and see how it renders :)

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from c330e43 to 0f6c4c6 on September 27, 2024 18:39
@alex-bene
Contributor Author

Yeah, it seems to have done the trick @amyeroberts -- I think we're good!

[screenshot: tip block rendering correctly]

@amyeroberts
Contributor

@alex-bene Great! Could you resolve the conflicts? Once done I think we're good to merge!

cc @qubvel

Contributor

@qubvel qubvel left a comment

I'd like to add a comment regarding the target size. I believe the following format improves readability.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch 3 times, most recently from 0972c23 to 7981b09 on October 2, 2024 14:54
@alex-bene
Contributor Author

@amyeroberts @qubvel I think we're done, but let me know if there's anything else pending.

@alex-bene alex-bene force-pushed the post-process-depth-estimation branch from 7981b09 to c93c415 on October 2, 2024 17:53
Contributor

@qubvel qubvel left a comment

Thanks for the update!

@alex-bene
Contributor Author

@qubvel is it now ready to merge?

@alex-bene
Contributor Author

@qubvel @amyeroberts kind reminder

@qubvel
Contributor

qubvel commented Oct 9, 2024

@alex-bene thanks for the ping!

cc @ArthurZucker for final review

@qubvel qubvel requested a review from ArthurZucker October 9, 2024 17:55
Collaborator

@ArthurZucker ArthurZucker left a comment

Thanks! Left a small question but good otherwise!

Comment on lines +1350 to +1352
>>> depth = predicted_depth * 255 / predicted_depth.max()
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint8"))
Collaborator

While we are at it, is there a reason not to do this in the image processing (multiply by 255, divide by max, etc.)?
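For reference, the scaling under discussion in the forward-example snippet can be reproduced standalone; this sketch uses numpy in place of the torch tensor, and the function name is made up for illustration.

```python
import numpy as np

def depth_to_uint8_maxscale(predicted_depth) -> np.ndarray:
    """Scale raw depth by 255 / max and cast to uint8, as the snippet does
    before building a PIL image from the array."""
    depth = np.asarray(predicted_depth, dtype=np.float32)
    return (depth * 255 / depth.max()).astype("uint8")

print(depth_to_uint8_maxscale([[1.0, 2.0, 4.0]]))  # [[ 63 127 255]]
```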

Contributor Author

Hey @ArthurZucker, please check here.

Comment on lines +116 to +118
depth = output["predicted_depth"].detach().cpu().numpy()
depth = (depth - depth.min()) / (depth.max() - depth.min())
depth = Image.fromarray((depth * 255).astype("uint8"))
Collaborator

same Q here!
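The pipeline-doc variant quoted in this thread min-max normalizes instead of dividing by the max; a standalone numpy sketch (hypothetical helper name, numpy standing in for the torch tensor):

```python
import numpy as np

def depth_to_uint8_minmax(predicted_depth) -> np.ndarray:
    """Min-max normalize a raw depth map to [0, 1], then scale to uint8 in
    [0, 255] for visualization, as in the snippet above."""
    depth = np.asarray(predicted_depth, dtype=np.float32)
    depth = (depth - depth.min()) / (depth.max() - depth.min())
    return (depth * 255).astype("uint8")

d8 = depth_to_uint8_minmax([[0.5, 1.0], [1.5, 2.0]])
print(d8.min(), d8.max())  # 0 255
```

Unlike the divide-by-max variant, this always uses the full 0-255 range regardless of the minimum depth value.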

@ArthurZucker
Collaborator

Ok thanks, merging as you answered 🤗

@ArthurZucker ArthurZucker merged commit c31a6ff into huggingface:main Oct 22, 2024
@ydshieh
Collaborator

ydshieh commented Oct 22, 2024

Hi @alex-bene, thank you for the contribution!

There are two failing tests after this PR was merged to main:

(GLPN model)

FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'

FAILED tests/models/glpn/test_modeling_glpn.py::GLPNModelTest::test_pipeline_depth_estimation_fp16 - AttributeError: 'GLPNImageProcessor' object has no attribute 'post_process_depth_estimation'

Could you take a look? Thanks in advance.

job run page

@alex-bene
Contributor Author

Hey @ydshieh, that's because GLPN is a new model that was not available when we initially added the post-processing interface for depth models. I'll try to make the necessary changes by tomorrow.

@alex-bene
Contributor Author

Hey @ydshieh, the fix for this is in #34413.

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
…Depth's inference intricacies (huggingface#32550)

* add colorize_depth and matplotlib availability check

* add post_process_depth_estimation for zoedepth + tests

* add post_process_depth_estimation for DPT + tests

* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth

* run `make fixup`

* fix import related error on tests

* fix more import related errors on test

* forgot some `torch` calls in declarations

* remove `torch` call in zoedepth tests that caused error

* updated docs for depth estimation

* small fix for `colorize` input/output types

* remove `colorize_depth`, fix various names, remove matplotlib dependency

* fix formatting

* run fixup

* different images for test

* update examples in `forward` functions

* fixed broken links

* fix output types for docs

* possible format fix inside `<Tip>`

* Readability related updates

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Readability related update

* cleanup after merge

* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`

* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>

Successfully merging this pull request may close these issues:

Add post_process_depth_estimation to image processors