3

I couldn't find a good explanation of YOLOv3 SPP, which has a better mAP than YOLOv3. The author himself describes YOLOv3 SPP on his repo like this:

YOLOv3 with spatial pyramid pooling, or something

But I still don't really understand it. In yolov3-spp.cfg I notice there are some additions:

### SPP ###
[maxpool]
stride=1
size=5

[route]
layers=-2

[maxpool]
stride=1
size=9

[route]
layers=-4

[maxpool]
stride=1
size=13

[route]
layers=-1,-3,-5,-6

### End SPP ###

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

Can anybody explain further how YOLOv3 SPP works? Why are layers -2, -4 and -1, -3, -5, -6 chosen in the [route] layers? Thanks.

2 Answers

11

Some researchers have finally published a paper on applying SPP in YOLO: https://arxiv.org/abs/1903.08589

The differences between yolov3-tiny, yolov3, and yolov3-spp:

  • yolov3-tiny.cfg uses downsampling (stride=2) in Max-Pooling layers
  • yolov3.cfg uses downsampling (stride=2) in Convolutional layers
  • yolov3-spp.cfg uses downsampling (stride=2) in Convolutional layers + picks out the strongest features with stride=1 Max-Pooling layers (the SPP block)
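
As a rough illustration of the three bullet points, the sketch below computes a layer's output width with the usual formula floor((n + 2*pad - size) / stride) + 1. The helper name `out_size`, the 416-pixel input, the 13-cell grid, and the symmetric-padding convention are assumptions made for this example; darknet does its padding bookkeeping a little differently, but as far as I know it arrives at the same sizes for these cases.

```python
def out_size(n, size, stride, pad):
    """Output width for an input of width n (symmetric padding assumed)."""
    return (n + 2 * pad - size) // stride + 1


# yolov3-tiny style: max pooling with stride=2 halves the feature map
tiny_pool = out_size(416, size=2, stride=2, pad=0)

# yolov3 style: a 3x3 convolution with stride=2 also halves it
conv_down = out_size(416, size=3, stride=2, pad=1)

# yolov3-spp style: stride=1 max pooling padded by size // 2 keeps the size,
# so these layers only aggregate features and do not downsample
spp_pools = [out_size(13, size=k, stride=1, pad=k // 2) for k in (5, 9, 13)]

print(tiny_pool, conv_down, spp_pools)
```

The point is that the max-pooling layers in the SPP block do not replace the stride=2 downsampling; they run at full resolution on the feature map they receive.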

However, they only got mAP = 79.6% on the Pascal VOC 2007 test set using the YOLOv3-SPP model on the original framework.

But we can achieve a higher mAP = 82.1% even with the yolov3.cfg model by using AlexeyAB's repository: https://github.com/AlexeyAB/darknet/issues/2557#issuecomment-474187706

And surely we can achieve an even higher mAP with yolov3-spp.cfg using Alexey's repo.

Original GitHub question: https://github.com/AlexeyAB/darknet/issues/2859

5

See Figure 3 for the SPP explanation.

In yolov3-spp.cfg, they apply max pooling with 3 different sizes to the same feature map by using [route].

After that, they concatenate the resulting feature maps, which corresponds to the "fixed-length representation" in Figure 3.

[Image: Figure 3, SPP explanation]
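
To make the pooling-and-concatenation idea concrete, here is a minimal pure-Python sketch on a single-channel feature map. The function names `maxpool_same` and `spp_block` and the 4x4 example grid are made up for illustration, and out-of-bounds cells are simply ignored to mimic padded stride-1 pooling; this is not darknet's implementation.

```python
def maxpool_same(grid, size):
    """Stride-1 max pooling; windows are clipped at the border,
    so the output has the same shape as the input."""
    h, w = len(grid), len(grid[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                grid[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(grid, sizes=(5, 9, 13)):
    """Return [pool13, pool9, pool5, input], the order of layers=-1,-3,-5,-6."""
    pooled = [maxpool_same(grid, k) for k in sizes]
    return pooled[::-1] + [grid]


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
channels = spp_block(grid)
print(len(channels), len(channels[0]), len(channels[0][0]))
```

Every pooled map keeps the input's height and width, so the four maps can be stacked along the channel axis. With a C-channel input, the concatenated output would have 4C channels.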

2 Comments

Thanks, can you explain a little bit more why the route layers are -2, -4 and -1,-3,-5,-6? I know how a route layer works, I'm just not sure why those layers are chosen for applying SPP.
@gameon67, the -2 and -4 layers refer to the feature maps of conv5 in Figure 3. -1 layer: the gray one, -3 layer: the green one, -5 layer: the blue one, -6 layer: the feature maps of conv5 in Figure 3. I think the -6 layer could play a role like a residual link in the fixed-length representation.
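
The mapping above can be checked with a small sketch that resolves the relative [route] indices against the layer order in the cfg. The layer names and the choice of index 0 for the convolutional layer just before the SPP block are placeholders for illustration; the only rule relied on is that a negative value is counted back from the route layer's own position.

```python
# Layer order copied from the SPP section of the cfg. Index 0 stands for
# the last convolutional layer before the block (a placeholder index).
layers = [
    ("conv", None),               # 0
    ("maxpool5", None),           # 1
    ("route", [-2]),              # 2
    ("maxpool9", None),           # 3
    ("route", [-4]),              # 4
    ("maxpool13", None),          # 5
    ("route", [-1, -3, -5, -6]),  # 6
]


def resolve_routes(layers):
    """Map each route layer's index to the names of the layers it reads."""
    resolved = {}
    for idx, (name, rel) in enumerate(layers):
        if name == "route":
            resolved[idx] = [layers[idx + r][0] for r in rel]
    return resolved


routes = resolve_routes(layers)
for idx, sources in routes.items():
    print(idx, sources)
```

So the routes with -2 and -4 both step back to the same convolutional output, which means each max pool is applied to the original feature map rather than to the previous pooling result. The final route gathers the three pooled maps plus that original map.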
