I am using Scrapy to crawl multiple websites, and I want to analyze the crawling rate. The stats dumped at the end of a crawl contain a downloader/response_count value and a response_received_count value. The former is systematically greater than the latter.

Why is there a difference, and which component of the crawler increments each of the two values in the stats collector?

1 Answer

  • CoreStats is the extension responsible for response_received_count.
  • DownloaderStats is the downloader middleware responsible for downloader/response_count.

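The CoreStats extension connects the signals.response_received signal to a handler that increments response_received_count. The engine sends that signal only for responses that come back through the whole downloader middleware chain, whatever their status (bad statuses included).

The DownloaderStats middleware has order 850 in DOWNLOADER_MIDDLEWARES_BASE, so it sits close to the downloader. Since process_response is called in decreasing order, it sees each response before the middlewares with a number lower than 850. Those middlewares, for example RetryMiddleware (550) or RedirectMiddleware (600), can then replace the response with a new request, drop it, or hit an error while processing it. Such a response has already been counted in downloader/response_count, but it never reaches the engine as a response, so response_received_count is not increased. That is why the former is greater than the latter.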
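To make the ordering concrete, here is a minimal toy model in plain Python. It does not use Scrapy at all: downloader_stats, redirect, engine_receive and simulate are hypothetical stand-ins written for illustration, and the only thing it assumes about Scrapy is that process_response hooks run from the highest order number to the lowest and that response_received is sent only when a response survives the whole chain.

```python
# Toy model only: none of these functions are Scrapy's real classes or API.
from collections import Counter

stats = Counter()


def downloader_stats(response):
    # Stand-in for DownloaderStats (order 850): sees every downloaded response.
    stats["downloader/response_count"] += 1
    return response


def redirect(response):
    # Stand-in for RedirectMiddleware (order 600): turns a 3xx response
    # into a new request, so the response itself goes no further.
    if 300 <= response["status"] < 400:
        return {"type": "request", "url": response["location"]}
    return response


# process_response hooks run from the highest order number to the lowest.
CHAIN = [downloader_stats, redirect]


def engine_receive(response):
    result = response
    for hook in CHAIN:
        result = hook(result)
        if result["type"] != "response":
            return result  # a request: it would be rescheduled, no signal sent
    # Stand-in for the response_received signal that CoreStats listens to.
    stats["response_received_count"] += 1
    return result


def simulate():
    stats.clear()
    engine_receive({"type": "response", "status": 302, "location": "/new"})
    engine_receive({"type": "response", "status": 200})
    engine_receive({"type": "response", "status": 404})
    return dict(stats)


print(simulate())
# {'downloader/response_count': 3, 'response_received_count': 2}
```

The 302 is counted by the stand-in for DownloaderStats but never triggers the signal, while the 404 is counted by both. The toy model does not follow the redirect; in a real crawl the new request would be downloaded too and add one more response to both counters.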
