@michael-lazar
Contributor

@michael-lazar michael-lazar commented Feb 17, 2018

The RobotFileParser's string representation was incomplete and missing some valid rule lines.

https://bugs.python.org/issue32861

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM. I have added just a few suggestions. Also, the two unnecessary trailing newlines should be kept in maintained releases.

Please add a news entry.


     def __str__(self):
-        return ''.join([str(entry) + "\n" for entry in self.entries])
+        ret = [str(entry) for entry in self.entries]
Member

This may be faster:

entries = self.entries
if self.default_entry is not None:
    entries = entries + [self.default_entry]
return '\n\n'.join(map(str, entries))
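
A minimal sketch of how the suggestion above behaves, using a hypothetical `SketchParser` class with plain strings standing in for entry objects (neither is part of `urllib.robotparser`). It is meant to show two things: `entries + [self.default_entry]` builds a new list, so `self.entries` itself is left untouched, and joining with a blank line leaves no trailing newline.

```python
class SketchParser:
    """Hypothetical stand-in for RobotFileParser; entries are plain strings here."""

    def __init__(self, entries, default_entry=None):
        self.entries = entries
        self.default_entry = default_entry

    def __str__(self):
        entries = self.entries
        if self.default_entry is not None:
            # List concatenation creates a new list; self.entries is not mutated.
            entries = entries + [self.default_entry]
        # Entries are separated by one blank line, with nothing appended at the end.
        return '\n\n'.join(map(str, entries))


sketch_parser = SketchParser(
    ["User-agent: bot\nDisallow: /a"],
    default_entry="User-agent: *\nDisallow: /b",
)
sketch_text = str(sketch_parser)
print(sketch_text)
```

Calling `str()` repeatedly gives the same result, because the default entry is never appended to the stored list.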

         ret = []
         for agent in self.useragents:
-            ret.extend(["User-agent: ", agent, "\n"])
+            ret.append("User-agent: {0}".format(agent))
Member

f-strings can be used in 3.6+.

         for line in self.rulelines:
-            ret.extend([str(line), "\n"])
-        return ''.join(ret)
+            ret.append(str(line))
Member

Or just

ret.extend(map(str, self.rulelines))
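
Taken together, the f-string and `ret.extend(map(str, ...))` suggestions could give an entry `__str__` along these lines. This is a sketch only: `SketchEntry` is a hypothetical stand-in, its rule lines are plain strings, it leaves out any other fields the real entry class handles, and the final `"\n".join(ret)` is an assumption about how the pieces are combined.

```python
class SketchEntry:
    """Hypothetical stand-in for a robots.txt entry (not the stdlib class)."""

    def __init__(self, useragents, rulelines):
        self.useragents = useragents
        self.rulelines = rulelines

    def __str__(self):
        ret = []
        for agent in self.useragents:
            # f-string form of "User-agent: {0}".format(agent); needs Python 3.6+.
            ret.append(f"User-agent: {agent}")
        # Convert every rule line to str and append them all in one call.
        ret.extend(map(str, self.rulelines))
        # Joining with "\n" leaves no trailing newline on the result.
        return "\n".join(ret)


entry = SketchEntry(["figtree", "snail"], ["Disallow: /tmp/", "Allow: /tmp/ok"])
entry_text = str(entry)
print(entry_text)
```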

@michael-lazar
Contributor Author

michael-lazar commented Apr 3, 2018

All suggestions have been implemented; I have no strong opinions on any of them. I also added a news entry.

Is backporting part of this PR, or do you merge this into 3.8 and then create separate issues for the other Python branches? Is that something that I can help with?

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM.

Just add your credits.

@@ -0,0 +1,3 @@
The urllib.robotparser's ``__str__`` representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
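
As an illustration of the behavior this news entry describes, the sketch below parses a small, made-up robots.txt and prints the parser's string form. It assumes a Python version that already includes this change; the `examplebot` agent and the paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: examplebot
Crawl-delay: 5
Request-rate: 3/15
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

robots_parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network access is needed.
robots_parser.parse(ROBOTS_TXT.splitlines())

robots_text = str(robots_parser)
print(robots_text)
```

With the change applied, the printed text should include the wildcard (`*`) entry and the `Crawl-delay` and `Request-rate` lines, and should not end with a newline.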
Member

Please add "Patch by yourname." and add your name to Misc/ACKS.

@michael-lazar
Contributor Author

Cool, I added my name to the news entry. I'm already in the ACKS file so all good there.

@serhiy-storchaka serhiy-storchaka merged commit bd08a0a into python:master May 14, 2018
@miss-islington
Contributor

Thanks @michael-lazar for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 14, 2018
…GH-5711)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
(cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar <[email protected]>
@bedevere-bot

GH-6795 is a backport of this pull request to the 3.7 branch.

@miss-islington
Contributor

Sorry, @michael-lazar and @serhiy-storchaka, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on the command line.
cherry_picker bd08a0af2d88c590ede762102bd42da3437e9980 2.7

@bedevere-bot

GH-6796 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 14, 2018
…GH-5711)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
(cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar <[email protected]>
serhiy-storchaka pushed a commit that referenced this pull request May 14, 2018
…H-5711) (GH-6795)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields.
(cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar <[email protected]>
miss-islington added a commit to miss-islington/cpython that referenced this pull request May 14, 2018
…ythonGH-5711) (pythonGH-6795)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields.
(cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar <[email protected]>
(cherry picked from commit c3fa1f2)

Co-authored-by: Miss Islington (bot) <[email protected]>
serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this pull request May 14, 2018
…ythonGH-5711) (pythonGH-6795)

The robotparser's __str__ representation now includes wildcard
entries.
(cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar <[email protected]>.
miss-islington added a commit to miss-islington/cpython that referenced this pull request May 14, 2018
…ythonGH-5711) (pythonGH-6795)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields.
(cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar <[email protected]>
(cherry picked from commit c3fa1f2)

Co-authored-by: Miss Islington (bot) <[email protected]>
serhiy-storchaka pushed a commit that referenced this pull request May 14, 2018
…H-5711) (GH-6795) (GH-6818)

The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields.
(cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar <[email protected]>
serhiy-storchaka added a commit that referenced this pull request May 14, 2018
GH-6795) (GH-6817)

The robotparser's __str__ representation now includes wildcard
entries.
(cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar <[email protected]>.
@serhiy-storchaka serhiy-storchaka removed their assignment Dec 6, 2018

Labels

type-bug An unexpected behavior, bug, or error
