Conversation

@serhiy-storchaka (Member) commented Jan 1, 2018

@pitrou (Member) commented Jan 2, 2018

This doesn't look like it's really removing a peephole optimization. Instead, it moves a peephole optimization from the peepholer to the compiler, but the implementation looks similar. At least having the peepholer in a separate file means the compiler doesn't get more complicated by optimizations.

@brettcannon removed the request for review from a team on March 23, 2018 at 21:11
@markshannon (Member) commented:

The changes to importlib.h and importlib_external.h imply that this PR is behaviour-changing.
That they shrink suggests that this might be an improvement.

What are the changes? Are there branches that are now removed that weren't before?

@methane (Member) commented Dec 3, 2018

@markshannon This is diff of importlib dis: http://paste.ubuntu.com/p/4yn278YDrV/

This PR removes unused code after RETURN_VALUE, and merges two-step unconditional jumps into a single jump.
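One quick way to observe this class of dead code is to scan a function's instruction stream for anything that follows a return without being a jump target — a hypothetical check written with the `dis` module, not part of this PR:

```python
import dis

def f(x):
    if x:
        return 1
    else:
        return 2

# Walk consecutive instruction pairs and flag anything that follows a
# RETURN_VALUE/RETURN_CONST without being a jump target. On the older
# interpreters this PR targets, the peepholer could leave a dead
# JUMP_FORWARD in exactly that position.
instrs = list(dis.get_instructions(f))
dead = [
    ins.opname
    for prev, ins in zip(instrs, instrs[1:])
    if prev.opname.startswith("RETURN") and not ins.is_jump_target
]
print(dead)  # typically [] on modern CPython, which prunes such code
```

This only inspects one function; the paste linked above does the same comparison over the whole of importlib.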

@vstinner (Member) commented Dec 6, 2018

What's the status of this PR?

@serhiy-storchaka asked me to not fix compiler warnings with my PR #10652 before this PR is merged.

@pitrou doesn't seem to like the overall idea:

I'm not sure if @rhettinger likes the idea (but his comment is not really on the PR itself):

I understand that this change allows removing more dead code. Would it be possible to get the same effect by organizing the peephole optimizer differently? For example, would running the peephole optimizer twice also remove more dead code? I'm trying to understand whether removing more dead code can only be done by moving the jump optimization inside the compiler.

I have no opinion on this PR at this point :-)

@vstinner (Member) commented Dec 6, 2018

EDIT: _module_repr_from_spec() code was wrong, I fixed it.

I tested on this Python code:

def _module_repr_from_spec(spec):
    """Return the repr to use for the module."""
    # We mostly replicate _module_repr() using the spec attributes.
    name = '?' if spec.name is None else spec.name
    if spec.origin is None:
        if spec.loader is None:
            return '<module {!r}>'.format(name)
        else:
            return '<module {!r} ({!r})>'.format(name, spec.loader)
    else:
        if spec.has_location:
            return '<module {!r} from {!r}>'.format(name, spec.origin)
        else:
            return '<module {!r} ({})>'.format(spec.name, spec.origin)

I ran the peephole optimizer twice:

diff --git a/Python/compile.c b/Python/compile.c
index 45e78cb22c..7d361c6af0 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -5577,6 +5577,13 @@ makecode(struct compiler *c, struct assembler *a)
     if (!bytecode)
         goto error;
 
+    PyObject *bytecode2 = PyCode_Optimize(bytecode, consts, names, a->a_lnotab);
+    Py_CLEAR(bytecode);
+    bytecode = bytecode2;
+    if (!bytecode) {
+        goto error;
+    }
+
     tmp = PyList_AsTuple(consts); /* PyCode_New requires a tuple */
     if (!tmp)
         goto error;

The bytecode is different.

Before:

             60 CALL_METHOD              2
             62 RETURN_VALUE
             64 JUMP_FORWARD            36 (to 102)

 13     >>   66 LOAD_FAST                0 (spec)
             68 LOAD_ATTR                4 (has_location)
             70 POP_JUMP_IF_FALSE       86

After:

             60 CALL_METHOD              2
             62 RETURN_VALUE

 13     >>   64 LOAD_FAST                0 (spec)
             66 LOAD_ATTR                4 (has_location)
             68 POP_JUMP_IF_FALSE       84

=> the JUMP_FORWARD has been removed: it was dead code. As a side effect, offsets and jump targets are different. Example: LOAD_FAST moves from offset 66 to 64, and "POP_JUMP_IF_FALSE 86" becomes "POP_JUMP_IF_FALSE 84".

The question is now whether we can modify the peephole optimizer to get the same result when it is run only once, or whether the jump optimization has to move into the compiler.

@vstinner (Member) commented Dec 6, 2018

I may be wrong, but it seems that the optimizer doesn't remove basic blocks that are dead code.

I implemented such an optimization in my peephole optimizer (for Python bytecode), written in pure Python:

https://github.com/vstinner/bytecode/blob/aa2d29f018cfd4a5fc0dea5503eb280a5dfd37ab/bytecode/peephole_opt.py#L436-L454

peephole.c uses a hash table mapping offset => block index. My bytecode project uses additional "next_block" information, which the dead-code removal uses to follow the chain of blocks and check which basic blocks are dead code.

compile.c uses a different structure for basic blocks than peephole.c does:

typedef struct basicblock_ {
    ...
    /* If b_next is non-NULL, it is a pointer to the next
       block reached by normal control flow. */
    struct basicblock_ *b_next;
    ...
} basicblock;

Maybe this one can be used to detect unreachable basic blocks?

I would suggest leaving peephole.c as it is, but trying to remove unreachable basic blocks in compile.c.

What do you think @serhiy-storchaka?

@vstinner (Member) commented Dec 6, 2018

Note: the pure Python peephole optimizer has some documentation at https://bytecode.readthedocs.io/en/latest/peephole.html#optimizations

@vstinner (Member) commented Dec 7, 2018

I discussed with @serhiy-storchaka on IRC, he told me:

  • We need to optimize jumps before removing unreachable blocks, because jump optimization can create unreachable blocks.
  • We cannot remove some instructions that are used in frame_setlineno() to determine the end of logical blocks.

For the first point, I think that it's fine. We can organize PyCode_Optimize() to first optimize jumps, and only later remove unreachable blocks.
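The ordering in the first point can be sketched on a toy instruction list (purely illustrative, nothing from CPython): collapsing a jump-to-jump is exactly what turns the intermediate jump into dead code, so the unreachable-code pass only pays off if it runs afterwards.

```python
def retarget(code):
    """Collapse jump-to-jump: point a JUMP at its final destination.
    (No cycle detection; fine for this toy example.)"""
    by_label = {label: (label, op, tgt) for label, op, tgt in code}
    out = []
    for label, op, tgt in code:
        while op == "JUMP" and by_label[tgt][1] == "JUMP":
            tgt = by_label[tgt][2]
        out.append((label, op, tgt))
    return out

def reachable(code):
    """Labels reachable from the first instruction."""
    index = {label: i for i, (label, _, _) in enumerate(code)}
    seen, stack = set(), [code[0][0]]
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        i = index[label]
        _, op, tgt = code[i]
        if tgt is not None:
            stack.append(tgt)           # jump edge
        if op not in ("JUMP", "RETURN") and i + 1 < len(code):
            stack.append(code[i + 1][0])  # fall-through edge
    return seen

code = [
    ("L0", "JUMP", "L2"),    # jumps to another jump
    ("L1", "NOP", None),     # already dead (after an unconditional jump)
    ("L2", "JUMP", "L4"),    # becomes dead once L0 is retargeted
    ("L3", "NOP", None),
    ("L4", "RETURN", None),
]

print(sorted(reachable(code)))            # before: ['L0', 'L2', 'L4']
print(sorted(reachable(retarget(code))))  # after:  ['L0', 'L4']
```

Running dead-code elimination before retargeting would keep L2 alive; running it after removes one more block, which is the effect Serhiy describes.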

For the second point, maybe that's a reference to this comment in peephole.c?

                /* END_FINALLY should be kept since it denotes the end of
                   the 'finally' block in frame_setlineno() in frameobject.c.
                   SETUP_FINALLY should be kept for balancing.
                 */
                while (h < codelen && ISBASICBLOCK(blocks, i, h) &&
                       _Py_OPCODE(codestr[h]) != END_FINALLY)
                { ... }

@markshannon (Member) commented:

I'm closing this, as it is obsolete.

@markshannon markshannon closed this Mar 2, 2021

Labels: awaiting merge, performance (Performance or resource usage), skip news