Fix Policy Based Routing for private gateway static routes #3604
Conversation
|
I doubt this was introduced in #3366 - all I did there was change to the proper way of adding rules to the front. Before this change, rules were still added to the front, but this happened multiple times as the code couldn't detect it as a duplicate rule. |
|
Thank you for helping to find the source of this issue, @richardlawley. Or could the change made for #3366 somehow have caused the packet mark to stop working? |
|
I'm not sure. I made a few changes around the same time for static nat, especially when there are multiple public networks, but some of these were in code branches which don't get hit on a vpc. Unfortunately I'm only running 4.11.2 in production, but have all of my fixes ported into our systemvm iso. If I get a chance I'll see if I can reproduce the problem. |
|
That would be a great help. I think most cs users don't use the private gateways and maybe this nat issue was not noticed? What do you think of the proposed changes? Do you think it breaks functionality? |
|
@rhtyd I looked into the failed test and I think it failed because of:
It seems this is not caused by the changes in this PR. Can you confirm? |
|
should this go on 4.13 as well, @andrijapanicsb @DennisKonrad ? |
|
Makes sense @DaanHoogland |
|
@DennisKonrad I would like to test this manually. Thanks! |
|
@andrijapanicsb I'll describe the setup to you so you can reproduce it. In short: if you use a VPC with private gateways and static NAT, the one VM that the NAT points to can no longer use the private gateway. This worked until a change that I cannot pin down exactly; after we updated, the NATed VM lost its connection to the machines behind the private gateway. Therefore the fix |
|
@DennisKonrad can you change the PR base to 4.13 and rebase against origin/4.13? |
and mark milestone 4.13.1, please? |
|
@DaanHoogland @rhtyd After a lot of pain: |
DaanHoogland
left a comment
changing the rule on source/internal to destination/public makes sense in view of the report. code looks good
|
@blueorangutan package |
|
@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✖centos6 ✔centos7 ✔debian. JID-691 |
|
@blueorangutan test |
|
@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
Trillian test result (tid-844)
|
|
tests look good, second pair of eyes, @andrijapanicsb @wido @rhtyd @weizhouapache ? |
weizhouapache
left a comment
LGTM
tested manually
the issue can be reproduced, and fixed with this change.
wido
left a comment
LGTM
|
tnx guys |
* 4.13: Fix Policy Based Routing for private gateway static routes (#3604)
* Fix for routing table issue with NAT interfaces
* Mark only packets with the public ip as destination
…pache#3604)" This reverts commit 82d94a8.
* master: (25 commits)
  integration test: skip vlan of public ip range in get_free_vlan
  vpc vr: plugin nics by this order: public/private/guest
  vpc vr: fix Conflicting device id on private gw nic
  Adding zone name to physicalnetworkresponse (apache#4510)
  Disallowing udp for lb rules for haproxy (apache#4501)
  Make global setting non-dynamic (apache#4505)
  Adding cpuallocated percentage and value to host and hostsformigrationresponse (apache#4499)
  kvm: fix router.aggregation.command.each.timeout is reset to 600 when update other kvm configs (apache#4496)
  fix failures with test_multiple_nic_support.py (apache#4495)
  Fix hosts for migration count (apache#4500)
  sql: Fix Zones are returned in a random order (apache#3934) (apache#4494)
  integration test: update steps
  integration test: add private gateway in test
  integration test: verify public nics state
  bugfix apache#9 vpc vr: Add PREROUTING rule for vm with static nat to multiple private gateways
  bugfix apache#8 vpc: add rule for traffic between vm and private gateway
  bugfix apache#7 vpc vr: allow servers in private gateway to reach internet via the VPC VR if it is gateway
  bugfix apache#6 vpc vr: Add iptables rules for ACL of private gateway
  Revert "Fix Policy Based Routing for private gateway static routes (apache#3604)"
  Revert "Add private gateway IP to router initialization config"
  ...
Description
We noticed that VMs that are the target of a static NAT cannot use private gateways.
Investigation showed that these VMs behave differently from all other VMs in a VPC: all of their traffic gets marked and is only allowed to leave through the public IP provided by the static NAT.
Could be related to: #3366
Maybe @richardlawley @rhtyd can contribute if this is an unwanted side-effect.
To avoid breaking anything, we propose a solution that marks only the NAT portion of the traffic and allows the VM to use the private gateway as normal.
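To make the difference concrete, here is a minimal Python sketch of the two marking strategies described above. It is a toy model, not the code changed in this PR: the `Packet` and `MarkRule` classes, the addresses, and the mark value `0x65` are all hypothetical, and the generated rule string only illustrates the general shape of an iptables `mangle`/`PREROUTING` MARK rule. The assumption is that the old behaviour matched on the VM's internal source address, while the fix matches on the public IP as destination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


@dataclass(frozen=True)
class MarkRule:
    """Simplified model of one mangle/PREROUTING MARK rule (illustrative only)."""
    match_src: Optional[str] = None
    match_dst: Optional[str] = None
    mark: int = 0x65  # hypothetical fwmark that selects the public routing table

    def matches(self, pkt: Packet) -> bool:
        # A criterion left as None is treated as a wildcard.
        if self.match_src is not None and pkt.src != self.match_src:
            return False
        if self.match_dst is not None and pkt.dst != self.match_dst:
            return False
        return True

    def to_iptables(self) -> str:
        # Render the rule as an iptables argument string.
        parts = ["-t mangle -A PREROUTING"]
        if self.match_src is not None:
            parts.append(f"-s {self.match_src}/32")
        if self.match_dst is not None:
            parts.append(f"-d {self.match_dst}/32")
        parts.append(
            f"-m state --state NEW -j MARK --set-xmark {hex(self.mark)}/0xffffffff"
        )
        return " ".join(parts)


# Hypothetical addresses, chosen for illustration only.
VM_IP = "10.1.1.10"              # VM that is the static NAT target
PUBLIC_IP = "203.0.113.5"        # public IP of the static NAT
PRIVATE_GW_HOST = "172.16.0.20"  # machine behind the private gateway
INTERNET_HOST = "198.51.100.7"   # external client

old_rule = MarkRule(match_src=VM_IP)      # assumed old behaviour: mark by source
new_rule = MarkRule(match_dst=PUBLIC_IP)  # assumed fix: mark by public destination

to_private_gw = Packet(src=VM_IP, dst=PRIVATE_GW_HOST)
inbound_nat = Packet(src=INTERNET_HOST, dst=PUBLIC_IP)

print(old_rule.to_iptables())
print(new_rule.to_iptables())
print("old rule marks private-gateway traffic:", old_rule.matches(to_private_gw))
print("new rule marks private-gateway traffic:", new_rule.matches(to_private_gw))
print("new rule marks inbound NAT traffic:", new_rule.matches(inbound_nat))
```

In this model the source-based rule marks the VM's traffic towards the private gateway, so policy routing would send it out via the public interface. The destination-based rule leaves that traffic unmarked and still marks connections arriving on the public IP.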
Types of changes
How Has This Been Tested?
We built and manually tested this to check that the functionality is restored and that nothing else breaks.