-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[BLOCKER] Combined PRs that fix VR issues #887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
The ongoing ICMP request reply session is broken when the VR is down, the expectation is that it would resume once the VR is up. Investigations revealed that the ongoing ICMP packets are sent out of eth2 without being NATed post VR stop/start or restart or recreate.
TCPDUMP output from VR post restart/stop-start/recreate on eth2:
root@r-4-VM:~# tcpdump -i eth2 icmp -n -vvv
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
06:22:52.749770 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 81, length 64
06:22:53.749782 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 82, length 64
06:22:54.749771 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 83, length 64
06:22:55.749775 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 84, length 64
06:22:56.749765 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 85, length 64
06:22:57.749776 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 86, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
root@r-4-VM:~#
root@r-4-VM:~# grep icmp /proc/net/ip_conntrack
icmp 1 29 src=192.168.200.67 dst=173.194.33.163 type=8 code=0 id=30996 [UNREPLIED] src=173.194.33.163 dst=192.168.200.67 type=0 code=0 id=30996 mark=0 use=2
This get fixed after flushing the conntrack table.
Screenshots:
Before fix (ping session doesn't resume, stop and starting the ping works, 120 packets lost):

After fix(ping session resumes, 27 packets lost):

* pr/836:
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
Signed-off-by: Remi Bergsma <[email protected]>
CLOUDSTACK-8864: Not able to add TCP port forwarding rule in VPN for specific ports Setting port forwarding rules for port 500,1701 and 4500 after enabling VPN, gives the error message "The range specified, xxxx, conflicts with rule xxxx which has xxxx." This happens because the rules added for vpn doesn't have a matching condition to allow port forwarding rules. Added a unit test to verify the detectRulesConflict function of FirewallManagerImpl. * pr/851: CLOUDSTACK-8864: Not able to add TCP port forwarding rule in VPN for specific ports Signed-off-by: Remi Bergsma <[email protected]>
CLOUDSTACK-8891: Fixed default iptables rules on VR for guest trafficVR default iptables rules in INPUT chain are configured partially. In CsAddress.py rules are configured while configuring public interface, guest interface post configuration is missed. Fixed to configure guest post configuration so that iptables rules are configured. Testing: 1. Deployed vm in the network. 2.iptables rules on the VR configured correctly. 3.VM got the dhcp ip address from the VR. * pr/867: CLOUDSTACK-8891: Fixed default iptables rules on VR for guest traffic Signed-off-by: Remi Bergsma <[email protected]>
Configured dnsmasq to listen on all interfaces so that vpn client gets dns1. Dnsmasq is not listening on the ppp+ interfaces due to this remote access vpn clients dns requests are dropped. 2. Configured the dnsmasq to listen on all the interfaces except public. There is firewall to allow only specific cidr to allow the dns requests. Tested from windows client nslookup. * pr/870: Configured dnsmasq to listen on all interfaces so that vpn client gets dns Signed-off-by: Remi Bergsma <[email protected]>
CLOUDSTACK-8905: Fixed hooking egress rulesAdded hooking the FIREWALL_EGRESS_RULES chain into FW_OUTBOUND chain. With this egress rules will effective. * pr/881: CLOUDSTACK-8905: Fixed hooking egress rules Signed-off-by: Remi Bergsma <[email protected]>
CLOUDSTACK-8881: Fixed Static and PF configuration issue1. For static nat filter rules are not configured in VR. 2. Corrected vm ip in PF rule. * pr/882: CLOUDSTACK-8881: Fixed Static and PF configuration issue Signed-off-by: Remi Bergsma <[email protected]>
CLOUDSTACK-8843: Fixed issue in default iptables rules on shared network VROn basic zone share network VR default iptables rules are not applied correctly. Due to this ssh to VR got failed.
In shared network the VR type is 'dhcpsrvr' not router. So corrected it in the ''del_standard' method to select the correct type.
Testing:
1. VR is deployed correctly.
2. Tested restart, stop, start VR.
3. New VM deployment is success.
4. ssh to VR from the host is successful.
5. iptables rules on the VR came up correctly.
below is the output from the VR:
iptables -L INPUT -nv
Chain INPUT (policy DROP 16 packets, 1056 bytes)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- * * 0.0.0.0/0 224.0.0.18
0 0 ACCEPT all -- * * 0.0.0.0/0 225.0.0.50
104 9800 ACCEPT all -- eth0 * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
281 36500 ACCEPT all -- eth1 * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
0 0 ACCEPT all -- eth2 * 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
6 504 ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0
2 656 ACCEPT udp -- eth0 * 0.0.0.0/0 0.0.0.0/0 udp dpt:67
13 780 ACCEPT tcp -- eth1 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:3922 state NEW,ESTABLISHED
0 0 ACCEPT tcp -- eth0 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:80 state NEW
0 0 ACCEPT tcp -- eth0 * 10.147.40.0/23 0.0.0.0/0 state NEW tcp dpt:8080
* pr/842:
CLOUDSTACK-8843: Fixed issue in default iptables rules on shared network VR
Signed-off-by: Remi Bergsma <[email protected]>
|
cloudstack-pull-rats #726 SUCCESS |
|
cloudstack-pull-analysis #675 SUCCESS |
|
@jayapalu @SudharmaJain can you check this asap, as it merges all your past PRs in one and rebases with master. thanks |
6efa038 to
91f36cf
Compare
|
cloudstack-pull-rats #730 SUCCESS |
|
cloudstack-pull-rats #731 SUCCESS |
|
cloudstack-pull-rats #732 SUCCESS |
4e3ce4d to
c15b724
Compare
|
cloudstack-pull-rats #733 SUCCESS |
|
cloudstack-pull-rats #734 SUCCESS |
|
cloudstack-pull-rats #735 SUCCESS |
|
cloudstack-pull-analysis #679 SUCCESS |
|
cloudstack-pull-analysis #680 SUCCESS |
|
cloudstack-pull-analysis #681 ABORTED |
|
cloudstack-pull-analysis #682 SUCCESS |
|
cloudstack-pull-analysis #683 SUCCESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to not have the try catch structure in a test. If a test throws an exceptions it will automatically be handled as failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I did not write it, I would say that if the exception happens, and was not expected, the test should fail, as the code does it. Another way to do this would be to expect the exception and check if the exception thrown is the one expected, then the test would pass, given the expectation.
Given the code above, it's okay to fail it. It's like the person who wrote is saying: if an exception happens, then fail it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is syntactically correct. Though this way you remove the error message from the test failure. It just fails because it hits a fail(). If the exception is thrown without the try catch then if the test fails it will show why.
|
cloudstack-pull-rats #739 SUCCESS |
|
cloudstack-pull-analysis #688 SUCCESS |
|
ping @remibergsma @karuturi @borisroman @miguelaferreira @Runseb I executed a few Marvin tests but some manual tests. The changes are working fine and will fix quite a few issues. @remibergsma also executed his CloudMonkey VPN tests, which can be found here: https://github.com/schubergphilis/MCT-shared/tree/master/helper_scripts/cloudstack/vpn_tests All went fine and you can simply execute them as well! 👍 LGTM Manual tests:
VPC outgoing traffic
Automated tests executed: test_vpc_routers Test start/stop of router after addition of one guest network ... === TestName: test_01_start_stop_router_after_addition_of_one_guest_network | Status : SUCCESS === Ran 8 tests in 1107.930s OK Test create VPC offering ... === TestName: test_01_create_vpc_offering | Status : SUCCESS === Ran 8 tests in 1486.109s OK Test advanced zone virtual router ... === TestName: test_advZoneVirtualRouter | Status : SUCCESS === Ran 10 tests in 1636.117s OK (SKIP=1) Test VPN in VPC ... === TestName: test_vpc_remote_access_vpn | Status : SUCCESS === Ran 2 tests in 846.637s OK VPN remote access user limit tests ... === TestName: test_01_VPN_user_limit | Status : SUCCESS === Ran 7 tests in 2295.400s OK Test iptables default INPUT/FORWARD policy on RouterVM ... === TestName: test_02_routervm_iptables_policies | Status : SUCCESS === Ran 2 tests in 908.229s OK |
|
cloudstack-pull-rats #740 SUCCESS |
|
cloudstack-pull-analysis #689 SUCCESS |
|
Update: we found a bug introduced in PR #784 (that we merge here) that needs resolving. Working on that with @wilderrodrigues as we speak so don't merge yet. |
|
One the above mentioned bug is also fixed, this is what we achieved: Blockers fixed: Critical fixed: |
76d1524 to
0de25ac
Compare
|
cloudstack-pull-rats #741 SUCCESS |
|
cloudstack-pull-analysis #690 SUCCESS |
0de25ac to
09e05f2
Compare
|
cloudstack-pull-rats #743 SUCCESS |
|
@wilderrodrigues and I spent yesterday testing and fixing small issues. I just (force) pushed the commits we that we tested against over night. The following automated tests were run to verify the final change: Next to that, we setup an environment where we created two VPCs, each with one tier and one VM in it. Then created a site2site VPN between them. Both VMs can now reach each other over their internal ip addresses (source code link mentioned above). This was done on both KVM and XenServer. Note: if you test on KVM, please build agent rpm packages from this brancha nd install them prior to adding the KVM hypervisor. Hint: While Travis and Apache Jenkins do another build, we'll run some extra Marvin tests. Pinging @wilderrodrigues @borisroman to do their final reviews. Thanks! Goal is to merge it today. |
|
cloudstack-pull-analysis #692 SUCCESS |
|
Thanks for all the debugging and testing @remibergsma and @wilderrodrigues |
|
Yes, @Rajan! It will be stable today! I'm running the last test as we speak and will add the complete test report on the PR, so everybody can see what we have worked on and what's now fixed and back on track! Please, stay around for the next 20min so you can have a look and give your LATM - Looks Awesome to Me - as well. :) Cheers, |
|
Hi @wilderrodrigues @remibergsma, I've tested: All returning success. Based on which I will 👍 LGTM for this PR. Hopefully we will resolve the other bugs in the near future. |
|
Hi @remibergsma, @karuturi and @borisroman , Finally got most of the ACS working as it was 4 weeks ago! Below the tests I executed manually and via Marvin. It all looks fine! However, there is still 1 thing that nobody noticed - because they don't test - which concerns the Redundant VPC routers: we cannot ssh into VM in a rVPC, unfortunately! @isoutham wrote a nice test for that (component/test_vpc_redundnat.py) which was working really fine. Unfortunately I tried it now and also some manual tests, but the rVPC no longer works. I will create an issue for that and try to get it fixed. I would like to suggest that for any router related changes the following tests are always executed:
The PR 887 LATM - Look AWESOME to Me! 💯 👍 Cheers, P.S.: I already executed all tests below, but I will continue for a bit more and will edit the PR comment later. Please, go ahead and merge it!
Manual tests
"Marvin tests* |
[BLOCKER] Combined PRs that fix VR issuesTonight I worked with @wilderrodrigues to figure out what is wrong with the virtual router. As we couldn't test single PRs any more (because of other issues with them causing tests to fail) we added all VR related PRs in a separate branch and started testing from there. We combined the following PRs into this PR: #836 #851 #867 #870 #881 #882 #842 After that, one issue remains: the VPC does not get a default gateway. Which is strange, because we already solved it in PR #738. When I look back, it was fixed again in PR #784. It could very well be that either one fixed one specific case, but also breaking the other. We need to investigate this, and make sure there will be a fix that works both for VPCs and VRs. When we manually add the default gateway on the VPC, most tests pass and also spinning up two VPCs with one tier each, having a VM and them using s2s to VPN them together works fine. See for more details the report Wilder sent earlier. Tomorrow we'll try to figure out how to fix the default gateway and merge this. Then we should have a base to work from again. Any PR that fixes another blocker, should at least then be rebased against the fixed master so we can run the tests against the PR branch. I'm not saying everything is fixed, I'm just saying that we can spin up a cloud that has working VMs. When, in the mean time, someone has the time to checkout this branch and make the default route work for both VPC and VR that would be awesome. After that we should double check and verify the test results. Pinging @karuturi to let her know the current status. Regards, Wilder / Remi * pr/887: Fixing the index out of bounds error in the check_if_link_up() function small cleanups Fixing the defaut route for VPC routers Formatting the get_gateway() method in the CsDatabag.py file Fixing the dhcpsrvr iptables file Formatting the router_proxy.sh script CLOUDSTACK-8881: Fixed Static and PF configuration issue CLOUDSTACK-8905: Fixed hooking egress rules CLOUDSTACK-8891: Fixed default iptables rules on VR for guest traffic Configured dnsmasq to listen on all interfaces so that vpn client gets dns CLOUDSTACK-8864: Not able to add TCP port forwarding rule in VPN for specific ports CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE CLOUDSTACK-8843: Fixed issue in default iptables rules on shared network VR Signed-off-by: Remi Bergsma <[email protected]>
…the vm's zone (#887) Custom component for start VM for admins Signed-off-by: Rohit Yadav <[email protected]>
Tonight I worked with @wilderrodrigues to figure out what is wrong with the virtual router. As we couldn't test single PRs any more (because of other issues with them causing tests to fail) we added all VR related PRs in a separate branch and started testing from there.
We combined the following PRs into this PR:
#836 #851 #867 #870 #881 #882 #842
After that, one issue remains: the VPC does not get a default gateway. Which is strange, because we already solved it in PR #738. When I look back, it was fixed again in PR #784. It could very well be that either one fixed one specific case, but also breaking the other. We need to investigate this, and make sure there will be a fix that works both for VPCs and VRs.
When we manually add the default gateway on the VPC, most tests pass and also spinning up two VPCs with one tier each, having a VM and them using s2s to VPN them together works fine. See for more details the report Wilder sent earlier.
Tomorrow we'll try to figure out how to fix the default gateway and merge this. Then we should have a base to work from again. Any PR that fixes another blocker, should at least then be rebased against the fixed master so we can run the tests against the PR branch. I'm not saying everything is fixed, I'm just saying that we can spin up a cloud that has working VMs.
When, in the mean time, someone has the time to checkout this branch and make the default route work for both VPC and VR that would be awesome. After that we should double check and verify the test results.
Pinging @karuturi to let her know the current status.
Regards,
Wilder / Remi