CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE #836
Conversation
Instead of ps | grep, why not use the status result and start it from that, e.g. service conntrackd status || service conntrackd start (or similar commands)?
cloudstack-pull-rats #622 SUCCESS
cloudstack-pull-analysis #561 SUCCESS
Force-pushed 321bba7 to 79d2c96
cloudstack-pull-rats #630 SUCCESS
@bhaisaab, using service conntrackd status doesn't report the status; it always prints the usage information for conntrackd. So I have added the change to start conntrackd in daemon mode with the -d switch.
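A minimal sketch of that check-and-start approach, assuming a pidof-based check (an illustration of the idea, not the exact change in this PR):

# Start conntrackd in daemon mode only if it is not already running.
# The pidof check stands in for whatever process check the script uses.
if ! pidof conntrackd > /dev/null 2>&1; then
    conntrackd -d
fi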
cloudstack-pull-analysis #569 UNSTABLE
Force-pushed 79d2c96 to 56d4429
cloudstack-pull-rats #639 SUCCESS
cloudstack-pull-analysis #578 SUCCESS
cloudstack-pull-analysis #590 SUCCESS
@remibergsma, Funs, and I will now test the three VR-related PRs.
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
The ongoing ICMP request/reply session is broken while the VR is down; the expectation is that it resumes once the VR is back up. Investigation revealed that after a VR stop/start, restart, or recreate, the ongoing ICMP packets are sent out of eth2 without being NATed.
TCPDUMP output from VR post restart/stop-start/recreate on eth2:
root@r-4-VM:~# tcpdump -i eth2 icmp -n -vvv
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
06:22:52.749770 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 81, length 64
06:22:53.749782 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 82, length 64
06:22:54.749771 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 83, length 64
06:22:55.749775 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 84, length 64
06:22:56.749765 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 85, length 64
06:22:57.749776 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 86, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
root@r-4-VM:~#
root@r-4-VM:~# grep icmp /proc/net/ip_conntrack
icmp 1 29 src=192.168.200.67 dst=173.194.33.163 type=8 code=0 id=30996 [UNREPLIED] src=173.194.33.163 dst=192.168.200.67 type=0 code=0 id=30996 mark=0 use=2
This gets fixed after flushing the conntrack table.
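For reference, a hedged example of clearing the stale entries by hand, assuming the conntrack-tools userspace utility is available on the VR:

# Flush the kernel connection tracking table; the stale [UNREPLIED]
# entry is dropped and the next ICMP packet is NATed again.
conntrack -F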
Screenshots:
Before the fix (the ping session doesn't resume; stopping and restarting the ping works; 120 packets lost):

After the fix (the ping session resumes; 27 packets lost):

* pr/836:
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
Signed-off-by: Remi Bergsma <[email protected]>
[BLOCKER] Combined PRs that fix VR issues

Tonight I worked with @wilderrodrigues to figure out what is wrong with the virtual router. As we couldn't test single PRs any more (because other issues with them caused tests to fail), we added all VR-related PRs to a separate branch and started testing from there. We combined the following PRs into this PR: #836 #851 #867 #870 #881 #882 #842

After that, one issue remains: the VPC does not get a default gateway. This is strange, because we already solved it in PR #738. Looking back, it was fixed again in PR #784. It could very well be that each one fixed one specific case while breaking the other. We need to investigate this and make sure there will be a fix that works both for VPCs and VRs.

When we manually add the default gateway on the VPC (see the sketch after this comment), most tests pass. Spinning up two VPCs with one tier each, each having a VM, and connecting them with a site-to-site VPN also works fine. See the report Wilder sent earlier for more details.

Tomorrow we'll try to figure out how to fix the default gateway and merge this. Then we should have a base to work from again. Any PR that fixes another blocker should at least be rebased against the fixed master so we can run the tests against the PR branch. I'm not saying everything is fixed, I'm just saying that we can spin up a cloud that has working VMs. If, in the meantime, someone has the time to check out this branch and make the default route work for both VPC and VR, that would be awesome. After that we should double-check and verify the test results.

Pinging @karuturi to let her know the current status.

Regards, Wilder / Remi

* pr/887:
Fixing the index out of bounds error in the check_if_link_up() function
small cleanups
Fixing the default route for VPC routers
Formatting the get_gateway() method in the CsDatabag.py file
Fixing the dhcpsrvr iptables file
Formatting the router_proxy.sh script
CLOUDSTACK-8881: Fixed Static and PF configuration issue
CLOUDSTACK-8905: Fixed hooking egress rules
CLOUDSTACK-8891: Fixed default iptables rules on VR for guest traffic
Configured dnsmasq to listen on all interfaces so that vpn client gets dns
CLOUDSTACK-8864: Not able to add TCP port forwarding rule in VPN for specific ports
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
CLOUDSTACK-8843: Fixed issue in default iptables rules on shared network VR

Signed-off-by: Remi Bergsma <[email protected]>
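For illustration, a hedged sketch of the manual default-gateway workaround mentioned above; the gateway address is a placeholder, not a value from these PRs:

# Add a default route on the VPC router by hand until the fix lands.
# 10.1.1.1 is a hypothetical public gateway address.
ip route add default via 10.1.1.1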