CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE #836
Conversation
Instead of ps | grep, why not use the status result and start it from that, e.g. service conntrackd status || service conntrackd start (or similar commands)?
cloudstack-pull-rats #622 SUCCESS
cloudstack-pull-analysis #561 SUCCESS
Force-pushed 321bba7 to 79d2c96
cloudstack-pull-rats #630 SUCCESS
@bhaisaab, using service conntrackd status doesn't report the status; it always prints the usage information for conntrackd. So I have added the change to start conntrackd in daemon mode with the -d switch.
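A minimal sketch of that check-and-start approach, assuming a pidof-based check (an illustration of the idea, not the exact change in this PR):

# Start conntrackd in daemon mode only if it is not already running.
# The pidof check stands in for whatever process check the script uses.
if ! pidof conntrackd > /dev/null 2>&1; then
    conntrackd -d
fi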
cloudstack-pull-analysis #569 UNSTABLE
Force-pushed 79d2c96 to 56d4429
cloudstack-pull-rats #639 SUCCESS
cloudstack-pull-analysis #578 SUCCESS
cloudstack-pull-analysis #590 SUCCESS
@remibergsma, Funs, and I will now test the three VR-related PRs.
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
The ongoing ICMP request/reply session is broken while the VR is down; the expectation is that it resumes once the VR is back up. Investigation revealed that after a VR stop/start, restart, or recreate, the ongoing ICMP packets are sent out of eth2 without being NATed.
TCPDUMP output from VR post restart/stop-start/recreate on eth2:
root@r-4-VM:~# tcpdump -i eth2 icmp -n -vvv
tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 65535 bytes
06:22:52.749770 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 81, length 64
06:22:53.749782 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 82, length 64
06:22:54.749771 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 83, length 64
06:22:55.749775 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 84, length 64
06:22:56.749765 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 85, length 64
06:22:57.749776 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.200.67 > 173.194.33.163: ICMP echo request, id 30996, seq 86, length 64
^C
6 packets captured
6 packets received by filter
0 packets dropped by kernel
root@r-4-VM:~#
root@r-4-VM:~# grep icmp /proc/net/ip_conntrack
icmp 1 29 src=192.168.200.67 dst=173.194.33.163 type=8 code=0 id=30996 [UNREPLIED] src=173.194.33.163 dst=192.168.200.67 type=0 code=0 id=30996 mark=0 use=2
This gets fixed after flushing the conntrack table.
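For reference, a hedged example of clearing the stale entries by hand, assuming the conntrack-tools userspace utility is available on the VR:

# Flush the kernel connection tracking table; the stale [UNREPLIED]
# entry is dropped and the next ICMP packet is NATed again.
conntrack -F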
Screenshots:
Before the fix (the ping session doesn't resume; stopping and restarting the ping works; 120 packets lost):

After the fix (the ping session resumes; 27 packets lost):

* pr/836:
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
Signed-off-by: Remi Bergsma <[email protected]>
[BLOCKER] Combined PRs that fix VR issues

Tonight I worked with @wilderrodrigues to figure out what is wrong with the virtual router. As we couldn't test single PRs any more (because other issues with them caused tests to fail), we added all VR-related PRs to a separate branch and started testing from there. We combined the following PRs into this PR: #836 #851 #867 #870 #881 #882 #842

After that, one issue remains: the VPC does not get a default gateway. This is strange, because we already solved it in PR #738. Looking back, it was fixed again in PR #784. It could very well be that each one fixed one specific case while breaking the other. We need to investigate this and make sure there will be a fix that works both for VPCs and VRs.

When we manually add the default gateway on the VPC (see the sketch after this comment), most tests pass. Spinning up two VPCs with one tier each, each having a VM, and connecting them with a site-to-site VPN also works fine. See the report Wilder sent earlier for more details.

Tomorrow we'll try to figure out how to fix the default gateway and merge this. Then we should have a base to work from again. Any PR that fixes another blocker should at least be rebased against the fixed master so we can run the tests against the PR branch. I'm not saying everything is fixed, I'm just saying that we can spin up a cloud that has working VMs. If, in the meantime, someone has the time to check out this branch and make the default route work for both VPC and VR, that would be awesome. After that we should double-check and verify the test results.

Pinging @karuturi to let her know the current status.

Regards, Wilder / Remi

* pr/887:
Fixing the index out of bounds error in the check_if_link_up() function
small cleanups
Fixing the default route for VPC routers
Formatting the get_gateway() method in the CsDatabag.py file
Fixing the dhcpsrvr iptables file
Formatting the router_proxy.sh script
CLOUDSTACK-8881: Fixed Static and PF configuration issue
CLOUDSTACK-8905: Fixed hooking egress rules
CLOUDSTACK-8891: Fixed default iptables rules on VR for guest traffic
Configured dnsmasq to listen on all interfaces so that vpn client gets dns
CLOUDSTACK-8864: Not able to add TCP port forwarding rule in VPN for specific ports
CLOUDSTACK-8863: VM doesn't reconnect to internet post VR RESTART/STOP-START/RECREATE
CLOUDSTACK-8843: Fixed issue in default iptables rules on shared network VR

Signed-off-by: Remi Bergsma <[email protected]>
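For illustration, a hedged sketch of the manual default-gateway workaround mentioned above; the gateway address is a placeholder, not a value from these PRs:

# Add a default route on the VPC router by hand until the fix lands.
# 10.1.1.1 is a hypothetical public gateway address.
ip route add default via 10.1.1.1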