Ubuntu – Can Ping but Cannot SSH to Openstack VM Instance

juju maas openstack ssh

I have a multi-node MAAS/Juju/OpenStack setup, i.e., nova-cloud-controller, nova-compute, and the quantum/neutron gateway are on separate hosts. In this setup, MAAS and OpenStack share 10.0.0.0/24 as their management network, while the Quantum charm is connected via eth0 to the public network 10.0.10.0. I am able to spawn instances and assign floating IP addresses to them. I can also ping to and from the public network. Furthermore, the two cirros instances, which are on the same subnet, can ssh to each other. However, I am not able to ssh to the instances either through the router namespace, i.e., ip netns exec qr-xxxx ssh -i <full path to key> cirros@privateipaddr, or from outside. I've generated key pairs through both the dashboard and nova, with the same result. I also tried chmod 0600 key and ssh-add key, to no avail. A sample output of the ssh attempt, which eventually times out, is shown here: http://pastebin.ubuntu.com/7676660/. Connecting via VNC to the cirros instance shows the following in /var/log/messages:

Jun20 14:29:21 cir3 authpriv.info dropbear[364]: Child connection from GatewayPrivateIp:32818
Jun20 14:29:21 cir3 authpriv.info dropbear[364]: Exit before auth: Timeout before auth

Similar logs are observed when ssh-ing from the public network. In that case, Wireshark shows repeated retransmissions of the ssh ACK from the source to the VM's public address, along with ARP requests asking who owns either the source's or the VM's IP address, and finally the VM sends [FIN, ACK] and closes the connection.

I have set up the metadata server and followed the second method described in "Cloud instances in OpenStack can't import public SSH key", and I am seeing the following during boot: http://pastebin.ubuntu.com/7676789/. (I'm not sure whether these are significant: "failed to get http://169.254.169.254/2009-04-04/user-data" and "warning: no ec2 metadata for user-data".)
To isolate testing, I've created a new security group that allows all TCP port ranges in both the ingress and egress directions.
It seems to me that this is not a firewall or policy issue, since a connection to port 22 can be established. I am wondering whether wrong metadata is being generated during boot.
Any and all suggestions are appreciated.
Cheers,

Best Answer

In our multi-node environment, the problem ended up being packet fragmentation. The workaround is to increase the MTU to 1700 on the management NICs of both the compute and neutron nodes, e.g., ifconfig ethxxx mtu 1700
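
For anyone wanting to check for the same cause before changing anything, here is a minimal sketch. It assumes iputils ping (for the -M do "don't fragment" option) and iproute2 are installed; the function names are hypothetical and not part of any OpenStack tooling. Small packets such as pings or the TCP handshake getting through while larger ones (like the SSH key exchange) stall would be consistent with an MTU problem, though other causes can produce the same symptom.

```shell
#!/bin/sh
# Hypothetical helpers; nothing runs until you call one of them.

# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings sized to fill the given MTU with the Don't Fragment bit set.
# If these fail while default-sized pings succeed, something on the path
# cannot carry packets of that size.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# iproute2 equivalent of "ifconfig ethxxx mtu 1700".
# Needs root, and the change does not persist across reboots.
set_mtu() {
    ip link set dev "$1" mtu "$2"
}
```

For example, probe_mtu privateipaddr 1500 could be run inside the router namespace (prefixed with ip netns exec qr-xxxx), and set_mtu ethxxx 1700 would then be run on the compute and neutron nodes, with ethxxx standing for the management NIC as in the command above.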
