Debian – Ethernet intermittently fails to come up

Tags: debian, networking

I'm working with an embedded Debian system and I'm having trouble getting Ethernet to work consistently. Roughly once in every 5 to 10 times that eth0 is brought up, something fails: I can't connect to the device via ssh and it doesn't respond to ping. The only fix is to either reboot or log in over the serial console and bring eth0 down and then up again. I can reproduce the problem either by rebooting repeatedly or by running ifconfig eth0 down && ifconfig eth0 up repeatedly until the device stops responding.

My /etc/network/interfaces is:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
  address 192.168.1.122
  gateway 192.168.1.1
  netmask 255.255.255.0

When networking works, dmesg says:

[ 2612.775183] PHY found at addr 7
[ 2612.776944] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 2614.414704] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

When it doesn't, dmesg says:

[ 2617.224970] PHY found at addr 7
[ 2617.227005] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
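The only difference between the two dmesg captures is the missing "link becomes ready" line. To spot failed bring-ups in a long log, here is a minimal Python sketch that assumes the dmesg text is passed in as a string and checks whether the most recent NETDEV_UP for an interface was followed by a NETDEV_CHANGE "link becomes ready" message. The function name link_became_ready is hypothetical; the message strings are copied from the logs above.

```python
import re

# Matches the IPv6 addrconf messages seen above, e.g.
# "IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready"
EVENT_RE = re.compile(r"ADDRCONF\((NETDEV_UP|NETDEV_CHANGE)\): ([^:\s]+): (.*)$")


def link_became_ready(dmesg_text, iface="eth0"):
    """Return True if the last NETDEV_UP for iface was followed by a
    NETDEV_CHANGE 'link becomes ready' message (hypothetical helper)."""
    seen_up = False
    ready = False
    for line in dmesg_text.splitlines():
        m = EVENT_RE.search(line)
        if not m or m.group(2) != iface:
            continue
        event, message = m.group(1), m.group(3)
        if event == "NETDEV_UP":
            # Each new bring-up attempt resets the state.
            seen_up = True
            ready = False
        elif event == "NETDEV_CHANGE" and "link becomes ready" in message:
            ready = True
    return seen_up and ready
```

Fed with subprocess.check_output(["dmesg"], universal_newlines=True), it would be expected to return False right after a failed bring-up like the second capture.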

When networking works, ifconfig outputs:

eth0      Link encap:Ethernet  HWaddr 00:d0:69:46:d9:08  
          inet addr:192.168.1.122  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::2d0:69ff:fe46:d908/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1528  Metric:1
          RX packets:3242 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1382 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:300701 (293.6 KiB)  TX bytes:132344 (129.2 KiB)
          Interrupt:22

When it doesn't, ifconfig outputs:

eth0      Link encap:Ethernet  HWaddr 00:d0:69:46:d9:08  
          inet addr:192.168.1.122  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1536  Metric:1
          RX packets:3355 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1430 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:310120 (302.8 KiB)  TX bytes:136800 (133.5 KiB)
          Interrupt:22

When networking works, ip link show eth0 outputs:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1528 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:d0:69:46:d9:08 brd ff:ff:ff:ff:ff:ff

When things don't work, ip link show eth0 gives:

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1536 qdisc pfifo_fast state DOWN mode DEFAULT qlen 1000
    link/ether 00:d0:69:46:d9:08 brd ff:ff:ff:ff:ff:ff

My current workaround is a script that parses the output of ip link show eth0 and restarts eth0 until it comes up, but this seems pretty hacky.
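A workaround like that can at least be made less fragile by parsing the flag list rather than matching whole lines. Below is a minimal Python sketch, assuming it runs as root and that ifdown/ifup from ifupdown are used for the bounce so that the static address and gateway from /etc/network/interfaces are re-applied. A link counts as up when the flags include LOWER_UP and not NO-CARRIER, which matches the two ip link captures above. All function names and the attempt/settle defaults are hypothetical.

```python
import subprocess
import time


def run_cmd(args):
    """Run a command and return its stdout; the exit status is ignored."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return out


def link_flags(ip_link_output):
    """Return the set of flags inside <...> from `ip link show <iface>`."""
    start = ip_link_output.find("<")
    end = ip_link_output.find(">", start + 1)
    if start == -1 or end == -1:
        return set()  # unexpected or empty output: no flags
    return set(ip_link_output[start + 1:end].split(","))


def has_carrier(ip_link_output):
    """Treat the link as up when LOWER_UP is set and NO-CARRIER is not."""
    flags = link_flags(ip_link_output)
    return "LOWER_UP" in flags and "NO-CARRIER" not in flags


def bounce_until_up(iface="eth0", attempts=5, settle_s=5.0,
                    run=run_cmd, sleep=time.sleep):
    """Bounce iface with ifdown/ifup until it has carrier.

    Returns how many bounces were needed (0 if it was already up),
    or -1 if it still had no carrier after `attempts` bounces.
    """
    for bounces in range(attempts + 1):
        if has_carrier(run(["ip", "link", "show", iface])):
            return bounces
        if bounces < attempts:
            run(["ifdown", iface])
            run(["ifup", iface])
            sleep(settle_s)  # give autonegotiation time to finish
    return -1
```

The command runner and sleep function are parameters only so the retry logic can be exercised without touching a real interface. In the working dmesg capture the link became ready about 1.6 s after NETDEV_UP, so a settle time of a few seconds should be enough.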

Any idea what the problem might be or where else I should be looking?

Edit:
Output from ethtool eth0 when things work:

Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                             100baseT/Half 100baseT/Full 
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 7
        Transceiver: internal
        Auto-negotiation: on
        Link detected: yes

Output from ethtool eth0 when it doesn't:

Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 7
        Transceiver: internal
        Auto-negotiation: on
        Link detected: no
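Comparing the two ethtool captures, the failing one has no "Link partner advertised" fields at all and has dropped to 10Mb/s half duplex with no link detected, which suggests autonegotiation never completed with the link partner. A small Python sketch, assuming ethtool's plain-text output is passed in as a string, that parses it into fields and applies that heuristic; parse_ethtool and autoneg_completed are hypothetical names.

```python
def parse_ethtool(text):
    """Parse `ethtool <iface>` output into a {field: value} dict.

    Indented continuation lines without a colon (the second row of
    link modes) are appended to the previous field's value.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Settings for"):
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] += " " + line
    return fields


def autoneg_completed(fields):
    """Heuristic from the two captures above: the good link reports the
    partner's advertisement and 'Link detected: yes'; the bad one has neither."""
    partner_seen = any(k.startswith("Link partner advertised") for k in fields)
    return partner_seen and fields.get("Link detected") == "yes"
```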

I also imaged the system I've been working on and tested it on a second, identical machine with different cables and a different router, and saw the same behavior.

Edit 2:
Per ttsiodras' observation, I did some MTU testing. I found that the MTU is 1508 right after the device boots. Every time I bring eth0 down and back up, the MTU increases by 4, up to a maximum of 1540, after which it stays the same. Unfortunately, there didn't seem to be any correlation between the MTU and when I lost network connectivity. I also tried manually setting the MTU to various values between 1508 and 1540, and the network still failed occasionally regardless of the setting.
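With the reported increase of 4 per bounce starting at 1508, the 1540 cap is reached after (1540 - 1508) / 4 = 8 bounces, so there are only 9 distinct MTU values to compare. If the experiment needs repeating, logging the carrier state next to the MTU after each bounce makes the correlation (or lack of it) easy to tabulate. A minimal Python sketch, assuming the standard Linux sysfs attributes /sys/class/net/<iface>/mtu and /sys/class/net/<iface>/carrier; reading carrier fails while the interface is administratively down, so unreadable values are recorded as None. The helper names are hypothetical.

```python
import os
from collections import defaultdict

# Standard Linux sysfs directory with one subdirectory per interface.
SYSFS_NET = "/sys/class/net"


def read_attr(iface, attr, root=SYSFS_NET):
    """Read one sysfs attribute as a stripped string, or None if unreadable.

    Reading 'carrier' while the interface is administratively down fails
    with EINVAL, which surfaces here as an OSError.
    """
    try:
        with open(os.path.join(root, iface, attr)) as f:
            return f.read().strip()
    except OSError:
        return None


def sample(iface="eth0", root=SYSFS_NET):
    """Return (mtu, has_carrier); either element is None if unreadable."""
    mtu = read_attr(iface, "mtu", root)
    carrier = read_attr(iface, "carrier", root)
    return (int(mtu) if mtu is not None else None,
            None if carrier is None else carrier == "1")


def failure_rate_by_mtu(samples):
    """Fraction of bring-ups without carrier, grouped by MTU value."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for mtu, carrier in samples:
        if mtu is None or carrier is None:
            continue  # skip samples where sysfs could not be read
        totals[mtu] += 1
        if not carrier:
            failures[mtu] += 1
    return {mtu: failures[mtu] / totals[mtu] for mtu in sorted(totals)}
```

Calling sample() a few seconds after each ifdown/ifup and passing the collected tuples to failure_rate_by_mtu() gives a per-MTU failure table.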

Best Answer

This may be related to the fact that Debian patches systemd slightly for backwards compatibility. That's a workaround, however, and a somewhat problematic one; the full story can be found in the Debian wiki page on the subject. The goal is to fix this for Stretch (the next Debian release) by adding systemd-specific code to the packages containing rcS init scripts. Most of that work has been done, but a small amount remains.

Things that may fix this issue:

  • Add a script to rc.local that checks whether the rcS scripts most important to your situation ran successfully, and fixes things if not (running systemctl status foo.service may help here)
  • Write a systemd unit for the networking script, or grab one from Stretch (some testing would be required)
  • Replace systemd with sysvinit on your systems (although that may be overkill)
  • Check the system logs to figure out which services (other than networking) are involved in the dependency loop, and remove one or more of them from the system
  • Install network-manager and use that rather than ifupdown.
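The first suggestion (an rc.local check) could look roughly like the minimal Python sketch below. It relies on systemctl is-active, which exits with status 0 only when the unit is active, and restarts each inactive unit once. The unit list, including networking.service, is an assumption to adapt to your own system, and the function names are hypothetical.

```python
import subprocess

# Units to verify; "networking.service" is an assumption for ifupdown on a
# systemd-based Debian. Adjust the list for your own system.
UNITS = ["networking.service"]


def run_cmd(args):
    """Run a command; return (exit status, stripped stdout)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            universal_newlines=True)
    out, _ = proc.communicate()
    return proc.returncode, out.strip()


def inactive_units(units, run=run_cmd):
    """Return {unit: state} for units that `systemctl is-active` does not
    report as active (it exits 0 only for active units)."""
    bad = {}
    for unit in units:
        code, state = run(["systemctl", "is-active", unit])
        if code != 0:
            bad[unit] = state or "unknown"
    return bad


def restart_inactive(units, run=run_cmd):
    """Restart each inactive unit once; return what is still not active."""
    for unit in inactive_units(units, run):
        run(["systemctl", "restart", unit])
    return inactive_units(units, run)
```

From rc.local this might be invoked as restart_inactive(UNITS), logging whatever dict comes back; an empty dict means every listed unit ended up active.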