• by robcxyz on 3/19/2024, 3:54:06 AM

    This might not be related, but I had an outage of a VKE cluster from Friday to Monday morning, and their customer support blamed it on dockerhub. That didn't seem right at all, though, since the issue only came up when I upgraded a cluster and didn't impact every node. So, as their customer support usually does, they found some way to deflect the problem (i.e. pointing at dockerhub despite its status page only showing some degradation) and ignored it. What really didn't inspire confidence is that their support clearly doesn't understand k8s well, giving me a response to the effect of "clearly it is dockerhub's fault" while pointing at a pod's status, without going into the pod's events or logs to check whether the containers were actually being pulled.
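
    For what it's worth, that check is only a couple of commands; a minimal sketch, with my-pod and my-namespace as placeholder names:

        # Placeholder pod/namespace names; look at pull events and container state
        kubectl describe pod my-pod -n my-namespace
        kubectl get events -n my-namespace --field-selector involvedObject.name=my-pod
        kubectl logs my-pod -n my-namespace --all-containers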

    Again, not sure if this is related, but I'm using this as an opportunity to share how bad my experience has been with Vultr's customer support over the last couple of years. Every time I have interacted with them over an issue, it's been some diagnosis that somehow makes things not their fault. When people have clusters down for multiple days because of control plane errors, I would think they would be somewhat concerned, or at least offer something to the effect of an apology, especially to customers spending thousands every month. I doubt I'll get any reimbursement.

    The worst situation in the past was when I complained about connectivity issues that I was sure were related to some firewall on their side, which was throwing alarms for my app, and kept trying to get them to look at it. After going absolutely crazy for a month trying to figure out what the hell was going on, I finally got my rep to look at it and, bam, they saw the issues and blamed them on a faulty cable. Faulty cables don't drop packets the way I was seeing, though, so now I honestly just don't know what to believe from them.

  • by LinuxBender on 3/18/2024, 2:06:29 PM

    Just anecdotally, and perhaps unrelated to your issue, I have a primary DNS server in Vultr, and at times IPv4 times out, then IPv6. It hasn't been persistent enough for me to start troubleshooting it or to set up 3rd-party monitoring, but I may do that today if others are seeing odd behavior now. Perhaps together we could create a list of service endpoints to monitor each other, using curl or dig, and find a pattern to it.

    Something to play around with

        # TCP AXFR.
        kdig @2001:19f0:b001:e83:5400:4ff:fe72:e740 +nocookie +padding=64 +retry=0 +all -t axfr example.net
        dig @216.128.176.142 +nocookie +padding=64 +retry=0 +all -t axfr example.net
    
        # UDP TXT or whatever
        kdig @2001:19f0:b001:e83:5400:4ff:fe72:e740 +nocookie +padding=64 +retry=0 +all -t txt example.net
        dig @216.128.176.142 +nocookie +padding=64 +retry=0 +all -t txt example.net
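
    And to actually find a pattern, something like this run from cron on a couple of boxes would log which address family fails and when. A rough sketch only: the nameserver addresses are the ones above, the zone is the example.net placeholder, and the log path is whatever suits you.

        #!/bin/sh
        # Probe each nameserver with a short timeout and log per-address results
        for ns in 216.128.176.142 2001:19f0:b001:e83:5400:4ff:fe72:e740; do
            if dig @"$ns" +retry=0 +time=3 -t soa example.net > /dev/null 2>&1; then
                echo "$(date -u +%FT%TZ) OK   $ns"
            else
                echo "$(date -u +%FT%TZ) FAIL $ns"
            fi
        done >> /var/log/vultr-dns-probe.log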

  • by inok6743 on 3/18/2024, 2:20:58 AM

    I am having the same issue now with the Cloud Compute servers in the Tokyo region.

    In terms of a workaround, a server created without an IPv6 address seems to work fine, and assigning an IPv6 network to the server brings the issue back for me.

    So I guess that something is going wrong with Vultr's network configuration at this point.
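
    If anyone wants to double-check that it's the IPv6 path, a quick comparison from an affected server (assuming the failures are on pulls from Docker Hub; registry-1.docker.io is its normal registry endpoint):

        # Force each address family separately; if only -6 fails, the IPv6 path is the problem
        curl -4 -sv https://registry-1.docker.io/v2/ -o /dev/null
        curl -6 -sv https://registry-1.docker.io/v2/ -o /dev/null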

  • by rjst01 on 3/18/2024, 10:56:12 AM

    We noticed this yesterday while trying to release a minor bug fix. As of this morning, it still appears to be broken.

    It's hard to see how it could be anything other than an issue on Docker's side, since we are seeing a 500 after all. I need to unblock development ASAP, so for now the workaround for us has been to migrate our container registry to Azure.
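
    For anyone weighing the same move, the change is roughly this shape with Azure Container Registry; a sketch only, with myregistry and myapp as placeholder names rather than our real ones:

        # Placeholder registry/image names; build an image and push it to ACR
        az acr login --name myregistry
        docker build -t myregistry.azurecr.io/myapp:1.2.3 .
        docker push myregistry.azurecr.io/myapp:1.2.3
        # then update deployments to pull myregistry.azurecr.io/myapp:1.2.3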