
VyOS on Hetzner as a Static WAN Hub for Dynamic On-Prem Connectivity

How inverting the IPsec initiation direction eliminates the dynamic on-prem IP problem, and why running VyOS on both sides of the link simplifies everything else.

The common assumption about site-to-site VPN is that both ends need a fixed public IP. That assumption breaks down the moment one end is an on-prem environment running on a residential or small-office ISP, which is exactly the situation for a homelab, a small office, or an on-prem node in a hybrid infrastructure setup where the upstream ISP assigns addresses dynamically.

The workaround most people reach for is dynamic DNS: register a hostname, run a DDNS client on the router, and configure the remote peer to resolve by name rather than IP. That works, but it adds a dependency on a third-party DNS update service, introduces a propagation delay when the IP changes, and adds a failure mode that is genuinely annoying to debug: the tunnel drops because the DNS record has not yet updated.

There is a cleaner structural fix: invert the initiation direction. Design the architecture so the dynamic end never needs to be known in advance.


The initiator/responder model

In an IPsec tunnel, one side initiates and the other responds. The initiating side needs to know where to connect. The responding side just needs to be reachable; it does not need to know the initiator’s address in advance.

If the Hetzner edge is always the responder and on-prem is always the initiator, the dynamic IP problem disappears. Hetzner has static public addresses assigned by the provider. On-prem connects outbound to those fixed endpoints. If the on-prem IP changes (ISP failover, DHCP renewal, physical move), the on-prem edge re-establishes the tunnel from its new address. Hetzner does not care what address the connection came from as long as the IKE credentials are valid.
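
As a rough sketch of what the inverted direction looks like in VyOS configuration, assuming the 1.4+/1.5 IPsec syntax: the hub only responds and accepts any source address, while on-prem dials out to the hub's fixed address. The peer names, the 203.0.113.10 documentation address, and the use of `local-address any` on the on-prem side are illustrative assumptions, not values from the real deployment. Authentication, IKE/ESP group bindings, and tunnel-interface bindings are omitted for brevity.

```
# Hetzner edge (responder): never dials out, accepts any source address
set vpn ipsec site-to-site peer ONPREM connection-type respond
set vpn ipsec site-to-site peer ONPREM local-address 203.0.113.10
set vpn ipsec site-to-site peer ONPREM remote-address any

# On-prem edge (initiator): always dials the hub's fixed address
set vpn ipsec site-to-site peer HETZNER-1 connection-type initiate
set vpn ipsec site-to-site peer HETZNER-1 local-address any
set vpn ipsec site-to-site peer HETZNER-1 remote-address 203.0.113.10
```

The asymmetry is the point: only the on-prem side carries a concrete remote address, so nothing on the hub has to change when the on-prem address does.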

That is the pattern I ended up with in HybridOps: the Hetzner edge pair sits at the centre of the WAN fabric as the static hub, on-prem always initiates outbound, and GCP peers in via HA VPN while BGP distributes the routes. It is one of those architectural choices that feels obvious in retrospect but still saves real time once you stop trying to force the dynamic site to behave like the static one.


VyOS on both sides

Both the Hetzner edges and the on-prem edge run VyOS. The same image artifact, built via Packer from the same VyOS 1.5 source, is deployed to Hetzner Cloud servers via hcloud-upload-image and to Proxmox as a VM disk. The cloud-init provisioning follows the same pattern on both sides. The routing and tunnel configuration uses the same VyOS CLI structure.

This matters because it means there is no translation layer. Debugging a tunnel means reading the same log format, running the same diagnostic commands, and applying the same mental model regardless of which side of the link you’re on. The configuration for the IKE groups, the ESP proposals, and the BGP peer parameters is consistent between environments.
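
For illustration, a shared pair of IKE and ESP groups that could be applied unchanged on either side might look like this. The group names, algorithms, and lifetimes are assumptions chosen as a plausible baseline, not the values used in HybridOps.

```
# Identical on the Hetzner edges and the on-prem edge (hypothetical names and values)
set vpn ipsec ike-group IKE-WAN key-exchange ikev2
set vpn ipsec ike-group IKE-WAN lifetime 28800
set vpn ipsec ike-group IKE-WAN proposal 10 dh-group 14
set vpn ipsec ike-group IKE-WAN proposal 10 encryption aes256
set vpn ipsec ike-group IKE-WAN proposal 10 hash sha256

set vpn ipsec esp-group ESP-WAN mode tunnel
set vpn ipsec esp-group ESP-WAN lifetime 3600
set vpn ipsec esp-group ESP-WAN pfs dh-group14
set vpn ipsec esp-group ESP-WAN proposal 10 encryption aes256
set vpn ipsec esp-group ESP-WAN proposal 10 hash sha256
```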

On Hetzner, the edge nodes get static public addresses and a private network interface for internal Hetzner connectivity. On-prem, the VyOS VM runs with a static management interface on the Proxmox SDN and a DHCP-addressed WAN interface (eth1) facing the upstream ISP. The WAN interface is intentionally dynamic. That is the design, not a limitation.
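
The on-prem interface layout can be sketched in a few lines. Only eth1 as the DHCP-addressed WAN interface comes from the description above; eth0 as the management interface and the 10.10.0.2/24 address are assumptions for the example.

```
# On-prem edge: static management side, dynamic WAN side
set interfaces ethernet eth0 description 'MGMT (Proxmox SDN)'
set interfaces ethernet eth0 address 10.10.0.2/24
set interfaces ethernet eth1 description 'WAN (upstream ISP)'
set interfaces ethernet eth1 address dhcp
```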


Routing with BGP

IPsec tunnels move packets between peers. BGP distributes the prefixes that tell each environment what routes exist on the other side.

The on-prem VyOS and each Hetzner edge node peer via BGP inside the site-extension tunnels. On-prem advertises its internal subnets. Hetzner receives them and redistributes them toward GCP. GCP peers with the Hetzner pair via Cloud Router using a separate eBGP session over the HA VPN tunnels.

set protocols bgp system-as 65001
set protocols bgp neighbor 10.0.0.1 remote-as 64512
set protocols bgp neighbor 10.0.0.1 address-family ipv4-unicast prefix-list import ON-PREM-IN
set protocols bgp neighbor 10.0.0.1 address-family ipv4-unicast prefix-list export HETZNER-OUT

The result is a fully routed mesh. A workload in GCP can reach an on-prem address. An on-prem process can reach a GCP service over a private path. The routing decisions happen at the BGP layer: adding or withdrawing prefixes changes where traffic flows without touching tunnel configuration.

Prefix filtering is strict. Each peer relationship has an explicit import and export policy. On-prem cannot accidentally leak an unintended prefix through to GCP, because anything outside the import policy is dropped at the Hetzner edge. Hetzner only redistributes what the policy allows. This is not a flat network; it is a routed fabric with controlled advertisement.
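
A minimal sketch of the prefix lists behind those two policy names, as they might be defined on a Hetzner edge. The list names match the BGP neighbor configuration; the prefixes themselves (10.10.0.0/16 for on-prem, 10.20.0.0/16 for Hetzner, 10.30.0.0/16 for GCP) are hypothetical placeholders.

```
# Accept only on-prem internal subnets, /16 through /24
set policy prefix-list ON-PREM-IN rule 10 action permit
set policy prefix-list ON-PREM-IN rule 10 prefix 10.10.0.0/16
set policy prefix-list ON-PREM-IN rule 10 le 24

# Advertise only the Hetzner and GCP ranges toward on-prem
set policy prefix-list HETZNER-OUT rule 10 action permit
set policy prefix-list HETZNER-OUT rule 10 prefix 10.20.0.0/16
set policy prefix-list HETZNER-OUT rule 10 le 24
set policy prefix-list HETZNER-OUT rule 20 action permit
set policy prefix-list HETZNER-OUT rule 20 prefix 10.30.0.0/16
set policy prefix-list HETZNER-OUT rule 20 le 24
```

A prefix list ends with an implicit deny, so anything not explicitly permitted is filtered without needing a catch-all rule.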


High availability without VRRP

Two Hetzner edge nodes run simultaneously, each with its own tunnel pair to on-prem and to GCP. Both are active. On-prem establishes a tunnel to each edge independently. GCP HA VPN similarly maintains a session to each Hetzner edge.

Failover is handled by BGP reconvergence. If one Hetzner edge or one tunnel becomes unavailable, BGP withdraws the prefixes learned through that path. The remaining path picks up the traffic. There is no VRRP election, no virtual IP handoff, no coordination between the two Hetzner edge nodes required for data-plane failover.

This HA design is worth examining for what it doesn’t require. No shared state between edge nodes. No cluster protocol to configure or troubleshoot. No VIP that both nodes compete for. The BGP routing protocol that is already present for prefix distribution handles failover as a natural consequence of the topology.
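
How quickly BGP notices a dead path depends on how the session failure is detected. With nothing else in place, that is bounded by the BGP hold timer, so one way to tighten failover is to shorten the timers and let dead-peer detection tear down a stale tunnel. The values below, the IKE-WAN group name, and the choice of `restart` on the initiating side are assumptions for illustration, not tuned figures from the deployment.

```
# Declare the BGP session dead after 30 s of silence instead of the default
set protocols bgp neighbor 10.0.0.1 timers keepalive 10
set protocols bgp neighbor 10.0.0.1 timers holdtime 30

# On the on-prem (initiating) side: probe the peer and re-initiate if it stops answering
set vpn ipsec ike-group IKE-WAN dead-peer-detection action restart
set vpn ipsec ike-group IKE-WAN dead-peer-detection interval 15
```

BGP peers negotiate the lower of the two configured hold times, so shortening it on one side is enough to affect the session.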

One Hetzner edge holds a floating IP for management ingress: SSH access, operational tooling, and control-plane connectivity. If that edge becomes unavailable, the floating IP can be reassigned. But the tunnel traffic does not depend on the floating IP at all.


What the on-prem edge does not need

The on-prem VyOS edge VM sits between the Proxmox SDN and the upstream WAN. From the platform workloads’ perspective it is just a gateway. They do not see the tunnel architecture behind it.

The on-prem VyOS does not need a static public IP. It does not need a DDNS client. It does not need firewall rules that permit inbound IKE from a fixed remote address. Because the Hetzner edge is the responder, on-prem only needs unrestricted outbound UDP 500 and 4500. In most ISP and home router setups, that is already permitted.

This is the practical value of the architectural decision made at the start. By inverting the initiation direction and designing Hetzner as the static hub, every constraint that would otherwise apply to the dynamic on-prem end is removed. The dynamic address is no longer a problem to be managed; it is a non-issue.


Making the pattern repeatable

The important thing is not that this exact topology can be deployed once. It is that the design can be repeated with different prefixes, regions, and edge targets without having to rediscover the same routing decisions every time.

That means codifying the peer relationships, the route filters, and the expected failover behaviour in a form that can be rehearsed. It also means capturing evidence that the pattern is working: BGP peer state, tunnel health, prefix advertisement, and failover behaviour should be observable rather than assumed.
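
As a starting point for that evidence, these are the operational-mode commands I would expect to capture on each edge. The command forms are given as I understand them for VyOS 1.4+ and are worth confirming with tab completion on the target image; 10.0.0.1 is the example neighbor address used earlier.

```
# Tunnel health: IKE and IPsec security associations
show vpn ike sa
show vpn ipsec sa

# BGP peer state and what is actually being advertised
show bgp ipv4 summary
show bgp ipv4 neighbors 10.0.0.1 advertised-routes

# Routes learned over BGP and installed in the routing table
show ip route bgp
```

Capturing the same output before and after a deliberate edge shutdown turns the failover claim into something that can be checked rather than assumed.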

That is the operational value of treating WAN architecture as a repeatable design rather than a one-off procedure. The knowledge stops living in the heads of the people who happened to build the first version.