Skip to content

vCluster restarts due to timeouts in 0.30.0 - probably due to compactions #3343

@ilia-medvedev-codefresh

Description

@ilia-medvedev-codefresh

What happened?

vCluster crashes on API access timeout, during this time in the logs it appears that compaction is running:

syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    [core[] [Channel #305 SubChannel #306]grpc: addrConn.createTransport failed to connect to {Addr: "unix:///data/kine.sock", ServerName: "kine.sock", BalancerAttributes: {"<%!p(pickfirstleaf.managedByPickfirstKeyType={})>": "<%!p(bool=true)>" }}. Err: connection error: desc = "error reading server preface: read unix @->/data/kine.sock: use of closed network connection"    {"component": "vcluster", "component": "apiserver", "location": "logging.go:55"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion authentication.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API authentication.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API authentication.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion authorization.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API authorization.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion autoscaling v2 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion autoscaling v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API autoscaling/v2beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API autoscaling/v2beta2 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion batch v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API batch/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion certificates.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API certificates.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API certificates.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion coordination.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API coordination.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API coordination.k8s.io/v1alpha2 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion discovery.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API discovery.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion networking.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API networking.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion node.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API node.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API node.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion policy v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API policy/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion rbac.authorization.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API rbac.authorization.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion scheduling.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API scheduling.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion storage.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API storage.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API storage.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion flowcontrol.apiserver.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API flowcontrol.apiserver.k8s.io/v1beta3 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API flowcontrol.apiserver.k8s.io/v1beta2 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API flowcontrol.apiserver.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion apps v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API apps/v1beta2 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API apps/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion admissionregistration.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API admissionregistration.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API admissionregistration.k8s.io/v1alpha1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion events.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API events.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API resource.k8s.io/v1beta2 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion resource.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API resource.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API resource.k8s.io/v1alpha3 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    [core[] [Channel #310 SubChannel #311]grpc: addrConn.createTransport failed to connect to {Addr: "unix:///data/kine.sock", ServerName: "kine.sock", BalancerAttributes: {"<%!p(pickfirstleaf.managedByPickfirstKeyType={})>": "<%!p(bool=true)>" }}. Err: connection error: desc = "transport: Error while dialing: dial unix /data/kine.sock: operation was canceled"    {"component": "vcluster", "component": "apiserver", "location": "logging.go:55"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Adding GroupVersion apiregistration.k8s.io v1 to ResourceManager    {"component": "vcluster", "component": "apiserver", "location": "handler.go:285"}
syncer 2025-11-05 06:09:44    INFO    commandwriter/commandwriter.go:128    Skipping API apiregistration.k8s.io/v1beta1 because it has no resources.    {"component": "vcluster", "component": "apiserver", "location": "genericapiserver.go:784"}
syncer 2025-11-05 06:09:50    INFO    commandwriter/commandwriter.go:128    Starting controller    {"component": "vcluster", "component": "apiserver", "location": "dynamic_cafile_content.go:161", "name": "client-ca-bundle::/data/pki/client-ca.crt"}
syncer 2025-11-05 06:09:58    INFO    commandwriter/commandwriter.go:128    Starting controller    {"component": "vcluster", "component": "apiserver", "location": "dynamic_cafile_content.go:161", "name": "request-header::/data/pki/front-proxy-ca.crt"}
syncer 2025-11-05 06:10:07    INFO    commandwriter/commandwriter.go:128    Starting controller    {"component": "vcluster", "component": "apiserver", "location": "dynamic_serving_content.go:135", "name": "serving-cert::/data/pki/apiserver.crt::/data/pki/apiserver.key"}
syncer 2025-11-05 06:10:08    INFO    setup/controller_context.go:216    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": net/http: TLS handshake timeout), will retry in 1 seconds    {"component": "vcluster"}
syncer 2025-11-05 06:10:13    INFO    commandwriter/commandwriter.go:128    Starting DynamicServingCertificateController    {"component": "vcluster", "component": "apiserver", "location": "tlsconfig.go:243"}
syncer 2025-11-05 06:10:11    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": net/http: TLS handshake timeout    {"component": "vcluster"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Serving securely on 127.0.0.1:6443    {"component": "vcluster", "component": "apiserver", "location": "secure_serving.go:211"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Starting RemoteAvailability controller    {"component": "vcluster", "component": "apiserver", "location": "remote_available_controller.go:425"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Starting APIServiceRegistrationController    {"component": "vcluster", "component": "apiserver", "location": "apiservice_controller.go:100"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Starting OpenAPI V3 AggregationController    {"component": "vcluster", "component": "apiserver", "location": "controller.go:80"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Starting API Priority and Fairness config controller    {"component": "vcluster", "component": "apiserver", "location": "apf_controller.go:377"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Starting LocalAvailability controller    {"component": "vcluster", "component": "apiserver", "location": "local_available_controller.go:156"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Waiting for caches to sync for LocalAvailability controller    {"component": "vcluster", "component": "apiserver", "location": "cache.go:32"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Waiting for caches to sync for RemoteAvailability controller    {"component": "vcluster", "component": "apiserver", "location": "cache.go:32"}
syncer 2025-11-05 06:10:15    INFO    commandwriter/commandwriter.go:128    Waiting for caches to sync for APIServiceRegistrationController controller    {"component": "vcluster", "component": "apiserver", "location": "cache.go:32"}
syncer 2025-11-05 06:10:28    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:10:02.011080743 +0000 UTC m=+20.427905745) (total time: 11.239760538s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:10:16.984033298Z", "level": "warning", "duration": "11.239760538s"}
syncer 2025-11-05 06:10:37    INFO    commandwriter/commandwriter.go:128    Starting controller    {"component": "vcluster", "component": "apiserver", "location": "dynamic_serving_content.go:135", "name": "aggregator-proxy-cert::/data/pki/front-proxy-client.crt::/data/pki/front-proxy-client.key"}
syncer 2025-11-05 06:10:48    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:10:31.080645981 +0000 UTC m=+49.497471000) (total time: 10.157023203s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:10:45.944090322Z", "level": "warning", "duration": "10.157023203s"}
syncer 2025-11-05 06:11:05    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:10:52.650158539 +0000 UTC m=+71.066983544) (total time: 4.554091132s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:11:02.959826096Z", "level": "info", "duration": "4.554091132s"}
syncer 2025-11-05 06:11:10    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": net/http: TLS handshake timeout    {"component": "vcluster"}
syncer 2025-11-05 06:11:21    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:11:06.70618099 +0000 UTC m=+85.123005996) (total time: 5.518766207s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:11:16.780097726Z", "level": "warning", "duration": "5.518766207s"}
syncer 2025-11-05 06:11:40    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:11:24.078377995 +0000 UTC m=+102.495202997) (total time: 7.857062394s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:11:34.958410776Z", "level": "warning", "duration": "7.857062394s"}
syncer 2025-11-05 06:11:47    INFO    setup/controller_context.go:216    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": net/http: TLS handshake timeout), will retry in 1 seconds    {"component": "vcluster"}
syncer 2025-11-05 06:11:49    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:11:40.957988956 +0000 UTC m=+119.374813957) (total time: 3.314365504s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:11:47.2686089Z", "level": "info", "duration": "3.314365504s"}
syncer 2025-11-05 06:12:02    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": net/http: TLS handshake timeout    {"component": "vcluster"}
syncer 2025-11-05 06:12:14    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:11:54.942563309 +0000 UTC m=+133.359388313) (total time: 9.070889228s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:12:08.885859076Z", "level": "warning", "duration": "9.070889228s"}
syncer 2025-11-05 06:12:40    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:12:17.011144615 +0000 UTC m=+155.427969630) (total time: 10.295057872s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:12:31.898780468Z", "level": "warning", "duration": "10.295057872s"}
syncer 2025-11-05 06:13:00    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:12:42.373764597 +0000 UTC m=+180.790589599) (total time: 8.872722199s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:12:57.074930408Z", "level": "warning", "duration": "8.872722199s"}
syncer 2025-11-05 06:13:11    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": net/http: TLS handshake timeout    {"component": "vcluster"}
syncer 2025-11-05 06:13:18    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:13:03.957335141 +0000 UTC m=+202.374160149) (total time: 5.694177923s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:13:13.963567441Z", "level": "warning", "duration": "5.694177923s"}
syncer 2025-11-05 06:13:36    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:13:21.44011222 +0000 UTC m=+219.856937245) (total time: 5.823595769s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:13:29.598275688Z", "level": "warning", "duration": "5.823595769s"}
syncer 2025-11-05 06:13:57    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": net/http: TLS handshake timeout    {"component": "vcluster"}
syncer 2025-11-05 06:13:54    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:13:36.692154258 +0000 UTC m=+235.108979259) (total time: 7.849208247s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:13:50.179890473Z", "level": "warning", "duration": "7.849208247s"}
syncer 2025-11-05 06:13:59    INFO    setup/controller_context.go:216    couldn't retrieve virtual cluster version (Get "https://127.0.0.1:6443/version": net/http: TLS handshake timeout), will retry in 1 seconds    {"component": "vcluster"}
syncer 2025-11-05 06:14:10    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:13:57.225412075 +0000 UTC m=+255.642237081) (total time: 6.540065908s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:05.888340297Z", "level": "warning", "duration": "6.540065908s"}
syncer 2025-11-05 06:14:10    INFO    rest/request.go:752    Waited before sending request    {"component": "vcluster", "delay": 1.641682522, "reason": "client-side throttling, not priority and fairness", "verb": "GET", "URL": "https://127.0.0.1:6443/readyz"}
syncer 2025-11-05 06:14:21    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:14:09.924401415 +0000 UTC m=+268.341226425) (total time: 3.952061825s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:18.237970675Z", "level": "info", "duration": "3.952061825s"}
syncer 2025-11-05 06:14:30    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:14:22.835266543 +0000 UTC m=+281.252091545) (total time: 2.248010917s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:27.395146625Z", "level": "info", "duration": "2.248010917s"}
syncer 2025-11-05 06:14:40    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:14:32.866728725 +0000 UTC m=+291.283553732) (total time: 2.298743305s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:36.524711715Z", "level": "info", "duration": "2.298743305s"}
syncer 2025-11-05 06:14:45    INFO    commandwriter/commandwriter.go:128    COMPACT compactRev=29736 targetCompactRev=30223 currentRev=31223    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:43.718777261Z", "level": "info"}
syncer 2025-11-05 06:14:45    INFO    commandwriter/commandwriter.go:128    Slow SQL (started: 2025-11-05 06:14:41.30577714 +0000 UTC m=+299.722602159) (total time: 1.334983048s):  SELECT ( SELECT MAX(rkv.id) AS id FROM kine AS rkv), ( SELECT MAX(crkv.prev_revision) AS prev_revision FROM kine AS crkv WHERE crkv.name = 'compact_rev_key'), kv.id AS theid, kv.name AS thename, kv.created, kv.deleted, kv.create_revision, kv.prev_revision, kv.lease, kv.value, kv.old_value FROM kine AS kv WHERE kv.name LIKE ? AND kv.id > ? ORDER BY kv.id ASC LIMIT 500    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:43.999567059Z", "level": "info", "duration": "1.334983048s"}
syncer 2025-11-05 06:14:45    INFO    commandwriter/commandwriter.go:128    COMPACT deleted 489 rows from 487 revisions in 643.698645ms - compacted to 30223/31223    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:44.463858113Z", "level": "info"}
syncer 2025-11-05 06:14:45    INFO    commandwriter/commandwriter.go:128    COMPACT compacted from 29736 to 30223 in 1 transactions over 2.327s    {"component": "vcluster", "component": "kine", "time": "2025-11-05T06:14:44.55188287Z", "level": "info"}
syncer 2025-11-05 06:14:46    INFO    k8s/k8s.go:391    could not create the request to wait for the api: Get "https://127.0.0.1:6443/readyz": context deadline exceeded    {"component": "vcluster"}
syncer 2025-11-05 06:14:46    INFO    setup/initialize.go:138    Error running k8s: waited until timeout for the api to be up: error waiting for API server: Get "https://127.0.0.1:6443/readyz": context deadline exceeded    {"component": "vcluster"}
stream closed EOF for ilia-test-vcluster/vcluster-0 (syncer)

Seems similar to a previously resolved issue by @FabianKramm
#2770

Occurs on a newly installed vCluster running zero workloads.

What did you expect to happen?

vCluster should operate normally during compaction and not crash

How can we reproduce it (as minimally and precisely as possible)?

Occured on EKS vCluster deployed on EKS multiple times on newly installed vclusters

Anything else we need to know?

No response

Host cluster Kubernetes version

1.32

$ kubectl version
# paste output here

vcluster version

Seen both on 0.31.0-alpha.0 & 0.30.0

VCluster Config


controlPlane:
  advanced:
    cloudControllerManager:
      enabled: true
    defaultImageRegistry: ""
    globalMetadata:
      annotations: {}
    headlessService:
      annotations: {}
      labels: {}
    konnectivity:
      agent:
        enabled: true
        extraArgs: []
        extraEnv: []
        image: ""
        imagePullPolicy: ""
        nodeSelector: {}
        replicas: 1
        tolerations: []
      server:
        enabled: true
        extraArgs: []
    kubeVip:
      enabled: false
    registry:
      anonymousPull: true
      config: {}
      enabled: false
    serviceAccount:
      annotations: {}
      enabled: true
      imagePullSecrets: []
      labels: {}
      name: ""
    virtualScheduler:
      enabled: false
    workloadServiceAccount:
      annotations: {}
      enabled: true
      imagePullSecrets: []
      labels: {}
      name: ""
  backingStore:
    database:
      embedded:
        enabled: false
        extraArgs: []
      external:
        caFile: ""
        certFile: ""
        connector: ""
        dataSource: ""
        enabled: false
        extraArgs: []
        identityProvider: ""
        keyFile: ""
    etcd:
      deploy:
        enabled: false
        headlessService:
          annotations: {}
        service:
          annotations: {}
          enabled: true
        statefulSet:
          annotations: {}
          enableServiceLinks: true
          enabled: true
          env: []
          extraArgs: []
          highAvailability:
            replicas: 1
          image:
            registry: registry.k8s.io
            repository: etcd
            tag: 3.6.4-0
          imagePullPolicy: ""
          labels: {}
          persistence:
            addVolumeMounts: []
            addVolumes: []
            volumeClaim:
              accessModes:
              - ReadWriteOnce
              enabled: true
              retentionPolicy: Retain
              size: 5Gi
              storageClass: ""
            volumeClaimTemplates: []
          pods:
            annotations: {}
            labels: {}
          resources:
            requests:
              cpu: 20m
              memory: 150Mi
          scheduling:
            affinity: {}
            nodeSelector: {}
            podManagementPolicy: Parallel
            priorityClassName: ""
            tolerations: []
            topologySpreadConstraints: []
          security:
            containerSecurityContext: {}
            podSecurityContext: {}
      embedded:
        enabled: false
        extraArgs: []
        migrateFromDeployedEtcd: false
      external:
        enabled: false
        endpoint: ""
        tls:
          caFile: ""
          certFile: ""
          keyFile: ""
  coredns:
    deployment:
      affinity: {}
      annotations: {}
      image: ""
      labels: {}
      nodeSelector: {}
      pods:
        annotations: {}
        labels: {}
      replicas: 1
      resources:
        limits:
          cpu: 1000m
          memory: 170Mi
        requests:
          cpu: 20m
          memory: 64Mi
      tolerations: []
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            k8s-app: vcluster-kube-dns
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
    embedded: false
    enabled: true
    overwriteConfig: ""
    overwriteManifests: ""
    priorityClassName: ""
    security:
      containerSecurityContext: {}
      podSecurityContext: {}
    service:
      annotations: {}
      labels: {}
      spec:
        type: ClusterIP
  distro:
    k3s:
      command: []
      enabled: false
      extraArgs: []
      image:
        registry: ""
        repository: rancher/k3s
        tag: v1.34.1-k3s1
      imagePullPolicy: ""
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 40m
          memory: 64Mi
      securityContext: {}
    k8s:
      apiServer:
        command: []
        enabled: true
        extraArgs: []
      controllerManager:
        command: []
        enabled: true
        extraArgs: []
      enabled: true
      env: []
      image:
        registry: ghcr.io
        repository: loft-sh/kubernetes
        tag: v1.34.0
      imagePullPolicy: ""
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 40m
          memory: 64Mi
      scheduler:
        command: []
        enabled: false
        extraArgs: []
      securityContext: {}
      version: ""
  ingress:
    annotations:
      nginx.ingress.kubernetes.io/backend-protocol: HTTPS
      nginx.ingress.kubernetes.io/ssl-passthrough: "true"
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    enabled: false
    host: my-host.com
    labels: {}
    pathType: ImplementationSpecific
    spec:
      tls: []
  proxy:
    bindAddress: 0.0.0.0
    extraSANs: []
    port: 8443
  service:
    annotations: {}
    enabled: true
    httpsNodePort: 0
    kubeletNodePort: 0
    labels: {}
    spec:
      type: ClusterIP
  serviceMonitor:
    annotations: {}
    enabled: false
    labels: {}
  standalone:
    dataDir: /var/lib/vcluster
    joinNode:
      containerd:
        enabled: true
      enabled: true
  statefulSet:
    annotations: {}
    args: []
    command: []
    enableServiceLinks: true
    env: []
    highAvailability:
      leaseDuration: 60
      renewDeadline: 40
      replicas: 1
      retryPeriod: 15
    image:
      registry: ghcr.io
      repository: loft-sh/vcluster-pro
      tag: ""
    imagePullPolicy: ""
    initContainers: []
    labels: {}
    persistence:
      addVolumeMounts: []
      addVolumes: []
      binariesVolume:
      - emptyDir: {}
        name: binaries
      dataVolume: []
      volumeClaim:
        accessModes:
        - ReadWriteOnce
        enabled: auto
        retentionPolicy: Retain
        size: 5Gi
        storageClass: ""
      volumeClaimTemplates: []
    pods:
      annotations: {}
      labels: {}
    probes:
      livenessProbe:
        enabled: true
        failureThreshold: 60
        initialDelaySeconds: 60
        periodSeconds: 2
        timeoutSeconds: 3
      readinessProbe:
        enabled: true
        failureThreshold: 60
        periodSeconds: 2
        timeoutSeconds: 3
      startupProbe:
        enabled: true
        failureThreshold: 300
        periodSeconds: 6
        timeoutSeconds: 3
    resources:
      limits:
        ephemeral-storage: 10Gi
        memory: 4Gi
      requests:
        cpu: 200m
        ephemeral-storage: 1Gi
        memory: 256Mi
    scheduling:
      affinity: {}
      nodeSelector: {}
      podManagementPolicy: Parallel
      priorityClassName: ""
      tolerations: []
      topologySpreadConstraints: []
    security:
      containerSecurityContext:
        allowPrivilegeEscalation: false
        runAsGroup: 0
        runAsUser: 0
      podSecurityContext: {}
    sidecarContainers: []
    workingDir: ""
deploy:
  cni:
    flannel:
      enabled: true
  ingressNginx:
    defaultIngressClass: true
    enabled: false
  kubeProxy:
    config: {}
    enabled: true
    extraArgs: []
    extraEnv: []
    image: ""
    imagePullPolicy: ""
    nodeSelector: {}
    tolerations: []
  localPathProvisioner:
    enabled: true
  metallb:
    enabled: false
    ipAddressPool:
      addresses: []
      l2Advertisement: true
  metricsServer:
    enabled: false
  volumeSnapshotController:
    enabled: false
experimental:
  deploy:
    host:
      manifests: ""
      manifestsTemplate: ""
    vcluster:
      helm: []
      manifests: []
      manifestsTemplate: ""
  syncSettings:
    setOwner: true
exportKubeConfig:
  context: ""
  insecure: true
  secret:
    name: ""
    namespace: ""
  server: 'https://vcluster.ilia-test-vcluster.svc.cluster.local'
  serviceAccount:
    clusterRole: cluster-admin
    name: cluster-admin
    namespace: ""
external: {}

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions