Upgrading

How to upgrade the Electric sync engine with minimal disruption using rolling deployments. This guide covers two deployment scenarios: shared storage (recommended) and separate storage for ephemeral environments.

Before reading this guide, make sure you're familiar with the Deployment guide for general setup.

Overview

Electric is designed to run as a single active instance per replication stream. It uses a PostgreSQL advisory lock — a cooperative lock used for application-level coordination that does not lock any tables or rows — to ensure only one instance actively replicates from Postgres at a time.

When you deploy a new version:

  1. The new instance starts and loads shape metadata from storage
  2. While the old instance holds the lock, the new instance enters read-only mode — it can serve requests for existing shapes but cannot create new ones
  3. Once the old instance shuts down, its database connection drops and the lock is released
  4. The new instance acquires the lock and becomes fully active
Time ────────────────────────────────────────────►

Old    [==== active (200) ====]--shutdown--X
                                lock released─┐
New       [starting][waiting (202)]───────────┴─[== active ==]
              │           │                         │
          loading    serves existing          fully operational
          metadata   shapes (read-only)

The read-only window is typically brief — a few seconds to under a minute, depending on how quickly your orchestrator terminates the old instance. During this window, existing shapes continue to be served. Requests for new shapes return 503 with a Retry-After header until the new instance becomes active. The official TypeScript client handles both of these automatically.
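
You can observe this window directly with curl. The sketch below is illustrative only: it assumes Electric is reachable on localhost:3000, uses hypothetical items and new_table tables, and omits any secret/auth parameters; see the HTTP API docs for the exact request format.

shell
# During the read-only window, a shape that already exists is served normally
curl -i "http://localhost:3000/v1/shape?table=items&offset=-1"
# expected: HTTP/1.1 200 OK

# A shape that does not exist yet cannot be created until the instance is active
curl -i "http://localhost:3000/v1/shape?table=new_table&offset=-1"
# expected: HTTP/1.1 503 Service Unavailable
# expected: retry-after: 5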

Version compatibility

Shape handle stability across deploys depends on Electric's internal shape identity computation not changing between versions. If a new version changes how shapes are identified or changes the storage schema, even shared-storage upgrades may trigger 409 (must-refetch) responses. Check the release notes for any such breaking changes before upgrading.

Choosing a strategy

|                          | Shared storage                       | Separate storage                   |
| ------------------------ | ------------------------------------ | ---------------------------------- |
| Client disruption        | Minimal (new shapes briefly delayed) | 409s (clients must refetch shapes) |
| Sticky sessions required | No                                   | Yes                                |
| Postgres overhead        | Single slot                          | One slot per instance              |
| Best for                 | Most deployments                     | Ephemeral environments             |

How the advisory lock works

The advisory lock is tied to the replication slot name:

sql
SELECT pg_advisory_lock(hashtext('electric_slot_{stream_id}'))

This lock is scoped to Electric's replication slot name and does not conflict with any other advisory locks or table-level locks in your database.

  • Only one instance can hold the lock per ELECTRIC_REPLICATION_STREAM_ID
  • The lock is held on the replication database connection — if the connection drops (e.g., instance shutdown), the lock is automatically released
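
You can watch the handover from the Postgres side by inspecting the advisory lock and the replication slot. This is an illustrative check, assuming you have psql access via the same DATABASE_URL:

shell
# Which backend currently holds an advisory lock, and whether it was granted
psql "$DATABASE_URL" -c \
  "SELECT pid, granted FROM pg_locks WHERE locktype = 'advisory';"

# Whether Electric's replication slot is actively being consumed, and by which backend
psql "$DATABASE_URL" -c \
  "SELECT slot_name, active, active_pid FROM pg_replication_slots WHERE slot_name LIKE 'electric_slot_%';"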

Lock breaker

Electric includes a lock breaker mechanism that checks every 10 seconds whether the replication slot associated with the lock is inactive in Postgres. If the slot is inactive but a backend still holds the advisory lock, Electric terminates that backend. This only affects connections where the replication stream has already stopped, so it will not interfere with a healthy instance during a normal rolling deploy.

Health check behavior during upgrades

The /v1/health endpoint reflects the instance's current state:

| HTTP Status | Response                 | Meaning                                                                                      |
| ----------- | ------------------------ | -------------------------------------------------------------------------------------------- |
| 200         | {"status": "active"}     | The instance is active — it holds the advisory lock and is fully operational                  |
| 202         | {"status": "waiting"}    | The instance is ready — it can serve existing shapes in read-only mode but is not yet active  |
| 202         | {"status": "starting"}   | The instance is starting up and not yet ready to serve any requests                           |

During the waiting state:

  • Requests for existing shapes are served normally (read-only mode)
  • Requests that require creating new shapes return 503 with a Retry-After: 5 header
  • Shape deletion also requires active mode and returns 503 while waiting

For orchestrator probe configuration, see the health check section below.
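
To watch the state transition during a deploy, you can simply poll the endpoint with curl (assuming Electric is reachable on localhost:3000):

shell
# Print the JSON body and HTTP status code every couple of seconds
while true; do
  curl -s -w " (%{http_code})\n" http://localhost:3000/v1/health
  sleep 2
done
# expected while the old instance holds the lock:  {"status":"waiting"} (202)
# expected once the lock is acquired:              {"status":"active"} (200)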

Shared storage (recommended)

When instances share the same filesystem (e.g., a persistent volume), they share shape data and metadata. This is the recommended approach because shape handles remain stable across deploys — clients don't need sticky sessions and experience minimal disruption.

When to use

  • Kubernetes with ReadWriteMany PersistentVolumeClaims
  • AWS ECS on EC2 with shared host volumes (use placement constraints to keep tasks on the same host)
  • Any platform where both instances can access the same filesystem

Network filesystems and performance

Electric is IO-intensive — it reads and writes shape logs and metadata frequently. Network filesystems like EFS or NFS add significant latency compared to local storage and may not perform well for large deployments. Prefer local volumes (e.g., NVMe SSDs on EC2 with host bind mounts) where possible. If you must use a network filesystem, see the troubleshooting guide for important SQLite configuration.

Configuration

Both instances use identical configuration. The key requirement is that ELECTRIC_STORAGE_DIR points to a shared filesystem:

shell
DATABASE_URL=postgresql://user:password@host:5432/mydb
ELECTRIC_STORAGE_DIR=/shared/electric/data
ELECTRIC_SECRET=your-secret

`ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE` for shared storage

When using a network filesystem (NFS, EFS) for shared storage, you must set ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE=true. This configures SQLite to use a single read-write connection, preventing corruption from concurrent access — SQLite's default WAL mode relies on shared-memory locking that does not work correctly on network filesystems. For local shared volumes (e.g., a K8s PVC backed by local SSD), this setting is not strictly required but is recommended as a safe default. It is included in all shared-storage examples below.

Docker Compose example

This example demonstrates the shared-storage setup. In practice, your orchestrator handles starting and stopping instances during an upgrade.

yaml
services:
  electric:
    image: electricsql/electric:0.9  # pin to a specific version
    environment:
      DATABASE_URL: ${DATABASE_URL}
      ELECTRIC_STORAGE_DIR: /var/lib/electric/data
      ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE: "true"
      ELECTRIC_SECRET: ${ELECTRIC_SECRET}
    volumes:
      - electric_data:/var/lib/electric/data
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:3000/v1/health"]
      interval: 10s
      timeout: 2s
      retries: 3
    # ...ports, networks, etc.

volumes:
  electric_data:

Simulating a rolling deploy

To test the lock handover locally, start a second container pointing at the same volume, then stop the first. Note that docker compose --scale requires removing static port mappings or using a port range to avoid conflicts.
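
A minimal sketch of that local test, assuming the static port mapping has been removed, that curl is available inside the image (as in the healthcheck above), and that Compose names the original container with the -1 suffix:

shell
# Start a second instance against the same volume
docker compose up -d --no-recreate --scale electric=2

# The newer instance should report "waiting" while the original holds the lock
docker compose exec --index 2 electric curl -s http://localhost:3000/v1/health

# Stop the original container to release the advisory lock (adjust if it picks the wrong one)
docker stop "$(docker compose ps -q electric | head -n 1)"

# The remaining instance should now report "active"
docker compose exec --index 2 electric curl -s http://localhost:3000/v1/health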

Kubernetes example

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: electric
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    # ...labels, selectors
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: electric
          image: electricsql/electric:0.9  # pin to a specific version
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: database-url
            - name: ELECTRIC_STORAGE_DIR
              value: "/var/lib/electric/data"
            - name: ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE
              value: "true"
            - name: ELECTRIC_SECRET
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: electric-secret
          volumeMounts:
            - name: electric-storage
              mountPath: /var/lib/electric/data
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            # ...limits
          livenessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 6
          readinessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
      volumes:
        - name: electric-storage
          persistentVolumeClaim:
            claimName: electric-shared-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: electric-shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  # storageClassName: efs-sc  # use a storage class that supports RWX
  resources:
    requests:
      storage: 10Gi

With maxSurge: 1 and maxUnavailable: 0, the rollout proceeds as follows:

  1. Kubernetes starts a new pod alongside the existing one
  2. The new pod enters read-only mode (202 "waiting") and passes the readiness probe (any 2xx counts as success)
  3. Kubernetes terminates the old pod
  4. The old pod shuts down, releasing the advisory lock
  5. The new pod acquires the lock and becomes fully active (200)
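
To roll out a new version and watch the handover, something like the following works. The app=electric label is an assumption (the template above elides labels), so use whatever labels your Deployment actually sets:

shell
# Update the image and follow the rollout
kubectl set image deployment/electric electric=electricsql/electric:<new-version>
kubectl rollout status deployment/electric

# Watch pods: the new pod stays Ready (202) until the old pod terminates
kubectl get pods -l app=electric -w

# If something goes wrong, roll back to the previous ReplicaSet
kubectl rollout undo deployment/electric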

AWS ECS example

This example uses EC2 launch type with a host bind mount for shared storage. Both old and new tasks share the same directory on the EC2 host.

Same-host placement

ECS does not guarantee that the new task lands on the same host as the old one. To ensure both tasks share the same host volume, either run the service on a cluster with exactly one EC2 instance matching your placement constraint, or use a custom instance attribute to pin tasks to a specific host.

json
{
  "family": "electric",
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::...:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "electric",
      "image": "electricsql/electric:0.9",
      "portMappings": [
        { "containerPort": 3000, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "ELECTRIC_STORAGE_DIR", "value": "/var/lib/electric/data" },
        { "name": "ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE", "value": "true" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:..."
        },
        {
          "name": "ELECTRIC_SECRET",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ],
      "mountPoints": [
        { "sourceVolume": "electric-data", "containerPath": "/var/lib/electric/data" }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -sf http://localhost:3000/v1/health || exit 1"],
        "interval": 10,
        "timeout": 2,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ],
  "volumes": [
    {
      "name": "electric-data",
      "host": { "sourcePath": "/var/lib/electric/data" }
    }
  ]
}

Configure your ECS service for rolling upgrades:

json
{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}

This ensures ECS starts the new task before stopping the old one, allowing the advisory lock handover to occur. Set the health check grace period on your ECS service to 60–90 seconds to allow time for the new task to acquire the advisory lock.
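
With the AWS CLI, a deploy that exercises this configuration looks roughly like the sketch below. The cluster and service names, and the task definition file path, are placeholders for your own:

shell
# Register the new task definition revision
aws ecs register-task-definition --cli-input-json file://electric-task-def.json

# Point the service at the latest revision and set the health check grace period
aws ecs update-service \
  --cluster my-cluster \
  --service electric \
  --task-definition electric \
  --health-check-grace-period-seconds 90

# Wait for the new task to become healthy and the old one to drain
aws ecs wait services-stable --cluster my-cluster --services electric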

Health checks must accept HTTP 202

Your orchestrator's health or readiness check must accept 202 responses during upgrades. If it only considers 200 as healthy, the new instance can never become ready while the old instance holds the lock — creating a deadlock where the orchestrator waits for the new instance before terminating the old one.

Both Kubernetes httpGet probes and ECS health checks using curl -sf accept any 2xx by default, which is the correct behavior for rolling upgrades.

Single-instance readiness probes

The Deployment guide recommends an exec readiness probe that checks for exactly HTTP 200. That approach is correct for single-instance deployments where you don't want a starting instance to receive traffic, but it will deadlock during rolling upgrades. If you are performing rolling upgrades, use httpGet readiness probes as shown in the examples above.
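
For reference, a strict check that treats only 200 as healthy looks like the sketch below. Use it only for single-instance deployments that never roll:

shell
# Fails unless the instance is fully active (exactly HTTP 200)
test "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/v1/health)" = "200"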

Separate storage (ephemeral)

When shared storage is not available (e.g., ECS with ephemeral block storage, containers with local-only disks), each instance needs its own replication slot and maintains its own shape data independently. This means each instance has different shape handles for the same shape definitions, so clients must use sticky sessions and will receive 409 (must-refetch) responses when they switch between instances during a deploy.

The platform examples from the shared storage section above apply — just remove the shared volume mount and use the configuration shown here.

There are two ways to manage the per-instance replication slots:

Temporary replication slots

Use temporary replication slots that are automatically cleaned up when the connection closes. This is the simplest approach for ephemeral storage and avoids accumulating orphaned slots.

shell
CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN=true
ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME=true
ELECTRIC_STORAGE_DIR=/local/electric/data

The random name option avoids replication slot name conflicts when old and new instances briefly overlap during a rolling upgrade.

With this configuration:

  • Electric creates a TEMPORARY replication slot on the database connection
  • The slot is automatically dropped by Postgres when the connection closes (on clean shutdown or crash)
  • The new instance creates a fresh temporary slot and starts replicating
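
You can confirm this from Postgres: temporary slots show temporary = true and disappear when the owning connection closes (illustrative, assuming psql access):

shell
psql "$DATABASE_URL" -c \
  "SELECT slot_name, temporary, active FROM pg_replication_slots;"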

Network partitions cause shape rotations

If Electric crashes or loses its database connection unexpectedly, the temporary slot is eventually cleaned up by Postgres once it detects the dead connection (which depends on TCP keepalive settings and may take minutes). When the new instance starts with a fresh slot, all existing shapes are invalidated and clients receive 409 (must-refetch) responses requiring a full resync. See Replication slot recreation in the troubleshooting guide for more details.

See the config reference for CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN and ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME.

Separate replication stream IDs

Alternatively, give each concurrent instance its own ELECTRIC_REPLICATION_STREAM_ID. This creates named replication slots that persist, giving you more explicit control. This is different from sharding, where separate stream IDs are used for instances connecting to different databases — here, both instances connect to the same database.

shell
# Instance A (e.g., blue deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-blue
ELECTRIC_STORAGE_DIR=/local/electric/data

# Instance B (e.g., green deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-green
ELECTRIC_STORAGE_DIR=/local/electric/data

Postgres resource overhead

Each replication stream ID creates its own replication slot and publication. Multiple replication slots increase WAL retention on Postgres since each slot independently prevents WAL from being cleaned up.

Monitor your replication slots as described in the Troubleshooting guide. Clean up unused slots promptly when old instances are fully decommissioned.
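
For example, the following query (illustrative, assuming psql access) lists Electric's slots together with how much WAL each one is currently retaining:

shell
psql "$DATABASE_URL" -c \
  "SELECT slot_name, active,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
   FROM pg_replication_slots
   WHERE slot_name LIKE 'electric_slot_%';"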

When the old deployment is fully stopped, clean up its replication slot and publication in Postgres. The names follow the pattern electric_slot_{stream_id} and electric_publication_{stream_id}:

sql
SELECT pg_drop_replication_slot('electric_slot_deploy_blue');
DROP PUBLICATION IF EXISTS electric_publication_deploy_blue;

Client behavior during deploys

The official TypeScript client handles deploy transitions automatically:

  • 503 with Retry-After header: The client backs off and retries. This happens when requesting new shapes during the read-only window.
  • 409 (must-refetch): The client refetches the shape from scratch. This happens with separate-storage strategies or when shapes are rotated.
  • Long-poll connections: Existing long-poll connections on active shapes continue working normally during the read-only window.

If you're using a custom client, ensure it handles these response codes. See the HTTP API docs for details on the protocol.

Next steps