# Upgrading
How to upgrade the Electric sync engine with minimal disruption using rolling deployments. This guide covers two deployment scenarios: shared storage (recommended) and separate storage for ephemeral environments.
Before reading this guide, make sure you're familiar with the Deployment guide for general setup.
## Overview
Electric is designed to run as a single active instance per replication stream. It uses a PostgreSQL advisory lock — a cooperative lock used for application-level coordination that does not lock any tables or rows — to ensure only one instance actively replicates from Postgres at a time.
When you deploy a new version:
- The new instance starts and loads shape metadata from storage
- While the old instance holds the lock, the new instance enters read-only mode — it can serve requests for existing shapes but cannot create new ones
- Once the old instance shuts down, its database connection drops and the lock is released
- The new instance acquires the lock and becomes fully active
```
Time ────────────────────────────────────────────►

Old [==== active (200) ====]--shutdown--X
                           lock released─┐
New [starting][waiting (202)]───────────┴─[== active ==]
    │         │                          │
    loading   serves existing            fully operational
    metadata  shapes (read-only)
```

The read-only window is typically brief — a few seconds to under a minute, depending on how quickly your orchestrator terminates the old instance. During this window, existing shapes continue to be served. Requests for new shapes return `503` with a `Retry-After` header until the new instance becomes active. The official TypeScript client handles both of these automatically.
## Version compatibility

Shape handle stability across deploys depends on Electric's internal shape identity computation not changing between versions. If a new version changes how shapes are identified or changes the storage schema, even shared-storage upgrades may trigger `409` (must-refetch) responses. Check the release notes for any such breaking changes before upgrading.
## Choosing a strategy
| | Shared storage | Separate storage |
|---|---|---|
| Client disruption | Minimal (new shapes briefly delayed) | 409s (clients must refetch shapes) |
| Sticky sessions required | No | Yes |
| Postgres overhead | Single slot | One slot per instance |
| Best for | Most deployments | Ephemeral environments |
## How the advisory lock works
The advisory lock is tied to the replication slot name:

```sql
SELECT pg_advisory_lock(hashtext('electric_slot_{stream_id}'))
```

This lock is scoped to Electric's replication slot name and does not conflict with any other advisory locks or table-level locks in your database.
- Only one instance can hold the lock per `ELECTRIC_REPLICATION_STREAM_ID`
- The lock is held on the replication database connection — if the connection drops (e.g., instance shutdown), the lock is automatically released
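To see which backend currently holds the lock, you can query `pg_locks`. This is a sketch: `psql` and a `$DATABASE_URL` variable are assumptions, and the filter matches all advisory locks in the database, not only Electric's.

```shell
# List advisory-lock holders. In a typical setup the only advisory lock
# present will be Electric's replication lock.
psql "$DATABASE_URL" -c "
  SELECT l.pid, a.application_name, l.granted
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.locktype = 'advisory';"
```

During a rolling deploy you will briefly see two rows: the old instance with `granted = t` and the new instance waiting.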
### Lock breaker
Electric includes a lock breaker mechanism that checks every 10 seconds whether the replication slot associated with the lock is inactive in Postgres. If the slot is inactive but a backend still holds the advisory lock, Electric terminates that backend. This only affects connections where the replication stream has already stopped, so it will not interfere with a healthy instance during a normal rolling deploy.
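You can observe the same signal the lock breaker uses by checking slot activity in Postgres. A sketch, assuming `psql` and `$DATABASE_URL`:

```shell
# An inactive slot (active = f) whose advisory lock is still held is the
# condition the lock breaker acts on.
psql "$DATABASE_URL" -c "
  SELECT slot_name, active, active_pid
  FROM pg_replication_slots
  WHERE slot_name LIKE 'electric_slot_%';"
```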
## Health check behavior during upgrades

The `/v1/health` endpoint reflects the instance's current state:
| HTTP Status | Response | Meaning |
|---|---|---|
| `200` | `{"status": "active"}` | The instance is active — it holds the advisory lock and is fully operational |
| `202` | `{"status": "waiting"}` | The instance is ready — it can serve existing shapes in read-only mode but is not yet active |
| `202` | `{"status": "starting"}` | The instance is starting up and not yet ready to serve any requests |
During the waiting state:
- Requests for existing shapes are served normally (read-only mode)
- Requests that require creating new shapes return `503` with a `Retry-After: 5` header
- Shape deletion also requires active mode and returns `503` while waiting
For orchestrator probe configuration, see the health check section below.
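For a quick manual check of which state an instance is in, curl the endpoint and print the body together with the status code (the localhost URL is an assumption — substitute your instance's address):

```shell
# Prints the JSON status body followed by the HTTP status code,
# e.g. a waiting instance shows {"status":"waiting"} and 202.
curl -s -w '\n%{http_code}\n' http://localhost:3000/v1/health
```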
## Shared storage (recommended)
When instances share the same filesystem (e.g., a persistent volume), they share shape data and metadata. This is the recommended approach because shape handles remain stable across deploys — clients don't need sticky sessions and experience minimal disruption.
### When to use
- Kubernetes with ReadWriteMany PersistentVolumeClaims
- AWS ECS on EC2 with shared host volumes (use placement constraints to keep tasks on the same host)
- Any platform where both instances can access the same filesystem
> **Network filesystems and performance**: Electric is IO-intensive — it reads and writes shape logs and metadata frequently. Network filesystems like EFS or NFS add significant latency compared to local storage and may not perform well for large deployments. Prefer local volumes (e.g., NVMe SSDs on EC2 with host bind mounts) where possible. If you must use a network filesystem, see the troubleshooting guide for important SQLite configuration.
### Configuration
Both instances use identical configuration. The key requirement is that `ELECTRIC_STORAGE_DIR` points to a shared filesystem:

```shell
DATABASE_URL=postgresql://user:password@host:5432/mydb
ELECTRIC_STORAGE_DIR=/shared/electric/data
ELECTRIC_SECRET=your-secret
```

> **`ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE` for shared storage**: When using a network filesystem (NFS, EFS) for shared storage, you must set `ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE=true`. This configures SQLite to use a single read-write connection, preventing corruption from concurrent access — SQLite's default WAL mode relies on shared-memory locking that does not work correctly on network filesystems. For local shared volumes (e.g., a K8s PVC backed by local SSD), this setting is not strictly required but is recommended as a safe default. It is included in all shared-storage examples below.
### Docker Compose example
This example demonstrates the shared-storage setup. In practice, your orchestrator handles starting and stopping instances during an upgrade.
```yaml
services:
  electric:
    image: electricsql/electric:0.9 # pin to a specific version
    environment:
      DATABASE_URL: ${DATABASE_URL}
      ELECTRIC_STORAGE_DIR: /var/lib/electric/data
      ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE: "true"
      ELECTRIC_SECRET: ${ELECTRIC_SECRET}
    volumes:
      - electric_data:/var/lib/electric/data
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:3000/v1/health"]
      interval: 10s
      timeout: 2s
      retries: 3
    # ...ports, networks, etc.

volumes:
  electric_data:
```

> **Simulating a rolling deploy**: To test the lock handover locally, start a second container pointing at the same volume, then stop the first. Note that `docker compose --scale` requires removing static port mappings or using a port range to avoid conflicts.
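Under those assumptions (no static host port mapping), the handover test looks roughly like this:

```shell
# Scale the service above to two replicas without recreating the first.
docker compose up -d --no-recreate --scale electric=2

# Both containers are listed; check each one's /v1/health to see which
# is active (200) and which is waiting (202).
docker compose ps electric

# Stopping the active container releases the advisory lock; the waiting
# container then flips to active. Set ACTIVE to the active container's
# name from the ps output above.
docker stop "$ACTIVE"
```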
### Kubernetes example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: electric
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    # ...labels, selectors
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: electric
          image: electricsql/electric:0.9 # pin to a specific version
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: database-url
            - name: ELECTRIC_STORAGE_DIR
              value: "/var/lib/electric/data"
            - name: ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE
              value: "true"
            - name: ELECTRIC_SECRET
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: electric-secret
          volumeMounts:
            - name: electric-storage
              mountPath: /var/lib/electric/data
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            # ...limits
          livenessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 6
          readinessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
      volumes:
        - name: electric-storage
          persistentVolumeClaim:
            claimName: electric-shared-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: electric-shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  # storageClassName: efs-sc # use a storage class that supports RWX
  resources:
    requests:
      storage: 10Gi
```

With `maxSurge: 1` and `maxUnavailable: 0`, Kubernetes will:

- Start a new pod alongside the existing one
- The new pod enters read-only mode (`202` `"waiting"`) and passes the readiness probe (any 2xx)
- Kubernetes terminates the old pod
- The old pod shuts down, releasing the advisory lock
- The new pod acquires the lock and becomes fully active (`200`)
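To drive and observe the handover, something like the following works with `kubectl` (the new image tag and the `app=electric` label are assumptions about your manifests):

```shell
# Trigger a rolling update to a new (hypothetical) version tag.
kubectl set image deployment/electric electric=electricsql/electric:0.9.1

# Watch the surge pod become Ready on 202, then the old pod terminate.
kubectl get pods -l app=electric -w

# Blocks until the new pod is fully rolled out.
kubectl rollout status deployment/electric --timeout=120s
```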
### AWS ECS example
This example uses EC2 launch type with a host bind mount for shared storage. Both old and new tasks share the same directory on the EC2 host.
> **Same-host placement**: ECS does not guarantee that the new task lands on the same host as the old one. To ensure both tasks share the same host volume, your ECS cluster must have exactly one EC2 instance matching your placement constraint, or use a custom instance attribute to pin tasks to a specific host.
```json
{
  "family": "electric",
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::...:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "electric",
      "image": "electricsql/electric:0.9",
      "portMappings": [
        { "containerPort": 3000, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "ELECTRIC_STORAGE_DIR", "value": "/var/lib/electric/data" },
        { "name": "ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE", "value": "true" }
      ],
      "secrets": [
        { "name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:..." },
        { "name": "ELECTRIC_SECRET", "valueFrom": "arn:aws:secretsmanager:..." }
      ],
      "mountPoints": [
        { "sourceVolume": "electric-data", "containerPath": "/var/lib/electric/data" }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -sf http://localhost:3000/v1/health || exit 1"],
        "interval": 10,
        "timeout": 2,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ],
  "volumes": [
    {
      "name": "electric-data",
      "host": { "sourcePath": "/var/lib/electric/data" }
    }
  ]
}
```

Configure your ECS service for rolling upgrades:

```json
{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}
```

This ensures ECS starts the new task before stopping the old one, allowing the advisory lock handover to occur. Set the health check grace period on your ECS service to 60–90 seconds to allow time for the new task to acquire the advisory lock.
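The grace period can be set when rolling the service; a sketch with the AWS CLI, where the cluster and service names are assumptions:

```shell
# Roll the service onto the latest task definition revision with a
# 90-second health check grace period.
aws ecs update-service \
  --cluster my-cluster \
  --service electric \
  --health-check-grace-period-seconds 90 \
  --force-new-deployment
```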
### Health checks must accept HTTP 202

Your orchestrator's health or readiness check must accept `202` responses during upgrades. If it only considers `200` as healthy, the new instance can never become ready while the old instance holds the lock — creating a deadlock where the orchestrator waits for the new instance before terminating the old one.

Both Kubernetes `httpGet` probes and ECS health checks using `curl -sf` accept any 2xx by default, which is the correct behavior for rolling upgrades.
### Single-instance readiness probes

The Deployment guide recommends an `exec` readiness probe that checks for exactly HTTP 200. That approach is correct for single-instance deployments where you don't want a starting instance to receive traffic, but it will deadlock during rolling upgrades. If you are performing rolling upgrades, use `httpGet` readiness probes as shown in the examples above.
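For contrast, the strict single-instance check is roughly the following one-liner; it fails on `202`, which is exactly why it deadlocks a rolling upgrade:

```shell
# Succeeds only on exactly HTTP 200. A "waiting" (202) instance never
# passes, so it never becomes Ready while the old instance holds the lock.
test "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/v1/health)" = "200"
```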
## Separate storage (ephemeral)
When shared storage is not available (e.g., ECS with ephemeral block storage, containers with local-only disks), each instance must have its own replication slot and maintain its own shape data independently. This means each instance has different shape handles for the same shape definitions, so clients must use sticky sessions and will receive `409` (must-refetch) responses when they switch between instances during a deploy.
The platform examples from the shared storage section above apply — just remove the shared volume mount and use the configuration shown here.
There are two ways to manage the per-instance replication slots:
### Temporary replication slots
Use temporary replication slots that are automatically cleaned up when the connection closes. This is the simplest approach for ephemeral storage and avoids accumulating orphaned slots.
```shell
CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN=true
ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME=true
ELECTRIC_STORAGE_DIR=/local/electric/data
```

The random name option avoids replication slot name conflicts when old and new instances briefly overlap during a rolling upgrade.
With this configuration:

- Electric creates a `TEMPORARY` replication slot on the database connection
- The slot is automatically dropped by Postgres when the connection closes (on clean shutdown or crash)
- The new instance creates a fresh temporary slot and starts replicating
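You can confirm the slot is marked temporary in Postgres. A sketch, assuming `psql` and `$DATABASE_URL`:

```shell
# The temporary column is t for slots that Postgres drops automatically
# when the owning connection closes.
psql "$DATABASE_URL" -c "
  SELECT slot_name, temporary, active
  FROM pg_replication_slots;"
```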
> **Network partitions cause shape rotations**: If Electric crashes or loses its database connection unexpectedly, the temporary slot is eventually cleaned up by Postgres once it detects the dead connection (which depends on TCP keepalive settings and may take minutes). When the new instance starts with a fresh slot, all existing shapes are invalidated and clients receive `409` (must-refetch) responses requiring a full resync. See Replication slot recreation in the troubleshooting guide for more details.
See the config reference for `CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN` and `ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME`.
### Separate replication stream IDs

Alternatively, give each concurrent instance its own `ELECTRIC_REPLICATION_STREAM_ID`. This creates named replication slots that persist, giving you more explicit control. This is different from sharding, where separate stream IDs are used for instances connecting to different databases — here, both instances connect to the same database.
```shell
# Instance A (e.g., blue deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-blue
ELECTRIC_STORAGE_DIR=/local/electric/data

# Instance B (e.g., green deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-green
ELECTRIC_STORAGE_DIR=/local/electric/data
```

> **Postgres resource overhead**: Each replication stream ID creates its own replication slot and publication. Multiple replication slots increase WAL retention on Postgres since each slot independently prevents WAL from being cleaned up. Monitor your replication slots as described in the Troubleshooting guide and clean up unused slots promptly when old instances are fully decommissioned.
When the old deployment is fully stopped, clean up its replication slot and publication in Postgres. The names follow the pattern `electric_slot_{stream_id}` and `electric_publication_{stream_id}`:

```sql
SELECT pg_drop_replication_slot('electric_slot_deploy_blue');
DROP PUBLICATION IF EXISTS electric_publication_deploy_blue;
```

## Client behavior during deploys
The official TypeScript client handles deploy transitions automatically:
- `503` with `Retry-After` header: The client backs off and retries. This happens when requesting new shapes during the read-only window.
- `409` (must-refetch): The client refetches the shape from scratch. This happens with separate-storage strategies or when shapes are rotated.
- Long-poll connections: Existing long-poll connections on active shapes continue working normally during the read-only window.
If you're using a custom client, ensure it handles these response codes. See the HTTP API docs for details on the protocol.
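As a sketch of the minimum a custom client needs, the loop below backs off on `503` using `Retry-After` and restarts from `offset=-1` on `409`. It is deliberately simplified: `$SHAPE_URL` (a shape URL including its query string) is an assumption, and shape handles and live mode are ignored.

```shell
offset=-1
while :; do
  code=$(curl -s -D /tmp/hdrs -o /tmp/body -w '%{http_code}' \
    "$SHAPE_URL&offset=$offset")
  case "$code" in
    200) break ;;      # success; a real client pages on with the next offset
    409) offset=-1 ;;  # must-refetch: resync the shape from scratch
    503) # read-only window: honor Retry-After before retrying
         secs=$(awk 'tolower($1)=="retry-after:"{print $2+0}' /tmp/hdrs)
         sleep "${secs:-1}" ;;
    *)   sleep 1 ;;    # transient error: simple fixed backoff
  esac
done
```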
## Next steps
- Deployment guide for general deployment setup
- Sharding guide for multi-database deployment patterns
- Config reference for all configuration options
- Troubleshooting guide for common upgrade issues