Services: Load Balancers, Physical Nodes, Deployments, Database, Logs, Routing (Ingress), Routing (Egress), Routing (Internal), API, Control Panel, Community, DNS (zeit.world), Monitoring, Domain Registration

Regions: ARN1, BOM1, BRU1, CDG1, CHS1, CLE1, DUB1, GRU1, HEL1, HKG1, HND1, ICN1, IAD1, LAX1, LHR1, OMA1, PDX1, SFO1, SIN1, SYD1, TPE1, YUL1, ZRH1

Elevated error rates on deployment creation.

19:16 UTC

All systems normal.

GLOBAL

Elevated error rates on deployment creation.

12:00 UTC

We are observing sporadic errors on deployment creation due to an upstream provider issue.

All

Maintenance on the deployment list endpoint.

15:07 UTC

Performing maintenance on the deployment list endpoint. Some `now list` results might be paginated incorrectly.

All
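
A note on the maintenance above: `now list` (also available as `now ls`) is backed by the ZEIT API's deployment listing endpoint. As a sketch, assuming a valid token in $ZEIT_TOKEN and that the v2 path below is current (the exact path is an assumption, not confirmed by this notice), the raw response can be compared against CLI output if pagination looks off:

    # Query the deployments listing endpoint directly (path assumed, v2 API)
    curl -s -H "Authorization: Bearer $ZEIT_TOKEN" \
      "https://api.zeit.co/v2/now/deployments"

If the raw API response contains deployments that `now list` omits or repeats, the discrepancy is likely the pagination behavior described in the 15:07 UTC update.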

Upstream issues with our service provider, Google Cloud, in LAX and YUL. Rerouting in effect.

22:50 UTC

Google Cloud has resolved their incident and, after evaluating the health of our edge regions hosted on their infrastructure, we have brought YUL and LAX back to the pool of production Now Regions.

19:40 UTC

We are still monitoring the Google Cloud issues. Our users are not affected due to rerouting mitigations. For more information on the upstream issue, see their status report: https://status.cloud.google.com/incident/compute/19003

19:00 UTC

We have identified issues with the Google Cloud infrastructure in the LAX and YUL Now Edge Regions. We have automatically re-routed traffic away from them as a mitigation.

LAX1
YUL1

Increased error rates for lambdas in Now 2.0 for all regions

18:55 UTC

The issue causing increased error rates for Now 2.0 lambdas has been resolved. Continuing to monitor.

18:30 UTC

Seeing increased error rates for Now 2.0 lambdas for a small portion of users. Investigating.

GLOBAL

Outage of ZEIT API (api.zeit.co) in GRU1, IAD1, and HND1

21:05 UTC

A failover has been completed and the issue is mitigated. ZEIT API access is restored. Continuing to monitor.

20:40 UTC

Investigating an outage of the ZEIT API (api.zeit.co) in the GRU1, IAD1, and HND1 regions.

GRU1
IAD1
HND1

Sporadic errors in DNS resolution for the SFO1 region.

21:50 UTC

The sporadic DNS resolution errors (upstream) have been resolved (SFO1). We are monitoring.

21:35 UTC

Investigating sporadic errors in DNS resolution for SFO1.

SFO1

Elevated build times for Now 2.0 deployments in all regions.

20:25 UTC

Build times are normalizing as we accommodate an increase in new builds.

20:15 UTC

Investigating elevated build times for Now 2.0 deployments in all regions.

SFO1
BRU1
IAD1
GRU1

Elevated error rates in Now 2.0 for function invocation.

03:28 UTC

Full function invocation capacity is being restored. Monitoring.

01:46 UTC

Investigating elevated error rates for function invocation with Now 2.0 deployments.

SFO1
BRU1
IAD1
GRU1

Outage affecting customers that use Cloudflare nameservers in the SFO region.

03:30 UTC

The remaining IPv6 routing issues between Cloudflare and ZEIT DNS have been resolved. Continuing to monitor.

22:20 UTC

We are working closely with our DNS provider and Cloudflare on investigating connectivity and peering issues. This only impacts customers who use Cloudflare as their nameservers.

20:26 UTC

Cloudflare is experiencing DNS resolution errors. If you have pointed Cloudflare to `alias.zeit.co` manually, you can move your domain to our nameservers to restore availability.

20:09 UTC

Disabling Cloudflare integrations temporarily to restore availability of deployments in the affected regions.

19:00 UTC

Investigating an outage impacting customers using Cloudflare nameservers in the SFO region.
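
A note on the 20:26 UTC workaround above: the outage only affects domains that are delegated to Cloudflare nameservers. A quick check of a domain's current delegation (example.com is a placeholder) looks like this:

    # Show the nameservers the domain is delegated to
    dig +short NS example.com

    # Cross-check against a public resolver
    dig +short NS example.com @8.8.8.8

Domains returning Cloudflare hosts (*.ns.cloudflare.com) and pointing at `alias.zeit.co` are the ones covered by the suggested move to ZEIT's own nameservers; domains already delegated to the zeit.world DNS service are unaffected.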

Loss of availability for `now.sh` URLs. Does not affect domains.

14:15 UTC

Issue with the upstream provider has been resolved. Continuing to monitor.

13:30 UTC

Experiencing a loss of availability for `now.sh` URLs due to an issue with an upstream provider. Working to resolve.

BRU1
SFO1
IAD1
GRU1

Elevated routing latency and errors

06:27 UTC

Routing availability restored on all regions. Continuing to monitor.

05:59 UTC

A service in charge of managing networking blacklists misbehaved unexpectedly, preventing routing from functioning correctly.

BRU1
SFO1
IAD1
GRU1
A post-mortem for this incident is available.

Intermittent logs availability for Now 2.0 deployments

01:45 UTC

Logging availability restored on all regions for Now 2.0 deployments. Continuing to monitor.

20:30 UTC

Investigating intermittent loss of logs availability for Now 2.0 deployments.

BRU1
SFO1
IAD1
GRU1

Caching issue for Now 1.0 deployments in BRU1

14:35 UTC

Caching restored for Now 1.0 deployments in BRU1. Continuing to monitor.

14:00 UTC

Deploying fix for caching issue for Now 1.0 deployments in BRU1.

10:30 UTC

Caching issue for Now 1.0 deployments in BRU1 occurred.

Elevated errors for deployment build access for Now 2.0

17:10 UTC

Deployment builds in full working order. Continuing to monitor.

15:40 UTC

Investigating deployment build errors for Now 2.0 on BRU1.

BRU1

Elevated errors for deployment access for Now 1.0

19:10 UTC

All errors have resolved. Continuing to monitor.

18:50 UTC

SFO1 errors have resolved. We are continuing to monitor mitigations in BRU1.

15:40 UTC

Mitigations for hardware issues have been deployed. Continuing to monitor and investigate for SFO1 and BRU1.

15:15 UTC

Investigating deployment access errors for Now 1.0 on SFO1 and BRU1.

SFO1
BRU1

Elevated errors for deployment creation for Now 2.0

12:30 UTC

Deployment creation is restored fully in SFO1 for Now 2.0 deployments.

12:00 UTC

Investigating deployment creation errors for Now 2.0 on SFO1.

SFO1

System Capacity Reached in BRU1

11:00 UTC

New server capacity deployed and services restored.

10:30 UTC

System capacity reached in BRU1 due to some nodes failing in the region.

BRU1

Investigating reported instance verification issues

06:00 UTC

Service restored. Continuing to monitor.

05:00 UTC

We are investigating elevated reports of instance verification issues. Existing deployments will remain unaffected. We continue to investigate the situation.

BRU1
SFO1

Issues with an Upstream Domain Provider

12:05 UTC

One of our upstream providers for domains is experiencing issues; this impacts domain purchases. Existing domains will remain unaffected. We continue to monitor the situation.

BRU1
SFO1
IAD1
GRU1

Investigating Issues with an Upstream DNS Provider

11:01 UTC

The availability of `now.sh` links has been restored by our upstream registrar. Continuing to monitor.

10:57 UTC

The availability of `now.sh` links is being restored as our upstream registrar resolves the issue behind the DNS failures.

08:39 UTC

We have identified the issue behind the DNS failure for `now.sh` links (does not affect custom domains) and are working with our upstream provider to resolve it.

08:00 UTC

We are investigating issues with an upstream DNS provider impacting the `now.sh` domain.
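
For anyone verifying the scope described in the 08:39 UTC update, a couple of lookups from a client machine are enough; my-app.now.sh is a hypothetical deployment URL and example.com a placeholder custom domain:

    # Resolution of a now.sh subdomain via a public resolver
    dig +short my-app.now.sh @8.8.8.8

    # A custom domain, which should resolve normally during this incident
    dig +short example.com @8.8.8.8

A failing or empty answer for the now.sh name alongside a normal answer for the custom domain matches the reported impact (now.sh only, custom domains unaffected).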

Investigating Issues with an Upstream Infrastructure Provider

19:44 UTC

The issue with the upstream provider has been resolved. Continuing to monitor.

19:22 UTC

The issue with the upstream provider has reoccurred. We are aware of the cause and working on a solution.

08:00 UTC

The issue impacting one of our global infrastructure providers has been resolved. Full API access has been restored. Live deployments were not affected and we are conducting an in-depth investigation.

06:53 UTC

We are investigating issues with an upstream infrastructure provider impacting API calls.

Elevated Routing Times

10:10 UTC

Service restored. Continuing to monitor.

09:24 UTC

Investigating elevated routing times for all DCs.

BRU1
SFO1

Service degradation for Static Deployments in BRU1

14:42 UTC

Availability for static deployments restored. Continuing to monitor.

14:12 UTC

Investigating reduced availability for static deployments.

BRU1

Service degradation in SFO1

16:33 UTC

Service is fully restored. Continuing to monitor the system.

12:45 UTC

Progress made to restore availability.

06:40 UTC

Restoring API availability for deployment creation and scaling.

06:10 UTC

Progressing towards complete failover.

05:15 UTC

Failed over affected deployments and progress made towards restoring full capacity and reducing routing latency.

05:12 UTC

Investigating degraded availability.

APIs and functionality for registering and adding domains are degraded due to upstream issues

21:45 UTC

Upstream availability restored.

18:13 UTC

We are monitoring the situation.

GLOBAL

APIs for domain pricing and registration are degraded due to upstream issues

15:40 UTC

All systems normal.

15:33 UTC

We are monitoring the situation.

GLOBAL

Elevated error rates in SFO1 load balancers

00:51 UTC

We are currently mitigating another DoS attack against our SFO1 frontend load balancers. We will keep you updated.

00:42 UTC

We are currently investigating elevated error rates in SFO1 load balancers.

SFO1

Elevated error rates in SFO1 as a result of a DoS attack

20:11 UTC

We are seeing elevated error rates in SFO1 as a result of a DoS attack targeting all our load balancers.

20:02 UTC

We are seeing elevated error rates in SFO1 as a result of a DoS attack targeting all our load balancers.

SFO1

Intermittent Failures on the SFO1 Load Balancers

19:08

We continue to observe sizable attacks against our core infrastructure. We are working on deploying further mitigations. We apologize for the inconvenience.

10:02

All systems normal. We will be posting a post-mortem on our deployed mitigations in the coming days.

SFO1

Intermittent networking issues

17:05 UTC

Our upstream provider has stopped reporting networking issues. We are actively monitoring our SFO1 load balancers.

16:51 UTC

One of our infrastructure providers in US-West-1 experienced issues across all availability zones impacting one of our replicated caches. This degraded our responses for a period lasting a few minutes. We are monitoring the situation.

16:38 UTC

We are experiencing intermittent networking issues with one of our US-WEST infrastructure providers, which is causing elevated 500 error rates.

SFO1

Investigating intermittent API degradation

13:02 UTC

All systems normal.

11:06 UTC

The API errors have been mitigated. We're monitoring the situation.

10:19 UTC

We are investigating intermittent API degradation.

SFO1

Increased error rate on dashboard

09:52 UTC

All systems normal.

07:46 UTC

We are investigating an increased error rate on the zeit.co dashboard.

GLOBAL

Increased Error Rate on Deployment Creation

22:31 UTC

Increased error rate on deployment creation.

GLOBAL

We're looking into access issues with deployment logs

4:37 UTC

Our logging infrastructure has been scaled and enhanced to tolerate high-load and high-stress scenarios. The intermittent issues experienced with deployment logging have been resolved.

17:00 UTC

We're looking into access issues with deployment logs.

GLOBAL

We are seeing elevated error rates in new Docker deployments.

17:47 UTC

New Docker deployment stability has been restored. We are monitoring.

17:32 UTC

We are seeing elevated error rates in new Docker deployments. Investigating.

GLOBAL

OSS Docker Deployments Are Experiencing Degraded Internet Access

1:12 UTC

The issue was identified and corrected. Full networking capabilities are restored for all affected OSS deployments. We will continue to monitor the affected cluster instances.

SFO1

Elevated error rates on new deployments

2:33 UTC

The API endpoints for deployment creation are experiencing elevated error rates due to upstream database issues.

SFO1

API availability is degraded on some endpoints

19:49 PDT

All systems normal.

17:02 PDT

We have identified the root problem and have deployed a fix. We are monitoring the affected services.

16:27 PDT

We are seeing intermittent errors in response to the deployment list endpoint.

SFO1

Increased error rate for new deployments and unfreezes

04:56 PDT

All systems normal.

02:14 PDT

We are investigating an increased error rate for new deployments and unfreezes.

Elevated 500 errors on new deployments

16:50 PDT

Everything is operating normally.

11:40 PDT

We are investigating elevated error rates in new deployments.

SFO1

Load balancers are experiencing a partial outage

14:50 PDT

We experienced abnormal behavior across all load balancer instances. We quickly failed over the traffic and the outage was under control in under 5 minutes.

14:45 PDT

Our load balancers are experiencing a partial outage. Traffic is degraded. Nodes are recovering.

Short-term loss of availability in single load balancer

13:29 PDT

One of our load balancer nodes experienced a short-term loss of availability of around 60 seconds. A small percentage of requests were affected. It self-healed and new requests should be working correctly.

SFO1

Elevated error rate for dependency installation resolved

03:55 PDT

GLOBAL

Elevated error rate for dependency installation with NPM deployments

03:51 PDT

We are investigating an increased error rate with NPM dependency installation.

GLOBAL

Restored Domain Name Registration

Domain Name Purchasing is back at full capacity.

GLOBAL

Degraded Domain Name Registration

Domain Name Purchasing is temporarily throttled due to very high traffic.

GLOBAL

Restored service on '/domains/price' API

The upstream issue was solved. The endpoint is fully operational.

GLOBAL

Degraded service on '/domains/price' API

Due to an upstream issue, we're seeing an increased error rate at the /domains/price endpoint.

GLOBAL

Added details about a recent outage

We have added a post-mortem report on a recent major outage that impacted the SFO1 datacenter on Dec 28, 2017.

Partial loss of package availability

11:46 PDT

We're seeing failed Node deployments due to a partial loss of package availability with the npm registry. Please see https://t.co/YmVgZIROjx for updates to this incident.

SFO1
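
To confirm from your side whether the registry issue above is what is failing a Node deployment, a few standard npm commands help; left-pad is only an illustrative package name:

    # Which registry npm is configured to use
    npm config get registry

    # Basic registry reachability
    npm ping

    # Whether a specific dependency's metadata is still being served
    npm view left-pad version

If `npm ping` fails or `npm view` errors for packages your project depends on, the installation failures originate at the registry rather than in the deployment itself.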

Intermittent Node Reboots

16:16 PDT

We are experiencing intermittent rebooting of our production machines by our upstream cloud providers.

We are currently investigating a definitive cause, but we suspect it is due to the recently disclosed Intel CPU bug and the mitigations and patches being rolled out to the host operating systems by our VPC providers.

Most deployments should see minimal downtime, if any. We are not considering this an outage as our systems are working as intended, moving deployments in the event they become unavailable.

The situation is being closely monitored and we will remain on alert until we are certain the reboot period has ended.

SFO1

Service degradation (SFO1)

19:23 PDT

Core services and user deployments are back at 100% availability. We are actively monitoring the system health, and are preparing a post-mortem to be released soon.

18:50 PDT

We are closely monitoring all activity to ensure integrity and stability system-wide, and will continue to do so in the coming days.

16:33 PDT

Service is quickly approaching complete recovery. We are double checking the integrity of all core services.

16:07 PDT

Deployment availability is completely restored. We are working on restoring load balancers.

11:47 PDT

More deployments have been fully restored. We continue to make steady progress bringing all affected resources online.

10:38 PDT

Our whole team is focused on restoring availability of all affected deployments.

8:42 PDT

We continue to make progress restoring cluster capacity.

5:54 PDT

We're in the process of adding more capacity to our infrastructure. Our existing systems are recovering.

4:48 PDT

We will update this status page when new information becomes available.

4:25 PDT

We have detected an internal connection issue with our Europe-based API, which may surface as errors when trying to unfreeze or scale a deployment.

Degraded API / Unfreeze performance

13:34 PDT

We have detected an increased number of errors; as a result, API and unfreeze performance is degraded. Live instances are not affected.

We will update this status page as new information becomes available.

SFO1

API service degraded in Europe

13:10 PDT

We have detected an internal connection issue with our Europe-based API, which may surface as errors when trying to unfreeze or scale a deployment.

We will update this status page when new information becomes available.

SFO1

Unfreezing Disabled

21:37 PDT

We have detected an internal problem related to unfreezing new and existing deployments and have temporarily disabled unfreezing.

SFO1

Upstream Logs Degradation

16:09 PDT

The logs store has recovered. We will continue to monitor for any further loss of availability.

This concludes this incident.

15:46 PDT

We are experiencing an upstream outage with our logs store. The impact or severity is currently unknown.

Due to this, logs may be dropped and the system may appear intermittently read-only.

We will continue to monitor and update the status page as more information becomes available.

SFO1

System Capacity Reached

9:14 PDT

Capacity has been increased, unfreezing has been re-enabled and the deployment queue has been drained.

All systems normal. This concludes the incident.

8:10 PDT

We have reached system-wide capacity in our SFO cluster. We are currently working to increase the capacity of our physical nodes.

This will temporarily affect unfreezes and all new deployments.

We will post back here when capacity has been increased and new deployments have retroactively been restored.

SFO1

System Capacity Reached

16:25 PDT

Capacity has been increased, unfreezing has been re-enabled and the deployment queue has been drained.

All systems normal. This concludes the incident.

15:43 PDT

We have reached system-wide capacity in our SFO cluster. We are currently working to increase the capacity of our physical nodes.

This will temporarily affect unfreezes and all new deployments.

We will post back here when capacity has been increased and new deployments have retroactively been restored.

SFO1

Degradation of Logs for Teams

18:08 (PDT)

The cause of the degradation was an upstream outage that has since been resolved.

This concludes the incident.

17:31 (PDT)

We have detected a problem with an internal service that handles log streams for teams.

We are currently investigating the root cause.

SFO1

OSS Capacity Reached

19:43 (PDT)

Capacity has been increased. This concludes this incident.

19:01 (PDT)

We have reached capacity for our Open Source deployments. We are actively working to increase OSS deployment capacity to meet the spike in demand.

SFO1

VPC Instance Unavailability

18:30 (PDT)

All systems are normal.

This concludes the outage. We will continue to update you on the improvements we are making to prevent similar issues from arising.

17:41 (PDT)

Availability has been restored. We are closely monitoring all service stability for potential regressions or service interruptions.

17:09 (PDT)

We have determined the root cause for the unhealthy nodes and have patched the issue. Unfreeze backlogs are currently being processed, and new deployments for paid accounts will resume within the hour. OSS deployments will follow after the new paid deployment backlog has stabilized.

We encountered several failures, ultimately resulting in what is our largest outage to date:

We will include Lessons Learned in the post-mortem, along with the plethora of fixes we employed both actively (in order to fix the outage) as well as tentatively (to prevent another outage of this nature in the future).

The system is still unstable, but improving. We will provide another status update once we re-enable deployment-related API access (new deployments).

15:07 (PDT)

Capacity has been increased to an acceptable level; we have begun processing the backlog for new deployments.

We will re-enable new deployments when we have determined the system has returned to a more stable state.

10:21 (PDT)

Upstream service has lifted connection limits.

07:02 (PDT)

An upstream service began dropping connections due to a misconfigured connection count upper limit, surfacing a bug within our infrastructure that caused a portion of our SFO cluster to become unavailable.

Attempts to flush the backlog have been overloading the system and as such we have been incrementally increasing cluster-wide capacity.

New deployments, as well as unfreezing existing deployments, are currently disabled. Deployments that are currently unfrozen (and that remain unfrozen) should still be available.

SFO1