facebook killed my site

how I spent a sunday debugging an OOM and found zuckerberg at the other end


I run databakkes.be. free lookup for every company in Belgium. 1.9 million pages. SvelteKit + postgres.

sunday morning. site loads like a rock. so I open the logs.

FATAL ERROR: Ineffective mark-compacts near heap limit
Allocation failed - JavaScript heap out of memory

node process: 2GB heap. dead. GC pauses: 4 seconds. only one page loading. everything else: gone.


step 1: find the symptom

postgres.js was throwing this:

TimeoutNegativeWarning: -93125.98 is a negative number.
Timeout duration was set to 1.

reconnection loop clamped to 1ms. memory goes brrr. but this was just a symptom.
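
for reference: that warning is node saying something handed setTimeout a negative delay, which it then treats as 1ms. a minimal sketch of a guard, assuming the delay is computed somewhere upstream and can go negative. retryDelayMs and safeDelayMs are made-up helper names, not postgres.js internals.

```typescript
// hypothetical helper: exponential backoff for reconnect attempts
// attempt 0 -> 100ms, 1 -> 200ms, 2 -> 400ms, ... capped at maxMs
function retryDelayMs(attempt: number, baseMs = 100, maxMs = 30000): number {
  const backoff = baseMs * 2 ** Math.max(0, attempt);
  return Math.min(maxMs, backoff);
}

// hypothetical helper: force any computed delay into a sane range
// before it reaches setTimeout, so a negative value can't become a 1ms loop
function safeDelayMs(computedMs: number, minMs = 100, maxMs = 30000): number {
  if (!Number.isFinite(computedMs)) return minMs;
  return Math.min(maxMs, Math.max(minMs, computedMs));
}

// the value from the warning above gets pulled up to the floor
const delay = safeDelayMs(-93125.98); // 100
```

the point is only that the floor is something bigger than 1ms. a reconnect loop that waits 100ms between attempts is annoying. one that waits 1ms eats the heap.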


step 2: add logging

added one line to the hooks. logged every request.

GET /nl/0851149363 -> 301 (937ms, ua=meta-externalagent/1.1)
GET /nl/0533845834 -> 301 (902ms, ua=meta-externalagent/1.1)
GET /fr/0599815237 -> 301 (825ms, ua=meta-externalagent/1.1)
GET /fr/0868985980 -> 301 (194ms, ua=meta-externalagent/1.1)

every. single. request. facebook.

meta-externalagent - crawling all 3 language versions simultaneously. ~6 requests per second. 24/7.
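
the logging line, roughly. in SvelteKit this goes in the handle hook in src/hooks.server.ts. the sketch below uses tiny stand-in types instead of importing from @sveltejs/kit so it runs on its own, and formatLogLine is a made-up helper name.

```typescript
// minimal stand-ins for the SvelteKit event/response shapes (assumption: only these fields are needed)
type MiniEvent = {
  request: { method: string; headers: { get(name: string): string | null } };
  url: { pathname: string };
};
type MiniResponse = { status: number };
type Resolve = (event: MiniEvent) => Promise<MiniResponse>;

// builds one log line in the same shape as the output above
function formatLogLine(method: string, path: string, status: number, ms: number, ua: string | null): string {
  return `${method} ${path} -> ${status} (${Math.round(ms)}ms, ua=${ua ?? '-'})`;
}

// same shape as a SvelteKit handle hook: time the request, log it, pass the response through
async function handle({ event, resolve }: { event: MiniEvent; resolve: Resolve }): Promise<MiniResponse> {
  const start = Date.now();
  const response = await resolve(event);
  const ua = event.request.headers.get('user-agent');
  console.log(formatLogLine(event.request.method, event.url.pathname, response.status, Date.now() - start, ua));
  return response;
}
```

logging the user agent is the part that matters. without it you just see slow requests, not who is sending them.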

and look - 6 req/s is not crazy. this is a 12 core / 48GB machine. shared between postgres, meilisearch, and SvelteKit. it should eat 6 req/s for breakfast. the problem wasn’t the traffic. it was the code.

bing too. google? ignoring my site. thanks google.


step 3: understand why it’s so bad

each company page = 9 database queries + SSR of 1842 lines of svelte.

the slug redirect? at the bottom of the load function.

// first: run ALL 9 queries
const [addresses, denominations, contacts, ...] = await Promise.all([...])

// then: oh wait, wrong slug? throw it all away lol
if (params.slug !== expectedSlug) redirect(301, '...')

facebook hits /nl/0851149363 (no slug). runs 9 queries. redirects. runs 9 queries again. = 18 queries per bot visit. ~6 visits per second.
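
for contrast, a sketch of the same load with the slug check first. assumptions: the expected slug comes from one cheap lookup, the URL shape is /lang/id/slug, and CompanyDb, RedirectSignal and redirectTarget are stand-ins made up for the example (in SvelteKit you would call redirect from @sveltejs/kit instead).

```typescript
// stand-in for SvelteKit's redirect(): throwing stops the load function right there
class RedirectSignal extends Error {
  status: number;
  location: string;
  constructor(status: number, location: string) {
    super(`redirect ${status} -> ${location}`);
    this.status = status;
    this.location = location;
  }
}

// hypothetical data layer: one cheap lookup, one expensive bundle of queries
interface CompanyDb {
  slugFor(id: string): Promise<string>;
  loadEverything(id: string): Promise<unknown>;
}

interface PageParams { lang: string; id: string; slug?: string }

// returns the URL to redirect to, or null if the request already has the right slug
function redirectTarget(params: PageParams, expectedSlug: string): string | null {
  if (params.slug === expectedSlug) return null;
  return `/${params.lang}/${params.id}/${expectedSlug}`;
}

async function load(params: PageParams, db: CompanyDb): Promise<unknown> {
  // cheap check first: wrong or missing slug never reaches the heavy queries
  const expectedSlug = await db.slugFor(params.id);
  const target = redirectTarget(params, expectedSlug);
  if (target !== null) throw new RedirectSignal(301, target);

  // only correctly-addressed requests pay for the full page
  return db.loadEverything(params.id);
}
```

with this order a slugless bot hit costs one lookup plus a 301, and the expensive queries run once per visit instead of twice.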


step 4: it gets worse

connection pool was set to 5.

each page grabs 6+ connections via Promise.all.

one request is already enough to saturate the pool. two concurrent requests and everything queues.
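
back-of-the-envelope version, using the numbers from this post (~6 visits/s, 18 queries per visit). it leans on Little's law (average connections in use = query rate x average query time), so it only talks about averages and ignores bursts. the function names are made up for the sketch.

```typescript
// queries per second the pool has to absorb
function queriesPerSecond(visitsPerSec: number, queriesPerVisit: number): number {
  return visitsPerSec * queriesPerVisit;
}

// Little's law: average busy connections = arrival rate x average time a query holds a connection
function avgBusyConnections(qps: number, avgQueryMs: number): number {
  return (qps * avgQueryMs) / 1000;
}

// slowest average query time a pool can sustain before requests start queueing
function maxSustainableQueryMs(poolSize: number, qps: number): number {
  return (poolSize * 1000) / qps;
}

const qps = queriesPerSecond(6, 18);                // 108 queries per second
const budgetPool5 = maxSustainableQueryMs(5, qps);   // about 46ms per query
const budgetPool50 = maxSustainableQueryMs(50, qps); // about 463ms per query
```

so with a pool of 5, the average query has to finish in under ~46ms or the queue grows. one slow aggregation holding a connection for minutes makes that budget even tighter.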

checked pg_stat_activity:

Active queries: 43
  35x address lookups           -- bot traffic
   1x COUNT(DISTINCT nace...)   -- running for 4 MINUTES

the sync process ran a massive aggregation on every deploy. competing with bots for connections.


the fixes

slug redirect moved up - was at the bottom after 9 queries. moved to the top. redirects: 3-5s -> ~100ms.

connection pool 5 -> 50 - Promise.all grabs 6+ connections per request. pool of 5 = instant saturation.

cached 5000 NACE codes - same static data loaded from DB on every request. now a 1-hour in-memory cache.
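
a minimal sketch of that kind of cache, not the exact code from the site. cached is a made-up helper, and the loader you pass in (something like a loadNaceCodesFromDb function) is hypothetical. it caches the promise rather than the value, so concurrent requests during a refresh share one DB call.

```typescript
// wraps an async loader in a time-based in-memory cache
// `now` is injectable only to make the sketch easy to test
function cached<T>(loader: () => Promise<T>, ttlMs: number, now: () => number = Date.now): () => Promise<T> {
  let value: Promise<T> | undefined;
  let expiresAt = 0;

  return () => {
    if (value === undefined || now() >= expiresAt) {
      expiresAt = now() + ttlMs;
      const pending: Promise<T> = loader().catch((err) => {
        // don't keep a failed load around for the whole TTL
        if (value === pending) value = undefined;
        throw err;
      });
      value = pending;
    }
    return value;
  };
}

// usage (hypothetical loader): const getNaceCodes = cached(loadNaceCodesFromDb, 60 * 60 * 1000);
```

one hour is fine for data that basically never changes. the cache lives per process, so each replica loads its own copy.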

keyset pagination for sitemaps - was doing OFFSET 390,000 with a correlated subquery. 10s -> 36ms.
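
the difference between the two query shapes, as a sketch. the table and column names (companies, enterprise_number) are assumptions, and the real query also had a correlated subquery that isn't shown here. the builders just return parameterized SQL so the example runs without a database.

```typescript
type SqlQuery = { text: string; values: Array<string | number> };

// the slow shape: postgres has to walk and throw away `offset` rows before returning anything
function sitemapPageByOffset(page: number, pageSize: number): SqlQuery {
  return {
    text: 'SELECT enterprise_number FROM companies ORDER BY enterprise_number LIMIT $1 OFFSET $2',
    values: [pageSize, page * pageSize],
  };
}

// keyset shape: start right after the last row of the previous page,
// which lets postgres jump there via an index on the sort column
function sitemapPageByKeyset(lastSeen: string | null, pageSize: number): SqlQuery {
  if (lastSeen === null) {
    // first page: no cursor yet
    return {
      text: 'SELECT enterprise_number FROM companies ORDER BY enterprise_number LIMIT $1',
      values: [pageSize],
    };
  }
  return {
    text: 'SELECT enterprise_number FROM companies WHERE enterprise_number > $1 ORDER BY enterprise_number LIMIT $2',
    values: [lastSeen, pageSize],
  };
}
```

the cursor for the next page is just the last enterprise_number of the page you got back. the cost stays flat no matter how deep into the 1.9M rows you are.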

sync only at midnight - a 4-minute aggregation query was running on every deploy. competing with bot traffic.

all good improvements. none of them fixed it.

the real problem: node.js runs all your javascript on one thread. server-side rendering 1842 lines of svelte takes ~600ms of CPU. while that’s happening, the event loop is blocked. every other request just sits there waiting. at 6 req/s, they pile up. total_db climbs from 11ms to 14 seconds - not because postgres is slow, but because node hasn’t gotten around to reading the response yet.

the fix was stupid simple: 3 replicas. three node processes, three event loops, three CPUs doing SSR in parallel.

could have also dropped SSR entirely. but I need it for SEO - the whole point is google indexing these pages. (whenever google decides to actually visit my site.)

so: 3 replicas. done.
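
the math behind that, as a rough sketch. it assumes each render holds one thread for the ~600ms mentioned above and that replicas share the load evenly. how many of the ~6 req/s are full renders versus cheap redirects isn't something the logs above pin down, so the 3 renders/s in the example is an assumption.

```typescript
// most full renders per second the replicas can finish if each render blocks a thread for cpuMsPerRender
function maxRendersPerSec(cpuMsPerRender: number, replicas: number): number {
  return (replicas * 1000) / cpuMsPerRender;
}

// share of each replica's thread spent rendering; above 1 means requests pile up
function renderUtilization(rendersPerSec: number, cpuMsPerRender: number, replicas: number): number {
  return (rendersPerSec * cpuMsPerRender) / (1000 * replicas);
}

const oneProcessCeiling = maxRendersPerSec(600, 1);   // about 1.67 renders per second
const threeProcessCeiling = maxRendersPerSec(600, 3); // 5 renders per second

// assumed load: 3 full renders per second
const before = renderUtilization(3, 600, 1); // 1.8 -> overloaded, queue grows
const after = renderUtilization(3, 600, 3);  // 0.6 -> keeps up
```

anything above 1.0 means the queue only ever grows. that is how a database that answers in 11ms ends up looking like it takes 14 seconds.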


the result

# before
total_db = 14458ms    total = 14979ms    OOM every 4 hours

# after
total_db = 11ms       total = 648ms      stable

tldr

javascript slow? add more servers.


thanks for the free load test, zuck.

[image: zuckerberg as a robot spamming databakkes.be]

databakkes.be - free Belgian company lookup. 1.9M companies. now with 100ms redirects.

if you’re here because meta-externalagent is destroying your server: you’re not alone. check your logs. move your redirects up. cache your static data. and maybe send zuck the hosting bill. hope you’re not on vercel.

hit me up on X (Twitter) if you have the same problem.