Deploying n8n to scrape data into Google Sheets

If you need a recurring scrape that lands in a Google Sheet, n8n is hard to beat. You get a visual builder and a first-party Google Sheets integration. And when the client owns the n8n Cloud account, the whole thing keeps running long after the build is done. Here's the setup we use.

Why n8n

A few things make it the default for this kind of job:

Visual workflows that are still version-controllable as JSON
Built-in nodes for HTTP requests, HTML parsing, and Google Sheets
Easy to schedule with a cron trigger
Cheaper than Zapier or Make at any real volume, since n8n Cloud bills per workflow execution rather than per step

If you're comparing it to those two, the trade is a slightly steeper learning curve in exchange for lower cost and far more flexibility.

Why have the client host it

We don't run n8n for clients. The client signs up for n8n Cloud on their own account and invites us in as a collaborator. Reasons:

They own the data and the workflows. Nothing is locked behind our infrastructure.
No bus factor. If they ever want to part ways, the automation keeps running. Same login, same workflows, same data.
No middleman billing. They pay n8n directly. We don't mark up infra.
Cleaner permissions. Their Google account connects to their n8n. Their data never touches ours.

It's a small thing that builds a lot of trust. Plenty of agencies host everything themselves, which leaves the client stuck. We do the opposite on purpose.

Setting it up

The flow with a new client looks like this:

They create an n8n Cloud account at n8n.io. Starter plan is fine for most jobs.
They invite us as a member of their workspace.
We build the workflow inside their account, connected to their Google Sheets via OAuth on their side.
We hand off a short doc explaining what the workflow does, how to pause it, and what to do if it errors.

Total setup: under an hour for a straightforward scrape.

The workflow shape

Almost every "scrape to sheets" job follows the same four-node shape:

Schedule Trigger. Daily, hourly, whatever cadence you need.
HTTP Request. Fetch the page. Set a real User-Agent header so you don't get blocked instantly.
HTML node. Use CSS selectors to pull out the fields you care about. Output is a clean JSON array.
Google Sheets node. Append rows to the client's sheet. Authenticated via their Google account.

That's the whole thing for a static page. About 15 minutes of work once the account is set up.
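The same four steps can be sketched in plain Python with only the standard library. This is handy when you're debugging selectors or outgrowing n8n. It's a minimal sketch under assumptions. The URL, the User-Agent string, and the page layout (`div.item` rows with `.name` and `.price` children) are hypothetical. The class-matching parser stands in for CSS selectors, because the standard library has no selector engine. The Google Sheets step is left out: `to_rows` just returns the rows you'd append.

```python
import urllib.request
from html.parser import HTMLParser

# Hypothetical browser-like User-Agent; swap in whatever your target tolerates.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"


def fetch(url: str, timeout: float = 30.0) -> str:
    """HTTP Request step: GET the page with an explicit User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class FieldExtractor(HTMLParser):
    """HTML step: a rough stand-in for CSS selectors like `.item .name`.

    Every element carrying `row_class` starts a new row; text inside a later
    element carrying one of `field_classes` is stored under that class name.
    Only handles fields whose text sits directly in the matched element.
    """

    def __init__(self, row_class, field_classes):
        super().__init__()
        self.row_class = row_class
        self.field_classes = set(field_classes)
        self.rows = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.row_class in classes:
            self.rows.append({})
        elif self.rows:
            # Entering a field element starts capture; any other tag stops it.
            self._field = next((c for c in classes if c in self.field_classes), None)

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field is not None:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data.strip()


def to_rows(html, row_class="item", fields=("name", "price")):
    """Turn a page into sheet rows: one list per row element, in `fields` order."""
    parser = FieldExtractor(row_class, fields)
    parser.feed(html)
    parser.close()
    return [[r.get(f, "") for f in fields] for r in parser.rows]


# Usage (makes a network request; cron or similar plays the Schedule Trigger's role):
#   rows = to_rows(fetch("https://example.com/products"))
#   ...then append `rows` to the sheet with the Google Sheets API or a client library.
```

Missing fields come back as empty strings. This keeps every row the same width, so the columns in the sheet stay aligned.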

Gotchas worth knowing

A few things we learned the painful way:

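JavaScript-rendered pages won't work with the basic HTTP Request node, because it only fetches the initial HTML and never runs the page's scripts. You'll need a community node that drives a headless browser. The alternative is to find the underlying API the page is calling, which is often easier: watch the Network tab in your browser's dev tools.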
Rate limits. Add a Wait node between requests if you're hitting a single domain repeatedly. Polite scraping keeps you unblocked.
Error handling. Build a small separate workflow that starts with an Error Trigger node and pings the client in Slack or email. Then select it as the Error Workflow in the scraper's workflow settings. Otherwise they find out three weeks later when their sheet has a gap.
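Deduplication. Store a row ID or a hash of the row in its own column so reruns don't append duplicates. The Google Sheets node's "Append or Update Row" operation is an upsert for exactly this: it matches on a column you choose and updates existing rows instead of adding new ones.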
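The deduplication and rate-limit points above can be sketched in a few lines of Python. This is useful if you compute the key yourself or run the scrape as a standalone script. It's a minimal sketch. The helper names, the `\x1f` separator, the 16-character prefix of a SHA-256 hash, and the 2-second default gap are all illustrative choices, not n8n features. The key goes in its own column, which is the column you'd match on. The pacing class does the job a Wait node does inside n8n.

```python
import hashlib
import time
from urllib.parse import urlparse


def row_key(row):
    """Stable hash of a row's trimmed values, usable as a dedup key column."""
    # A separator that won't appear in scraped text keeps ["ab", "c"] and ["a", "bc"] distinct.
    joined = "\x1f".join(cell.strip() for cell in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def new_rows_only(rows, existing_keys):
    """Keep rows whose key isn't already in the sheet (or earlier in this batch).

    Returned rows have the key prepended as the first column.
    """
    seen = set(existing_keys)
    fresh = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            fresh.append([key] + list(row))
    return fresh


class PoliteClock:
    """Enforce a minimum gap between requests to the same domain."""

    def __init__(self, min_interval_s=2.0, sleep=time.sleep, now=time.monotonic):
        self.min_interval_s = min_interval_s
        self._sleep = sleep  # injectable so the pacing can be tested without waiting
        self._now = now
        self._last = {}      # domain -> time of the last request to it

    def wait_for(self, url):
        domain = urlparse(url).netloc
        last = self._last.get(domain)
        if last is not None:
            remaining = self.min_interval_s - (self._now() - last)
            if remaining > 0:
                self._sleep(remaining)
        self._last[domain] = self._now()
```

Call `clock.wait_for(url)` right before each fetch. Requests to different domains don't hold each other up; only repeat hits on the same domain are spaced out.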

When to reach for something else

n8n is great for jobs that fit on one screen. If you're scraping millions of pages, doing serious parsing, or need real distributed execution, write a Python script and run it on a schedule. n8n shines in the "small but recurring" zone, which is most of what an operations team actually needs.