The Problem With Cloud LLM APIs in Automation Workflows

OpenAI, Anthropic, and Google charge per token. At small scale, that is manageable. At workflow scale, where you process hundreds of support tickets, documents, or events per day, the cost adds up quickly.

There is also the data question. If your workflows handle internal HR records, customer contracts, or proprietary product data, pushing that content through a third-party API is not always acceptable. Even with enterprise data agreements in place, some teams will not send sensitive data to an external model.

Ollama is the practical answer. It is an open-source runtime that runs LLMs locally: Llama 3.2, Mistral, Gemma 2, Phi-3, and Qwen 2.5, among others. You run it on your own machine, a private VPS, or inside a Docker container. Its REST API is simple, with one endpoint for completions and another for embeddings, so n8n's HTTP Request node can call it directly.

The result: AI-powered workflows with zero per-token cost and no data leaving your infrastructure.

What You Need Before Starting

Ollama installed on a machine your n8n instance can reach: the same server, the local network, or a private VPS. Download it from ollama.com. On Linux, installation is a single shell command.
A model pulled. Run ollama pull llama3.2 or ollama pull mistral to download your model of choice. Llama 3.2 3B works well on modest hardware; Mistral 7B is stronger for general-purpose tasks if you have more RAM.
n8n running, either self-hosted or on n8n Cloud. Self-hosted n8n on the same server as Ollama is the simplest setup for local inference; n8n Cloud can only reach an Ollama instance that is exposed to it over the network.

No API keys required.

How n8n Connects to Ollama

Ollama runs a local HTTP server on port 11434. In its simplest form, a request to the completion endpoint is a JSON body with three fields: the model name, your prompt, and stream set to false.

In n8n, you make this call using the HTTP Request node:

Method: POST
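URL: http://localhost:11434/api/generate (replace localhost with your VPS IP if Ollama runs remotely; if n8n runs in Docker, localhost points at the n8n container itself, so use an address that reaches the Ollama host)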
Body type: JSON
Body fields: model (e.g., "llama3.2"), prompt (your instruction plus dynamic data using n8n expressions), and "stream": false
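The same request can be sketched outside n8n to check that the endpoint answers before wiring up the node. This is a minimal sketch using only the Python standard library; the function names build_generate_payload and call_ollama are illustrative, and the URL assumes Ollama is running on the same machine.

```python
import json
import urllib.request

# URL taken from the settings above; swap in your own host if Ollama is remote.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str) -> dict:
    """Return the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def call_ollama(payload: dict, url: str = OLLAMA_URL, timeout: float = 120.0) -> dict:
    """POST the payload and return the decoded JSON reply (needs a running Ollama)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Building the body needs no server; call_ollama(body) sends it.
body = build_generate_payload("llama3.2", "Say hello in one word.")
print(json.dumps(body))
```

The printed JSON is what the HTTP Request node should send as its body; in n8n the prompt value would come from an expression rather than a literal string.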

The response includes a response field containing the model's output as a plain string. Use a Set node immediately after to extract and clean this value before passing it downstream.
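As a rough illustration of what the Set node is doing, the sketch below pulls the response field out of a reply and trims it. The sample reply is hand-written and shortened to the fields used here (real replies carry extra metadata), and extract_response_text is a hypothetical helper name.

```python
import json


def extract_response_text(raw_body: str) -> str:
    """Pull the model output out of a non-streaming /api/generate reply."""
    data = json.loads(raw_body)
    text = data.get("response")
    if not isinstance(text, str):
        raise ValueError("reply has no string 'response' field")
    return text.strip()


# Hand-written sample reply, not captured from a real run.
sample = '{"model": "llama3.2", "response": " Billing\\n", "done": true}'
print(extract_response_text(sample))  # prints: Billing
```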

That is the entire integration. No credentials to configure, no SDK to install, no authentication headers.

Example n8n Workflow: Classify Incoming Support Tickets

Here is a complete workflow that uses Ollama to triage support tickets and route them to the appropriate Slack channel.

Step 1: Webhook Trigger node. Configure a Webhook node to receive incoming ticket data from a Typeform submission, a Zendesk webhook, or a custom form. The node outputs the ticket body, submitter email, and a ticket ID.

Step 2: HTTP Request node (Ollama). POST to your local Ollama instance. The prompt instructs the model to read the ticket body and respond with exactly one word: billing, technical, or general. Set stream to false so the full response arrives in one JSON object.
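One possible wording for that prompt is sketched below. The exact phrasing is an assumption, not a tested recipe, and build_classification_prompt is an illustrative name; only the three labels come from the step above.

```python
# The three categories named in Step 2.
LABELS = ("billing", "technical", "general")


def build_classification_prompt(ticket_body: str, labels=LABELS) -> str:
    """Build a short, explicit prompt that asks for exactly one label."""
    label_list = ", ".join(labels)
    return (
        "Classify the support ticket below.\n"
        f"Respond with exactly one word from this list: {label_list}.\n"
        "Do not add punctuation or explanation.\n\n"
        f"Ticket:\n{ticket_body.strip()}"
    )


print(build_classification_prompt("I was charged twice this month."))
```

In n8n the ticket text would be spliced in with an expression that references the Webhook node's output instead of a Python argument.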

Step 3: Set node. Extract the classification from the response field. Use an n8n expression to trim whitespace and lowercase the value so it matches cleanly in the next step.
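The cleanup in this step can be sketched as follows. In n8n it would likely be an expression along the lines of {{ $json.response.trim().toLowerCase() }}; the Python version below also strips stray punctuation and falls back to general for anything off-list, which is an added assumption that mirrors the catch-all branch in Step 4.

```python
import re

ALLOWED = {"billing", "technical", "general"}


def normalize_label(raw: str, fallback: str = "general") -> str:
    """Trim, lowercase, and strip non-letters; return fallback if off-list."""
    cleaned = raw.strip().lower()
    cleaned = re.sub(r"[^a-z]", "", cleaned)  # drops quotes, periods, spaces
    return cleaned if cleaned in ALLOWED else fallback


print(normalize_label(" Technical.\n"))  # prints: technical
```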

Step 4: Switch node. Branch the workflow on the classification value:

billing → branch A
technical → branch B
All others → branch C (fallback)

Step 5: Slack node (one per branch). Each branch posts a formatted message to the appropriate channel: #billing-support, #technical-support, or #general-support. Include the ticket body and submitter email in the message.
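Steps 4 and 5 together amount to a lookup with a default. A minimal sketch, assuming the ticket arrives as a dict with hypothetical keys id, email, and body (the real field names depend on what your webhook sends):

```python
CHANNELS = {
    "billing": "#billing-support",      # branch A
    "technical": "#technical-support",  # branch B
}
FALLBACK_CHANNEL = "#general-support"   # branch C, everything else


def route_ticket(label: str, ticket: dict) -> dict:
    """Pick the Slack channel for a label and format the message text."""
    channel = CHANNELS.get(label, FALLBACK_CHANNEL)
    text = (
        f"New {label} ticket {ticket['id']} from {ticket['email']}\n"
        f"{ticket['body']}"
    )
    return {"channel": channel, "text": text}


ticket = {"id": "T-1", "email": "user@example.com", "body": "Card was charged twice."}
print(route_ticket("billing", ticket)["channel"])  # prints: #billing-support
```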

Step 6: Google Sheets node (optional). Append each classified ticket to a tracking spreadsheet (timestamp, category, submitter email, and ticket ID) for later review.

Total build time: under 30 minutes. Token cost: zero.

More Workflow Patterns With Ollama

Once Ollama is connected to n8n, the same HTTP Request mechanic applies to any workflow that needs language understanding.

Document summarization: Trigger on a new file in Google Drive. Read the file contents with the Google Drive node. Send the text to Ollama with a summarization prompt. Write the output to a Notion page or an Airtable record using the appropriate output node.

Email intent routing: Trigger on new Gmail messages using the Gmail Trigger node. Send the email body to Ollama and ask it to classify intent: demo request, support issue, billing question, or partnership inquiry. Route each category to a different team or CRM pipeline using a Switch node.

Data extraction from unstructured input: Receive a contract or free-text form response via Webhook. Prompt Ollama to extract structured fields (name, company, budget, timeline) and return them in a predictable format such as JSON. Parse the output in a Set node and write the fields to HubSpot or Airtable.
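For the extraction pattern, one way to get a predictable format is to ask for JSON and parse defensively. The sketch below assumes Ollama's "format": "json" request option; the prompt wording and both helper names are illustrative, and a malformed answer is turned into empty fields rather than an error.

```python
import json

FIELDS = ("name", "company", "budget", "timeline")


def build_extraction_payload(text: str, model: str = "llama3.2") -> dict:
    """Request body that asks the model for a JSON object with fixed keys."""
    prompt = (
        "Extract the following fields from the text and return a JSON object "
        f"with exactly these keys: {', '.join(FIELDS)}. "
        "Use null for anything that is missing.\n\n"
        f"Text:\n{text}"
    )
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_extracted_fields(response_text: str) -> dict:
    """Parse the model's answer; unknown or broken output yields None values."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}


print(parse_extracted_fields('{"name": "Ana", "company": "Acme", "budget": null}'))
```

Keeping the key list in one place means the prompt and the parser cannot drift apart when a field is added.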

Content moderation: Trigger on new user-submitted content. Send the text to Ollama with a moderation prompt. Flag responses that match problematic patterns and route them to a manual review queue via an IF node.

Each pattern uses the same core mechanic: HTTP Request → Ollama → Set node → Switch or IF node → downstream action.
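To show how little changes between patterns, here is a sketch of one generic classifier where only the label list differs. The model call is passed in as a function, so the example runs with a stub instead of a live Ollama server; classify, stub_generate, and the choice of fallback label are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def classify(text: str, labels: Sequence[str], fallback: str,
             generate: Callable[[str], str]) -> str:
    """Ask the model for one label and fall back if the answer is off-list."""
    prompt = (
        f"Respond with exactly one of: {', '.join(labels)}.\n"
        "Return only the label.\n\n"
        f"Text:\n{text}"
    )
    answer = generate(prompt).strip().lower()
    return answer if answer in labels else fallback


def stub_generate(prompt: str) -> str:
    """Stand-in for the Ollama call so the sketch runs without a server."""
    return " Demo Request\n"


# Labels from the email intent routing pattern above.
INTENTS = ("demo request", "support issue", "billing question", "partnership inquiry")
print(classify("Can we see the product next week?", INTENTS, "support issue", stub_generate))
# prints: demo request
```

Swapping INTENTS for a moderation label set or the ticket categories reuses the same function unchanged.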

Choosing a Model

Llama 3.2 3B: Fast, low memory requirement (4 GB RAM). Best for classification and extraction tasks where speed matters more than nuance.

Mistral 7B / Mistral Nemo: Better reasoning and stronger instruction-following. Mistral 7B is the right general-purpose choice if you have 8+ GB of RAM available; Mistral Nemo is a larger 12B model and needs more headroom than that.

Qwen 2.5 7B: Strong multilingual support, useful if your workflows process non-English content.

Gemma 2 2B: Best performance on very constrained hardware; works well on a small VPS with 4–6 GB RAM.

For most n8n automation workflows, Llama 3.2 3B or Mistral 7B will handle the majority of classification, extraction, and summarization tasks without requiring a GPU.

Practical Notes

Concurrency: Depending on its version and the memory available, Ollama may process only one request per model at a time and queue the rest. For high-volume workflows, limit how many items n8n sends at once, or add a Wait node between batch items to avoid queue buildup.
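The Wait-node idea boils down to handling items one at a time with a pause in between. A minimal sketch, where process_sequentially is a hypothetical helper and the half-second default pause is an arbitrary placeholder to tune for your hardware:

```python
import time
from typing import Callable, Iterable, List


def process_sequentially(items: Iterable, handler: Callable,
                         pause_seconds: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> List:
    """Run handler on each item in order, pausing between items (not before the first)."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause_seconds)
        results.append(handler(item))
    return results


# str.upper stands in for the real call to Ollama.
print(process_sequentially(["a", "b"], str.upper, pause_seconds=0.01))  # prints: ['A', 'B']
```

The sleep parameter is injectable only so the pause can be replaced in tests; in normal use the default is fine.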

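Local vs. VPS: A small VPS with 8 GB RAM (typically $10–20/month) is often more practical than relying on a local development machine. By default Ollama listens only on 127.0.0.1, so a remote n8n instance needs OLLAMA_HOST set to an address it can reach. The API has no authentication of its own, so firewall port 11434 and allow access from your n8n server's IP only.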

Streaming: Always set "stream": false in the Ollama request body, because the API streams by default. Streaming mode returns newline-delimited JSON fragments that require additional parsing, which is unnecessary overhead for workflow automation.
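For a sense of the extra parsing that streaming would require, the sketch below reassembles newline-delimited fragments into one string. The sample fragments are hand-written and reduced to the response and done fields, and join_stream_fragments is an illustrative name.

```python
import json


def join_stream_fragments(ndjson_text: str) -> str:
    """Concatenate the 'response' pieces from newline-delimited JSON fragments."""
    parts = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between fragments
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final fragment marks the end of the answer
    return "".join(parts)


# Hand-written fragments, not captured from a real run.
sample = "\n".join([
    '{"response": "bill", "done": false}',
    '{"response": "ing", "done": false}',
    '{"response": "", "done": true}',
])
print(join_stream_fragments(sample))  # prints: billing
```

With "stream": false none of this is needed, since the whole answer arrives in a single response field.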

Prompt design: Keep prompts short and direct. Instruction-tuned models respond best to explicit instructions with a defined output format. For classification tasks, list the exact labels you expect and instruct the model to return only one of them. This greatly reduces the need for fuzzy matching downstream, though a fallback branch is still worth keeping for the occasional off-list answer.

What This Unlocks

Local LLM inference removes the two blockers that stop most teams from adding AI to internal automation workflows: per-token cost and data residency. For n8n users running high-volume workflows on sensitive data, Ollama is the path to AI-powered automation without a cloud API dependency.

For workflows that need to stay on cloud APIs, or that mix local and cloud inference, the n8n AI agent templates on n8nresources.dev cover a range of LLM-backed workflow patterns. The full template library at n8nresources.dev/templates includes classifiers, extractors, and routers that map directly to the Ollama patterns above.

n8n + Ollama: Run AI Without API Costs