
The video demonstrates how to use the n8n AI benchmarks page and API to evaluate 60+ large language models for n8n workflows. It walks through the web interface, search and filtering, cost and category metrics, copying models via OpenRouter, and practical limits such as the 65,000-character JSON paste restriction. The presenter also covers API endpoints, rate limits, common errors, and cURL import examples for integrating results into workflows.
– Benchmarks UI: Describe your use case or paste workflow JSON to get ranked model suggestions; copy model names or OpenRouter entries directly for quick integration. Note the 65,000-character paste limit for full workflows.
– Scoring and filters: Scores combine categories such as tool use, hallucination, logic, structured output, speed, and cost; toggling categories on or off changes the rankings. There is no per-category weighting or US/overseas model filter yet.
– Pricing and run cost: The page shows cost per thousand and per million tokens, plus an average run estimate (based on 5,000 prompt tokens and 500 completion tokens); the split between prompt and completion tokens affects total cost.
– API details & caveats: Endpoints return top models, single-model details, and recommendations. The demo covers cURL import, rate limits (5 requests/min, 15/hour), a temporary include_results boolean bug, and advice to submit workflow sections rather than the entire JSON.
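The 65,000-character paste limit pairs with the advice to submit workflow sections rather than the entire JSON. A minimal sketch of splitting a workflow's nodes into paste-sized chunks — the workflow shape here is an assumption for illustration, not the exact n8n export format:

```python
import json

# The benchmarks page rejects pastes over 65,000 characters, so large
# workflows should be submitted in sections. This splits a workflow's
# node list into chunks that each serialize under the limit.
PASTE_LIMIT = 65_000

def split_workflow(workflow: dict, limit: int = PASTE_LIMIT) -> list[str]:
    """Return JSON strings, each under `limit` characters, covering all nodes."""
    chunks, current = [], []
    for node in workflow.get("nodes", []):
        candidate = current + [node]
        if len(json.dumps({"nodes": candidate})) > limit and current:
            # Flush the chunk that still fit, start a new one with this node
            chunks.append(json.dumps({"nodes": current}))
            current = [node]
        else:
            current = candidate
    if current:
        chunks.append(json.dumps({"nodes": current}))
    return chunks

# Hypothetical workflow: 300 nodes, roughly 160k characters serialized
workflow = {"nodes": [{"name": f"node-{i}", "parameters": {"x": "y" * 500}}
                      for i in range(300)]}
parts = split_workflow(workflow)
print(len(parts), all(len(p) <= PASTE_LIMIT for p in parts))
```

Each flushed chunk is guaranteed under the limit because nodes are only accepted while the serialized candidate still fits; a single node larger than the limit would still be emitted oversized, which a real client should reject separately.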
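The average run estimate above is a straightforward weighted sum of prompt and completion token prices. A quick sketch of that arithmetic — the per-million prices used in the example are placeholders, not real model pricing:

```python
# Reproduces the page's average-run estimate: 5,000 prompt tokens and
# 500 completion tokens, priced separately per million tokens.
PROMPT_TOKENS = 5_000
COMPLETION_TOKENS = 500

def average_run_cost(prompt_price_per_m: float, completion_price_per_m: float) -> float:
    """Estimated USD cost of one average run."""
    return (PROMPT_TOKENS * prompt_price_per_m
            + COMPLETION_TOKENS * completion_price_per_m) / 1_000_000

# Hypothetical pricing: $3.00/M prompt tokens, $15.00/M completion tokens
cost = average_run_cost(3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0225
```

Because completion tokens are usually priced several times higher than prompt tokens, the prompt/completion split matters even when total token count is fixed.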
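With limits of 5 requests/minute and 15 requests/hour, a client should throttle itself rather than rely on error responses. A minimal sliding-window limiter sketch, assuming the stated limits; it tracks timestamps locally and reports how long to wait before the next call:

```python
import time
from collections import deque

# Client-side throttle for the stated limits: 5 requests per minute
# and 15 requests per hour, enforced as sliding windows.
class RateLimiter:
    def __init__(self, limits=((5, 60.0), (15, 3600.0))):
        self.limits = limits      # (max_requests, window_seconds) pairs
        self.history = deque()    # timestamps of sent requests

    def wait_time(self, now=None):
        """Seconds to wait before the next request; 0.0 if allowed now."""
        now = time.monotonic() if now is None else now
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= max_requests:
                # The oldest request in the window must age out first
                wait = max(wait, recent[0] + window - now)
        return wait

    def record(self, now=None):
        self.history.append(time.monotonic() if now is None else now)

limiter = RateLimiter()
for i in range(5):
    limiter.record(now=float(i))      # five requests in five seconds
print(limiter.wait_time(now=5.0))     # → 55.0 (per-minute limit hit)
```

In practice you would call `wait_time()` before each API request, sleep for the returned interval, then `record()` once the request is sent; the per-hour window kicks in the same way after 15 recorded calls.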
Quotes:
Paste your workflow JSON — but watch out: there’s a 65,000-character limit.
Benchmarks rank models by what matters in n8n: tool use, hallucinations, logic, structured output, speed, and cost.
The recommendation API is a little broken right now — it expects a boolean that hates booleans.
Statistics
| Upload date | 2026-02-19 |
|---|---|
| Likes | 13 |
| Comments | 4 |
| Statistics updated | 2026-03-02 |
Specification: What is the Best LLM for n8n in 2026 (Real Benchmark Data)