Agentic AI: A developer’s guide to the latest coding assistants and models

April 1, 2025 7 minute read

In the first few months of 2025, coding assistants such as GitHub Copilot have seen major improvements which are set to significantly change the way developers interact with them. These new agentic features offer greater integration with an application codebase but also have the ability to make decisions and execute certain commands.

Overview
GenAI vs coding assistants vs models
What is ‘Agentic AI’?
Why are agentic AI coding assistants so important?
What agentic coding assistants are available?
What are the key differences between agentic coding assistants?
What models are available?
What are the key differences between models?
What are the local options for running a model?

GenAI vs coding assistants vs models

This document will talk about a couple of key terms:

GenAI – high level term for generative AI. These tools can take many formats, including chat-based such as ChatGPT or coding assistants like Copilot.
Coding assistants – a specific type or subset of GenAI tools. These are usually embedded within a developer IDE’s (e.g. VS Code) and are specifically tailored to software development tasks. There are many different coding assistants such as GitHub Copilot or Amazon Q Developer. Coding assistants usually offer a selection of models that can be used.
Models/Large Language Models (LLM) – models are the underlying engine that power a GenAI tool or coding assistant. There are many different models from many different suppliers, including OpenAI GPT-4o and o1-mini, Google Gemini, Meta LLaMA 3.1 and Anthropic Claude 3.7 Sonnet.

What is ‘Agentic AI’?

In late 2024/early 2025, coding assistants began to offer agentic capabilities.

The word agentic has become a bit of a catch-all term within AI however, OpenAI and Anthropic characterise agentic tools as being able to perform complex, multi-step tasks independently and autonomously.

Within coding assistants, the term agentic means being able to:

Make multi-file changes across a codebase, where the output of one file influences the input of another.
Execute commands outside the core code base, e.g. terminal commands, running tests, git commits.

A real-world example would be, asking an agentic tool to implement a piece of functionality that is able to:

Make changes across multiple files
Integrate the changes together
Run tests, determine whether they pass
If tests don’t pass, inspect logs
Make changes to respective files to fix issue
Rerun tests
Commit changes to git

A model doesn’t necessarily make a coding assistant agentic, but some models which specialise in more advanced reasoning logic such as Claude 3.7 Sonnet or OpenAI o1-mini are often well-suited to the tasks of agentic coding assistants.

Why are agentic AI coding assistants so important?

Agentic AI coding assistants are transformative to software development as they have the potential to be much more integrated than standard chat-based tools.

What agentic coding assistants are available?

As of March 2025, there are 3 popular coding assistants:

Cursor Agent Mode - https://docs.cursor.com/chat/agent
GitHub Copilot Agent Mode - https://code.visualstudio.com/blogs/2025/02/24/introducing-copilot-agent-mode. Available in preview, requires VS Code Insiders
Continue - https://www.continue.dev/

There are also offerings such as Windsurf, Cline, Aider, Claude Code, which haven’t been included for the purposes of brevity.

What are the key differences between agentic coding assistants?

Different coding assistants have different core functionality, data privacy and pricing. The following table outlines some of the key differences.

Feature	GitHub Copilot Agent Mode	Cursor	Continue
Provider	GitHub	Anysphere	Continue
Current status	In preview	General availability	General availability
Models available	• OpenAI GPT-4o • Claude 3.5, 3.7 Sonnet	• OpenAI GPT-4o • Claude 3.5, 3.7 Sonnet • Google Gemini 2.5	• OpenAI GPT-4o • Claude 3.5, 3.7 Sonnet • Google Gemini 2.5 Also provides support for Ollama for local-based LLMs.
Pricing	Seat-based subscription, unlimited usage of models	Seat-based subscription with usage limits	Seat-based subscription, unlimited usage of models
IDE support	• VS Code and IntelliJ (reduced functionality) • Inline suggestions and chat	• Native Cursor app (modified VS Code) • Inline suggestions and chat	• VS Code and IntelliJ plugins • Inline suggestions and chat
Security and Privacy	• Prompts and code will not be used as training data • Content Exclusions to prevent certain repository content from ever being sent to Copilot • No self-host of models • SOC 2 Type I • https://copilot.github.trust.page/	• Good overview of how data is accessed and stored across its services - https://www.cursor.com/security • Privacy mode - prompts and code will not be stored or used as training data • SOC 2 Type II certified	• Open source • Options for self-hosting models using Ollama • No specific security certifications • Anonymised usage telemetry • https://www.continue.dev/privacy
Fine-tuning of models/customisation	No fine-tuning of models other than basic system prompts	No fine-tuning of models other than basic system prompts	• Continue Hub - ability to create, share and use custom AI code assistants • Can use context providers to pull data from GitHub, Jira, Confluence
Enterprise features	• Integration with SSO • Enterprise admin control of features, manage seats • Integration with wider Microsoft network incl. GitHub (Azure DevOps coming soon)	Centralised admin controls	Offers enterprise features (more info required)
RAG and MCP support	• RAG: Supported with RAG plugin • MCP: Supports tools and resources	MCP: Supports tools	MCP: Full support for all MCP features

Key points

All 3 coding assistants offer agentic functionality and inline chat/code suggestions within an IDE.
Model selection is standard, however Continue stands-out by offering integration with Ollama for locally running models.
In terms of pricing, all offer a seat-based subscription. Both GitHub Copilot and Continue feature unlimited usage of all models, but there are some usage constraints for certain models with Cursor.
GitHub Copilot is probably the most enterprise ready and will continue to offer tight integration with the wider Microsoft stack (Azure, GitHub).
Neither Copilot or Cursor offer much in the form of customisation but Continue has a focus on user submitted content. Users can share prompts, models and rules using the Continue Hub.
The Context Providers feature within Continue is not something that appears to be featured in other tools and allows the coding assistant to specifically reference things like Jira and Confluence.
Cursor had the most detailed breakdown of how data is handled and stored across its system.

What models are available?

There are dozens of models available for different tasks.

There is not a definitive list but here are a useful set of sources:

Cursor – list of supported models (https://docs.cursor.com/settings/models)
Ollama – list of supported models (https://ollama.com/library)
WebDev Arena Leaderboard – ranking of models (https://web.lmarena.ai/leaderboard)

Typically, we’d consider popular models that are supported in coding assistants like GitHub Copilot. This includes:

OpenAI’s GPT-4o, o1-mini
Anthropic’s Claude 3.7 and 3.5 Sonnet
Google’s Gemini 2.0 and 2.5

There are also other models such as DeepSeek’s R1 and Alibaba’s Qwen 2.5 which have not been investigated in enough detail.

What are the key differences between models?

Feature	Claude 3.7 Sonnet	GPT-4o	Gemini 2.5
Provider	Anthropic	OpenAI	Google
Release date	March 2025	May 2024	February 2025
Model type	Thinking and reasoning	Multimodal, generalist	Multimodal with thinking and reasoning
Description	Currently considered one of the best models for agentic coding. Supported by most tools. Claude 3.7 emphasizes “extended thinking” modes for complex tasks and excels at agentic coding benchmarks	A good all-rounder, suitable for most tasks. Doesn’t feature the thinking and reasoning of Claude 3.7 or Gemini 2.5, however makes up for in speed.	Multimodal and “thinking-oriented”. Capable of handling huge codebases
Context window	200k tokens	128k tokens	1M (2M coming soon)
Benchmarking	WebDev Arena ranking: #1 HumanEval ~82% MMLU ~90%	WebDev Arena ranking: #19 HumanEval ~85% MMLU 88.7%	WebDev Arena ranking: #2 Code generation ~70% MMLU ~90%
Pricing (approx. per 1M tokens)	$3 input, $15 output	$2.50 input, $10 output	Free preview; pricing TBD (2.0 was $0.10 input, $0.40 output)
Knowledge cutoff	October 2024	October 2023	January 2025

Key points

Claude and Gemini offer reasoning models which may suit the tasks required by agentic coding assistants. They are often the two top ranked models in various benchmarking lists.
Gemini boasts a large context window of 1M tokens, which would make it ideal for larger codebases.
GPT-4o is probably the weakest of the 3 models but is still suitable for everyday tasks.

What are the local options for running a model?

There are various privacy and security concerns about sending application codebases to models which are often hosted in the cloud.

As a result, there is a desire to consider options for running models locally. One of the best options for doing so is Ollama which allows you to run popular, open-source models like Google’s gemma3, DeepSeek’s R1 and Meta’s Llama 3. Most models come in a variety of different parameter configurations.

As the table below outlines, only the smaller parameter model variations (1B-7B) could be run locally (i.e. on MacBook Pro).

Size	Notation	Example	Typical Use Case	Can you run it locally?
Small	1B–7B	Tiny LLama, Mistral 7B	Lightweight assistants, mobile apps, edge devices	Yes, laptop.
Medium	8B–30B	LLaMA 2 13B, Mixtral 8x7B	On-prem inference, enterprise private models	Maybe, required dedicated on-prem, large memory system. Multiple GPUs required.
Large	65B-100B+	GPT-4o	State-of-the-art models (ChatGPT, Claude, etc.)	No, distributed infra with large GPU clusters required. Use only via API/Cloud service.

Table of Contents