LLM models for creating custom AI agents

What does it mean

When a new colleague joined us in 2020, at the time studying and specializing in deep neural networks, almost no one in the room understood him at first. But discussions about AI and the development of our own solutions quickly gained momentum. A few months later we released our own chatbot and our own AI recommender, and in 2025 the first internal version of an AI agent, whose knowledge base was, of course, still far from where it is today.


Today we have not just months but years of AI development behind us. Our current agent (again, our own) is connected directly to the Anthropic API, without frameworks, running simply on our own code with our own SKILL.md, where it stores knowledge, context, and errors to avoid. We are now preparing processes and procedures that define who can pass know-how on to the AI agent, and how, so that it truly learns from the best and spreads this know-how further.

And that's why we can say what really works in building agents in 2026 and where their limits are.

If you want to build your own AI agent today, the key question is no longer whether, but on which model to build it, because the choice of LLM (Large Language Model) significantly affects:

  • the quality of outputs
  • the degree of autonomy
  • costs
  • and how much you will have to "monitor" the agent

How are LLM models compared?

Several leading LLM developers currently compete on the global market. Almost every month, one of them releases a better, higher-quality, or faster language model. But how can a model's quality be determined without testing it in practice?

LLM models are compared through benchmarks such as:

  • MMLU (general knowledge)
  • HumanEval (coding)
  • GSM8K (mathematics and multi-step reasoning on grade-school word problems)
  • bar exam / legal tests in the USA (argumentation, working with complex text)

It is important to note that a benchmark result, i.e., what works in a test, may not carry over to reality. Throughput in particular, meaning how many requests per second a model can actually handle, often only shows in practice. Still, the overall trend and the ranking of individual models generally hold.
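To make the term concrete: for question-answering benchmarks, the headline score is usually just the share of test items the model answers correctly. The Python sketch below illustrates this under simplified assumptions; the four questions and the `stub_answer` function are hypothetical stand-ins, not real benchmark data or a real model call.

```python
# Hypothetical mini-benchmark: (question, expected answer) pairs.
ITEMS = [
    ("2 + 2", "4"),
    ("Capital of France", "Paris"),
    ("12 * 12", "144"),
    ("Chemical symbol for gold", "Au"),
]


def stub_answer(question: str) -> str:
    """Stand-in for a real model call; gets one of the four items wrong."""
    canned = {
        "2 + 2": "4",
        "Capital of France": "Paris",
        "12 * 12": "124",  # deliberate mistake
        "Chemical symbol for gold": "Au",
    }
    return canned[question]


def accuracy(answer_fn, items) -> float:
    """Share of items where the answer exactly matches the expected one."""
    correct = sum(1 for question, expected in items
                  if answer_fn(question).strip() == expected)
    return correct / len(items)


print(f"accuracy: {accuracy(stub_answer, ITEMS):.0%}")
```

A score like this says nothing about latency, cost, or behavior on your own data, which is why testing in practice remains necessary.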

1. Anthropic (Claude Opus)

Best for complex agents and reasoning. It is very popular among developers because it is clear to work with and its tooling is included natively.

Claude evaluates the current context very effectively and calls tools (external sources such as GA4 or GSC) as needed to request information from them. It can process the information it receives, judge whether there is enough of it, and, if there is not, call another tool to gather enough data for an evaluation.
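The loop described above (look at the context, call a tool, check whether the data is sufficient, call another tool if not) can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's API: `stub_model`, `get_ga4_sessions`, and `get_gsc_clicks` are hypothetical stand-ins that return fixed values, so the example runs without any network access.

```python
from typing import Callable


# Hypothetical stand-ins for real data sources such as GA4 or GSC.
def get_ga4_sessions(days: int) -> dict:
    return {"source": "ga4", "days": days, "sessions": 1200}


def get_gsc_clicks(days: int) -> dict:
    return {"source": "gsc", "days": days, "clicks": 300}


TOOLS: dict[str, Callable[..., dict]] = {
    "get_ga4_sessions": get_ga4_sessions,
    "get_gsc_clicks": get_gsc_clicks,
}


def stub_model(observations: list[dict]) -> dict:
    """Stand-in for an LLM call: asks for more data until it has both sources."""
    seen = {obs["source"] for obs in observations}
    if "ga4" not in seen:
        return {"type": "tool_call", "name": "get_ga4_sessions", "args": {"days": 28}}
    if "gsc" not in seen:
        return {"type": "tool_call", "name": "get_gsc_clicks", "args": {"days": 28}}
    sessions = next(o["sessions"] for o in observations if o["source"] == "ga4")
    clicks = next(o["clicks"] for o in observations if o["source"] == "gsc")
    return {"type": "answer",
            "text": f"{clicks} search clicks, {sessions} sessions in 28 days"}


def run_agent(model: Callable[[list[dict]], dict], max_steps: int = 5) -> str:
    """Call the model repeatedly, executing each requested tool, until it answers."""
    observations: list[dict] = []
    for _ in range(max_steps):
        decision = model(observations)
        if decision["type"] == "answer":
            return decision["text"]
        tool = TOOLS[decision["name"]]
        observations.append(tool(**decision["args"]))
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent(stub_model))
```

In a real agent, `stub_model` would be replaced by a call to the model API and the tool functions by real connectors. The `max_steps` limit is one simple way to keep the agent from looping indefinitely.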

It is one of the more expensive models. Its price can change over time, but currently it is around $10–15 / 1M tokens for input and $30–75 / 1M tokens for output.

2. OpenAI (GPT-4.1 / GPT-4o)

OpenAI offers the most universal ecosystem. It was long considered the best creator of LLMs until Anthropic dethroned it thanks to the advantages of Claude mentioned above, which are correspondingly the weaknesses of GPT. Its biggest advantages are an excellent price-to-quality ratio, strong coding performance, and a wide range of integrations.

Depending on the model, the price is roughly $5–10 / 1M tokens for input and $15–30 / 1M tokens for output.

Its main disadvantage is that it does not provide its own native infrastructure for programmers and instead relies on, for example, Microsoft's Copilot.

3. Google DeepMind (Gemini)

Best for multimodal agent systems (text, image, video), with strong integration with the Google stack.

In terms of cost, it is among the cheaper options: roughly $3–10 / 1M tokens for input and $10–30 / 1M tokens for output.
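To see what these per-token prices mean for a budget, they can be multiplied by an expected monthly volume. The sketch below uses the price ranges quoted in this article for the three hosted providers; the workload of 50M input and 10M output tokens per month is a made-up assumption for illustration.

```python
# Price ranges in USD per 1M tokens, copied from the figures quoted in this article.
PRICES = {
    "Claude Opus": {"input": (10, 15), "output": (30, 75)},
    "GPT-4.1 / GPT-4o": {"input": (5, 10), "output": (15, 30)},
    "Gemini": {"input": (3, 10), "output": (10, 30)},
}


def monthly_cost_range(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return the (low, high) monthly cost in USD for a given token volume."""
    price = PRICES[model]
    in_millions = input_tokens / 1_000_000
    out_millions = output_tokens / 1_000_000
    low = in_millions * price["input"][0] + out_millions * price["output"][0]
    high = in_millions * price["input"][1] + out_millions * price["output"][1]
    return low, high


# Hypothetical workload: 50M input tokens and 10M output tokens per month.
for name in PRICES:
    low, high = monthly_cost_range(name, 50_000_000, 10_000_000)
    print(f"{name}: ${low:,.0f} to ${high:,.0f} per month")
```

Because agents tend to read far more than they write, input volume often dominates the bill even though output tokens cost more per unit.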

4. Open-source models (LLaMA, Mistral, Mixtral)

Best for control over data and infrastructure. Open source means you can run the models on a server, but also on your local machine, although they are significantly slower when run locally. On the infrastructure of powerful servers, the underlying statistical calculations run much faster.

The advantage is that the model can be a closed system. On a local computer without internet access, or on a local network only, you don't send anything anywhere. Of the options compared here, it is the only way to guarantee that no data leaves your environment.

Why:

  • full control over data
  • on-premise deployment

Price:

  • no direct price for the model
  • but: infrastructure costs (GPU, hosting)
    → from hundreds to thousands of € per month
  • cheap at large scale, expensive at small scale

In practice: enterprise, sensitive data
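The "cheap at large scale, expensive at small scale" point can be expressed as a break-even calculation: self-hosting has a roughly fixed monthly cost, while a hosted API is billed per token. A minimal sketch, assuming a hypothetical infrastructure cost of 2,000 per month and a hypothetical blended API price of 10 per 1M tokens, both in the same currency:

```python
def break_even_tokens(fixed_monthly_cost: float, api_price_per_million: float) -> float:
    """Monthly token volume at which self-hosting and a hosted API cost the same."""
    if api_price_per_million <= 0:
        raise ValueError("api_price_per_million must be positive")
    return fixed_monthly_cost / api_price_per_million * 1_000_000


def cheaper_option(monthly_tokens: float, fixed_monthly_cost: float,
                   api_price_per_million: float) -> str:
    """Compare the per-token API bill with the fixed self-hosting cost."""
    api_cost = monthly_tokens / 1_000_000 * api_price_per_million
    return "self-hosted" if api_cost > fixed_monthly_cost else "hosted API"


# Hypothetical figures, for illustration only.
INFRA_COST = 2_000.0   # fixed infrastructure cost per month
API_PRICE = 10.0       # blended API price per 1M tokens

print(f"break-even: {break_even_tokens(INFRA_COST, API_PRICE):,.0f} tokens per month")
print("20M tokens per month:", cheaper_option(20_000_000, INFRA_COST, API_PRICE))
print("1B tokens per month:", cheaper_option(1_000_000_000, INFRA_COST, API_PRICE))
```

The comparison is deliberately simplified: it ignores staff time for operating the servers and any difference in output quality between models.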



