Thomas Letan's Blog - vibecoding

What happened in February 2026?

March 8, 2026

Time flies. February was a productive month, and I got many things done, with the help of my coding assistant, that is. I haven’t joined the “I haven’t written a single line of code for quite some time” flock just yet, but I am close. At the same time, I feel like I’m still locked in beginner mode, like I have yet to really tap into what agentic tools like Claude Code have to offer.

For now, the tools may have changed, but the journeys feel somewhat the same as they used to a few months back. It’s a strange feeling, and I am still trying to wrap my head around it.

Yet Another Claude Code Wrapper

In my previous retrospective, I mentioned CeCe and Vee. The former is a plugin for turning Anthropic’s model into a “modal” agent that I had vibe “prompted” with Claude itself. The latter was a second take at the exact same concept.

A month later, Vee has evolved quite a bit, to a point where the “modal” aspect of the project lost the front seat. More precisely, Vee has become a session multiplexer for Claude Code, with built-in support for spawning it in ephemeral containers with --dangerously-skip-permissions enabled. It is also a playground for me to experiment with the agent. For instance, I built a feedback system where I can point it to the good and bad things it’s doing: my feedback gets saved in a SQLite database and appended to the system prompt of subsequent sessions. I’ve also tried to set up a knowledge base with embeddings. Both approaches have their benefits, but I have no clear evidence yet that either of them is a game changer.

You may remember Vee started out of my frustration at losing control of CeCe’s internals. Well, Claude Code took over Vee’s codebase quite quickly as well. But this time, the output was a lot more satisfying, because I leaned into playing the role of a Tech Lead and a QA tester. Seeing the project take shape and the tool I wanted become a reality that quickly has clear upsides. I also find it interesting that I have both a fairly good understanding of how the software works overall (I called most of the architectural choices) and very little insight into how the code itself is written. It reminds me of $WORK, where I am involved in some projects without being a direct IC.

I Built Myself a Cloudlab

In the first half of the month, a colleague of mine was tasked with building “staging environments à la demande” for the engineers working on $WORK’s Next Big Thing™. Their answer was a managed Kubernetes cluster and a bunch of Helm charts, plus a small CLI tool to deploy the latter onto the former. A few hours later, I started building myself a k3s cluster. It’ll be a fun way to get familiar with the stack, I thought. What emerged almost resembles a managed cluster of sorts. I called it elsa, and it now hosts this very website along with a few other self-hosted services I have had on my radar for a while.

Note

Yes, GitHub Pages and the like remain a better, cheaper, superior even way to host my website. But where is the fun in that? 😁

This may come as a surprise, considering I had migrated my website to a new setup called tinkerbell barely a month ago. Well, fret not. The little VPS sure has been retired early, but its legacy has not been lost. elsa was built upon the very same tools (Terraform, CoreOS, and Ignition) and the very same promise I got hooked on a month ago: that there is freedom in designing a system whose destruction is a non-event. If anything, working on elsa has only deepened my understanding, and my appreciation, of what Fedora has been pushing for with CoreOS.

I really, really want to publish an article about elsa. I have yet to find my angle, though. Funny story: the first couple of paragraphs of this section were initially intended to be its introduction, until I grew dissatisfied with them and decided to repurpose them. Which means I need to restart from scratch. 😆

Anyway, as I was making good progress on elsa, I ran into some good opportunities to contribute to open source software. I opened two bugfix PRs over the course of my journey: one for BetterStack’s official Terraform provider, and the other for external-dns’ Vultr webhook sidecar. That was pretty cool! I want to do that more often.

We are almost a third into March already, and the month is clearly following the trend set by its predecessor. As such, I expect to have plenty to write about in a month’s time, meaning I should be able to keep up with publishing these retrospective pieces for a while. At least, I hope so!

What Happened in January 2026?

January 30, 2026

meta vibecoding

This is my first retrospective in quite a while—I have yet to make writing these logs a real habit of mine. That being the case, so much happened this month that this article felt like the obvious thing to do.

`tinkerbell`’s First Month

On January 5th, I published my account of migrating my website to a completely new setup. Not only am I very proud of this article in and of itself, I am also quite happy that it has sparked discussions in a few places such as lobste.rs or the orange website. It has also been shared in other places like the DevOps’ish newsletter. (I have also created a GitHub repository to host tinkerbell configuration, and it is quietly farming stars as we speak.)

I don’t write articles with the expectation that they reach a large audience, but I will admit I had some ambitions for this one. It’s very fulfilling to know my experiment report of sorts has caught the attention of my peers, if only a little.

At first, I thought I would spend January experimenting with my new playground. I have a handful of services I want to deploy. Little did I know I would be sidetracked sharply before I had a chance to get to it.

Meeting with Claude Code

Somehow, I managed to go through 2025 while staying away from the code agent hype. I gave “vibecoding” a try in May by chatting with ChatGPT and Gemini while yak shaving my way towards transcribing YouTube videos in OCaml. Even I realized back then that the hype had already moved on from chat UIs towards agents.

Then, the craziest thing happened. I went on holidays on December 23rd, 2025. Two weeks later, as I was going back to $WORK, Claude Code and its ilk suddenly felt inevitable. It looks like agentic workflows got very good in a matter of a few weeks, up to a point where there is a real opportunity cost in ignoring them altogether. And as I am about to take out a mortgage, it feels like a bad time to take the risk of falling out of relevance.

Looking back, I think it started with a simple, genuine suggestion—we were discussing a fairly ambitious clean-up of our test-suite at $WORK when someone mentioned Claude Code was quite relevant for this kind of tedious, boilerplate-heavy task. This resonated with all the success stories I was suddenly exposed to. Before I could realize what was happening, I was caught up in an intense introspective journey.

Part of my answer to the angst of LLMs being on the verge of reshaping the way I work is CeCe (CeCe being a nickname for Claude Code, or CC).

CeCe is a Claude Code plugin ~~I have been working~~ Claude Code and I have been working on for the better part of the month, and it has become my sandbox to experiment with and understand agentic workflows. Configuring Claude Code in a very opinionated way felt only natural, since that’s the only starting point I know. I am glad I did: it gave me a better understanding of how Claude Code actually works.

Ironically, since CeCe has mostly been written by CeCe itself and because it grew quite fast as a result, I do not really feel confident in modifying it myself 😆. I am now trying to rewrite it on my own, with more structure and “intention.” We will see how it goes.

New RSS Feeds

On a final note, I am glad to say that this website now features some new RSS feeds that you may be interested in: one per tag, and one per series. (That’s actually the very first task I entrusted to Claude Code. Suffice it to say, it didn’t blink, and navigated through my quite unusual setup like a champ.) I have been meaning to implement them since I added my website to a certain aggregator that now advertises every article I publish that cites one particular programming language.

I am glad to know that camel aficionados learnt about my tinkerbell setup simply because I drew a parallel between infrastructure as code and functional programming. Maybe they don’t share my enthusiasm, though. With this change, I am one PR away from limiting this aggregator’s scope to simply the articles featuring the appropriate tag. I think they will find it useful. Maybe you will too!

How I Want to Use LLMs in 2026

January 25, 2026

opinions vibecoding

I would like to thank Xavier Van de Woestyne for his feedback and careful review.

Agentic tools are here, and they are here to stay. I don’t think it is an overstatement to say that LLMs are completely reshaping our day-to-day life. Even if mass adoption has yet to happen, the consequences are already here. (We are seeing more and more public statements from key figures of our industry like Satya Nadella or Jensen Huang trying to shed positive light on AI and pushing for more people to embrace it, probably because they don’t see the increase in active users they were hoping for.) In the software engineering industry, I already witness how they come with expectations whether we want to use them or not.

In 2025, I stayed away from the agentic hype. This month, I was made acutely aware of how transformative they can be. I don’t think there is a way back from there.

As we jump into 2026 headfirst, I consequently find myself at a crossroads. I will integrate LLMs in my workflow so that I can get the most out of what they can offer, but I want to do so consciously. Hence this article, whose tone and underlying motivation are different from my other pieces. I want to set a bar, to form a contract of sorts between me and my future self.

I want to be transparent about who—or what—produced the work I am exposing to others. I am an engineer; I can expect to generate tons of code and technical documentation using (meticulously prompted) agents. I am also publishing content online, like on this very website. The last thing I want is to trick people into thinking that I wrote something that was actually generated by a tool. Or, even worse, that they end up convinced something I genuinely wrote “the old way” has been generated.

This does not mean generated content is without value. Nor do I think there is a clear, objective line to draw between what are actually two ends of a spectrum. After all, I’ve been using ChatGPT to polish my articles for a year now, and I haven’t advertised it in the past. Still, agentic tools have become good enough: I will be in a position to describe complex tasks, fully delegate their implementation, and be confident enough to publish the result. I believe it is fair for my fellow humans to be aware of that fact when they read or review the result. Only then will they be able to calibrate their own expectations in light of this information.

As a concrete example, I have started to set up dedicated accounts for Claude Code. I am trying to come up with a reliable way to let it take over the execution of well-scoped tasks, up to responding to reviewers’ feedback on its own. That’s not an end I want to pursue unless I can be transparent about it.

I want to be deliberate about when I use or don’t use LLMs. And I have a surprising number of reasons why.

On a personal level, there are skills I don’t want to lose, craft I still wish to improve. I’m fine with never having to write a bug report directly again, but do I want to forgo authorship of my blog’s articles, for instance? Clearly not.

Besides, we are still grimly on track when it comes to climate change. Even after accepting that individual behaviors have much less impact than we’d like, is it really reasonable to have hours of back-and-forth with an agent every day from now on? A lot has been said and written about the impact of LLMs in that regard. I am under the impression that we have recently been reading stories that are tragically similar to what was published about Bitcoin a few years back. (I had a risk management training once, and something the instructor said stuck with me: when something becomes safer, we humans tend to adapt our behavior to take more risks. Are we doing the same thing with climate change? When we manage to reduce our environmental impact, do we collectively interpret that news as a blank check to find new ways to consume more energy?)

I think one answer to this is to refuse to make the LLM the default, obvious choice. Using an agent should have weight to it. There will be times when an agent will be an enabler, achieving something outside of my immediate reach (because it would take me too much time, because it would be extremely tedious and error-prone, or for any number of other valid reasons). And there will be times when I will want to use it to write a trivial patch that I could come up with myself in a matter of minutes. I want to cultivate the discipline to make the distinction between the former and the latter, so that I avoid falling into wasteful habits.

I want to be respectful of the people who will be confronted with my use of LLMs. The matter is too complicated to approach any other way. What is acceptable for some feels like an attack on others.

Yes, agents can be powerful accelerators. No, adopting them is neither easy nor the obvious, right thing to do. We are seeing too many people being hurt by the behaviors enabled by agents’ capabilities: open-source maintainers closing their projects from external contributions, junior engineers struggling like never before to find jobs, artists witnessing in real time their art being regurgitated by models trained on their portfolios. The list goes on.

I cannot commit to never using generated art, or to closing my eyes to what LLMs can bring me. But I want to be aware of, and to care about, the consequences of my choices, as well as the broader context behind them. And sometimes, that needs to mean renouncing the convenience that a technology can bring, even if for just a moment.

Only time will tell how I end up using LLMs, and whether the principles I claim today as my own in this article will stand the test of time. I am rather curious to reread this piece in a year—I hope it will prompt (ah!) me to write a retrospective.

Peer-Programming in Modern OCaml with ChatGPT and Gemini

June 2, 2025

ocaml vibecoding

It is June 2025, and LLMs are everywhere and do everything now. I have never been a diligent adopter of them myself. The past few months, I started to feel a bit “left out,” though. Colleagues and friends are starting to integrate LLM-powered tools into their personal toolkit, with notable successes.

In early May, I decided to challenge myself to implement a simple tool to generate a summary from YouTube videos, using Vosk for speech recognition and Ollama for generating summaries with LLMs running locally. I could kill two birds with one stone: experimenting with LLMs both to write software and to power it.

I decided to implement as much as possible in OCaml, for two main reasons. Firstly, this is the main language I use at $WORK. I wanted to get a sense of how LLMs could help with the software stack I use 7+ hours a day. Secondly, it was a good opportunity to catch up with the OCaml 5 ecosystem (Eio in particular).

This write-up is a sort of dev log of this exercise. Its main focus is not to explain in depth the code I ended up writing, but rather to reflect on my wins and losses in adding LLMs to my developer toolkit.

TL;DR

In this article, I am using “Tip” blocks to highlight my key findings and lessons learned. That being said, for readers in a hurry, here’s how ChatGPT summarizes these blocks.

Prompting is a skill that improves through trial and error—many failed prompts help build intuition.
LLMs may suggest non-existent functions; using LSP tools helps identify these quickly.
Standard formats like WAV lead to more accurate LLM outputs.
LLMs without session memory tend to repeat mistakes; shared context is important.
Structuring commit message prompts (e.g., What / Why / How) produces consistently good results.
LLMs struggle with libraries like Eio, possibly due to name ambiguity or unstable APIs.
Providing project-specific context (e.g., via direnv) is likely to help reduce repeated hallucinations.
Prompting LLMs for MR descriptions or commits can eliminate empty submissions and speed up review.

You should still definitely read the full piece, though. I don’t think my prompt was particularly good 🤫.

Editor Integration

My first task was to grant myself the ability to leverage LLMs from my editor. I had been using the web chat of ChatGPT for a while, but it now felt antiquated since I had seen a freshly hired coworker get ChatGPT to generate for themselves a dozen tests directly from VS Code.

I returned to Neovim a few years ago, and I am not ready to migrate to VS Code. I would have been surprised if the Vim/Neovim communities didn’t have a viable plugin for me, though.

I asked both ChatGPT and Gemini to find me candidates, but the plugins they suggested seemed unmaintained, often outdated.

In the end, I found CodeCompanion.nvim by myself, through a good old Google search. I asked ChatGPT why it hadn’t suggested it to me, and it seems like my prompts were biased. By asking for “a Neovim ChatGPT plugin” or “a plugin to integrate Gemini to Neovim,” I had unnecessarily narrowed the LLM’s scope.

Tip

I guess one does not become a prompt engineer in a day. This is actually one of the reasons I want to use LLMs more seriously. To build myself intuitions of which prompts work and which don’t. After this project, I have mostly uncovered a bunch of the latter category 😅.

@yurug had told me he was impressed by Gemini Pro, so I decided to make it the default adapter for the CodeCompanionChat command. Making Gemini Pro the default model for this adapter was challenging, and LLMs weren’t able to help. When I finally found the correct setup option, it turned out I hadn’t generated a token allowing me to use Pro.

Well. That gave me the opportunity to benchmark Gemini Flash, then.

Speech Recognition with Vosk

ChatGPT suggested Vosk as a way to get a transcript of an audio file, so it was also a good opportunity to write bindings (something I had dodged for a long time for no particular reason).

As of June 2025, there are no OCaml bindings for the Vosk API, so my first task was to write my own as part of a project soberly called ocaml-vosk.

Gemini Flash was able to help me understand how ctypes and ctypes.foreign work. This was my first experience interacting with an LLM from my Neovim window, and it was pretty convincing. It gave me the opportunity to learn that one can declare opaque types in OCaml (not just via mli files). It makes sense, but it was news to me.

Then, Gemini suggested that I use EIO’s Switch to deal with automatic memory management (in place of Gc.finalise). It was the first time I had heard about switches, and the fact that I learned of their existence from the perspective of resource management (not fiber management) was a happy accident.

The first point of friction came when I started building a high-level interface for my Vosk bindings. More specifically, given a Cstruct.t value, how do I get a pointer and a length? It turns out that while both ChatGPT and Gemini Pro know how to do so, Gemini Flash hallucinates every step of the way.

The solution is actually pretty straightforward.

let ptr =
  Ctypes.bigarray_start
    Ctypes.array1
    (Cstruct.to_bigarray buffer)
in
let len = buffer.Cstruct.len in

Gemini Flash kept suggesting I use Ctypes.ptr_add instead, though. Don’t search for it, it does not exist. (While reviewing this article, ChatGPT gently hinted that while ptr_add does not exist, Ctypes.(+@) does.) When I suggested Cstruct.to_bigarray, it warned me that this call would create a copy of the underlying buffer. ChatGPT and Gemini Pro disagreed, and I could convince myself that they were right by looking at the code. Interestingly, I was also able to convince Gemini Flash it was wrong by copy/pasting the relevant code snippet.

Tip

Having an LLM suggest a function that does not exist is very frustrating, especially when it happens several times in a row: it recognizes its mistake, then proposes an alternative that is as nonexistent as the first one. At least, with LSP, it is pretty straightforward to know when it happens.

Using Vosk is one thing, but then I couldn’t find any OCaml package to read audio files compatible with Vosk’s expectations. Implementing what I needed in OCaml gave me more opportunities to learn about EIO, but most importantly, it showed how convenient it was to chat with an LLM directly from my editor. I was able to learn about WAV files, the RIFF header and its subchunks, and PCM 16-bit mono audio data without leaving Neovim. And by giving Gemini access to my buffer, I troubleshot most of my issues fairly quickly (except when they were EIO-specific; more on that later).
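To give an idea of what reading such a file involves, here is a minimal sketch of a WAV header parser relying only on the standard library. It is not the code from ocaml-vosk: the type and function names are made up for the illustration, and it assumes the canonical 44-byte layout where the data subchunk immediately follows the fmt subchunk, which real-world files do not always respect.

```ocaml
(* Fields of interest in a canonical 44-byte WAV header. *)
type wav_format = {
  audio_format : int; (* 1 means uncompressed PCM *)
  channels : int;
  sample_rate : int;
  bits_per_sample : int;
  data_size : int; (* size in bytes of the audio payload *)
}

(* Read the 4-character tag stored at [offset]. *)
let tag_at header offset = Bytes.sub_string header offset 4

(* Read an unsigned 32-bit little-endian integer (assumes a 64-bit platform). *)
let uint32_le header offset =
  Int32.to_int (Bytes.get_int32_le header offset) land 0xFFFF_FFFF

let parse_wav_header (header : bytes) : (wav_format, string) result =
  if Bytes.length header < 44 then Error "header too short"
  else if tag_at header 0 <> "RIFF" then Error "missing RIFF tag"
  else if tag_at header 8 <> "WAVE" then Error "missing WAVE tag"
  else if tag_at header 12 <> "fmt " then Error "missing fmt subchunk"
  else if tag_at header 36 <> "data" then Error "data subchunk not at offset 36"
  else
    Ok
      {
        audio_format = Bytes.get_uint16_le header 20;
        channels = Bytes.get_uint16_le header 22;
        sample_rate = uint32_le header 24;
        bits_per_sample = Bytes.get_uint16_le header 34;
        data_size = uint32_le header 40;
      }

(* The kind of audio mentioned above: PCM, 16-bit, mono. *)
let is_pcm16_mono f =
  f.audio_format = 1 && f.channels = 1 && f.bits_per_sample = 16
```

A caller would read the first 44 bytes of the file, check is_pcm16_mono on the result, and then hand the next data_size bytes to the recognizer.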

Tip

For widespread formats like WAV, LLMs shine particularly bright.

In the end, EIO-specific code put aside, this task was roughly solved by (1) writing bindings for the few functions of the Vosk API I needed, and (2) translating C examples provided by Gemini into good-looking OCaml. (It’s a little out of scope for this article, but I discovered when writing the high-level API for Vosk that Switches are very easy to misuse. It is as simple as (incorrectly) turning an eager function consuming a buffer into a Seq-based alternative, while forgetting the use of Switch.run on top of the function.)

Witnessing my example program outputting the transcript of audio files as it was processing them felt pretty good, and I was soon ready to tackle the second part of this project: prompting an LLM to summarize it.

Prompting Local LLMs with Ollama

Similarly to Vosk, there is no off-the-shelf package available to use Ollama from an OCaml program. As a consequence, I created a second repository (ocaml-ollama, if you can believe it).

How It Started

Turns out, you don’t use Ollama the same way you use Vosk. The latter is a C library that you can call from your binary, the former actually uses a client/server architecture. I asked LLMs what was the best solution for performing HTTP requests with Eio, and cohttp-eio came back as a good candidate. I’m already familiar with cohttp, since we are using it at $WORK, but it’s actually a transitive dependency (of a framework called resto ).

I am actually a little frustrated with resto, so I welcomed the opportunity to familiarize myself a little more with cohttp directly. I quickly implemented the helper fetching the list of models available from a given Ollama instance.

Then, I got myself sidetracked.

More LLMs Lies

Persistent HTTP connections are a pet peeve of mine. Establishing a TCP connection, negotiating TLS encryption, all of that takes time. A daemon creating a new socket for each request really frustrates me as a result.

So I asked.

Does cohttp-eio reuses already established connections when performing two requests on the same host?

ChatGPT 4o. Gemini 2.5 Flash. Gemini 2.5 Pro. They all assured me it was the case, as long as I was careful and reused the same Cohttp_eio.Client.t instance. For instance, here are the first few words of ChatGPT’s answer when prompted with this question.

As of current behavior in cohttp-eio-client, yes, it does reuse already established connections when making multiple requests to the same host, provided certain conditions are met.

It’s a lie. Don’t trust them. cohttp-eio does not reuse existing HTTP connections.

I was very doubtful, so I asked them how to check this. tcpdump was mentioned. (I later discovered eio-trace, and it would have been much more straightforward to use this tool to inspect Cohttp_eio.Client’s default behavior. No LLM thought of that, sadly.) I got traces I couldn’t read at first glance, so I just copy/pasted them to the LLMs… and sure enough, they confirmed what I suspected. Cohttp_eio.Client does not share connections by default. It creates a socket for each request.

It’s actually pretty easy to convince yourself that it is the case by reading the implementation of Cohttp_eio.Client.

type connection = Eio.Flow.two_way_ty r
type t = sw:Switch.t -> Uri.t -> connection

(* simplified version of [make], omitting the support for HTTPS *)
let make () net : t = fun ~sw uri ->
  (Eio.Net.connect ~sw net (unix_address uri) :> connection)

There is nothing here dealing with persistent connections. Eio.Net.connect uses a switch for resource management, but does not perform any kind of connection caching.
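For contrast, here is a minimal sketch of what connection caching could look like, using only the standard library. The conn record is a hypothetical stand-in for a real socket, and none of these names come from cohttp-eio or Eio. The only point is that a second request to the same host picks up an idle connection instead of opening a new one.

```ocaml
(* Hypothetical stand-in for a real socket. *)
type conn = { id : int; host : string }

type pool = {
  idle : (string, conn Queue.t) Hashtbl.t; (* idle connections, per host *)
  mutable opened : int; (* number of connections created so far *)
}

let create_pool () = { idle = Hashtbl.create 8; opened = 0 }

(* Reuse an idle connection to [host] if there is one, open one otherwise. *)
let acquire pool host =
  match Hashtbl.find_opt pool.idle host with
  | Some q when not (Queue.is_empty q) -> Queue.pop q
  | _ ->
      pool.opened <- pool.opened + 1;
      { id = pool.opened; host }

(* Put a connection back so that a later request can reuse it. *)
let release pool conn =
  let q =
    match Hashtbl.find_opt pool.idle conn.host with
    | Some q -> q
    | None ->
        let q = Queue.create () in
        Hashtbl.add pool.idle conn.host q;
        q
  in
  Queue.push conn q

(* Run [f] with a connection, releasing it even if [f] raises. *)
let with_conn pool host f =
  let conn = acquire pool host in
  Fun.protect ~finally:(fun () -> release pool conn) (fun () -> f conn)
```

A real pool would also have to cap the number of connections, detect the ones the server has closed, and play well with concurrent fibers, which is where most of the subtlety lies.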

That’s okay, though. Yak shaving is a real thing. I can stop working on my Ollama client library for a while, just to fix this.

The Questionable Side Quest of Implementing a Connection Pool for `cohttp-eio`

The bottom line of this little adventure is: I should have updated my default prompt to remind the LLMs that Cohttp_eio.Body.drain is not a thing.

But let’s start from the beginning. Over the course of a few days, I have successfully implemented a wrapper on top of Cohttp_eio.Client to deal with persistent connections. It’s not rocket science, but it’s still a subtle endeavor, which necessitated a good understanding of Eio and cohttp. I cannot say LLMs were instrumental for the task. They gave me good pointers to start from, but they also misled me a bunch of times.

Sometimes, the help came in surprising ways. One anecdote in particular stuck with me. I decided I needed a get operation for Eio.Pool pools, which sadly only offer use.

(* Provided by Eio.Pool *)
val use : 'a t -> ('a -> 'b) -> 'b

(* Not provided *)
val get : sw:Switch.t -> 'a t -> 'a

The key insight is that get allows callers to pick something from the pool, and only put it back when the switch is released.

My first implementation of get was roughly as follows (I didn’t even consider asking an LLM to propose one, now that I think about it; I really am no vibe coder yet).

open Eio.Std

let get ~sw t =
  let x, rx = Promise.create () in
  let never, _ = Promise.create () in
  Fiber.fork ~sw (fun () ->
      Eio.Pool.use t @@ fun conn ->
      Promise.resolve rx conn;
      Promise.await never);
  Promise.await x

And it didn’t work. The resulting program was hanging, because of how Fiber.fork ~sw works. Basically, the fiber created by fork becomes part of the set of fibers the switch sw waits for. Since, in my case, said fiber would never be resolved, I had created a deadlock.

I asked Gemini Pro 2.5 for help, and out of curiosity, I looked at its reasoning steps. Very early on, it mentioned Fiber.fork_daemon, but surprisingly, the function did not appear in the final answer. (Once again, I had asked the wrong question. I asked for the Fiber equivalent of Lwt.async, overlooking that Lwt.async has a very particular behavior with respect to exceptions, which Gemini Pro tried very hard to replicate. I didn’t care at all about the exceptions I could raise here!) Had I not been curious at that time, I would have missed the correct solution. (@alice provided me the answer a few minutes later, so I’d have been fine in the end 😅.)
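Assuming the rest of the function stays the same, the fix presumably amounts to swapping Fiber.fork for Fiber.fork_daemon. A daemon fiber does not keep the switch alive: once the other fibers attached to the switch are done, the daemon is cancelled, and Eio.Pool.use gets a chance to put the resource back in the pool. This is a sketch of the idea, not necessarily the exact code that ended up in the library.

```ocaml
open Eio.Std

(* Borrow a resource from the pool until the switch [sw] finishes. *)
let get ~sw t =
  let x, rx = Promise.create () in
  (* A promise that is never resolved: it keeps the daemon fiber parked
     inside [Eio.Pool.use] until the fiber is cancelled. *)
  let never, _ = Promise.create () in
  Fiber.fork_daemon ~sw (fun () ->
      Eio.Pool.use t @@ fun conn ->
      Promise.resolve rx conn;
      Promise.await never);
  Promise.await x
```

With this version, the switch no longer waits for the parked fiber, so the program terminates instead of hanging.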

I think my experience overall was made a little more frustrating than it should have been because I never constructed a “context” that I could share between coding sessions. I haven’t enabled the memory saving setting in ChatGPT. Besides, every time I opened Neovim, Gemini was starting from scratch. I should try to change that, to prevent the LLMs from making the same mistakes again and again (typically, the Cohttp_eio.Body.drain function they kept bringing up).

Tip

I need to investigate how I can specialize my default prompt for each software project I am working on. I imagine I can rely on an environment variable and direnv.

Finally, it’s when I worked on this library that I came up with a nice prompt for Gemini to write my git commit messages for me.

@editor #buffer Add a git commit title and message. Structure the description in three sections (What, Why, How). Wrap the sections at 72 columns. Don’t forget the git title, and always insert a new line between the title and the description.

This prompt gives pretty cool results. It is still necessary to review them, because in a few instances I caught false statements in the proposal. But overall, it gives really meaningful output. Almost all commits of the library have been written with this prompt.

Tip

If anything, I don’t think I will ever open a Merge Request with an empty description again.

And that, kids, is how I released cohttp-connpool-eio.0.1.

Wrapping-up a Minimal Ollama Chat

Integrating cohttp-connpool-eio in my ocaml-ollama project led me to find a bug in the former. More specifically, the Cohttp_connpool_eio.warm function that can be used to pre-populate a new pool was doing so by performing a specified HEAD request to the host as many times as the pool size. (In a later iteration of the library, warm only establishes connections, and does not perform any unnecessary HTTP requests.)

It worked well against both https://www.google.com and https://soap.coffee/~lthms, but when I tried with the Ollama server, it decided to hang. Why?

Well, I tried asking my new friends the LLMs, but didn’t get any answer I felt confident with. At this point, my trust in their EIO expertise was rather low, and I was more skimming through their answer to find a lead I would follow myself than anything else. In the end, I completely dropped the LLMs here, and went back to what I usually do: experimenting, and reading code.

I reproduced the issue with curl: curl -X HEAD hangs as well with Ollama, while curl --head does not. The former tries to read the response body, based on the response headers (e.g., content-length). The latter doesn’t, because it knows HEAD always omits the body. I am not sure why the hanging behavior does not show for curl -X HEAD https://www.google.com, though.
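The rule curl --head relies on can be written down in a few lines. The sketch below follows HTTP semantics as I understand them: a response to HEAD never carries a body, whatever its headers say, and neither do 1xx, 204, and 304 responses. The meth type and the function name are made up for the illustration, and this is not how cohttp or curl are implemented.

```ocaml
(* Only the methods needed for the illustration. *)
type meth = GET | HEAD | POST

(* Should a client try to read a body from this response? *)
let response_has_body ~meth ~status =
  match (meth, status) with
  | HEAD, _ -> false (* content-length only describes what GET would return *)
  | _, (204 | 304) -> false
  | _, s when s >= 100 && s < 200 -> false
  | _ -> true
```

A client that skips this check and trusts content-length after a HEAD request ends up waiting for bytes the server will never send, which matches the hanging behavior described above.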

But anyway, once the bug was fixed, I could return to playing with Ollama.

I then decided to implement a helper to call POST /api/generate. It is the simplest way with Ollama to generate an LLM’s answer from a prompt. Interestingly enough, it is a “streamed” RPC using the application/x-ndjson content type. Instead of computing the whole answer before sending it to the client, the server sends JSON-encoded chunks as they become available (transfer-encoding: chunked).

I tried to implement that with cohttp-eio, and it failed miserably with obscure parsing error messages.

After a bit of debugging, it became clear that Eio.Buf_read.parse was not behaving as I thought it was, which made me feel paranoid about how cohttp-connpool-eio handles connection releases. In the end, I had to unpack how Cohttp_eio.Body.t works under the hood with respect to End_of_file to move on. Once again, my LLM friends weren’t particularly helpful: they were hallucinating Buf_read functions, and never thought to mention that parse only works for complete responses.
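One way to consume such a stream without waiting for the complete response is to buffer incoming chunks and only hand over complete lines, since chunk boundaries do not have to line up with JSON documents. Here is a minimal sketch with the standard library only. The feed function is hypothetical (it is not part of ocaml-ollama), and decoding each line with a JSON library is left out.

```ocaml
(* Add [chunk] to the bytes received so far, and return the complete
   lines now available. Anything after the last newline stays in
   [pending] until the next chunk arrives. *)
let feed (pending : Buffer.t) (chunk : string) : string list =
  Buffer.add_string pending chunk;
  let data = Buffer.contents pending in
  match String.rindex_opt data '\n' with
  | None -> [] (* no complete line yet *)
  | Some last ->
      Buffer.clear pending;
      Buffer.add_string pending
        (String.sub data (last + 1) (String.length data - last - 1));
      String.sub data 0 last
      |> String.split_on_char '\n'
      |> List.filter (fun line -> String.trim line <> "")
```

Each returned line is expected to be one self-contained JSON document, which can be decoded and displayed as soon as it arrives.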

Tip

My personal conclusion is that ChatGPT and Gemini quickly show their limits for non-trivial programming tasks involving Eio and its ecosystem. I am really curious to understand why. Do they keep hallucinating functions because Eio is a really generic name, and maybe they are mixing context from the Python library with the OCaml one? Or is it because the API of Eio has changed a lot over the years?

I am also wondering how, as the author of a library, I can fix a similar situation. Assuming ChatGPT starts making false statements about cohttp-connpool-eio, for instance, how do I address this? I suspect being “LLM-friendly” will be increasingly important for a software library’s success.

In the end, ChatGPT and Gemini were just another source of inputs, not the main driver of my development process.

Putting Everything Together

Turns out, you really need just one RPC to generate a summary for a text input, so it wasn’t long before I could chain everything. I pulled mistral:7b-instruct-v0.2-q4_K_M (on a suggestion from ChatGPT, if I remember correctly), and got a summary from the video I had downloaded.

Just kidding. Out of nowhere, I decided to pursue yet another side quest, and gave a try to the fancy dune pkg lock command. Then I was able to generate my summary, using the following prompt.

Generate a summary of the raw transcript of a video provided after
this paragraph. The transcript may be in a language that is not
English, but the summary should always be in English. You should
adopt a neutral point of view (i.e., even if the transcript speaks
in the first person, you should always use the third person). Each
line is an utterance. Keep the summary short and engaging, your
goal is to provide a good overview of what was said.

----

{Vosk output}

And with this, it was time to wrap up. And what better way to do so than to write this little journal entry? So I did, and when I was halfway through my first draft, I fired up a new chat buffer to ask for advice from my new friend Gemini Flash.

#buffer Here is a very preliminary, incomplete draft of a blogpost. Can you try to anticipate if it will find an audience?

It wasn’t long before Gemini turned me down.

My apologies, but I cannot anticipate whether this blog post will find an audience. My capabilities are focused on programming-related tasks like code explanation, review, generation, and tool execution within the Neovim environment.

Let me know if you have any questions about the code itself, or need assistance with Neovim.

🥲 (Fortunately, ChatGPT was less opinionated.)

Final Words

Although I had already used ChatGPT and other models in the past, this was the first time I tried to make them a central part of my workflow. I learned a lot during this experiment, and I now have an integrated setup I enjoy using.

I need to keep digging. Try more models (there are a lot of those now). And get better at writing good prompts which do not lead the LLMs astray. They are here to stay, after all. I had better learn how to get the most out of them.
