Local AI for Coding
I’ve recently been playing with Artificial Intelligence and large language models, and there’s a trajectory that I’ve not seen widely discussed: local AIs.
There are a lot of folk and companies who don’t want to use a cloud-based service. They want to run things locally, and they want their data to stay behind a firewall to give them some sense of privacy when it comes to their work. At the moment there’s a push around cloud-based AIs that doesn’t align well with this privacy-focused thinking: do you really want to use an IDE plugin that can push your source code, documents, etc. off to somewhere you don’t know, outside of your control?
The Hardware
In the last week I’ve built the basics of a “local” AI. My setup is running on my own machine, but there’s no reason why it couldn’t run on one machine and serve several people on a LAN.
The hardware I’ve used is:
- A Dell T5810 desktop with 22-core CPU and 256GB of System Ram (~£610 from Bargain Hardware)
- An Nvidia Quadro RTX 4000 GPU (~£200 from eBay)
- An NVMe SSD and PCIe adapter card (~£100 from Amazon)
Total Price: ~£910
The models you can run locally, at speed, are limited by the RTX 4000 GPU’s 8GB of RAM, but the T5810 has two 16-lane PCIe 3.0 slots, so you can either get a more powerful GPU in the first place, or add a second card at a later date if you want to (my search history seems to be filling up with eBay searches for Nvidia P40s lately ;))
The Software
Running a model locally is pretty simple. I’m using Debian Linux 12, but the software I’m using supports multiple platforms.
You need to set up the Nvidia CUDA Toolkit, which involves following a simple set of instructions from Nvidia’s Developer Site, and then Nvidia’s cuDNN library.
Next comes the model runner. I use ollama, and installation involves running a single command.
Once ollama is installed you can get it to download the model you want to use (e.g. ollama pull deepseek-coder-v2), and then tell it to run that model (ollama run deepseek-coder-v2). To make it easily accessible to other applications you can use ollama serve, which will make an API available on a local port on your machine.
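To confirm the API is reachable, here’s a minimal sketch, assuming ollama’s default port of 11434 and that you’ve already pulled deepseek-coder-v2, which asks the local model for a suggestion using nothing but Python’s standard library:

```python
# Minimal sketch: ask the locally running ollama instance for a completion.
# Assumes `ollama serve` is listening on its default port (11434) and that
# the deepseek-coder-v2 model has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def suggest(prompt: str, model: str = "deepseek-coder-v2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object rather than a stream of chunks
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    print(suggest("Write a Java method that reverses a string."))
```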
You can check ollama is using your GPU by running nvidia-smi, which lists the processes that have reserved GPU memory.
The next bit is the bit I’m still working on: the IDE interface. I’m currently planning to take an existing third-party deepseek-coder plugin for JetBrains IDEs and adapt it, but I’ve not started modifying it yet, so we’ll see if I can make progress on that, or if there’s already another solution available for talking to a local ollama instance.
Since writing the last paragraph I’ve found continue. This looks like a great way to integrate a locally running model into a development workflow.
How this all fits together
The two main steps for creating a useful AI are training and fine-tuning. If you’re not familiar with these terms, I’d recommend spending 45 minutes watching this video from the Royal Institution, which gives a good introduction to how AIs work.
The training phase is where you teach an AI something. It can be how to code, how to understand one or more written languages, or anything else. It’s like going to school, university, or any other educational place; you learn the generic skill to a level where it’s useful.
The fine-tuning stage is where you take the generic skill and learn how to apply it in a given situation. This is like when you start a new job; you may know how to write code in Java, but you’ll take time to adjust to the style, libraries, and approaches that are common at the company you’re starting at.
For a Local AI the principle is that you can take a trained AI, which has had some fine-tuning to your needs (e.g. language choices), and that, eventually, becomes the system your IDE queries for code suggestions, or the thing which creates changes from prompts (e.g. you write a Pull Request description, and it writes the code). The code is created within your bounds of control (your machine, or your LAN), there are no API keys or endpoints which you’re unsure of, and you can easily verify that all of your data is staying within your network by monitoring the network traffic at your LAN’s perimeter.
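As an illustration of that prompt-to-change loop, here’s a rough sketch; the prompt wording, helper name, and example description are mine rather than from any particular tool. It sends a Pull Request description to the local ollama chat API and gets a proposed change back, without anything leaving your network:

```python
# Hypothetical sketch: turn a Pull Request description into a proposed change
# using the local ollama chat API. Nothing here leaves your own network.
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def draft_change(pr_description: str, model: str = "deepseek-coder-v2") -> str:
    payload = json.dumps({
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You write small, focused code changes as unified diffs."},
            {"role": "user", "content": pr_description},
        ],
        "stream": False,
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]

if __name__ == "__main__":
    print(draft_change("Add a null check to UserService.findByEmail and a unit test."))
```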
Introducing AI into the team’s workflow
Introducing your new, local AI will take some time. The first phase is, like any new team member, just looking at existing code. This is what most folk will do when they join a new team; they learn what the local style is by example.
You can feed the Local AI your code base to start fine-tuning it, then feed it each new Pull Request or commit as it’s merged into the main branch (you don’t want to feed it all the work on a PR, because that’s not in a state to go into your main branch). This helps fine-tune the model to align with the local standards and preferences.
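One way that feed could work is a small script that turns each merged commit into a training example. The branch name, output file, and prompt/completion format below are assumptions for the sake of a sketch; real fine-tuning pipelines will have their own formats:

```python
# Hypothetical sketch: collect commits from the main branch into a JSONL file
# of training examples (commit message -> diff) for later fine-tuning.
# The branch name, file name, and example format are assumptions.
import json
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def build_dataset(branch: str = "main", output: str = "finetune.jsonl", limit: int = 500) -> None:
    commits = git("rev-list", f"--max-count={limit}", branch).split()
    with open(output, "w") as handle:
        for sha in commits:
            message = git("log", "-1", "--format=%B", sha).strip()
            diff = git("show", "--format=", sha)  # just the diff, no commit header
            handle.write(json.dumps({"prompt": message, "completion": diff}) + "\n")

if __name__ == "__main__":
    build_dataset()
```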
This process can be largely automated. You can fine-tune the system by feeding it parts of commits and seeing how its predictions match the code being committed. If it’s consistently way off, you may need to take a more manual approach and find more training data for it, but, eventually, it should become reasonably good at filling in the bits of commits you remove.
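A crude version of that check might look like the sketch below, which asks the local model to reproduce the latest commit’s change from its commit message and then scores how close it got. The prompt and the similarity measure are assumptions, included just to show the shape of the loop:

```python
# Hypothetical sketch: ask the local model to recreate the latest commit from
# its commit message, then score how close the prediction is to what was
# actually committed. The prompt and scoring method are assumptions.
import difflib
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout

def predict(prompt: str, model: str = "deepseek-coder-v2") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

def score_latest_commit() -> float:
    message = git("log", "-1", "--format=%B").strip()
    actual_diff = git("show", "--format=", "HEAD")  # the diff that was really committed
    predicted_diff = predict(f"Write a unified diff implementing this change:\n{message}")
    # Rough similarity between what the model produced and what was committed.
    return difflib.SequenceMatcher(None, predicted_diff, actual_diff).ratio()

if __name__ == "__main__":
    print(f"similarity: {score_latest_commit():.2f}")
```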
Once it has reached a reasonably acceptable level, it becomes a reviewer. It can suggest changes and, as part of its comments, folk have the ability to like or dislike the suggestions. At this point most of the suggestions shouldn’t be considered time-wasting, and this next step helps further fine-tune the model so it’s aligned with the current approaches to problems, and not just historical approaches which may have been superseded by better ways of doing things.
After it’s considered a good reviewer, it can start becoming a coder. Like any good coder, the generated PRs should be small, tightly focused, and easy to review. Thanks to the earlier two steps the AI should now be at a point where reviewers feel that they’re generally not reviewing bad submissions, and any generally accepted improvements to the PRs can be fed back in to further refine the model. If the reviewers feel that the code is junk, then it’s time to step the AI back to reviewer mode.
Beyond this it’s just a matter of folk gaining confidence, and seeing how much can be offloaded to the local AI. If a human can write a pull request description and then have code ready to review, that takes a lot of the time-consuming work away from the humans involved and moves them onto more valuable tasks.
All of this can happen on a local AI, and most of the time it can happen on a machine which costs less than one month’s salary for a developer in many companies.
Is this the end of developers as we know it?
No.
Seriously No.
AIs are good at generating output based on previously seen input. They’re not good at taking product requirements and generating flawless code. Any developer will know that human language is so complex that it’s easy for assumptions, misunderstandings, and poorly written specifications to trip up the most experienced of coders, and an AI is no different.
What they will do is take away a lot of the boilerplate code and repetitive implementation tasks that folk currently do. Developers will need to adapt their skills to phrase AI prompts in a way that gets the most out of them, in the same way that folk learnt how to query search engines to get the best results, and that will be a valuable skill.
Another valuable new job will be analysing training data. Is the AI being fine-tuned on good code? Does the code used for fine-tuning later turn out to have a lot of bugs? Is it coming from a small number of folk, or automated processes, which skew its view of “correct”?
Overall I think teams can benefit from a local AI, which keeps their code “local”, but I still think there are going to be plenty of opportunities for humans to make a valuable and useful contribution for some time to come.