- Published on
Using Ollama to host a code companion
- Author
- Illia Vasylevskyi
I'm a long-time user of JetBrains products and find their autocompletion excellent, which is why switching to things like Copilot or AI Assistant was quite hard for me.
Anyway, I've decided to try limited usage in my editor, hosting an LLM on my local machine.
To host an LLM locally you can use Ollama, which after installation runs on your machine as a service and provides a local API.
For the assistant I've chosen the CodeQwen1.5 model; to pull it you can run
ollama run codeqwen
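Once the service is up you can sanity-check the local API directly. A minimal sketch, assuming Ollama's default endpoint on port 11434 and the codeqwen model pulled above:

```bash
# Send a one-off prompt to the locally hosted model through Ollama's HTTP API.
# Assumes the default address http://localhost:11434.
curl http://localhost:11434/api/generate -d '{
  "model": "codeqwen",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'
```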
Now it's time to connect it to our IDE, and there aren't many options for this. Of the plugins I tried, two stood out.
Both can connect to local LLMs and offer pretty similar functionality. I liked CodeGPT more, since it's more polished in my opinion.
By default, Ollama sets the context window to 2k tokens, which can be too small for long code or conversations, while CodeQwen1.5 actually supports a 32k context.
To change this we will need to modify the Modelfile.
1. Retrieve the model configuration:
ollama show codeqwen:latest --modelfile > codeqwen.txt
2. Modify the config file: add the parameter
PARAMETER num_ctx 32768
then find the FROM line and replace it with FROM codeqwen:latest (a sketch of the resulting file is shown after this list).
3. Create a new model with the updated config:
ollama create codeqwen-32K -f codeqwen.txt
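For reference, after the edits the interesting part of codeqwen.txt should look roughly like this. This is only a sketch: the real dump produced by ollama show also contains TEMPLATE and other PARAMETER lines, which you should leave untouched.

```
# codeqwen.txt (sketch of the edited parts only)
# Point the new model at the already-pulled base model instead of the blob path:
FROM codeqwen:latest
# Raise the context window from the 2k default to the 32k that CodeQwen1.5 supports:
PARAMETER num_ctx 32768
```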
And that's it: you can now use your new model.
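To try it out from the terminal, run the new model by its name; you can also ask ollama show to print the parameters and confirm that num_ctx was picked up (flag support may vary slightly between Ollama versions):

```bash
# Chat with the new 32k-context model interactively
ollama run codeqwen-32K

# Optionally verify the parameter override (assumes `ollama show --parameters` is available)
ollama show codeqwen-32K --parameters
```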
As for performance, it's quite good on a laptop with an RTX 4060, and the model actually performs really well, so for small tasks it's great and faster than switching to a chat in the browser.