LiteLLM for local inference
Author: Illia Vasylevskyi
Why LiteLLM? I need to run a model on a server and expose a local OpenAI-compatible API so the model can be used in Assistant. It also lets you route to other providers, OpenRouter for example.
P.S. It might be easier and better to just run the Docker version.
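For reference, the Docker route looks roughly like the sketch below; the image tag, host port, and mount path are assumptions, so check the LiteLLM docs for the current ones.
# Rough sketch of the Docker alternative (image tag, port and mount path are assumptions)
docker run -d \
  -p 4000:4000 \
  -v ~/LiteLLM/litellm.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml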
Activate the venv and install the LiteLLM proxy
pyenv activate python3.11
pip install "litellm[proxy]"
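A quick sanity check that the CLI landed on your PATH (still inside the venv):
# Prints the proxy CLI usage if the install worked
litellm --help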
Now create the config file ~/LiteLLM/litellm.yaml. I will point it at a server on my local network.
litellm_settings:
  drop_params: true

model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      # Tell LiteLLM to forward to an OpenAI-compatible backend
      model: openai/generic
      api_base: http://server:8052
      api_key: "EMPTY"
      max_tokens: 65536
      max_input_tokens: 65536
      max_output_tokens: 32768
      input_cost_per_token: 0
      output_cost_per_token: 0
      supports_function_calling: true
      supports_parallel_function_calling: true
      supports_response_schema: true
      supports_reasoning: true
      supports_tool_choice: true
      supports_web_search: true
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder:free"  # Qwen/Qwen3-Coder-480B-A35B-Instruct
      max_tokens: 262144
      max_input_tokens: 65536
      max_output_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8

router_settings:
  server_url: http://0.0.0.0:11434
Now run it
litellm --config ~/LiteLLM/litellm.yaml --port 11434
If the port is already in use, find and kill the process holding it, then try again.
# find process
sudo netstat -tulnp | grep 11434
kill -9 YOUR_PID
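Once it is running, a quick smoke test against the OpenAI-compatible endpoint (the model name has to match one from model_list):
# Smoke test the proxy's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}]}'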
Let's add LiteLLM as a systemd user service at /etc/systemd/user/litellm.service:
[Unit]
Description=LiteLLM Proxy
After=network-online.target
StartLimitInterval=1800
StartLimitBurst=60
[Service]
Type=simple
Environment=PYENV_ROOT=%h/.pyenv
Environment=PATH=%h/.pyenv/bin:%h/.local/bin:/usr/local/bin:/usr/bin
# Placeholder: replace with your real OpenRouter API key
Environment=OPENROUTER_API_KEY="sk-or-12345"
ExecStart=/bin/bash -lc 'eval "$($PYENV_ROOT/bin/pyenv init -)"; pyenv activate python3.11 && litellm --config %h/LiteLLM/litellm.yaml --port 11434'
Restart=on-failure
RestartSec=20s
[Install]
WantedBy=default.target
Reload systemd and enable the service
systemctl --user daemon-reload
systemctl --user enable --now litellm
Logs and restart
journalctl --user -u litellm -f
systemctl --user restart litellm
systemctl --user status litellm -n 50
Now you can use LiteLLM as a local OpenAI-compatible API and connect it to your applications.
If you need more features, such as usage statistics, I recommend running the Docker version with a proper setup.
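Most OpenAI-compatible clients only need a base URL and an API key pointed at the proxy; the variable names below are the common OpenAI SDK ones and may differ for your client.
# Point an OpenAI-compatible client at the proxy (variable names depend on the client)
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="EMPTY"
# List the models the proxy exposes
curl http://localhost:11434/v1/models -H "Authorization: Bearer EMPTY"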