LiteLLM for local inference

Why LiteLLM? I need to run a model on a server and expose a local OpenAI-compatible API so I can use the model in Assistant. It can also route to other providers, such as OpenRouter.

P.S. It might be easier and better to just run the Docker version.

Activate the pyenv environment and install the LiteLLM proxy:

pyenv activate python3.11
pip install "litellm[proxy]"

Now we need to create the config file ~/LiteLLM/litellm.yaml. I will point it at a server on my local network.

litellm_settings:
  drop_params: true
model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      # Tell LiteLLM to forward to an OpenAI-compatible backend
      model: openai/generic
      api_base: http://server:8052
      api_key: "EMPTY"
      max_tokens: 65536
      max_input_tokens: 65536
      max_output_tokens: 32768
      input_cost_per_token: 0
      output_cost_per_token: 0
      supports_function_calling: true
      supports_parallel_function_calling: true
      supports_response_schema: true
      supports_reasoning: true
      supports_tool_choice: true
      supports_web_search: true
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder:free" # Qwen/Qwen3-Coder-480B-A35B-Instruct
      max_tokens: 262144
      max_input_tokens: 65536
      max_output_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
router_settings:
  server_url: http://0.0.0.0:11434
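
Before starting the proxy, it is worth sanity-checking that the backend from api_base really speaks the OpenAI protocol; this assumes it exposes the standard /v1/models route:

curl http://server:8052/v1/models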

Now run it:

litellm --config ~/LiteLLM/litellm.yaml --port 11434
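
Once it is up, you can hit the proxy with a standard OpenAI-style request; the model name matches the model_name from the config, and no Authorization header is needed since we have not configured a master key:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}]}'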

If the port is already in use, find and kill the process, then try again:

# find process
sudo netstat -tulnp | grep 11434

kill -9 YOUR_PID
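
A shorter alternative, if fuser is available on your system, kills whatever holds the port in one step:

sudo fuser -k 11434/tcp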

Let's run LiteLLM as a systemd user service. Create /etc/systemd/user/litellm.service:

[Unit]
Description=LiteLLM Proxy
After=network-online.target

StartLimitInterval=1800
StartLimitBurst=60

[Service]
Type=simple

Environment=PYENV_ROOT=%h/.pyenv
Environment=PATH=%h/.pyenv/bin:%h/.local/bin:/usr/local/bin:/usr/bin
Environment=OPENROUTER_API_KEY="sk-or-12345"

ExecStart=/bin/bash -lc 'eval "$($PYENV_ROOT/bin/pyenv init -)"; pyenv activate python3.11 && litellm --config %h/LiteLLM/litellm.yaml --port 11434'

Restart=on-failure
RestartSec=20s

[Install]
WantedBy=default.target

Reload systemd and enable the service:

systemctl --user daemon-reload
systemctl --user enable --now litellm
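
If this runs on a headless server, enable lingering so the user service starts at boot and keeps running without an interactive login:

loginctl enable-linger $USER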

Logs, restart, and status:

journalctl --user -u litellm -f
systemctl --user restart litellm
systemctl --user status litellm -n 50

Now you can use the LiteLLM proxy as an OpenAI-compatible API endpoint and connect your applications to it.
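
For example, the official OpenAI SDKs read these environment variables, so pointing them at the proxy is usually all a client needs (a dummy key is fine here because the proxy has no master key configured):

export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_API_KEY=dummy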

If you need more features, such as usage statistics, I recommend running the Docker version with a proper setup.