For people self-hosting LLMs: I maintain a few Docker images
https://github.com/noneabove1182/text-generation-webui-docker (updated to 1.3.1, with a GQA fix so it can run Llama 2 70B)
https://github.com/noneabove1182/lollms-webui-docker (v3.0.0)
https://github.com/noneabove1182/koboldcpp-docker (updated to 1.36)
All should include up-to-date instructions. If you find any problems, please ping me or open an issue so I can take a look :)
lollms-webui is the jankiest of the images, but it's newish to the scene and I'm working with the dev a bit to make it nicer (the main current problem is the requirement for CLI prompts, which he'll be removing). Koboldcpp and text-gen are in a good place though, and I'm happy with how those are running.
Thanks! I'll check these out when I get to my server. I host a small LLM that helps bots sound more human while doing trivial tasks on Twitch.
Awesome work! Going to try out koboldcpp right away. Currently running llama.cpp in Docker on my workstation because it would be such a mess to get the CUDA toolkit installed natively.
Out of curiosity, isn't conda a bit redundant in docker since it already is an isolated environment?
Yes, that's a good one for an FAQ because I get it a lot, and it's a very good question haha. The reason I use it is image size: the base NVIDIA devel image is needed for a lot of the compilation during Python package installation, and it's huge. So instead I install everything with conda there and transfer the environment to the nvidia-runtime image, which is.. also pretty big, but it saves several GB of space so it's a worthwhile hack :)
But yes, avoiding CUDA messes on my bare machine is definitely my biggest motivation.
Ah, nice.
Btw. perhaps you'd like to add:
build: .
to docker-compose.yml so you can just write "docker-compose build" instead of having to do it with a separate docker command. I would submit a PR for it but I have made a bunch of other changes to that file so it's probably faster if you do it.
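For reference, a minimal sketch of what that could look like. The service name, image tag and port mapping here are placeholders made up for illustration, not taken from the actual compose file in the repo:

```yaml
# docker-compose.yml (sketch; names and port below are hypothetical)
services:
  koboldcpp:                          # hypothetical service name
    build: .                          # build from the Dockerfile in this directory
    image: koboldcpp-docker:latest    # hypothetical tag applied to the built image
    ports:
      - "5001:5001"                   # placeholder port mapping, adjust to your setup
```

With something like that in place, "docker-compose build" should build the image from the Dockerfile next to the compose file, and "docker-compose up" starts it.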
I would love to have some GUI with optional vector database support that I could feed my docs into.
You want h2oGPT, or just use LangChain with a CLI.