What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting the num_ctx and num_predict parameters using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
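For example, a minimal Modelfile might look something like this (the base model and the parameter values are just placeholders; adjust them to your setup):

```
# Modelfile — values below are only examples
FROM llama3.1:8b

# context window size in tokens
PARAMETER num_ctx 8192

# maximum number of tokens to generate per response
PARAMETER num_predict 2048
```

Then build and run it with something like `ollama create mymodel -f Modelfile` followed by `ollama run mymodel`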
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization
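If you want to compare like for like, you can pull an explicitly tagged quant from the ollama library and inspect it; the tag below is just an example, check the library page for what is actually published:

```
# pull a specific quantization instead of the default 4-bit tag
ollama pull llama3.1:8b-instruct-q8_0

# show model details, including the quantization level
ollama show llama3.1:8b-instruct-q8_0
```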
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input to the next layer in the sequence), so more GPUs do not offer any kind of performance benefit
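A rough sketch of why, assuming two CUDA devices and made-up layer sizes: generation walks the layers in order, so each pipeline-stage GPU sits idle while the other works:

```python
import torch

# Toy "model" split into two halves across two GPUs (pipeline style).
layers_gpu0 = [torch.nn.Linear(512, 512).to("cuda:0") for _ in range(4)]
layers_gpu1 = [torch.nn.Linear(512, 512).to("cuda:1") for _ in range(4)]

def forward_one_token(x):
    # The first half runs on GPU 0 while GPU 1 is idle.
    x = x.to("cuda:0")
    for layer in layers_gpu0:
        x = layer(x)
    # The last GPU-0 output is the input to the first GPU-1 layer,
    # so GPU 1 cannot start until GPU 0 has finished.
    x = x.to("cuda:1")
    for layer in layers_gpu1:
        x = layer(x)
    return x

out = forward_one_token(torch.randn(1, 512))
# Per-token latency is the sum of both halves; adding GPUs this way
# mostly adds transfer overhead for a single stream of requests.
```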
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict # ? Are you using a model that actually supports the context length that you desire…?
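In an interactive ollama session that looks something like this (8192 and 2048 are just example values):

```
ollama run llama3.1
>>> /set parameter num_ctx 8192
>>> /set parameter num_predict 2048
```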
It’s not. I can run the 2.51bit quant
Tell that to my home rig currently running the 671b model…
Hawley’s statement called DeepSeek “a data-harvesting, low-cost AI model that sparked international concern and sent American technology stocks plummeting.”
data-harvesting
???
It runs offline… using open-source software that provably does not collect or transmit any data…
It is low-cost and out-competes American technology, though, true
this is deepseek-v3. deepseek-r1 is the model that got all the media hype: https://huggingface.co/deepseek-ai/DeepSeek-R1
That’s great! Hopefully it shows up on F-Droid sometime soon
there’s still a whole software-side bubble to contend with
They’re ultimately linked together in some ways (not all). OpenAI has already been losing money on every GPT subscription, even while charging a premium for having the best product; now that premium must evaporate, because there are equivalent AI products on the market that are much cheaper. This will shake things up on the software side too. They probably need more hype to stay afloat
Yes, but old and “cheap” ones that were not part of the sanctions.
China really has nothing to do with it, it could have been anyone. It’s a reaction to realizing that GPT4-equivalent AI models are dramatically cheaper to train than previously thought.
It being China is a notable detail because it really drives the nail into the coffin for NVIDIA, since China has been fenced off from access to NVIDIA’s most expensive AI GPUs, which were thought to be required to pull this off.
It also makes the US government look extremely foolish for having made major foreign policy and relationship sacrifices in order to try to delay China by a few years. It’s January and China has already caught up, so those sacrifices did not pay off; in fact they backfired, benefiting China and allowing it to accelerate while hurting US tech/AI companies
Great analogy
It literally defeats NVIDIA’s entire business model of “I shit golden eggs and I’m the only one that does and I can charge any price I want for them because you need my golden eggs”
Turns out no one actually even needs a golden egg anyway.
And… same goes for OpenAI, who were already losing money on every subscription. Now they’ve lost the ability to charge a premium for their service (anyone can train a GPT4 equivalent model cheaply, or use DeepSeek’s existing open models) and subscription prices will need to come down, so they’ll be losing money even faster
I’m a big limoncello fan, I wonder if it will turn out similar
Love to see it
Wait really? It’s been so long since I’ve been on reddit. If this is true, then it is truly a dying husk
The only reason the 3070 Ti isn’t more than enough for several more years too is because of NVIDIA’s choice to include very little VRAM on these cards
You can overwrite the model by reusing the same name instead of creating one with a new name, if it bothers you. Either way there is no duplication of the LLM model file
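For example, re-running create with the existing name just replaces that model entry (the model name and Modelfile path here are placeholders):

```
# Re-using the name overwrites the existing "mymodel" definition.
# The underlying weight blobs are content-addressed and shared,
# so nothing is duplicated on disk.
ollama create mymodel -f Modelfile
```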