• 1 Post
  • 192 Comments
Joined 2 years ago
Cake day: June 16th, 2023

  • It’s definitely a trend. More and more top Chinese students are also opting to stay in China for university, rather than going to the US or Europe to study. That’s partly for a good reason, namely the improving quality of China’s universities and top companies. But I think it’s a troubling development for China overall. One of China’s strengths over the past few decades has been its people’s eagerness to engage with the outside world, and turning inward will not be beneficial for them in the long run.



  • Base models are general purpose language models, mainly useful for AI researchers and people who want to build on top of them.

    Instruct or chat models are chatbots. They are made by fine-tuning base models.

    The V3 models linked by OP are Deepseek’s non-reasoning models, similar to Claude or GPT-4o. These are the “normal” chatbots that reply with whatever comes to their mind. Deepseek also has a reasoning model, R1. Such models take time to “think” before supplying their final answer; they tend to perform better on things like math problems, at the cost of taking longer to answer.

    It should be mentioned that you probably won’t be able to run these models yourself unless you have a data center style rig with 4-5 GPUs. The Deepseek V3 and R1 models are chonky beasts. There are smaller “distilled” forms of R1 that are possible to run locally, though.
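To put rough numbers on the hardware point above, here is a back-of-the-envelope sketch of how much memory the weights alone would take. It assumes Deepseek's published figure of about 671 billion total parameters for V3/R1, and a hypothetical 80 GB of memory per GPU; it ignores the KV cache, activations, and any runtime overhead, so real requirements would be higher. The function names are made up for illustration.

```python
import math

# Assumption: ~671 billion total parameters, per Deepseek's published figure for V3/R1.
V3_PARAMS = 671e9


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def gpus_needed(num_params: float, bits_per_param: float,
                gpu_memory_gb: float = 80.0) -> int:
    """Minimum GPU count whose combined memory holds the weights.

    Assumption: 80 GB per GPU by default. KV cache and activations are ignored.
    """
    return math.ceil(weight_memory_gb(num_params, bits_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        size = weight_memory_gb(V3_PARAMS, bits)
        count = gpus_needed(V3_PARAMS, bits)
        print(f"{label}: {size:.1f} GB of weights, at least {count} GPUs")
```

Under these assumptions, a handful of GPUs is only enough with aggressive (roughly 4-bit) quantization; at higher precision the count goes up accordingly.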


  • Intriguingly, there’s reason to believe the R1 distills are nowhere close to their peak performance. In the R1 paper they say that the models are released as proofs of concept of the power of distillation, and the performance can probably be improved by doing an additional reinforcement learning step (like what was done to turn V3 into R1). But they said they basically couldn’t be bothered to do it and are leaving it for the community to try.

    2025 is going to be very interesting in this space.


  • No AI org of any significant size will ever disclose its full training set, and it’s foolish to expect such a standard to be met. There is just too much liability. No matter how clean your data collection procedure is, there’s no way to guarantee that a data set with billions of samples won’t contain at least one thing a lawyer could zero in on and drag you into a lawsuit over.

    What Deepseek did, which was full disclosure of methods in a scientific paper, release of weights under the MIT license, and release of some auxiliary code, is as much as one can expect.






  • It’s an interesting subject. If not for Beijing’s heavy hand, could Chinese internet companies have flourished much more and become international tech giants? Maybe, but there is one obvious counterpoint: where are the European tech giants? In an open playing field, it looks like American tech giants are pretty good at buying out or simply crushing any nascent competitors. If the Chinese did not have their censorship or great firewall, maybe the situation would have been like Europe, where the government tries to impose some rules, but doesn’t really have much traction, and everyone just ends up using Google, Amazon, Facebook, etc.






  • The Turing Test codified the very real fact that, until a few years ago, AI systems couldn’t hold a conversation (outside of special conversational tricks like Eliza and Cleverbot). Deep neural networks and the attention mechanism changed the situation; it’s not a completely solved problem, but the improvement is undeniably dramatic. It’s now possible to treat a chatbot as a rudimentary research assistant, for example.

    It’s just something we have to take in stride, like computers becoming capable of playing Chess or Go. There is no need to get hung up on the word “intelligence”.




  • “LLMs aren’t capable of maintaining an even remotely convincing simulacrum of human connection,”

    Eh, maybe, maybe not. 99% of the human-written stuff in IM chats, or posted to social media, is superficial fluff that a fine-tuned LLM should have no problem imitating. It’s still relatively easy to recognize AI models’ outputs in their default settings, because of their characteristic earnest/helpful tone and writing style, but that’s quite easily adjustable.

    One example worth considering: people are already using fine-tuned LLMs to copilot tabletop RPGs, with decent success. In that setting, you don’t need fine literature, just a “good enough” quality of prose. And that already far exceeds the average quality you see on social media.