@KingRandomGuy

KingRandomGuy@lemmy.world · 1 day ago

No, the server is on the GitHub account linked above as well. The repo is here.

Signal, however, doesn’t federate and generally doesn’t support third-party clients.

KingRandomGuy@lemmy.world · edit-2 3 days ago

I used to do something like this before Signal became a thing. We used to use OTR via the Pidgin OTR plugin to send encrypted messages over Google Hangouts. Funnily enough, I’m pretty sure Pidgin supports Discord, so you could use the exact same setup to achieve what you described.

It was pretty funny to check the official Hangouts web client and see nonsensical text being sent.

KingRandomGuy@lemmy.world · edit-2 5 days ago

This was also one of my concerns with the hype surrounding low-cost SLS printers like Micronics, especially if they weren’t super well designed. The powder is incredibly dangerous to inhale, so I wouldn’t want a home hobbyist buying that type of machine without realizing how harmful it could be. My understanding is that even commercial powder-bed machines like HP’s MJF and Formlabs’ Fuse need substantial ventilation (HEPA filters, full room ventilation, etc.) in order to be operated safely.

Metal is of course even worse. It has all the same respiratory hazards (the fine particles will likely cause all sorts of long-term lung damage) but it also presents a massive fire and explosion risk.

I can’t see these technologies making it into the home hobbyist sphere anytime soon as a result, unfortunately.

KingRandomGuy@lemmy.world · 9 days ago

TBH the paper is a bit light on the details, at least compared to the standards of top ML conferences. A lot of DeepSeek’s innovations on the engineering front aren’t super well documented (at least well enough that I could confidently reproduce them) in their papers.

KingRandomGuy@lemmy.world · 9 days ago

I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular I found that it was consistently better at producing valid chain-of-thought reasoning (I’ve found that a lot of simpler models, including the distills, tend to produce shallow reasoning chains, even when they get the answer to a question right).

I’m aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it’s nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.

KingRandomGuy@lemmy.world · 10 days ago

It might be trivial to a tech-savvy audience, but considering how popular ChatGPT itself is and considering DeepSeek’s ranking on the Play and iOS App Stores, I’d honestly guess most people are using DeepSeek’s servers. Plus, you’d be surprised how many people naturally trust the service more after hearing that the company open sourced the models. Accordingly I don’t think it’s unreasonable for Proton to focus on the service rather than the local models here.

I’d also note that people who want the highest quality responses aren’t using a local model, as anything you can run locally is a distilled version that is significantly smaller (at a small but non-trivial overall performance cost).

KingRandomGuy@lemmy.world · 10 days ago

TBF you almost certainly can’t run R1 itself. The model is way too big and compute intensive for a typical system. You can only run the distilled versions which are definitely a bit worse in performance.
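For a rough sense of scale, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. The parameter counts are not from my comment above; they are the publicly listed sizes (671B total for R1, plus a few of the distills), so treat them as assumptions. It also ignores KV cache and activations, which only make things worse.

```python
# Back-of-the-envelope weight memory; ignores KV cache and activations.
# Parameter counts are the publicly listed sizes, used here as assumptions.
MODELS = {
    "R1 (full)": 671e9,
    "distill 70B": 70e9,
    "distill 32B": 32e9,
    "distill 7B": 7e9,
}


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the weights, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        print(
            f"{name:12s} 8-bit: {weight_gb(params, 8):7.1f} GB   "
            f"4-bit: {weight_gb(params, 4):7.1f} GB"
        )
```

Even at 4 bits per weight the full model lands in the hundreds of GB, while the smaller distills fit on a single consumer GPU.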

Lots of people (if not most people) are using the service hosted by DeepSeek themselves, as evidenced by DeepSeek’s ranking on both the iOS App Store and the Google Play Store.

KingRandomGuy@lemmy.world · 12 days ago

Part of this was an optimization that was necessary due to their resource restrictions. Chinese firms can only purchase H800 GPUs instead of H200 or H100. These have much slower inter-GPU communication (less than half the bandwidth!) as a result of export bans by the US government, so this optimization was done to try and alleviate some of that bottleneck. It’s unclear to me if this type of optimization would make as big of a difference for a lab using H100s/H200s; my guess is that it probably matters less.

KingRandomGuy@lemmy.world · 12 days ago

I think what Jensen is getting at is that CUDA is merely a set of APIs. Other hardware manufacturers could re-implement the CUDA APIs if they really wanted to (especially since, AFAIK, Google v. Oracle held that re-implementing an API can be fair use). In fact, AMD’s HIP implements many of the same APIs as CUDA, and they ship a tool (HIPIFY) to convert CUDA code to HIP.

Of course, this does not guarantee that code originally written for CUDA is going to perform well on other accelerators, since it likely was implemented with NVIDIA’s compute model in mind.
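To make the point about HIP mirroring the CUDA API concrete, here is a toy sketch. The name pairs in the table are real CUDA runtime calls and their HIP counterparts, but the substitution itself is only an illustration I made up; it is not how HIPIFY is implemented, and `toy_hipify` and `CUDA_SNIPPET` are hypothetical names.

```python
import re

# A few real CUDA runtime names and their HIP counterparts.
# Toy text substitution for illustration only; NOT how HIPIFY actually works.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, with word boundaries, so cudaMemcpyHostToDevice
# is never partially rewritten as hipMemcpy + "HostToDevice".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


# Hypothetical host-side fragment, just to have something to rename.
CUDA_SNIPPET = """\
#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

if __name__ == "__main__":
    print(toy_hipify(CUDA_SNIPPET))
```

The fact that a near one-to-one rename gets you most of the way on the host side is the whole point; the performance caveat above is about what happens inside the kernels, which no rename can fix.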

KingRandomGuy@lemmy.world · edit-2 12 days ago

What I’m curious to see is how well these types of modifications scale with compute. DeepSeek is restricted to H800s instead of H100s or H200s. These are gimped cards designed to comply with export controls, and accordingly they have lower memory bandwidth (~2 vs ~3 TB/s) and, most notably, much slower GPU-to-GPU communication (something like 400 GB/s vs 900 GB/s). The specific reason they used PTX in this application was to help alleviate some of the bottlenecks due to the limited inter-GPU bandwidth, so I wonder if that would still improve performance on H100 and H200 GPUs, where bandwidth is much higher.
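To put rough numbers on the inter-GPU gap, here is a minimal sketch using the approximate 400 GB/s and 900 GB/s figures from above. The 1 GB payload is a made-up value, and this is an idealized calculation (no latency, no overlap with compute), so it only shows the size of the gap, not real training behavior.

```python
# Approximate interconnect figures quoted above (GB/s), not measured values.
LINK_GBPS = {"H800": 400, "H100": 900}


def transfer_ms(payload_gb: float, link_gbps: float) -> float:
    """Ideal time to move payload_gb over a link, in milliseconds."""
    return payload_gb / link_gbps * 1000


if __name__ == "__main__":
    payload_gb = 1.0  # hypothetical amount exchanged between GPUs per step
    for gpu, bandwidth in LINK_GBPS.items():
        print(f"{gpu}: {transfer_ms(payload_gb, bandwidth):.2f} ms per {payload_gb} GB")
    print(f"H100 link is {LINK_GBPS['H100'] / LINK_GBPS['H800']:.2f}x faster")
```

If communication is a smaller slice of each step on the faster link, hiding it behind compute buys proportionally less, which is why I would guess the trick matters less there.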

KingRandomGuy@lemmy.world · 12 days ago

IIRC ZLUDA does support compiling PTX. My understanding is that this is part of why Intel and AMD eventually didn’t want to support it: it’s not a great idea to tie yourself to someone else’s architecture that you have no control over or license to.

OTOH, CUDA itself is just a set of APIs and their implementations on NVIDIA GPUs. Other companies can re-implement them. AMD has already done this with HIP.

KingRandomGuy@lemmy.world · 14 days ago

Huh. Everything I’m reading seems to imply it’s more like a DSP ASIC than an FPGA (even down to the fact that it’s a VLIW processor) but maybe that’s wrong.

I’m curious what kind of work you do that’s led you to this conclusion about FPGAs. I’m guessing you specifically use FPGAs for this task in your work? I’d love to hear about what kinds of ops you specifically find speedups in. I can imagine many exist, as otherwise there wouldn’t be a need for features like tensor cores and transformer acceleration on the latest NVIDIA GPUs (since obviously these features must exploit some inefficiency in GPGPU architectures, up to limits in memory bandwidth of course), but also I wonder how much benefit you can get since in practice a lot of features end up limited by memory bandwidth, and unless you have a gigantic FPGA I imagine this is going to be an issue there as well.

I haven’t seriously touched FPGAs in a while, but I work in ML research (namely CV) and I don’t know anyone on the research side bothering with FPGAs. Even dedicated accelerators are still mostly niche products because in practice, the software suite needed to run them takes a lot more time to configure. For us on the academic side, you’re usually looking at experiments that take a day or a few to run at most. If you’re now spending an extra day or two writing RTL instead of just slapping together a few lines of Python that implicitly call CUDA kernels, you’re not really benefiting from the potential speedup of FPGAs. On the other hand, I know accelerators are handy for production environments (and in general they’re more popular for inference than training).

I suspect it’s much easier to find someone who can write quality CUDA or PTX than someone who can write quality RTL, especially with CS being much more popular than ECE nowadays. At a minimum, the whole FPGA skillset seems much less common among my peers. Maybe it’ll be more crucial in the future (which will definitely be interesting!) but it’s not something I’ve seen yet.

Looking forward to hearing your perspective!

KingRandomGuy@lemmy.world · 14 days ago

Is XDNA actually an FPGA? My understanding was that it’s an ASIC implementation of the Xilinx NPU IP. You can’t arbitrarily modify it.

KingRandomGuy@lemmy.world · 22 days ago

Good point! For my use case (on a different brand, Sony) I’m fine with the lowered resolution since I just use it for video conferencing, in which case the raw resolution is limited anyway. But for users who need higher resolution, an HDMI capture card might be better, since it’s a one-time fee rather than a subscription.

KingRandomGuy@lemmy.world · 23 days ago

He’s apparently said he was born in 1988. In another thread others mentioned that would make him 21 when he started his PhD, which checks out.

KingRandomGuy@lemmy.world · 24 days ago

Is the mechanical shutter necessary for max bit depth on your camera? It isn’t on mine (Sony), but bit depth drops to 12-bit if you max out the framerate. You might still be able to get full 14-bit RAWs if you drop the framerate.

KingRandomGuy@lemmy.world · 24 days ago

You can do this on Linux using gphoto2, ffmpeg, and v4l2loopback. You probably won’t get full resolution but the quality will still be good enough for video conferencing. See here for a guide.
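The usual recipe looks roughly like the sketch below: load the loopback module, then pipe the camera’s live view from gphoto2 into ffmpeg, which writes to the virtual webcam. The `/dev/video0` path is an assumption (the loopback device gets a different number if you already have a webcam), and the function names are my own.

```python
import subprocess


def webcam_commands(device: str = "/dev/video0"):
    """Return (modprobe_cmd, gphoto2_cmd, ffmpeg_cmd) as argument lists.

    `device` is assumed to be the v4l2loopback device; check which
    /dev/video* node appears after loading the module.
    """
    modprobe = ["sudo", "modprobe", "v4l2loopback", "exclusive_caps=1"]
    gphoto2 = ["gphoto2", "--stdout", "--capture-movie"]
    ffmpeg = [
        "ffmpeg", "-i", "-",          # read the camera stream from stdin
        "-vcodec", "rawvideo",
        "-pix_fmt", "yuv420p",
        "-f", "v4l2", device,         # write frames to the virtual webcam
    ]
    return modprobe, gphoto2, ffmpeg


def run(device: str = "/dev/video0") -> None:
    """Start the pipeline; blocks until ffmpeg exits."""
    modprobe, gphoto2, ffmpeg = webcam_commands(device)
    subprocess.run(modprobe, check=True)
    camera = subprocess.Popen(gphoto2, stdout=subprocess.PIPE)
    try:
        subprocess.run(ffmpeg, stdin=camera.stdout, check=True)
    finally:
        camera.terminate()


# Call run() on a machine with the camera attached over USB.
```

Once it is running, the camera shows up as a regular webcam in whatever conferencing app you use.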

KingRandomGuy@lemmy.world · 24 days ago

Not that unusual IMO, lots of people start their PhD directly after completing their Bachelor’s. If they weren’t born in the first half of the year then they’ll have completed their BS by 21 and start the PhD either at 21 or 22.

KingRandomGuy@lemmy.world · edit-2 2 months ago

But at least regulators can force NVIDIA to open their CUDA library and at least have some translation layers like ZLUDA.

I don’t believe there’s anything stopping AMD from re-implementing the CUDA APIs; in fact, I’m pretty sure this is exactly what HIP is for, even though it’s not 100% automatic. AMD probably can’t link against the CUDA libraries like cuDNN and cuBLAS, but I don’t know that it would be useful to do that anyway since I’m fairly certain those libraries have GPU-specific optimizations. AMD makes their own replacements for them anyway.

IMO, the biggest annoyance with ROCm is that the consumer GPU support is very poor. On CUDA you can use any reasonably modern NVIDIA GPU and it will “just work.” This means if you’re a student, you have a reasonable chance of experimenting with compute libraries or even GPU programming if you have an NVIDIA card, but less so if you have an AMD card.

KingRandomGuy@lemmy.world · 2 months ago

I work in CV and I have to agree that AMD is kind of OK-ish at best there. The core DL libraries like torch will play nice with ROCm, but you don’t have to look far to find third party libraries explicitly designed around CUDA or NVIDIA hardware in general. Some examples are the super popular OpenMMLab/mmcv framework, tiny-cuda-nn and nerfstudio for NeRFs, and Gaussian splatting. You could probably get these to work on ROCm with HIP but it’s a lot more of a hassle than configuring them on CUDA.
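As a small illustration of torch playing nice with ROCm: ROCm builds of PyTorch reuse the `torch.cuda` namespace, so most device-agnostic code runs unchanged. The sketch below just reports which backend a given install was built for; `describe_torch_backend` is a name I made up, and it falls back gracefully if PyTorch isn’t installed.

```python
def describe_torch_backend() -> str:
    """Report whether the installed PyTorch targets ROCm/HIP, CUDA, or CPU only."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    if hip is not None:
        backend = f"ROCm/HIP {hip}"
    elif cuda is not None:
        backend = f"CUDA {cuda}"
    else:
        backend = "CPU-only build"

    # On ROCm builds this still goes through torch.cuda.
    return f"{backend}, GPU available: {torch.cuda.is_available()}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

The hassle I’m describing starts one layer up, with third-party libraries that ship their own hand-written CUDA kernels rather than going through torch.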