• 0 Posts
  • 50 Comments
Joined 2 years ago
Cake day: June 15th, 2023



  • This was also one of my concerns with the hype surrounding low cost SLS printers like Micronics, especially if they weren’t super well designed. The powder is incredibly dangerous to inhale so I wouldn’t want a home hobbyist buying that type of machine without realizing how harmful it could be. My understanding is even commercial powder-based machines like HP’s MJF and Formlabs’ Fuse need substantial ventilation (HEPA filters, full room ventilation, etc.) to be operated safely.

    Metal is of course even worse. It has all the same respiratory hazards (the fine particles will likely cause all sorts of long-term lung damage) but it also presents a massive fire and explosion risk.

    I can’t see these technologies making it into the home hobbyist sphere anytime soon as a result, unfortunately.



  • I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular I found that it was consistently better at producing valid chain-of-thought reasoning (I’ve found that a lot of simpler models, including the distills, tend to produce shallow reasoning chains, even when they get the answer to a question right).

    I’m aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it’s nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.


  • It might be trivial to a tech-savvy audience, but considering how popular ChatGPT itself is and considering DeepSeek’s ranking on the Play and iOS App Stores, I’d honestly guess most people are using DeepSeek’s servers. Plus, you’d be surprised how many people naturally trust the service more after hearing that the company open sourced the models. Accordingly I don’t think it’s unreasonable for Proton to focus on the service rather than the local models here.

    I’d also note that people who want the highest quality responses aren’t using a local model, as anything you can run locally is a distilled version that is significantly smaller (at a small but non-trivial overall performance cost).
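
To put rough numbers on "significantly smaller," here is a back-of-the-envelope sketch of how much memory the weights alone would need. The parameter counts are approximate, publicly reported figures for the full model and a few of the distills, and `weights_gib` is a hypothetical helper, so treat the output as an order-of-magnitude estimate rather than a hardware requirement.

```python
def weights_gib(num_params, bits_per_param):
    """Memory needed for the weights alone, in GiB.

    Ignores the KV cache, activations, and runtime overhead, so real
    usage will be higher than this estimate.
    """
    return num_params * bits_per_param / 8 / 2**30


# Approximate parameter counts (assumption: rounded public figures).
MODELS = {
    "full model (~671B)": 671e9,
    "distill (~70B)": 70e9,
    "distill (~32B)": 32e9,
    "distill (~7B)": 7e9,
}

for name, params in MODELS.items():
    fp16 = weights_gib(params, 16)  # 16-bit weights
    q4 = weights_gib(params, 4)     # 4-bit quantized weights
    print(f"{name:>20}: {fp16:8.1f} GiB at 16-bit, {q4:7.1f} GiB at 4-bit")
```

Even at 4-bit, the full model's weights come out well above what the 70B distill needs at 16-bit, which is why the distills are what most people can actually run locally.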







  • Huh. Everything I’m reading seems to imply it’s more like a DSP ASIC than an FPGA (even down to the fact that it’s a VLIW processor) but maybe that’s wrong.

    I’m curious what kind of work you do that’s led you to this conclusion about FPGAs. I’m guessing you specifically use FPGAs for this task in your work? I’d love to hear about what kinds of ops you specifically find speedups in. I can imagine many exist, as otherwise there wouldn’t be a need for features like tensor cores and transformer acceleration on the latest NVIDIA GPUs (since obviously these features must exploit some inefficiency in GPGPU architectures, up to limits in memory bandwidth of course). But I also wonder how much benefit you can get, since in practice a lot of ops end up limited by memory bandwidth, and unless you have a gigantic FPGA I imagine this is going to be an issue there as well.

    I haven’t seriously touched FPGAs in a while, but I work in ML research (namely CV) and I don’t know anyone on the research side bothering with FPGAs. Even dedicated accelerators are still mostly niche products because in practice, the software suite needed to run them takes a lot more time to configure. For us on the academic side, you’re usually looking at experiments that take a day or a few to run at most. If you’re now spending an extra day or two writing RTL instead of just slapping together a few lines of Python that implicitly call CUDA kernels, you’re not really benefiting from the potential speedup of FPGAs. On the other hand, I know accelerators are handy for production environments (and in general they’re more popular for inference than training).

    I suspect it’s much easier to find someone who can write quality CUDA or PTX than someone who can write quality RTL, especially with CS being much more popular than ECE nowadays. At a minimum, the whole FPGA skillset seems much less common among my peers. Maybe it’ll be more crucial in the future (which will definitely be interesting!) but it’s not something I’ve seen yet.

    Looking forward to hearing your perspective!
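
For what I mean by ops being limited by memory bandwidth, here is a rough roofline-style sketch. The peak compute and bandwidth numbers are made-up placeholders, not the specs of any real GPU or FPGA, and the function names are hypothetical; the point is only how arithmetic intensity (FLOPs per byte moved) decides which limit you hit.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: the lower of the compute roof and the
    bandwidth roof (bytes/s times FLOPs per byte)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul, assuming each
    matrix is read or written exactly once (an idealized best case)."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def elementwise_intensity(bytes_per_elem=2):
    """FLOPs per byte for a unary elementwise op: one FLOP per element,
    one element read and one element written."""
    return 1 / (2 * bytes_per_elem)


# Hypothetical accelerator: placeholder numbers for illustration only.
PEAK_FLOPS = 100e12     # 100 TFLOP/s
MEM_BANDWIDTH = 1e12    # 1 TB/s

ops = {
    "large matmul (4096^3)": matmul_intensity(4096, 4096, 4096),
    "matrix-vector (4096x4096)": matmul_intensity(4096, 1, 4096),
    "elementwise activation": elementwise_intensity(),
}

for name, intensity in ops.items():
    bound = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    limit = "compute" if bound == PEAK_FLOPS else "bandwidth"
    print(f"{name:>26}: {intensity:8.2f} FLOP/B, "
          f"{bound / 1e12:6.2f} TFLOP/s ({limit}-bound)")
```

Under these assumptions only the large matmul reaches the compute roof; the matrix-vector product and the elementwise op are capped by bandwidth, so faster arithmetic units alone would not speed them up.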








  • But at least regulators can force NVIDIA to open their CUDA library and at least have some translation layers like ZLUDA.

    I don’t believe there’s anything stopping AMD from re-implementing the CUDA APIs; in fact, I’m pretty sure this is exactly what HIP is for, even though it’s not 100% automatic. AMD probably can’t link against the CUDA libraries like cuDNN and cuBLAS, but I don’t know that it would be useful to do that anyway since I’m fairly certain those libraries have GPU-specific optimizations. AMD makes their own replacements for them anyway.

    IMO, the biggest annoyance with ROCm is that the consumer GPU support is very poor. On CUDA you can use any reasonably modern NVIDIA GPU and it will “just work.” This means if you’re a student, you have a reasonable chance of experimenting with compute libraries or even GPU programming if you have an NVIDIA card, but less so if you have an AMD card.


  • I work in CV and I have to agree that AMD is kind of OK-ish at best there. The core DL libraries like torch will play nice with ROCm, but you don’t have to look far to find third party libraries explicitly designed around CUDA or NVIDIA hardware in general. Some examples are the super popular OpenMMLab/mmcv framework, tiny-cuda-nn and nerfstudio for NeRFs, and Gaussian splatting. You could probably get these to work on ROCm with HIP but it’s a lot more of a hassle than configuring them on CUDA.