• conciselyverbose@kbin.social
    1 year ago

    If it’s actually High Bandwidth Memory (HBM), it’s the type of VRAM used on some video cards and SoCs.

    It might be mostly the same components, but the high-bandwidth part is important and harder to do. They get the much higher throughput by physically stacking the memory dies on top of each other and placing the stack right next to the processor, in the same package. The much shorter distance the signals have to travel, combined with a very wide interface (a huge number of connections to send signals through), delivers far more bandwidth than you can get with traditional RAM.
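
    The “lots of pins” point can be made concrete: peak bandwidth is roughly interface width times per-pin data rate. A minimal sketch, assuming illustrative figures (a 1024-bit HBM2 stack at 2 Gb/s per pin, a 32-bit GDDR6 chip at 16 Gb/s per pin, a 64-bit DDR4-3200 channel); these are example numbers, not specs for any particular product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak bandwidth in GB/s: (pins * Gbit/s per pin) / 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8


# Illustrative configurations (assumed typical values, not a specific product).
configs = {
    "HBM2 stack (1024-bit @ 2 Gb/s)": (1024, 2.0),
    "GDDR6 chip (32-bit @ 16 Gb/s)": (32, 16.0),
    "DDR4-3200 channel (64-bit @ 3.2 Gb/s)": (64, 3.2),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.1f} GB/s")
```

    The HBM stack runs each pin slower than GDDR6 but wins on sheer width, which is only practical because the stack sits so close to the processor.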

    • GiveMemes@jlai.lu
      1 year ago

      There’s a company making analog chips that do the matrix calculations 15 or 60 times (I forget which) more efficiently than modern chips (by multiplying voltages, I believe). Even though one has only about a third of the processing power of a modern GPU, stack enough of them together and you’re cooking. The matrix multiplication is what we’re using the VRAM for, right?
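
      The “multiplying voltages” idea is usually described as analog in-memory computing: weights are stored as conductances, inputs are applied as voltages, Ohm’s law (I = G·V) does each multiply, and the currents adding up on a shared wire do the sums. A minimal numerical sketch, assuming an idealized crossbar with optional Gaussian read noise; the function name, noise level, and sizes are hypothetical and say nothing about any particular company’s chip.

```python
import random


def analog_matvec(conductances, voltages, noise_std=0.0, rng=None):
    """Simulate an idealized analog crossbar computing I = G @ V.

    conductances: rows x cols list of lists (the stored weights, G)
    voltages: list of length cols (the input vector, V)
    Each output current is the sum of G[i][j] * V[j] along row i
    (Ohm's law per cell, currents adding on the shared output line),
    plus optional Gaussian noise standing in for analog imprecision.
    """
    if rng is None:
        rng = random.Random(0)
    currents = []
    for row in conductances:
        ideal = sum(g * v for g, v in zip(row, voltages))
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        currents.append(ideal + noise)
    return currents


G = [[1.0, 2.0], [3.0, 4.0]]
V = [0.5, 0.25]
print(analog_matvec(G, V))                  # [1.0, 2.5]
print(analog_matvec(G, V, noise_std=0.01))  # close to [1.0, 2.5], slightly off
```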

      • conciselyverbose@kbin.social
        1 year ago

        The VRAM holds the actual models, i.e. the weights telling them what to multiply, to my knowledge.

        VRAM isn’t the low-level “working” memory. You still have to pull data out of memory and into the units that actually use it. If you’re working with pen and paper, a bookshelf might be system storage and your desk might be RAM/VRAM, but you still need to copy the numbers from your desk onto the sheet of paper you’re working on. That sheet is the lower-level cache and registers feeding the tensor cores, etc.
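
        The desk-and-paper analogy is essentially what tiled (blocked) matrix multiplication does: copy a small block of each matrix out of the big, slower memory into a small working buffer, do all the arithmetic you can on it, then move on to the next block. A minimal pure-Python sketch, assuming matrices stored as lists of lists and a hypothetical `tile` size parameter; real GPU kernels do this with shared memory and registers, which Python can only mimic.

```python
def tiled_matmul(a, b, tile=2):
    """Compute a @ b (a is n x k, b is k x m) by working on tile x tile blocks.

    The slices copied into a_blk and b_blk play the role of the
    "sheet of paper": a small working copy pulled from the big "desk"
    (the full matrices) before any arithmetic happens.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Copy one block of each input into the small working buffer.
                a_blk = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_blk = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Accumulate this block's partial products into the output.
                for ii, a_row in enumerate(a_blk):
                    for jj in range(len(b_blk[0])):
                        c[i0 + ii][j0 + jj] += sum(
                            a_row[kk] * b_blk[kk][jj] for kk in range(len(a_row))
                        )
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

        The tile size trades buffer size against how often you go back to the big memory; the result is the same as an ordinary matrix multiply.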

        If the chip you’re discussing is a better calculator, that’s useful, but you still need the big desk to hold the huge amount of information you need to reference at any given time.

        My brain is mush for some reason today, so that might not make sense, but better matrix operations shouldn’t remove the need to have access to a huge model.
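
        To put numbers on “huge”: the weights alone take roughly (parameter count) × (bytes per parameter). A back-of-the-envelope sketch, assuming a hypothetical 7-billion-parameter model; it ignores activations, caches, and framework overhead, so real memory use is higher.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"7B params @ {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```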

        • GiveMemes@jlai.lu
          1 year ago

          Thanks for the informative reply! Looks like I need to brush up on my hardware knowledge lol

    • lol3droflxp@kbin.social
      1 year ago

      I get that this is expensive. However, I guess it should also work with regular RAM if you accept slower speeds. The question, of course, is whether it’s still usable then.
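
      Whether it’s “still usable” can be roughly estimated: when a dense model generates text one token at a time (batch size 1), essentially every weight has to be read from memory once per token, so memory bandwidth caps speed at about bandwidth ÷ model size tokens per second. A rough sketch, assuming a hypothetical 14 GB model and round illustrative bandwidth figures; real throughput is lower, and batching changes the picture.

```python
def max_tokens_per_s(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 14.0  # hypothetical: ~7B params at fp16
# Illustrative bandwidths (assumed round numbers, not measurements).
bandwidths = {"dual-channel DDR4": 50.0, "GPU with GDDR6": 500.0, "GPU with HBM": 2000.0}

for name, bw in bandwidths.items():
    print(f"{name}: <= {max_tokens_per_s(MODEL_GB, bw):.1f} tokens/s")
```

      Under these assumptions system RAM lands at a few tokens per second, which is slow but not necessarily unusable.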

      • averyminya@beehaw.org
        1 year ago

        Most current software for local hosting has some option to offload to RAM/CPU and disk. VRAM is fastest, but offloading to RAM and the CPU lets you get by with less than 4 GB of VRAM for certain applications, at perfectly reasonable speeds.
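
        Offloading commonly works layer by layer: as many layers as fit go on the GPU and the rest run from system RAM on the CPU (llama.cpp’s `--n-gpu-layers` option, for example, sets how many layers go to the GPU). A minimal sketch of that split, assuming a hypothetical `split_layers` helper, equal-sized layers, and a fixed VRAM reserve for other buffers; real tools account for more overhead.

```python
def split_layers(num_layers: int, layer_gb: float, vram_gb: float, reserve_gb: float = 0.5):
    """Return (gpu_layers, cpu_layers) for equal-sized layers.

    reserve_gb is VRAM held back for activations and other buffers
    (an assumed fixed amount for this sketch).
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    gpu_layers = min(num_layers, int(usable // layer_gb))
    return gpu_layers, num_layers - gpu_layers


# Hypothetical model: 32 layers of 0.4 GB each (~12.8 GB total), on a 4 GB card.
gpu, cpu = split_layers(32, 0.4, 4.0)
print(f"{gpu} layers on GPU, {cpu} on CPU")
```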

      • abhibeckert@beehaw.org
        1 year ago

        GPT-4 is already kind of slow. It works best as a “conversational” tool where you ask follow-up questions and clarify things that have already been said, which is painful when you have to wait 10 seconds for a response. I couldn’t imagine it being useful if responses took minutes.