Will Manidis is the CEO of AI-driven healthcare startup ScienceIO

  • Thrife@feddit.org · 4 days ago

    Is Reddit still feeding Google's LLM, or was it just a one-time thing? In other words, will the newest LLM-generated posts be used to train LLMs that generate more posts?

    • shittydwarf@lemmy.dbzer0.com · 4 days ago

      The truly valuable data is the stuff that was created before LLMs; anything after that is tainted by slop. Any verifiable human data would be worth more, which is why they are simultaneously trying to erode any and all privacy.

      • gandalf_der_12te@discuss.tchncs.de · 4 days ago

        I’m not sure about that. It implies that only humans are able to produce high-quality output. But that seems wrong to me.

        • First of all, not everything that humans produce is high quality; quite the opposite.
        • Second, as AI develops, I think it is quite possible that AI will generate good-quality output in the future.
        • Danquebec@sh.itjust.works · 2 days ago

          They can produce high-quality answers now, but only because they were trained on text written by humans.

          Training on material produced by LLMs will just reproduce the same stuff, or actually something worse, because it will include hallucinations.

          For an AI to discover new things and truly innovate, or to learn about existing products, the world, etc., it would need to do something entirely different from what LLMs are doing.

    • whotookkarl@lemmy.world · 4 days ago

      These days LLMs feed LLMs, so you end up modeling models unless you exclude all public data from the last decade. You have to assume that any public, user-generated data is tainted when it is used for training.
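The "LLMs feed LLMs" worry in this thread is often discussed under the name "model collapse". Here is a toy sketch of the effect, under a big simplifying assumption: the "model" is just a Gaussian rather than an LLM. Each generation is fitted only to samples drawn from the previous generation. The function name `simulate_recursive_training` and its parameters are hypothetical, chosen for illustration only.

```python
import random
import statistics


def simulate_recursive_training(generations=200, samples_per_gen=10, seed=0):
    """Toy model of recursive training (hypothetical helper, not a real API):
    each 'model' is a Gaussian fitted only to samples drawn from the
    previous generation's Gaussian."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 stands in for the original human-made data
    history = [(mu, sigma)]
    for _ in range(generations):
        # "Publish" synthetic data sampled from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        # "Train" the next model on that synthetic data alone
        # (maximum-likelihood fit: sample mean and population std).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    history = simulate_recursive_training()
    for gen in (0, 10, 50, 100, 200):
        mu, sigma = history[gen]
        print(f"generation {gen:3d}: mean={mu:+.3f}  std={sigma:.2e}")
```

Across generations the spread tends to shrink toward zero. Each refit undersamples rare tail values, and with `n` samples the maximum-likelihood variance estimate shrinks by a factor of `(n - 1) / n` in expectation. This only illustrates the loss of diversity. It does not model hallucinations or anything else specific to LLMs.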