• SheeEttin@lemmy.world
    1 year ago

    Sure, but fair use is rather narrowly defined. You must weigh four factors: the purpose of the use, the nature of the work, the amount used, and the effect on the market for the original. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It meets none of the criteria for fair use.

    • Dr Cog@mander.xyz
      1 year ago

      The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs do not store the works in their entirety, nor are they capable of reproducing them.

      • SheeEttin@lemmy.world
        1 year ago

        It doesn’t have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You’re not reproducing the original material in any way, but you’re still heavily depending on it.