• Grimy@lemmy.world · 7 hours ago · +26 / −26

    80% of the book market is owned by 5 publishing houses.

    They want to create a monopoly around AI and kill open source. The copyright industry is not our friend. This is a win, not a loss.

    • OmegaMouse@pawb.social · 7 hours ago · +25 / −1

      What? How is this a win? Three authors lost a lawsuit against an AI firm that used their works.

      • ShittyBeatlesFCPres@lemmy.world · 57 minutes ago (edited) · +1

        It would harm the A.I. industry if Anthropic loses the next part of the trial, on whether they pirated books. From what I’ve read, Anthropic and Meta are suspected of getting a lot of their material off torrent sites and the like.

        It’s possible they all did some piracy in their mad dash for training material, but Amazon and Google run bookstores, and Google even has a book-text search engine, Google Books, and probably everything else already in its data centers. So I’m not sure why they’d have to resort to piracy.

      • Grimy@lemmy.world · 4 hours ago · +3 / −2

        The lawsuit would not have benefited their fellow authors, only their publishing houses and the big AI companies.

        • Sentient Loom@sh.itjust.works · 1 hour ago · +1 / −1

          “used to train both commercial”

          Commercial training is, in this case, stealing people’s work for commercial gain.

          “and open source language models”

          So, uh, let us train open-source models on open-source text. There’s so much of it that there’s no need to steal.

          “?”

          I’m not sure why you added a question mark at the end of your statement.

    • hendrik@palaver.p3x.de · 7 hours ago · +13

      Keep in mind this isn’t about open-weight vs other AI models at all. This is about how training data can be collected and used.

      • Grimy@lemmy.world · 4 hours ago · +3

        Because of the vast amount of data needed, there will be no competitively viable open-source solution if half the data is kept in a walled garden.

        This is about open weights vs closed weights.

        • JcbAzPx@lemmy.world · 1 hour ago · +2

          They haven’t dewalled the garden yet. The copyright infringement part of the case will continue.

      • bob_omb_battlefield@sh.itjust.works · 7 hours ago · +24 / −4

        If you aren’t allowed to freely use data for training without a license, then the fear is that only large companies will own enough works or be able to afford licenses to train models.

        • Nomad Scry@lemmy.sdf.org · 7 hours ago · +13 / −3

          If they can just steal a creator’s work, how do they suppose creators will be able to afford to keep creating?

          Right. They think we already have enough original works that the machines can just generate any new creations from them.

          😠

          • Grimy@lemmy.world · 4 hours ago · +3

            Companies like the record studios, which already own all the copyrights, aren’t going to pay creators for something they already own.

            All the data has already been signed away. People are really optimistic about an industry that has consistently fucked over everyone it interacts with for money.

          • MudMan@fedia.io · 7 hours ago · +11 / −2

            It is entirely possible that the entire construct of copyright just isn’t fit to regulate this and the “right to train” or to avoid training needs to be formulated separately.

            The maximalist, knee-jerk assumption that all AI training is copying is feeding into the interests of, ironically, a bunch of AI companies. That doesn’t mean that actual authors and artists don’t have an interest in regulating this space.

            The big takeaway, in my book, is that copyright is finally broken beyond all usability. Let’s scrap it and start over with the media landscape we actually have, not the eighteenth-century version of it.

            • hendrik@palaver.p3x.de · 6 hours ago (edited) · +4

              I’m fairly certain this is the correct answer here. Also, there is a separation between the judiciary and the legislature. It’s the former that is involved here, but we really need to bother the latter. That’s the only way, unless we want to keep applying eighteenth-century tools to the current situation.

            • Grimy@lemmy.world · 4 hours ago (edited) · +4

              Yes, precisely.

              I don’t see a situation where the actual content creators get paid.

              We either get open-source AI, or we get closed AI where the big AI companies and the copyright companies make bank.

              I think people are having huge knee-jerk reactions and end up supporting companies like Disney, Universal Music, and Google.

        • hendrik@palaver.p3x.de · 5 hours ago (edited) · +3

          Yes. But then do something about it: regulate the market, or pass laws that address this. I don’t really see why we should allow something like this in the meantime; it still contributes to the problem, since free rein still advantages the big companies.

          (And we can write whatever we like into law. It doesn’t need to be a stupid, simplistic solution. If you’re concerned about big companies, just write that they have to pay a lot and small companies don’t. Or force everyone to open their models. Those are all options that can be formulated as new rules, and they would address the issue at hand.)

    • SonOfAntenora@lemmy.world · 7 hours ago (edited) · +9 / −2

      Cool, then try doing some torrenting out in the open and don’t hide it. Tell us how it goes.

      The rules don’t change. This just means the AI overlords can do it, not that you can do it too.

      • OfCourseNot@fedia.io · 5 hours ago · +4 / −1

        I’ve been pirating since Napster and have never hidden it. Downloading content, or even sharing it freely, is usually not a crime, except in America, it seems. What is a crime is making a business of distributing pirated content.

        • SonOfAntenora@lemmy.world · 5 hours ago · +2 / −1

          I know, but look at what they’re doing with AI. A small server used for piracy and sharing is punished, in some cases, worse than theft, while AI businesses are making bank (or are they? there’s still no clear path to profitability) on troves of pirated content. This is not going to change the situation for small guys like us. For instance, if we used the same dataset to train some AI in a garage, with no business or investors behind it, things would be different. We’re at a stage where AI is quite literally too important to fail for somebody out there. I’d argue that AI is, in fact, going to be shielded for that reason, regardless of previous legal outcomes.

          • hendrik@palaver.p3x.de · 5 hours ago · +2

            Agreed. And even if it weren’t, it always goes like this: Anthropic is a big company that likely has millions available for good lawyers, while the small guy doesn’t. So they’re more able to just do things and deal away some legal restrictions, or pay a fine that’s pocket change for them. Big companies always have more options than the small guy.