• Pechente@feddit.org
    105 up · 19 hours ago

    Wikipedia going down like that makes me sad, especially since their traffic costs have gone up significantly because of AI crawlers.

    • clb92@feddit.dk
      English · 36 up, 1 down · 18 hours ago (edited)

      Why would anyone crawl Wikipedia when you can freely download the complete databases in one go, likely served on a CDN…

      But sure, crawlers, go ahead and spend a week doing the same thing in a much more expensive, disruptive and error-prone way…
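
      A minimal sketch of reading such a dump without unpacking it to disk, assuming a bz2-compressed MediaWiki XML export such as https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 has already been downloaded. `iter_pages` is a hypothetical helper name, and the tag names (`page`, `title`, `text`) are what the export format uses as far as I know:

```python
import bz2
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def iter_pages(compressed: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki XML export."""
    with bz2.open(compressed, "rb") as xml_stream:
        # iterparse streams the file, so the whole dump never sits in memory.
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            # Tags carry an XML namespace like "{http://...}page"; strip it.
            if elem.tag.rsplit("}", 1)[-1] != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = child.tag.rsplit("}", 1)[-1]
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    # With several revisions per page, the last one wins.
                    text = child.text or ""
            yield title, text
            # Drop the finished page's children to keep memory use low.
            elem.clear()
```

      Usage would look something like `with open("enwiki-latest-pages-articles.xml.bz2", "rb") as f: for title, text in iter_pages(f): ...`, where the file name is whatever you saved the dump as.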

      • Eager Eagle@lemmy.world
        English · 11 up, 1 down · 17 hours ago (edited)

        There are valid reasons for not wanting the whole database e.g. storage constraints, compatibility with ETL pipelines, and incorporating article updates.

        What bothers me is that they – apparently – crawl instead of just… using the API, like:

        https://en.wikipedia.org/w/api.php?action=parse&format=json&page=Lemmy_(social_network)&formatversion=2

        I’m guessing they just crawl the whole web and don’t bother to add a special case to turn Wikipedia URLs into their API versions.
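
        A rough sketch of what that special case could look like, assuming article URLs of the form `https://<lang>.wikipedia.org/wiki/<Title>`. `wikipedia_api_url` is a hypothetical helper name; the query parameters are the ones from the URL above:

```python
from typing import Optional
from urllib.parse import unquote, urlencode, urlsplit


def wikipedia_api_url(article_url: str) -> Optional[str]:
    """Map a Wikipedia article URL to its action=parse API URL, or return None."""
    parts = urlsplit(article_url)
    host = parts.hostname or ""
    # Only handle <lang>.wikipedia.org/wiki/<Title>; leave everything else alone.
    if not host.endswith(".wikipedia.org") or not parts.path.startswith("/wiki/"):
        return None
    title = unquote(parts.path[len("/wiki/"):])
    if not title:
        return None
    query = urlencode(
        {"action": "parse", "format": "json", "page": title, "formatversion": 2}
    )
    return f"https://{host}/w/api.php?{query}"


print(wikipedia_api_url("https://en.wikipedia.org/wiki/Lemmy_(social_network)"))
```

        The parentheses in the title come out percent-encoded (`%28`, `%29`), which the API should treat the same as the literal characters. A crawler could call this on every URL it dequeues and fall back to a normal fetch when it returns `None`.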

        • clb92@feddit.dk
          1 up · 2 hours ago

          > valid reasons for not wanting the whole database e.g. storage constraints

          If you’re training AI models, surely you have a couple TB to spare. It’s not like Wikipedia takes up petabytes or anything.

      • Pechente@feddit.org
        1 up · 18 hours ago

        My comment was based on a podcast I listened to (Tech Won’t Save Us, I think?). My guess is they also wanna crawl all the edits, discussions, etc., which are usually not included in the complete dumps.

    • ThePantser@sh.itjust.works
      English · 14 up · 19 hours ago

      Yes, they should really block crawlers or force them to pay. The only way I can think of is to require a registered account to access content, but that goes against what they originally intended. But these are new times and it’s probably for the best. It wouldn’t be hard to flag obvious AI scrapers.
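
      As a rough sketch of that kind of flagging, assuming the crawler identifies itself honestly in its User-Agent header. The token list is illustrative, not exhaustive, and `looks_like_ai_crawler` is a hypothetical helper name:

```python
# User-Agent substrings that some AI-related crawlers publish for themselves.
# Illustrative only; a real list would need regular updating.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "bytespider", "perplexitybot")


def looks_like_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)
```

      This only catches the obvious ones. A scraper that sends a browser-like User-Agent passes straight through, so it would have to be combined with something like rate limiting.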

      • skvlp@lemmy.wtf
        3 up · 15 hours ago

        It seems there are ways to stop crawlers. Do a web search for “stop ai crawlers” or similar to learn more. I hope it doesn’t escalate into an arms race, but I realise I might be disappointed.

    • Pechente@feddit.org
      23 up · 18 hours ago

      Ironically it’s getting more popular, but to me it seems it’s getting more popular in the way Facebook used to get more popular. At some point your weird uncle is on it and all the good content creators just leave.

      • lance20000@lemmy.ca
        4 up · 13 hours ago

        Reddit stopped being fun last November.

        I am not engaging with it as much as I used to, and now I am actively annoyed with it and trying to make Lemmy my default again.

      • Balder@lemmy.world
        English · 6 up · 16 hours ago

        In fact, I think at least in Brazil Reddit is becoming more and more popular. Go back like 5 years and Brazilian subs in Portuguese were small and low-traffic except for one or two (e.g. /r/brasil).

        Now there are a bunch of different themes and I see new topics being discussed quite often.

        I believe the reason is Facebook has enshittified so much that its communities are dying quickly and Brazilians are finding Reddit works better for simple discussions. Also no one posts anything personal on FB anymore, it’s all Instagram-style reposts, so it has lost its purpose.

      • foremanguy@lemmy.ml
        5 up · 18 hours ago

        I never really supported the porn industry, but I kinda miss the porn magazines and the websites made for it.

        It’s far more dangerous, for multiple reasons, to view porn on mainstream socials than on dedicated websites, for young people specifically… Sadly… One more thing that tends to get worse…