• 4 Posts
  • 701 Comments
Joined 2 years ago
Cake day: June 20th, 2023

  • solrize@lemmy.world to Selfhosted@lemmy.world · Selfhosting wikipedia
    9 days ago

    I haven’t looked in a few years, but 20 TB is probably plenty. I agree that Wikipedia lost its way once it got all that attention online and all that search traffic. Everyone should have their own copy of Wikipedia. I used to download the daily incremental data dumps but got tired of it. I still have a few TB of them around that I’ve been wanting to merge.


  • The text is in not-exactly-convenient database dumps (see other commenter’s link) and there are daily diffs (mostly bot noise), but then there are the images and other media, which are way up in the terabytes by now. There are some docs, maybe out of date, about how to run the software yourself. It’s written in PHP and it’s big and complicated.
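
For anyone who wants to poke at the text dumps without standing up the PHP software, here is a rough Python sketch that streams pages out of a bz2-compressed MediaWiki XML export. It assumes the usual layout of `<page>` elements containing a `<title>` and a `<revision>` with a `<text>` child; the function names are made up for illustration, and the file name in the usage note is only an example, so adjust for whatever dump you actually have.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in a MediaWiki XML export stream.

    Assumption: each page carries a <title> and at least one <text>.
    If a page holds several revisions, the last <text> seen wins.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Drop the finished page's contents so the dump is never
        # held in memory all at once.
        elem.clear()


def iter_dump(path):
    """Open a .xml.bz2 dump file and stream its pages."""
    with bz2.open(path, "rb") as f:
        yield from iter_pages(f)
```

Usage would look something like `for title, text in iter_dump("enwiki-latest-pages-articles.xml.bz2"): ...`. The text you get back is raw wikitext, not rendered HTML, and none of the images or other media are in these files.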