Rose here. Also @umbraroze for non-kbin stuff.

  • 1 Post
  • 29 Comments
Joined 2 years ago
Cake day: June 14th, 2023

  • Reddit has a user data checkout feature (IIRC, check the user settings or maybe the Reddit help pages to find it).

    It’s a bit crap though.

    It takes a long time to process, especially if you happened to post in the era when the Reddit data infrastructure was horribly terrible instead of merely ordinarily terrible, and apparently this involves some manual work on the part of the staff in the worst cases.

    Some data may be missing or truncated. It doesn’t give you data from privated/banned subreddits (which was a fun thing to discover because last time I tried to do this the blackouts were on), and even for legit stuff, long comments/posts may be truncated. Even so, I’m pretty sure that the dumps just straight up didn’t have all of my posts from several years ago, even if those were on public subreddits. So you need to make sure the checked out data is sensible.

    In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren’t really magnificently usable on their own. One tool that I found personally invaluable is reddit-user-to-sqlite, which allows you to import Reddit data dumps and available live user data (I think it does this by scraping or something, I’m sure it worked despite the API being shut down) into an SQLite database, and Datasette is a nice frontend for browsing the posts.

    As for scrubbing, there are tools for that which are supposed to work. I think.
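
One rough way to check whether an export is sensible is to count rows per year and look for years with nothing in them. This is only a sketch: the `date` column name and the `YYYY-MM-DD ...` timestamp format are assumptions about the export's CSV files, and the sample rows are made up for illustration. Adjust to whatever your dump actually contains.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for an exported CSV file.
# Column names and timestamp format are assumptions, not a documented schema.
SAMPLE = """id,date,subreddit,body
a1,2014-05-01 10:00:00 UTC,linux,hello
a2,2014-06-11 09:30:00 UTC,linux,world
a3,2017-01-20 18:45:00 UTC,finland,moi
"""


def rows_per_year(csv_file, date_column="date"):
    """Count rows per year, taking the year from the first four characters."""
    reader = csv.DictReader(csv_file)
    return Counter(row[date_column][:4] for row in reader if row.get(date_column))


def missing_years(counts):
    """List years between the first and last seen year that have no rows."""
    years = sorted(int(year) for year in counts)
    if not years:
        return []
    return [y for y in range(years[0], years[-1] + 1) if str(y) not in counts]


counts = rows_per_year(io.StringIO(SAMPLE))
gaps = missing_years(counts)
```

For a real file, pass an open file object instead of the `io.StringIO` sample. A gap is not proof of missing data (maybe you just didn't post that year), but it tells you where to look.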


  • Yup. The robots.txt file is not only meant to block robots from accessing the site, it’s also meant to block bots from accessing resources that are not interesting for human readers, even indirectly.

    For example, MediaWiki installations are pretty clever in that by default, /w/ is blocked and /wiki/ is encouraged. Because nobody wants technical pages and wiki histories in search results, they only want the current versions of the pages.

    Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping the web pages for email addresses. Some people developed wpoison.cgi, a script whose sole purpose was to generate garbage web pages with bogus email addresses. Real search engines ignored these, thanks to robots.txt. Guess what the spam bots did?

    Do the AI bros really want to go there? Are they asking for model collapse?
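
The /w/ versus /wiki/ split described above can be tried out with Python's standard `urllib.robotparser`. This is a minimal sketch: the robots.txt text below is a simplified stand-in rather than any particular wiki's actual file, and `ExampleBot` is a hypothetical user agent name.

```python
from urllib.robotparser import RobotFileParser

# Simplified stand-in for a MediaWiki-style robots.txt (an assumption,
# not copied from a real site): script paths under /w/ are off limits,
# article paths under /wiki/ are left open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a well-behaved bot may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


article_ok = allowed("/wiki/Main_Page")
history_ok = allowed("/w/index.php?title=Main_Page&action=history")
```

Here `article_ok` is true and `history_ok` is false. Note that this only describes what a polite crawler does; nothing in robots.txt stops a bot that ignores it.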



  • umbraroze@kbin.social to Linux@lemmy.ml: Linux Boomers
    1 year ago

    So yeah, Xfce looks the same as it did 10 years ago.

    And?

    A desktop environment is meant to launch apps and give me windows and maybe have a file manager. Xfce does that. It’s a desktop environment.

    Hey, “modern” desktop environment enthusiasts, if you bring Compiz back from the dead, give us luddites a call, will you? Ohhhh you kids should have seen it back in the day. Windows and Mac users saw Compiz in action and were, like, “wat.” You don’t get them to react that way to modern Linux desktops, no. And all that is lost now. Thanks Wayland.



  • umbraroze@kbin.social to Linux@lemmy.ml: *Permanently Deleted*
    1 year ago

    Yeah, there’s an important distinction. Just because you could use Linux doesn’t mean you can at any particular moment.

    I don’t really do music production; I’m more into writing and visual arts and photography. I could do all of those things on Linux and be perfectly productive. But there’s a difference between being productive and being optimal. My current process happens to be based on software that runs on Windows. (Heck, a lot of the software I use already runs on both Windows and Linux, anyways.)

    The key here being that you shouldn’t lock yourself too much to just one tool and one approach, and that actually goes both ways.





  • I was a Slashdot user.

    People kept hyping Digg as a Slashdot replacement, but trying to submit posts was actually even more futile in practice than trying to submit articles to Slashdot editors. So much bigger hivemind too. Boring unfunny comment section.

    When I first joined Reddit, it seemed like it was mostly populated by Slashdot refugees. Just people posting awesome shit. Great riveting discussions, even before anyone actually read the articles. That sort of stuff.





  • Well, it seemed to be a way to support the site and get to see new features ahead of time, so yeah, why not? I only decided not to renew my gold access when it became very clear Spez wouldn’t ban the hate subs he loved.

    As for getting gold otherwise:

    I’m an introvert, ok? I mostly only comment if I have something worthwhile to say.

    So the only comments of mine that ever got gilded by others were drunken shitposts. And in one instance some random off-the-cuff post. …I don’t get it.

    Anyway. Basically, I didn’t want to post any Gold Baits™, because that way lies madness.


  • Been using a Suunto 5 Peak watch since May and it’s been absolutely great. Dunno if 250€ counts as inexpensive, but like we say in Finland, poor people can’t afford to buy cheap shit that breaks right away. (I think they have cheaper options?) Suunto watches talk to a phone app, which at least on Android is pretty great, and the app can talk to other services which can analyse stuff further.


  • I was a Reddit user for ages. Reddit search always sucked. Heck, Reddit could barely make their own data available to the users (which is why their user histories are so limited and why the GDPR takeouts take a week). Everyone, and I mean EVERYONE, used external search engines.

    Do they want to block external searches? Literally enshittify their shit further? Are they willing to hold back progress?

    Just today I was thinking of Reddit Gold. Back when I actually paid for it, the marketing spin was “you get to test new features before we add them to everyone else!” Literally none of the Gold features I’ve ever used made it to the unwashed masses. I take it back, saving comments did.

    So yeah, they will hold back progress. In fact, progress isn’t on the cards. It’s just regress. AND you can be a premium user and PAY for it.


  • Here in Finland we have a really extensive and efficient plastic bottle and aluminum can recycling system. Every bottle and can has a deposit (0.40 € for large bottles, 0.20 € for small bottles, 0.15 € for cans) and you can cash them in by returning them at any store. Just toss them in a machine.

    There are even some hypermarkets where you can just pour in a giant bag full of bottles or cans and the machine sorts and prices the things automatically.

    It’s super annoying we still can’t really do the same for the rest of the single-use plastic, but at least trash sorting and recycling what can be recycled is a thing everywhere. We have a lot of projects that aim to reduce single-use plastics. Probably the coolest recent thing was that someone came up with all-carton coffee cups. (I hope they catch on so we can get rid of the cups that have the Sad Turtle Warning. I don’t want turtles to be sad, they’re awesome.)


  • Google Podcasts to shut down in 2024

    Welp, another Google service that was too beautiful for this world.

    Time to move my subscriptions to another podcatcher then. [taking a quick look at various migration options] Hmmm. What to write on Google Podcasts’ gravestone? “Here lies Google Podcasts. It never supported OPML.”

    with listeners migrated to YouTube Music

    Damn. I migrated my Google Play Music purchases to YouTube Music and to this day I have no idea where they actually went. If I hadn’t downloaded the local MP3 copies with the terrible joke of a client software they had, I’d have been screwed. Went back to just buying music on iTunes.


  • Doesn’t even list the most fun part of the Reddit experience.

    You fancy yourself an established user, right? Been there for years. Decade, even. Surely by any reasonable standard you can be trusted to apply common sense and etiquette by now, right? Perhaps it is finally time for you to submit new posts to a subreddit?

    Lol, nope, wrong. Every link post to a “popular” domain (e.g. YouTube) gets silently eaten by the spam filter. No, nothing tells you this. If your post has no engagement whatsoever after a day, maybe this is what happened. You can sometimes get the post unstuck by messaging the moderators. Which is sometimes fruitful. Sometimes not. Because the individual moderators making these decisions might have differing opinions on prevailing submission standards, and this is about the most polite way I can put this. I’d have a more colourful explanation for this but it’s late and I’m kinda tired.

    But never worry! If most of these links get eaten by the spam filter, it might be just that you’re using the old interface. Try using the new interface instead! It eats only 80% of submissions silently. The rest might even get feedback on why they got automatically rejected! Like “you haven’t linked your account to Twitter, so you’re clearly a suspicious noob.” (…I’m confused, I thought only noobs linked their accounts to Twitter. I mean, what’s the point of that to begin with?)