• 0 Posts
  • 363 Comments
Joined 1 year ago
Cake day: December 6th, 2024


  • Even the LLM part might be considered Plagiarism.

    Basically, unlike humans it cannot assemble an output based on logical principles (i.e. assembling a logical model of the flows in a piece of code and then translating it to code), it can only produce text based on an N-space of probabilities derived from the works of others it has “read” (i.e. that were fed to it during training).

    That text assembling could be the machine equivalent of Inspiration (such as how most programmers will include elements they’ve seen from others in their code) but it could also be Plagiarism.

    Ultimately it boils down to where the boundary between Inspiration and Plagiarism stands.

    As I see it, if for specific tasks there is overwhelming dominance of trained weights from a handful of works (which, one would expect, would probably be the case for a C-compiler coded in Rust), then that’s a lot more towards the Plagiarism side than the Inspiration side.

    Granted, it’s not the verbatim copying of an entire codebase that would legally be deemed Plagiarism, but if it’s almost entirely a montage made up of pieces from a handful of codebases, could it not be considered a variant of Plagiarism that is incredibly hard for humans to pull off but not so for an automated system?

    Note that obviously the LLM has no “intention to copy”, since it has no will or cognition at all. What I’m saying is that the people who made it intentionally built an automated system that copies elements of existing works. Normally it assembles its results from very small textual elements (same as a person who has learned how letters and words work can create a unique work from letters and words), but its makers are aware that in some situations it can produce output based on so few sources that, even though it’s assembling the output token by token, it’s pretty much just copying whole blocks from those sources, same as a human manually copying text from one document to another would.

    In summary, IMHO LLMs don’t always plagiarize, but can sometimes do it when the number of sources that ended up creating the volume of the N-dimensional probabilistic space the LLM is following for that output is very low.
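As a rough illustration of that last point, here is a toy sketch. It is a word-level bigram counter, not an LLM, and the corpus, the function names and the "niche" sentence are all made up for the example. It only shows the general idea: when a context was seen in a single source, following the most probable next token just walks back through that source.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every token, which tokens follow it across all source texts."""
    counts = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def generate(counts, start, max_tokens=20):
    """Greedy generation: always append the most frequent follower."""
    output = [start]
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:
            break  # no known continuation for this token
        output.append(followers.most_common(1)[0][0])
    return " ".join(output)

# Hypothetical training data: only one source covers the "compiler" niche.
niche_source = "lexer feeds parser which builds tree for codegen"
corpus = [
    "the cat sat on a mat",
    "a dog ran to the park",
    niche_source,
]

counts = train_bigram(corpus)
reproduced = generate(counts, "lexer")

print(reproduced == niche_source)  # the single niche source comes back verbatim
print(len(counts["the"]), len(counts["lexer"]))  # candidates per context
```

In this toy, a common word like "the" has more than one possible continuation because several sources contribute to it, whilst "lexer" has exactly one, so the output for that context is a copy of the one source that contained it.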


  • It’s even simpler than that: using an LLM to write a C compiler is the same as downloading an existing open source implementation of a C compiler from the Internet, but with extra steps, as the LLM was actually fed with that code and is just re-assembling it back together but with extra bugs - plagiarism hidden behind an automated text parrot interface.

    A human can beat the LLM at that by simply finding and downloading an implementation of that more than solved problem from the Internet, which at worst will take maybe 1h.

    The LLM can “solve” simple and well defined problems because it’s basically plagiarizing existing code that solves those problems.


  • The problem is that LLMs don’t generate “an answer” as a whole, they just generate tokens (generally word-sized, but not always) for the next text element given the context of all the text elements (the whole conversation) so far and the confidence level is per-token.

    Further, the confidence level is not about logical correctness, it’s about “how likely is this token to appear in this context”.

    So even if you try using token confidence you still end up stuck due to the underlying problem that the LLM’s architecture is that of a “realistic text generator” and hence that confidence level is all about “what text comes next” and not at all about the logical elements conveyed via text such as questions and answers.
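A minimal sketch of what a per-token confidence is, assuming a model that outputs one raw score (logit) per candidate token and turns those scores into probabilities with a softmax. The prompt, the candidate tokens and the scores below are invented for illustration and do not come from any real model.

```python
import math

def softmax(logits):
    """Convert raw per-token scores into probabilities that sum to 1."""
    highest = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - highest) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the token following "The capital of Australia is".
# They stand for "how often does this text follow this context", nothing more.
logits = {"Sydney": 3.0, "Canberra": 2.0, "Melbourne": 1.0}

probs = softmax(logits)
best = max(probs, key=probs.get)

for token, p in probs.items():
    print(f"{token}: {p:.2f}")
print("most likely next token:", best)
```

With these made-up scores the highest "confidence" goes to the token that most often follows that text, which here is not the factually correct one. The number measures likelihood of the text, not truth of the answer.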


  • If you want a low power, cheap x86 mini-PC to run a Linux box for low demand uses (personal TV Box, PC for a family member that only ever does light web browsing and e-mail) they do have some nice processors.

    I mean, you can also use an ARM SBC for some of those things, but it’s handy to have an x86 processor because of easier availability of binaries, plus even the low power ones are actually more powerful than the ARM stuff.

    That’s about the only thing, really.


  • The only place in the EU with surveillance anywhere near as bad as the US was Britain and they aren’t in the EU anymore.

    And this is just State surveillance.

    When it comes to Private Sector surveillance, nowhere in the EU are things anywhere close to as bad as in the US, since EU countries have far tighter Privacy regulations and even outside the EU-wide regulations most countries have had pretty strict Medical and Banking data regulations for quite a while.

    That Propaganda in the US is a mix of straight bullshit about government surveillance in Europe - which in reality is not much of a thing outside dictatorships or Britain - and the insidious take, anchored on the Hard-Neoliberal Fable that Public Is Bad, Private Is Good, of not even considering private sector surveillance and its impact, when that’s a far worse problem in the US than in Europe.



  • Let me put things this way:

    • Hands up anybody who believes that, if they can, Health Insurance companies won’t mine the shit out of your purchase data and Car Insurance companies won’t mine the shit out of your driving data to try and fine tune your risk group in their models and find out any change in your conditions that impacts their bottom line (and dump you if they can when you switch to a high risk group)

    Even if one’s relaxed about data mining of private data for the purpose of serving you custom adverts, there are plenty of other use cases which can actually cost you money, not to mention the risk when the Authorities start running crime-predictive models sold to them by slimy Tech Investors with high enough rates of false positives that you run the risk of being tagged a “Terrorist” for some stupid shit like buying more bleach than the average person.

    Even if you think you’re above board on everything and about as boring and uninteresting a person as possible, there are plenty of ways in which others knowing everything about you might come around and bite you in the ass in very concrete ways.


  • I’m pretty sure plenty if not most people here pay for most of their shopping with a card rather than cash, even though that shit at minimum goes into a database for ever and ever, probably shared with the authorities and in some countries just outright sold for pennies to anybody willing to pay for it.

    And don’t get me started with just how many Techies jumped into Tesla’s “surveillance nightmare on wheels” - I mean, Techies were very much a large block of early adopters of Tesla cars and this was already well after the Snowden Revelations.

    Further, how many people are in the habit of accessing the Internet behind a VPN?

    (Personally, living in Britain - maybe the worst offender - at the time, the Snowden Revelations were what prompted me to start using a VPN regularly)

    Whilst lots of people here have an actual “let’s keep my digital footprint small” mindset and praxis, I get the impression that most do not, and even those who bitch and moan about “surveillance” trade high digital visibility for convenience (or, even worse, for the Techie desire for the “shiny new thing”, thus getting shit like Alexa).

    So yeah, maybe not “Yall”, but probably “Most of you”.


  • What “global backlash”?

    If there had been such a thing, European citizens and companies would not have spent the next decade putting their data in America’s hands and would not now be scrambling to decouple as America goes from Hard Neoliberal At Home Fascist Abroad to Full-on Fascists Everywhere.

    For people paying attention it was painfully obvious back then that one could not trust one’s data in the hands of American companies or in fact any companies from a 4-eyes (meanwhile expanded to 7-eyes) country, and yet the rush to put personal and corporate data in American cloud systems was insane (not helped by the EU approving the US as a “safe haven” for data, something so outrageous after the Snowden Revelations that I bet a lot of people involved were either customers of Epstein’s “services” or corrupt as fuck).

    In fact, that massive surveillance cooperative operation expanding from 4 countries to 7 is also a pretty good indication that there wasn’t really a “global backlash”, otherwise countries like New Zealand would be wary of joining it as it would get them cut out of international data networks and agreements.

    Only countries like China seem to have taken the whole thing seriously and set up their own local stack for sharing and storing consumer and corporate data, and that seems to have been driven at least partly by wanting to do exactly the same as the 4-eyes countries were doing.


  • The US can change their laws to not have a global wiretap and secret backdoor warrant program, then this would be possible.

    Even if they did, they can change them right back whenever they want and the thing with data is that, once it’s out there somewhere, there’s no way of knowing for sure it hasn’t been copied and archived.

    Not just from recent events but from the Snowden Revelations and the decades of 4-eyes operations even before that, we’re well beyond the point of it being possible to trust US-based and US-registered companies with the data of Europeans, and ditto for those of any other of what are now the 7-eyes countries.



  • I just want to inject here my experience in Britain during the 2008 Crash and its aftermath:

    In Britain, the Finance Industry was 17% of GDP, so when the Crash happened the country was disproportionately hit.

    After the crash the authorities chose to protect Asset Owners above all:

    • Interest rates were lowered to 0%, thus protecting lenders (i.e. those with the money to lend or ownership of Banks which in the modern system can de facto create money: if you don’t believe me, read the paper “Money Creation In The Modern Economy” from the Bank Of England) from debt defaults, also indirectly protecting Asset Owners by avoiding asset firesales from collateral confiscated after a default, thus avoiding the associated asset price falls, most notably for Land and Housing (in the UK the Housing bubble never really stopped being inflated and Land Ownership is the core of Old Wealth)
    • Banks were unconditionally saved by the state taking a share in them. That Public share was then put under management of a group made up of bankers “so that the government doesn’t interfere in the market”. De facto, pressure to change the very practices that had caused the Crash was removed and most of the people to blame for the failures of the Crash kept their positions of privilege.
    • All this was paid for by most people through Austerity. Public services were cut, Social Security (aka “Benefits”) was reduced, salaries stagnated. The poorer one was, the worse they got hit.

    By 2015 the incomes of the wealthiest 10% of the population were growing in real terms 23% per year whilst the bottom 90% were seeing their incomes fall 1% per year in real terms.

    This was roughly how things went for about a decade after the Crash. UK inequality is nowadays huge, social mobility near non-existent, average incomes when measured in a currency other than the pound - which went down following Brexit - have stagnated, and overall economic growth is anemic and concentrated in the highest wealth layers, since the “growth” reported by official GDP numbers is mostly asset prices going up.

    This is the process by which the billionaires make sure they win: everybody gets hit more or less in a Crash, but during the subsequent period when the state is supposedly trying to fix it, you get all sorts of “extreme measures required by extreme times” that, “curiously”, help the billionaires the most, so some years later everybody but the wealthiest slices of society is worse off whilst the wealthiest are much richer even than before the Crash.

    I expect the plan of the billionaires who are cozying up with Trump is exactly to end up richer via this process.




  • Having lived in a couple of countries in Europe, from The Netherlands which has a Proportional Vote system and thus a multitude of small parties to Britain with a First Past The Post system like the US and thus pretty much a Two Party System, I’ve concluded that at least in Politics stability is just like standing water - it invariably turns into a swamp.

    We need some amount of constant change to bring up and flush out the rot that inevitably accumulates in the murky waters of a system where power is always in the hands of a subset of people who are all in the same social circles, went to the same schools and whose sons and daughters marry each other.

    Not “Daily Revolution”, just regular change so that any funny business going on outside the public eye risks being brought to light and destroyed, and the guilty people punished, because power has changed hands to people who aren’t mates of the crooks that did it.




  • Unless things have changed recently LLMs don’t really use slow data stores with very high capacity such as HDDs, at least not beyond the training stage.

    The prices that have been pushed up by AI are for GPUs and DRAM (price rises which in turn possibly feed onwards to other kinds of chips made in the same kind of fab), whilst this stuff is magnetic data storage on movable disk plates, a very different tech.

    I expect these things at most will only be affected in price very indirectly (for example, if memory prices go up because of all the datacenters targeting AI applications, there might be fewer datacenters set up for other kinds of server side application which are more data-centric, which would impact demand for ultra high-capacity HDDs).

    Not that it makes much of a difference to us run-of-the-mill techies as consumers - even if HDDs get cheaper, with many times more expensive GPUs and RAM we can hardly put together new systems using these things, so at best it might just get a bit cheaper to expand one’s large storage NAS (the slower kind just storing data that doesn’t get accessed often, as the other kind uses SSDs).