The irony that this story was posted by a bot…
I’ve reported pictures/gifs of accidental nudity that were posted on Reddit without any evidence of consent, and they blew me off. Not just ignored me - they took the time to say the content was fine.
Yeah, it was legal to post stuff like that - no reasonable expectation of privacy in public places and all that. But it isn’t ethical. Don’t do it. It isn’t funny.
AI content isn’t watermarked - if it were, detection would be trivial. What he’s talking about is that certain words have a certain probability of appearing after certain other words in a given context. While there is some randomness in the output, certain words or phrases are unlikely to appear because the data the model was trained on didn’t use them.
All I’m saying is that the more a writer’s style and word choice resemble the model’s training data, the more likely their original content is to be flagged as AI generated.
Here’s the thing though - the probabilities for word choice come from the data the model was trained on. While someone who uses a substantially different writing style / word choice than the LLM could easily be identified as not being from the LLM, someone with a similar writing style might be indistinguishable from it.
Or, to oversimplify: given that Reddit was a large portion of the input data for ChatGPT, all you need to do is write like a Redditor to sound like ChatGPT.
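To make that concrete, here’s a minimal sketch of how probability-based detection works. Everything here is an assumption for illustration: GPT-2 (via the Hugging Face transformers library) stands in for whatever model a real detector actually scores against, and the threshold is made up.

```python
# Minimal sketch of perplexity-based "AI detection".
# Assumptions: GPT-2 stands in for the detector's scoring model,
# and THRESHOLD is a made-up cutoff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average 'surprise' of the model at each successive token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels, the model returns the mean next-token
        # cross-entropy loss; exp(loss) is the perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

THRESHOLD = 30.0  # hypothetical cutoff; real detectors tune this

ppl = perplexity("The quick brown fox jumps over the lazy dog.")
print(f"perplexity={ppl:.1f}", "flagged as AI" if ppl < THRESHOLD else "not flagged")
```

The failure mode falls straight out of the math: if your natural writing matches the training data’s word-choice probabilities, your perplexity is low and you get flagged, even though you wrote every word yourself.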
If it could, it couldn’t claim that the content it produced was original. If AI generated content were detectable, that would be a tacit admission that it is entirely plagiarized.
The base assumption behind that argument is that an AI is incapable of being original, so it is “stealing” anything it is trained on. The problem with that logic is that’s exactly how humans work - everything we say or do is derivative of our experiences. We combine pieces of information from different sources and connect them in a way that is original - at least from our perspective. And not surprisingly, that’s what we’ve programmed AI to do.
Yes, AI can produce copyright violations. They should be programmed not to. They should cite their sources when appropriate. AI needs to “learn” the same lessons we learned about not copy-pasting Wikipedia into a term paper.
Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.
That is very different than saying that you can’t feed legally acquired content into an AI.
No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.
Now what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI, just like they can if I print copies of their book.
My objection is strictly on the input side, and the output is already restricted.
My point is that the restrictions can’t go on the input; they have to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
The fediverse is the name for services that use ActivityPub - a communication protocol. What you are saying is like saying “tech companies, banks, and regulators need to crack down on HTTP because there is CSAM on the web”.
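To illustrate the “it’s a protocol” point: federation is just servers POSTing JSON activities to each other’s inboxes. Here’s a rough sketch - the instance names and actor are invented for illustration, and real servers also require signed requests (omitted here):

```python
# Hypothetical sketch of what federation actually is: one server
# POSTing a JSON activity to another server's inbox endpoint.
# The instances and actor below are invented for illustration.
import json
import urllib.request

activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example-instance.social/users/alice",
    "object": {
        "type": "Note",
        "content": "Hello from another server!",
    },
}

req = urllib.request.Request(
    "https://other-instance.social/inbox",
    data=json.dumps(activity).encode(),
    headers={"Content-Type": "application/activity+json"},
    method="POST",
)
# A real implementation also attaches an HTTP signature before:
# urllib.request.urlopen(req)
```

There is no company behind the protocol to “crack down” on - any server that speaks ActivityPub can participate, exactly like any server that speaks HTTP can serve web pages.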
In medicine, when a big breakthrough happens, we hear that we could see practical applications of the technology in 5-10 years.
In computer technology, we reach the same level of proof of concept and ship it as a working product, ignoring the old adage: “The first 90% of implementation takes 90% of the time, and the last 10% takes the other 90%.”
As noted elsewhere, do everything you can to avoid handing your card to anyone.
Use tap to pay wherever possible, then chip - neither of those methods gives the card number to the merchant. Do not swipe unless you absolutely have to, and if you must, inspect the card reader first to make sure nothing (like a skimmer) is attached to it.
For online purchases, do everything you can to avoid giving your card number to anyone - use Apple Pay / Google Pay / Amazon Pay / PayPal etc. wherever possible. These can be used to put charges on your card without giving your card number to the merchant. These are one-time authorizations (unless you explicitly identify the charge as a subscription / recurring), so the merchant can’t reuse the transaction token they get.
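To illustrate why the token doesn’t leak your card number, here’s a toy model - not any real payment network’s API, just the single-use-token idea in miniature:

```python
# Toy model (not any real payment API) of single-use payment tokens:
# the merchant only ever sees a token, and replaying it is declined.
import secrets

class TokenService:
    """Stands in for Apple Pay / PayPal etc. in this toy model."""
    def __init__(self):
        self._tokens = {}  # token -> {"card": number, "used": bool}

    def issue_token(self, card_number: str) -> str:
        token = secrets.token_hex(16)
        self._tokens[token] = {"card": card_number, "used": False}
        return token

    def charge(self, token: str, amount: float) -> bool:
        entry = self._tokens.get(token)
        if entry is None or entry["used"]:
            return False      # unknown or replayed token: declined
        entry["used"] = True  # single-use: burned on first charge
        return True           # (a real network bills entry["card"] here)

service = TokenService()
token = service.issue_token("4111111111111111")  # merchant never sees this
print(service.charge(token, 19.99))  # True  - first use succeeds
print(service.charge(token, 19.99))  # False - reuse is rejected
```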
As a (hopefully good citizen) bot maintainer, the best advice I can give is to not browse Active or New. Hot and Top should not show bot posts unless they are being upvoted.
Go read about the red scare(s). Is it that hard to imagine a politician with a rabid following accusing some group of people of being “disloyal” and starting a witch hunt to destroy the reputations and careers of the people who fall under that umbrella? It is truly scary to think what McCarthy could have accomplished if he could have subpoenaed Google’s data.
According to the article, the reason the repair was expensive was structural and had nothing to do with it being an EV.
That being said, I ran over a rock with a one-year-old EV and totaled it, because the rock put a 4” crack in the battery pack and cracked the high-voltage power harness. (Insurance covered it… that’s what car insurance is for, folks.)
Not only is Discord a bad replacement for Reddit, it is another monolithic platform struggling to find a business model. The enshittification of Discord is real, and it is going to get worse.
Users aren’t going to care about privacy until there are consequences. Given tendencies in red states in the US, I expect some people to be arrested based on social media data - not “here’s a post of me breaking the law” kind of data, but “you browsed this site while posting this comment after seeing your doctor last Tuesday which is a strong indication you were trying to cause a miscarriage” kind of data.
I worked for a company that had an expensive San Jose lease during the .com bubble. When they decided they needed to get out of that lease, they folded the company - “fired” everyone, then re-hired everyone under an independent second company that was owned by the parent company. Sketchy, but not really surprising…
When they re-hired me, they didn’t have me sign any NDAs. All the old NDAs were with the company that folded, not the parent company. Some days I wish I had been unethical enough to sell off their source code to a competitor.
Not really - it isn’t prediction, it is early detection. Interpretive AI (finding and interpreting patterns) is way ahead of generative AI.