Sadly, Ken doesn't seem to understand exactly this. At least, that is the impression I get based upon the comments of his that I've read up to now here on page 7.

I have absolutely no problem with Ars selling access to their articles to OpenAI.
Access to the comments, though, I'd have preferred if they didn't.
ChatGPT can be prodded into regurgitating the content used to train it by being specific enough about the prompt.

Not everything. Only things that have been fed in way too many times, and this dataset isn't one of those. It'll be weighted more than others, perhaps with upvotes and downvotes in mind, but it won't be able to reproduce any individual posts.
It's still there, in its entirety, in the training data, waiting for someone to put in a prompt that will regurgitate it exactly as I created it.

This is bullshit. The entirety of the internet wasn't compressed into ChatGPT. ChatGPT has tools to look things up at runtime. That's not training; that's using a tool like search, just like a person can. Language models have tool use now.

It's really funny when even the existing LLMs rate your post as utterly dumb and full of shite.

Luddites were so opposed to new technology that they attempted to destroy it: technology that improved productivity but threatened their outdated livelihoods. You're fooling yourself if you think that they were motivated only by it benefitting "the few" rather than the effect it had on them, and if you think that making textiles more widely available for less money only benefitted "the few".
It's pretty sad that you (and so many others) want to see some of the most amazing and promising technology ever developed "burn".
Better training data will indeed make for better AI. Your mistake is assuming that all AI will ever be is a "bullshitting machine". It's already more useful than that even with its known flaws, and will only get better once it can reliably apply logic, which is a major focus of AI research.
Cute "prompt injections" though.
I set the window to edit posts to 60 minutes for the time being, because people were going back and editing all their post history.

I can still edit my posts; is editing not working for others?
I am a software engineer and, technically, I can be automated out of my job. That's what progress does. Computers and software have automated all sorts of jobs already, and we are all better off as a result. Should we stop now? Should we only ban genAI? How about Internet search? We can ban that too. Imagine how many people/operators it would take to replace Google search! Millions!

We're actually on the same page on this. I don't think we should stop, or that the tech should be banned. I think the reaction here is mostly out of fear. I have modified my posting behavior in anticipation of this, however, as I'm sure many have. Commence downvoting!
Sadly, Ken doesn't seem to understand exactly this. At least, that is the impression I get based upon the comments of his that I've read up to now here on page 7.

With all due respect, please keep reading past page 7.
The company determines what goes in our robots.txt. That said, I am asking if we can do exactly that: block /civis/.
The reality is all those old posts have already either been scraped by someone already, ...

Not by OAI in the last 11 months, according to TFA.
Aha. Now who's controlling other people's speech? From another thread:

I set the window to edit posts to 60 minutes for the time being, because people were going back and editing all their post history.
At least you're consistent in wanting to control other people's speech.

Heh.
Let's hope you continue to prove the doubters (myself included) wrong.

I intend to. For 26 years I've been told the sky is falling. For 26 years, I've watched as people hate us, come back to us, and hate us again. There's even the really interesting people who come here daily to let us know how much they hate us, how they are never coming back, etc., only to be back the next day to do it all over again. All we can do is try and understand, and do our best. People can choose not to subscribe, and that's fine. In the end, that won't hurt CN, only Ars, but it is what it is. We're not entitled to readers or revenue. We know that every day we have to earn it.
That's a very naive take on journalism. Perhaps more education and more realistic takes would have helped? One overarching idiocy among many Ars posters I see is the refusal to accept reality. This manifests itself in unstoppable hate towards Google, Meta, etc. for their advertising business. As if any one of these people came up with a business model for most services on the Internet that could sustain them without ad money. It's just lunacy. Have you seen any articles on Ars that would explain this to naive people? Ars was seemingly OK with it as long as this generated the clicks from certain people, but that's not good journalism.

I do not refuse to accept reality. I acknowledge that stopping the plague that is GenAI is almost certainly impossible. That does not mean I cannot oppose it, futile as it might be. Just as I acknowledge that ads are the lifeblood of the modern internet, but that is again something I oppose even as I acknowledge it. Also, at some point in the past, things on the internet used to not have a business model. Most readers of those articles already understand that many places on the internet are dependent on ad revenue.
Also, don't confuse Sam Altman with AI. Are you going to protest against electric vehicles because EV == Musk?
Holy shit, so Ars simultaneously holds the position that comments aren't that valuable and has also determined that it's in their best interest to stop people from editing them while Ars staffers hang out in this thread and try to downplay the significance of this move?

YUP! That's gonna break some poor chat agent's e-brain for sure.
I intend to. For 26 years I've been told the sky is falling. For 26 years, I've watched as people hate us, come back to us, and hate us again. There's even the really interesting people who come here daily to let us know how much they hate us, how they are never coming back, etc., only to be back the next day to do it all over again. All we can do is try and understand, and do our best. People can choose not to subscribe, and that's fine. In the end, that won't hurt CN, only Ars, but it is what it is. We're not entitled to readers or revenue. We know that every day we have to earn it.

This reads like you just dismiss people's concerns.
We take reader concerns seriously. We have a proven track record of fighting against things we don't like, and we often win. We're looking into our options with this robots.txt issue.
You can at least let us choose whether our posts remain available and scrapable.

Yeah, but no. The dataset is hella valuable.
but we gave people the honor system to not abuse it

By deleting their own content?
What are the chances that this deal will provide more funding for Ars or compensation for the writers here?
Honestly, I’m fine with licensing deals like this as long as the creators are properly compensated for this use of their work. However, this is happening at such a high level that it seems like everyone here is just left out of the decision-making process and financial benefits altogether. Boo.
I intend to. For 26 years I've been told the sky is falling. For 26 years, I've watched as people hate us, come back to us, and hate us again. There's even the really interesting people who come here daily to let us know how much they hate us, how they are never coming back, etc., only to be back the next day to do it all over again. All we can do is try and understand, and do our best. People can choose not to subscribe, and that's fine. In the end, that won't hurt CN, only Ars, but it is what it is. We're not entitled to readers or revenue. We know that every day we have to earn it.

The reality, at least in my case, is that there is no suitable alternative; I may or may not leave, but I will almost certainly be back.
That's not how LLMs work. They can't think. They will never be able to think. They can only ever predict. They have no capacity for creating the abstract connections that are endemic to actual thought. And while they can be trained out of providing inaccurate responses on a case-by-case basis, they have no capacity to understand why those answers are not acceptable, and will continue to make the same mistake with alternate details far faster than any human could possibly train out of them.

This is the difference between a human and a machine. A human can take a single correction and extrapolate it out to similar but not identical situations to improve their processing across a broad variety of situations. A machine is incapable of doing the same. Which is why you can feed a human instruction via a collection of educational texts meant to cover K-12 and they will typically become a fully functional individual capable of reasoning and rational thought.

A machine fed the same data will only ever be able to collate, combine, remix, and regurgitate that data, with no understanding at all of what any of it actually means and no means by which to ensure that the probability-controlled output is not inaccurate.

Yes, I agree with what you're saying about LLMs, and indeed would say that's my entire point about them: the fact that they can't be taught to think with the amount of data they have constitutes evidence that they can't be taught to think, period.
I'll repeat my challenge to you again: please codify and publish your policy on this covering all forms of media, given you already allow AI-generated art on Ars articles. I feel you are under-appreciating that just because AI isn't being used for text generation doesn't mean you've not started to normalize its use recently.

Sure, we can post something publicly. It's going to be several days, however. There is just too much going on, and many people are on vacation.
With all due respect, please keep reading past page 7.

So far I'm on page 9, and it's actually gotten worse. Zero understanding thus far from Ken. The opposite, rather. I will continue to read through, but thus far I do not see any understanding at all from Ars about people who don't want their comments scraped, and/or want them deleted. Oh, Ken talks about not selling our personally identifiable information, but that's not what's at issue.
Holy shit, so Ars simultaneously holds the position that comments aren't that valuable and has also determined that it's in their best interest to stop people from editing them while Ars staffers hang out in this thread and try to downplay the significance of this move?

There is another potential explanation:
Jesus Christ, I'm done feeling sorry for you guys.
My apologies if this has already been asked, but… Can we community members opt our comments and other community data out of participation in AI training?

lol yeah why didn't anyone else think of this?
Maybe strictly, but you know how easy it is to identify from general data. "Gosh, the illegal copying company is doing something else illegal, who'da thunk it?"

I'd need someone to show me how you can take a user post with zero PII from Ars and link it to a real person, without any metadata or private data being supplied from Ars. If someone can show me exactly that, it would be very helpful in making our case.
That said, I don't necessarily believe that there's anything special about the lump of folded meat in my head that can't be replicated with machinery. That is, there's nothing, to my knowledge, within the laws of physics that inherently prevents a machine from thinking. Believing otherwise smacks of religion and the "soul."

It's not a matter of spirituality, but of the design of the machinery.
Holy shit, so Ars simultaneously holds the position that comments aren't that valuable and has also determined that it's in their best interest to stop people from editing them while Ars staffers hang out in this thread and try to downplay the significance of this move?

Ars comments have no monetary value to anyone else; as Ken said, we are not selling them or getting paid for them.
The WHARRGARBL meme is an Ars original creation that I actually witnessed. You can even find it referenced outside of Ars. Here's a reddit post that reposts the Fugly story:
It feels like Ars might like to mention that... and also explore whether they can add /civis/

This is what we are looking into now. You'll notice that there is a lot of Civis stuff blocked, just not all of it. We want to do a global exclusion.
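For what it's worth, a global exclusion of the forums would only take a few lines in robots.txt. A minimal sketch, assuming the forums live under /civis/ as discussed above, and noting that robots.txt is purely advisory and only stops crawlers that choose to honor it:

```
# Block OpenAI's published crawler from the entire forum tree
User-agent: GPTBot
Disallow: /civis/

# Belt-and-suspenders: block all other well-behaved crawlers too
User-agent: *
Disallow: /civis/
```

GPTBot is the user-agent OpenAI documents for its training crawler; the wildcard rule covers everything else that respects the Robots Exclusion Protocol. It does nothing against scrapers that ignore robots.txt entirely.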
Holy shit, so Ars simultaneously holds the position that comments aren't that valuable and has also determined that it's in their best interest to stop people from editing them while Ars staffers hang out in this thread and try to downplay the significance of this move?

Wow. Account banned. Already?
Jesus Christ, I'm done feeling sorry for you guys.
To be honest, given how scummy AI companies are, I would not be surprised if all user comments have been ingested this whole time, regardless of whether there was any "11 month" window that they shouldn't have been.

Also, I just looked, and the comments from an article from 6 days ago are already available on archive.org. What's to stop AI from ingesting that? It appears that editing or deleting comments is fruitless if the goal is to prevent AI from scraping them.

I'm not trying to be glib, just real.
It's not a matter of spirituality, but of the design of the machinery.

Oh, sure. As far as I can tell, we're nowhere near designing machine hardware that would be able to support software that can think, much less creating software that can think that will run on the hardware we currently have.
Until we understand how it is that human brains actually function, we will have difficulty attempting to replicate that function.
And then it becomes a matter of whether the technology to do so even exists or could be recreated within the same space and using the same resources as a moderately educated person requires.
We'll get there, eventually, but not in either of our lifetimes. And, you know, provided we don't manage to wipe ourselves out in the meantime by being wasteful of our resources... like using all of that energy and clean water to power generative AI... >_>