Is our machine learning? Ars takes a dip into artificial intelligence


orwelldesign

Ars Tribunus Angusticlavius
7,317
Subscriptor++
Huh. Usually headlines with a question mark are "No"s, this one's a pretty aggressive "maybe."

Yay?

Looking forward to the rest of the series, frankly, though I do wish you'd anti-train it against being clickbaity.

This is, regardless, exactly the sort of thing that has me coming back for more. Wish there was an "upvote this article" button.
 
Upvote
67 (67 / 0)

Shanrak

Ars Scholae Palatinae
1,304
Subscriptor
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.

Either way, it should at least be entertaining.

Yes it should :D
 
Upvote
46 (50 / -4)

AquaSandwalker

Seniorius Lurkius
11
Subscriptor++
Sweet, I look forward to the next instalments!

Also, wanted to drop a couple of book recommendations for those that want to learn more about ML algorithms (a.k.a. stats) at a conceptual level:
1. Hello World: How to be Human in the Age of the Machine by Hannah Fry.
I found this to be a good place to start, very easy to read and more of a casual read, it gave me a good overview of where ML is used and some of the relevant concerns.
2. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos.
This one goes more in-depth on how ML algorithms work, with neat historical details on how they originated, while it ponders the future of machine 'intelligence'.
 
Upvote
22 (22 / 0)

Callias

Ars Scholae Palatinae
681
Subscriptor++
Very good article, in part because I really liked the case study of using something so seemingly simple — headline writing — to illustrate something so complicated, challenging, and ever-evolving.

We’ve all been in situations where you have to pick a decent subject line for an email so it stands out from the 250+ a day your boss gets, or craft a decent title-form summary of a Reddit post so you get helpful responses (yes, it can happen), or pick the right wording for a technical support ticket with a vendor.

So this “case study” has immediate “Oh, I get it” appeal, and I can see some applicability in my own life. Looking forward to the next installment.
 
Upvote
14 (14 / 0)

Unclebugs

Ars Praefectus
3,086
Subscriptor++
When I read what you were trying to do, I immediately thought of something I stumbled upon 23 years ago: latent semantic analysis (LSA). A professor who worked part-time at New Mexico State University, just 30 miles from my abode in El Paso, TX, was promoting this, and as a teacher of language arts grading essays, I was curious whether the technique could be adapted to grade essays. It could. I got on the mailing list, and the next thing I knew, I received an invitation to a classified ONI (Office of Naval Intelligence) conference.

I am now retired but spent 10 years as a journalist, often writing headlines, and another 11 years as a journalism teacher, teaching students how to write headlines. If you can get this thing to work, I'm sure the Journalism Education Association, or JEA (www.jea.org), would be interested, as they have a fully developed curriculum on writing headlines, which you can find on the website under educator resources. Good luck!
 
Upvote
42 (42 / 0)

Mustachioed Copy Cat

Ars Praefectus
5,043
Subscriptor++
Here is a task that some Ars writers are exceptionally good at: writing a solid headline. (Beth Mole, please report to collect your award.)

There’s a Marvel crossover series called Fear Itself wherein Wolverine gains armor made from the same metal as Thor’s hammer, which turns his claws a lambent orange, as though from tremendous heat, and gives him a set of secondary spikes that appear to protrude from other locations on his body (though whether this is a magical effect of the armor or a physical attachment to the armor is never made clear). Marvel heard that Wolverine fans liked claws, so it covered him in claws and gave the claws the ability to cook whatever they happened to cut or pierce.

I feel like giving Beth Mole encouragement is about the same as giving Wolverine extra, heated claws/spikes. It’s unnecessary, gratuitous, and is going to result in a lot of dead bodies.
 
Upvote
40 (40 / 0)
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.
Are you sure that the Ars audience clicks more often on the "clickbaity" headlines than the "good" ones in A/B testing?

Clickbaity doesn't automatically mean it gets more clicks in any given audience. It's a style that I think most here at Ars despise.
 
Upvote
12 (15 / -3)

donpsychote

Smack-Fu Master, in training
1
I performed a similar experiment years ago, only I scored the title appeal based on the number of social media shares. You have A/B data, which is different in that you can discriminate between different wordings. But I had very similar titles from different media sources, with wildly different engagement.

The main problem is that most of the traffic to any site comes from social media, and readers there decide based on their own criteria, which depend partly on temporal context, in other words, on what is happening at the time. A story about some event may be very attractive regardless of the title, but after media saturation, readers just won't click it no matter how well written the title is.

I then concluded the experiment with a mental note to repeat it some day, but to figure out how to pass the temporal context to the classifier as well.
 
Upvote
28 (28 / 0)

Wickwick

Ars Legatus Legionis
39,975
Given the Ars community, I think you're missing an opportunity to make this a competition. For anyone who's been paying close attention, the A/B headlines are all public, as well as the winners. So why not make the database public and let your readers compete for the best-performing system on a withheld validation set?

Better yet, ask one of the other Condé Nast sites for some A/B headlines to see how general the results are.
 
Upvote
53 (53 / 0)

Wickwick

Ars Legatus Legionis
39,975
I wonder if the images associated with headlines have anything to do with the success rate of clicking the headline?
This isn't a matter of clickthrough percentage for a story. This is the difference in how many people click through on headline A vs. headline B for the same subtitle and image.

Edit: although, the image is a confounding variable because it does act in concert with the chosen headline text. That would require several images as well as A/B headlines to suss out.
 
Upvote
9 (9 / 0)

Wickwick

Ars Legatus Legionis
39,975
I performed a similar experiment years ago, only I scored the title appeal based on the number of social media shares. You have A/B data, which is different in that you can discriminate between different wordings. But I had very similar titles from different media sources, with wildly different engagement.

The main problem is that most of the traffic to any site comes from social media, and readers there decide based on their own criteria, which depend partly on temporal context, in other words, on what is happening at the time. A story about some event may be very attractive regardless of the title, but after media saturation, readers just won't click it no matter how well written the title is.

I then concluded the experiment with a mental note to repeat it some day, but to figure out how to pass the temporal context to the classifier as well.
As I recall from previous discussions of this, the A/B test is only carried out for a short while - an hour, perhaps? Then the winning headline is used. The idea being, you're trying to get the most clicks. It doesn't make sense to continue running the test once you know which of the two is more likely to convert to a click-through.
 
Upvote
7 (7 / 0)

Deleted member 221201

Guest
Couple of minor suggestions

1. You could probably get away with using google colab depending on your dataset size

2. You could run the headline through word2vec & strip stopwords (or not) & then use a simple random forest classifier for a starter model
I'm assuming your target is number of clicks, which you can threshold 0/1 for binary or bin it

3. If you really need lexical analysis then a bidirectional LSTM will do the job, but depending on data size, use AWS or just let it run for a few hours to train on Colab
Again, this depends on how many layers you are stacking up & I suggest using Keras as a wrapper for TensorFlow

4. If you want a pre-made model then look at BERT.

5. Up-sample/down-sample as needed or adjust class weights if needed & run the sklearn classification report at the end

Have fun :)
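The word2vec-plus-random-forest starter in point 2 could be sketched roughly like this. This is a toy, not the author's actual pipeline: TF-IDF stands in for word2vec embeddings so the example needs only scikit-learn, and the headlines and win/loss labels are invented for illustration.

```python
# Sketch of suggestion 2: vectorize headlines, strip stop words, and fit a
# simple classifier on a binary "won its A/B test" target. TF-IDF stands in
# for word2vec here; the data below is made up.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

headlines = [
    "Scientists invent faster-than-light travel",
    "Ten boring facts about routers",
    "You won't believe this one weird benchmark",
    "Quarterly earnings report released on schedule",
]
# Binary target: 1 = the headline won its A/B test, 0 = it lost (toy labels)
won_ab_test = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # strip stop words while vectorizing
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(headlines, won_ab_test)

# Score an unseen headline: returns an array of 0/1 predictions.
print(model.predict(["You won't believe this benchmark"]))
```

Swapping the TfidfVectorizer for pretrained word2vec vectors (e.g. via gensim, averaged per headline) would follow the same shape: build an `X` matrix, then `fit`/`predict`.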
 
Upvote
22 (22 / 0)

seanmgallagher

Ars Tribunus Militum
1,911
Subscriptor
I performed a similar experiment years ago, only I scored the title appeal based on the number of social media shares. You have A/B data, which is different in that you can discriminate between different wordings. But I had very similar titles from different media sources, with wildly different engagement.

The main problem is that most of the traffic to any site comes from social media, and readers there decide based on their own criteria, which depend partly on temporal context, in other words, on what is happening at the time. A story about some event may be very attractive regardless of the title, but after media saturation, readers just won't click it no matter how well written the title is.

I then concluded the experiment with a mental note to repeat it some day, but to figure out how to pass the temporal context to the classifier as well.
As I recall from previous discussions of this, the A/B test is only carried out for a short while - an hour, perhaps? Then the winning headline is used. The idea being, you're trying to get the most clicks. It doesn't make sense to continue running the test once you know which of the two is more likely to convert to a click-through.

It's a very very short test, not even an hour. Any changes to headlines after an hour are deliberate human efforts to fix flawed headlines. At least, as far as I know. I don't work here anymore.
 
Upvote
29 (29 / 0)

seanmgallagher

Ars Tribunus Militum
1,911
Subscriptor
Couple of minor suggestions

1. You could probably get away with using google colab depending on your dataset size

2. You could run the headline through word2vec & strip stopwords (or not) & then use a simple random forest classifier for a starter model
I'm assuming your target is number of clicks, which you can threshold 0/1 for binary or bin it

3. If you really need lexical analysis then a bidirectional LSTM will do the job, but depending on data size, use AWS or just let it run for a few hours to train on Colab
Again, this depends on how many layers you are stacking up & I suggest using Keras as a wrapper for TensorFlow

4. If you want a pre-made model then look at BERT.

5. Up-sample/down-sample as needed or adjust class weights if needed & run the sklearn classification report at the end

Have fun :)

Yeah, I'm several weeks into this game right now and I wish we had talked sometime in May :D
 
Upvote
27 (27 / 0)

Shanrak

Ars Scholae Palatinae
1,304
Subscriptor
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.
Are you sure that the Ars audience clicks more often on the "clickbaity" headlines than the "good" ones in A/B testing?

Clickbaity doesn't automatically mean it gets more clicks in any given audience. It's a style that I think most here at Ars despise.

More clicks = clickbaity regardless of audience

It remains to be seen if the typical Ars audience prefers a different form of clickbait headline than what is typically considered clickbait :D
 
Upvote
8 (11 / -3)

Wickwick

Ars Legatus Legionis
39,975
Couple of minor suggestions

1. You could probably get away with using google colab depending on your dataset size

2. You could run the headline through word2vec & strip stopwords (or not) & then use a simple random forest classifier for a starter model
I'm assuming your target is number of clicks, which you can threshold 0/1 for binary or bin it

3. If you really need lexical analysis then a bidirectional LSTM will do the job, but depending on data size, use AWS or just let it run for a few hours to train on Colab
Again, this depends on how many layers you are stacking up & I suggest using Keras as a wrapper for TensorFlow

4. If you want a pre-made model then look at BERT.

5. Up-sample/down-sample as needed or adjust class weights if needed & run the sklearn classification report at the end

Have fun :)

Yeah, I'm several weeks into this game right now and I wish we had talked sometime in May :D
This IS Ars...

I mean, how often does one of the authors of a paper being covered show up in the comments? Or someone that worked on said widget, etc.
 
Upvote
13 (14 / -1)
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.
Are you sure that the Ars audience clicks more often on the "clickbaity" headlines than the "good" ones in A/B testing?

Clickbaity doesn't automatically mean it gets more clicks in any given audience. It's a style that I think most here at Ars despise.

More clicks = clickbaity regardless of audience

It remains to be seen if the typical Ars audience prefers a different form of clickbait headline than what is typically considered clickbait :D
I disagree that more clicks inherently means something is clickbait. For something to be clickbait, it has to sensationalize, mislead about, or overpromise the actual content of the article.
 
Upvote
39 (39 / 0)

Wickwick

Ars Legatus Legionis
39,975
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.
Are you sure that the Ars audience clicks more often on the "clickbaity" headlines than the "good" ones in A/B testing?

Clickbaity doesn't automatically mean it gets more clicks in any given audience. It's a style that I think most here at Ars despise.

More clicks = clickbaity regardless of audience

It remains to be seen if the typical Ars audience prefers a different form of clickbait headline than what is typically considered clickbait :D
I disagree that more clicks inherently means something is clickbait. For something to be clickbait, it has to sensationalize, mislead about, or overpromise the actual content of the article.
The headline "Scientists Invent Faster-Than-Light Travel" is going to get clicks. If it's actually reporting on such work then it's not clickbait. If it's followed by "...in a new Netflix series" when you click on the article then it's clickbait.
 
Upvote
30 (30 / 0)

Deleted member 221201

Guest
Couple of minor suggestions

1. You could probably get away with using google colab depending on your dataset size

2. You could run the headline through word2vec & strip stopwords (or not) & then use a simple random forest classifier for a starter model
I'm assuming your target is number of clicks, which you can threshold 0/1 for binary or bin it

3. If you really need lexical analysis then a bidirectional LSTM will do the job, but depending on data size, use AWS or just let it run for a few hours to train on Colab
Again, this depends on how many layers you are stacking up & I suggest using Keras as a wrapper for TensorFlow

4. If you want a pre-made model then look at BERT.

5. Up-sample/down-sample as needed or adjust class weights if needed & run the sklearn classification report at the end

Have fun :)

Yeah, I'm several weeks into this game right now and I wish we had talked sometime in May :D


Please stick with pandas to load the dataset & use sklearn or keras if making a deep learning model; it will save you a lot of grief

1. Use df.loc for slicing & avoid copies in pandas

2. For keras something like this
Code:
X = np.array(df[features].tolist())
y = your_trained_model.predict(X)   # super fast vectorized op
df['prediction'] = pd.Series(y[:, 0])

Doing the same thing above using df.apply() is asking for trouble & a lot of wasted hours
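A minimal runnable version of the advice above, with a stand-in "model" (a plain thresholding function, purely hypothetical) in place of a trained Keras network; the column and feature names are invented:

```python
# Sketch of the pandas tips: slice with df.loc, then run one vectorized
# prediction over the whole frame instead of calling df.apply() per row.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "headline": ["A tale of two clicks", "Why DNS matters", "One weird trick"],
    "length": [20, 15, 15],
    "has_number": [0, 0, 1],
})

features = ["length", "has_number"]

# df.loc slicing avoids chained-indexing copies; .to_numpy() yields the 2-D
# array a model's predict() would expect.
X = df.loc[:, features].to_numpy()

def stand_in_model_predict(X):
    # Pretend model: "predict a click" when the headline contains a number.
    return (X[:, 1] > 0).astype(int)

# One vectorized call over all rows -- the fast path the comment recommends.
df["prediction"] = stand_in_model_predict(X)
print(df[["headline", "prediction"]])
```

The same assignment via `df.apply()` would invoke Python once per row; the vectorized call hands the whole array to NumPy at once, which is where the speedup comes from.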
 
Upvote
5 (5 / 0)

OtherSystemGuy

Ars Scholae Palatinae
1,293
Subscriptor++
I see a couple of things here. On the data, one commenter mentioned the picture associated with the article. I'd add the short text under the title and the author's name; I actually look at those as well to decide what I'll read. And that takes me into the data analysis part...

Since you have the full count of clicks, one analysis would be to simply plot the data to see how far apart the winner and loser are from each other and use that as a new data point. Titles that are far apart are more interesting and should take greater weight than those close to each other.

I'm going to watch this series with interest because I'm not sure how well this is going to work out. I suspect that phrase structure is probably going to be important. How often do Beth's punny titles generate a click? An algorithm looking solely at words will miss the crafting of word placement and relationship.

I do have to keep reminding myself that this project is a title scoring system and not a title generator.
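The winner/loser-gap idea could be sketched like this; the column names and click counts are invented, and margin-as-sample-weight is one plausible reading of the suggestion, not the article's method:

```python
# Sketch: compute the winner/loser click gap for each A/B test and use it
# as a per-sample weight, so decisive tests count more than near-ties.
import pandas as pd

tests = pd.DataFrame({
    "clicks_a": [120, 300, 95],
    "clicks_b": [80, 290, 40],
})

total = tests["clicks_a"] + tests["clicks_b"]
# Margin in [0, 1]: 0 means a dead heat, 1 means one headline took every click.
tests["margin"] = (tests["clicks_a"] - tests["clicks_b"]).abs() / total

# Many sklearn estimators accept this directly via fit(..., sample_weight=...).
sample_weight = tests["margin"]
print(tests)
```

Plotting the `margin` column as a histogram would show how many of the 5,500 tests were decisive versus coin flips, which is useful to know before trusting the labels.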
 
Upvote
10 (10 / 0)

entropy_wins

Ars Tribunus Militum
1,690
Subscriptor++
Ars has given me data on over 5,500 headline tests over the past four years—11,000 headlines, each with their rate of click-throughs.

Garbage in, garbage out, as they say. If the only input is the number of click-throughs to an article, you are going to end up with clickbait headlines. I hope you have more data than that; curious to see what you come up with.
Are you sure that the Ars audience clicks more often on the "clickbaity" headlines than the "good" ones in A/B testing?

Clickbaity doesn't automatically mean it gets more clicks in any given audience. It's a style that I think most here at Ars despise.

More clicks = clickbaity regardless of audience

It remains to be seen if the typical Ars audience prefers a different form of clickbait headline than what is typically considered clickbait :D
I disagree that more clicks inherently means something is clickbait. For something to be clickbait, it has to sensationalize, mislead about, or overpromise the actual content of the article.
The headline "Scientists Invent Faster-Than-Light Travel" is going to get clicks. If it's actually reporting on such work then it's not clickbait. If it's followed by "...in a new Netflix series" when you click on the article then it's clickbait.

That's what Google has become. They'll mention an actor, say, but the image will be of someone who appeared with that actor and might be more famous.

Not sure how we solve this...

 
Upvote
3 (3 / 0)

Wickwick

Ars Legatus Legionis
39,975
I see a couple of things here. On the data, one commenter mentioned the picture associated with the article. I'd add the short text under the title and the author's name; I actually look at those as well to decide what I'll read. And that takes me into the data analysis part...

Since you have the full count of clicks, one analysis would be to simply plot the data to see how far apart the winner and loser are from each other and use that as a new data point. Titles that are far apart are more interesting and should take greater weight than those close to each other.

I'm going to watch this series with interest because I'm not sure how well this is going to work out. I suspect that phrase structure is probably going to be important. How often do Beth's punny titles generate a click? An algorithm looking solely at words will miss the crafting of word placement and relationship.

I do have to keep reminding myself that this project is a title scoring system and not a title generator.
The trouble is the subheading and the picture are the same regardless of the A/B heading. So are you suggesting including that in the training set as part of the data for each title? That could only help in choosing a winner for contextually based models. Otherwise they add data without any differentiation.
 
Upvote
3 (3 / 0)
Couple of minor suggestions

1. You could probably get away with using google colab depending on your dataset size

2. You could run the headline through word2vec & strip stopwords (or not) & then use a simple random forest classifier for a starter model
I'm assuming your target is number of clicks, which you can threshold 0/1 for binary or bin it

3. If you really need lexical analysis then a bidirectional LSTM will do the job, but depending on data size, use AWS or just let it run for a few hours to train on Colab
Again, this depends on how many layers you are stacking up & I suggest using Keras as a wrapper for TensorFlow

4. If you want a pre-made model then look at BERT.

5. Up-sample/down-sample as needed or adjust class weights if needed & run the sklearn classification report at the end

Have fun :)

Yeah, I'm several weeks into this game right now and I wish we had talked sometime in May :D


Ah, but you haven’t lived until you’ve tried to explain how ML translates to direct and indirect revenue streams to a roomful of suits, most of whom need IT to ‘fix’ their laptops regularly.

So fun….
 
Upvote
8 (8 / 0)

OtherSystemGuy

Ars Scholae Palatinae
1,293
Subscriptor++
The trouble is the subheading and the picture are the same regardless of the A/B heading. So are you suggesting including that in the training set as part of the data for each title? That could only help in choosing a winner for contextually based models. Otherwise they add data without any differentiation.

Agreed, but that's my concern about the limited scope of the experiment and why I'm interested in following along.
 
Upvote
1 (1 / 0)

ukeandhike

Ars Scholae Palatinae
1,057
Love this, should be interesting even if it falls spectacularly flat.

Once you’ve done this and (hopefully) have a functional model, you should put the model up against Beth and do more A/B testing of Beth vs the Machine… so have the model run with everyone else’s headlines and then whatever the model says is the most likely to win out, pit THAT headline against Beth’s creations.
 
Upvote
3 (3 / 0)

adespoton

Ars Legatus Legionis
10,723
I can't wait for this upcoming Ars headline:

"SpaceX says: 'Just one low price to put users of new iOS Trump video game into space for a million years, without Facebook, Twitter, or windows'"

You've got a problem with that headline: it's 130 characters. See if you can shorten it to 70!
 
Upvote
6 (6 / 0)