Forget AGI—Sam Altman celebrates ChatGPT finally following em-dash formatting rules

Niles Gazic

Ars Praetorian
405
Subscriptor++
I'm guessing that hardly anybody shares this view with me, but personally I find the em-dash symbol to be too damn wide, and I especially dislike seeing it when it's not wrapped in whitespace. But I also find the keyboard dash symbol to be too narrow, so my personal preference – as demonstrated right here – is to use the en-dash symbol wrapped in whitespace. I assume that writers argue about stylistic conventions like this on Reddit or someplace, but I don't want to further agitate myself by seeking out those discussions, I suppose.
 
Upvote
10 (10 / 0)

nogglebeak

Wise, Aged Ars Veteran
110
And those of us with editing backgrounds who understand the proper use of em-dashes will still get treated as if we're using AI to write our content...
Or be criticized for AI-like behavior for using basic English... this entire article just makes me mad that people have somehow taken a subtle but important aspect of the English language and tied it to manufactured content. (Didn't use an em dash in there because I ain't a snitch.)
 
Upvote
1 (1 / 0)
What was it, about 5-6 months ago that one of these guys, maybe Altman, said that the models would be smarter than a PhD within a year? That hasn’t aged well.
Do you seriously think a majority of PhD grads know what an em-dash is??

(The phrase gets misinterpreted, I think, as "smarter than a PhD" is not as impressive as it sounds. The bar for a PhD was never massively high and is dropping annually, because completions == dollars.)
 
Upvote
-3 (4 / -7)

Komarov

Ars Tribunus Militum
2,259
What was it, about 5-6 months ago that one of these guys, maybe Altman, said that the models would be smarter than a PhD within a year? That hasn’t aged well.

Eh, ask yourself how many PhDs use the em-dash correctly.

(ETA: PhD or, as Victor Borge put it, just a ffud.)
 
Last edited:
Upvote
0 (3 / -3)

Komarov

Ars Tribunus Militum
2,259
I'm guessing that hardly anybody shares this view with me, but personally I find the em-dash symbol to be too damn wide, and I especially dislike seeing it when it's not wrapped in whitespace. But I also find the keyboard dash symbol to be too narrow, so my personal preference – as demonstrated right here – is to use the en-dash symbol wrapped in whitespace. I assume that writers argue about stylistic conventions like this on Reddit or someplace, but I don't want to further agitate myself by seeking out those discussions, I suppose.

In English, the em-dash should be used without spaces. In some other languages, an en-dash with spaces has the same role. In yet others, em-dash with a trailing space introduces quoted speech. There's no universal rule.
 
Upvote
4 (7 / -3)

Ooooompf

Smack-Fu Master, in training
28
I do wonder if "advances" in controlling chatbot behaviour such as this are, in reality, a hardcoded hack. Maybe it took OpenAI so long because they had to write an elaborate translation layer that modifies the output of a query, rather than wrangling the LLM into obeying its instructions.
Given the hype machine that is Altman/OpenAI, I wouldn't be surprised if there's a mountain of spaghetti solutions building up behind the scenes.
 
Upvote
3 (3 / 0)

FranzJoseph

Ars Centurion
2,145
Subscriptor
In English, the em-dash should be used without spaces. In some other languages, an en-dash with spaces has the same role. In yet others, em-dash with a trailing space introduces quoted speech. There's no universal rule.
In my native language, the en‑dash is always used with spaces in place of an em‑dash. Unless in certain compounds like 2000–2020, but otherwise the (unbreakable) hyphen - is more commonly used for compounds. Thus seeing the em‑dash without spaces in English always seemed jarring to me – even if a proper style there – and I default to the '–' even here writing in English.

Am I the bad guy? 😅
 
Upvote
4 (5 / -1)

Komarov

Ars Tribunus Militum
2,259
In my native language, the en‑dash is always used with spaces in place of an em‑dash. Unless in certain compounds like 2000–2020, but otherwise the (unbreakable) hyphen - is more commonly used for compounds. Thus seeing the em‑dash without spaces in English always seemed jarring to me – even if a proper style there – and I default to the '–' even here writing in English.

Same here (though in a different, but I suspect related, language). An en-dash without spaces is used for number ranges and suchlike, and a hyphen with spaces for some quirky compound names; otherwise, always without spaces.

Am I the bad guy? 😅

Absolutely. Correct typography is paramount.
 
Upvote
0 (2 / -2)

Don Reba

Ars Praefectus
3,306
Subscriptor++
Just don't space an em-dash. You space around an en-dash, but em-dashes are used unspaced.
Em-dashes are usually spaced in newspapers — and in the AP style, specifically — but not in books. Personally, I prefer the newspaper style. Omitting spaces makes it look like a hyphenated word.
 
Upvote
10 (10 / 0)

Komarov

Ars Tribunus Militum
2,259
Perhaps they "cheated" by using post filtering to remove it?

The thing about the m-dash (the dash as wide as an "m") is that one could use parentheses instead or use commas, if one is so inclined, to mark asides. In fact, if I remember grade school, they taught us to use commas.

Ah yes, the nested comma substatement, that, as some would put it, whilst pointedly raising an eyebrow at the crowded parlour, where china figurines and lace doilies vied for space with exotic orchids and dusty books, giving the candle-lit room a feel of barely contained yet gloomy chaos, has all the style and charm of a well intended but ultimately tasteless run-on sentence in some naïve Victorian romance novel.

(ETA: missed a chance for a "having &c. &c.", need more practice.)
 
Last edited:
Upvote
10 (10 / 0)

graylshaped

Ars Legatus Legionis
67,694
Subscriptor++
FWIW, I've never had any issues using

Code:
Always: `s/—/ – /g`

in a system prompt, with any OpenAI model (going back to GPT-4o). Perhaps Altman doesn't know how to prompt an LLM effectively?
If it was intelligent, why can't you tell it how you want it to handle various punctuation in plain language? e.g. "In all output, please apply the standards included in The Cambridge Guide to English Usage."
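The sed-style rule quoted above (`s/—/ – /g`) is also trivial to apply as a hard post-filter on the output rather than as a prompt, if one distrusts the model's instruction-following. A minimal sketch (the function name is made up for illustration):

```python
def despace_em_dashes(text: str) -> str:
    """Replace every em-dash (U+2014) with a spaced en-dash (U+2013),
    mirroring the sed substitution s/—/ – /g."""
    return text.replace("\u2014", " \u2013 ")

# "AI hype—like all hype—fades." -> "AI hype – like all hype – fades."
print(despace_em_dashes("AI hype\u2014like all hype\u2014fades."))
```

Of course, a blunt filter like this can't distinguish legitimate em-dash uses from stylistic tics, which is presumably why one would want the model itself to obey the instruction.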
 
Upvote
5 (6 / -1)

AliceErischech

Smack-Fu Master, in training
36
Serious question from someone who follows the news but is no expert: why do we expect AGI to evolve from LLMs, of all AI tools? As impressive as they are, they look like a dead end to me in that regard; what am I missing?
People like Sam Altman and Jensen Huang have massively overhyped LLMs since their paycheck depends on people believing them. (And in the case of some people such as Sam Altman and Elon Musk, they seem to genuinely believe the bullshit they spew about AI.)
 
Upvote
9 (9 / 0)

Komarov

Ars Tribunus Militum
2,259
Serious question from someone who follows the news but is no expert: why do we expect AGI to evolve from LLMs, of all AI tools? As impressive as they are, they look like a dead end to me in that regard; what am I missing?

PR for VCs.

There's not a snowflake's chance in hell that AGI could emerge from LLMs and I bet the researchers working in the field are aware of that.

Including the lying shitbags who only want to get rich from the IPO, then get out before the bubble bursts.
 
Upvote
11 (11 / 0)
The instruction lowers the probability of choosing the em-dash. Okay, but wouldn't the model have to know what the instruction means in order to do that? I'm pretty sure there's no website containing an instruction to use fewer em-dashes followed by blocks of text that happen to be relevant but luckily contain fewer em-dashes than other relevant sources would. I see this as a good example of the logical gap between the claim that this is "not thinking, just statistics" and the actual responses being generated, which to me just don't follow from pure noisy statistics seeded with random numbers.
How does it know to reduce the likelihood if it doesn't even know what "reduce" means?
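For what it's worth, "lowering the probability" of a token is a concrete, mundane operation at sampling time, separate from whatever the model "understands." A toy sketch of biasing a next-token distribution (the token set and bias values here are invented for illustration, not OpenAI's internals):

```python
import math

def apply_logit_bias(logits, bias):
    """Add a per-token bias to raw logits, then softmax back to probabilities."""
    biased = {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}
    total = sum(math.exp(v) for v in biased.values())
    return {tok: math.exp(v) / total for tok, v in biased.items()}

# Toy next-token distribution; pushing the em-dash logit far down makes the
# sampler overwhelmingly likely to pick a comma or parenthesis instead.
probs = apply_logit_bias({"\u2014": 2.0, ",": 1.0, "(": 0.0}, {"\u2014": -100.0})
```

The open question raised above still stands: the interesting part is not this arithmetic but how an instruction in natural language comes to exert that kind of pressure on the distribution at all.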
 
Upvote
-5 (0 / -5)

FranzJoseph

Ars Centurion
2,145
Subscriptor
Ah yes, the nested comma substatement, that, as some would put it, whilst pointedly raising an eyebrow at the crowded parlour, where china figurines and lace doilies vied for space with exotic orchids and dusty books, giving the candle-lit room a feel of barely contained yet gloomy chaos, has all the style and charm of a well intended but ultimately tasteless run-on sentence in some naïve Victorian romance novel.

(ETA: missed a chance for a "having &c. &c.", need more practice.)
Apparently you still need to work on your non‑breaking hyphens as well (unless my browser or the XenForo Civis forum software just ignores them), but I quite enjoyed your post otherwise! Ta for it, &c. &c. ;‑)

(using Text Replacements in Mac OS to replace all normal minus signs or "hyphens" with the non‑breaking ones in all the :‑D and similar ASCII emojis is the first thing I do on a fresh install)
 
Last edited:
Upvote
-1 (0 / -1)

nosh

Wise, Aged Ars Veteran
129
If it was intelligent, why can't you tell it how you want it to handle various punctuation in plain language? e.g. "In all output, please apply the standards included in The Cambridge Guide to English Usage."
It's the other way around: It's so hard exactly because it is (some level of) intelligent.

An algorithm will do exactly what you told it (though that is rarely what you actually wanted). One of the defining characteristics of intelligence is that you do not follow instructions blindly but instead work out what all the instructions and context together mean the correct answer should be.

Of course current artificial intelligence is not yet very good, making it even harder. But even if it was already perfect, you could not just add instructions and expect them to be followed.

"Writing 'you' instead of 'u' makes everyone believe my mother wrote the text instead of me, so don't do that. Now write me an application for that job posting. It really must look professional so that I can get the job."
 
Upvote
-6 (1 / -7)

AdrianS

Ars Tribunus Militum
3,739
Subscriptor
Serious question from someone who follows the news but is no expert: why do we expect AGI to evolve from LLMs, of all AI tools? As impressive as they are, they look like a dead end to me in that regard; what am I missing?

We keep getting told that, because the AI spruikers need the investors to keep throwing buckets of cash at them, and the promise of AGI is the carrot.
 
Upvote
7 (7 / 0)

Sideros

Wise, Aged Ars Veteran
150
Incredible that the engineers building AI do not know how it works nor can they predictably alter its behavior. It is so opaque that it requires research teams to study it so it can be understood.
It really is amazing technology.
That something with a trillion parameters can produce anything coherent, let alone on topic, is astounding even if it doesn't do what the snake oil salespeople say it does.
 
Upvote
7 (7 / 0)

graylshaped

Ars Legatus Legionis
67,694
Subscriptor++
It's the other way around: It's so hard exactly because it is (some level of) intelligent.

An algorithm will do exactly what you told it (though that is rarely what you actually wanted). One of the defining characteristics of intelligence is that you do not follow instructions blindly but instead work out what all the instructions and context together mean the correct answer should be.

Of course current artificial intelligence is not yet very good, making it even harder. But even if it was already perfect, you could not just add instructions and expect them to be followed.

"Writing 'you' instead of 'u' makes everyone believe my mother wrote the text instead of me, so don't do that. Now write me an application for that job posting. It really must look professional so that I can get the job."
That is among the silliest defenses for the poor reliability of these models I have heard to date. "They can't follow simple directions like 'follow this style guide' because they are too smart"?
 
Upvote
7 (9 / -2)

AliceErischech

Smack-Fu Master, in training
36
It's the other way around: It's so hard exactly because it is (some level of) intelligent.

An algorithm will do exactly what you told it (though that is rarely what you actually wanted). One of the defining characteristics of intelligence is that you do not follow instructions blindly but instead work out what all the instructions and context together mean the correct answer should be.

Of course current artificial intelligence is not yet very good, making it even harder. But even if it was already perfect, you could not just add instructions and expect them to be followed.

"Writing 'you' instead of 'u' makes everyone believe my mother wrote the text instead of me, so don't do that. Now write me an application for that job posting. It really must look professional so that I can get the job."
It's not intelligent though. That's the problem. It's a fancy mathematical prediction model that's effectively autocomplete on steroids. And no, "reasoning" models don't count, since their reasoning is basically just a paraphrase of the user's prompt. That can improve coherence in many contexts simply because it amounts to a second copy of the prompt, increasing the chances of the model focusing on details in the user's prompt rather than on other details in its context history.
 
Upvote
0 (2 / -2)

Komarov

Ars Tribunus Militum
2,259
Apparently you still need to work on your non‑breaking hyphens as well (unless my browser or the XenForo Civis forum software just ignores them), but I quite enjoyed your post otherwise! Ta for it, &c. &c. ;‑)

(using Text Replacements in Mac OS to replace all normal minus signs or "hyphens" with the non‑breaking ones in all the :‑D and similar ASCII emojis is the first thing I do on a fresh install)

The hyphens used in compound words are very much not non-breaking. In fact, one of the rules for hyphenation is to prefer breaking compound words on the hyphen, rather than on syllable boundaries.
 
Upvote
1 (1 / 0)
What was it, about 5-6 months ago that one of these guys, maybe Altman, said that the models would be smarter than a PhD within a year? That hasn’t aged well.
It depends on how you define 'smarter'. Ask a bunch of PhDs to answer a series of questions from their field of study. If the AI outperforms the average (scores in the 51st percentile or higher), then some reasonable people would claim that the AI has outperformed the typical PhD. The 'AI is smarter than a PhD' claim would, I think, be deemed by most people to have been met. That looks pretty likely to happen given another six months of model improvements.
 
Upvote
-4 (0 / -4)

Erbium68

Ars Centurion
2,590
Subscriptor
Perhaps they "cheated" by using post filtering to remove it?

The thing about the m-dash (the dash as wide as an "m") is that one could use parentheses instead or use commas, if one is so inclined, to mark asides. In fact, if I remember grade school, they taught us to use commas.
This is one of the common differences between British and American English, as we are taught not to use unnecessary commas. Except at Oxford...
The rise of computing seems to have encouraged the use of parentheses in ordinary English (and I personally prefer them), but in any case commas are not ideal because they do not clearly mark the extent to which the enclosed matter deviates from the main text. Having a hierarchy - commas or m-dashes, semicolons, colons, parentheses, full stops and paragraphs - gives more order to text and in my view makes it easier to follow.
 
Upvote
1 (2 / -1)
For the (few) people unsure: en-dashes are longer than hyphens. em-dashes are longer than en-dashes.

- << <<
That's not the result I get (KDE neon / Ubuntu Linux):

hyphen: -
n-dash: –
m-dash: —

Oh - they're different here than in a terminal! What the heck? Is it font-dependent?


[Edit: below was ninja'd by AGT499 back on page 2; well done.]

For anyone interested, it's much easier in Linux than Alt+...:

m-dash: (—) Win+---
n-dash: (–) Win+--.
hyphen: (-)-
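If fonts or terminals make the glyphs hard to tell apart, inspecting the code points removes the guesswork. The three characters are U+002D (hyphen-minus), U+2013 (en dash), and U+2014 (em dash); a quick check:

```python
# Print each dash with its repr and Unicode code point, so font rendering
# quirks can't disguise which character is actually in the text.
for name, ch in [("hyphen-minus", "-"), ("en dash", "\u2013"), ("em dash", "\u2014")]:
    print(f"{name}: {ch!r} is U+{ord(ch):04X}")
```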
 
Last edited:
Upvote
3 (3 / 0)

Erbium68

Ars Centurion
2,590
Subscriptor
Ask a bunch of PhDs to answer a series of questions from their field of study.
A PhD is supposed to be about original research, so how do you define "field of study"? Is the AI going, say, to set up a condensed matter experiment or try a novel electrolyte separator based on a hunch? Or devise an experiment or do some modelling to confirm a professor's pet idea?

Altman is just demonstrating that he doesn't really understand the very complex ideas surrounding "intelligence".
Not to mention Leo Rosten's invention of the word Phudnik derived from nudnik - an otherwise useless person with a PhD and yes, looking back, I've had at least two working for me.
 
Upvote
5 (5 / 0)

graylshaped

Ars Legatus Legionis
67,694
Subscriptor++
This is one of the common differences between British and American English as we are taught not to use unnecessary commas. Except at Oxford...
The rise of computing seems to have encouraged the use of parentheses in ordinary English (and I personally prefer them) but in any case commas are not ideal because they do not clearly mark the extent to which the enclosed matter deviates from the main text. Having an hierarchy - commas or m-dashes, semicolons, colons, parentheses, full stops and paragraphs - gives more order to text and in my view makes it easier to follow.
When properly used, theoretically. I was taught—setting aside the question of how well I demonstrate the skill—that clauses separated by commas are directly connected to the main thought, that em dashes are best used for clarifying thoughts, that parentheticals hold what can be considered true "asides" (which may be of interest but are more like those deleted scenes found as bonus content on digital releases of movies: their presence or absence is more or less marginal), and that Oxford commas are typically used only by people of character and class.

As for em dashes in practical terms, I usually don't sweat the keystrokes and am fine with letting a double dash stand or be auto-corrected, and would think all this energy would be better employed improving understanding of proper use cases for colons and semi-colons.
 
Last edited:
Upvote
0 (0 / 0)