OpenAI debuts DALL-E API so devs can integrate its AI artwork into their apps

Aurich

Director of Many Things
41,066
Ars Staff
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

The whole "you're about to be replaced by an AI" narrative definitely feels far out at this point though. We'll see how solving these issues goes, I'm of the opinion it's going to be harder than some think, but as it is you really should try it for yourself to form an opinion.

2 cents is definitely cheap, but part of generating these images in a useful fashion is doing a ton of them. Sometimes you get lucky fast, but mostly it's a game of numbers. So those costs come with a multiplier attached to them in the real world.
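To put rough numbers on that multiplier, here's a minimal sketch of a batch request with OpenAI's Python client; the ~2 cents figure is the published 1024x1024 launch price, and the prompt is just an example:

```python
import openai

openai.api_key = "sk-..."  # your API key

PRICE_PER_IMAGE = 0.02  # published launch rate for 1024x1024; smaller sizes cost a bit less

# One prompt, ten attempts -- because you rarely keep the first result
response = openai.Image.create(
    prompt="a photorealistic astronaut riding a horse on the moon",
    n=10,
    size="1024x1024",
)

urls = [item["url"] for item in response["data"]]
print(f"{len(urls)} images, ~${len(urls) * PRICE_PER_IMAGE:.2f} for this batch")
# Repeat that a dozen times while refining the prompt and the 'cheap' 2 cents adds up.
```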
 
Upvote
90 (90 / 0)

Deleted member 14629

Guest
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

The whole "you're about to be replaced by an AI" narrative definitely feels far out at this point though. We'll see how solving these issues goes, I'm of the opinion it's going to be harder than some think, but as it is you really should try it for yourself to form an opinion.

2 cents is definitely cheap, but part of generating these images in a useful fashion is doing a ton of them. Sometimes you get lucky fast, but mostly it's a game of numbers. So those costs come with a multiplier attached to them in the real world.

I've been using it and a custom-modeled version hosted by someone else.

Right now, working with Stable Diffusion really reminds me of my time as a solution architect working with overseas development resources. Yes, it's cheaper to use them, but you'd better design your spec within an inch of its life to get a result anything like what you expect or want.

I far prefer having it to not having it, especially since image creation isn't my own skillset, but it definitely has its limitations.
 
Upvote
40 (40 / 0)

NYKevin

Ars Scholae Palatinae
870
Subscriptor++
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

The whole "you're about to be replaced by an AI" narrative definitely feels far out at this point though. We'll see how solving these issues goes, I'm of the opinion it's going to be harder than some think, but as it is you really should try it for yourself to form an opinion.

IMHO it was never in the cards, and it was pure hubris on the part of (some) technologists to think otherwise. At an absolute minimum, there's always going to be a human in the loop to decide what image they want and whether the algorithm has correctly interpreted the prompt. But even that is probably several years away at least.

The real problem, to my mind, is the overemphasis on txt2img at the expense of inpainting, upscaling, etc., which are all much more straightforward to fit into existing workflows and offer much finer control over what the output looks like. txt2img is somewhat useful as a starting point, and maybe in a few more years it'll be a threat to stock photo companies, but in its present state, if used without its sister technologies, it's little more than a cheap parlor trick.
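For reference, this is roughly what the inpainting workflow looks like locally with the open-source diffusers library; the model name is the stock inpainting checkpoint and the file paths are just placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Start from an image you already have; white areas of the mask get repainted.
init_image = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("bad_hand_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a detailed, anatomically correct hand resting on a table",
    image=init_image,
    mask_image=mask,
).images[0]
result.save("portrait_fixed.png")
```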

2 cents is definitely cheap, but part of generating these images in a useful fashion is doing a ton of them. Sometimes you get lucky fast, but mostly it's a game of numbers. So those costs come with a multiplier attached to them in the real world.

Yeah, I noticed that in my own experimentation too. I feel good if ~10% of my generations look like what I want - because that means in a stack of twenty, I'll have two good ones on average. It is very, very hard to get the AI to put people (or characters) into specific poses or to block scenes in a specific way. You basically just have to roll the generation dice and pray.
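Concretely, my "stack of twenty" is just a loop over seeds with the same prompt (diffusers again; the model ID and prompt are only examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "an archer drawing a longbow, three-quarter view, dramatic rim lighting, concept art"

# Twenty rolls of the dice; on a good day two of these are usable.
for seed in range(20):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
    image.save(f"archer_{seed:02d}.png")
```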
 
Upvote
9 (16 / -7)

panton41

Ars Legatus Legionis
11,115
Subscriptor
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

The whole "you're about to be replaced by an AI" narrative definitely feels far out at this point though. We'll see how solving these issues goes, I'm of the opinion it's going to be harder than some think, but as it is you really should try it for yourself to form an opinion.

2 cents is definitely cheap, but part of generating these images in a useful fashion is doing a ton of them. Sometimes you get lucky fast, but mostly it's a game of numbers. So those costs come with a multiplier attached to them in the real world.

I fed Stable Diffusion the description of the Hell Priest (Pinhead) from the Clive Barker novel The Scarlet Gospels. The resulting images were certainly Clive Barker levels of gruesome, but looked nothing like even what the text said, much less the character in the movies. (Notably, no nails or scars on the head, despite being half the words in the description.) Same with descriptions of Lovecraftian and Stephen King monsters.

(Though, one of the Pinhead pictures was amazing and I might have to use it in one of my own stories.)
 
Upvote
11 (11 / 0)

Achilles

Ars Scholae Palatinae
939
Subscriptor
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

I'm sure I'm not the only one curious to know what that process looked like. What was the initial prompt and products? How did it help the creative process?
 
Upvote
8 (8 / 0)

Aurich

Director of Many Things
41,066
Ars Staff
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

I'm sure I'm not the only one curious to know what that process looked like. What was the initial prompt and products? How did it help the creative process?
Check the promoted comment at the end of this story for an example:

https://meincmagazine.com/gaming/2022/09/ ... ts-future/
 
Upvote
20 (20 / 0)
It's been not quite a century since Magritte told the world "This is not a pipe," and yet we go on believing images represent reality. Given that they influence attitudes and emotions without recourse to facts or argument, pictures are the perfect tool for those who want to manipulate without informing. Advertising, propaganda, politicians, and CEOs immediately come to mind, but it's a long list.

And don't even get me started on functionally illiterate America, where no one will read anything except the first sentence that follows a properly seductive picture. We're already intellectually dead and at war with objective reality, and the consequences of our profound ignorance are more imminent than we imagine.
 
Upvote
4 (6 / -2)
How funny it is that "OpenAI" is closed source and gated while Stable Diffusion is wide open. Apparently they're also no longer a non-profit.
Didn’t OpenAI bill themselves as, “we’ll do this scary AI stuff in the open so it can be used for good, we gotta beat the bad actors to it and do it right and ethical!”

I was suspicious that it just meant "fake being open until you're able to rake in cash from your service that now has open-source vibes that encourage trust."
 
Upvote
13 (15 / -2)

emdude

Seniorius Lurkius
10
I've got a local Stable Diffusion client running. It's been interesting to check out. I've actually used it twice so far to help make images for Ars stories. It's a tool like anything else in my mind.

The whole "you're about to be replaced by an AI" narrative definitely feels far out at this point though. We'll see how solving these issues goes, I'm of the opinion it's going to be harder than some think, but as it is you really should try it for yourself to form an opinion.

2 cents is definitely cheap, but part of generating these images in a useful fashion is doing a ton of them. Sometimes you get lucky fast, but mostly it's a game of numbers. So those costs come with a multiplier attached to them in the real world.
I'm not convinced the technology isn't going to continue to rapidly improve. Issues like modelling a specific style or subject are already being addressed with techniques like DreamBooth and eDiffi.
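For instance, a DreamBooth fine-tune gives you a checkpoint you can load like any other Stable Diffusion model and steer with the identifier token it was trained on; the directory and token below are just illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical output directory from a DreamBooth training run on ~20 photos of one subject
pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-my-corgi", torch_dtype=torch.float16
).to("cuda")

# "sks" is the rare identifier token commonly used in DreamBooth examples
image = pipe("a photo of sks corgi wearing sunglasses, studio lighting").images[0]
image.save("custom_corgi.png")
```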
 
Upvote
7 (7 / 0)

darkowl

Ars Tribunus Militum
2,012
Subscriptor++
Given that I've seen word going around that they originally wanted to charge several dollars for a single generated image, this seems like a massive pivot. I'd guess that Stable Diffusion's sudden popularity, and its integration with services that presumably had been considering DALL-E, have forced their hand.
 
Upvote
4 (4 / 0)

Aurich

Director of Many Things
41,066
Ars Staff
I'm not convinced the technology isn't going to continue to rapidly improve. Issues like modelling a specific style or subject are already being addressed with techniques like DreamBooth and eDiffi.
Well, as I said, I encourage people to use it for themselves, and not go by cherry picked images in news stories. It's easy to see the 'good stuff' and not realize how much time and effort it can often take to get there.

Also, if you're happy with "oh that looks neat," it's often not hard to get that. If you have something specific in mind, trying to actually achieve what you're envisioning can be very difficult.

There are things it's pretty good at. And things it freaking sucks at (hands lol). And those latter things aren't necessarily easy to solve.

I'm not convinced yet myself.
 
Upvote
17 (17 / 0)

Fred Duck

Ars Tribunus Angusticlavius
7,248
Well, as I said, I encourage people to use it for themselves, and not go by cherry picked images in news stories. It's easy to see the 'good stuff' and not realize how much time and effort it can often take to get there.

Also, if you're happy with "oh that looks neat," it's often not hard to get that. If you have something specific in mind, trying to actually achieve what you're envisioning can be very difficult.

There are things it's pretty good at. And things it freaking sucks at (hands lol). And those latter things aren't necessarily easy to solve.

I'm not convinced yet myself.
It doesn't have to be good; it just has to not be bad enough to reject.

Low quality CGI animated shows entirely displaced low quality hand-drawn animated shows by being cheaper, not better. We still have low quality CGI shows and also some high quality CGI films but hand-drawn animation is almost entirely extinct now, especially in the West.

The vast majority of people can overlook glaring flaws such as characters partially clipping through walls; so long as there's SOMETHING to look at, most people will be placated.

Example:
In fan translation circles, there are two basic camps:
1. Those who feel translation must be done by people who understand the culture well enough to recognise/understand/translate jokes and references and do everything well or perfectly.

2. Those who feel a machine translation (cleaned up or not) is better than nothing, even if it misses all nuance or ends up being more or less unintelligible.

The situation really is that once someone does an ML translation of something, it lessens the incentive for anyone to properly translate that piece, because the vast majority of people can overlook glaring flaws so long as there's SOMETHING to try to read. Thus, once something is ML-ised, it's generally passed over by fan translators.

Other example:
Audiophiles were horrified when MP3 appeared because of the compression. Most people were just pleased to have audio files resembling their favourite songs.


In the hit film Back to the Future Part II, Marty shows off his quick-draw skills on a coin-op machine, and the little rascal watching him remarks, "You mean you have to use your hands? That's like a baby's toy!"

Those words will be applicable to art sooner rather than later. :(
 
Upvote
5 (6 / -1)

austenite

Wise, Aged Ars Veteran
164
Given that I've seen word going around that they originally wanted to charge several dollars for a single generated image, this seems like a massive pivot. I'd guess that Stable Diffusion's sudden popularity, and its integration with services that presumably had been considering DALL-E, have forced their hand.

Us feeding it our input text and our output choices is simply the next stage of teaching it.
 
Upvote
2 (2 / 0)

Bill T.

Ars Centurion
322
Subscriptor
Only on the Moon would a horse be able to stand with its front legs crossed like that. And, since the bridle doesn't go over the top of her nose (can't see any plumbing back there, so I'm assuming it's a her), ain't much holding the bit in her mouth.

Yeah, there's a lot of uncanny going on in those images. In particular, the shadows on the legs look somehow off to me, although I can't quite put my finger on it. It's like the light for those shadows is above the viewer's right shoulder, where all the other shadows are generated by a light at the far left.

What really bugged me, though, was that the corgis are blurry in all the pictures but the sunglasses are very sharp on two of the four pictures. What's up with that? Why are the corgis all blurry?
 
Upvote
2 (2 / 0)
How funny it is that "OpenAI" is closed source and gated while Stable Diffusion is wide open. Apparently they're also no longer a non-profit.
Didn’t OpenAI bill themselves as, “we’ll do this scary AI stuff in the open so it can be used for good, we gotta beat the bad actors to it and do it right and ethical!”

I was suspicious that it just meant "fake being open until you're able to rake in cash from your service that now has open-source vibes that encourage trust."
You were right to be suspicious. Turns out that's exactly what it meant (along with perhaps some tax shenanigans from the non-profit status, can't say for sure).

Really glad the people who do actually care seem to not be lagging far behind. Or if it turns out they don't actually care either, at least someone else can pretend to after they've moved on. The basic science is staying out in the open enough for that to work (for now).
 
Upvote
2 (2 / 0)

nickf

Ars Tribunus Militum
2,636
Subscriptor
Only on the Moon would a horse be able to stand with its front legs crossed like that. And, since the bridle doesn't go over the top of her nose (can't see any plumbing back there, so I'm assuming it's a her), ain't much holding the bit in her mouth.

Yeah, there's a lot of uncanny going on in those images. In particular, the shadows on the legs look somehow off to me, although I can't quite put my finger on it. It's like the light for those shadows is above the viewer's right shoulder, where all the other shadows are generated by a light at the far left.

What really bugged me, though, was that the corgis are blurry in all the pictures but the sunglasses are very sharp on two of the four pictures. What's up with that? Why are the corgis all blurry?

With regards to generating images of people, Stable Diffusion often has problems with bifurcated limbs, pupils (eyes, not scholastic) and general posture. That said, it can be fine if you just want something for a D&D night, for example. (Especially if you possess no artistic talent).
 
Upvote
3 (3 / 0)

nickf

Ars Tribunus Militum
2,636
Subscriptor
Can we get a Stable Diffusion beginner's guide from Ars, please? Step by step: how to install it on a local computer and how to use it. I don't understand what's involved and I can't find anything that is easy to follow. Most of what I found is chaotic at best.

Diffusion Bee is a GUI-based Stable Diffusion installation for M1 Macs. It's an ~8 GB one-click install.

https://diffusionbee.com
 
Upvote
2 (2 / 0)

Zak

Ars Tribunus Angusticlavius
7,545
Can we get a Stable Diffusion beginner's guide from Ars, please? Step by step: how to install it on a local computer and how to use it. I don't understand what's involved and I can't find anything that is easy to follow. Most of what I found is chaotic at best.

Diffusion Bee is a GUI-based Stable Diffusion installation for M1 Macs. It's an ~8 GB one-click install.

https://diffusionbee.com

Thanks! I also found this: https://www.slaphappylarry.com/how-to-u ... -for-free/

The author lists several Mac and Windows apps as well as Photoshop and Krita plugins.
 
Upvote
3 (3 / 0)
There are things it's pretty good at. And things it freaking sucks at (hands lol).

As an avid Midjourney user, I am haunted by all of the failed hand images it has produced. Truly terrifying.

Hands and proportional legs or feet tend to be the bane of all artists; it's not a surprise that AI would screw them up more frequently even if the training set had a million images from artists who mastered hands.
 
Upvote
3 (3 / 0)
This will just fuel more weaponized meme chaos. Three years ago, I never would've imagined I'd come to hate memes as much as I do today. And it all started with misappropriated, terribly drawn French cartoon frogs.


You're either cranky about the oversaturation or about the authors themselves, not the content.

Complaining about either art or memes in and of themselves is too broad, but it's entirely fair to critique that lowering the barrier of entry for certain types of creativity severely devalues it by annoying the audience before even getting to the good stuff.

Coincidentally, that was the thrust of the arguments from a subset of artists on Twitter before Elon's purchase completed. More experienced and confident artists see it as a valuable tool, but the more mediocre artists with less distinctive styles, who place a high value on the iterative process, feel that not being able to charge reasonably for the time and effort behind those iterations, in addition to the final product, takes away a higher price point for their commissions.
 
Upvote
1 (2 / -1)
At this point I've generated thousands of images with Stable Diffusion and only a handful didn't go straight to trash. So far I'd categorize it as more a toy than a tool. Technically impressive, but underwhelming in practice.

For characters it's honestly almost useless. Usually there's more stuff wrong than there is right, and when it gets things right, it's very basic front-facing or 3/4 portraits. Try to specify anything like equipment, clothing, expression, pose, lighting, etc. and you'll just get, at best, funny results. One of the more common problems (besides incorrect anatomy) is things just fusing together, like an archer's bow being a part of her hair.

Landscape images, especially natural landscapes or ruins, can turn out great. A prompt like "post-apocalyptic ruins on Mars" generates results that will have you thinking there must've been life on Mars at some point. Consistency is an issue though. Say I want to create some backdrops for a visual novel - getting multiple shots that could be used in a sequence and look like they belong in the same environment is a tall order. Even if you just generate hundreds of images with the exact same prompt.

And my cityscapes always have weird, distorted lines. Like the algorithm just could not draw a straight line. I don't know what's up with that.
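One partial workaround is to pick a single render you like and run img2img variations off it at low strength, so the composition carries over between shots. Roughly, with the diffusers library (file names and prompts here are just placeholders):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = Image.open("mars_ruins_base.png").convert("RGB")  # the one backdrop that turned out well

# Low strength keeps the layout; the prompt nudges time of day / mood for the "sequence"
for name, prompt in [
    ("dawn", "post-apocalyptic ruins on Mars at dawn, cinematic"),
    ("dusk", "post-apocalyptic ruins on Mars at dusk, long shadows, cinematic"),
]:
    variant = pipe(prompt=prompt, image=base, strength=0.4).images[0]
    variant.save(f"mars_ruins_{name}.png")
```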
 
Upvote
6 (6 / 0)

foobarian

Ars Scholae Palatinae
1,160
Subscriptor
A few observations regarding the astronaut on horse photo: astronaut's hand looks really weird (as Aurich mentioned that is typical for AI). Horse's legs and tail are really weird. Tail looks more like a stocky tentacle. The horse's head also has an interesting structure on its face above the eyes like some sort of unicorn ancestor or a dinosaur with cranial bone plates.

Overall, though -- the program certainly produced a photo which matches the prompt. Interesting tech.

edit: also what is that creepy porthole looking thing on the horse's side? At first I thought it was part of a saddle but now it kinda looks like the horse has a hole in it. Creepy :)
 
Upvote
5 (5 / 0)

Zak

Ars Tribunus Angusticlavius
7,545
So I'm trying this out:
https://nmkd.itch.io/t2i-gui

Instead of the Mac app here:
https://diffusionbee.com/

The PC app is faster on my PC with an RTX 3080 than the Mac app on my M1 Mac Mini, allows me to create 1024x1024 images versus 768x768 on the Mac, and includes an upscaling option. But the results so far are... creepy on both and not very pleasing. People and animals are weak; landscapes are much better indeed. It's fun, though it maxes out the video card. :)
 
Upvote
2 (2 / 0)

Kjella

Ars Tribunus Militum
2,081
A few observations regarding the astronaut on horse photo: astronaut's hand looks really weird (as Aurich mentioned that is typical for AI). Horse's legs and tail are really weird. Tail looks more like a stocky tentacle. The horse's head also has an interesting structure on its face above the eyes like some sort of unicorn ancestor or a dinosaur with cranial bone plates.

Overall, though -- the program certainly produced a photo which matches the prompt. Interesting tech.

edit: also what is that creepy porthole looking thing on the horse's side? At first I thought it was part of a saddle but now it kinda looks like the horse has a hole in it. Creepy :)
I feel like the main contribution here is trying to sketch something up to get the real requirements. "No, it should be a black stallion. And it should be more front-facing. And the horse should be galloping. And the astronaut should be leaning forward like he's a jockey in a race. Dark horse in the space race, get it?"

Because I can't tell you how much time I've spent with users that is like "you asked for an astronaut on a horse, I delivered an astronaut on a horse" and they start going "yes, but not like that what I meant was..." That's with computer code though, I can't do art worth shit but I imagine creators deal with the same thing.

And yes, you can manage this with contracts and change orders, but it's still better to manage expectations up front as much as possible. Or they can click through thousands of generated images with a hundred variations of the same prompt hoping to win the lottery.
 
Upvote
5 (5 / 0)

foobarian

Ars Scholae Palatinae
1,160
Subscriptor
A few observations regarding the astronaut on horse photo: astronaut's hand looks really weird (as Aurich mentioned that is typical for AI). Horse's legs and tail are really weird. Tail looks more like a stocky tentacle. The horse's head also has an interesting structure on its face above the eyes like some sort of unicorn ancestor or a dinosaur with cranial bone plates.

Overall, though -- the program certainly produced a photo which matches the prompt. Interesting tech.

edit: also what is that creepy porthole looking thing on the horse's side? At first I thought it was part of a saddle but now it kinda looks like the horse has a hole in it. Creepy :)
I feel like the main contribution here is trying to sketch something up to get the real requirements. "No, it should be a black stallion. And it should be more front-facing. And the horse should be galloping. And the astronaut should be leaning forward like he's a jockey in a race. Dark horse in the space race, get it?"

Because I can't tell you how much time I've spent with users that is like "you asked for an astronaut on a horse, I delivered an astronaut on a horse" and they start going "yes, but not like that what I meant was..." That's with computer code though, I can't do art worth shit but I imagine creators deal with the same thing.

And yes, you can manage this with contracts and change orders, but it's still better to manage expectations up front as much as possible. Or they can click through thousands of generated images with a hundred variations of the same prompt hoping to win the lottery.
That sounds like a really useful bit of design software. Time to write my investor pitch...
 
Upvote
-1 (0 / -1)

Skizzman

Smack-Fu Master, in training
84
There are things it's pretty good at. And things it freaking sucks at (hands lol).

As an avid Midjourney user, I am haunted by all of the failed hand images it has produced. Truly terrifying.

Hands and proportional legs or feet tend to be the bane of all artists; it's not a surprise that AI would screw them up more frequently even if the training set had a million images from artists who mastered hands.

If only the issue were proportion... Instead you get things that look like they were caught in heavy machinery, or are bizarrely long and branching, or sometimes both: https://mj-gallery.com/3111ef28-2b15-47 ... grid_0.png
 
Upvote
4 (4 / 0)