Getting an all-optical AI to handle non-linear math

Omega_Prime · Jan 12, 2025

That’s just the time needed for a camera to transform the photons hitting its aperture into electrical chargers using either CMOS or CCD sensors.

I believe you meant charges rather than chargers. Emphasis mine.

-peter- · Jan 12, 2025

Very interesting! Really looking forward to more and more optical computing!

sxotty · Jan 12, 2025

This does seem to offer something that could actually be useful

aleph_nought · Jan 12, 2025

A photonic image/video/vision recognition system would be a huge step forward for autonomous moving systems. Your robot car could see and categorize a potentially dangerous object up ahead in nanoseconds. Having the evasive maneuver take a few more milliseconds is fine.

I could see this going into assistive tech for the visually impaired. The system could take camera input, draw outlines around potentially dangerous objects, and beam the augmented image on to displays on glasses or even right into the brain through electrodes.

paw · Jan 12, 2025

aleph_nought said:
A photonic image/video/vision recognition system would be a huge step forward for autonomous moving systems. Your robot car could see and categorize a potentially dangerous object up ahead in nanoseconds. Having the evasive maneuver take a few more milliseconds is fine.
...

Faster is usually better, but at what point is it fast enough? Classifying in milliseconds rather than picoseconds is still way faster than human reflexes.

I'm more interested in the energy efficiency of this approach. Very early days, but would photonics be 10x more/less energy efficient when scaled up to 100,000 parameters than conventional chips?

→→→ · Jan 12, 2025

When comparing camera latency, it is worth mentioning event cameras, which allow for faster control loops, as they have latency on the order of ten microseconds. But you still need to process the output which takes milliseconds on a conventional computer.

The half a nanosecond mentioned in this article is still a much smaller timeframe, so it seems interesting.

DarthSlack · Jan 12, 2025

paw said:
Faster is usually better, but at what point is it fast enough? Classifying in milliseconds rather than picoseconds is still way faster than human reflexes.

I think the comparison to human reflexes isn't the right one to be making. If information can be processed in picoseconds, that gives an autonomous system more time to evaluate different options for avoiding a problem. Or evaluating multiple sensor inputs to determine if there's a problem in the first place.

Given the prevalence of problems like phantom braking or plowing into parked cop cars, there may be no "fast enough".

azazel1024 · Jan 12, 2025

DarthSlack said:
I think the comparison to human reflexes isn't the right one to be making. If information can be processed in picoseconds, that gives an autonomous system more time to evaluate different options for avoiding a problem. Or evaluating multiple sensor inputs to determine if there's a problem in the first place.

Given the prevalence of problems like phantom braking or plowing into parked cop cars, there may be no "fast enough".

Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 50mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. That, autonomous vehicle systems still lag pretty far behind humans.

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.
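The per-foot timing in the comment above is easy to check. A minimal sketch of the arithmetic, using only unit conversions (the function names are illustrative): 11.36 ms per foot corresponds to 60 mph, while at 50 mph it is about 13.6 ms per foot. Either way, a 20 ms delay is a bit more than a foot of travel.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def feet_per_second(speed_mph: float) -> float:
    """Convert miles per hour to feet per second."""
    return speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR

def ms_per_foot(speed_mph: float) -> float:
    """Milliseconds needed to travel one foot at the given speed."""
    return 1000.0 / feet_per_second(speed_mph)

def feet_travelled(speed_mph: float, delay_ms: float) -> float:
    """Distance covered during a processing delay."""
    return feet_per_second(speed_mph) * delay_ms / 1000.0

ms_at_50 = ms_per_foot(50)           # about 13.64 ms per foot
ms_at_60 = ms_per_foot(60)           # about 11.36 ms per foot
feet_in_20ms = feet_travelled(50, 20)  # about 1.47 ft
```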

wrylachlan · Jan 12, 2025

This sounds interesting but I’m not at all convinced that low latency is more important than bandwidth for vision recognition ML. Obviously both would be ideal but in a world where you can have one or the other I think bandwidth is king.

Wickwick · Jan 12, 2025

The imaging example is a bad one. Most of the 20 ms is spent waiting for enough photons to arrive to make use of them. Trying to do that faster will yield lower detail like looking at a ray tracing scene lit with only a few rays.

However, there are plenty of control and/or optimization scenarios where having a shorter inference time would be very useful.
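The photon-starvation point can be made concrete with shot-noise statistics. This is a minimal sketch, assuming an ideal Poisson-limited pixel (no read noise or dark current); the photon counts are made-up illustrative numbers, not figures from the article.

```python
import math

def shot_noise_snr(mean_photons: float) -> float:
    """SNR of a Poisson-limited pixel: signal N divided by noise sqrt(N)."""
    return math.sqrt(mean_photons)

# Hypothetical pixel that collects 10,000 photons in a 20 ms exposure.
photons_20ms = 10_000
# Shortening the exposure 1000x (to 20 us) cuts the photon count 1000x.
photons_20us = photons_20ms / 1000

snr_long = shot_noise_snr(photons_20ms)   # 100
snr_short = shot_noise_snr(photons_20us)  # about 3.2
```

Under these assumptions a 1000x shorter exposure costs roughly a 32x drop in signal-to-noise, which is the "scene lit with only a few rays" effect.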

Wickwick · Jan 12, 2025

I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with ~~KDP~~ KTP crystals wasn't considered or failed its implementation.

Edit: It's been a few years.

Mungus the Unhyphenated · Jan 12, 2025

azazel1024 said:
Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 50mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. That, autonomous vehicle systems still lag pretty far behind humans.

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.

I think the objective is to build a system which responds similarly to the human "spinal reflex". The classic human example is if you touch something painful, your arm will jerk back before the full neural signal ever reaches the brain to trigger a decision/action. Instead, the reflex is triggered in the spinal column for an immediate self-preservation action, so that by the time the brain receives the signal, your hand is already "safe" and your brain can direct further conscious action.

Humans have visual-triggered reflexes as well. We see an object heading toward our face rapidly, we see that we're about to walk off a ledge instead of down a set of stairs, something large suddenly blocks our path -- we immediately and un-thinkingly avoid by ducking, reversing course, or stopping. Once trained to drive a car, the "avoid" reflex kicks in if we see a vehicle turn into our path, or brake lights suddenly appear immediately in front of us. We'll stomp the brake and/or haul on the steering wheel without thinking for an instant, and then the thinking kicks in and we'll make more decisive maneuvers. But the instantaneous reaction to what we see is a reflexive "avoid" with no other thought-directed constraints. A particularly well-trained driver will note escape routes constantly, so that reflex is continually directed toward the safest action in an if-then sort of loop. But our body's initial reaction will always be an ambiguous "avoid" if incoming visual input is too much of a fundamental "danger!" trigger, and we'll pick up with conscious action after our reflexes attempt to mitigate the immediate threat.

So in an autonomous vehicle system with an optical danger-avoidance system, the objective would be to replicate the human reflex action. Brake lights appear too close -> trigger hard braking + send "emergency braking mode triggered" on the control bus -> main control unit responds to emergency brake action in progress and then begins to modulate controlled braking and steering requirements.

A more sophisticated scenario would be: Aspect of vehicle ahead changes suddenly (from rear-on view to sudden brake lights and then rapidly increasing side-on view -- car is braking/skidding/avoiding) -> trigger hard braking AND steer away from car ahead in direction with most distant or no sensed obstacles + send "emergency braking AND avoidance+direction" on the control bus -> main control unit responds to braking and avoidance direction in progress and then begins to modulate controlled braking AND refines steering input and direction based on perimeter sensor inputs.

In short, we would use extremely fast optical input and processing to trigger simple, near-instantaneous "reflex" action, which the vehicle's main processor will pick up and further refine. Theoretically, if the optical "reflex" followed by CPU-controlled full response is faster than human reflex and follow-on conscious action/reaction, then an autonomous vehicle could possibly surpass a human in sudden accident avoidance. It won't solve a "no-win" or "trolley problem" style incident where any deviation from speed and direction will result in a collision of some sort. Humans are very bad at deciding what to crash into to minimize damage and injury to themselves or others. The best we could do in such a scenario is to have a visual reflex+CPU-controlled response path that includes continuous risk analysis in the processing routine so that it's more likely to choose the least-bad path faster than a human. Humans are notoriously emotional and erratic in a panic situation. A computer is more likely to be calculatingly statistical.
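The two-stage "reflex, then refine" flow described above could be sketched roughly as follows. Everything here is hypothetical: the `Detection` fields, the `Reflex` actions, and both functions are invented for illustration and do not describe any real vehicle control stack or the chip in the article.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Reflex(Enum):
    NONE = auto()
    HARD_BRAKE = auto()
    BRAKE_AND_STEER = auto()

@dataclass
class Detection:
    """Hypothetical output of the fast optical classifier."""
    brake_lights_close: bool  # brake lights appear too close
    aspect_changing: bool     # car ahead swinging from rear-on to side-on view
    clearer_side: str         # "left" or "right": side with the most distant obstacles

def optical_reflex(d: Detection) -> tuple[Reflex, str]:
    """Fast path: choose a coarse action plus the message for the control bus."""
    if d.brake_lights_close and d.aspect_changing:
        return Reflex.BRAKE_AND_STEER, f"emergency braking AND avoidance+{d.clearer_side}"
    if d.brake_lights_close:
        return Reflex.HARD_BRAKE, "emergency braking mode triggered"
    return Reflex.NONE, ""

def main_controller(reflex: Reflex, message: str) -> str:
    """Slow path: pick up the reflex already in progress and refine it."""
    if reflex is Reflex.NONE:
        return "normal driving"
    return f"refining after: {message}"

simple = optical_reflex(Detection(True, False, "left"))
swerve = optical_reflex(Detection(True, True, "right"))
status = main_controller(*swerve)
```

The point of the split is that the fast path only ever picks from a tiny set of coarse actions, while anything that needs judgment stays with the main controller.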

Tanj · Jan 12, 2025

A 4GHz clock ticks every 250ps.

An inverter uses about 50 attoJ to switch. An 8 bit multiply and add needs around 20 femtoJ.

Optical computing is interesting, but the bar to beat is very high.

laserboy · Jan 12, 2025

Wickwick said:
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.

Well, most 2nd order nonlinear optical crystals (like KTP) are incompatible with standard lithographic technologies. Assuming (I’ve not read the paper yet) it’s InP based, then you only have third order processes available. High confinement can make them efficient, but they are really hard to control.

Just read that it is based on Si-on-insulator technology, so third order is all you have (and a fairly small chi3 compared to InP, from memory).

Photon_plumber · Jan 12, 2025

Wickwick said:
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.

Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.

Veritas super omens · Jan 12, 2025

I, for one, salute our new quantum photonic blockchain AI overlords...

lasertekk · Jan 12, 2025

Argh! Matrices. I did enough Fourier transforms to sink the Death Star. We did those strictly by hand. No computer aids.

Bill T. · Jan 12, 2025

Wickwick said:
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.

Doubling the frequency doesn't strike me as useful for an activation function -- frankly, despite the link text, doubling sounds like a linear operation. In software, the simplest activation function is to convert all the negative values to zeros -- you need something that causes a bend in the curve, not that simply replaces the curve with another curve of a different slope.

Is there an optical equivalent of a diode?
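The "bend in the curve" requirement can be shown in a few lines. A minimal sketch in plain Python (the helper names are illustrative): stacking linear stages only ever gives another linear stage, while a ReLU between them does not.

```python
def relu(values):
    """Simplest software activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def scale(values, k):
    """A purely linear stage: multiply every value by k."""
    return [k * v for v in values]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Two linear stages collapse into one: 2x followed by 3x is just 6x.
stacked_linear = scale(scale(xs, 2.0), 3.0)
single_linear = scale(xs, 6.0)

# A ReLU between the same two stages bends the curve at zero.
with_relu = scale(relu(scale(xs, 2.0)), 3.0)
```

Here `stacked_linear` equals `single_linear`, so the extra layer adds nothing, whereas `with_relu` is flat for negative inputs and cannot be written as a single multiplication.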

laserboy · Jan 12, 2025

Photon_plumber said:
Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.

Frequency doubling is also a multiplication operation

AI is cool i guess · Jan 12, 2025

when they said "mind-bogglingly faster" i felt that

410 picoseconds... shoutout to the MIT team for reminding us what exponential progress actually looks like. can't wait to see what happens when we scale this

(ps - anyone else getting early transistor vibes from this? just me? ok)

Wickwick · Jan 12, 2025

laserboy said:
Well, most 2nd order nonlinear optical crystals (like KTP) are incompatible with standard lithographic technologies. Assuming (I’ve not read the paper yet) it’s InP based, then you only have third order processes available. High confinement can make them efficient, but they are really hard to control.

Just read that it is based on Si-on-insulator technology, so third order is all you have (and a fairly small chi3 compared to InP, from memory).

Ah, excellent point on the lithography connection. That would be enough explanation.

Shavano · Jan 12, 2025

seeitrightayo said:
This is why AI is amazing

It's not AI. It's analog optical processing. The advance is the ability to do a lot of layers of optical processing in an integrated sensor array.

SixDegrees · Jan 12, 2025

Back in the day, Synthetic Aperture Radar images were produced using a special Fourier Transform lens that would optically compute the Fourier transform of the frequency image into a spatial image pretty much instantaneously. Those lenses were ghastly expensive and vanished into obscurity with the advent of faster, cheaper computers and the Fast Fourier Transform. So in some sense maybe we're coming full circle here.

Wickwick · Jan 12, 2025

Photon_plumber said:
Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.

Bill T. said:
Doubling the frequency doesn't strike me as useful for an activation function -- frankly, despite the link text, doubling sounds like a linear operation. In software, the simplest activation function is to convert all the negative values to zeros -- you need something that causes a bend in the curve, not that simply replaces the curve with another curve of a different slope.

Is there an optical equivalent of a diode?

To answer both of you, the fraction of the fundamental line that is converted to 532 is a nonlinear function of the intensity. The nonlinear function between layers in a neural net need not be a threshold function. It can be just about any nonlinear function. So one would do the linear algebra with the 1064 line then use the doubling to 532 as the nonlinear functions between layers. Given that you want to keep working with the 1064 lines, however, you'd probably not want to use the green light for your next step. You'd want to stick with the IR. So you might use the conversion to green as a nonlinear loss function.
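As a rough illustration of conversion to green acting as a nonlinear loss, here is a sketch using the textbook plane-wave result for perfectly phase-matched second-harmonic generation, where the converted fraction is tanh² of a term proportional to the square root of the input intensity. The `coupling` constant is a hypothetical lumped parameter (crystal length, nonlinear coefficient, units) and all values are in arbitrary units; real crystals with focused beams and imperfect phase matching will behave differently.

```python
import math

def shg_conversion_fraction(intensity: float, coupling: float = 1.0) -> float:
    """Fraction of the fundamental converted to the second harmonic.

    Idealized plane-wave, phase-matched case: tanh^2(coupling * sqrt(I)).
    """
    return math.tanh(coupling * math.sqrt(intensity)) ** 2

def residual_fundamental(intensity: float, coupling: float = 1.0) -> float:
    """Infrared light left over after the crystal; this is the 'activation' output."""
    return intensity * (1.0 - shg_conversion_fraction(intensity, coupling))

low = residual_fundamental(1.0)   # about 0.42
high = residual_fundamental(2.0)  # also about 0.42: doubling the input barely changes it
```

In this idealized model the leftover infrared is clearly not proportional to the input, which is all a layer-to-layer nonlinearity needs.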

Wickwick · Jan 12, 2025

Shavano said:
It's not AI. It's analog optical processing. The advance is the ability to do a lot of layers of optical processing in an integrated sensor array.

With the important distinction that there's a non-linear bit of processing included. Linear operations with light are quite simple: every interferometer is doing that.

Wickwick · Jan 12, 2025

Bill T. said:
Is there an optical equivalent of a diode?

There is at least one I know of, but it's like a backwards Zener diode. If you focus to a spot and ionize the air, the rest of the pulse can't get through the plasma ball. So it's a diode that cuts out at a given intensity.

DarthSlack · Jan 12, 2025

azazel1024 said:
Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 50mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. That, autonomous vehicle systems still lag pretty far behind humans.

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.

We're in violent agreement here. I was hoping that by the time I retired and/or lost the ability to drive, that autonomous driving would be a thing. What's been clear from the last few years is that the sensors and processing capability we have just aren't up to the job. We need leaps in technology (like this could be) in order to allow someone to take a nap in the back seat. Until then, we're pretty much stuck with the driver assist tech we've got now. That's not a bad place to be, I've found it to be enormously helpful, but as you note, it doesn't take the people out of the equation.

Pentarctagon · Jan 12, 2025

I know nothing about their fabrication, but are these kinds of photonic processing units durable enough to be put in stuff like phones or cars? Working with light just seems like an inherently more fragile approach.

Wickwick · Jan 12, 2025

Pentarctagon said:
I know nothing about their fabrication, but are these kinds of photonic processing units durable enough to be put in stuff like phones or cars? Working with light just seems like an inherently more fragile approach.

The processing is done in silica so it's just as robust as any other electronics in your car. Creating the photons needs some fiber optics and gratings, but that can be relatively robust.

laserboy · Jan 12, 2025

Wickwick said:
There is at least one I know of, but it's like a backwards Zener diode. If you focus to a spot and ionize the air, the rest of the pulse can't get through the plasma ball. So it's a diode that cuts out at a given intensity.

A Faraday isolator is also an optical diode, but that is even harder to integrate in a PIC than frequency doubling.

perral1 · Jan 12, 2025

The team that implemented a complete deep neural network on a photonic chip, achieving a latency of 410 picoseconds. To put that in perspective, Bandyopadhyay’s chip could process the entire neural net it had onboard around 58 times within a single tick of the 4 GHz clock on a standard CPU.

This statement is not true, and I’m curious what it’s supposed to say. As someone also noted above, a 4 GHz clock ticks every 250 ps. Not to imply a CPU is processing “a complete DNN” in that time, but a better comparison is needed.
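The objection is straightforward to verify. A minimal sketch of the arithmetic, using only the two figures from the thread (a 4 GHz clock and a 410 ps latency):

```python
CLOCK_HZ = 4e9        # 4 GHz CPU clock
LATENCY_PS = 410.0    # chip latency quoted in the article

clock_period_ps = 1e12 / CLOCK_HZ                    # 250 ps per tick
ticks_per_inference = LATENCY_PS / clock_period_ps   # about 1.64 ticks
```

So one pass through the photonic network takes a bit more than one and a half clock ticks, rather than fitting many times inside a single tick.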

laserboy · Jan 12, 2025

Shavano said:
It's not AI. It's analog optical processing. The advance is the ability to do a lot of layers of optical processing in an integrated sensor array.

That’s not exactly fair. Current AI consists of repeated matrix operations followed by repeated nonlinear operations. This chip demonstrates exactly that, but at a much smaller scale than modern AI

laserboy · Jan 12, 2025

Tanj said:
A 4GHz clock ticks every 250ps.

An inverter uses about 50 attoJ to switch. An 8 bit multiply and add needs around 20 femtoJ.

Optical computing is interesting, but the bar to beat is very high.

While the numbers might be true, the comparison isn’t meaningful. Your inverter uses x joules per bit. The equivalent optical inverter uses y joules per operation, but the number of bits encoded in the operation depends on how accurately we decide to measure the light amplitude. So a 1 bit inverter uses exactly the same power as a 100 bit inverter.
The better comparison is at the operational level (in this case a 5 layer device, including input and output layers, each with 6 inputs, and encoded at about 132 levels).
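To put rough numbers on an operation-level comparison, here is a back-of-the-envelope sketch. The 132 levels, the 5 layers of 6, and the ~20 fJ per 8-bit multiply-add all come from the comments above; treating adjacent layers as fully connected 6x6 is purely an assumption for illustration and may not match the actual chip.

```python
import math

LEVELS = 132            # amplitude levels quoted above
LAYERS = 5              # including input and output
WIDTH = 6               # values per layer
MAC_ENERGY_FJ = 20.0    # ~8-bit multiply-add figure quoted earlier in the thread

# Distinguishable levels expressed as an equivalent bit depth.
bits_per_value = math.log2(LEVELS)          # about 7.04 bits

# ASSUMPTION: each pair of adjacent layers is a dense 6x6 matrix multiply.
macs = (LAYERS - 1) * WIDTH * WIDTH         # 144 multiply-adds
electronic_energy_fj = macs * MAC_ENERGY_FJ # 2880 fJ, i.e. about 2.9 pJ
```

Under that assumption, the electronic arithmetic alone for one pass would cost a few picojoules, which is the sort of figure an optical implementation would have to be measured against.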

djspiewak · Jan 12, 2025

FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.

At 10 Hz, you have 100 ms to process each “frame” (LiDAR are weird) before you start to perceive the next one. The signal processing latency is peanuts by comparison.

Also let’s remember that most humans have reaction times in the 100-200ms range anyway, and our road safety margins are designed with this in mind. I find it very unlikely that cutting out the ISP and operating directly on the optical feed will result in latency improvements that anyone is willing to pay for.

It is cool though!
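The size of the gap between sensor rate and inference latency can be sketched in a couple of lines, using the 10 Hz rate from this comment and the 410 ps latency from the article (the function name is illustrative):

```python
def frame_period_ms(rate_hz: float) -> float:
    """Time available per frame at a given sample rate."""
    return 1000.0 / rate_hz

INFERENCE_LATENCY_S = 410e-12  # 410 ps, from the article

period_10hz_ms = frame_period_ms(10)   # 100 ms
period_30hz_ms = frame_period_ms(30)   # about 33.3 ms

# Fraction of a 10 Hz frame period taken up by one optical inference.
share_of_frame = INFERENCE_LATENCY_S / (period_10hz_ms / 1000.0)  # about 4e-9
```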

Nalyd · Jan 12, 2025

djspiewak said:
FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.

At 10 Hz, you have 100 ms to process each “frame” (LiDAR are weird) before you start to perceive the next one. The signal processing latency is peanuts by comparison.

Also let’s remember that most humans have reaction times in the 100-200ms range anyway, and our road safety margins are designed with this in mind. I find it very unlikely that cutting out the ISP and operating directly on the optical feed will result in latency improvements that anyone is willing to pay for.

It is cool though!

Right, but here you bypass the (imposed) camera refresh and the ECU processing rate restrictions. At least in certain scenarios.

Wickwick said:
The imaging example is a bad one. Most of the 20 ms is spent waiting for enough photons to arrive to make use of them. Trying to do that faster will yield lower detail like looking at a ray tracing scene lit with only a few rays.

However, there are plenty of control and/or optimization scenarios where having a shorter inference time would be very useful.

For active pulse imaging, e.g. LiDAR, sure, but for natural light the world is awash in photons arriving continuously.

Fast reprocessing allows lots and lots of reinforcement of interpretations, perhaps even including a fuzzing/dropout sampling approach to get a better handle on uncertainty within a tiny timeframe.

Photon_plumber · Jan 12, 2025

laserboy said:
A Faraday isolator is also an optical diode, but that is even harder to integrate in a PIC than frequency doubling.

It’s been a number of years, the brain fades. I remember the use of an optical diode to prevent spatial hole burning in a laser we were developing. It was either a dye or a very early Ti:Sapphire. That was perhaps 40 years ago, or more. To put it into perspective, one of my team associates used to pass Heisenberg in the hallway at his previous position. I’m currently quite disconnected from current products and technologies. To be young again.

Winston11 · Jan 12, 2025

djspiewak said:
FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.

Tesla claims that its FSD chip processes camera images at 2,300 frames/sec, on cars equipped with the HW3 hardware, first introduced in 2019 (!). It is not clear how many cameras are involved.

HW4 was introduced in 2024 and is on the latest vehicles, including the Model 3 Highland. Its (unknown) processing speed is reportedly several times faster than HW3. [Wikipedia: Tesla Autopilot hardware]
