Getting an all-optical AI to handle non-linear math

A photonic image/video/vision recognition system would be a huge step forward for autonomous moving systems. Your robot car could see and categorize a potentially dangerous object up ahead in nanoseconds. Having the evasive maneuver take a few more milliseconds is fine.

I could see this going into assistive tech for the visually impaired. The system could take camera input, draw outlines around potentially dangerous objects, and beam the augmented image onto displays on glasses or even right into the brain through electrodes.
 
Upvote
47 (48 / -1)

paw

Ars Tribunus Militum
2,032
Subscriptor
A photonic image/video/vision recognition system would be a huge step forward for autonomous moving systems. Your robot car could see and categorize a potentially dangerous object up ahead in nanoseconds. Having the evasive maneuver take a few more milliseconds is fine.
...
Faster is usually better, but at what point is it fast enough? Classifying in milliseconds rather than picoseconds is still way faster than human reflexes.

I'm more interested in the energy efficiency of this approach. Very early days, but would photonics be 10x more or less energy efficient than conventional chips when scaled up to 100,000 parameters?
 
Upvote
36 (43 / -7)

→→→

Seniorius Lurkius
33
Subscriptor
When comparing camera latency, it is worth mentioning event cameras, which allow for faster control loops, as they have latency on the order of ten microseconds. But you still need to process the output which takes milliseconds on a conventional computer.

The half a nanosecond mentioned in this article is still a much smaller timeframe, so it seems interesting.
 
Upvote
58 (58 / 0)

DarthSlack

Ars Legatus Legionis
23,406
Subscriptor++
Faster is usually better, but at what point is it fast enough? Classifying in milliseconds rather than picoseconds is still way faster than human reflexes.

I think the comparison to human reflexes isn't the right one to be making. If information can be processed in picoseconds, that gives an autonomous system more time to evaluate different options for avoiding a problem. Or evaluating multiple sensor inputs to determine if there's a problem in the first place.

Given the prevalence of problems like phantom braking or plowing into parked cop cars, there may be no "fast enough".
 
Upvote
86 (89 / -3)

azazel1024

Ars Legatus Legionis
15,152
Subscriptor
I think the comparison to human reflexes isn't the right one to be making. If information can be processed in picoseconds, that gives an autonomous system more time to evaluate different options for avoiding a problem. Or evaluating multiple sensor inputs to determine if there's a problem in the first place.

Given the prevalence of problems like phantom braking or plowing into parked cop cars, there may be no "fast enough".
Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 60mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. There, autonomous vehicle systems still lag pretty far behind humans.
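For anyone who wants to check the per-foot figure, here is a minimal sketch (the function names are made up for illustration). 11.36 ms per foot corresponds to 60 mph (88 ft/s); at 50 mph it comes out closer to 13.64 ms.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def ms_per_foot(speed_mph: float) -> float:
    """Milliseconds needed to cover one foot at the given speed."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return 1000.0 / feet_per_second


def feet_travelled(speed_mph: float, delay_ms: float) -> float:
    """Distance covered while waiting out a processing delay."""
    return delay_ms / ms_per_foot(speed_mph)


for mph in (50, 60):
    print(f"{mph} mph: {ms_per_foot(mph):.2f} ms per foot, "
          f"{feet_travelled(mph, 20):.2f} ft in 20 ms")
```

Either way, a 20ms delay is worth somewhere between one and two feet of travel at highway-ish speeds.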

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.
 
Upvote
80 (81 / -1)

Wickwick

Ars Legatus Legionis
40,099
The imaging example is a bad one. Most of the 20 ms is spent waiting for enough photons to arrive to make use of them. Trying to do that faster will yield lower detail like looking at a ray tracing scene lit with only a few rays.

However, there are plenty of control and/or optimization scenarios where having a shorter inference time would be very useful.
 
Upvote
33 (34 / -1)
Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 60mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. There, autonomous vehicle systems still lag pretty far behind humans.

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.
I think the objective is to build a system which responds similarly to the human "spinal reflex". The classic human example is if you touch something painful, your arm will jerk back before the full neural signal ever reaches the brain to trigger a decision/action. Instead, the reflex is triggered in the spinal column for an immediate self-preservation action, so that by the time the brain receives the signal, your hand is already "safe" and your brain can direct further conscious action.

Humans have visual-triggered reflexes as well. We see an object heading toward our face rapidly, we see that we're about to walk off a ledge instead of down a set of stairs, something large suddenly blocks our path -- we immediately and un-thinkingly avoid by ducking, reversing course, or stopping. Once trained to drive a car, the "avoid" reflex kicks in if we see a vehicle turn into our path, or brake lights suddenly appear immediately in front of us. We'll stomp the brake and/or haul on the steering wheel without thinking for an instant, and then the thinking kicks in and we'll make more decisive maneuvers. But the instantaneous reaction to what we see is a reflexive "avoid" with no other thought-directed constraints. A particularly well-trained driver will note escape routes constantly, so that reflex is continually directed toward the safest action in an if-then sort of loop. But our body's initial reaction will always be an ambiguous "avoid" if incoming visual input is too much of a fundamental "danger!" trigger, and we'll pick up with conscious action after our reflexes attempt to mitigate the immediate threat.

So in an autonomous vehicle system with an optical danger-avoidance system, the objective would be to replicate the human reflex action. Brake lights appear too close -> trigger hard braking + send "emergency braking mode triggered" on the control bus -> main control unit responds to emergency brake action in progress and then begins to modulate controlled braking and steering requirements.

A more sophisticated scenario would be: Aspect of vehicle ahead changes suddenly (from rear-on view to sudden brake lights and then rapidly increasing side-on view -- car is braking/skidding/avoiding) -> trigger hard braking AND steer away from car ahead in direction with most distant or no sensed obstacles + send "emergency braking AND avoidance+direction" on the control bus -> main control unit responds to braking and avoidance direction in progress and then begins to modulate controlled braking AND refines steering input and direction based on perimeter sensor inputs.
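The two scenarios above can be sketched as a fast "reflex" stage that fires a coarse action and posts a message, followed by a slower controller that picks it up. This is only an illustration: every class, field, and message format here is hypothetical, and the `Scene` fields stand in for whatever the optical stage would actually output.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Reflex(Enum):
    NONE = auto()
    HARD_BRAKE = auto()
    BRAKE_AND_STEER = auto()


@dataclass
class Scene:
    """Hypothetical, already-classified output of the fast optical stage."""
    brake_lights_close: bool = False
    lead_vehicle_rotating: bool = False  # rear-on view turning into side-on view
    clearance_left_m: float = 0.0
    clearance_right_m: float = 0.0


@dataclass
class ControlBus:
    messages: list = field(default_factory=list)

    def send(self, msg: str) -> None:
        self.messages.append(msg)


def reflex_stage(scene: Scene, bus: ControlBus) -> Reflex:
    """Fast path: pick a coarse 'avoid' action and announce it on the bus."""
    if scene.brake_lights_close and scene.lead_vehicle_rotating:
        # Steer toward whichever side has more sensed clearance.
        side = "left" if scene.clearance_left_m > scene.clearance_right_m else "right"
        bus.send(f"emergency braking AND avoidance+{side}")
        return Reflex.BRAKE_AND_STEER
    if scene.brake_lights_close:
        bus.send("emergency braking mode triggered")
        return Reflex.HARD_BRAKE
    return Reflex.NONE


def main_controller(bus: ControlBus) -> list:
    """Slow path: take over whatever the reflex started and refine it."""
    return [f"refine: {msg}" for msg in bus.messages]


bus = ControlBus()
action = reflex_stage(Scene(brake_lights_close=True), bus)
print(action, main_controller(bus))
```

The point of the split is that the reflex stage never waits on the controller; it acts, reports, and lets the slower logic modulate afterwards.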

In short, we would use extremely fast optical input and processing to trigger simple, near-instantaneous "reflex" action, which the vehicle's main processor will pick up and further refine. Theoretically, if the optical "reflex" followed by CPU-controlled full response is faster than human reflex and follow-on conscious action/reaction, then an autonomous vehicle could possibly surpass a human in sudden accident avoidance. It won't solve a "no-win" or "trolley problem" style incident where any deviation from speed and direction will result in a collision of some sort. Humans are very bad at deciding what to crash into to minimize damage and injury to themselves or others. The best we could do in such a scenario is to have a visual reflex+CPU-controlled response path that includes continuous risk analysis in the processing routine so that it's more likely to choose the least-bad path faster than a human. Humans are notoriously emotional and erratic in a panic situation. A computer is more likely to be calculatingly statistical.
 
Upvote
67 (67 / 0)

laserboy

Ars Tribunus Militum
1,642
Moderator
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.
Well, most 2nd order nonlinear optical crystals (like KTP) are incompatible with standard lithographic technologies. Assuming (I’ve not read the paper yet) it’s InP based, then you only have third order processes available. High confinement can make them efficient, but they are really hard to control.

Just read that it is based on Si-on-insulator technology, so third order is all you have (and a fairly small chi3 compared to InP, from memory).
 
Upvote
34 (34 / 0)
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.
Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.
 
Upvote
3 (5 / -2)

Bill T.

Ars Centurion
332
Subscriptor
I'm sure the authors are aware of the field of nonlinear optics. I would have loved a discussion about why something like frequency doubling of a 1064 nm laser beam with KDP crystals wasn't considered or failed its implementation.
Doubling the frequency doesn't strike me as useful for an activation function -- frankly, despite the link text, doubling sounds like a linear operation. In software, the simplest activation function is to convert all the negative values to zeros -- you need something that causes a bend in the curve, not that simply replaces the curve with another curve of a different slope.
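The software side of that point can be shown in a few lines. This is just a sketch comparing a plain scale-by-two with the "negatives to zero" activation (ReLU); it says nothing about how optical frequency doubling actually behaves, and the helper names are made up.

```python
def relu(x: float) -> float:
    """Simplest software activation: negative values become zero."""
    return max(0.0, x)


def double(x: float) -> float:
    """A plain scale-by-two, which only changes the slope."""
    return 2.0 * x


def is_additive(f, a: float, b: float) -> bool:
    """A linear function must satisfy f(a + b) == f(a) + f(b)."""
    return abs(f(a + b) - (f(a) + f(b))) < 1e-12


print(is_additive(double, 3.0, -5.0))  # True: scaling keeps the line straight
print(is_additive(relu, 3.0, -5.0))    # False: the bend at zero breaks additivity
```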

Is there an optical equivalent of a diode?
 
Upvote
11 (11 / 0)

laserboy

Ars Tribunus Militum
1,642
Moderator
Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.
Frequency doubling is also a multiplication operation
 
Upvote
11 (11 / 0)

Wickwick

Ars Legatus Legionis
40,099
Well, most 2nd order nonlinear optical crystals (like KTP) are incompatible with standard lithographic technologies. Assuming (I’ve not read the paper yet) it’s InP based, then you only have third order processes available. High confinement can make them efficient, but they are really hard to control.

Just read that it is based on Si-on-insulator technology, so third order is all you have (and a fairly small chi3 compared to InP, from memory).
Ah, excellent point on the lithography connection. That would be enough explanation.
 
Upvote
13 (13 / 0)

SixDegrees

Ars Legatus Legionis
48,560
Subscriptor
Back in the day, Synthetic Aperture Radar images were produced using a special Fourier Transform lens that would optically compute the Fourier transform of the frequency image into a spatial image pretty much instantaneously. Those lenses were ghastly expensive and vanished into obscurity with the advent of faster, cheaper computers and the Fast Fourier Transform. So in some sense maybe we're coming full circle here.
 
Upvote
52 (52 / 0)

Wickwick

Ars Legatus Legionis
40,099
Curious why you think the harmonic will do better than the fundamental line? For some applications, the shorter wavelength is an advantage. At this stage of the research, having more bandwidth certainly isn’t going to improve anything.

Doubling the frequency doesn't strike me as useful for an activation function -- frankly, despite the link text, doubling sounds like a linear operation. In software, the simplest activation function is to convert all the negative values to zeros -- you need something that causes a bend in the curve, not that simply replaces the curve with another curve of a different slope.

Is there an optical equivalent of a diode?
To answer both of you, the fraction of the fundamental line that is converted to 532 nm is a nonlinear function of the intensity. The nonlinear function between layers in a neural net need not be a threshold function. It can be just about any nonlinear function. So one would do the linear algebra with the 1064 nm line, then use the doubling to 532 nm as the nonlinear function between layers. Given that you want to keep working with the 1064 nm line, however, you'd probably not want to use the green light for your next step. You'd want to stick with the IR. So you might use the conversion to green as a nonlinear loss function.
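A rough sketch of what "conversion to green as a nonlinear loss" looks like numerically. It assumes the textbook result for perfectly phase-matched plane-wave doubling, where the converted fraction goes as tanh squared of something proportional to the field amplitude. The constant `k` lumps crystal length and nonlinear coefficient together and is an arbitrary made-up value, so only the shape of the curve means anything here.

```python
import math


def remaining_fundamental(intensity: float, k: float = 1.0) -> float:
    """IR intensity left over after some of it has been doubled to green.

    Assumes converted fraction = tanh^2(sqrt(k * intensity)); k is an
    arbitrary illustrative constant, not a measured value.
    """
    converted_fraction = math.tanh(math.sqrt(k * intensity)) ** 2
    return intensity * (1.0 - converted_fraction)


# A linear element would show the same out/in ratio on every row.
for i in (0.1, 1.0, 2.0, 4.0):
    out = remaining_fundamental(i)
    print(f"in={i:4.1f}  out={out:.4f}  out/in={out / i:.4f}")
```

The out/in ratio drops as the input gets brighter, which is the kind of bend a layer-to-layer nonlinearity needs, even though it looks nothing like a threshold.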
 
Upvote
21 (21 / 0)

Wickwick

Ars Legatus Legionis
40,099
It's not AI. It's analog optical processing. The advance is the ability to do a lot of layers of optical processing in an integrated sensor array.
With the important distinction that there's a non-linear bit of processing included. Linear operations with light are quite simple: every interferometer is doing that.
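To make the "every interferometer is doing that" point concrete, here is a minimal sketch of an idealized, lossless Mach-Zehnder interferometer as a 2x2 matrix acting on two input field amplitudes. The beam splitter convention and the placement of the two phase shifters are one common choice, assumed here for illustration, and not necessarily what any particular chip uses.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def phase(angle: float):
    """Phase shifter on the top arm only."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]


S = 1 / math.sqrt(2)
BEAM_SPLITTER = [[S, 1j * S], [1j * S, S]]  # ideal 50:50 splitter


def mzi(theta: float, phi: float):
    """Input phase phi, splitter, internal phase theta, splitter."""
    m = phase(phi)
    for stage in (BEAM_SPLITTER, phase(theta), BEAM_SPLITTER):
        m = matmul2(stage, m)
    return m


def apply(m, v):
    """Multiply a 2x2 matrix by a length-2 vector of field amplitudes."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


m = mzi(theta=0.7, phi=1.3)
fields_in = [1.0, 0.5j]
fields_out = apply(m, fields_in)
print("power in :", sum(abs(x) ** 2 for x in fields_in))
print("power out:", sum(abs(x) ** 2 for x in fields_out))
```

Tuning the two phases changes the matrix, but the output is always a linear mix of the inputs with total power conserved. That is the easy part; the nonlinear step is what needs extra physics.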
 
Upvote
10 (10 / 0)

DarthSlack

Ars Legatus Legionis
23,406
Subscriptor++
Well, sure. However, a system should be continuously imaging and evaluating. Faster is better, but as quantified in the article, we are talking a 20ms delay. The human brain is also not instantly determining what to do and our visual lag is approximately 20ms. Our actual reflex time is on the order of 200ms. And actual input to vehicle time is approximately 500ms when we are POISED to take action and are simply waiting.

When we are less alert (as in almost always) our reflex time is closer to a second, especially if you are talking braking and you have to move your foot.

Faster IS better. Every 11.36ms is one foot you travel at 60mph, so yes, 20ms can be the difference between missing something and a collision. That being said, vehicle systems can see a lot more and react drastically faster than humans already. What is MOST important is seeing the CORRECT thing as well as taking the CORRECT action. There, autonomous vehicle systems still lag pretty far behind humans.

So at least TODAY, the most important thing is correctly processing signals and coming to the correct decision, not the speed at which it can process the input signal. And unless the photonic signal processors can do all of the processing on chip, it is still going to have to offload to something with significantly more processing power for image interpretation, decision making, and vehicle control.

Don't get me wrong, this is cool and there are certainly applications. But this is very far from replacing what there currently is and is perhaps chasing the wrong problem to solve. As to your last, that isn't a speed of signal processing issue, that is a data interpretation issue, which is what the struggle still is. Humans and human vision are still drastically better at that than what we have come up with for on-vehicle systems.

We're in violent agreement here. I was hoping that by the time I retired and/or lost the ability to drive, that autonomous driving would be a thing. What's been clear from the last few years is that the sensors and processing capability we have just isn't up to the job. We need leaps in technology (like this could be) in order to allow someone to take a nap in the back seat. Until then, we're pretty much stuck with the driver assist tech we've got now. That's not a bad place to be, I've found it to be enormously helpful, but as you note, it doesn't take the people out of the equation.
 
Upvote
18 (19 / -1)

Wickwick

Ars Legatus Legionis
40,099
I know nothing about their fabrication, but are these kinds of photonic processing units durable enough to be put in stuff like phones or cars? Working with light just seems like an inherently more fragile approach.
The processing is done in silicon, so it's just as robust as any other electronics in your car. Creating the photons needs some fiber optics and gratings, but that can be relatively robust.
 
Upvote
22 (22 / 0)

laserboy

Ars Tribunus Militum
1,642
Moderator
There is at least one I know of, but it's like a backwards Zener diode. If you focus to a spot and ionize the air, the rest of the pulse can't get through the plasma ball. So it's a diode that cuts out at a given intensity.
A Faraday isolator is also an optical diode, but that is even harder to integrate in a PIC than frequency doubling.
 
Upvote
5 (5 / 0)

perral1

Seniorius Lurkius
1
Subscriptor
The team that implemented a complete deep neural network on a photonic chip, achieving a latency of 410 picoseconds. To put that in perspective, Bandyopadhyay’s chip could process the entire neural net it had onboard around 58 times within a single tick of the 4 GHz clock on a standard CPU.
This statement is not true, and I’m curious what it’s supposed to say. As someone also noted above, a 4GHz clock ticks every 250ps. Not to imply a CPU is processing “a complete DNN” in that time, but a better comparison is needed.
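The arithmetic is quick to check. A minimal sketch, using only the 410 ps latency and 4 GHz clock from the quoted sentence (the function name is made up):

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Length of one clock tick in picoseconds (1 GHz -> 1000 ps)."""
    return 1000.0 / freq_ghz


latency_ps = 410.0
period_ps = clock_period_ps(4.0)
print(f"4 GHz tick: {period_ps:.0f} ps")
print(f"410 ps latency = {latency_ps / period_ps:.2f} ticks")
```

So one pass through the chip takes a bit more than one and a half ticks, rather than fitting many times inside a single tick.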
 
Upvote
14 (14 / 0)

laserboy

Ars Tribunus Militum
1,642
Moderator
It's not AI. It's analog optical processing. The advance is the ability to do a lot of layers of optical processing in an integrated sensor array.
That’s not exactly fair. Current AI consists of repeated matrix operations followed by repeated nonlinear operations. This chip demonstrates exactly that, but at a much smaller scale than modern AI
 
Upvote
11 (12 / -1)

laserboy

Ars Tribunus Militum
1,642
Moderator
A 4GHz clock ticks every 250ps.

An inverter uses about 50 attoJ to switch. An 8 bit multiply and add needs around 20 femtoJ.

Optical computing is interesting, but the bar to beat is very high.
While the numbers might be true, the comparison isn’t meaningful. Your inverter uses x joules per bit. The equivalent optical inverter uses y joules per operation, but the number of bits encoded in the operation depends on how accurately we decide to measure the light amplitude. So a 1 bit inverter uses exactly the same power as a 100 bit inverter.
The better comparison is at the operational level (in this case a 5 layer device, including input and output levels, each with 6 inputs, and encoded at about 132 levels)
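A back-of-envelope version of that operational comparison, as a sketch only. The 132 levels, 5 layers, and 6 inputs come from the post above and the 20 fJ figure from the quoted one. The fully connected topology (one 6x6 matrix between each adjacent pair of layers) is my assumption, not something confirmed here, and the helper names are made up.

```python
import math


def bits_per_value(levels: int) -> float:
    """Bits carried by one analog value resolved into `levels` distinct steps."""
    return math.log2(levels)


def digital_mac_count(layers: int, width: int) -> int:
    """Multiply-accumulates for the ASSUMED topology: one width x width
    matrix between each adjacent pair of layers."""
    return (layers - 1) * width * width


MAC_ENERGY_FJ = 20.0  # 8-bit multiply-and-add figure from the quoted post

macs = digital_mac_count(layers=5, width=6)
print(f"{bits_per_value(132):.2f} bits per optical value")
print(f"{macs} MACs, roughly {macs * MAC_ENERGY_FJ / 1000:.2f} pJ if done digitally")
```

Under those assumptions, the number to compare against is the energy for one whole pass through the optical device at about 7 bits of precision, not the energy per switching event.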
 
Upvote
17 (17 / 0)

djspiewak

Seniorius Lurkius
38
FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.

At 10 Hz, you have 100 ms to process each “frame” (LiDAR are weird) before you start to perceive the next one. The signal processing latency is peanuts by comparison.
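To put "peanuts" in numbers, a quick sketch. It borrows the 410 ps inference latency quoted from the article elsewhere in the thread; the function names are made up.

```python
def frame_period_ms(rate_hz: float) -> float:
    """Time between successive frames at a given sample rate."""
    return 1000.0 / rate_hz


def inferences_per_frame(rate_hz: float, latency_ps: float) -> float:
    """How many back-to-back inferences fit inside one frame period."""
    frame_period_ps = frame_period_ms(rate_hz) * 1e9  # 1 ms = 1e9 ps
    return frame_period_ps / latency_ps


print(f"frame period at 10 Hz: {frame_period_ms(10):.0f} ms")
print(f"410 ps inferences per frame: {inferences_per_frame(10, 410):.2e}")
```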

Also let’s remember that most humans have reaction times in the 100-200ms range anyway, and our road safety margins are designed with this in mind. I find it very unlikely that cutting out the ISP and operating directly on the optical feed will result in latency improvements that anyone is willing to pay for.

It is cool though!
 
Upvote
17 (17 / 0)

Nalyd

Ars Praefectus
3,057
Subscriptor
FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.

At 10 Hz, you have 100 ms to process each “frame” (LiDAR are weird) before you start to perceive the next one. The signal processing latency is peanuts by comparison.

Also let’s remember that most humans have reaction times in the 100-200ms range anyway, and our road safety margins are designed with this in mind. I find it very unlikely that cutting out the ISP and operating directly on the optical feed will result in latency improvements that anyone is willing to pay for.

It is cool though!
Right, but here you bypass the (imposed) camera refresh and the ECU processing rate restrictions. At least in certain scenarios.
The imaging example is a bad one. Most of the 20 ms is spent waiting for enough photons to arrive to make use of them. Trying to do that faster will yield lower detail like looking at a ray tracing scene lit with only a few rays.

However, there are plenty of control and/or optimization scenarios where having a shorter inference time would be very useful.
For active pulse imaging, e.g. LiDAR, sure, but for natural light the world is awash in photons arriving continuously.

Fast reprocessing allows lots and lots of reinforcement of interpretations, perhaps even including a fuzzing/dropout sampling approach to get a better handle on uncertainty within a tiny timeframe.
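A toy sketch of the dropout-sampling idea in software: run the same input through many randomized passes and read the spread of the outputs as a rough uncertainty. This is a single linear unit with made-up weights, purely to show the mechanism. It is not a claim about how a photonic chip would implement it.

```python
import random
import statistics


def noisy_forward(weights, inputs, drop_prob, rng):
    """One pass of a single linear unit with random input dropout."""
    keep = 1.0 - drop_prob
    total = 0.0
    for w, x in zip(weights, inputs):
        if rng.random() < keep:
            total += w * x / keep  # rescale so the average output is unchanged
    return total


def sample_prediction(weights, inputs, passes=1000, drop_prob=0.2, seed=0):
    """Mean and standard deviation over many randomized passes."""
    rng = random.Random(seed)
    outputs = [noisy_forward(weights, inputs, drop_prob, rng)
               for _ in range(passes)]
    return statistics.mean(outputs), statistics.stdev(outputs)


weights = [0.5, -1.0, 2.0]   # illustrative values only
inputs = [1.0, 2.0, 0.5]
mean, spread = sample_prediction(weights, inputs)
print(f"mean={mean:.3f}  spread={spread:.3f}")
```

With sub-nanosecond passes, a thousand repeats would still fit comfortably inside a single camera frame, which is what would make this sort of brute-force sampling affordable.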
 
Upvote
5 (8 / -3)
A Faraday isolator is also an optical diode, but that is even harder to integrate in a PIC than frequency doubling.
It’s been a number of years, the brain fades. I remember the use of an optical diode to prevent spatial hole burning in a laser we were developing. It was either a dye or a very early Ti:Sapphire. That was perhaps 40 years ago, or more. To put it into perspective, one of my team associates used to pass Heisenberg in the hallway at his previous position. I’m currently quite disconnected from current products and technologies. To be young again.
 
Upvote
21 (21 / 0)

Winston11

Smack-Fu Master, in training
42
FWIW, most commercial car-mountable lidar units run at 10 Hz. This is a hard limit because the lidar array has to physically spin around in a circle, so it’s only going to move so fast. Cameras are more flexible but a sample rate of 10-15 Hz is pretty standard just to ease compute pressure on the ECU, and I’m not aware of anyone even sniffing 30 Hz much less higher.
Tesla claims that its FSD chip processes camera images at 2,300 frames/sec, on cars equipped with the HW3 hardware, first introduced in 2019 (!). It is not clear how many cameras are involved.

HW4 was introduced in 2024, and is on the latest vehicles, including the Model 3 Highland. Its (unknown) processing speed is reportedly several times faster than HW3. [Wikipedia: Tesla Autopilot hardware]
 
Upvote
-9 (2 / -11)