Conventional vision sensors see the world as a series of frames. Successive frames contain enormous amounts of redundant information, wasting memory access, RAM, disk space, energy, computational power and time. In addition, each frame imposes the same exposure time on every pixel, making it difficult to deal with scenes containing very dark and very bright regions.
The Dynamic Vision Sensor (DVS) solves these problems by using patented technology that works like your own retina. Instead of wastefully sending entire images at fixed frame rates, only the local pixel-level changes caused by movement in a scene are transmitted – at the time they occur. The result is a stream of events at microsecond time resolution, equivalent to or better than conventional high-speed vision sensors running at thousands of frames per second. Power, data storage and computational requirements are also drastically reduced, and sensor dynamic range is increased by orders of magnitude due to the local processing.
| Conventional high-speed vision systems | DVS | DVS Benefits |
|---|---|---|
| Requires fast PC | Works with any laptop | Lower costs; lower power consumption |
| Extremely large data storage (often several TB); highly redundant data | Low storage requirements; no redundant data | Lower costs; more portable; easier and faster data management |
| Custom interface cards | Webcam-sized; USB 2.0; Java API | More portable; easier programming |
| Batch-mode acquisition; off-line post-processing | Real-time acquisition; extremely low latency | Continuous processing; no downtime, lower costs |
| Low dynamic range, ordinary sensitivity; needs special bright lighting (lasers, strobes, etc.) for short exposure times | High sensitivity; no special lighting needed | Lower costs; simpler data acquisition |
| Limited dynamic range, typically 50 dB | Very high dynamic range (120 dB) | Usable in more real-world situations |
Problem: You need to react quickly to moving objects in uneven lighting conditions. Conventional video cameras are too slow, and specialized high-frame-rate cameras produce too much data to process in real time. Both of these conventional solutions require bright, even lighting at high frame rates.
Solution: The DVS sensor reports the movement of objects nearly instantaneously and automatically adapts to differing lighting conditions in different parts of an image without any calibration. Its high dynamic range brings out details that could not be detected with conventional vision systems, and its low data rate enables real-time, low-latency processing at low CPU load.
DVS used for robotic goalie with 550 effective frames per second performance at 4% processor load. See robogoalie.
Problem: You are analyzing turbulent fluid flow. Your conventional high-speed vision setup requires a cumbersome and expensive high-speed PC, lots of hard disk space, custom interface cards and high-intensity laser strobe lighting to illuminate the fluid. After each test run you have to wait minutes or hours while the data is processed.
Solution: DVS sensors enable you to replace your entire system with a single standard PC with a USB connection. Only normal collimated light is required to illuminate the fluid. The small data flow can be processed in real time, enabling you to work continuously and even adjust experimental parameters on the fly.
DVS used for PTV, courtesy P. Hafliger, Univ. of Oslo.
Problem: You are deploying a fast mobile robot that must work in the real world. You are operating under tight constraints of power consumption, space and weight. Conventional vision processing systems consume far too much power to fit on the robot platform. The only alternative is to send the images for off-line processing, but this would require a separate server, increase response times and limit the range of the robot.
Solution: The DVS vision sensor does much of the front-end processing, giving you only the “interesting” events in a scene at the time they occur. You can integrate all of your processing hardware on-board and react quickly to new input.
DVS data from driving.
Problem: You are studying sleep behavior patterns. Conventional video cameras record huge amounts of boring data where the subject is not moving, making it very labor intensive to manually annotate the behaviors.
Solution: The DVS only outputs subject movements. Instead of playing back the data at a constant frame rate, you can play it back at a constant event rate, so that the action is continuous. A whole night of sleep can be recorded in 100 MB of storage and played back in less than a minute. Activity levels can be automatically extracted and any part of the recording can be viewed at 1 millisecond resolution.
DVS used to monitor mouse activity, courtesy I. Tobler, Univ. of Zurich.
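The difference between constant-frame-rate and constant-event-rate playback in the sleep study above can be sketched in a few lines of Python. This is only an illustration, not the jAER implementation: the function names `slices_by_time` and `slices_by_count` and the synthetic timestamps are assumptions made for the example.

```python
from bisect import bisect_left

def slices_by_time(timestamps_us, window_us):
    """Split sorted event timestamps into slices of fixed duration (frame-like playback)."""
    if not timestamps_us:
        return []
    slices = []
    t = timestamps_us[0]
    end = timestamps_us[-1]
    while t <= end:
        lo = bisect_left(timestamps_us, t)
        hi = bisect_left(timestamps_us, t + window_us)
        slices.append(timestamps_us[lo:hi])
        t += window_us
    return slices

def slices_by_count(timestamps_us, events_per_slice):
    """Split event timestamps into slices holding a fixed number of events."""
    return [timestamps_us[i:i + events_per_slice]
            for i in range(0, len(timestamps_us), events_per_slice)]

# Synthetic recording (hypothetical data): a burst of activity,
# one second without movement, then a second burst.
timestamps_us = [0, 10, 20, 30, 40, 50,
                 1_000_000, 1_000_010, 1_000_020,
                 1_000_030, 1_000_040, 1_000_050]

time_slices = slices_by_time(timestamps_us, window_us=100_000)
count_slices = slices_by_count(timestamps_us, events_per_slice=3)
empty_time_slices = sum(1 for s in time_slices if not s)

print(len(time_slices), "time slices,", empty_time_slices, "of them empty")
print(len(count_slices), "count slices, none empty")
```

With fixed-duration slices, most of the playback shows nothing because the subject is still. With fixed-count slices, every slice contains activity, which is why a long, mostly quiet recording can be reviewed quickly.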
The DVS functionality is achieved by having pixels that respond with precisely timed events to temporal contrast. Movement of the scene, or of an object with constant reflectance and illumination, causes a relative intensity change; thus the pixels are intrinsically invariant to scene illumination and directly encode scene reflectance change.
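The invariance to illumination can be checked numerically. The sketch below assumes the simple model that pixel intensity is the product of illumination and surface reflectance, so that a change in log intensity depends only on the reflectance ratio. The function name and the numeric values are hypothetical.

```python
import math

def temporal_contrast(intensity_before, intensity_after):
    """Change in log intensity between two samples at one pixel."""
    return math.log(intensity_after) - math.log(intensity_before)

# Hypothetical reflectances seen by one pixel before and after an edge moves across it.
reflectance_before, reflectance_after = 0.2, 0.3

# The same scene under very different illumination levels (arbitrary units).
dim, bright = 2.0, 100_000.0

tc_dim = temporal_contrast(dim * reflectance_before, dim * reflectance_after)
tc_bright = temporal_contrast(bright * reflectance_before, bright * reflectance_after)

print(tc_dim, tc_bright)  # both are log(0.3 / 0.2), independent of illumination
```

Because the illumination factor cancels in the difference of logarithms, the two values agree up to floating-point rounding.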
The events are output asynchronously and nearly instantaneously on an Address-Event bus, so they have much higher timing precision than the frame rate of a frame-based imager. This is shown by these recordings of a spinning disk painted with wedges of various contrasts. The disk spins at 17 rev/s, and in the right image the events are colored according to their time of occurrence. Our measurements show that we can often achieve a timing precision of 1 μs and a latency of 15 μs with bright illumination. Because there are no frames, the events can be played back at any desired rate, as shown in the right video. The low latency is very useful for robotic systems, such as the pencil-balancing robot.
Because the pixels respond locally to relative changes of intensity, the device has a large intra-scene dynamic range. This wide dynamic range is demonstrated with the Edmund gray scale chart, differentially illuminated at a ratio of 135:1 (a 42 dB illumination ratio). A normal high-quality CCD-based device, like the Nikon 995 used below, must expose for either the bright or the dark part of the image to obtain sensible data. Most of the vision sensor pixels still respond to the 10% contrast steps in both halves of the scene. The rightmost data was captured under a 3/4 moon with a high-contrast scene. Under these conditions the photocurrent is <20% of the photodiode leakage current, but the low threshold mismatch still allows a good response.
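The decibel figures quoted on this page follow the 20·log10 convention for intensity ratios. As a quick check, a short Python sketch (the helper names are ours) converts the 135:1 chart illumination ratio to decibels and the 120 dB sensor dynamic range back to a ratio.

```python
import math

def ratio_to_db(ratio):
    """Express an intensity ratio in decibels (20 * log10 convention)."""
    return 20.0 * math.log10(ratio)

def db_to_ratio(db):
    """Inverse of ratio_to_db."""
    return 10.0 ** (db / 20.0)

chart_db = ratio_to_db(135)        # illumination ratio across the gray scale chart
sensor_ratio = db_to_ratio(120)    # intensity ratio covered by 120 dB

print(f"135:1 is about {chart_db:.1f} dB")
print(f"120 dB is a ratio of {sensor_ratio:.0f}:1")
```

The 135:1 ratio comes out slightly above 42 dB, and 120 dB corresponds to a ratio of one million to one.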
The 4 key innovations in this development are the pixel design, the on-chip digital bias generators, the highly-usable USB2 implementation, and the jAER processing software.
The pixel uses a continuous-time front-end photoreceptor (inspired by the adaptive photoreceptor), followed by a precision self-timed switched-capacitor differentiator (inspired by the column amplifier used in the pulsed bipolar imager). The most novel aspects of this pixel are the idea of self-timing the switched-capacitor differentiation and self-biasing the photoreceptor. This pixel does a data-driven AD conversion (like biology, but very different from the usual ADC architecture). Local capacitor ratio matching gives the differencing circuit a precisely defined gain for changes in log intensity, thus reducing the effective imprecision of the comparators that detect positive and negative changes in log intensity.
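The behavior of a single pixel can be approximated in software as follows. This is a simplified behavioral sketch, not a circuit model: it assumes ideal comparators, a single symmetric threshold in log-intensity units (0.1 here, chosen arbitrarily), and a reset that moves the memorized level by exactly one threshold per event. The function name `generate_events` is hypothetical.

```python
import math

def generate_events(samples, threshold=0.1):
    """Turn (timestamp_us, intensity) samples of one pixel into ON/OFF events.

    Returns a list of (timestamp_us, polarity) with polarity +1 (ON) for an
    increase and -1 (OFF) for a decrease in log intensity.
    """
    events = []
    _, first_intensity = samples[0]
    reference = math.log(first_intensity)  # log intensity memorized at the last reset
    for t, intensity in samples[1:]:
        diff = math.log(intensity) - reference
        # One event per threshold crossing; each event resets the difference
        # by one threshold, mimicking the self-timed differencing stage.
        while abs(diff) >= threshold:
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            reference += polarity * threshold
            diff = math.log(intensity) - reference
    return events

# Hypothetical intensity trace: constant, a 25% step up, then a drop to 90% of the start.
samples = [(0, 100.0), (10, 100.0), (20, 125.0), (30, 125.0), (40, 90.0)]
events = generate_events(samples)
print(events)
```

Constant intensity produces no output at all. The step up yields two ON events and the later drop yields three OFF events, so the event count encodes the size of the log-intensity change. In this sketch several events can share one timestamp because the input is sampled; the real pixel responds in continuous time.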
The pixel is drawn to use quad mirror symmetry to isolate the analog and digital parts. Most of the pixel area is capacitance. The periphery uses the Boahen lab's AER circuits. The chip includes a fully programmable bias current generator that makes the chip's operation largely independent of temperature and process variations; all dozen chips we have built up into boards behave indistinguishably with identical digital bias settings.
The DVS is integrated with a USB2.0 high-speed interface that plugs into any PC or laptop. The host software presently stands at >200 Java classes. The open source jAER software project lets you render events in a variety of formats, capture them, replay them, and most important, process them using events and their precise timing.
| Parameter | Value |
|---|---|
| Functionality | Asynchronous temporal contrast |
| Pixel size, μm (lambda) | 40×40 (200×200) |
| Fill factor | 9% (PD area 151 μm²) |
| Fabrication process | 4M 2P 0.35 μm standard CMOS |
| Pixel complexity | 26 transistors (14 analog), 3 capacitors |
| Array size | 128×128 (higher resolutions coming soon) |
| Die size (mm²) | 6.0 × 6.3 |
| Chip interface | 15-bit word-parallel AER; active-low Req and Ack; 4-phase handshake |
| Computer interface | USB 2.0; Windows XP driver; Java API and Matlab output file format |
| Power consumption | Chip: 23 mW @ 3.3 V (1.5 mA core, 0.3 mA logic, 5.5 mA biases); USB system: approx. 70 mA |
| Dynamic range | 120 dB; 2 lux to >100 klux scene illumination with f/1.2 lens and normal-contrast objects; moonlight (<0.1 lux) with high-contrast scene |
| Photodiode dark current at room temperature | 4 fA (~10 nA/cm²); N-well photodiode |
| Response latency | 15μs @ 1 klux chip illumination |
| Max events/sec | ~1M events/sec |
| FPN, matching | 2.1% contrast (The event threshold 1-sigma mismatch is 2.1% contrast) |
| Optics | Standard CS-mount lenses; other custom mounts available |
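A 128×128 array needs 7 bits for each pixel coordinate, which together with one polarity bit accounts for the 15-bit AER word listed above. The sketch below shows how such a word could be packed and unpacked. The bit layout used here (polarity in bit 0, x in bits 1 to 7, y in bits 8 to 14) is an assumption for illustration only and is not taken from the chip documentation; the names `Event`, `encode` and `decode` are likewise hypothetical.

```python
from typing import NamedTuple

class Event(NamedTuple):
    x: int         # column, 0..127
    y: int         # row, 0..127
    polarity: int  # 1 = ON (brighter), 0 = OFF (darker)

# Assumed bit layout of the 15-bit address word (illustrative only).
POLARITY_MASK = 0x1
X_SHIFT, Y_SHIFT = 1, 8
COORD_MASK = 0x7F  # 7 bits cover 128 positions

def encode(event):
    """Pack an Event into a 15-bit address word."""
    return ((event.y & COORD_MASK) << Y_SHIFT) \
        | ((event.x & COORD_MASK) << X_SHIFT) \
        | (event.polarity & POLARITY_MASK)

def decode(word):
    """Unpack a 15-bit address word into an Event."""
    return Event(
        x=(word >> X_SHIFT) & COORD_MASK,
        y=(word >> Y_SHIFT) & COORD_MASK,
        polarity=word & POLARITY_MASK,
    )

word = encode(Event(x=127, y=5, polarity=1))
print(f"{word:015b}", decode(word))
```

The largest possible word, for the ON event at pixel (127, 127), is 2^15 − 1, so every event fits in 15 bits.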
Current work is funded by the European FET BioICT project SEEBETTER, the Swiss National Center of Competence NCCR Robotics, and the Samsung Advanced Institute of Technology (SAIT).
The original development was supported by the European FET project CAVIAR and ETH Research Grant TH-18 07-1.
Ongoing support is provided by the Inst. of Neuroinformatics through the University of Zurich and the Swiss Federal Institute of Technology (ETH Zurich).
This neuromorphic chip project was the PhD project of Patrick Lichtsteiner and started with our colleague, the late Jorg Kramer, who died in July 2002. Much of this development happened during the CAVIAR project.
Patrick Lichtsteiner (postdoctoral student at INI; pixel design, pixel layout, chip integration, chip characterization, PCB design)
Christoph Posch (engineer at ARC; chip integration and device characterization)
Tobi Delbruck (group leader at INI; pixel design, bias generators, chip integration, USB interfaces, and host software)
Raphael Berner (PhD student at INI; firmware and host software)
The Boahen lab freely provided the AE peripheral communication infrastructure. Srinjoy Mitra and Giacomo Indiveri provided their 0.35 μm layout for the AE circuits.
See the userguide page for more information if you are a user of one of the engineering prototype systems.
Tobi Delbruck tobi@ini.phys.ethz.ch
Institute of Neuroinformatics
Winterthurerstr. 190
8057 Zürich
Switzerland
Tel. +41-1-635 3051
Fax +41-1-635 3053
http://www.ini.uzh.ch