Effective Camera Architecture Design

One of the most important decisions you will make when starting a new camera design is the type of processor chosen for sensor control. If you have the option or the budget to throw an OMAP or i.MX processor at your design, do it; you have gone a long way toward completion. Other applications require razor-thin BOMs, or must use sensors that do not match these CPUs' input capabilities. In these cases, a different architectural approach must be used. Unfortunately, very few micros are fast enough to directly control the transfer of data that occurs once capture begins. Decisions must then be made as to how the control system is going to work and, more importantly, whether the data must be manipulated as sensor data 'broadsides' the system. At best, sensor timing allows very little time for CPU intervention during a capture. Image synchronization takes many forms, but direct CPU control must be carefully thought out, if not avoided outright, if any degree of capture speed is required. To help get around this limitation, there are some pretty cool tricks, such as on-the-fly LUTs and multi-tap FPGA paths, that can be designed into a data path to help avoid imaging latency. Regardless of the approach taken, a careful analysis must be made prior to putting down the first IC. Problems I have been called in to fix have included:

Architectural designs that were too complicated for the task at hand.

KISS (Keep It Simple, Stupid) is still the best possible solution to a challenging system design. Minimize data paths and ICs whenever possible. If you can use one FPGA to eliminate 12 discretes, do it. Your manufacturer will love you.

Not planning up front for easy test of the system once the prototype PCBs are in your hands.

This one is REALLY, REALLY, REALLY important. Unless you have unlimited time to do simulations with expensive software like ModelPro, you are going to end up using a scope and analyzer to find out what is going on. An analyzer port on a complex FPGA design is so important that I don't even recommend attempting a big design without one. For the first spin, it only costs a connector and a number of I/O pins on the FPGA. It allows you to make Verilog or VHDL changes on the fly via JTAG and watch the results on your analyzer, so you can squish bugs in near real time. That is, of course, if you ever have bugs. If you don't, I would like to work with you! I could learn something...

Next, CREATE TEST PATTERNS. Nothing says success until you can prove that you have good image quality, and the time spent proving it can be minimized by designing the system for checks all along the data capture chain. I typically like to embed test images in FPGAs to programmatically replace the sensor inputs and test the data path. A typical pattern is shown here:

This block pattern is effective for testing horizontal/vertical synchronization as well as for validating data integrity. The different levels of gray allow me to do histogram testing to find noise and data transfer problems. Once you have a good data path under control, you have a much better shot at a successful sensor integration.
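The histogram and synchronization checks described above can be sketched on the host side in a few lines of Python. This is only an illustration under stated assumptions: the pattern image itself is not reproduced here, so the layout used below (an 8 x 6 grid of blocks cycling through 8 gray levels) is a guess, and every function name is hypothetical rather than taken from the original design.

```python
from collections import Counter

def make_block_pattern(width=64, height=48, blocks_x=8, blocks_y=6,
                       levels=8, max_value=255):
    """Build a frame (list of rows) of constant-gray blocks.

    The grid size and number of gray levels are assumptions for illustration.
    """
    block_w = width // blocks_x
    block_h = height // blocks_y
    step = max_value // (levels - 1)
    frame = []
    for y in range(height):
        by = y // block_h
        # Each block's level depends on its grid position, so neighbors differ.
        frame.append([((x // block_w + by) % levels) * step for x in range(width)])
    return frame

def histogram(frame):
    """Count how many pixels carry each gray level."""
    return Counter(pixel for row in frame for pixel in row)

def unexpected_levels(reference, captured):
    """Gray levels in the captured frame that the pattern never generates.

    Any entry here points at noise or a data transfer error.
    """
    allowed = set(histogram(reference))
    return {level: count for level, count in histogram(captured).items()
            if level not in allowed}

def mismatch_count(reference, captured):
    """Number of pixels that differ in position-by-position comparison.

    A sync slip leaves the histogram intact but moves the block edges.
    """
    return sum(r != c
               for ref_row, cap_row in zip(reference, captured)
               for r, c in zip(ref_row, cap_row))
```

A corrupted pixel shows up in `unexpected_levels`, while a one-pixel horizontal slip passes the histogram test and is only caught by `mismatch_count`, which is why the block edges matter as much as the gray levels.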

Depending on the sensor type, there are other patterns that may be highly useful, but these are pretty much application dependent.

Under-estimating system performance requirements with respect to image capture.

Offload, offload, offload. Using low cost FPGAs to supplant CPU processing can be extraordinarily effective. One of the key elements of the panoramic camera shown elsewhere in this blog was system cost. A USB approach for data capture was chosen to eliminate the need for special PCI, PCIe or other data capture cards. Eliminating these necked down the maximum camera transfer rate to about 20-25MB/sec, but it really did save system cost and allowed more money to be invested in the camera, which is where the money should have been spent. Using FPGAs for all the time intensive tasks allowed the selection of a control processor that cost only $4.00. In retrospect, the FPGAs worked so well in this application that the $4 processor could have been replaced with one costing only $1-$2. The processor handled I2C sensor setup, USB protocol, and general control and housekeeping. It only had to have enough horsepower and memory to handle a USB specification. This particular design used the Digital Camera / PIMA spec.

Inadequate DMA or ISR speeds.

Ok, so you didn't go the FPGA route because you wanted to save money, or you are not so comfortable with FPGAs. You have chosen an architecture that relies on an ISR to grab data from the camera and burst it to memory via some DMA routine. That is actually a good approach, if your handheld runs at more than 200MHz. If you are working with low cost devices, you may have a problem. One easy way to handle data buffers is with SRAM, but nowadays, having a 5MB SRAM is out of the question. SDRAM is better, cheaper, and faster, but requires either an SDRAM controller in the CPU you are working with (out of the question for low cost systems), or, once again, an FPGA. Unfortunately, there is no magic here, and SDRAMs come with their own set of problems with respect to interface timing complexity. In the image below, the setup cycle for an SDRAM is shown right at the start of capture. This is a (single) burst capture, which makes poor use of the memory bandwidth. In single burst mode, the overhead for a single transfer is quite bad, but it is still fast for a very low-cost micro.

The problem shown here is that from the start of capture, AND THIS IS NOT A FAST SYSTEM, there is less than 200ns available to set up the transfer, which can tax an ISR. Worse yet, each subsequent transfer is only 200ns behind the last one. It won't take long before the processor gives up and kills any concurrent tasks that may be running.
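To put the 200ns window in perspective, it helps to convert it into CPU clock cycles. The sketch below does only that arithmetic. The 200MHz clock comes from the figure mentioned earlier; the 50MHz clock and the 30-cycle ISR cost are assumed values for illustration, not measurements from this system, and the function names are hypothetical.

```python
def cycles_available(cpu_hz, window_ns):
    """Whole CPU clock cycles that fit inside a window of window_ns nanoseconds."""
    return cpu_hz * window_ns // 1_000_000_000

def isr_fits(cpu_hz, window_ns, isr_cycles):
    """True if an ISR costing isr_cycles (entry, body, exit) fits in the window."""
    return isr_cycles <= cycles_available(cpu_hz, window_ns)

# Assumed ISR cost in cycles; real numbers depend on the core and the compiler.
ASSUMED_ISR_CYCLES = 30

for clock_hz in (200_000_000, 50_000_000):  # 50 MHz is an assumed low-cost part
    budget = cycles_available(clock_hz, 200)
    verdict = "fits" if isr_fits(clock_hz, 200, ASSUMED_ISR_CYCLES) else "does not fit"
    print(f"{clock_hz // 1_000_000} MHz: {budget} cycles per transfer, ISR {verdict}")
```

At 200MHz the window is 40 cycles, which is tight but workable; at 50MHz it is 10 cycles, which is gone before most interrupt controllers have finished vectoring.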

The timing shown here is from a system where the CPU idled and an FPGA did the heavy lifting. There were multiple paths running in parallel, yielding a 10MB/sec transfer rate. When modified to an 8-cycle burst instead of a single cycle, the transfer rate went up to 80MB/sec, but the USB was overwhelmed by the data. The design ended up with a 2-cycle burst (20MB/sec), which throttled the system down to match the latencies of USB 2.0 HS. You may end up having to play games to get the timing just right.
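The burst-length trade-off above reduces to simple arithmetic. The sketch below assumes the transfer rate scales linearly with burst length, a simplification that happens to match the quoted figures (10MB/sec single, 20MB/sec at 2 cycles, 80MB/sec at 8 cycles), and it uses 25MB/sec, the upper end of the USB range mentioned earlier, as the link budget. The function names are hypothetical; burst lengths of 1, 2, 4 and 8 are the standard SDRAM options.

```python
def burst_rate_mb_s(single_rate_mb_s, burst_len):
    """Transfer rate for a given burst length, assuming linear scaling."""
    return single_rate_mb_s * burst_len

def largest_burst_within(single_rate_mb_s, link_limit_mb_s,
                         burst_options=(1, 2, 4, 8)):
    """Longest burst whose rate the downstream link can still absorb.

    Returns None if even the shortest burst exceeds the link budget.
    """
    fitting = [b for b in burst_options
               if burst_rate_mb_s(single_rate_mb_s, b) <= link_limit_mb_s]
    return max(fitting) if fitting else None

# 10 MB/sec single-cycle rate from the text; 25 MB/sec assumed as the USB budget.
print(largest_burst_within(10, 25))
```

With these numbers the search settles on a 2-cycle burst, the same place the design ended up. In a real system the scaling will not be perfectly linear, so treat this as a first cut before measuring.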

Next...Lack of understanding about basic architectural limitations.