Well, Chumby is finally shipping units in volume. You can’t go to the website yet and just buy one because we’re working through the long, long list of emails we received from people who asked to be notified when chumby devices are available for sale. There’s quite a backlog of orders there alone, although a few have shown up on eBay! I’m working on making some embedded developer (bare board) and craft (outerwear only) units available for sale someday, separate from the consumer units. Hopefully that will make the core hardware more available to interested hackers and crafters.
The natural consequence of having many units out there is that we’re starting to see some interesting customer return cases. I recently got one unit that was destroyed due to an ESD (electrostatic discharge) event coming in from the power cord: the AMS1117 regulator that supplies the standby current for the CP and system management controller had been zapped. Interestingly, the unit worked for about an hour before it ceased operation.
This failure is particularly intriguing because there is a series of ESD protection devices between the AMS1117 and the power cord; at the front line, there is an AVX TransGuard transient voltage suppressor, then a set of EMI filters, diodes, fuses and so forth. Presumably, either some monster ESD event happened that not even these barriers could absorb, or the device was damaged on the factory floor prior to assembly. To investigate, I had the device decap’d and imaged by FlyLogic, and I found the results interesting enough to share.
Here is an overview of the damaged chip (click on the image for a much larger version).
The spot with ESD damage is in the lower right hand corner, zoomed in here.
The path that the discharge event took between the output pin and the pass transistor of the regulator looks a little bit like a river channel that bends slightly up and over to the right. You can see how the metal was splattered and migrated by the ballistic motion of the electrons flowing by. This migration of metal eventually caused the pass transistor to get shorted out, so, unfortunately, all the 3.3V devices downstream of this regulator got stuck with about 12V across them. Toasty! Fortunately, the PTC fuses on the board and other current-limiting mechanisms kicked in, so the board never got dangerously hot.
One thing that struck me about this particular layout is the apparent lack of any on-chip ESD protection devices. Even though this is a big bad analog process, it seems counterintuitive to me that a bare device can stand up to the 2kV human body model test that’s pretty much considered the minimum bar for ESD protection. Even if the device alone could stand up to such a test, it seems that what went wrong here was current arcing between two adjacent pieces of metal, possibly aggravated by the corona effect at the corners of the metal layout. At any rate, a device with no local ESD protection can be very susceptible, so perhaps, despite the precautions taken in the board layout, an external ESD event could still blow up this chip.
I could be wrong, but perhaps I should now be looking for a chip vendor that was a little more fastidious about their ESD protection to prevent more unhappy customer return events. If any readers have some experience with analog chip processes like this, I’d appreciate a comment about the level of ESD protection you had to incorporate in your chip designs!
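For a sense of scale on the 2kV human body model test mentioned above, here is a rough back-of-the-envelope sketch in Python. It assumes the standard HBM network of a 100 pF capacitor discharged through a 1.5 kΩ resistor, and it treats the chip as a dead short, so the peak current is an idealized upper bound rather than a measurement of anything on this board. The function name `hbm_pulse` is just a label I made up for the example.

```python
# Idealized human body model (HBM) pulse: 100 pF discharged through 1.5 kOhm.
# Assumption: the device under test looks like a short circuit to the pulse.

HBM_CAPACITANCE_F = 100e-12   # standard HBM capacitor
HBM_RESISTANCE_OHM = 1500.0   # standard HBM series resistor


def hbm_pulse(test_voltage):
    """Return (peak current in A, stored energy in J, RC time constant in s)."""
    peak_current = test_voltage / HBM_RESISTANCE_OHM
    stored_energy = 0.5 * HBM_CAPACITANCE_F * test_voltage ** 2
    time_constant = HBM_RESISTANCE_OHM * HBM_CAPACITANCE_F
    return peak_current, stored_energy, time_constant


if __name__ == "__main__":
    amps, joules, seconds = hbm_pulse(2000.0)
    print(f"peak current  : {amps:.2f} A")
    print(f"stored energy : {joules * 1e6:.0f} uJ")
    print(f"time constant : {seconds * 1e9:.0f} ns")
```

So a 2kV HBM zap works out to a bit over an amp for a couple hundred nanoseconds. That's not a lot of energy, but it's a lot of current density for a skinny piece of on-chip metal.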
The ESD damage wasn’t the only interesting thing about the chip, however. I also noticed burned out metal elsewhere on the chip:
There are several burned out spots like this on the left hand side of the chip.
This was a neat find, because this shows you how they trim these voltage regulators in the factory. For those not familiar with analog chip design, the accuracy of an integrated polysilicon resistor in an analog-optimized process is about +/- 20%. On a generic digital process, the accuracy is typically much worse (on the other hand, matching between devices on the same chip can be extremely tight, and the quality of the match improves with the area of the device). Thus, when a chip advertises +/-1% accuracy for voltage, it has to have some kind of post-fabrication trimming mechanism built in.
Basically, the trim mechanism is constructed using a ladder of resistors in series, with shorting metal straps in parallel with each resistor. Therefore, when the chip is first manufactured, the calibration resistor ladder has a nominal resistance of nearly zero ohms. At wafer test, the chip’s output voltage is measured, and resistance is selectively added to this calibration ladder by using a series of high-current pulses to blow selected metal straps. This means the native chip design, without calibration, must always miss in the same direction on the voltage, so that the error can always be corrected by only adding resistance to the calibration ladder. If they did the design right, they would have it start with the voltage too low, so that if a fuse was only partially blown and it managed to repair itself (this does happen), you would only end up passing too little voltage to the regulated load instead of too much, under the theory that if you are to have a malfunction, it’s typically safer to push too little voltage than too much.
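To make the trim procedure a bit more concrete, here is a minimal Python sketch of how a tester might decide which straps to blow. Everything in it is an assumption for illustration: the feedback-divider model, the 1.25V reference, the resistor values, and the binary-weighted ladder are made up and are not taken from the AMS1117 or its datasheet. The one idea it borrows from the description above is that the part starts out low and the tester only ever adds resistance, so the selection never overshoots the target.

```python
# Hypothetical model of fuse trimming; every value here is illustrative only.

def output_voltage(v_ref, r_fixed, r_ladder, r_bottom):
    """Assumed divider model: Vout = Vref * (1 + R_top / R_bottom)."""
    return v_ref * (1.0 + (r_fixed + r_ladder) / r_bottom)


def needed_ladder_ohms(v_target, v_ref, r_fixed, r_bottom):
    """Ladder resistance that would put the assumed model exactly on target."""
    return (v_target / v_ref - 1.0) * r_bottom - r_fixed


def select_straps_to_blow(ladder_ohms, needed_ohms):
    """Greedy pick, largest resistor first, never exceeding needed_ohms."""
    blown = []
    remaining = needed_ohms
    by_size = sorted(enumerate(ladder_ohms), key=lambda pair: pair[1], reverse=True)
    for index, ohms in by_size:
        if ohms <= remaining:
            blown.append(index)      # blowing this strap adds `ohms` in series
            remaining -= ohms
    return sorted(blown)


def ladder_resistance(ladder_ohms, blown):
    """Total series resistance contributed by resistors whose straps are blown."""
    return sum(ladder_ohms[i] for i in blown)


if __name__ == "__main__":
    ladder = [80, 40, 20, 10, 5]     # made-up binary-weighted ladder, in ohms
    v_ref, r_fixed, r_bottom = 1.25, 1503.0, 1000.0
    print(f"untrimmed: {output_voltage(v_ref, r_fixed, 0, r_bottom):.4f} V")
    need = needed_ladder_ohms(3.3, v_ref, r_fixed, r_bottom)
    blown = select_straps_to_blow(ladder, need)
    added = ladder_resistance(ladder, blown)
    print(f"blow straps {blown}, adding {added} ohms")
    print(f"trimmed:   {output_voltage(v_ref, r_fixed, added, r_bottom):.4f} V")
```

The greedy pick lands as close as possible to the target from below when the ladder is binary weighted, as in this example. For an arbitrary set of resistor values it is not guaranteed to find the best combination, although it still never overshoots.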
You can see all the extra bond pads used for this calibration process along the left hand side of the chip. There are very deep “scrub” marks, so large, heavily weighted needles were used to touch down on the wafer. This was probably necessary because of the high currents required to blow the metal fuses. Other trim mechanisms I’ve heard of include poly fuses, eFuses, or laser trimming, but I had never seen one “in real life” — they had always been an intellectual curiosity that I’ve read about in a process manual or a journal paper.
Wow, this post turned out long…