Archive for the ‘The Factory Floor’ Category

Novena Update

Saturday, October 4th, 2014

It’s been four months since we finished Novena’s crowd funding campaign, and we’ve made a lot of progress. Since then, a team of people has been hard at work to make Novena a reality.

It takes many hands to build a product of this complexity, and we couldn’t do it without our dedicated and hard-working team at AQS. Above is a photo from the conference room where we did the T1 plastics review in Dongguan, China.

In this update, we’ll be discussing progress on the Casing, Electronics, Accessories, Firmware and the Community.


Case construction update
We’re very excited that the Novena cases we’re carrying around are now made of entirely production-process hardware — no more prototypes. A total of 10 injection molding tools, many of them family molds, have been opened so far; for comparison, a product like NeTV or chumby had perhaps 3-4 tools.

For those not familiar with injection molding, it’s a process whereby plastic is molded into a net shape from hot, high pressure liquid plastic forced into a cavity made out of hardened steel. The steel tool is a masterpiece of engineering in itself – it’s a water-cooled block weighing in at about a ton, capable of handling pressures found at the bottom of the Mariana Trench, and the internal surfaces are machined to tolerances better than the width of a human hair. And on top of that, it contains a clockwork of moving pieces, with dozens of ejector pins, sliders, lifters and parting surfaces coming apart and back together again smoothly over thousands of cycles. It’s amazing that these tools can be crafted in a couple of months, yet here we are.

With so much complexity involved, it’s no surprise that the tools require several iterations of refinement to get absolutely perfect. In tooling jargon, the iterations are referred to as T0, T1, T2…etc. You’re doing pretty good if you can go to full production at T2; we’re currently at T1 stage. The T1 plastics are 99% there, with a few issues relating to flow and knit lines, as well as a couple of spots where the plastic is warping during cooling or binding to the tool during ejection and causing some deformation. This manifests itself in a couple spots where the seams aren’t as tight as we’d like them to be in the case.

Most people have only seen products of finished tooling, so I thought I’d share what a pretty typical T0 shot looks like, particularly for a large and complex tool like the Novena case base part. Test shots like this are typically done in a color that highlights defects and/or in a resin that happens to be available as scrap, hence the gray color. The final units will be black.

There’s a lot going on with this piece of plastic, so below is a visual guide to some of the artifacts.

In the green boxes are a set of “sink marks”. These happen when the opposite side of the plastic has a particularly thin or thick feature. These areas will cool faster or slower than the bulk of the plastic, causing these regions to pucker slightly and cause what looks like a bit of a shadow. It’s particularly noticeable on mirror-finish parts. In this case, the sink marks are due to the plastic underneath the nut bosses of the Peek array being much thinner than the surrounding plastic. The fix to this problem was to slightly thicken that region, reducing the overall internal clearance of the case by 0.8mm. Fortunately, I had designed in a little extra clearance margin to the case so this was possible.

The red arrow points to a “knit line”. This is a region where plastic flow meets within the tool. Plastic, as it is injected into the cavity, will tend to flow from one or more gates, and where the molten plastic meets itself, it will leave a hairline scar. It’s often located at points of symmetry between the gates where the plastic is injected (on this tool, there are four gates located underneath the spot where the rubber feet go — gates are considered cosmetically unattractive and thus they are strategically placed to hide their location).

The white feathery artifacts, as indicated by the orange arrow, are flow marks. In this case, it seems plastic was cooling a bit too quickly within the tool, causing these streaks. This problem can often be fixed by adjusting the injection pressure, cycle length, and temperature. This tweaking is done using test shots on the molding machine, with one parameter at a time tweaked, shot after shot, until its optimum point is found. This process can sometimes take hundreds of shots, creating a small hill of scrap plastic as a by-product.
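The one-parameter-at-a-time tweaking described above can be sketched as a simple sweep. This is only an illustration: `defect_score`, the parameter names, the ranges and the "ideal" numbers are all made up, and they stand in for a person inspecting a real test shot.

```python
def defect_score(settings):
    """Hypothetical stand-in for inspecting a test shot; lower is better."""
    ideal = {"pressure_bar": 900, "cycle_s": 42, "temp_c": 235}  # invented values
    return sum(((settings[k] - ideal[k]) / ideal[k]) ** 2 for k in ideal)


def tune_one_at_a_time(settings, candidates, score=defect_score):
    """Sweep each parameter in turn while holding the others fixed."""
    settings = dict(settings)
    shots = 0
    for name, values in candidates.items():
        best_value, best_score = settings[name], score(settings)
        for value in values:
            trial = {**settings, name: value}
            shots += 1  # every trial costs one scrap part
            trial_score = score(trial)
            if trial_score < best_score:
                best_value, best_score = value, trial_score
        settings[name] = best_value  # lock in the best value before moving on
    return settings, shots


start = {"pressure_bar": 800, "cycle_s": 30, "temp_c": 220}
candidates = {
    "pressure_bar": range(700, 1101, 50),
    "cycle_s": range(30, 61, 3),
    "temp_c": range(200, 261, 5),
}
tuned, shots = tune_one_at_a_time(start, candidates)
print(tuned, shots)
```

Even this toy sweep burns through dozens of shots for three parameters; on a real tool the parameters interact, which is part of why the scrap pile grows so quickly.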

Most of these gross defects were fixed by T1, and the plastic now looks much closer to production-grade (and the color is now black). Below is the T1 shot in initial testing after transferring live hardware into the plastics.

There’s still a few issues around fit and finish. The rear lip is binding to the tool slightly during ejection, which is causing a little bit of deformation. Also, the panel we added at the last minute to accommodate oversized expansion boards isn’t mating as tightly as we’d like it to. But, despite all of these issues, the case feels much more solid than the prototypes, and the gas piston mechanism is finally consistent and really smooth.

Front bezel update
The front bezel of Novena’s case (not to be confused with the aluminum LCD bezel) has gone through a couple of changes since the campaign. When we closed funding, it had two outward-facing USB ports and one switch. Now, it has two switches and one outward-facing USB port and one inward-facing USB port.

One switch is for power — it goes directly to the power board and thus can be used to turn the system on and off even when the main board is fully powered down.

The other switch is wired to a user key press, and the intent is to facilitate Bluetooth association for keyboards that are being stupid. It seems some keyboards can take up to a half-minute to cycle through — something (presumably, it’s trying to be secure) — before they connect. There are hacks you can do to bypass that, but it requires you to run a script on the host, and the idea is by pressing this button users can trigger a convenience script to get past the utter folly of Bluetooth. This switch also doubles as a wake-up button for when the system is in suspend.

As for the USB ports, there are still four ports total in the design, but the configuration is now as follows:

  • Two higher-current capable ports on the right
  • One standard-current capable port on the front
  • One standard-current capable port facing toward the Peek Array
In other words, we face one USB port toward the inside of the machine; since half the fun of Novena is modding the hardware, we figure making a USB port available on the inside is at least as useful as making it available on the outside.

For those who don’t do hardware mods, it’s also a fine place to plug small dongles that you generally keep permanently attached, such as a radio transceiver for your keyboard. It’s a little inconvenient to initially plug in the dongle, but keeping the radio transceiver dongle facing the inside helps protect it from damage when you throw your laptop into your travel bag.

    Speakers
    We toyed with several iterations of speaker selection for Novena. One of the core ideas behind the design was to make speaker choice something every user would be encouraged to make on their own. One driving reason for this is some people really listen to music on their laptop when they travel, but others simply rely upon the speaker for notification tones and would prefer to use headphones for media capabilities.

    Physics dictates that high-quality sound requires a certain amount of space and mass, and so users who have a more relaxed fidelity requirement should be able to reclaim the space and weight that nicer speakers would require.

    Kurt Mottweiler, the designer of the Heirloom model, had selected a nice but very compact off-the-shelf speaker, the PUI ASE06008MR-LW150-R, for the Heirloom. We evaluated that in the context of the standard Novena model and found that it fit well into the Peek Array and it also had acceptable fidelity, particularly for its size. And so, we adopted this as the standard offering for audio. However, it will be provided with a mounting kit that allows for easy removal so users who need to reclaim the space they take, or who want to go the other way and put in larger speakers, can do so with ease.


    PVT2 Mainboard
    The Novena mainboard went through a minor revision prior to mass production. The 21-point change list can be viewed here; the majority of the changes focused on replacing or updating components that were at risk of EOL. The two most significant changes from a design standpoint were the addition of an internal FPC header to connect to the front bezel cluster, and a dedicated hardware RTC module.

    The internal FPC header was added to improve the routing of signals from the mainboard to the front bezel cluster. We had to run two USB ports, plus a smattering of GPIOs and power to the front bezel and the original scheme required multiple cables to execute the connection. The updated design condenses all of this into a single FPC, thereby simplifying the design and improving reliability.

    A dedicated hardware RTC module was added because we couldn’t get the RTC built into the i.MX6 to perform well. It seems that the CPU simply had a higher leakage on the RTC than reported in the datasheet, and thus the lifetime of the RTC when the system was turned off was measured in, at best, minutes. We made the call that there was too much risk in continuing to develop with the on-board RTC and opted to include an external, dedicated RTC module that we knew would work. In order to increase compatibility with other i.MX6 platforms, we picked the same module used by the Solid-Run Hummingboard, the NXP PCF8523T/1.

    GPBB
    The GPBB got a face-lift and a couple of small mods to make it more hacker-friendly.

    I think everything looks a little bit nicer in matte black, so where it doesn’t compromise production integrity we opted to use a matte black soldermask with gold finish.

    Beyond the obvious cosmetic change, the GPBB also features an adjustable I/O voltage for the digital outputs. The design change is still going through testing, but the concept is that, by default, software can select between 5V and 3.3V. However, the lower voltage can also be adjusted to 2.5V or 1.8V by changing a single resistor (R12), which I also labelled “I/O VOLTAGE SET” and made a 1206 part so soldering novices can make the change themselves.

    In our experience, we’re finding an ever-increasing gulf between the voltage standards used by hobbyists and what we’re actually finding inside equipment we need to reverse engineer; and thus, to accommodate both applications a flexible voltage output selection mechanism was added to the GPBB.

    Desktop Passthrough
    The desktop case originally included just the Novena mainboard, and the front panel breakout. It turns out this makes power management awkward, as the overall power management system for the case was designed with the assumption there is a helper microcontroller managing a master cut-off switch.

    Complexity is the devil, and it’s been hard enough to get the software going for even a single configuration. So in net we found it would be cheaper to introduce a new piece of hardware rather than deal with multiple code configurations.

    Therefore, desktop systems are now getting a power pass-through board as part of the offering. It’s a simple PCBA that contains just the STM32 controller and power switch of the full Senoko board. This allows us to use a consistent gross power management architecture across both the desktop and the laptop systems.

    Of course, this is swatting a fly with a sledgehammer, but this sledgehammer costs as much as the flyswatter and it’s inconvenient to carry both a fly swatter and a sledgehammer around. And so yes, we’re using a 32-bit ARM CPU to read the state of a pushbutton and flip a GPIO, and yes, this is all done using a full multi-threaded real time operating system (ChibiOS) running underneath it. It feels a little silly, which is why we broke out some of the unused GPIOs so there’s a chance some clever user might find an application for all that untapped power.


    Battery
    The battery pack for Novena is and will continue to be a wildcard in the stack. It’s our first time building a system with such a high-capacity battery, and working through all the shipping regulations to get these delivered to your front door will be a challenge.

    Some countries are particularly difficult in terms of their regulations around the importation of lithium batteries. In the worst case, we’ll send your laptop with no battery inside, and we will ship separately, at our cost, an off-the-shelf battery pack from a vendor that specializes in RC battery packs (e.g. Hobby King). You will have the same battery we featured in the crowd funding campaign, but you’ll need to plug it in yourself. We consider this to be a safe fall-back solution, since Hobby King ships thousands of battery packs a day all around the world.

    However, this did not stop us from developing a custom battery pack. As it’s very difficult to maintain a standing stock of battery packs (they need to be periodically conditioned), we’re including this custom battery pack only to backers of the campaign, providing their country of residence allows its import (and we won’t know for sure until we try). We did get UN38.3 certification for the custom battery pack, which in theory allows it to be shipped by air freight, but regulations around this are in flux. It seems countries and carriers keep on inventing new rules, particularly with all the paranoia about the potential use of lithium batteries as incendiary devices, and we don’t have the resources to keep up with the zeitgeist.

    For those who live in countries that allow the importation of our custom pack, the new pack features a 5000mAh rated capacity (about 2x the capacity over the pack we featured in the crowd campaign, which had 3000mAh printed on the outside but actually delivered about 2500mAh in practice). In real-life testing, the custom pack is getting about 6-7 hours of runtime with minimal power management enabled. Also, since I got to specify the battery, I know this one has the correct protection circuitry built into it, and I know the provenance of its cells and so I have a little more confidence in its long-term performance and stability.
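The numbers above can be sanity-checked with a little arithmetic. This sketch assumes both packs run at the same nominal voltage (so mAh figures are directly comparable) and that the full rated capacity of the custom pack is usable; neither assumption is confirmed here, so treat the results as rough estimates.

```python
RATED_NEW_MAH = 5000        # custom pack, rated capacity
MEASURED_OLD_MAH = 2500     # off-the-shelf pack as measured (label says 3000)
RUNTIME_NEW_H = (6.0, 7.0)  # observed runtime range on the custom pack

capacity_ratio = RATED_NEW_MAH / MEASURED_OLD_MAH

# Average current draw implied by each end of the runtime range.
avg_draw_ma = tuple(RATED_NEW_MAH / hours for hours in RUNTIME_NEW_H)

# Runtime the old pack would give at that same average draw.
runtime_old_h = tuple(MEASURED_OLD_MAH / draw for draw in avg_draw_ma)

print(f"capacity ratio: {capacity_ratio:.1f}x")
print(f"implied average draw: {avg_draw_ma[1]:.0f} to {avg_draw_ma[0]:.0f} mA")
print(f"old pack at the same draw: {runtime_old_h[0]:.1f} to {runtime_old_h[1]:.1f} h")
```

Under those assumptions, the off-the-shelf pack would land at roughly half the runtime of the custom pack.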

    Of course, it’s a whole different matter convincing the lawmakers, customs authorities, and regulatory authorities of those facts…but fear not, even if they won’t accept this custom limited-edition battery, you will still get the original off-the-shelf pack promised in the campaign.

    Hard Drive
    In the campaign, we referenced providing 240GiB Intel 530 (or equivalent) and 480GiB Intel 720 drives for the laptop and heirloom models, respectively. We left the spec slightly ambiguous because the SSD market moves quickly, and probably the best drive last February when we drew up the spec will be different from the best drive we could get in October, when we actually do the purchasing.

    After doing some research, it’s our belief that the best equivalent drives today are the 240GiB Samsung 840 EVO (for the laptop model) and the 512GiB Samsung 850 Pro (for the Heirloom). We’ve been personally using the 840 EVO in our units for several months now, and they have performed admirably. An important metric for us is how well the drives hold up under unexpected power outages — this happens fairly often, for example, when you’re doing development work on the power management subsystem. Some hard drives, such as the SanDisk Extreme II, fail quite reliably (how’s that for an oxymoron) after a few unexpected power-down cycles. We’ve also had bad luck with OCZ and Crucial drives in the past.

    Intel drives have generally been pretty good, except that Intel stopped doing their own controllers for the 520 and 530 series and instead started using SandForce controllers, which in my opinion removes any potential advantage they could have being both the maker of the memory chips and the maker of the controller. The details of how flash memory performs, degrades, and yields are extremely process-specific, and at least in my fantasy world a company that produces flash + controller combinations should have an advantage over companies that have to mix-and-match multiple flash types with a semi-generic controller. Furthermore, while the Intel 720 does use their home-grown controller solution, it’s a power hog (over 5W active power) and requires a 12V rail, and is thus not suitable for use in laptop environments.

    The 840 EVO series comes with a reasonable 3-year warranty and it’s held up well against one site’s write endurance test. After using mine for several months, I’ve had no complaints about it, and I think it’s a solid every-day use drive for firmware development. We also have a web server that hosts most of the media content for this and a couple other blogs, wikis, and bug tracking tools, and it’s a Novena running off an 840 EVO.

    For the premium Heirloom users, we’re very excited to build in the 850 PRO series. This drive comes with a serious warranty that matches the “heirloom” name — 10 years. The reason behind their ability to offer such a high claim of reliability is even more remarkable. The drive uses a technology that Samsung has branded “V-NAND”, which I consider to be the first bona-fide production-grade 3D transistor technology. Intel claims they make 3D transistors, but that’s just marketing hype — yes, the gate region has a raised surface topology, but you still only get a single layer of devices. From a design standpoint you’re still working with a 2D graph of devices. It’s like calling Braille a revolutionary 3D printing technology. They should have stuck with what I consider to be the “original” (and more descriptive/less misleading) name, FinFET, because by calling these 3D transistors I don’t know what they’re going to call actual 3D arrays of transistors, if they ever get around to making them.

    Chipworks did an excellent initial analysis of Samsung’s V-NAND technology and you can see from this SEM image they published that V-NAND isn’t about stacking just a couple transistors; Samsung is shipping a full-on 38-layer sandwich:

    This isn’t some lame Intel-style bra-padding exercise. This is full-on process technology bad-assery at its finest. This is Neo decoding the Matrix. This is Mal shooting first. It’s a stack of almost 40 individual, active transistors in a single spot. It’s a game changer, and it’s not vapor ware. Heirloom backers will get a laptop with over 4 trillion of these transistors packed inside, and it will be awesome.

    Sorry, I get excited about these kinds of things.


    Firmware
    From the software side, we’re working on finalizing the kernel, bootloader, and distro selection, as well as deciding what you’ll see when you first power on Novena.

    Marek Vasut is working on getting Novena supported in mainline U-Boot, which involves a surprising number of patches. Few ARM boards support as much RAM as Novena, so some support patches were needed first. Full support is in progress, including USB and video.

    We intend to ship with a mainline kernel, but interestingly Jon Nettleton has a 3.14 long-term-support kernel that is a hybrid of Freescale’s chip-specific patches combined with many backported upstream patches. Users may be interested in using this kernel over the upstream one, because it has better support for thermal events and for power management.

    While we prefer to go with an upstream kernel, and to get our changes pushed into mainline, other users might find this kernel’s interesting blend of community and vendor code to satisfy their needs better.

    The kernel that we’ll use has most of the important parts upstreamed, including the audio chip which should be part of the 3.17 kernel. We’re still carrying a few local patches for various reasons ranging from specialized hacks to experimental features, or features that are not yet ready to push upstream, or rely on other features that are not yet upstream.

    For example, the display system on a laptop is very different from what is usually found on an ARM device, and we have local patches to fix this up. In most ARM devices, the screen is fixed during boot and it isn’t possible to hot-swap displays at runtime. Novena supports two different displays at once, and allows you to plug in an HDMI monitor without needing to reboot.

    Speaking of displays, the community has been hard at work on an accelerated 2D Xorg DDX driver. 2D acceleration is important, because most of the time users are interacting with the desktop, and 2D hardware uses significantly less power than 3D hardware. On a desktop machine, the 3D chip is used to composite the desktop. On Novena, which has no fan and a small overall active power footprint, saving power is very important. By taking advantage of the 2D-only hardware, we save power while having a smoother experience. There are a few bugs that remain with the 2D driver, but it should be ready by the time we ship.

    There is a 3D driver that is in progress as well. It’s able to run Quake 3 on the framebuffer, but still has to be integrated into an OpenGL ES driver before it works under X.

    We’ve also been working on getting a root filesystem setup. This includes deciding which packages are installed, and customizing the list of software repositories. We want to add a repository for our kernel and bootloader, as well as for various packages which haven’t made it upstream such as an imx6 version of irqbalance. This will allow us to provide you with updated kernels as we add more support.

    Finally, the question remains of what you’ll see when you first power it up. In Linux, it’s not at all common to have a first-boot setup screen where you create your user, set the time, and configure the network. That’s common in Windows and OS X, which come preinstalled, but under Linux that’s generally taken care of by the installer. As we mull the topic, we’re torn between creating a good desktop-style experience vs. making a practical embedded developer’s experience. A desktop-style experience would ship a blank-slate and prompt the user to create an account via a locally attached keyboard and monitor; however, embedded developers may never plug a monitor into their device, and instead prefer to connect via console or ssh, thereby requiring a default username, password and hostname. Either way, we want to create just a single firmware common across all platforms, and so special-casing releases to a particular target is the least desired solution. If you have an opinion, please share it in our user forum.


    Community
    We’re pleased to see that even before shipping, we have a few alpha developers who continue to be very active. In addition to Jon Nettleton (gfx), Russell King (also gfx), and Marek Vasut (u-boot), we have a couple of other alpha users’ efforts we’d like to highlight in this update.

    MyriadRF continues to move forward with their SDR solution for Novena. About three weeks ago they sent us pre-production boards, and they are looking good. We’ve placed a binding order for their boards, and things look on track to get them into our shop by November, in time for integration with the first desktop units we’ll be shipping. MyriadRF is working on a fun demo for their hardware, but I’ll save that story for them to tell :)

    The CrypTech group has also been developing applications with the help of Novena. The CrypTech project is developing a BSD / CC BY-SA 3.0 licensed reference design and prototype examples of a Hardware Security Module. Their hope is to create a widely reviewed, designed-for-crypto device that anyone can compose for their application and easily build with their own trusted supply chain. They are using Novena to prototype elements of their design.

    The expansion board highlighted above is a prototype noise source based on avalanche noise from the transistor that can be seen on the middle of the board. CrypTech uses that noise to generate entropy in the FPGA. The entropy is then combined with entropy generated by ring oscillators in the FPGA and mixed using e.g. SHA-512 to generate seeds. The seeds are then used to initialize the ChaCha stream cipher, ultimately resulting in a stream of cryptographically sound random values. The result is a high performance, state-of-the-art random number generator coprocessor. This of course represents just a first draft; since the implementation is done in an FPGA, the CrypTech team will continue to evolve their methodology and experiment with alternative methods to generate a robust stream of random numbers.

    Thanks to the CrypTech team for sharing a sneak-peek of their baby!

    Looking Forward

    From our current progress, it seems we’re still largely on track to release an initial shipment of bare boards to early backers in late November, and have an initial shipment of desktop units ready to go by late December. We’ll be shipping the units in tranches, so some backers will receive units before others.

    Our shipping algorithm is roughly a combination of how early someone backed the campaign, modified by which region of the world you’re in. As every country has different customs issues, we will probably ship just one or two items to each unique country first to uncover any customs or regulatory problems, before attempting to ship in bulk. This means backers outside the United States (where Crowd Supply’s fulfillment center is located) will be receiving their units a bit later than those within the US.
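As a rough illustration of that ordering, here is a minimal sketch. The `Order` fields, the `plan_tranches` function, and the choice to put pilot shipments in the same first wave as domestic orders are all hypothetical; the real process involves human judgment and is not this tidy.

```python
from dataclasses import dataclass


@dataclass
class Order:
    backer: str
    country: str
    backed_rank: int  # 1 = earliest backer (hypothetical field)


def plan_tranches(orders, home_country="US", pilots_per_country=1):
    """Split orders into a first wave and a later wave.

    First wave: one pilot shipment per foreign country (to flush out customs
    problems) plus all domestic orders. Later wave: remaining foreign orders.
    Within each group, earlier backers go first.
    """
    pilots, domestic, abroad = [], [], []
    pilots_sent = {}
    for order in sorted(orders, key=lambda o: o.backed_rank):
        if order.country == home_country:
            domestic.append(order)
        elif pilots_sent.get(order.country, 0) < pilots_per_country:
            pilots.append(order)
            pilots_sent[order.country] = pilots_sent.get(order.country, 0) + 1
        else:
            abroad.append(order)
    return pilots + domestic, abroad


orders = [
    Order("a", "DE", 1),
    Order("b", "US", 2),
    Order("c", "DE", 3),
    Order("d", "SG", 4),
    Order("e", "US", 5),
]
first_wave, later_wave = plan_tranches(orders)
print([o.backer for o in first_wave], [o.backer for o in later_wave])
```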

    And as a final note, if there’s one thing we’ve learned in the hardware business, it’s that you can’t count your chickens before they’ve hatched. Good progress to date doesn’t mean we’ve got an easy path to finished units. We still have a lot of hills to climb and rivers to cross, but at least for now we seem to be on track.

    Thanks again to all of our Novena backers, we’re looking forward to getting hardware into your hands soon!

    -bunnie & xobs

    Circuit Stickers Manufacturing Retrospective: From Campaign to First Shipment

    Tuesday, April 29th, 2014

    Last December, Jie Qi and I launched a crowdfunding campaign to bring circuit stickers under the brand name of “chibitronics” to the world.

    Our original timeline stated we would have orders shipped to Crowd Supply for fulfillment by May 2014. We’re really pleased that we were able to meet our goal, right on time, with the first shipment of over a thousand starter kits leaving the factory last week. 62 cartons of goods have cleared export in Hong Kong airport, and a second round of boxes is due to leave our factory around May 5, meaning we’ve got a really good chance of delivering product to backers by mid-May.

    Above: 62 cartons containing over a thousand chibitronics starter kits waiting for pickup.

    Why On-Time Delivery Is So Important
    A personal challenge of mine was to take our delivery commitment to backers very seriously. I’ve seen too many under-performing crowdfunding campaigns; I’m deeply concerned that crowdfunding for hardware is becoming synonymous with scams and spams. Kickstarter and Indiegogo have been plagued by non-delivery and scams, and their blithe caveat emptor attitude around campaigns is a reflection of an entrenched conflict of interest between consumers and crowdfunding websites: “hey, thanks for the nickel, but what happened to your dollar is your problem”.

    I’m honestly worried that crowdfunding will get such a bad reputation that it won’t be a viable platform for well-intentioned entrepreneurs and innovators in a few years.

    I made the contentious choice to go with Crowd Supply in part because they show more savvy around vetting hardware products, and their service offering to campaigns — such as fulfillment, tier-one customer support, post-campaign pre-order support, and rolling delivery dates based on demand vs. capacity — is a boon for hardware upstarts. Getting fulfillment, customer support and an ongoing e-commerce site as part of the package essentially saves me one headcount, and when your company consists of just two or three people that’s a big deal.

    Crowd Supply doesn’t have the same media footprint or brand power that Kickstarter has, which means it is harder to do a big raise with them, but at the end of the day I feel it’s very important to establish an example of sustainable crowdfunding practices that is better for both the entrepreneur and the consumer. It’s not just about a money grab today: it’s about building a brand and reputation that can be trusted for years to come.

    Bottom line is, if I can’t prove to current and future backers that I can deliver on-time, I stand to lose a valuable platform for launching my future products.

    On-Time Delivery Was not Easy
    We did not deliver chibitronics on time because we had it easy. When drawing up the original campaign timeline, I had a min/max bounds on delivery time spanning from just after Chinese New Year (February) to around April. I added one month beyond the max just to be safe. We ended up using every last bit of padding in the schedule.

    I made a lot of mistakes along the way, and through a combination of hard work, luck, planning, and strong factory relationships, we were able to battle through many hardships. Here’s a few examples of lessons learned.

    A simple request for one is not necessarily a simple request for another. Included with every starter kit is a fantastic book (free to download) written by Jie Qi which serves as a step-by-step, self-instruction guide to designing with circuit stickers. The book is unusual because you’re meant to paste electronic circuits into it. We had to customize several aspects of the printing, from the paper thickness (to get the right light diffusion) to the binding (for a better circuit crafting experience) to the little pocket in the back (to hold swatches of Z-tape and Linqstat material). Most of these requests were relatively easy to accommodate, but one in particular threw the printer for a loop. We needed the metal spiral binding of the book to be non-conductive, so if someone accidentally laid copper tape on the binding it wouldn’t cause a short circuit.

    Below is an example of how a circuit looks in the book — in this case, the DIY pressure sensor tutorial (click on image for a larger version).

    Checking for conductivity of a wire seems like a simple enough request for someone who designs circuits for a living, but for a book printer, it’s extremely weird. No part of traditional book printing or binding requires such knowledge. Because of this, the original response from the printer was “we can’t guarantee anything about the conductivity of the binding wire”, and sure enough, the first sample was non-conductive, but the second was conductive and they could not explain why. This is where face to face meetings are invaluable. Instead of yelling at them over email, we arranged a meeting with the vendor during one of my monthly trips to Shenzhen. We had a productive discussion about their concerns, and at the conclusion of the meeting we ordered them a $5 multimeter in exchange for a guarantee of a non-conductive book spine. In the end, the vendor was simply unwilling to guarantee something for which he had no quality control procedure — an extremely reasonable position — and we just had to educate the vendor on how to use a multimeter.

    To wit, this unusual non-conductivity requirement did extend our lead time by several days and added a few cents to the cost of the book, but overall, I’m willing to accept that compromise.

    Never skip a checkplot. I alluded to this poignant lesson with the following tweet:


    The pad shapes for chibitronics are complex polyline geometries, which aren’t handled so gracefully by Altium. One problem I’ve discovered the hard way is that the soldermask layer occasionally disappears for pads with complex geometry. One version of the file will have a soldermask opening, and in the next save checkpoint, it’s gone. This sort of bug is rare, but it does happen. Normally I do a gerber re-import check with a third-party tool, but since this was a re-order of an existing design that worked before, and I was in a rush, I skipped the check. Result? Thousands of dollars of PCBs scrapped, four weeks gone from the schedule. Ouch.

    Good thing I padded my delivery dates, and good thing I keep a bottle of fine scotch on hand to help bitter reminders of what happens when I get complacent go down a little bit easier.

    If something can fit in a right and a wrong way, the wrong way will happen. I’m paranoid about this problem; I’ve been burned by it many times before. The effects sticker sheet is a prime example of this problem waiting to happen. It is an array of four otherwise identical stickers, except for the LED flashing pattern they output. The LED flashing pattern is controlled by software, and trying to manage four separate firmware files and get them all loaded into the right spot in a tester is a nightmare waiting to happen. So, I designed the stickers to all use exactly the same firmware; their behaviors are set by the value of a single external resistor.

    So the logic goes: if all the stickers have the same firmware, it’s impossible to have a “wrong way” to program the stickers. Right?

    Unfortunately, I also designed the master PCB panels so they were perfectly symmetric. You can load the panels into the assembly robot rotated by pi radians and the assembly program runs flawlessly — except that the resistors which set the firmware behavior are populated in reverse order from the silkscreen labels. Despite having fiducial holes and text on the PCBs in both Chinese and English that are uniquely orienting, this problem actually happened. The first samples of the effects stickers were “blinking” where it said “heartbeat”, “fading” where it said “twinkle”, and vice-versa.
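
    The swap described above can be sketched in a few lines. This is only an illustration: the left-to-right order of the four effects on the sheet and the table name `EFFECT_TABLE` are assumptions made for the example, not the actual panel layout or firmware. It shows why a panel rotated by pi radians exchanges the outer pair and the inner pair of effects, and why reading the resistor table in reverse order undoes the swap.

```python
# Hypothetical left-to-right order of the four effects printed on the silkscreen.
SILKSCREEN_LABELS = ["blink", "fade", "twinkle", "heartbeat"]

# Assumption: one resistor value per effect, intended to be placed in label order.
EFFECT_TABLE = ["blink", "fade", "twinkle", "heartbeat"]


def populated_effects(rotated: bool) -> list[str]:
    """Effect selected by the resistor that ends up at each sticker position."""
    # On a symmetric panel loaded 180 degrees rotated, the resistor meant for
    # position i lands at position (n - 1 - i).
    return list(reversed(EFFECT_TABLE)) if rotated else list(EFFECT_TABLE)


def mismatches(rotated: bool, reverse_lookup: bool = False) -> dict[str, str]:
    """Map silkscreen label -> observed effect for positions that disagree."""
    observed = populated_effects(rotated)
    if reverse_lookup:
        # Sketch of a patch: interpret the resistor values in reverse order.
        n = len(EFFECT_TABLE)
        observed = [EFFECT_TABLE[n - 1 - EFFECT_TABLE.index(e)] for e in observed]
    return {
        label: effect
        for label, effect in zip(SILKSCREEN_LABELS, observed)
        if label != effect
    }
```

    With this assumed ordering, `mismatches(rotated=True)` pairs "blink" with "heartbeat" and "fade" with "twinkle", matching the symptoms on the first samples, and `mismatches(rotated=True, reverse_lookup=True)` comes back empty. The reversed lookup only helps if every panel is loaded the same way around.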

    Fortunately, the factory very consistently loaded the boards in backwards, which is the best case for a problem like this. I rushed a firmware patch (which is in itself a risky thing to do) that reversed the interpretation of the resistor values, and had a new set of samples fedexed to me in Singapore for sanity checking. We also built a secondary test jig to add a manual double-check for correct flashing behavior on the line in China. Although, in making that additional test, we were confronted with another common problem —

    Some things just don’t translate well into Chinese. When coming up with instructions to describe the difference between “fading” (a slow blinking pattern) and “twinkling” (a flickering pattern), it turns out that the Chinese translations for “blink” and “twinkle” are similar. Twinkle translates to 闪烁 (“flickering, twinkling”) or 闪耀 (“to glint, to glitter, to sparkle”), whereas blink translates to 闪闪 (“flickering, sparkling, glittering”) or 闪亮 (“brilliant, shiny, to glisten, to twinkle”). I always dread making up subjective descriptions for test operators in Chinese, which is part of the reason we try to automate as many tests as possible. As one of my Chinese friends once quipped, Mandarin is a wonderful language for poetry and arts, but difficult for precise technical communications.

    Above is an example of the effects stickers in action. How does one come up with a bulletproof, cross-cultural explanation of the difference between fading (on the left) and twinkling (on the right), using only simple terms anyone can understand, e.g. avoiding technical terms such as random, frequency, hertz, periodic, etc.?

    After viewing the video, our factory recommended using “渐变” (gradual change) for fade and “闪烁” (flickering, twinkling) for twinkle. I’m not yet convinced this is a bulletproof description, but it’s superior to any translation I could come up with.

    Funny enough, it was also a challenge for Jie and me to agree upon what a “twinkle” effect should look like. We had several long conversations on the topic, followed up by demo videos to clarify the desired effect. The implementation was basically tweaking code until it “looked about right”; Jie described our first iteration of the effect as “closer to a lightning storm than twinkling”. Given the difficulty we had describing the effect to each other, it’s no surprise I’m running into challenges accurately describing the effect in Chinese.

    Eliminate single points of failure. When we built test jigs, we built two copies of each, even though throughput requirements demanded just one. Why? Just in case one failed. And guess what, one of them failed, for reasons as yet unknown. Thank goodness we built two copies, or I’d be in China right now trying to diagnose why our sole test jig isn’t working.

    Sometimes last minute changes are worth it. About six weeks ago, Jie suggested that we should include a stencil with the sensor/microcontroller kits. She reasoned that it can be difficult to lay out the copper tape patterns for complex stickers, such as the microcontroller (featuring seven pads), without a drawing of the contact patterns. I originally resisted the idea — we were just weeks away from finalizing the order, and I didn’t want to delay shipment on account of something we didn’t originally promise. As Jie is discovering, I can be very temperamental, especially when it comes to things that can cause schedule slips (sorry Jie, thanks for bearing with me!). However, her arguments were sound and so I instructed our factory to search for a stencil vendor. Two weeks passed and we couldn’t find anyone willing to take the job, but our factory’s sourcing department wasn’t going to give up so easily. Eventually, they found one vendor who had enough material in stock to tool up a die cutter and turn a couple thousand stencils within two weeks — just barely in time to meet the schedule.

    When I got samples of the sensor/micro kit with the stencils, I gave them a whirl, and Jie was absolutely right about the utility of the stencils. The user experience is vastly improved when you have a template to work from, particularly for the microcontroller sticker with seven closely spaced pads. And so, even though it wasn’t promised as part of the original campaign, all backers who ordered the sensor/micro kit are getting a free stencil to help with laying out their designs.

    Chinese New Year has a big impact on the supply chain. Even though Chinese New Year (CNY) is a 2-week holiday, our initial schedule essentially wrote off the month of February. Reality matched this expectation, but I thought it’d be helpful to share an anecdote on exactly how CNY ended up impacting this project. We had a draft manuscript of our book in January, but I couldn’t get a complete sample until March. It’s not because the printer was off work for a month straight; their holiday, like everyone else’s, was about two weeks long. However, the paper vendor started its holiday about 10 days before the printer, and the binding vendor ended its holiday about 10 days after the printer. So even though each vendor took two weeks off, the net supply chain for printing a custom book was out for holiday for around 24 days, effectively the entire month of February. The staggered observance of CNY is necessary because of the sheer magnitude of human migration that accompanies the holiday.

    Shipping is expensive, and difficult. When I ran the initial numbers on shipping, one thing I realized is that we weren’t selling circuit stickers; at least by volume and weight, our principal product is printed paper (the book). So, to optimize logistics cost, I was pushing to ship starter kits (which contain a book) and additional stand-alone book orders by ocean, rather than air.

    We actually had starter kits and books ready to go almost four weeks ago, but we just couldn’t get a reasonable quotation for the cost of shipping them by ocean. We spent almost three weeks haggling and quoting with ocean freight companies, and in the end, their price was basically the same as going by air, but would take three weeks longer and incur more risk. It turns out that freight cost is a minor component of going by ocean, and you get killed by a multitude of surcharges, from paying the longshoremen to paying all the intermediate warehouses and brokers that handle your goods at the dock. All these fixed costs add up, such that even though we were shipping over 60 cartons of goods, air shipping was still a cost-effective option. For scale, a Maersk 40′ sea container will fit over 1250 cartons each containing 40 starter kits, so we’re still an order of magnitude away from being able to efficiently utilize ocean freight.

    We’re not out of the woods yet. However excited I am about this milestone, I have to remind myself not to count my chickens before they hatch. Problems ranging from a routine screw-up by UPS to a tragic aviation accident to a logistics problem at Crowd Supply’s fulfillment depot to a customs problem could stymie an on-time delivery.

    But, at the very least, at this point we can say we’ve done everything reasonably within our power to deliver on-time.

    We are looking forward to hearing our backers’ feedback on chibitronics. If you are curious and want to join in on the fun, the Crowd Supply site is taking orders, and Jie and I will be at Maker Faire Bay Area 2014, in the Expo hall, teaching free workshops on how to learn and play with circuit stickers. We’re looking forward to meeting you!

    The Factory Floor Part 4 of 4:
    Picking (and Maintaining) a Partner

    Tuesday, January 29th, 2013

    Just like the wands from Harry Potter, a good factory chooses you as much as you choose them. Forget the term “vendor” and replace it with “partner”: if you’re doing it right, you aren’t simply instructing the factory; there should be a frank dialog about the trade-offs involved, and how things can be improved. Furthermore, a healthy relationship with a factory can lead to better payment terms, which improves cash flow. In some cases, factory credit can directly replace raising venture capital, taking loans, or Kickstarting. As a result, I treat good factories with the same respect as investors and partners in a business.

    Here are some basic things to remember when forming a relationship with a factory.

    • “It’s easy to know the cost, but hard to know the price”. Cost reduction is critical for any business, but nobody can make up a loss with volume. When negotiating prices with a factory, take a step back and check if everything makes sense. If a quote seems too good to be true, it often is. Factories that lose money on a deal will stop at nothing to make it back. Many manufacturing horror stories have roots in unhealthy cost structures – a factory’s first prerogative is survival, even if it means mixing defective units into lots to boost margin, or assigning novice engineers to a flagging project to better monetize their seasoned engineers on more profitable customers.
    • “If you can’t talk with the boss, you’re nobody”. Work with a factory too big, and you risk getting lost in bureaucracy, and pushed out of the line at critical times by bigger customers. Work with a factory too small, and they can’t provide the services you need. My rule for right-sizing a factory is to pick the biggest facility where you can get direct access to the lao ban (factory boss) on a regular basis. It’s a good sign if on the first meeting, the lao ban is there to give you a tour and asks astute questions about your business over lunch.
    • “Light is the best disinfectant”. If a factory will not quote with an “open BOM”, i.e., a quotation where the cost of every component, process, and margin is explicitly disclosed (not the same use of the word “open” as in the F/OSS context), I won’t work with them. Cost reduction discussions cannot function without transparency; there are too many places to bury costs otherwise. Likewise, if cost discussions seem to be turning into a game of “whack-a-mole” where reduced costs on one line item are inexplicably popping up in another item, run away.

    Quotations

    A quote should call out the price of each part, the excess for the job, labor, overhead, and NRE. Here are some of the fine points to understand about quotations that are not immediately obvious:

    • “Excess” is the result of what I call the hot dogs-and-buns problem. Hot dogs come in packs of 10, but buns come in packs of 8. So unless one buys 40 servings, there are going to be leftover buns or hot dogs. Likewise, many components come only in 3,000 piece reels, so a 10,000 piece build will conclude with 2,000 pieces of excess (four reels equals 12,000 pieces). “Cut tape” (or partial reels) exists, but the cost per part of cut tape is much higher, as this just shifts the risk of excess material onto the distributor. Excess isn’t all bad – excess can be folded into future runs of a product. So, as long as a decent run rate is sustained, the excess inventory turns into cash on a regular basis. However, at some point production will end or pause, and the bill for the excess will arrive, putting a crimp on cash flow. If a quote is lacking an excess column, it’s possible the factory is charging for the full reel but keeping the excess for their own purposes (this is where many of the gray market goods in Shenzhen come from); or they will just send an unexpected invoice for it down the road. In my opinion, it’s best to get that all out there up front so as to build a complete cradle-to-grave business model.
    • Labor costs are devilishly tricky to estimate. However, the good news is that for high tech assemblies, labor is typically a small fraction of total cost. The labor cost of assembling a straightforward board with 200 parts on it in small volumes in China may be about $2-3, whereas the cost of doing it in the US is closer to $20-$30. So even if labor prices double overnight in China and halve in the US, China may still be competitive. This is in contrast to the lower-value goods moving out of China (such as textiles), where the base value of the raw material is already low so labor costs are a significant portion of the final product cost. I usually don’t argue too much over labor costs, since the end result of scrimping on labor is often lowered quality, and pushing too hard over labor costs can force the factory to reduce the workers’ quality of life by trimming benefits.
    • Factory margin is also a bit of an art to negotiate. The fair margin for a factory depends on how much value they’ve added, and the volume of production. There are no hard and fast rules for margin. Although I give guidance here, remember there are always exceptions to the rule, and everyone has a special deal that can be cut. Also, the definition of “margin” varies depending on the facility. Some facilities include scrap, handling overhead, and even R&D expense into the “margin”, whereas others may break those out on separate lines, so it’s important to look at the big picture and use some common sense when reviewing a quotation. In general, margin will range from single-digit to low double-digit percentages depending upon volume, value add and project complexity. For very low quantity production lots (~1k pieces) there may also be a per-lot “line fee” charged. This fee partially defrays the cost of setting up an assembly line only to tear it down after running for a short period of time. A line’s throughput may be very fast, producing hundreds to thousands of units a day, but it also takes days to set up.
    • NRE, or “non-recurring engineering” – these are one-time fees required to set up a production run, such as stencils, SMT programming, jigs and test equipment. Note that the re-use of test equipment between customers is considered bad practice, so if a multimeter is required as part of a production test, don’t be surprised if a bill for a multimeter is tacked onto the NRE. This is due to customers having drastically varying standards around the maintenance and use of test equipment.
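
    The excess arithmetic from the first bullet is easy to automate when checking a quote. Here is a minimal sketch; the function name `reel_excess` and its parameters are made up for illustration, and it assumes parts can only be bought in whole reels.

```python
import math


def reel_excess(build_qty: int, parts_per_unit: int = 1, reel_size: int = 3000):
    """Return (reels to buy, leftover pieces) for one BOM line.

    Assumes whole reels only; cut tape is not modeled.
    """
    needed = build_qty * parts_per_unit
    reels = math.ceil(needed / reel_size)
    return reels, reels * reel_size - needed


# The example from the text: a 10,000 piece build from 3,000 piece reels.
reels, excess = reel_excess(10_000)

# Hot dogs (packs of 10) and buns (packs of 8) only come out even at the
# least common multiple of the two pack sizes.
servings_with_no_leftovers = math.lcm(10, 8)
```

    Running this per BOM line and summing the leftover pieces times unit price gives a rough figure to compare against the excess column on a quotation.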

    Miscellaneous Advice

    Here are a few final parting thoughts to keep in mind.

    • Have an understanding of how scrap or exceptional yield loss is handled. There are a few schools of thought around this. Ideally, one only pays for good, delivered items, and the factory bears the burden of defectivity. This gives the factory an incentive to maintain a high production quality, because every percent of defectivity eats away at their margin. However, if the design has a flaw or is too hard to build, and defectivity is high, the factory may start shipping lower quality units as a desperate measure to meet production and margin targets. They may also start gray-marketing defective goods to recover cost, leading to brand reputation problems down the road. It’s good to have some sort of an understanding on how to handle such a contingency ahead of time. This may include, for example, a dedicated “scrap” line item inside the quotation to handle defectivity explicitly.
    • On the subject of scrap & yield, it’s a good idea to order more units than the proven demand. These extras go toward handling returns and exchanges. Despite best efforts, mistakes do happen; sometimes they aren’t your fault, such as shipping damage. Ordering 1,000 pieces to fulfill a 1,000 piece Kickstarter campaign means returns and exchanges can be handled with only refunds, as it’s just not practical to fire up the factory to make a dozen replacement units. Thus, as a general rule, I order a few percent excess beyond the customer deliverable, so that I have stock on hand to handle returns and exchanges. Units that don’t get used up by the returns process then turn into demo loaners or business development give-aways to drum up the next set of orders!
    • Keep an eye on shipping costs. These fees aren’t typically built into a quotation, but they impact the bottom line (greatly so for low-volume products). Fedex is a great tool to save time, but it’s also a very expensive addiction. Courier fees can easily wash out the profit on a small project, so manage those costs. Pro tip: couriers will offer discounts to frequent shippers, but you have to call in to negotiate the special rates.
    • Duties. Keep in mind that components imported to China without an import license are levied a 23% or so automatic duty on their value. The general rule for China is dutiable on import, duty free on export. If stuff is accidentally shipped across the border to Hong Kong, expect to pay a duty to get it back into China. Customs brokers can work the angles – for example, some brokers can get goods taxed by their weight and not their value, which for microelectronics is typically a good deal. I haven’t figured out all the customs rules, as they seem to be a moving target – every month it seems there is a new rule, fine, exceptional fee or tariff to deal with. There are also plenty of shady ways to get goods into China, but I sleep better at night knowing I do my best to comply with every rule. The reason quotations don’t include duties is that it’s assumed by default there will be an import license. Import licenses enable the duty-free import of goods. However, import licenses cost a few thousand bucks, take weeks to process, and have no room for flexibility, as they are tied to an exact BOM for the product. Small ECOs can invalidate a license – customs officers are known to count the number of decoupling caps on a PCB, and if it doesn’t match the count in the license, a fine is levied and the license is invalidated. Even deviations in the material used to line the decorative box can invalidate a license. This import license scheme favors high-volume producers, and punishes low-volume producers.

    As one can see, going to China isn’t for everyone. Particularly for those based in the US, the overhead of courier fees, travel, duties, and late-night concalls adds up rapidly. As a rule of thumb, a US designer is better off assembling PCBs in the US for volumes less than 1k, and they don’t start seeing clear advantages until perhaps 5k-10k volumes. That math shifts in China’s favor as processes such as injection molding and chassis assembly come into play, due to the immense amount of expertise China has accumulated in these labor-intensive processes. Also, the break-even point can be much lower for those living in or near China, as courier fees, travel, and time zone impact are all a small fraction of what they are coming from the US. This compounds with the fact that locals are more effective at leveraging the component ecosystem in China, leading to further cost reductions compared to a design produced using only parts available in the US ecosystem. On the other hand, physically large assemblies or systems built using lots of dutiable components may be cheaper to build domestically, as it saves on shipping costs and tariffs. In the end, one should keep an open mind and try to consider all the possible secondary costs and benefits of domestic vs. foreign production before deciding where to park production.

    The Factory Floor, Part 3 of 4:
    Industrial Design for Startups

    Saturday, January 19th, 2013

    The geek tour continues on. Akiba has new posts up covering our visit to a motor factory, Huawei, CTS, and also a side trip to get full custom clothes and bags tailor made. The photos from the motor factory and the custom tailoring expedition came out particularly well.

    And now, on to part 3 of 4 of “The Factory Floor” series…

    Industrial Design for Startups:
    Guerrilla Engineering on a Shoestring Budget

    Sex sells. The performance of a CPU or amount of RAM in a box, to within a factor of two or so, is less important to a typical consumer than how the device looks. Apple devices command a hefty premium in part because of their slick industrial design, and many product designers aim to emulate the success of Sir Jony Ive in their own products.

    There are many schools of thought in industrial design. One school invokes the monastic designer, coming up with a beautiful, pure concept, and the only thing the production engineers can do is spoil the purity of the design. Another school invokes the pragmatist designer, working closely with the production engineers, hammering out gritty compromises to produce an inexpensive and high-yielding design.

    In my experience, neither extreme is compelling. The monastic approach often results in an unmanufacturable product that is either late to market or exorbitant to produce. The pragmatist approach often results in a cheap look and feel, to which consumers have trouble assigning a significant value. The real trick is understanding how to strike a balance between the two.

    Trim and finish are difficult, and therefore a point of distinction when it comes to design. The current design fad is minimalism, with an emphasis on “honest” finishes. An honest finish features the natural properties of the material systems in play, and eschews the use of paints and decals. Minimalist, honest designs are very hard to manufacture. Minimal designs have…well, minimal, features – and as a result even tiny blemishes stand out. Honest finishes likewise can be very difficult, as an honest finish means no paint: all the burrs, gates, sinks, knits, scoring and flow lines that are a fact of life in manufacturing are laid naked before the consumer. As a result, this school of design requires well-made tools that are constantly checked and maintained throughout production.

    If you don’t have pockets deep enough to invest in new equipment and capabilities on behalf of your factory (i.e., if you’re not a Fortune 500 company), the first step is to learn the vocabulary available. A design vocabulary is defined by the capabilities of the factory or factories producing the goods. What materials, what finish, what tolerances are achievable, what fastening technology is available – these are all heavily dependent upon the processes available.

    Therefore, I find that visiting a factory in person early in the design process results in a better design. In a factory visit, some design vocabulary will be discarded, but some new vocabulary will be discovered as well – the engineers who work the factory day in and day out develop process innovations that can open up novel design possibilities that are not knowable without the on-site visit.

    The chumby One contains a concrete example of the impact manufacturing process can have on design outcome. In the original concept art, the blue highlight around the front edge was added to evoke the feeling of a speech balloon, like those used in captioning comics – the idea being the chumby is captioning your world with snippets from the Internet.

    It turns out the implementation of such a blue trim across a raised surface is very hard. At the first factory, we implemented the highlight using paint. Silk screening was not an option because the shape wasn’t flat enough. Pad printing can handle curved surfaces, but the alignment wasn’t good enough, as the tiniest bleed over the edge looked terrible from the side. Decals and stickers likewise could not achieve the alignment required. In the end, a small channel had to be carved to contain the paint, and a stencil plus spray paint process was employed to create the highlight. The yield was terrible – in some lots, over 40% of the cases were being thrown away due to painting errors. Fortunately, plastic is cheap, so throwing away every other case after painting had a net cost impact of about $0.35.

    Mid-way through production, we migrated to a second source facility. They had a different plastic molding capability, and unlike the first factory, the second facility could do double-shot molds. Double-shot molds have twice the number of tools, but they can injection mold two different colors, or even different materials, into the same mold. Thus, at the new factory, we opted to use a double-shot process for the thin blue strip, instead of painting. The results were stunning. Every unit came off the line with a sharp, crisp blue line; and no paint meant a more honest, clean finish. However, the cost per case jumped to $0.94 apiece due to the more expensive process, despite the 100% yield. In fact, it would be cheaper to throw away more than half of the painted cases, but even the best painted cases could not compare to the quality of the finish delivered by the double-shot tool.
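
    The yield-versus-process trade-off above comes down to cost per good unit, which is unit cost divided by yield. The sketch below is a rough reconstruction, not the actual quote: it assumes the $0.35 figure can be read as the approximate cost of one painted case (if throwing away every other case adds about $0.35 per good case, each painted case costs about that much to make). The $0.94 double-shot figure is from the text.

```python
def cost_per_good_unit(unit_cost: float, yield_fraction: float) -> float:
    """Cost of each shippable unit when scrapped units are paid for too."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    return unit_cost / yield_fraction


PAINTED_UNIT_COST = 0.35   # assumption inferred from the text, in USD
DOUBLE_SHOT_COST = 0.94    # from the text, at 100% yield, in USD

# Painted cases with every other unit thrown away.
painted_at_half_yield = cost_per_good_unit(PAINTED_UNIT_COST, 0.5)

# Yield below which painting would cost more per good case than double-shot.
break_even_yield = PAINTED_UNIT_COST / DOUBLE_SHOT_COST
```

    Under that assumption, painting stays cheaper per good case until the paint yield falls below roughly 37%, which is consistent with the remark that more than half of the painted cases could be thrown away. The finish quality, not the cost, is what justified the double-shot tool.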

    Another great example of how tweaking a factory process can improve a product’s appearance is the Arduino motherboard. The wonderfully detailed artwork on the back side, sporting an outline of Italy and very fine lettering, isn’t silkscreen. They actually put on two layers of soldermask, one blue, and one white. Because soldermask is applied using a photolithographic process, the resolution, consistency and alignment of the artwork is much better than a silkscreen. And since an Arduino’s look is the circuit board, it gives the product a distinctive high-quality look that is difficult to copy using conventional processing methods.

    Thus, the process capability of the factory – painting vs. double-shot molding, double soldermasking vs. silkscreening – can have a real effect on the outcome of a product’s perceived quality, without a huge impact on cost. However, a factory may not appreciate the full potential of their processes, and so it requires a designer’s direct interaction to realize the potential. Unfortunately, many designers don’t visit a factory until something has gone wrong, at which point the tools are cut and even if they see a cool process that could solve all their problems, it’s often too late.

    Design is an intensely personal activity, and as a result every designer will develop their own process. This is the general process I might use to develop a product on a tight, startup budget:

      1. Every design starts with a sketchbook. First, decide on the soul and identity of the design, and pick a material system and vocabulary that suits your concept. But don’t fall in love with it…

      2. Break the design down by material system, and identify a factory capable of producing each material system.

      3. Visit the facility, and take note of what is actually running down the production lines. Don’t get too drawn in by the sample room or one-off bits. Practice makes perfect, and from the operators to the engineers they will do a better job of executing things they are doing on a daily basis than reaching deep and exercising an arcane capability.

      4. Re-evaluate the design based on a new understanding of what’s possible, and iterate. This may require going back to step 1, or it may just require small tweaks. But this is the stage at which it’s easiest to make compromises without sacrificing the purity of the design.

      5. Rough out the details of the design – pick parting lines, sliding surfaces, finishes, fastening systems, etc. based upon what the factory can do best.

      6. Pass a revised drawing to the factory, and work with them to finalize details such as draft angles, fastening surfaces, internal ribbing, etc.

      7. Validate the design using a 3D print and extensive 3D model checks.

      8. Identify features prone to tolerance errors, and trim the initial tool so that the tolerance favors “tool-safe” modifications. For example, in injection molding it is easier to remove steel from a tool than to add it, so target the initial test shot to have too little plastic rather than too much on critical dimensions. A button is an example of a mechanism that benefits from tuning: it’s hard to predict from CAD or 3D prints exactly how a button will feel, and getting that tactile feel just perfect usually requires a little trimming of the tool.

    The Factory Floor, Part 2 of 4:
    On Design for Manufacturing

    Monday, January 14th, 2013

    Akiba has new posts covering day 3 and day 4 of my “geek tour” course for MIT Media Lab students, held in Shenzhen, China. His website has been a little bogged down with traffic, so we’re thinking about migrating to a different server. Unfortunately, being inside the GFW (the Great Firewall of China) makes administering anything in the cloud a challenge.

    Anyways, here’s the second installment of my four-part series titled “The Factory Floor”.

    Process optimization: design for manufacturing and test jigs

    It’s time to visit the topic of yield. This is a boring subject for many engineers, but for entrepreneurs, success or failure will be determined in part by achieving a reasonable yield. Unlike software, every copy of a physical good will have slight imperfections. Sometimes the imperfections will cancel out; and sometimes the imperfections gang up and degrade performance. As production volume ramps, these corner cases start adding up and a certain fraction of product ends up non-salable. In a robust design, the failing fraction may be so small that functional tests can be simplified, leading to further cost reductions. In contrast, designs sensitive to component tolerances will require extensive testing, and will suffer heavy yield losses. Reworking the defective units incurs extra labor and parts charges, ultimately leading to profit erosion.

    Thus, a major challenge of moving from the engineering bench to mass production is re-designing to improve robustness in the face of normal manufacturing tolerances. This is called “design for manufacturing”, or DFM.

    Examples of tolerances to consider during the design process include:

    • Passive component tolerances (e.g. resistance +/- 5%, capacitance +80/-20%)
    • Spec sheet parameters that vary widely (such as hFE for bipolar transistors, Vt for FETs, Vf for LEDs). Always read the datasheet and keep an eye out for parameters that have a wide min-max spread. For example, the min-max on hFE for Fairchild’s 2N3904 ranges from 40 to 300, and the Vf on a superbright LED from Kingbright goes between 2 and 2.5V.
    • Voltage margins – particularly important for capacitors and input networks. As a rule of thumb, I try to spec capacitors with roughly 2x headroom over nominal voltage, so I will try to use 10V caps for 5V rails and 6.3V caps for 3.3V rails. The reason is that many ceramic capacitor dielectrics lose capacitance (derate) with increasing voltage. This means that ceramic capacitors operated near their rated max voltage will see their working capacitance pushed toward the negative end of the tolerance range. Also, input networks – anything a user can plug something into – are subject to punishing ESD and other transient abuses, and special attention needs to be paid there to achieve the desired reliability.
    • PCB trace widths and layer stack variations – impacts systems requiring matched impedance, or dealing with high currents.
    • Mechanical tolerances – a case designed to fit a PCB with zero tolerance will result in the factory forcing PCBs into the case half the time, when either the PCB was cut a little large or the case came out a little small. This can lead to unintentional mechanical damage of the circuitry or the case.
    • Cosmetic blemishes – any manufactured product is subject to small blemishes, such as dust trapped in plastics, small scratches, sink marks, and abrasions. It’s important to work out the acceptance criteria for such defects ahead of time – e.g., not more than two dot blemishes larger than 0.2mm per unit, no scratch longer than 0.3mm, etc. – so that the process can be crafted to avoid such defects, as opposed to the more expensive alternative of just building units and throwing away the ones that don’t meet a set of criteria imposed late in the game. Of course, nothing comes for free – to do things on the cheap, avoid high-gloss finishes and consider using matte/textured finishes that naturally hide blemishes.

    DFM Improves the Bottom Line
    Let’s return to our LED blinker case study from part 1 of the series. Let’s say the prototype design calls for an array of three LEDs in parallel, each with its own current-set resistor. As noted above, Vf, the forward bias voltage of an LED at a given brightness, can vary by perhaps 20% between devices – in this case, from 2.0 to 2.5V. A design that uses resistive current limiting will amplify this variation. This is because an efficient circuit would drop a minority of the voltage across the current limiting resistor, leaving the parameter that sets the current – the voltage drop across the resistor – more sensitive to the variation in Vf. Since the brightness of an LED is proportional to the current flowing through the LED – not the voltage – the use of resistive current limiting to set LED brightness can lead to jarring inconsistencies in LED brightness uniformity.

    The above chart illustrates how a 20% LED Vf variation leads to a 40% change in the voltage across a current-set resistor for a fixed 3.3V supply, which will in turn lead to a 40% change in the current flowing through the LED, finally manifesting as a 40% change in perceived brightness.
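
    To put numbers on this, here is a minimal Python sketch of the arithmetic. The 3.3V supply and the 2.0 to 2.5V Vf spread come from the text; the 10mA target current and the function names are assumptions made up for illustration.

```python
SUPPLY_V = 3.3               # fixed supply, from the text
VF_MIN, VF_MAX = 2.0, 2.5    # LED forward-voltage spread, from the text
TARGET_MA = 10.0             # hypothetical design current at the low-Vf corner


def resistor_for(supply_v, vf, current_ma):
    """Current-set resistor (ohms) that gives current_ma at the given Vf."""
    return (supply_v - vf) / (current_ma / 1000.0)


def led_current_ma(supply_v, vf, r_ohms):
    """LED current (mA) through a fixed resistor for a given Vf."""
    return (supply_v - vf) / r_ohms * 1000.0


# Size the resistor for a low-Vf LED, then see what a high-Vf LED gets.
r = resistor_for(SUPPLY_V, VF_MIN, TARGET_MA)
i_low_vf = led_current_ma(SUPPLY_V, VF_MIN, r)
i_high_vf = led_current_ma(SUPPLY_V, VF_MAX, r)
drop = 1.0 - i_high_vf / i_low_vf

print(f"resistor: {r:.0f} ohm")
print(f"low-Vf LED: {i_low_vf:.2f} mA, high-Vf LED: {i_high_vf:.2f} mA")
print(f"current drop: {drop:.0%}")   # roughly 38%, i.e. the ~40% in the text
```

    The resistor only sees 1.3V at one corner and 0.8V at the other, which is why a 0.5V shift in Vf turns into such a large relative change in current.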

    Such a design may work well most of the time – the problem is only pronounced when by chance a high Vf unit is paired with a low Vf unit. So for the one or two units prepared on the lab bench, things looked great. However, a meaningful fraction of units may have brightness uniformities so bad there is no choice but to reject the units. Given that most large hardware businesses have to survive on lean margins, losing even 10% of finished goods to defectivity is a terrible outcome.

    One stopgap is to re-work the failed material. A factory can identify the LED that is too dim or too bright in an array, and replace it with a new one that may have a better chance of matching its cohorts. However, this rework drives up costs, and results in an unexpected and unpleasant invoice at the 11th hour of a manufacturing program. Naïve designers may be inclined to blame the factory for poor quality and argue over who should bear the cost, but it’s better to proactively avoid these kinds of problems by subjecting every design to a DFM check, and using a small pilot run to sanity-check yield before punching out a whole bunch of units.

    The cost of yield fallout quantifies how much money to spend on extra circuitry to compensate for normal component variability. For example, a $10 COGS product that is yielding 80% good units has an effective cost per salable unit of $12.50 (calculated using COGS x total units built / yielded units). Therefore, increasing the COGS by $2.50 to improve yield to 100% breaks even, and spending $1 to improve yield to 99% improves the bottom line by about $1.39 per salable unit ($12.50 versus $11 / 0.99 = $11.11).
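
    The same calculation can be sketched in a few lines of Python. The function name is an assumption for illustration; the formula is the one given above, since COGS x total units built / yielded units is simply COGS divided by the yield fraction.

```python
def cost_per_salable_unit(cogs, yield_fraction):
    """Effective cost of each good unit once scrapped units are paid for."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return cogs / yield_fraction


baseline = cost_per_salable_unit(10.00, 0.80)     # $10 COGS at 80% yield
break_even = cost_per_salable_unit(12.50, 1.00)   # spend $2.50, reach 100%
improved = cost_per_salable_unit(11.00, 0.99)     # spend $1.00, reach 99%

print(f"baseline:   ${baseline:.2f}")
print(f"break-even: ${break_even:.2f}")
print(f"improved:   ${improved:.2f} (saves ${baseline - improved:.2f} per unit)")
```

    Plugging in a pilot run’s measured yield instead of the 80% figure gives a quick ceiling on how much a DFM fix is allowed to cost.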

    In the case of the LED flasher, the dollar could be spent on a current-feedback boost regulator IC, allowing the LEDs to be stacked in series instead of parallel, so that each LED is guaranteed to have the same current flowing through it, thereby leading to greatly improved lighting uniformity. While the cost of the boost regulator is much greater than the penny spent on three current limiting resistors, the improvement in manufacturing yield more than pays for the extra component costs.

    Test for Success
    The other often-neglected responsibility of a designer is the test program. A factory can only detect the problems they are instructed to look for. Therefore, every feature must be tested, no matter how trivial. For example, on a chumby device, every user-facing feature had an explicit factory test – LCD, touchscreen, audio, microphone, all the expansion ports (USB, audio), battery, buttons, knobs, etc. etc. Even the simplest buttons had to be tested. While it’s tempting to skip testing such simple components, I guarantee if it’s not tested, it will lead to returns.

    And no, do not outsource the test program to the factory, even if they offer the service. First of all, the factory often doesn’t understand your design intent, so their test programs will either be inefficient, or they will test for the entirely wrong behavior. Also, factories have an incentive to pass as much material as possible, as quickly as possible, so factory-created test programs tend to be primitive and inadequate.

    As a rule of thumb, for every product you make, you’re actually making two related products: one for the end user, and a test for the factory. In many ways, the test for the factory has to be as user-friendly and foolproof as the product itself – after all, tests are not run by electrical engineers. However, the related testing product will be much quicker to build if adequate testability features are designed into the consumer product.

    Here are some guidelines when it comes to designing a test program:

    • Strive for 100% feature coverage. It’s often easy to overlook simple or secondary features – status LEDs, an internal voltage sensor, etc. As a sanity check, look at the device and list every way a consumer can interact with the device. Ask if the test program addresses every interaction surface, even if superficially – is every LED lit, every button pressed, every sensor stimulated and every memory device touched? If the product has a microcontroller, it’s also helpful to review which drivers are loaded to cross-check the test list. Finally, do a schematic review and look at every port and consider key internal nodes to monitor as part of the test.
    • Minimize incremental setup effort. In other words, optimize the amount of time required to set up the test for each unit. This is often done through jigs that employ pogo pins or pre-aligned connector arrays. A test that requires an operator to manually probe a dozen test points or insert a dozen connectors is time consuming and prone to manual error. Most factories in China can help design the jig for a nominal cost, but jig design is easier and more effective if the design is provisioned with adequate test points.
    • Automate test execution in a linear execution flow. Ideally, a test just runs with a single button press, and then produces a pass/fail result. In practice, there will always be stop points that require operator intervention. An example of too much intervention is requiring an operator to key in or select an SSID from a list every time during a test for wifi connectivity. Instead, fix the test target SSID and hard-code that value into the test script so that the connection cycle is automatic.
    • Use icons and colors to communicate with operators, not text. Not every operator is guaranteed to be literate in a given language.
    • Employ audit logs. Record test results correlated to device serial numbers by incorporating a barcode scanner into the test rig. An alternative is to create a “test coupon” or a locally stored audit log to prove which units have had the test run successfully. This gives some hints as to what went wrong when a consumer returns a failed product. It also gives a quick method to check that the test procedure is being executed on all products. After an eight-hour shift of running tests, an operator is prone to making mistakes, such as putting a defective unit accidentally into the good units’ bin. Having a quick way to check that every product that ships has been subjected to and passed the full test can help identify and isolate such problems.
    • Provide an easy update mechanism. Like any program, test programs also have bugs, and tests also need to evolve as the product has patches and upgrades applied. Thus it is imperative to have a mechanism to update and fix test programs without having to visit the factory every time in person. Many of my test fixtures have a mode where they can “phone home” via a VPN where I can then ssh into the jig itself to fix bugs: even my simplest jigs employ a Linux laptop at their core.

    The guidelines above are easy to implement if the product is designed with testability in mind. As most of the products I design run Linux, I leverage the processor inside the product itself to run the majority of the tests and manage the test UI. For products that lack user interaction surfaces, I will use an Android phone or a laptop connected via wifi or serial as the test UI.
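
    As an illustration only, a linear, single-button test flow with an audit log might be sketched in Python as below. Everything named here (the SSID, the placeholder test functions, the log format and file name) is a hypothetical stand-in and not taken from any real jig; real tests would talk to hardware where the placeholders return a fixed value.

```python
import json
import time

TEST_SSID = "factory-test-ap"   # hypothetical fixed SSID, hard-coded on purpose


def check_leds():
    return True                 # placeholder: would drive each LED and confirm


def check_buttons():
    return True                 # placeholder: would wait for each button press


def check_wifi():
    return bool(TEST_SSID)      # placeholder: would associate with TEST_SSID


TESTS = [("leds", check_leds), ("buttons", check_buttons), ("wifi", check_wifi)]


def run_tests(serial, tests, log_path):
    """Run every test in order, then append one audit record for this serial."""
    results = {}
    for name, test_fn in tests:
        try:
            results[name] = bool(test_fn())
        except Exception:
            results[name] = False       # a crashing test counts as a failure
    record = {
        "serial": serial,
        "timestamp": time.time(),
        # An empty test list must never count as a pass.
        "passed": bool(results) and all(results.values()),
        "results": results,
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    outcome = run_tests("SN000123", TESTS, "audit_log.jsonl")
    print("PASS" if outcome["passed"] else "FAIL")
```

    The serial number would come from the barcode scanner, and the final PASS/FAIL would be shown to the operator as a color or icon instead of text. Keeping one log line per unit makes it easy to check later that every shipped serial number has a passing record.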

    Testing vs. Validation
    Production tests are meant to check for assembly errors, not parametric variations or design issues. If a test is screening out devices because of normal parametric component variations, either buy better components, or re-do the design.

    For consumer-grade products, there is no need to run a five minute comprehensive RAM test on every unit – in theory, a product should be designed well enough that if it’s all soldered together correctly, the RAM will do its job. A quick test to check that there are no stuck or open address pins is all that is really needed. Name-brand chip vendors typically have very low defectivity rates, so we’re not validating the silicon; rather, we are validating the solder joints, connectors, and checking for missing or swapped components (note that if you buy clone chips or off-brand/remarked/partially tested devices to cut costs, it is advised to make a mini-validation program for those components).
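
    One common way to do such a quick check is a power-of-two address test: write a distinct value at address 0 and at each single-address-bit offset, then read them all back. The sketch below runs against a simulated RAM so that it is self-contained; the FakeRam class and its fault model (one address line stuck low) are assumptions for illustration, and a real test would read and write the actual memory.

```python
class FakeRam:
    """Simulated RAM. stuck_low_bit models an address line that never goes high."""

    def __init__(self, addr_bits, stuck_low_bit=None):
        self.cells = {}
        self.mask = (1 << addr_bits) - 1
        if stuck_low_bit is not None:
            self.mask &= ~(1 << stuck_low_bit)

    def write(self, addr, value):
        self.cells[addr & self.mask] = value

    def read(self, addr):
        return self.cells.get(addr & self.mask, 0)


def address_lines_ok(ram, addr_bits):
    """Return True if address 0 and every power-of-two address hold distinct data."""
    addrs = [0] + [1 << bit for bit in range(addr_bits)]
    # Write all the markers first, then read back. A bad address line makes
    # two of these addresses alias, so a later write clobbers an earlier one.
    for index, addr in enumerate(addrs):
        ram.write(addr, index + 1)
    return all(ram.read(addr) == index + 1 for index, addr in enumerate(addrs))


print(address_lines_ok(FakeRam(addr_bits=16), 16))                    # healthy
print(address_lines_ok(FakeRam(addr_bits=16, stuck_low_bit=3), 16))   # faulty
```

    This touches only a handful of locations, so it runs in well under a second, which is the point: it looks for assembly faults, not for bad silicon.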

    To illustrate the point, let’s consider testing vs. validation for a switch.

    A production test for a switch on a product may simply consist of asking the operator to hit the switch a few times and verify that the feel is right, and verify that electrical contact is made through a simple digital indicator.

    A validation test for a switch may consist of taking a few devices, measuring contact resistance with a five-digit multimeter, subjecting them to 100% humidity at 40°C overnight, and then putting the devices into an automated jig where the switches are cycled ten thousand times. Finally, the switches are re-measured with a five-digit multimeter and any degradation in close-state contact resistance is noted.

    Clearly, this level of validation cannot be performed on every device manufactured. Rather, the validation program checks for performance of the switch over the expected lifetime of the product, and the test just makes sure the switch is put together right. Note that it is considered good practice to re-run validation tests on a couple randomly sampled units out of every several thousand units produced; there are formulas and tables to compute how much sampling is needed to achieve a certain level of quality.
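
    The text does not say which formulas are meant, so the following is just one simple textbook example: the zero-failure sample size. If n randomly chosen units all pass, and the lot is large enough to treat each unit as an independent draw, then requiring (1 - p)^n <= 1 - C gives the smallest n that supports the claim “defect rate is at most p” with confidence C. The function name and the example numbers are assumptions for illustration; published sampling tables are built on more elaborate plans.

```python
import math


def zero_failure_sample_size(max_defect_rate, confidence):
    """Smallest n such that n passes in a row support the defect-rate claim.

    Assumes independent samples from a large lot (binomial model) and that
    zero failures are observed among the n units.
    """
    if not 0.0 < max_defect_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_defect_rate))


for rate, conf in [(0.10, 0.90), (0.01, 0.95)]:
    n = zero_failure_sample_size(rate, conf)
    print(f"defect rate <= {rate:.0%} at {conf:.0%} confidence: test {n} units")
```

    The sample size grows quickly as the target defect rate shrinks, which is another reason validation is done on samples and not on every unit.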

    So how much testing is enough testing? One threshold for testing can be derived through a cost argument. Every additional test run incurs test equipment costs, engineering costs, and the variable cost of the test time. As a result, testing is subject to diminishing returns: at some point, it’s cheaper just to take a product return than to test more. Naturally, the testing bar is much higher for medical or industrial grade equipment, as the liability associated with faulty equipment is also much higher. Likewise, a novelty product meant to be given away may get away with much less testing.

    As a final thought, don’t dismiss the value of applying solid engineering to test jig design. I once had a problem where a flat flex cable adapter with 50 pins had random cold solder joint failures. I asked the factory to build a test to validate the adapters. Their solution was to hang LEDs off of every pin of the adapter and put a test voltage into one side, and look for LEDs that don’t light up on the other side. The problem is that the cold solder joints weren’t simply open or closed – some were just high resistance. Enough current would flow to light an LED, yet enough resistance would be present to cause a fault in the design. After I noted this problem, the factory proposed buying 50 multimeters and hanging them off of every pin to check the resistance manually – an expensive and error-prone proposition. My response was to daisy-chain the connections across the adapter, and then use a single multimeter to check the net resistance of the daisy chain. Putting the connections in series checks all 50 connections with a single numeric measurement (as opposed to the subjective observation of an LED’s brightness). As one can see, even a test as simple as checking for cold solder joints on a cable adapter can have better or worse implementations. As ever more complicated components require ever more subtle tests, there is real value in applying solid engineering skills to crafting efficient yet foolproof tests.
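
    A small Python model shows why the LED test misses a high-resistance joint while the daisy-chain measurement catches it. The 50-pin count comes from the story above; every electrical value here (joint resistances, test voltage, LED parameters, pass limit) is a made-up assumption chosen only to illustrate the idea.

```python
PIN_COUNT = 50            # from the text
GOOD_JOINT_OHMS = 0.05    # hypothetical resistance of a sound solder joint
COLD_JOINT_OHMS = 50.0    # hypothetical high-resistance cold joint
MAX_CHAIN_OHMS = 5.0      # hypothetical pass limit for the whole daisy chain


def led_lights(joint_ohms, test_v=5.0, led_vf=2.0, series_r=330.0,
               min_visible_ma=1.0):
    """LED-per-pin test: does enough current flow for the LED to look lit?"""
    current_ma = (test_v - led_vf) / (series_r + joint_ohms) * 1000.0
    return current_ma >= min_visible_ma


def chain_passes(joints_ohms, limit_ohms=MAX_CHAIN_OHMS):
    """Daisy-chain test: one resistance reading across all joints in series."""
    return sum(joints_ohms) <= limit_ohms


good_adapter = [GOOD_JOINT_OHMS] * PIN_COUNT
bad_adapter = list(good_adapter)
bad_adapter[17] = COLD_JOINT_OHMS     # one marginal joint somewhere in the middle

print("LED test, bad adapter:  ", all(led_lights(r) for r in bad_adapter))
print("chain test, good adapter:", chain_passes(good_adapter))
print("chain test, bad adapter: ", chain_passes(bad_adapter))
```

    One caveat of the series approach is that the reading is a sum, so the pass limit has to sit close enough to the expected total that a single marginal joint still stands out.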