Exclave: Hardware Testing in Mass Production, Made Easier

December 21st, 2018

Reputable factories will test 100% of the products they ship. For example, the computer or phone you’re using to read this has had a plug inserted in every connector, along with dozens of internal and external tests run to confirm everything from the correct operation of the CPU to the proper function of the buttons.


A test station at a motherboard factory (2x speed). Every port and connector gets tested.

Even highly automated processes can yield defective units: entropy happens, and constant vigilance is required to guard against it. Even a very stable manufacturing process with a raw defect rate of around 1% is considered unacceptable by any reputable brand. This is one of the elephants in the digital fabrication room – just because a tool is digital doesn’t mean it will fabricate things perfectly with a push of the button. Every tool needs maintenance, and more often than not a skilled operator is required to inspect the final product and polish over rough edges.

To better grasp the magnitude of the factory test problem, consider the software that’s loaded on your computer. How did it get in there? Devices come out of the silicon foundry mostly blank. They typically don’t even have the innate knowledge to traverse a filesystem, much less connect to the Internet to download an update. Yet everyone has had the experience of waiting for an update to download and install. Factories must orchestrate a much more time-consuming and complicated process to bootstrap every device made, in order for you to enjoy the privilege of connecting to the Internet to download updates.

One might think, “surely, there must be a standardized way for handling this”.

Shockingly, there isn’t.

How Not To Test a Product

Unfortunately, first-time product makers often make the assumption that either products don’t require 100% testing (because the boards are assembled by robots, and robots don’t make mistakes, right?), or there is some otherwise standardized way to handle the initial firmware upload. Once upon a time, I was called upon to intervene on a factory test for an Arduino-derivative product, where the original test specification was literally “plug the device into the USB port of [your] laptop, and type in this AVRDUDE command to load code, and then type in another AVRDUDE command to set the fuses, and then use a multimeter to check the voltages on these two test points”. The test documentation was literally two photographs of the laptop screen and a paragraph of text. The product’s designer argued to the factory that this was sufficient because it’s really quick and reliable: he does it in under two minutes, how could any competent factory that handles products with AVR chips not have heard of AVRDUDE, and besides he encountered no defects in the half dozen prototypes he produced by hand. This is in addition to an over-arching attitude of “whatever, I’m the smart guy who comes up with the ideas, just get your minimum-wage Chinese laborers to stop messing them up”.

The reality is that asking someone to manually run commands from a shell and read a meter for hours on end while expecting zero defects is neither humane nor practical. Furthermore, assuming the ability and judgment to run command line scripts isn’t realistic; testing is time-consuming, and thus often the least-skilled, lowest wage laborers are employed for the process. Ironically, there is no correlation between the skills required to assemble a computer, and the skills required to operate a computer. Thus, in order for the factory to meet the product designer’s expectation of low labor cost with simultaneously high quality, it’s up to the product designer to come up with an automated, fool-proof test jig.

Introducing the Test Jig: The Product Behind the Product

“Test jig” is a generic term for any tool designed to assist with production testing. However, there is a basic format for a test jig chassis, and demand for test jig chassis is so high in places like Shenzhen that entire cottage industries have sprung up to support the demand. Most circuit board test jigs look a bit like this:


Above: NeTV2 circuit board test jig

And the short video below highlights the spring-loaded pogo pins of the test jig, along with how a circuit board is inserted into a test jig and clamped in place for testing.


Above: Inserting an NeTV2 PCB into its test jig.

As you can see in the video, the circuit board is placed into a precision-milled platter that moves along spring-loaded rails, allowing the board to engage with pogo-pin style test points underneath. As test points consume precious space on the circuit board, the overall mechanical accuracy of the system has to be better than +/-1mm once all tolerances are considered over thousands of cycles of wear and tear, in order to keep the test points a reasonable size (under 2mm in diameter).

The specific test jig shown above measures 12 separate DC voltages, performs a basic JTAG ID code check on the FPGA, loads firmware, and tests the on-board DRAM all in under 20 seconds. It’s the preliminary “fast test” of the NeTV2 product, meant to screen out gross solder faults and it provides an estimated coverage of about 80% of the solder joints on the PCB. The remaining 20% of the solder joints belong principally to connectors, which require a much more labor-intensive manual test to check.

Here’s a look inside the test jig:

If it looks complicated, that’s because it is. Test jig complexity is correlated with product complexity, which is why I like to say the test jig is the “product behind the product”. In some cases, a product designer may spend even more time designing a test jig than they spend designing the product itself. There’s a very large space of problems to consider when implementing a test jig, ranging from test coverage to operator fatigue, and of course throughput and reliability.

Here’s a list of the basic issues to consider when designing a test jig:

  • Coverage: How to test every single feature?
  • UX: Who is interpreting your test data? How to internationalize the UI by using symbols and colors instead of text, and how to minimize operator fatigue?
  • Automation: What’s the quickest way to set up and tear down tests? How to avoid relying on human judgment?
  • Audit & traceability: How do you enforce testing standards? How to incorporate logging and coupons to facilitate material traceability?
  • Updates: What do you do when the tester needs a patch or update? How do you keep the test program in lock-step with the current firmware release?
  • Responsibility: Who is responsible for product quality? How do you create a natural incentive to design-for-test from the very first product sketch?
  • Code Structure: How do you maintain the tester’s code base? It’s tempting to think that test jig code should be write-once, since it’s going into a single device with a limited user base. However, the reality of production is rarely so simple, and it pays to structure your code base so that it’s self-checking, modular, reconfigurable, and reliable.

Each of these bullet points is an aspect of test jig design that I have learned from the school of hard knocks.

Read on, and avoid my mistakes.

Coverage

Ideally, a tester should cover 100% of the features of a product. But what, exactly, constitutes a feature? I once designed a product called the Chumby One, and I also designed its test procedure. I tried my best to cover all of its features, but I missed one: the power button. It seemed simple enough – just a switch, what could go wrong? It turns out that over the course of production, the tolerance between the mechanical switch pusher and the electrical switch mechanism had drifted to the point where pushing on the cap would not contact the electrical switch itself, leading to a cohort of returns from that production lot.

Even the simplest of mechanisms is a feature that needs to be tested.

Since that experience, I’ve adopted an “inside/outside” methodology to derive the test feature list. First, I look “inside” the product, going through the schematic and picking key features for testing. The priority is to check for solder faults as quickly as possible, based on the assumption that the constituent components are 100% pre-tested and reliable. Then, I look at the product from the “outside”, as a consumer might approach it. I start with the marketing brochure and see what was promised: “world class WiFi performance” demands a different level of test from “product has WiFi”. Next, I try to imagine all the ways a customer might interact with the product – such as pressing the power button – and add those points to the test list. This means every connector needs to have something stuffed in it, every switch needs to be pressed, and every indicator light needs to be checked.


Red arrow calls out the mechanical switch pusher that drifted out of tolerance with the corresponding electrical switch

UX

Test jig UX can have a large impact on test throughput and reliability; test operators are human, and like all humans are susceptible to fatigue and boredom. A startup I worked with once told me a story of how a simple UX change drastically improved test throughput. They had a test that would take 10 minutes on average to run, so in order to achieve a net throughput of around 1 minute per unit, they provided the factory 10 testers. Significantly, the test run-time would vary by several minutes from unit to unit. Unfortunately, the only indicator of test state was a single light that could either flash or change color. Furthermore, the lighting pattern of units that failed testing bore a resemblance to units that were still running the test, so even when the operator noticed a unit that finished testing, they would often overlook failed units, assuming they were still running the test. As a result, the actual throughput achieved on their first production run was about one unit every 5 minutes, driving up labor costs dramatically.

Once they refactored the UX to include an audible chime that would play when the test was finished, aggregate test cycle time dropped to a bit over a minute – much closer to the original estimate.

Thus, while one might think UX is just for users, I’ve found it pays to make wireframes and mock-ups for the tester itself, and to spend some developer cycles to create an operator-friendly test program. In some ways, tester UX design is more challenging than the product UX: ideally, you’re creating a UX with icons that are internationally recognizable, using little or no text, so operators anywhere in the world can just sit down and use it with no special training. Furthermore, you’re trying to create user engagement with something as banal as a test – something that’s literally as boring as watching paint dry. I’ve even gone so far as putting a mini-game in the middle of a long test sequence to keep operators attentive. The mini-game was of course directly relevant to testing certain hardware sensors, but it was surprisingly effective because the operators would race each other on the mini-game to see who could finish the fastest, boosting throughput and increasing worker happiness.

At the end of the day, factories are powered by humans, and it pays to employ a human-first design process when crafting test programs.

Automation

Human operators are prone to error. The more a test can be automated, the more reliable it can be, and in the long run automation will save money. I once visited a large mobile phone maker’s factory, and witnessed a gymnasium-sized room full of test stations replaced by a pair of fully robotic test stations. Instead of hundreds of operators plugging cables in and checking aspects like screen and camera quality, a delicate ballet of robotic actuators would plug connectors into every port in a fraction of a second, and every feature of the phone from the camera to the GPS was tested in a couple of minutes. The test stations apparently cost about a million dollars to develop, but the empty cavern of idle test jigs sitting next to them was clear testament to the labor cost savings of such a high degree of automation.

At the smaller scales more typical of startups, automation can happen but it needs to be judiciously applied. Every robotic actuator takes time and money to develop, and they are also prone to wear-out and eventual failure. For the Chibitronics Chibi Chip product, there’s a single mechanical switch on the board, and we developed a simple servo mechanism to actuate the plunger. However, despite using a series-elastic spring and a foam pad to avoid over-stressing the servo motor, over time, we’ve found the motor still fails, and operators have disconnected it in favor of manually pushing the button at the right time.


The Chibi Chip test jig


Detail view of the reset switch servo

Indicator lights can also be tricky to test because the lighting conditions in a factory can be highly variable. Sometimes the floor is flooded by sunlight; other times, it’s lit by dim fluorescent lamps or LED bulbs, each with distinct noise signatures. A simple photodetector will be unreliable unless you can perfectly shield the device under test (DUT) from stray light sources. However, if the product’s LEDs can be modulated (with a PWM waveform, for example), the modulation can be detected through an AC-coupled photodetector. This system tends to be more reliable as the AC coupling rejects sunlight, and the modulation frequency can be chosen to be distinct from other stray light noise sources in the factory.
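As a rough illustration of the modulation idea, here is a minimal Python sketch. It assumes the photodetector is read by an ADC at a known sample rate and that the LED is modulated at a known frequency; every name and number in it (the sample rate, the 1 kHz LED frequency, the 100 Hz lamp flicker, the simulated readings) is hypothetical. Subtracting the mean plays the role of the AC coupling, and correlating against a sine and cosine at the LED frequency picks out only that frequency. A sine wave stands in for the PWM waveform to keep the example short.

```python
import math

SAMPLE_RATE_HZ = 10_000   # hypothetical ADC sample rate
LED_FREQ_HZ = 1_000       # hypothetical modulation frequency of the DUT's LED
FLICKER_HZ = 100          # hypothetical lamp flicker that must be rejected


def modulation_amplitude(samples, sample_rate_hz, freq_hz):
    """Estimate the amplitude of the freq_hz component in the samples."""
    n = len(samples)
    mean = sum(samples) / n  # removing the DC level mimics AC coupling
    i_sum = 0.0
    q_sum = 0.0
    for k, s in enumerate(samples):
        phase = 2.0 * math.pi * freq_hz * k / sample_rate_hz
        i_sum += (s - mean) * math.cos(phase)
        q_sum += (s - mean) * math.sin(phase)
    return 2.0 * math.hypot(i_sum, q_sum) / n


def simulate_detector(led_on, n=1000):
    """Fake readings: steady daylight plus lamp flicker plus (maybe) the LED."""
    readings = []
    for k in range(n):
        t = k / SAMPLE_RATE_HZ
        level = 5.0 + 1.0 * math.sin(2.0 * math.pi * FLICKER_HZ * t)
        if led_on:
            level += 0.2 * math.sin(2.0 * math.pi * LED_FREQ_HZ * t)
        readings.append(level)
    return readings


amp_on = modulation_amplitude(simulate_detector(True), SAMPLE_RATE_HZ, LED_FREQ_HZ)
amp_off = modulation_amplitude(simulate_detector(False), SAMPLE_RATE_HZ, LED_FREQ_HZ)
led_detected = amp_on > 0.1 and amp_off < 0.1
```

In this simulation the LED signal is much weaker than both the daylight level and the lamp flicker, yet it is the only thing that shows up at the LED frequency. On a real jig the pass threshold would have to be tuned against actual detector readings.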

In general, the gold standard for test automation is to put the DUT into a jig, press a button, wait, and then a red or green light indicates if the device passes or fails. For simple products, this should be achievable, but reasonable exceptions should be made depending upon the resources available in a startup to implement tests versus the potential frequency and impact of a particular feature escaping the test process. For example, in the case of NeTV2, the functionality of indicator LEDs and the fan are visually inspected by the operator; but in my judgment, all the components involved have generous tolerances and are less likely to be assembled incorrectly, and there are other points downstream of the PCB test during the assembly process where the LEDs and fan operation will be checked yet again, further reducing the likelihood of these features escaping the test process.

Audit and Traceability

Here’s a typical failure scenario at a factory: one operator is running two testers in parallel. The lunch bell rings, and the operator gets up and leaves without noting the status of the test (if you’ve been doing the same thing over and over for the past four hours and running on an empty belly, you’d do the same thing too). After lunch, the operator sits down again, and has to recall whether the units in front of her have been tested or not. As a result of this arbitrary judgment call, sometimes units that didn’t pass test, or weren’t even tested at all, slip into the tested product bins after a shift change.

This is one of the many reasons why it pays to incorporate some sort of audit and traceability program into the tester and product itself. The exact nature of the program will depend greatly upon the exact nature of the product and amount of developer resources available, but a simple example is structuring the test program so that a serial number isn’t generated for the product until all the tests pass – thus, the serial number is a kind of “coupon” to prove the unit has passed test. In the operator-returning-from-lunch scenario, she just has to check for the presence of a serial number to determine the testing state of a particular unit.
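The serial-number-as-coupon idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test list, the `unit` dictionary, and the use of a random UUID as a serial number are all hypothetical stand-ins for whatever the real tester and product use.

```python
import uuid


def run_tests_and_issue_serial(tests):
    """Run each (name, callable) test in order.

    A serial number is generated only if every test passes, so the mere
    presence of a serial number proves the unit passed the full sequence.
    """
    for _name, test in tests:
        if not test():
            return None  # no coupon: the unit failed
    return uuid.uuid4().hex  # placeholder serial number format


def is_tested(unit):
    """What the operator checks after lunch: does the unit have a serial?"""
    return unit.get("serial") is not None


good_unit = {"serial": run_tests_and_issue_serial([("power", lambda: True),
                                                   ("wifi", lambda: True)])}
bad_unit = {"serial": run_tests_and_issue_serial([("power", lambda: True),
                                                  ("wifi", lambda: False)])}
untested_unit = {}
```

Because failed and untested units look the same (no serial), the operator never has to remember which is which; both simply go back through the tester.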


The Chibi Chip uses Bitmarks as a coupon to indicate when a unit has passed test. The Bitmarks also help prevent warranty fraud and deter cloning.

Sometimes I also burn a log of the test into the product itself. It’s important to make the log a circular buffer that can store more than one test run, because products that fail test the first time must often be retested several times as they are reworked and repaired. This way, if a product is returned by a user, I can query the log and see a fairly complete history of the product’s rework experience in the factory. This is incredibly helpful in debugging factory process issues and holding the factory accountable for marginal practices such as re-testing a device multiple times without repairing it, with the hope that they get lucky and get a “pass” out of the tester due to random environmental fluctuations.
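A minimal Python sketch of such a circular test log follows. It models the storage as a fixed number of slots in memory; on a real product the slots would live in flash or EEPROM, and the record format shown here is hypothetical.

```python
class TestLog:
    """Fixed-size circular log: new records overwrite the oldest ones."""

    def __init__(self, slots=8):
        self.slots = [None] * slots
        self.count = 0  # total number of records ever written

    def append(self, record):
        self.slots[self.count % len(self.slots)] = record
        self.count += 1

    def history(self):
        """Return the surviving records, oldest first."""
        n = len(self.slots)
        start = max(0, self.count - n)
        return [self.slots[i % n] for i in range(start, self.count)]


log = TestLog(slots=3)
for run in range(5):
    # Hypothetical record: which run it was and whether it passed.
    log.append({"run": run, "passed": run == 4})

recent = log.history()
failures_before_pass = sum(1 for r in recent if not r["passed"])
```

With three slots and five test runs, only the last three runs survive. A string of failures followed by a pass, with no rework noted in between, is the kind of pattern worth asking the factory about.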

Ideally, these logs are sent up to the cloud or a server directly, but that will depend heavily upon the reliability of the Internet connectivity at your facility. Internet is notoriously unreliable in China, especially to servers not located on the mainland, and so sometimes a small startup with limited resources has to make compromises about the extent and nature of audit and traceability achievable on the factory floor.

Updates

Consumer electronic products are increasingly just software wrapped in a plastic shell. While the hardware itself must stabilize months before production, the software in a product continues to evolve, especially in Internet-connected products that support over-the-air updates. Sometimes patches to a product’s firmware can profoundly alter low-level APIs, breaking the factory test program. For example, I had a product once where the audio drivers went through a major upgrade, going from OSS to ALSA. This changed the way the microphone subsystem was accessed, causing the microphone test to fail in production. Thus user firmware updates can also necessitate a tester program update.

If a test jig was engineered as a stand-alone box that requires logging into a terminal to upgrade, every time the software team pushes an update, guess what – you’re hopping on a plane to the factory to log in to the test jig and upgrade it. This is not a sustainable upgrade plan for products that have complex, constantly evolving internal firmware; thus, as the test jig designer, it’s well-advised to build a secure remote upgrade process into the test jig itself.


That’s me about 12 years ago on a factory floor at 2AM debugging a testjig update gone wrong, bringing production to a screeching halt. Don’t be like me; you can do better!

In addition to a remote upgrade mechanism, you’re going to need a way to validate the test jig update without having to bring down a production line. In order to help with this, I always keep a physical copy of the production test jig in my office, so I can validate testjig updates from the comfort of my office before pushing them to the production floor. I try my best to keep the local jig an exact copy of what’s on the line; this may involve taking snapshots of the firmware image or swapping out OS drives between development and production versions, or deliberately breaking features that have somehow failed on the production jigs. This process is inspired by the engineers at JPL and NASA who keep an exact copy of Mars-based rovers on Earth, so they can thoroughly test an update before pushing it to the rover on Mars. While this discipline can be inconvenient and incurs the cost of an extra test jig, it’s inevitably cheaper than having to book a last minute flight to your factory to fix things because of an update gone wrong.

As for the upgrade mechanism itself, how fancy and secure you want to get has virtually no limit; I’ve done everything from manual swaps of USB thumb drives that contain the tester configuration data to a private VPN via a dedicated 3G-to-wifi gateway deployed at the factory site. The nature of the product (e.g. does it contain security keys, how often is the product firmware updated) and the funding level of your organization will heavily influence the architecture of the upgrade process.

Responsibility

Given how much effort it takes to build a good test jig, it’s tempting to free up precious developer resources by simply outsourcing the test jig to a third party. I’ve almost never found this to be a good idea. First of all, nobody but the developer knows what skeletons are hidden in a product’s closet. There’s what’s written in the spec, but then there is how faithfully the spec was implemented. Of course, in an ideal world, all specs were perfectly met, but only the developer has a true sense of how spot-on the implementation ended up. This drives the second point, which is avoiding the blame game. By throwing tests over the fence to a third party, if a test isn’t easy to implement or is generating false results, it’s easy to get into a finger-pointing exercise over who is at fault: the developer for not meeting the specs, or the test developer for not being creative enough to implement the test without necessitating design changes.

However, when the developer knows they are ultimately on the hook for the test jig, from day one the developer thinks about design for test. Where will the test points go? How do we make internal state easily visible? What bring-up sequence gives us the most test coverage in the shortest amount of time? By making the developer responsible for the test jig, the test program comes together as the product matures. Bring-up scripts used to validate the product are quickly converted to factory tests, and overall the product achieves a higher standard of testability while saving the money and resources that would otherwise be spent trying to coordinate between two parties with conflicting self-interests.

Code Structure

It’s tempting to think about a test jig as a pile of write-once code that doesn’t need to be maintainable. For simple products, one can definitely get away with this mentality. However, I’ve been bitten more than once by fragile code bases inside production testers. The most typical scenario where things break is when I have to change the order of tests, in order to prioritize testing problematic features first. It doesn’t make sense to test a dozen high-yielding features before running a test on a feature with a known yield issue. That just wastes operator time, and runs up the cost of production.

It’s also hard to predict before production what the most frequent mode of failure would be – after all, any failures you could have anticipated would already be designed out! So, quite often in the middle of an early production run, I’m challenged with having to change the order of tests in a complex sequence of tests to optimize operator time and improve production throughput.

Tests almost always have dependencies – you have to power on the board before you can flash the firmware; you need firmware before you can connect to wifi; you need credentials to connect to wifi; you have to clean up the test credentials before shipping the product. However, if the process that cleans up the test credentials is also responsible for cleaning up any other temporary tester files (for example, a flag that also sets Bluetooth into test mode), moving the wifi test sequence earlier could result in tester configuration files being left on the customer image, potentially leading to unexpected behaviors (such as Bluetooth still being in test mode in the shipping product!).

Thus, it’s helpful to have some infrastructure for tests that keeps each test modular while enforcing dependencies. Although one could write this code every single time from scratch, we encounter this problem so regularly that Sean ‘Xobs’ Cross set out to create a testjig management system to solve this problem “once and for all”. The result is a project he calls Exclave, with the idea being that Exclave – like an actual geographical exclave – is a tiny bit of territory that you can retain control of inside a foreign factory.
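To make the idea concrete, here is a small Python sketch of dependency-aware test ordering. It is not Exclave’s implementation (Exclave is written in Rust and uses its own descriptor files); the test names, the dependency table, and the priority numbers below are hypothetical. Each test declares what it depends on, and a priority value lets problematic tests run as early as their dependencies allow.

```python
import heapq


def order_tests(tests, priority):
    """Return a run order that respects dependencies.

    tests:    dict mapping test name -> list of names it depends on
    priority: dict mapping test name -> number; lower runs earlier
              whenever the dependencies leave a choice (default 0)
    """
    remaining = {name: set(deps) for name, deps in tests.items()}
    dependents = {name: [] for name in tests}
    for name, deps in remaining.items():
        unknown = deps.difference(tests)
        if unknown:
            raise ValueError(f"{name} depends on unknown tests: {sorted(unknown)}")
        for dep in deps:
            dependents[dep].append(name)

    ready = [(priority.get(n, 0), n) for n, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                heapq.heappush(ready, (priority.get(child, 0), child))
    if len(order) != len(tests):
        raise ValueError("dependency cycle detected")
    return order


tests = {
    "power_on": [],
    "flash_firmware": ["power_on"],
    "load_credentials": ["flash_firmware"],
    "wifi": ["load_credentials"],
    "bluetooth": ["flash_firmware"],
    # Cleanup removes ALL temporary tester state, so it depends on every
    # test that relies on that state, not just the wifi test.
    "cleanup": ["wifi", "bluetooth"],
}
priority = {"bluetooth": 5}  # high-yielding test: push it later

order = order_tests(tests, priority)
```

Because the cleanup step declares a dependency on both the wifi and Bluetooth tests, moving wifi earlier cannot cause cleanup to run before Bluetooth has finished with its test-mode flag. Changing the order on the factory floor becomes a matter of editing priorities rather than untangling a hard-coded sequence.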

Introducing Exclave

Exclave is a scaffold designed to give structure to an otherwise amorphous blob of test code, while minimizing the amount of overhead required of the product designer to achieve this structure. The basic features of Exclave are as follows:

  • Code Re-use. During product bring-up, designers write simple scripts to validate each feature individually. Exclave attempts to re-use these scripts by making no assumption about the language used to write them. Python, C, Bash, Node.js, Rust – all are welcome, so long as they run on a command line and can return an exit code.
  • Automated dependency resolution. Each test routine is associated with a “.test” descriptor which describes the dependencies and timeout for a given script, which are then automatically resolved by Exclave.
  • Scenario management. Test descriptors are strung together into scenarios, which can be selected dynamically based on the real-time requirements of the factory.
  • Triggers. Typically a test is started by pressing a button, but Exclave’s flexible triggering system also allows tests to start on other cues, such as hot-plug events.
  • Multiple UI targets. Test jig UI can range from a red/green light to a serial console device to a full graphical interface running on a monitor. Exclave has a system for interpreting test results and driving multiple UI sinks. This allows for fast product debugging by attaching a GUI (via an HDMI monitor or laptop) while maintaining compatibility with cost-efficient LED indicators favored for production scale-up.
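The code re-use point rests on a simple contract: a test is any command that exits with code 0 on success, within a time limit. The Python sketch below illustrates that contract only; it is not how Exclave itself is implemented, and the `run_step` helper and the two throwaway commands are hypothetical.

```python
import subprocess
import sys


def run_step(argv, timeout_s):
    """Run one command-line test step.

    Returns (passed, output). A step passes only if it exits with code 0
    before the timeout expires; the language it was written in is irrelevant.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stdout


# Two stand-in "tests": one that succeeds and one that exits with an error.
passed, _ = run_step([sys.executable, "-c", "raise SystemExit(0)"], timeout_s=10)
failed, _ = run_step([sys.executable, "-c", "raise SystemExit(3)"], timeout_s=10)
```

Any bring-up script that already behaves this way, whether it is a shell script, a compiled C program, or an Expect script, can be dropped into a test sequence without being rewritten.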


Above: Exclave helps migrate lab-bench validation code to production-grade factory tests.

To get a little flavor on what Exclave looks like in practice, let’s look at a couple of the tests implemented in the NeTV2 production test flow. First, the production test is split into two repositories: the test descriptors, and the graphical UI. Note that by housing all the tests in github, we also solve the tester upgrade problem by providing the factory with a set of git repo management scripts mapped to double-clickable desktop icons.

These repositories are installed on a Raspberry Pi contained within the test jig, and Exclave is started on boot as a systemd service. The service runs a simple script that fires up Exclave in a target directory which contains a “.jig” file. The “netv2.jig” file specifies the default scenario, among other things.

Here’s an example of what a quick test scenario looks like:

This scenario runs a variety of scripts in different languages that: turn on the device (bash/C), check voltages (C), check the ID code of the FPGA (bash/openOCD), load a test bitstream (bash/openOCD), check that the REPL shell can start on the FPGA (Expect/TCL), and then run a RAM test (Expect/TCL) before shutting the board down (bash/C). Many of these scripts were copied directly from code used during board bring-up and system validation.

A basic operation that’s surprisingly tricky to do right is checking for terminal interaction (REPL shell) via serial port. Writing a C or bash script that does this correctly and gracefully handles all error cases is hard, but fortunately someone already solved this problem with the “Expect” TCL extension. Here’s what the REPL shell test descriptor looks like in Exclave:

As you can see, this points to a couple other tests as dependencies, sets a time-out, and also designates the location of the Expect script.

And this is what the Expect script looks like:

This one is a bit more specialized to the NeTV2, but basically, it looks for the NeTV2 tester firmware shell prompt, which is “TESTER_NX8D>”; the system will attempt to recover this prompt by sending a carriage-return sequence once every two seconds and searching for this special string in return. If it receives the string “BIOS” instead, this indicates that the NeTV2 failed to boot and escaped into the ROM BIOS, probably due to a RAM error; at which point, the Expect script prints a bunch of JSON, which is automatically passed up to the UI layer by Exclave to create a human-readable error message.
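For readers who don’t know Expect, the logic described above can be restated as a short Python sketch. This is not the Expect script used on the NeTV2 jig: the `FakePort` class, the retry count, and the layout of the JSON error message are all hypothetical, and a real version would talk to an actual serial port. Only the prompt string, the “BIOS” fallback, and the two-second retry interval come from the description above.

```python
import json

PROMPT = "TESTER_NX8D>"


def wait_for_prompt(port, attempts=5, interval_s=2.0):
    """Send a carriage return every interval_s seconds until the prompt appears.

    Returns (True, None) on success, or (False, json_text) on failure.
    """
    seen = ""
    for _ in range(attempts):
        port.write(b"\r")
        seen += port.read(interval_s).decode("ascii", errors="replace")
        if PROMPT in seen:
            return True, None
        if "BIOS" in seen:
            # Hypothetical message layout for the UI layer to display.
            return False, json.dumps(
                {"error": "board fell back to the ROM BIOS, possible RAM error"})
    return False, json.dumps({"error": "no shell prompt received"})


class FakePort:
    """Stand-in for a serial port that replays canned replies."""

    def __init__(self, replies):
        self.replies = list(replies)
        self.sent = []

    def write(self, data):
        self.sent.append(data)

    def read(self, timeout_s):
        return self.replies.pop(0) if self.replies else b""


good_port = FakePort([b"", b"\r\nTESTER_NX8D>"])
good_result = wait_for_prompt(good_port)
bad_result = wait_for_prompt(FakePort([b"BIOS>"]))
```

The structure mirrors the Expect approach: poke the device, watch for one of a small set of known strings, and turn anything unexpected into a structured message rather than a silent hang.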

Which brings us to the interface layer. The NeTV2 jig has two options for UI: a set of LEDs, or an HDMI monitor. In an ideal world, the total amount of information an operator needs to know about a board is whether it passed or failed – a green or red LED. Multiple instances of the test jig are needed when a product enters high volume production (thousands of units per day), so the cost of each test jig becomes a factor during production scale-up. LEDs are orders of magnitude cheaper than an HDMI monitor, and in general a test jig will cost less than an HDMI monitor. So using LEDs instead of an HDMI monitor for the UI can dramatically cut the cost of scaling up production. On the other hand, a pair of LEDs does not give enough information to diagnose what’s gone wrong with a bad board. In a volume production scenario, one would typically collect the (hopefully small) fraction of failed boards and bring them to a secondary station where a more skilled technician debugs them. Exclave allows the same jig used in production to be placed at the debug station, but with an HDMI monitor attached to provide valuable detailed error reports.

With Exclave, both UIs are integrated seamlessly using “.interface” files. Below is an example of the .interface file that starts up the http daemon to enable JSON debugging via an HDMI monitor.

In a nutshell, Exclave contains an event reporting system, which logs events in a fashion similar to Linux kernel messages. Events are tagged with metadata, such as severity, and the events are broadcast to interface handlers that further refine them for the respective UI element. In the case of the LEDs, it just listens for “START” [a scenario], “FAIL” [a test], and “FINISH” [a scenario] events, and ignores everything else. In the case of the HDMI interface, a browser configured to run in kiosk mode is pointed to the correct localhost webpage, and a jquery-based HTML document handles the dynamic generation of the UI based upon detailed messages from Exclave. Below is a screenshot of what the UI looks like in action.
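The broadcast-and-filter pattern described here can be sketched in Python as follows. This is an illustration of the pattern only, not Exclave’s code: the class names, the event dictionary layout, and the “LOG” event kind are hypothetical. Only the START, FAIL, and FINISH event names come from the description above.

```python
class EventBus:
    """Broadcasts every event to every registered interface handler."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def emit(self, kind, message="", severity="info"):
        event = {"kind": kind, "message": message, "severity": severity}
        for handler in self.handlers:
            handler(event)


class LedInterface:
    """Cheap production UI: reacts to START, FAIL and FINISH, ignores the rest."""

    def __init__(self):
        self.state = "idle"
        self.failed = False

    def __call__(self, event):
        kind = event["kind"]
        if kind == "START":
            self.failed = False
            self.state = "running"
        elif kind == "FAIL":
            self.failed = True
        elif kind == "FINISH":
            self.state = "red" if self.failed else "green"


class DetailInterface:
    """Debug-station UI: keeps every event for a technician to read."""

    def __init__(self):
        self.lines = []

    def __call__(self, event):
        self.lines.append(
            f"[{event['severity']}] {event['kind']}: {event['message']}")


bus = EventBus()
led = LedInterface()
detail = DetailInterface()
bus.subscribe(led)
bus.subscribe(detail)

bus.emit("START", "quick test scenario")
bus.emit("LOG", "measuring 3.3V rail")          # hypothetical event kind
bus.emit("FAIL", "RAM test", severity="error")
bus.emit("FINISH", "quick test scenario")
```

The test code never needs to know which UI is attached. The same event stream drives a red light on the production line and a detailed report at the debug station.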

The UI is deliberately brutalist in design, using color to highlight only the most important messages, and also includes audible alerts so that operators can zone out while the test runs.

As you can see, the NeTV2 production tester tests everything – from the LEDs to the Ethernet, to features that perhaps few people will ever use, such as the SD card slot and every single GPIO pin. Thanks to Exclave, I was able to get this complex set of tests up and running in under a month: the first code commit was made on Oct 13, 2018, and by Nov 7, I was largely just tweaking tests for performance, and to reflect operational realities discovered on the factory floor.

Also, for the hardware-curious, I did design a custom “hat” for the Raspberry Pi to add several ADC channels and various connectors to facilitate testing. You can check out the source for the tester hat at the Alphamax github repo. I had six of these boards built; five of them have found their way into various parts of the NeTV2 production flow, and if I still have one spare after production is stabilized, I’m planning on installing a replica of a tester at HAX in Shenzhen. That way, those curious to find out more about Exclave can walk up to the tester, log into it, and poke around (assuming HAX agrees to this).

Let’s Stop Re-Inventing the Test Jig!

The unspoken secret of hardware is that behind every product, there’s a robust test jig making sure that every unit shipped to end customers meets quality standards. Hardware startups that don’t anticipate the importance and difficulty of creating such a tester often encounter acute (and sometimes fatal) growing pains. Anytime I build more than a few copies of a piece of hardware, I know I’m going to need a test jig – even for bespoke, short-run products like a conference badge.

After spending months of agony re-inventing the wheel every time we shipped a product, Xobs decided to create Exclave. It’s still a work in progress, but by now it’s been used as the production test infrastructure for several volume products, including the Chibi Chip, Chibi Scope, Tomu, The Phage Blinky Badge, and now NeTV2 (those are all links to the actual Exclave test scripts for each of the respective products — open source ftw!). I feel Exclave has come along far enough that it’s time to invite more users to join the Exclave community and give it a try. The code is located on github and is 100% open source, and it’s written in Rust entirely by Xobs. It’s my hope that Exclave can mature into a tool and a community that will save countless Makers and small hardware startups the teething pains of re-inventing the test jig.


Production-proven test jigs that run Exclave. Clockwise from top-right: NeTV2, Chibi Chip, Chibi Scope, Tomu, and The Phage Blinky Badge. The badge tester has even survived a couple of weeks exposed to the harsh elements of the desert as a DIY firmware updating station!

Name that Ware December 2018

December 16th, 2018

The Ware for December 2018 is shown below.

Finishing off the year with a (hopefully) easy one that’s slightly off the beaten path.

Happy holidays! Stay safe, and stay free.

Winner: Name that Ware November 2018

December 16th, 2018

The Ware for November 2018 is a bias/control board for the HP 2-18GHz YIG-tuned multiplier. I really appreciate this fascinating ware; it reminds me that the MOS transistor is not the be-all and end-all of electronics. Of course, every day we encounter crystals as frequency references, and those are literally shaved pieces of quartz, but here is a sphere of Yttrium Iron Garnet (YIG) being used as a tunable RF filter. Thanks to phantom deadline for contributing this ware, and also congrats to Brian for nailing the ware. Email me for your prize!

On Overcoming Pain

December 6th, 2018

Breaking my knee this year was a difficult experience, but I did learn a lot from it. I now know more than I ever wanted to know about the anatomy of my knee and how the muscles work together to create the miracle of bipedal locomotion, and more importantly, I now know more about pain.

Pain is one of those things that’s very real to the person experiencing it, and a person’s perception of pain changes every time they experience a higher degree and duration of pain. Breaking my knee was an interesting mix of pain. It wasn’t the most intense pain I had ever felt, but it was certainly the most profound. Up until now, my life had been thankfully pain-free. The combination of physical pain, the sheer duration of the pain (especially post-surgery), and the corresponding intellectual anguish that comes from the realization that my life has changed for the worse in irreversible ways made this one of the most traumatizing experiences of my life. Despite how massive the experience was to me, I’m also aware that my experience is relatively minor compared to the pains that others suffer. This sobering realization gives me a heightened empathy for others experiencing great pain, or even modest amounts of pain on a regular basis. Breaking a knee is nothing compared to having cancer or a terminally degenerative disease like Alzheimer’s: at least in my case, there is hope of recovery, and that hope helped keep me going. However, a feeling of heightened empathy for those who suffer has been an important and positive outcome from my experience, and sharing my experiences in this essay is both therapeutic for me and hopefully insightful for others who have not had similarly painful life experiences.

I broke my knee on an average Saturday morning. I was wearing my paddling gear, walking to a taxi stand with my partner, heading for a paddle around the islands south of Singapore. At the time, my right knee was recovering from a partial tear of the quadriceps tendon; I had gone through about six weeks of immobilization and was starting physical therapy to rebuild the knee. Unfortunately that morning, one of the hawker stalls that line the alley to the taxis had washed its floor, causing a very slick soup of animal grease and soapy water to flood into the alley. I slipped on the puddle, and in the process of trying to prevent my fall, my body fully tore the quadriceps tendon while avulsing the patella – in other words, my thigh had activated very quickly to catch my fall, but my knee wasn’t up for it, and instead of bearing the load, the knee broke, and the tissue that connected my quads muscle to my knee also tore.

It’s well documented that trauma imprints itself vividly onto the brain, and I am no exception. I remember the peanut butter sandwich I had in my hand. The hat I was wearing. The shape and color of the puddle I slipped on. The loud “pop” of the knee breaking. The writhing on the floor for several minutes, crying out in pain. The gentlemen who offered to call an ambulance. The feeling of anguish – after six weeks in therapy for the partial tear, now months more of therapy to fix this, if fixable at all. I was looking forward to rebuilding my cardiovascular health, but that plan was definitely off. Then the mental computations about how much travel I’m going to have to cancel, the engagements and opportunities I will miss, the work I will fall behind on. Not being able to run again. Not being able to make love quite the same way again. The flight of stairs leading to my front door…and finally, my partner, who was there for me, holding my hand, weeping by my side. She has been so incredibly supportive through the whole process; I owe my good health today to her. To this day, my pulse still rises when I walk through the same alley to the taxi. But I do it, because I know I have to face my fears to get over the trauma. My partner is almost always there with me when I walk through that particular alley, and her hand in mine gives me the strength I lack to face that fear. Thank you.

Back to the aspect of pain. Breaking the knee is an acute form of pain. In other words, it happens quickly, and the intensity of the pain drops fairly quickly. The next few days are a blur – initially, the diagnosis is just a broken kneecap, but an MRI revealed I had also torn the tendon. This is highly unusual; usually a chain fails at one link, and this is like two links of a chain failing simultaneously. The double-break complicates the surgery – now I’m visiting surgeons, battling with the insurance company, waiting through a three-day holiday weekend, with the knowledge that I have only a week or two before the tendon pulls back and becomes inoperable. I had previously written about my surgical experience, but here I will recap and reframe some of my experiences on coping with pain.

Pain is a very real thing to the person experiencing it. Those who haven’t felt a similar level of pain can have trouble empathizing with the person suffering. In fact, there was no blood or visible damage to my body when I broke my knee – one could possibly have concluded I was making it all up. After all, the experience is entirely within my own reality, and not that of the observers. However, I found out that during surgery I was injected with Fentanyl, a potent opioid pain killer, in addition to Propofol, an anesthetic. I asked a surgeon friend of mine why they needed to put opioids in me even though I was unconscious. Apparently, even if I am unconscious, the body has autonomous physiological responses to pain, such as increased bleeding, which can complicate surgery, hence the application of Fentanyl. Fentanyl is fast-acting, and wears off quickly – an effect I experienced first-hand. Upon coming out of the operating room, I felt surprisingly good. One might almost say amazing. I shouldn’t have, but that’s how powerful Fentanyl is. I had a six-inch incision cut into me and my kneecap had two holes drilled through it and sutures woven into my quads, and I still felt amazing.

Until about ten minutes later, when the Fentanyl wore off. All of a sudden I’m a mess – I start shivering uncontrollably, I’m feeling enormous amounts of pain coming from my knee; the world goes hazy. I mistake the nurse for my partner. I’m muttering incoherently. Finally, they get me transferred to the recovery bed, and they give me an oral mix of oxycodone and naloxone. My experience with oxycodone gives me a new appreciation of the lyrics to Pink Floyd’s “Comfortably Numb”:

There is no pain, you are receding
A distant ship smoke on the horizon
You are only coming through in waves
Your lips move but I can’t hear what you’re saying

That’s basically what oxycodone does. Post-op surgical pain is an oppressive cage of spikes wrapping your entire field of view; everywhere you look is pain…as the oxycodone kicks in, you can still see the spiky cage, but it recedes until it’s a distant ship smoke on the horizon. You can now objectify the pain, almost laugh at it. Everything feels okay, I gently drift to sleep…

And then two hours later, the naloxone kicks in. Naloxone is an anti-opioid drug, which is digested more slowly than the oxycodone. The hospital mixes it in to prevent addiction, and that’s very smart of them. I’ve charted portions of my mental physiology throughout my life, and that “feeling okay” sensation is pretty compelling – as reality starts to return, your first thought might be “Wait! I’m not ready for everything to not be okay! Bring it back!”. It’s not euphoric or fun, but the sensation is addictive – who wouldn’t want everything to be okay, especially when things are decidedly not okay? Naloxone turns that okay feeling into something more akin to a bad hangover. The pain is no longer a distant ship smoke on the horizon; it’s more like something sitting in the same room with you, staring you down, but with a solid glass barrier between you and it. Pain no longer consumes your entire reality, but it’s still your bedfellow. So my last memory of the drug isn’t a very fond one, and as a result I don’t have as much of an urge to take more of it.

After about a day and a half in the hospital, I was sent home with another, weaker opioid-based drug called Ultracet, which derives most of its potency from Tramadol. The mechanism is a bit more complicated and my genetic makeup made dosing a bit trickier, so I made a conscious effort to take the drug with discipline to avoid addiction. I definitely needed the pain killers – even the slightest motion of my right leg would result in excruciating pain; I would sometimes wake up at night howling because a dream caused me to twitch my quads muscle. The surgeon had woven sutures into my quads to hold my muscle to the kneecap as the tendon healed, and my quads were decidedly not okay with that. Fortunately, the principal effect of Ultracet, at least for me, is to make me dizzy, sleepy, and pee a lot, so basically I slept off the pain; initially, I was sleeping about 16 hours a day modulo pee breaks.

In about 2-3 days, I was slightly more functional. I was able to at least move to my desk and work for a couple hours a day, and during those hours of consciousness I challenged myself to go as long as I could without taking another dose of Ultracet. This went on for about two weeks, gradually extending my waking hours and taking Ultracet only at night to aid sleep, until I could sleep at night without the assistance of the opioids, at which point I made the pills inconvenient to access, but still available should the pain flare up. One of the most unexpected things I learned in this process is how tiring managing chronic pain can be. Although I had no reason to be so tired – I was getting plenty of sleep and doing minimal physical activity (maybe just 15-30 minutes of a seated cardio workout every day) – I would be exhausted because ignoring chronic pain takes mental effort. It’s a bit like how anyone can lift a couple pounds easily, but if you had to hold up a two-pound weight for hours on end, your arm would get tired after a while.

Finally, after a bit over forty years, I now understand why some women on their period take naps. A period is something completely outside of my personal physical experience, yet every partner I’ve loved has had to struggle with it once a month. I’d sometimes ask them to try and explain to me the sensation, so I could develop more empathy toward their experience and thereby be more supportive. However, what none of them told me was how exhausting it is to cope with chronic pain, even with the support of mild painkillers. I knew they would sometimes become tired and need a nap, but I had always assumed it was more a metabolic phenomenon related to the energetic expense of supporting the flow of menses. But even without a flow of blood from my knee, just coping with a modest amount of continuous pain for hours a day is simply exhausting. It’s something as a male I couldn’t appreciate until I had gone through this healing process, and I’m thankful now that I have a more intuitive understanding of what roughly half of humanity experiences once a month.

Another thing I learned was that the healing process is fairly indiscriminate. Basically, in response to the trauma, a number of growth and healing factors were recruited to the right knee. This caused everything in the region to grow (including the toe nails and skin around my foot and ankle) and scar over, not just the spots that were broken. My tendon, instead of being a separate tissue that could move freely, had bonded to the tissue around it, meaning immediately after my bone had healed, I couldn’t flex my knee at all. It took months of physiotherapy, massaging, and stretching to break up the tissue to the point where I could move my knee again, and then months more to try and align the new tissue into a functional state. As it was explained to me, I had basically a ball of randomly oriented tissue in the scarring zone, but for the tendons to be strong and flexible, the tissue needs to be stretched and stressed so that its constituent cells can gain the correct orientation.

Which leads to another interesting problem – I now have a knee that is materially different in construction to the knee I had before. Forty plus years of instinct and intuition have to be trained out of me, and on top of that, weeks of a strong mental association of excruciating pain with the activation of certain muscle groups. It makes sense that the body would have an instinct to avoid doing things that cause pain. However, in this case, that response led to an imbalance in the development of my muscles during recovery. The quads is not just one muscle, it’s four muscles – hence the “quad” in “quadriceps” – and my inner quad felt disproportionately more pain than the outer quad. So during recovery, my outer quad developed very quickly, as my brain had automatically biased my walking gait to rely upon the outer quad. Unfortunately, this leads to a situation where the kneecap is no longer gliding smoothly over the middle groove of the knee; with every step, the kneecap is grinding into the cartilage underneath it, slowly wearing it away. Although it was painless, I could feel a grinding, sometimes snapping sensation in the knee, so I asked my physiotherapist about it. Fortunately, my physiotherapist was able to diagnose the problem and recommend a set of massages and exercises that would first tire out the outer quad and then strengthen the inner quad. After about a month of daily effort I was able to develop the inner quad and my kneecap came back into alignment, moving smoothly with every step.

Fine-tuning the physical imbalances of my body is clockwork compared to the process of overcoming my mental issues. The memory of the trauma plus now incorrect reflexes makes it difficult for me to do some everyday tasks, such as going down stairs and jogging. I no longer have an intuitive sense of where my leg is positioned – lay me on my belly and ask me to move both legs to forty-five degrees, my left leg will go to exactly the right location, and my right leg will be off by a few degrees. Ask me to balance on my right leg, and I’m likely to teeter and fall. Ask me to hop on one foot, and I’m unable to control my landing despite having the strength to execute the hop.

The most frustrating part about this is that continuous exercise doesn’t lead to lasting improvement. The typical pattern is on my first exercise, I’m unstable or weak, but as my brain analyzes the situation it can actively compensate so that by my second or third exercise in a series, I’m appearing functional and balanced. However, once I’m no longer actively focusing to correct for my imbalances, the weaknesses come right back. This mental relapse can happen in a matter of minutes. Thus, many of my colleagues have asked if I’m doing alright when they see me first going down a flight of stairs – the first few steps I’m hobbling as my reflexes take me through the wrong motions, but by the time I reach the bottom I’m looking normal as my brain has finally compensated for the new offsets in my knee.

It’s unclear how long it will be until I’m able to re-train my brain and overcome the mental issues associated with a major injury. I still feel a mild sense of panic when I’m confronted with a wet floor, and it’s a daily struggle to stretch, strengthen, and balance my recovering leg. However, I’m very grateful for the love and support of my partner who has literally been there every step of the way with me; from holding my hand while I lay on the floor in pain, to staying overnight in the hospital, to weekly physiotherapy sessions, to nightly exercises, she’s been by my side to help me, to encourage me, and to discipline me. Her effort has paid off – to date my body has exceeded the expectations of both the surgeon and the physiotherapist. However, the final boss level is in between my ears, in a space where she can’t be my protector and champion. Over the coming months and years it’ll be up to me to grow past my memories of pain, overcome my mental issues and hopefully regain a more natural range of behaviors.

Although profound pain only comes through tragic experiences, it’s helped me understand myself and other humans in ways I previously could not have imagined. While I don’t wish such experiences on anyone, if you find yourself in an unfortunate situation, my main advice is to pay attention and learn as much as you can from it. Empathy is built on understanding, and chronicling my experiences coping with pain helps with my healing while hopefully promoting greater empathy by enabling others to gain insight into what profound pain is like, without having to go through it themselves.


My right knee, 7 months post-op. Right thigh is much smaller than the left. Still a long way to go…

You Can’t Opt Out of the Patent System. That’s Why Patent Pandas Was Created!

November 30th, 2018

A prevailing notion among open source developers is that “patents are bad for open source”, which means they can be safely ignored by everyone without consequence. Unfortunately, there is no way to opt out of patents. Even if an entire community has agreed to share ideas and not patent them, there is nothing in practice that stops a troll from outside the community cherry-picking ideas and attempting to patent them. It turns out that patent examiners spend about 12 hours on average to review a patent, which is only enough time to search the existing patent database for prior art. That’s right: they don’t check github, academic journals, or even do a simple Google search for key words.

Once a patent has been granted, even with extensive evidence of prior art, it is an expensive process to challenge it. The asymmetry of the cost to file a patent — around $300 — versus the cost to challenge an improperly granted patent — around $15,000-$20,000 — creates an opportunity for trolls to patent-spam innovative open source ideas, and even if only a fraction of the patent-spam is granted, it’s still profitable to shake down communities for multiple individual settlements that are each somewhat less than the cost to challenge the patent.

Even though in principle open source developers are “in the right” that the publication and sharing of ideas creates prior art, the fact that the community routinely shuns patents means our increasingly valuable ideas are only becoming more vulnerable to trolling. Many efforts have been launched to create prior art archives, but unfortunately, examiners are not required to search them, so in practice these archives offer little to no protection against patent spamming.

The co-founder of Chibitronics, Jie Qi, was a victim of not one but two instances of patent-spam on her circuit sticker invention. In one case, a crowdfunding backer patented her idea, and in another, a large company (Google) attempted to patent her idea after encountering it in a job interview. In response to this, Jie spent a couple years studying patent law and working with law clinics to understand her rights. She’s started a website, Patent Pandas, to share her findings and create a resource for other small-time and open source innovators who are in similar dilemmas.

As Jie’s experience demonstrates, you can’t opt out of patents. Simply being open is unfortunately not good enough to prevent trolls from patent-spamming your inventions, and copyright licenses like BSD are, well, copyright licenses, so they aren’t much help when it comes to patents: copyrights protect the expression of ideas, not the ideas themselves. Only patents can protect functional concepts.

Learn more about patents, your rights, and what you can do about them in a friendly, approachable manner by visiting Patent Pandas!