A while back I wrote an analysis of fake microSD cards. As a result of the post, I’ve received this question regularly via email:
“I’m trying to buy a thousand microSD cards for my embedded controller project. How do you qualify a microSD card?”
So, I thought it might be helpful to share my answer here.
There’s this awkward phase between the weekend project (where you buy your microSD card from Best Buy for $20 and have a no-questions return policy) and being Nokia (where you buy the same cards for $2 in quantities large enough to actually have leverage over vendors). When you source a few thousand cards at a time on the wholesale spot market, you’re basically on your own to control quality.
As far as process control goes, some vendors are easier to work with than others. Samsung will bump their part numbers based on die revs or other significant internal changes to the card. Sandisk, on the other hand, uses a very short part number, so you have no idea whether the NAND inside is MLC or TLC, etc.; you just know the capacity, and the card is simply guaranteed to perform to spec. To be fair, Sandisk is very thorough about ensuring they meet the spec. However, it's the edge cases that usually bite you in production; regardless of the spec, every die/controller combo has some character, and your embedded controller may bring out some of that color. And, of course, there are the fakes: Sandisk is a huge target for counterfeiters who want to borrow their good name to sell you a batch of shoddy cards.
If you're working with a distributor, get a copy of their authorization letter certifying the relationship with the brand they are selling. It's easy to fake the certificate, but it's a good formality to pursue anyway. If you can, get the upstream brand to confirm the distribution relationship.
Aside from these supply-chain considerations, here's a checklist of technical tests to run on your cards:
For each new distributor:
1. I read out the CID and CSD registers and decode them. This is easy to do on Linux with a directly connected microSD card. You cannot do this if the card is plugged into a USB adapter; the card must be plugged into a native SD interface. The CID and CSD should look "right", i.e., the manufacturer ID should make sense (unfortunately the manufacturer ID codes are all secret, but I can assure you it's not supposed to be FF or 00), serial numbers should be some big number, date codes should be correct, etc.
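As a sketch of the decode step, here's a short script that parses the 32-hex-digit CID string that Linux exposes through sysfs. The CID value below is a fabricated example for illustration, not from a real card, and the field offsets follow the SD card CID register layout:

```shell
# On real hardware, read the CID with:  cat /sys/block/mmcblk0/device/cid
# The value below is a fabricated example, not a real card's CID.
CID="035344534c30384780123456780136db"

# Convert a hex string to its ASCII characters.
hex_to_ascii() { printf "$(echo "$1" | sed 's/../\\x&/g')"; }

MID=${CID:0:2}                      # manufacturer ID, bits [127:120]
OID=$(hex_to_ascii "${CID:2:4}")    # OEM/application ID, 2 ASCII chars
PNM=$(hex_to_ascii "${CID:6:10}")   # product name, 5 ASCII chars
PRV=${CID:16:2}                     # product revision (BCD major.minor)
PSN=${CID:18:8}                     # 32-bit product serial number
YEAR=$(( 2000 + 0x${CID:27:2} ))    # manufacture date: year offset from 2000
MONTH=$(( 0x${CID:29:1} ))          # manufacture date: month

echo "MID=0x$MID OID=$OID PNM=$PNM PRV=0x$PRV PSN=0x$PSN date=$YEAR-$MONTH"
```

A card reporting MID 0x00 or 0xFF, a serial number of zero, or a nonsense date is an immediate red flag.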
2. Do a "full write" test at least once: create a random block of data that's the putative size of the card, dd it onto the card, and then md5sum the contents read back from the card. This will identify loopback tricks that fake capacity. It's a relatively common trick that is surprisingly hard to detect, because many cards are filled to less than 50% of capacity in real life.
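The full-write test can be sketched as below. The device path and size are placeholders: the sketch writes to a small scratch file so it can run anywhere, but on real hardware you would point DEV at /dev/mmcblk0 and SIZE_MB at the card's claimed capacity (destroying its contents):

```shell
DEV=./card.img   # on real hardware: /dev/mmcblk0 (WARNING: erases the card)
SIZE_MB=4        # on real hardware: the putative capacity of the card

# Generate a random image the size of the claimed capacity.
dd if=/dev/urandom of=pattern.bin bs=1M count=$SIZE_MB 2>/dev/null

# Write it out, forcing the data through the page cache with conv=fsync.
dd if=pattern.bin of=$DEV bs=1M conv=fsync 2>/dev/null

# Read the whole span back and compare checksums. A capacity-faking card
# wraps writes around internally, so the tail of the readback won't match.
WROTE=$(md5sum < pattern.bin | cut -d' ' -f1)
READ=$(dd if=$DEV bs=1M count=$SIZE_MB 2>/dev/null | md5sum | cut -d' ' -f1)
if [ "$WROTE" = "$READ" ]; then echo "capacity OK"; else echo "capacity FAKED"; fi
```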
3. Do a reboot test, to understand the behavior of the controller/die combo during ungraceful powerdown. It’s less important on systems that can never have their battery removed.
Before the test, I do a recursive find piped to md5sum to get a full map of all the files on the card. Then, I use a script that runs in a constant loop after boot: it writes a random amount of /dev/urandom data in odd-sized blocks (ranging from a couple hundred bytes to a couple megabytes) to the card and then calls sync. For each block written, the md5sum is recorded. At boot time, all old blocks are checked for md5sum consistency and then deleted. The system under test is automatically power cycled by cutting the AC power about once every 2-3 minutes plus some random interval (depending on how long it takes your device to boot). I cut power on the AC side to capture the effects of the power decay curve of the wall adapter; the logic goes that a clean power down is less likely to cause problems than a gradual one. I run the test on a cohort of at least 2 systems for 2 days straight. If you want to get fancy, have the system upload its statistics to a server so you can see exactly when it starts to fail. After a couple of days, I extract the card from the system and redo the recursive find with md5sum to verify that no files have been silently corrupted; corruption of non-critical files would be difficult to notice without this comprehensive check. Be sure, of course, to ignore files that naturally vary.
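A minimal sketch of the write-and-verify loop described above. Paths and the block count are placeholders so the sketch is self-contained; on the device under test, MNT would be the card's mountpoint, the loop would run until the AC power is cut, and the boot-time verification pass would run on every startup:

```shell
MNT=./sdcard          # placeholder; on the real system, the card's mountpoint
LOG=$MNT/md5.log
mkdir -p "$MNT"

# Boot-time pass: verify every block recorded before the last power cut,
# then delete the old blocks so the card doesn't fill up.
if [ -f "$LOG" ]; then
    ( cd "$MNT" && md5sum -c --quiet md5.log ) || echo "CORRUPTION after power cycle"
    ( cd "$MNT" && awk '{print $2}' md5.log | xargs rm -f )
    rm -f "$LOG"
fi

# Write loop: odd-sized chunks of /dev/urandom, synced and checksummed.
# On the real system this loops forever; 5 iterations here for illustration.
for i in 1 2 3 4 5; do
    SZ=$(( (RANDOM % 2000000) + 211 ))   # a few hundred bytes up to ~2 MB
    head -c "$SZ" /dev/urandom > "$MNT/blk_$i"
    sync
    ( cd "$MNT" && md5sum "blk_$i" >> md5.log )
done
```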
I still don't have a straight answer on why some cards perform better under this test and others fail miserably. Ultimately, however, every card I've encountered eventually corrupts the filesystem after enough cycles; it's just a matter of how long. I feel comfortable if I can reliably get to ten thousand ungraceful reboots-while-writing before failure. Note that eMMC supposedly has design features that harden it against these problems, but I've never had the luxury of building systems at high enough volume for eMMC to become an affordable option. Besides, I consider giving users the ability to remove the firmware card and reflash it with new code using a common USB adapter an important feature, at least in the systems I design. Mobile phone carriers would think differently.
Of course, once a vendor is qualified, they can still send you bad lots.
For each new lot I get, I take a few cards, burn them myself, and check that they boot the system before handing them over to the factory. I also manually inspect the CID/CSD to ensure that the manufacturer IDs haven't rotated, and I inspect the laser markings to ensure that the lot number changes (it should; if it doesn't, then they are pulling something wonky on you). I also compare the circuit trace pattern on the back, visible through the reliefs in the solder resist coating. If you have easy access to an X-ray machine (some CMs have them on site), you can go so far as to compare the internal construction in the X-ray to see if the dies have been revved. If all these are the same, you're probably good to go on the new lot, but I do pay attention to the failure rate data in the first couple hours of production just to make sure there isn't something to worry about.
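The CID sanity check on a new lot can be automated by comparing against golden values recorded when the vendor was first qualified. Everything below is illustrative: the golden MID/OID values and the CID string are made up, and on real hardware the CID would come from sysfs:

```shell
# Golden values recorded from the original qualification samples (illustrative).
GOLDEN_MID="03"
GOLDEN_OID="5344"

# On real hardware: CID=$(cat /sys/block/mmcblk0/device/cid)
CID="035344534c30384780123456780136db"   # fabricated example

# Compare the manufacturer ID and OEM ID fields against the golden values.
if [ "${CID:0:2}" = "$GOLDEN_MID" ] && [ "${CID:2:4}" = "$GOLDEN_OID" ]; then
    RESULT="lot CID consistent with qualified vendor"
else
    RESULT="MID/OID changed: possible die rev or counterfeit"
fi
echo "$RESULT"
```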
There are probably a bunch of other tests, techniques, and good ideas that I should be aware of… I look forward to reading the comments!
A low-voltage test is missing. On battery-powered devices, some cards always get corrupted when the voltage drops.
Been there, done that.
Thanks for taking the time to show us your process. Love the blog, great work.
Agreed on all counts!
Regarding the X-ray: if you have a friendly dentist, he may let you X-ray a few uSD cards…
Oh we’ve had plenty of mSD fun :-) — we’ve also seen bad-poweroff-behaviour on some mSD makes and models where they just… die and become unrecoverable. That’s actually out of spec.
Our wear levelling testing scripts (for soldered raw NAND, mSD and eMMC) are at http://wiki.laptop.org/go/NAND_Testing — we take our time with this to ensure our mSD cards can handle the wear.
For basic performance testing, we use flashbench and keep an eye on the 4 KB results: the current Linux VFS layer reads and writes in 4 KB blocks most of the time, so the 4 KB row of the result set is all that matters. Testing of recent batches shows that FTL vendors are (at last!) working to make 4 KB reads and writes faster. Technical notes at http://wiki.laptop.org/go/SDCard_Testing
Has anyone any evidence that reformatting an SD card can affect wear levelling?
I can understand that using a different cluster size from that set by the manufacturer in a FAT partition will affect performance, but I can't find any proof that wear levelling is implemented in a way that relies on the format, as claimed on Wikipedia: en.wikipedia.org/wiki/Secure_Digital#File_system
Or do people always use the format provided with the SD card?
The wear-levelling happens at a lower layer that’s invisible to the SD interface exposed to the host — so there’s no way to blow it away by reformatting. At worst, there might be the performance impact you mention, and a little bit of extra wear at format time.
Hello Bunnie,
The software side of your torture test sounds like something that would be eminently open sourcable. Can you consider that please?
Also, most people don't know this, but on many modern flash cards, the area that contains the FAT is treated differently from the rest of the device. Reformatting the card (or changing the filesystem) isn't a good idea, because those special properties will be lost. More info here:
http://lwn.net/Articles/428584/
Thanks for linking to my LWN article; I hope it will be useful to other people who stumble upon this page. I also maintain a list of devices at https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey and a tool that can find out the characteristics of most cards, which is linked from there and from the laptop.org wiki that Martin Langhoff pointed to above. This database also has a list of the manufacturer ID values you can expect to see from a lot of vendors.
Testing with "./flashbench --open-au --open-au-nr=5 --random --blocksize=4096 --erasesize=4194304 /dev/mmcblk0" as Martin does generally tells you if a card is any good. E.g., all Kingston cards will have very low results for small block sizes there (a few KB/s), while decent cards like current Samsung models or most Sandisk cards can reach close to the maximum throughput with smaller sizes too, which is what you need when you have a Linux file system like ext4 (never use ext3 on SD cards, please) or btrfs.
Feel free to contact me if you have questions about interpreting flashbench results.
A very helpful post which helped me a lot, as I am making a project on microSD cards. This post gave me enough material to understand microSD cards clearly.