The Best Laid Plans of Mice…

It’s been a strange, stressful week. I’ll beg your indulgence for a moment here. I still want to consider how robots will be developed and tested, but today I want to go in a slightly different direction. I want to quote from The Hitchhiker’s Guide to the Galaxy.
In Douglas Adams’ story, hyper-dimensional beings, who appear to humans as lab mice, attempted to sort out the “wretched question” of Life, the Universe, and Everything! They developed a computer called Deep Thought which would calculate the answer to the question once and for all. After generations, the computer came up with an answer, “42!” It then suggested that no one actually knew what the question was and that a bigger computer would have to be built. A computer so large, in fact, that it would require living organisms to be part of the computational matrix. Deep Thought named the new computer “The Earth” (Adams, 1983, p. 171).
The Earth was built and, after 10 million years of processing, just before it could complete its calculations, an alien species destroyed it to make way for a hyper-space bypass.
The reason I bring this up, besides the fact that I’m a geek and it’s my nature, is that companies do this kind of thing all the time. The company may develop on a shorter time-scale, for instance only worrying about this quarter’s results. If it plans longer term, the further out the resolution is, the more planning has to be done, and the more monitoring the company has to do to make sure the goal is still being reached. In the story, for example, had the mice been diligent about making sure the computer program was working, they would have noticed the plan to put in the hyper-space bypass and fought it, at least until their calculations had concluded.
The other thing to notice here is that laws may change while a product is being developed, or another team may develop something that is at cross-purposes to the project we are working on. In this case, government planning functionally rezoned an area that the mice were using but didn’t bother to inform the mice. This implies that risk review must be a continuous process in a project, especially if it runs for any length of time.

References
Adams, D. (1979). The Hitchhiker’s Guide to the Galaxy. New York: Random House.

It’s a Metaphor!

A couple of years ago, I participated in the Imagination Celebration in downtown Colorado Springs. My team from Colorado Tech had fielded a group of SumoBots to show off STEM activities. A SumoBot is a small cubic robot, about 4 inches on a side, outfitted with a couple of IR sensors and programmed to find its opponent, then push it out of a ring drawn on the table. Interesting side note: the robots don’t do too well out in direct sunlight, for some reason 🙂 A little bit of shade made them work much better!
Anyway, the kids came up to watch the robots doing their thing. Since there was nothing to do but watch, the kids didn’t really get involved. I rummaged in my backpack and found a yellow and an orange sticker, which I placed on top of the ‘bots. I got the kids to start cheering on the “orange” ‘bot and it won. Encouraged, we continued to cheer the orange ‘bot, which won again and again. To the kids, their cheering was helping, even though the engineers in the group knew there were no sound sensors on the ‘bot. For the first hour with the new colors, the orange ‘bot won about 95% of its matches, a statistical improbability. The kids were happy, the robots were doing their thing, but the tester in me was suspicious…
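To put a rough number on that suspicion, here is a minimal sketch of the check a tester might run. It assumes two evenly matched ‘bots (a 50% chance per match) and a hypothetical tally of 19 wins in 20 matches; I didn’t record the real match count, so those figures and the function name are illustrative only.

```python
from math import comb

def chance_of_at_least(wins, matches, p_win=0.5):
    """Probability of at least `wins` wins in `matches` independent matches,
    when each match is won with probability `p_win` (binomial upper tail)."""
    return sum(
        comb(matches, k) * p_win**k * (1 - p_win) ** (matches - k)
        for k in range(wins, matches + 1)
    )

# Hypothetical tally: 19 wins out of 20 matches (95%).
# Assumption: evenly matched robots, so p_win = 0.5.
p = chance_of_at_least(19, 20)
print(f"Chance of 19 or more wins in 20 even matches: {p:.6f}")
```

A result that small says the “evenly matched” assumption is almost certainly wrong, which is the cue to go looking for a real difference between the two machines.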
This all reminds me of a fairly apocryphal story from the automotive industry (Gupta, 2007). A customer called in to GM soon after purchasing a new vehicle to complain that his vehicle “was allergic” to vanilla ice cream. Puzzled help personnel established that the vehicle was not ingesting the ice cream but rather that when the customer purchased vanilla ice cream, and no other flavor, the car wouldn’t start to take him (and the ice cream) home to his family. The engineers understandably wrote the guy off as a kook, knowing there was no way for the vehicle to know, much less care, about the ice cream purchase.
The customer continued to call in and complain. Intrigued, the manufacturer sent an engineer out to investigate, and hopefully calm the customer. Interestingly, the engineer was able to confirm the problem. When the customer bought vanilla ice cream, and no other flavor, the vehicle did not start. Knowing about the make-up of the vehicle, the engineer conducted some tests and found that the real problem was vapor lock, which resolved itself if the customer bought a non-vanilla flavor. The store kept the vanilla up front because they sold so much of it. If you bought a different flavor, you had to walk further into the store, and the additional time allowed the vapor lock to clear.
Sherry Turkle (2012) at MIT found that senior citizens given a “companion robot” that looked similar to a stuffed seal would talk to it and interact with it as though it were alive. Researchers found that the residents often placed the robot in the bathtub, thinking its needs were similar to a seal’s. In the ice cream story, which has since been debunked (Snopes.com, 2011), the car owner determined that the vehicle “didn’t like” vanilla ice cream. We found similar behavior with the kids and the SumoBots: cheering the orange one seemed to lead it to win. Investigation showed the orange robot had the attack software installed, where the yellow ‘bot had line-following software installed instead. In all these instances, the humans interacted with the machines using a metaphor they understood: other living beings.
The lesson? The snarky answer is that the customer doesn’t know what’s going on and is just trying to describe what they see. Customers often lack the vocabulary to explain what the machine or software is doing, but they are describing behavior that they believe they see. The engineer needs to pay attention to the clues, however. Sometimes the customer does something that seems perfectly reasonable to them that the product designer didn’t think of. And sometimes the metaphor just doesn’t stretch to what is being done.

References:
Gupta, N. (October 17, 2007). Vanilla ice cream that puzzled General Motors. Retrieved from http://journal.naveeng.com/2007/10/17/vanilla-ice-cream-that-puzzled-general-motors/
Snopes.com (April 11, 2011). Cone of silence. Retrieved from http://www.snopes.com/autos/techno/icecream.asp
Turkle, S. (2012). Alone Together: Why we expect more from technology and less from each other. New York: Basic Books.

Testing, Testing, Planning, Planning…

This week at OR, we’ve been discussing “scenario planning,” a method for visualizing alternatives which allows planners to perform risk-management-type activities. Wade (2014) says scenario planning asks two fundamental questions: “What could the landscapes look like?” and “What trends will have an impact (on us) and how will they develop?” In Wade’s model, two or more trends are considered, using an “either … or …” framework for each, then pairing the various endpoints in a matrix. For example, a researcher might decide that oil price and electrical stability are the two factors that will impact our future plans. The researcher would set two extremes for the oil price, say ‘higher than now’ and ‘lower than now,’ and two extremes for the stability, say ‘cheap and plentiful’ and ‘expensive and scarce.’ By pairing each oil price extreme with each stability extreme, the researcher could define a set of four scenarios which would allow for planning.
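For the programmers in the audience, the pairing step can be sketched in a few lines of Python. This is only an illustration of the matrix idea using the oil and electricity example above; the dictionary layout and the `build_scenarios` helper are my own assumptions, not anything from Wade’s article.

```python
from itertools import product

# Each trend is reduced to two extremes, as in the example above.
trends = {
    "oil price": ["higher than now", "lower than now"],
    "electricity": ["cheap and plentiful", "expensive and scarce"],
}

def build_scenarios(trends):
    """Pair every extreme of each trend with every extreme of the others."""
    names = list(trends)
    return [dict(zip(names, combo)) for combo in product(*trends.values())]

scenarios = build_scenarios(trends)
for number, scenario in enumerate(scenarios, start=1):
    description = ", ".join(f"{name}: {value}" for name, value in scenario.items())
    print(f"Scenario {number}: {description}")
```

Two trends with two extremes each give four scenarios. Add a third trend and the same helper returns eight, which hints at how quickly the matrix grows as more factors are considered.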
It turns out that much of what’s written about scenario planning is based on financial forecasts. One notable failure of the model involved the Monitor Group, which ironically performed scenario planning for other companies. The problem they faced was akin to a mechanic getting into an accident because his vehicle had faulty brakes (Hutchinson, 2012). Monitor Group got into trouble when they began to experience negative cash flow. They trimmed their workforce by 20% and assumed they would weather the coming storm until the market picked back up (Cheng, n/d).
It didn’t. They didn’t.
Victor Cheng (ibid) opined that Monitor fell because they spent too much time denying that their models were not working. Desperate for cash, they contracted with Libyan dictator Moammar Gadhafi in an attempt to improve his image, which ironically hurt theirs. Cheng suggested that Monitor should have “protected its reputation” to be able to borrow during their downturn. They should have monitored (no pun intended) their Pride and Ego, which told them they couldn’t be having these problems since they solve them for others. Hutchinson (ibid) also suggested that this is a common problem within organizations, where collectively held beliefs can be spectacularly wrong. And finally, Cheng warns, a “strategy” is only as good as the paper it’s written on. Execution of the strategy is hard and needs to be monitored closely to prevent ending up alone and in the weeds.
So why am I writing about this, you ask? To me, this scenario planning is at the core of software testing. How do we know which scenarios to test? How do we find interesting combinations of actions? How do we validate that complex behaviors are working in a way that makes sense?
Many of the same tools can be used to define the scenarios for testing autonomous vehicles. Delphi, affinity grouping, and brainstorming are all well-known ways of collecting requirements. Each can be used to help define scenarios we would be interested in. Once we have the many thousands of ideas for what a vehicle must do in a specific situation, we can start grouping them together by process or mechanical grouping to find overlaps.
I recently began a project to define the tests that were necessary for a small cash register program I was brought in to test. After playing with the application for a week, I started writing test cases. I came up with nearly 1,400 of them before I got to talk to the development team. I showed them my list and told them that I would need their help to prioritize the list since we obviously would not be able to test them all. Their eyes widened. Then they asked me a question that shocked both them and me: “How much of this have you already tested while working this week?”
I set down my sheet and said, “Before I defined these tests, I felt I had a fairly good grasp of your product. I would have said I was working at about 80% coverage. Now, looking at all these paths, I might have been up to about 10% coverage.”
I believe the automotive testers such as Google have only begun to scratch the surface (no pun intended) of their required testing, even with all the miles they have under their belts.
Next post, we’ll start looking at some of those possible scenarios and we’ll start trying to define a priority for them as well…

References

Cheng, V. (n/d). Monitor group bankruptcy-the downfall. Retrieved from http://www.caseinterview.com/monitor-group-bankruptcy

Hutchinson, A. (November 13, 2012). Monitor group: a failure of scenario planning. Retrieved from http://spendmatters.com/2012/11/13/monitor-group-a-failure-of-scenario-planning/

Roxburgh, C. (November 2009). The use and abuse of scenarios. Retrieved from http://www.mckinsey.com/insights/strategy/the_use_and_abuse_of_scenarios

Wade, W. (May 21, 2014). “Scenario planning” – thinking differently about future innovation (and real-world applications). Retrieved from http://e.globis.jp/article/000363.html

RIP HitchBOT

I’m not sure what to make of this. Researchers in Port Credit, Ontario created a “robot” based on the Flat Stanley principle and turned it loose about a year ago. Like Flat Stanley, HitchBOT required strangers to pick it up and transport it to a new destination. Like Flat Stanley, people took their pictures with it at interesting events and people wrote about their experiences travelling with it.
HitchBOT travelled more than 6000 miles across Canada, visited Germany and the Netherlands, then began its journey across the United States with the goal of reaching San Francisco one day. The robot was built to help researchers answer the question “can robots trust humans?” Brigitte Deger-Smylie, a project manager for the HitchBOT experiment at Toronto’s Ryerson University, says they knew the robot could become damaged and had plans for “family members” to repair it if needed (Moynihan, 2015).
The three-foot-tall, 25-pound robot was a robot in name only, as it was literally built out of buckets with pool noodles for arms and legs. Because of privacy concerns, the machine did not have any real-time surveillance abilities. It could, however, respond to verbal input and take pictures, which it could post to its social media site along with GPS coordinates. Researchers could not operate the camera remotely.
Starting July 16, HitchBOT travelled through Massachusetts, Connecticut, Rhode Island, New York, and New Jersey. When it reached Philadelphia, though, it was decapitated and left on the side of the road on August 1st. Knowing how attached people get to objects which have faces, I wonder how the person who dropped it off last feels, knowing they were the last to interact with it. As a tester, I wonder how HitchBOT responded to being damaged, since it had at least rudimentary speech processing abilities.
The researchers say there are several ways of looking at the “decapitation.” One way is to think, “of course this happened, people are awful.” Another way is to think, “Americans are obnoxious, so of course it happened here.” Or worse, “of course this happened in Philly, where fans once lashed out at Santa Claus.” The project team suggests that the problem is an isolated incident involving “one jerk” and that we should concentrate on the distance the machine got and the number of “bucket list” items it was able to complete before it was destroyed. Deger-Smylie (ibid) says the team learned a lot about how humans interact with robots in unrestricted, unobserved settings, and that those interactions were “overwhelmingly positive.”
This makes me wonder. Will it be a “hate crime” to destroy robots one day? Will protesters picket offices where chips are transplanted, effectively changing the robot from one “being” to another? If the robot hurts a human, will they all be recalled for retraining? Where do we draw the line between what is “alive” and what is not? Does that question even mean anything? Sherry Turkle at MIT is researching how “alive” robots have to be to be a companion. I read a fascinating sci-fi novel back in the day called “Flight of the Dragonfly.” In it, a starship had a variety of AI personalities to help the crew maintain their sanity. The ship was damaged at one point and the crew had to abandon it, but they were afraid to leave the injured ship on its own to die. The ship reminded the crew that the devices they used were all extensions of itself and that the different voices it used were just fictions to help the humans interact with it. How are these “fictions” going to play out with people who already name their cars?
In the meantime, the Hacktory, a Philly-based art collective, is taking donations to rebuild the HitchBOT and send it back on the road.

References:
Moynihan, T. (August 4, 2015). Parents of the decapitated HitchBOT say he will live on. Retrieved from http://www.wired.com/2015/08/parents-decapitated-hitchbot-say-will-live/