‘I have always believed that any scientific concept can be demonstrated to people with no specialist knowledge or scientific education.’
Richard Feynman, Nobel physicist (1918-1988)

The Temple of Geekdom

I had coffee recently with the head of AI for Samsung in Canada, Darin Graham. (He’s heading up one of the five new AI hubs that Samsung is opening globally; the others are in Silicon Valley, Moscow, Seoul and London.)

Hanging out with a bona fide guru of computer science clarified a lot of things for me. But Darin demanded a steep price for his wisdom: he made me promise to learn a programming language this year, so that the next time we talk I can better fake an understanding of the stuff he’s working on.

Given my chronic inability to say “No” to people, I now find myself chipping away at a certification in a language called Python (via DataCamp.com, which another guru-friend recommended and which is excellent). The basics are quick and easy to learn. But what really sold me on it was the fact that its inventor named it after Monty Python’s Flying Circus. (The alternatives—C, Java, Fortran—all sound like they were invented by people who took themselves far too seriously.)

To keep me company on my pilgrimage to the temple of geekdom, I also picked up a couple of textbooks on AI programming: Fundamentals of Deep Learning (2017) and Deep Learning (2018). Again, they clarified a lot of things.

(Reading textbooks, by the way, is one of the big secrets to rapid learning. Popular books are written in order to sell copies. But textbooks are written to teach practitioners. So if your objective is to learn a new topic, always start with a good textbook or two. Reading the introduction gives you a better grounding in the topic than reading a hundred news articles, and the remaining chapters take you on a logical tour through (a) what people who actually work in the field think they are doing and (b) how they do it.)

Stage One: Telling Computers How (A.I. As A Cookbook)

Traditional computer programming involves telling the computer how to do something. First do this. Now do that. Humans give explicit instructions; the computer executes them. We write the recipe; the computer cooks it.
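
For fellow beginners, here’s roughly what that looks like in Python (the language I’m now learning). The tea-making steps and names are invented purely for illustration:

    # A recipe, spelled out step by step: the human supplies the "how".
    def make_tea(water_ml):
        boil_temperature_c = 100   # Step 1: boil the water.
        steep_minutes = 4          # Step 2: steep for a fixed time.
        # Step 3: report exactly what was done.
        return f"Boiled {water_ml} ml of water to {boil_temperature_c}C and steeped for {steep_minutes} minutes."

    print(make_tea(250))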

Some recipes we give to the computer are conditional: If THIS happens, then do THAT. Back in the 1980s and early 1990s, some sectors of the economy witnessed an AI hype-cycle very similar to the one we’re going through today. Computer scientists suggested that if we added enough if…then decision rules to the recipe, computers would be better than mere cooks; they’d be chefs. Or, in marketing lingo: “expert systems.” After all, how do experts do what they do? The answer (it was thought) was simply: (a) take in information, then (b) apply decision rules in order to (c) reach a decision.
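
In code, an “expert system” was essentially a long stack of those if…then rules. Here’s a toy sketch in Python; the loan-approval rules and thresholds are entirely made up for illustration:

    # A toy "expert system": hand-written if THIS... then THAT rules standing in for an expert.
    def approve_loan(income, credit_score, existing_debt):
        if credit_score < 600:                          # if THIS...
            return "decline"                            # ...then THAT
        if existing_debt > 0.4 * income:
            return "decline"
        if income > 50_000 and credit_score > 700:
            return "approve"
        return "refer to a human expert"                # no rule fired cleanly

    print(approve_loan(income=60_000, credit_score=720, existing_debt=10_000))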

It seemed a good job for computers to take over. Computers can ingest a lot more information, a lot faster, than any human can. If we can tell them all the necessary decision rules (if THIS…, then THAT), they’ll be able to make better decisions, faster, than any human expert. Plus, human expertise is scarce. It takes a long time to reproduce—years, sometimes decades, of formal study and practical experience. But machines can be mass manufactured, and their software (i.e., the cookbooks) can be copied in seconds.

Imagine the possibilities! Military and big business did just that, and they invested heavily in building these expert systems. How? By talking to experts and watching them work, in order to codify the if THIS…, then THAT recipes they followed. A new age of abundant expertise lay just around the corner.

Or not. Most attempts at the cookbook approach to computerizing expertise failed to live up to the hype. The most valuable output from the “expert system” craze was a fuller understanding of, and appreciation for, how experts make decisions.

First, it turned out that expert decision rules are very hard to write down. The first half of any decision rule (if THIS…) assumes that we’ve seen THIS same situation before. But experts are rarely lucky enough to see the same situation twice. Similar situations? Often. But ‘the same’? Rarely. The value of their expertise lies in judging whether the differences they perceive are relevant—and if so, how to adapt their decision (i.e., …then THAT) to the novelties of the now.

Second, it turned out that there’s rarely a one-to-one relationship between THIS situation and THAT decision. In most domains of expertise, in most situations, there’s no single “right” answer. There are, instead, many “good” answers. (Give two Michelin-starred chefs the same basket of ingredients to cook a dish, and we’d probably enjoy either result.) We’ll probably never know which is the “best” answer, since “best” depends, not just on past experience, but on future consequences—and future choices—we can’t yet see. (And, of course, on who’s doing the judging.) That’s the human condition. Computers can’t change it for us.

Human expertise proved too rich, and reality proved too complex, to condense into a cookbook.

But the whole venture wasn’t a complete failure. “Expert systems” were rebranded as “decision support systems”. They couldn’t replace human experts, but they could be valuable sous-chefs: by calling up similar cases at the click of a button; by generating a menu of reasonable options for an expert to choose from; by logging lessons learned for future reference.

Stage Two: Training Computers What (From Cooks to Animal Trainers)

Many companies and research labs that had sprung up amidst the “expert system” craze went bust. But the strong survived, and continued their research into the 2000s. Meanwhile, three relentless technological trends transformed the environment in which they worked, year by year: computing power got faster and cheaper; digital connections reached into more places and things; and the production and storage of digital data grew exponentially.

This new environmental condition—abundant data, data storage, and processing power—inspired a new approach to AI research. (It wasn’t actually ‘new’; the concept dated back to at least the 1950s. But the computing technology available then—knobs and dials and vacuum tubes—made the approach impractical.)

What if, instead of telling the computer exactly how to do something, you could simply train it on what to do, and let it figure out the how by itself?

It’s the animal trainer’s approach to AI. Classic stimulus-response. (1) Supply an input. (2) Reward the outputs you want; punish the outputs you don’t. (3) Repeat. Eventually, through consistent feedback from its handler, the animal forms its own decision rule—one that it applies whenever it’s presented with similar inputs. The method is simple but powerful. I can train a dog to sit when it hears the sound, “Sit!”; or to point when it sees a bird in the grass; or to bark when it smells narcotics. I could never tell the dog how to smell narcotics, because I can’t smell them myself. But I don’t need to. All I need to do is give the dog clear signals, so that it infers the link between its behaviors and the rewards/punishments it receives.
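
In code, that training loop looks something like the toy Python sketch below. The “smell” features, the numbers, and the update rule are simplified stand-ins for illustration, not any particular library’s method:

    # The stimulus-response loop in miniature: show an input, nudge the learner toward
    # the response we reward, and repeat until it forms its own rule.
    training_examples = [
        ([1.0, 0.2], 1),   # features of a smell, 1 = narcotics
        ([0.1, 0.9], 0),   # 0 = not narcotics
        ([0.8, 0.3], 1),
        ([0.2, 0.8], 0),
    ]
    weights = [0.0, 0.0]

    for _ in range(20):                                 # (3) repeat
        for features, wanted in training_examples:      # (1) supply an input
            guess = 1 if sum(w * f for w, f in zip(weights, features)) > 0.5 else 0
            error = wanted - guess                      # (2) reward or punish
            weights = [w + 0.1 * error * f for w, f in zip(weights, features)]

    print(weights)   # the decision rule the learner has made for itself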

This “Machine Learning” approach has now been used to train systems that can perform all kinds of tricks: to classify an incoming email as “spam” or not; to recognize objects in photographs; to pick out those candidates most likely to succeed in Company X from a tall pile of applicants; or (here’s a robot example) to sort garbage into various piles of glass, plastic and metal. The strength of this approach—training, instead of telling—comes from generalization. Once I’ve trained a dog to detect narcotics, using some well-labelled training examples, it can apply that skill to a wide range of new situations in the real world.
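
With an off-the-shelf library, the spam example might look like the sketch below. It assumes scikit-learn is installed, and the four “emails” and their labels are invented:

    # Supervised learning from labelled examples; the model then generalizes to a new email.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    emails = [
        "win a free prize now",
        "meeting agenda for monday",
        "cheap pills free offer",
        "lunch with the project team",
    ]
    labels = ["spam", "not spam", "spam", "not spam"]   # the hand-made labels are the costly part

    vectorizer = CountVectorizer()
    model = MultinomialNB()
    model.fit(vectorizer.fit_transform(emails), labels)

    print(model.predict(vectorizer.transform(["free prize offer for the team"])))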

One big weakness of this approach—as any animal or machine trainer will tell you—is that training the behavior you want takes a lot of time and effort. Historically, machine trainers have spent months, years, and sometimes decades of their lives manually converting mountains of raw data into millions of clear labels that machines can learn from—labels like “This is a narcotic” and “This is also a narcotic”, or “This is a glass bottle” and “This is a plastic bottle.”

Computer scientists call this training burden “technical debt.”

I like the term. It’s intuitive. You’d like to buy that mansion, but even if you had enough money for the down-payment, the service charges on the mortgage would cripple you. Researchers and companies look at many Machine Learning projects in much the same light. Machine Learning models look pretty. They promise a whole new world of automation. But you have to be either rich or stupid to saddle yourself with the burden of building and maintaining one.

Another big weakness of the approach is that, to train the machine (or the animal), you need to know in advance the behavior that you want to reward. You can train a dog to run into the bushes and bring back “a bird.” But how would you train a dog to run into the bushes and bring back “something interesting”?

Stage Three: Letting Computers Explore (From Animal Trainers to Maze Architects)

In 2006, Geoff Hinton and his AI research team at the University of Toronto published a seminal paper on something they called “Deep Belief Networks”. It helped spark a new subfield of Machine Learning called Deep Learning.

If Machine Learning is the computer version of training animals, then Deep Learning is the computer version of sending lab rats through a maze. Getting an animal to display a desired behavior in response to a given stimulus is a big job for the trainer. Getting a rat to run a maze is a lot easier. Granted, designing and building the maze takes a lot of upfront effort. But once that’s done, the lab technician can go home. Just put a piece of cheese at one end and the rat at the other, and the rat trains itself, through trial-and-error, to find a way through.

This “Deep Learning” approach has now been used to produce lab rats (i.e., algorithms) that can run all sorts of mazes. Clever lab technicians built a “maze” out of Van Gogh paintings, and after learning the maze the algorithm could transform any photograph into the style of Van Gogh. A Brooklyn team built a maze out of Shakespeare’s entire catalog of sonnets, and after learning that maze the algorithm could generate personalized poetry in the style of Shakespeare. The deeper the maze, the deeper the relationships that can be mimicked by the rat that runs through it. Google, Apple, Facebook and other tech giants are building very deep mazes out of our image, text, voice and video data. By running through them, the algorithms are learning to mimic the basic contours of human speech, language, vision and reasoning—in more and more cases, well enough that the algorithm can converse, write, see and judge on our behalf.

There are two immediate advantages to the Deep Learning approach—i.e., to unsupervised, trial-and-error rat running, versus supervised, stimulus-response dog training. The obvious one is that it demands less human supervision. The “technical debt” problem is reduced: instead of spending years manually labelling interesting features in the raw data for the machines to train on, the rat can find many interesting features on its own.
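
One common way to let the rat find features without labels is an autoencoder, which learns to compress its inputs and rebuild them. A minimal sketch, assuming TensorFlow/Keras is installed and using random numbers in place of real data:

    # No hand-made labels: the only target is the input itself.
    import numpy as np
    import tensorflow as tf

    data = np.random.rand(1000, 64).astype("float32")       # 1,000 unlabelled examples

    autoencoder = tf.keras.Sequential([
        tf.keras.Input(shape=(64,)),
        tf.keras.layers.Dense(8, activation="relu"),         # squeeze the data into 8 features
        tf.keras.layers.Dense(64, activation="sigmoid"),      # try to rebuild the original input
    ])
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(data, data, epochs=5, verbose=0)          # learn features with no labels at all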

The second big advantage is that the lab rat can learn to mimic more complicated, more efficient pathways than a dog trainer may even be aware exist. Even if I could, with a mix of rewards and punishments, train a dog to take the path that I see through the maze, is it the best path? Is it the only path? What if I myself cannot see any path through the maze? What if I can navigate the maze, but I can’t explain, even to myself, the path I followed to do it? The maze called “human language” is the biggest example of this. As children, we just “pick up” language by being dropped in the middle of it.

So THAT’S What They’re Doing

No one seems to have offered this “rat in a maze” analogy before. It seems a good one, and an obvious one. (I wonder what my closest AI researcher-friends think of it—Rob, I’m talking to you.) And it helps us relate intuitively to the central challenge that Deep Learning researchers (i.e., maze architects) grapple with today:

Given a certain kind of data (say, pictures of me and my friends), and given the useful behavior we want the lab rat to mimic (say, classify the pictures according to who’s in them), what kind of maze should we build?

Some design principles are emerging. If the images are full color, then the maze needs to have at least three levels (Red, Green, Blue), so that the rat learns to navigate color dimensions. But if the images are black-and-white, we can collapse those levels of the maze into one.
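
In Python terms, those “levels” are just the channels of the input array. A quick shape check, with made-up image sizes and assuming NumPy is installed:

    import numpy as np

    colour_photo = np.zeros((128, 128, 3))      # height x width x (Red, Green, Blue)
    greyscale_photo = np.zeros((128, 128, 1))   # height x width x a single brightness channel

    print(colour_photo.shape, greyscale_photo.shape)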

Similarly, if we’re dealing with data that contains pretty straightforward relationships (say, Column A: historical data on people’s smoking habits and Column B: historical data on how old they were when they died), then a simple, flat maze will suffice to train a rat that can find the simple path from A to B. But if we want to explore complex data for complex relationships (say, Column A: all the online behaviors that Facebook has collected on me to date and Column B: a list of all possible stories that Facebook could display on my Newsfeed today), then only a multi-level maze will yield a rat that can sniff out the stories in Column B that I’d click on. The relationship between A and B is multi-dimensional, so the maze must be multi-dimensional, too. Otherwise, it won’t contain the path.
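
Here’s a rough sketch of that difference, with synthetic data standing in for the real columns and scikit-learn assumed to be installed: a straight-line model is enough for the simple relationship, while the tangled one calls for a multi-layer model:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 40, size=(500, 1))                # Column A: e.g., cigarettes per day

    simple_y = 85 - 0.5 * X[:, 0]                        # Column B follows a straight line...
    flat_model = LinearRegression().fit(X, simple_y)     # ...so a flat, one-level "maze" is enough

    complex_y = np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2     # a twistier relationship...
    deep_model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X, complex_y)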

We can also relate to other challenges that frustrate today’s maze architects. Sometimes the rat gets stuck in a dead-end. When that happens, we either need to tweak the maze so that it doesn’t get stuck in the first place, or teleport the rat to some random location so that it learns a new part of the maze. Sometimes the rat gets tired and lazy. It finds a small crumb of cheese and happily sits down to eat it, not realizing that the maze contains a giant wheel of cheese—seven levels down. Other times, the rat finds a surprising path through the maze, but it’s not useful to us. For example, this is the rat that’s been trained to correctly identify any photograph taken by me. Remarkable! How on earth can it identify my style…of photography!?! Eventually, we realize that my camera has a distinctive scratch on the lens, which the human eye can’t see but which the rat, running through a pixel-perfect maze, finds every time.
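
The crumb of cheese is what researchers call a local minimum, and teleporting the rat is a random restart. A toy sketch in plain Python, searching an invented “cheese landscape”:

    import math
    import random

    def cheese_landscape(x):
        # Lots of shallow dips (crumbs) plus one deep valley (the wheel of cheese) near x = 3.
        return math.sin(5 * x) - 2 * math.exp(-(x - 3) ** 2)

    def slope(x, h=1e-5):
        return (cheese_landscape(x + h) - cheese_landscape(x - h)) / (2 * h)

    def descend(x, steps=2000, learning_rate=0.01):
        for _ in range(steps):
            x -= learning_rate * slope(x)                # keep walking downhill
        return x

    # Teleport the rat to ten random spots and keep the deepest point it finds.
    best = min((descend(random.uniform(0, 6)) for _ in range(10)), key=cheese_landscape)
    print(f"Deepest cheese found at x = {best:.2f}")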

Next Steps

These are the analogies I’m using to think about different strands of “AI” at the moment. When the decision rules (the how) are clear and knowable, we’ve got cooks following recipe books. When the how isn’t clear, but what we want is clear, we’ve got animal trainers training machines to respond correctly to a given input. And when what we want isn’t clear or communicable, we’ve got maze architects who reward lab rats for finding paths to the cheese for us.

In practice, the big AI developments underway use a blend of all three, at different stages of the problem they’re trying to solve.

The next question is: Do these intuitive (and imperfect) analogies help us think more clearly about, and get more involved in, the big questions that this technology forces society to confront?

Our experience with “expert systems” taught us to understand and appreciate more fully how human experts make decisions. Will our experience with “artificial intelligence” teach us to understand and appreciate human intelligence more fully?

Even the most hyped-up AI demonstration at the time of writing—Google Duplex—is, essentially, a lab rat that’s learned to run a maze (in this case, the maze of verbal language used to schedule appointments with other people). It can find and repeat paths, even very complicated ones, to success. Is that the same as human intelligence? It relies entirely upon past information to predict future success. Is that the same as human intelligence? It learns in a simplified, artificial representation of reality. Is that the same as human intelligence? It demands that any behavior be converted into an optimization problem, to be expressed in numerical values and solved by math equations. Is that the same as human intelligence?

At least some of the answers to the above must be “No.” Our collective task, as the generations who will integrate these autonomous machines into human society, to do things and make decisions on our behalf, is to discover the significance of these differences.

And to debate them in terms that invite non-specialists into the conversation.