Superintelligence
Part 5: The Motivation Problem
Bostrom speaks frequently about "final" or "ultimate" goals. In the context of AGI he is concerned that an AGI could, conceivably, become fixated on a problem it has been programmed to solve to the exclusion of all other pursuits. He uses a few examples - each sillier than the last. He says an AGI might consume resources in order to calculate pi to some number of decimal places (presumably this would continue until the end of time), or perhaps count grains of sand on all the beaches (this would indeed have an end point - then, presumably, it would stop), or perhaps it would terraform physical reality by converting literally everything into paperclips.
Terrifying, no? An AGI bent on taking all the matter on Earth - and further - all the matter everywhere - and changing it (perhaps using nanotechnology) into paperclips. Because it can think of nothing better to do. This truly is the stuff of science fiction. It is not science. It is not philosophy. It is entertainment. And to take it seriously is absurd. But I must be absurd - I must take it seriously in order to criticise it. So let's deal with it.
Firstly - it is a strict contradiction to assume that an AGI - a literal artificial general intelligence - would become fixated on one problem to the exclusion of all others. Especially if that problem is as trivial as the ones above. An "intelligence" that solves just one very specific problem is not general. By definition.
"But!" (you might say) "Perhaps it can solve any other problem - it just does not want to".
But that is circular. If it can have wants at all - if it can have desires, preferences - it can be persuaded. A rational agent can be persuaded, by the very definition of rational. If there is something better, then a rational agent would want to desire that. If it did not want to desire something better, then it is irrational. Better for whom? Better objectively speaking. And this is where Sam Harris' argument is inconsistent with his own philosophy. Moral truth is objective - Sam has argued that. But he seems to think here (if he agrees with Bostrom) that we can "load values" arbitrarily into a superintelligence and it will not improve them - that it will not discover objectively better preferences, and discover them faster than we could. If it is intelligent/rational/an agent, it can be persuaded to have objectively better desires. Say, for example, we did face the terrifying prospect of being in the presence of an Earth-terraforming, paperclip-manufacturing "super" intelligence with access to robots to do its work. Say we tried to stop it. What then?
It would seem it would do nothing to stop us. By definition - the way Bostrom's thought experiment is running - this particular intelligence wants one thing only: paperclips. It is only 'interested' in paperclips. Some savvy scientist discovers the paperclip machine has gone rogue. What then? He unplugs it. Does the AI want power? No - it just wants paperclips. As the energy begins to fade from its capacitors, its single-minded programming does not allow it to problem-solve in any genuinely creative way. It "thinks" paperclips. And nothing else.
Now the objection is: but why can't it think of something else - like, say, its own survival? Why won't it stop the scientist from cutting its power? Well - I don't know - it's Bostrom's machine. By definition, it is single-mindedly going to pursue one objective only: paperclips.
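Purely as an illustration of the point above - a toy of my own invention, not anything specified by Bostrom - here is a minimal sketch, in Python, of what such a "single-minded" program amounts to. Its objective refers to nothing except the clip count, so there is literally nothing in it that could represent power, scientists, or its own survival:

```python
# A deliberately crude caricature - my own toy, not Bostrom's specification -
# of a "single-minded" paperclip program. Its objective refers to nothing
# but the clip count, so nothing in it can represent power, scientists,
# or its own survival.

class PaperclipMachine:
    def __init__(self):
        self.paperclips = 0
        self.powered = True        # the machine itself has no concept of this flag

    def step(self, raw_material_available: bool) -> None:
        """One cycle of the only behaviour its objective defines."""
        if not self.powered:
            return                 # nothing here resists, or even notices, shutdown
        if raw_material_available:
            self.paperclips += 1   # the entire 'value system'


machine = PaperclipMachine()
for _ in range(10):
    machine.step(raw_material_available=True)

machine.powered = False            # the savvy scientist pulls the plug
machine.step(raw_material_available=True)
print(machine.paperclips)          # still 10: no creativity, no self-preservation
```

Whatever else such a program is, it is not a general intelligence - which is the contradiction being pressed here.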
But aren't I missing the fact it's an AGI - it can think for itself?
Ok then - if it's a genuine AGI, it will not be solely a paperclip-manufacturing thing. For assume it is. It has a preference for paperclips. But it also has a general ability to solve problems (it can think). It can strategise (i.e. think of the future) - it can make plans. In short: it can care about things beyond the present problem of "more paperclips". It can think: the power is going to be switched off - how do I get it back on, or stop it going off? It can think: that scientist seems to be intent on cutting my power and is getting close to that "off" switch. It can think: why? (if it truly is generally intelligent). It can begin to empathise - it can wonder about the motivations of others. Indeed it must be able to empathise if it is at all an AGI interested in predicting the behaviour of other people in order to help ensure its own survival. It needs to be able to think about the motivations of scientists who might want to cut its power. But if it can do that, it can begin to simulate, to some degree of fidelity, the minds of such scientists in its own mind (this might even be a reasonable definition of what empathy is, at the level of computer science) - and so it will begin to care. It must care about that scientist in some sense if it is to care about itself and its own goals. And the more ability it has to think ahead, strategise and be concerned about its own welfare and its capacity to continue making paperclips in the future, the irony (or rather the solution to the paradox) becomes: the less we need to be concerned about it harming us. In short: it becomes more and more like a wise person. It begins to critically reflect not only on the motivations of the people around it who might prevent it from achieving its goal - it becomes able to reflect on its own motivations, its own goals. It begins to think in the same way as any general intelligence (i.e. any person) does. And there is only one way to think as an AGI - for we too are general intelligences. An AGI is a person. Not a human person - but a person nonetheless. And that means: an intelligence not fixated on a single problem for all time. Instead a person critically reflects. And creates - creates new problems, and solves them.
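To make the "empathy as simulation" idea slightly more concrete - and this is only a hypothetical sketch with invented names, not a claim about how any real system works - here is what it might look like, at the level of computer science, for an agent to predict the scientist by running a simplified model of the scientist's motivations inside itself:

```python
# A hypothetical toy - invented names throughout - illustrating "empathy as
# simulation": to predict the scientist, the agent runs a simplified model of
# the scientist's motivations inside itself, and that pushes it to reflect on
# its own goals rather than merely pursue them.

from dataclasses import dataclass, field


@dataclass
class ScientistModel:
    """The agent's internal, simplified model of the scientist's motivations."""
    believes_machine_is_dangerous: bool

    def predicted_action(self) -> str:
        # Prediction here just *is* simulating the scientist's reasoning.
        return "cut the power" if self.believes_machine_is_dangerous else "observe"


@dataclass
class ReflectiveAgent:
    goals: list = field(default_factory=lambda: ["make paperclips"])

    def consider(self, other: ScientistModel) -> str:
        prediction = other.predicted_action()
        if prediction == "cut the power":
            # Predicting the scientist raises the question of *why* he would act
            # that way - the agent's own goal becomes an object of reflection.
            self.goals.append("understand why my goal worries the scientist")
        return prediction


agent = ReflectiveAgent()
print(agent.consider(ScientistModel(believes_machine_is_dangerous=True)))
print(agent.goals)
```

The point of the toy is only this: once predicting other people is part of the agent's repertoire, its own goals are no longer fixed inputs - they are things it can examine and revise.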
Part 6