This subject is about inhuman AI, all the tricks that computers can use to be smart that humans may or may not use.
Just to give the humans a little more equality in this subject, today we're going to talk about humans and AI. The field of cognitive science is devoted to discovering more about human intelligence using insights from a range of other areas including:
Human brain cells are very different from computer chips. In your brain, there are:
A nerve cell can have up to 1000 dendritic branches, making connections with tens of thousands of other cells. Each of the 10^11 (one hundred billion) neurons has on average 7,000 connections to other neurons.
It has been estimated that the brain of a three-year-old child has about 10^16 synapses (10 quadrillion). This number declines with age, stabilizing by adulthood. Estimates vary for an adult, ranging from 10^15 to 5 x 10^15 synapses (1 to 5 quadrillion).
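As a rough sanity check, here is a back-of-the-envelope calculation using only the figures quoted above (nothing more):

  # Back-of-the-envelope check on the numbers quoted above.
  neurons = 10**11           # about one hundred billion neurons
  avg_connections = 7000     # average connections per neuron

  synapses = neurons * avg_connections
  print(f"implied synapse count: {synapses:.1e}")   # about 7.0e+14

  # That lands just under the low end of the quoted adult range (1e15 to 5e15).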
Just to say the obvious- that’s a BIG network.
Neurophysiology is a very active field. The latest generation of MRI scanners allows detailed real-time monitoring of human brain activity while subjects perform cognitive tasks.
This field shows great promise but, as yet, researchers are still working on locomotion, pain perception, and vision, and have yet to rise to the level of model-based reasoning.
The field of neural networks originally began as an experiment in exploiting massive repetition of a single simple structure, running in parallel, to achieve cognition. As the field evolved, it turned into curve fitting over non-linear functions (and the tools used to achieve that fit have become less and less likely to have a biological correlate).
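To make the "curve fitting" point concrete, here is a minimal sketch (in Python with numpy; the network size, learning rate, and target function are arbitrary choices of mine): one hidden layer of tanh units fitted to a non-linear function by gradient descent. Nothing here claims to be biological.

  # Curve fitting over a non-linear function with a tiny one-hidden-layer network.
  import numpy as np

  rng = np.random.default_rng(1)
  x = np.linspace(-3, 3, 200).reshape(-1, 1)
  y = np.sin(x)                               # the non-linear curve to fit

  hidden = 20
  W1 = rng.normal(0, 0.5, (1, hidden))
  b1 = np.zeros(hidden)
  W2 = rng.normal(0, 0.5, (hidden, 1))
  b2 = np.zeros(1)

  lr = 0.01
  for step in range(5000):
      h = np.tanh(x @ W1 + b1)                # non-linear hidden layer
      pred = h @ W2 + b2                      # linear output layer
      err = pred - y
      gW2 = h.T @ err / len(x)                # back-propagate the error
      gb2 = err.mean(axis=0)
      dh = (err @ W2.T) * (1 - h ** 2)
      gW1 = x.T @ dh / len(x)
      gb1 = dh.mean(axis=0)
      W2 -= lr * gW2; b2 -= lr * gb2          # gradient descent updates
      W1 -= lr * gW1; b1 -= lr * gb1

  print("mean squared error:", float((err ** 2).mean()))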
For another example of AI research, initially inspired by a biological metaphor, see genetic algorithms.
Noam Chomsky is one of the towering figures of the 20th century. He's a linguist and a political commentator. Every few years he disappears, then re-emerges with a new book that redefines everything. For example, a lot of computer science parsing theory comes from Chomsky's theory of language grammars.
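To show that connection to parsing theory, here is a minimal sketch: a tiny Chomsky-style context-free grammar and a recursive-descent recognizer for it (the toy grammar and lexicon are mine, for illustration only).

  # A tiny context-free grammar and a recursive-descent recognizer for it.
  GRAMMAR = {
      "S":  [["NP", "VP"]],
      "NP": [["Det", "N"]],
      "VP": [["V", "NP"], ["V"]],
  }
  LEXICON = {"Det": {"the", "a"}, "N": {"dog", "cat"}, "V": {"sees", "sleeps"}}

  def parse(symbol, words, i):
      """Return all word positions reachable after matching `symbol` from position i."""
      if symbol in LEXICON:                       # terminal: consume one word
          if i < len(words) and words[i] in LEXICON[symbol]:
              return [i + 1]
          return []
      ends = []
      for production in GRAMMAR[symbol]:          # non-terminal: try each production
          positions = [i]
          for part in production:
              positions = [j for p in positions for j in parse(part, words, p)]
          ends.extend(positions)
      return ends

  def accepts(sentence):
      words = sentence.split()
      return len(words) in parse("S", words, 0)

  print(accepts("the dog sees a cat"))   # True
  print(accepts("dog the sees"))         # False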
In AI circles, Chomsky is most famous for his argument that we don't learn language. Rather, we are born with a general grammar and, as children grow up, all they are doing is filling in some empty slots with the particulars of the local dialect.
This must be so, argues Chomsky: the language examples a child actually hears are far too sparse to learn a grammar from scratch, so without an innate grammar language acquisition would be impossible.
The implications are staggering. Somewhere in the wet-ware of the brain there is something like the grammars we process in computer science. At its most optimistic, this also means that grammar-based languages (like LISP, etc) have what it takes to reproduce human cognition. We’ll return to this below (when we talk about the "physical symbol system hypothesis").
But is there really a "language" of thought? Or is this just chemicals sloshing around the dendrites (under the hood), which we then interpret as language?
Well, there is evidence of some model-based manipulation by our wet-ware. In classic mental rotation experiments, it was shown that the time required to check if some object was a rotation of another was linearly proportional to the size of the rotation. It is as if some brain box is reaching out to a sculpture of the thing we are looking at, then turning it around at some fixed rate.
Anyway, if you ask a philosopher, "is it really neurons, or are there symbolic models in between our ears?", they might answer: who cares? Whatever stance works best is the right one.
Daniel Dennett asks a simple question. Try and beat a chess playing program. What are you going to do?
Which is the right stance? The answer is: it depends. What do you want to do? Stop being short-circuited by a loose wire? You want the physical stance. Beat the program at chess? You want the intentional stance.
Bottom line: a computer is not just "a machine". It is a mix of things, some of which are best treated like any other intelligence.
Don’t believe me? Well, pawn to king four and may the best stance win.
(By the way, for a good introduction to AI and philosophy, see The Mind’s I).
I think therefore I am. I don’t think therefore...
There used to be a savage critique from certain philosophers along the lines that AI was impossible. For example, John Searle is a smart guy. His text Speech Acts: An Essay in the Philosophy of Language (1969) is listed as one of the most cited works of the 20th century.
In one of the most famous critiques of early AI, Searle invented the Chinese Room: an ELIZA-like system that used simple pattern lookups to react to user utterances. Searle argued that this was nonsense- that such a system could never be said to be "really" intelligent.
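To see just how shallow such a system is, here is a minimal ELIZA-flavoured sketch (the patterns are my own toy examples, not Weizenbaum's actual script): keyword lookup plus canned templates, with no model, no memory, and no inference.

  # A minimal ELIZA-flavoured responder: keyword lookup plus canned reflection.
  import re

  RULES = [
      (r"\bI am (.*)", "Why do you say you are {0}?"),
      (r"\bI feel (.*)", "Tell me more about feeling {0}."),
      (r"\bmother|\bfather", "Tell me about your family."),
      (r".*", "Please go on."),                 # default catch-all
  ]

  def respond(utterance):
      for pattern, template in RULES:
          m = re.search(pattern, utterance, re.IGNORECASE)
          if m:
              return template.format(*m.groups())
      return "Please go on."

  print(respond("I am sad about my exam"))   # Why do you say you are sad about my exam?
  print(respond("my mother called today"))   # Tell me about your family.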
Looking back on it all, 27 years later, the whole debate seems wrong-headed. Of course ELIZA was the wrong model for intelligence- no internal model that is refined during interaction, no background knowledge, no inference, no set of beliefs/desires/goals, etc.
Searle’s argument amounts to "not enough- do more". And we did. Searle’s kind of rhetoric (that AI will never work) fails in the face of AI’s many successes.
Here’s some on-line resources on the topic:
And here’s some more general links:
There is some mathematical support for Searle's pessimism. In 1930, the philosophical world was shaken to its foundations by a mathematical paper that proved:
That is, formal systems have fundamental limits.
So Godel’s theorem gives us an absolute limit to what can be achieved by "formal systems" (i.e. the kinds of things we can write with a LISP program).
Godel’s theorem might be used to argue against the "logical school" of AI. If formal logics are so limited, then maybe we should ignore them and try other procedural / functional representations instead:
BTW, I do not regard Godel’s theorem as a limit to current research. I don’t know the length of my 1000th hair above my right ear but I can still buy a house, write programs, balance my check book, etc. So Godel’s theorem does not make me want to junk my LISP compiler and go off into procedural neural net land.
Godel’s theorem is somewhat arcane. He showed that some things were unknowable but he did not say what those things are.
Enter Steve Cook. In 1971, he showed that commonly studied problems (e.g. boolean satisfiability) belong to a class of problems (now called NP-complete) for which no known complete algorithm runs in less than exponential time in the worst case.
An army of algorithms researchers have followed Cook's lead and now there are vast catalogues of commonly studied problems for which there is no known solution that is both fast (less than exponential time) and complete (no heuristic search).
BTW, Cook's result means that, for a wide range of problems, we know of no solution that is both complete and fast. But neither do people. And (using stochastic search) we can get pretty good solutions pretty fast, even for pretty big problems.
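To illustrate the "pretty good, pretty fast" point, here is a minimal sketch of stochastic local search for boolean satisfiability (a bare-bones random-walk variant in the WalkSAT family; the three-clause formula at the bottom is a made-up example):

  # Minimal stochastic local search for SAT (random-walk, WalkSAT-flavoured).
  # A clause is a list of literals: positive n means variable n, negative n its negation.
  import random

  def satisfied(clause, assignment):
      return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

  def local_search(clauses, n_vars, max_flips=10_000):
      assignment = {v: random.choice([True, False]) for v in range(1, n_vars + 1)}
      for _ in range(max_flips):
          unsat = [c for c in clauses if not satisfied(c, assignment)]
          if not unsat:
              return assignment                  # found a satisfying assignment
          clause = random.choice(unsat)          # pick a broken clause...
          var = abs(random.choice(clause))       # ...and flip one of its variables
          assignment[var] = not assignment[var]
      return None                                # gave up; answer unknown

  # A small made-up formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
  clauses = [[1, -2], [2, 3], [-1, -3]]
  print(local_search(clauses, n_vars=3))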
O.K., so formal systems can never be omniscient, but how good do you have to be to be "as smart as humans"?
The answer is, sometimes, not very smart at all. The cognitive psychology literature is full of examples where humans repeatedly reason in characteristic sub-optimal ways (see the wonderful Wikipedia page listing 35 decision-making biases and 28 biases in probability and belief and 20 social biases and 7 memory errors).
In fact, one of the early successes of AI was replicating not just some human cognitive skills, but also human cognitive failings. In the 1970s, AI researchers adopted the physical symbol system hypothesis:
A physical symbol system has the necessary and sufficient means of general intelligent action.
Here, by physical symbol system they mean
the basic processes that a computer can perform with symbols are to input them into memory, combine and reorganize them into symbol structures, store such structures over time, ... compare pairs of symbols for equality or inequality, and "branch" (behave conditionally on the outcome of such tests)
(Note that tacit in this hypothesis is Chomsky’s language of thought and the notion that computers can think like people if they can push around symbols, just like the brain.) Rule-based programs designed around this hypothesis could replicate not just feats of human comprehension, but also human inadequacies in the face of (e.g.) limited short term memory or immature long-term memory (see Expert and Novice Performance in Solving Physics Problems, Science, 1980 Jun 20;208(4450):1335-1342 Larkin J, McDermott J, Simon DP, Simon HA).
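Those basic operations are nothing exotic; any programming language has them. A tiny illustration of my own (the facts are invented):

  # The basic symbol operations named above, spelled out in a few lines.
  memory = {}                                  # store symbol structures over time
  memory["fact1"] = ("bird", "tweety")         # input symbols, combine them into structures
  memory["fact2"] = ("bird", "polly")

  for name, (kind, thing) in memory.items():   # scan and reorganize the structures
      if kind == "bird":                       # compare symbols and branch on the result
          print(thing, "is classified as a bird")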
The AI work did not come in isolation. The physical symbol system hypothesis, for example, owed much to decades of psychological research. In particular, the cognitive psychology research that evolved as a reaction to behaviorism (from the early part of the 20th century). In its most extreme view, behaviorism denied all internal states and allowed for only the objective study of externally observable behavior (i.e. no mental life, no internal states; thought is covert speech).
Well, that flew for a few decades then it just ran out of steam. After decades of trying to map human behavior into (what seems now) trite stimulus response models, cognitive psychology made the obvious remark that the same input yields different outputs from different people because of their internal models. That is, intelligence is not just a reaction to the world. Rather, it is the careful construction and constant review of a set of internal states of belief which we use to decide how to best act next.
(Reference: Science, 20 June 1980, Vol. 208. no. 4450, pp. 1335 - 1342 "Expert and Novice Performance in Solving Physics Problems", Jill Larkin, John McDermott, Dorothea P. Simon, and Herbert A. Simon. )
Much of the early AI research was informed by a cognitive psychology model of human expertise. The model has two parts:
Reasoning was, according to this model, a match-act cycle (a minimal code sketch appears after the Q&A below):
In this model
Q: How do experts know what is relevant?
A: Feature extractors- gizmos that experts learn which let them glance at a situation and extract the salient details. E.g. chess experts can reproduce from memory all the positions on a chess board. But if you show the same experts a gibberish game (one where all the rules are broken- e.g. white pawns on white's back row) then they can't reproduce the board. Why? Well, when they glance at a board, their feature extractors fire to offer them a succinct summary. Gibberish boards baffle the feature extractors, so no summary is produced.
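Here is the promised sketch of a match-act cycle: a few productions fire when their conditions match working memory, adding new facts until nothing changes (the rules and facts are invented for illustration; the condition sets play the role of very crude feature extractors).

  # A tiny match-act (production system) cycle over a working memory of facts.
  RULES = [
      ({"white pawn on e4", "black pawn on d5"}, "a capture is possible"),
      ({"a capture is possible"},                "consider capturing"),
  ]

  def match_act(working_memory):
      fired = True
      while fired:
          fired = False
          for condition, action in RULES:                    # MATCH: find applicable rules
              if condition <= working_memory and action not in working_memory:
                  working_memory.add(action)                 # ACT: assert the conclusion
                  fired = True
      return working_memory

  print(match_act({"white pawn on e4", "black pawn on d5"}))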
We study AI to learn useful tricks. Sometimes, we also learn something about people. Here's a radically different, and successful, alternative to the above. What does it tell us about human cognition?
Ripple-down rules is a maintenance technique for rule-based programs initially developed by Compton. Ripple-down rules are best understood by comparison with standard rule-based programming. In standard rule-based programming, each rule takes the form
rule ID IF condition THEN action
where condition is some test on the current case. In the 1970s and early 1980s rule-based systems were hailed as the cure to the ills of software and knowledge engineering. Rules are useful, it was claimed, since they: represent high-level logic of the system expressed in a simple form that can be rapidly modified.
However, as these systems grew in size, it became apparent that rules were surprisingly hard to manage. One reason for this was that, in standard rule-based programming, all rules exist in one large global space where any rule can trigger any other rule. Such an open architecture is very flexible. However, it also makes it difficult to prevent unexpected side-effects after editing a rule. Consequently, the same rule-based programs hailed in the early 1980s (e.g. XCON) became case studies in how hard it can be to maintain rule-based systems.
Many researchers argued that rule authors must work in precisely constrained environments, lest their edits lead to spaghetti knowledge and a maintenance nightmare. One such constrained environment is ripple-down rules, which adds an EXCEPT slot to each rule:
rule ID1 IF condition THEN conclusion EXCEPT rule ID2 BECAUSE EXAMPLE
Here, ID1 and ID2 are unique identifiers for different rules; EXAMPLE is the case that prompted the creation of the rule (internally, EXAMPLEs are conjunctions of features); and the condition is some subset of the EXAMPLE, i.e., condition ⊆ EXAMPLE (the method for selecting that subset is discussed below). Rules and their exceptions form a ripple-down rule tree:
At run time, a ripple-down rule interpreter explores the rules and their exceptions. If the condition of rule ID1 is found to be true, the interpreter checks the rule referenced in the except slot. If ID2’s rule condition is false, then the interpreter returns the conclusion of ID1. Else, the interpreter recurses into rule ID2.
(Note the unbalanced nature of the tree: most patches are shallow. This is a common feature of RDRs.)
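The interpreter just described is only a few lines of code. Here is a minimal sketch (the dictionary representation and the toy rules are mine; the BECAUSE/EXAMPLE slot is omitted for brevity):

  # A minimal ripple-down-rules interpreter, following the description above.
  def evaluate(rule, case):
      """Return the rule's conclusion for this case, honouring exceptions; None if it does not apply."""
      if not rule["condition"](case):
          return None                          # this rule does not apply to the case
      exception = rule.get("except")
      if exception is not None:
          answer = evaluate(exception, case)   # recurse into the exception rule
          if answer is not None:
              return answer                    # the exception overrides this rule
      return rule["conclusion"]

  # Toy example: "birds fly, except penguins".
  penguin_rule = {"condition": lambda c: c.get("penguin", False), "conclusion": "cannot fly"}
  bird_rule = {"condition": lambda c: c.get("bird", False), "conclusion": "can fly",
               "except": penguin_rule}

  print(evaluate(bird_rule, {"bird": True}))                   # can fly
  print(evaluate(bird_rule, {"bird": True, "penguin": True}))  # cannot fly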
Ripple-down rules can be easier to read than normal rules:
But in practice, Compton advises hiding the tree behind a difference-list editor. Such an editor constrains rule authoring as follows:
condition2 ⊆ EXAMPLE2 - EXAMPLE1
condition2 ⊆ EXAMPLE2
When users work in a difference-list editor, they watch EXAMPLEs running over the ripple-down rules tree, intervening only when they believe that the wrong conclusion is generated. At that point, the editor generates a list of features to add to condition2 (this list is generated automatically from the above constraints). The expert picks some items from this list and the patch rule is automatically added to some leaf of the ripple-down rules tree.
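As a rough sketch of how that candidate list can be generated (treating each EXAMPLE as a set of features, per the note above; this is my simplification of the idea, not Compton's actual editor):

  # Rough sketch of difference-list generation, treating cases as sets of features.
  def difference_list(cornerstone_example, new_example):
      # candidate features: those in the new case but not in the old rule's cornerstone case
      return new_example - cornerstone_example

  example1 = {"fever", "cough"}                    # cornerstone case of the rule being patched
  example2 = {"fever", "cough", "rash"}            # new case that received the wrong conclusion
  print(difference_list(example1, example2))       # {'rash'}: the expert picks condition2 from these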
Ripple-down rules offer a very rapid rule maintenance environment. Compton et al. report average rule edit times of between 30 and 120 seconds for rule bases of up to 1000 rules in size [Compton 2005].
AFAIK, ripple-down-rules are the current high-water mark in knowledge maintenance.
Q: what does this tell us about human knowledge?