This course is run at The University of Nottingham within the School of Computer Science & IT, by Graham Kendall (EMAIL : firstname.lastname@example.org).
This page gives a brief history of AI. In addition, you should take a look at the reading list for this part of the course.
Although this part of the course is titled History of AI, it is only possible to cover some of the main milestones within AI. However, you should also read the reading list texts as these give a fuller picture (for example, they discuss the reasons for the rise and decline of neural network research - something we will not discuss until we reach that part of the course).
The approach I have adopted is to consider key moments (in my opinion) which have shaped our perception of Artificial Intelligence (AI). We also consider a couple of the philosophical debates within the AI arena.
Before you read on though, take a look at this PowerPoint presentation (see notes on PowerPoint presentations within this course - this is the big one), which gives us some insight as to why we need AI techniques, especially when we are searching through large search spaces.
Before we start though, we need to define what is meant by AI or, more importantly, we need a working definition for this course.
We could debate what AI is over the course of one or more lectures but at the end of the debate we would probably not have a definition that we would all agree upon. Therefore I will work towards a definition that we will use for this course. Of course, you can disagree with that definition but we need something on which to build.
In AIMA (Russell and Norvig, 1995), four definitions are given, which are taken from recent AI textbooks (see! - even textbooks cannot agree). These definitions are reproduced here.
Systems that think like humans
Systems that think rationally
Systems that act like humans
Systems that act rationally
We can make the following observations about these definitions.
In the past, all four approaches have been followed and, as you might expect, there exists tension between the top and the bottom definitions. As we shall see in a moment some of the AI arguments put forward have been trying to persuade others to accept the top or bottom half as the correct way to view AI.
The Turing Test (which we look at later) was designed to provide a satisfactory operational definition of intelligence. If you believe that a computer that can pass the Turing Test is intelligent, then you believe that the bottom-left definition (systems that act like humans) is correct.
You may believe that for a computer to exhibit intelligence it must think like a human (top left). But how do humans think?
There are two possible ways we could find this out. The first is to get "inside the mind" and capture thought processes as they occur. At present this is not possible, but who knows what the future holds.
A second approach is to conduct psychological experiments.
If we use this definition of AI as our working model then we are straying into cognitive science. Not a bad thing to do but for the purposes of this course we are going to avoid that.
If you believe that a computer can think rationally then you are looking at the top-right definition. This approach is driven by the laws of logic. For example, given two statements (say "Socrates is a man" and "All men are mortal") we can infer further knowledge ("Socrates is mortal").
Using logic, we should be able to build systems that allow us to take a description of a problem (in logical notation) and find the answer to that problem.
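The kind of inference in the Socrates example can be sketched in a few lines of code. This is an illustrative toy, not a real logic engine: the tuple representation of facts and the one-variable "All X are Y" rule format are assumptions made purely for brevity.

```python
# Toy forward inference over "All X are Y" rules, using the Socrates
# example from the text. Facts and rules are plain tuples; this is an
# illustrative sketch, not a real theorem prover.

facts = {("man", "Socrates")}
rules = [("man", "mortal")]  # "All men are mortal"

def infer(facts, rules):
    """Repeatedly apply every rule to every matching fact until
    no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, subject in list(derived):
                if pred == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived

print(infer(facts, rules))  # the derived facts include ("mortal", "Socrates")
```

Even this toy shows the second problem mentioned below: the loop re-examines every fact on every pass, which quickly becomes expensive as the number of facts grows.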
There are two main problems with this approach.
First, it is difficult to take a problem that is presented informally and transform it into the formal terms required by logical notation. This is particularly true when the knowledge you are representing is less than one hundred percent certain.
Secondly, solving even small problems (those with only a few initial facts) can be computationally expensive. So much so that it may become impractical to solve the problem.
Acting rationally (bottom right) means acting so as to achieve one's goals,
given one's beliefs.
Acting rationally could be considered as acting logically but, as we discussed above, this has limitations. Sometimes it is also not wise simply to act logically. In these instances something needs to be done, and almost anything is better than doing nothing. Consider what happens if you put your hand on an electric ring (one that is glowing red hot). Logically, you might decide the best thing to do is remove the source of the heat. This will eventually remove the burning sensation, but at what cost?
In this instance you need a reflex action (i.e. move your hand as quickly as possible) - an action that does not require complicated thought processes - do something - almost anything.
During this course we will be taking the "acting rationally" approach. We will not be a slave to it. If you want to develop this theme further then the AIMA book is based around this definition and the development of agents that act rationally. We will simply be using this definition so that we can ignore the cognitive side of AI (except in this introductory lecture) and also so that we do not have to assume that AI is only acting logically.
If you want some other definitions to whet the AI appetite then consider these.
"Artificial Intelligence is the part of computer science concerned with
designing computer systems, that is, systems that exhibit the characteristics
we associate with intelligence in human behavior - understanding, language,
learning, reasoning, solving problems and so on"
(Barr and Feigenbaum, Handbook of AI)
"Artificial Intelligence is the study of how to make computers do things
at which, at the moment, people are better"
(Rich and Knight)
"AI is the study of mental faculties through the use of computational models"
(Charniak and McDermott)
"AI may be defined as the branch of computer science that is concerned
with the automation of intelligent behavior"
(Luger and Stubblefield)
In this section we pick out a few key aspects of AI history and some of the techniques that the pioneers used. This will give us some appreciation of the way this branch of computer science has evolved and also introduce some of the leading figures in the field.
Newell and Simon (1976) attempted to define intelligence in such a way that it could be replicated on a computer. Much of the work done in AI is based upon their definition. They define a physical symbol system as
A physical symbol system consists of a set of entities, called symbols, which are
physical patterns that can occur as components of another type of entity called
an expression (or symbol structure). Thus, a symbol structure is composed of
a number of instances (or tokens) of symbols related in some physical way (such
as one token being next to another). At any instant of time the system will
contain a collection of these symbol structures.
Besides these structures, the system also contains a collection of processes that operate on expressions to produce other expressions; processes of creation, modification, reproduction and destruction.
A physical symbol system is a machine that produces through time an evolving collection of symbol structures. Such a system exists in a world of objects wider than just these symbolic expressions themselves.
Newell and Simon then state
The Physical Symbol System Hypothesis (PSSH). A physical symbol system has the necessary and sufficient means for general intelligent action.
The PSSH is only a hypothesis. There does not seem to be any way to prove or disprove it using logic. Therefore, we are faced with using empirical evidence, and computers allow us to gather this relatively easily.
There is a separate page for ELIZA.
MYCIN (Shortliffe, 1976) is an AI program that diagnoses infectious diseases.
It is considered one of the first expert systems, as it attempted to
replicate a doctor's knowledge in a computer program.
MYCIN uses backward reasoning (see below) to work backwards from the patient's illness to the possible cause.
To do this it uses rules of the form
"If the organism has a certain characteristic as determined by the lab results, then it is likely that the organism is x."
But, as you can see from the above statement, MYCIN uses the idea of attaching a degree of confidence to its diagnoses. This is called a certainty factor and it is an indication of how much confidence we have in the conclusion of the rule given the evidence that was presented. There are other complications, in that certainty factors from several rules may have to be combined to give an overall certainty factor.
A typical MYCIN rule looks like this
If: (1) the stain of the organism is gram-positive, and
(2) the morphology of the organism is coccus, and
(3) the growth conformation of the organism is clumps,
Then: there is suggestive evidence (0.7) that the identity of the organism is staphylococcus
These rules are obviously in an English-like syntax. The MYCIN program was written in LISP, and the above rule would be coded as follows
PREMISE: ($AND (SAME CNTXT GRAM GRAMPOS)
(SAME CNTXT MORPH COCCUS)
(SAME CNTXT CONFORM CLUMPS))
ACTION: (CONCLUDE CNTXT IDENT STAPHYLOCOCCUS TALLY 0.7)
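The combining of certainty factors mentioned above can be sketched as follows. This uses MYCIN's published rule for combining two positive certainty factors that support the same conclusion; the example values are invented for illustration.

```python
def combine_cf(cf1, cf2):
    """Combine two positive certainty factors (each in (0, 1]) that
    support the same conclusion, MYCIN style: the second piece of
    evidence closes part of the remaining gap to full certainty."""
    return cf1 + cf2 * (1 - cf1)

# Suppose two rules conclude "staphylococcus" with CFs 0.7 and 0.4
# (invented values). Combined, we are more confident than either
# rule alone, but still short of certainty (roughly 0.82).
print(combine_cf(0.7, 0.4))
```

Notice the combined value can never exceed 1 however many rules fire, which is one reason certainty factors behaved more sensibly in practice than naive probability arithmetic.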
With an expert system such as MYCIN it is difficult to measure its performance
exactly due to the inherent complexity of the problems and the "fuzziness"
that exists in the decision making process.
Probably the best measure is to test the system against human experts using real-world problems.
In 1982 MYCIN took part, with human medical experts, in an evaluation over ten selected meningitis cases. It scored higher than all the humans (Buchanan, 1982).
The development of MYCIN is important for several reasons.
XCON/R1 is one of the most cited expert systems. It was developed by DEC (Digital Equipment Corporation) and ensured that the customer was supplied with all the components and software needed to make up the computer system they had ordered.
This is not as easy as it sounds. Building a bespoke computer system means that every system is different and, as such, needs different components and software. For example, if a customer orders a disc drive then DEC has to ensure that the customer also receives the disc controller and the relevant cables.
As a single system can be built from thousands of separate components, it was not easy for a human to ensure that the customer had everything they needed to build the system.
XCON/R1 was an expert system that did this job automatically. It saved DEC millions of dollars a year and raised the profile of expert systems to a level that had not been seen before.
Forward and backward chaining are important concepts in AI. The two techniques are described here as you will often come across them in the AI literature. Although we will not explicitly use the terms in this course, they are still at work in the techniques we implement.
Assume you are trying to get from Nottingham University to Kendal in the Lake
District. You have two ways of approaching this problem. You can start at Kendal
(in the planning sense) and work your way back to Nottingham. This is known
as backward chaining as you are moving backwards from the goal position to the
initial position. It is also sometimes called a goal driven search.
Of course, in a problem such as this, you would normally forward chain (move from the initial state to the goal state) as it is more logical for you to do it that way. This is sometimes known as a data driven search.
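A data-driven (forward chaining) search like this can be sketched as a breadth-first exploration of rail links. The network below is invented for illustration - the real routes will differ - and backward chaining would simply run the same search over the reversed links, starting from Kendal.

```python
from collections import deque

# Hypothetical rail links for the Nottingham-to-Kendal example;
# this little network is invented for illustration only.
links = {
    "Nottingham": ["Derby", "Sheffield"],
    "Derby": ["Crewe"],
    "Sheffield": ["Leeds"],
    "Leeds": ["Carlisle"],
    "Crewe": ["Oxenholme"],
    "Oxenholme": ["Kendal"],
    "Carlisle": [],
    "Kendal": [],
}

def forward_chain(start, goal):
    """Data-driven search: expand outwards from the initial state,
    breadth-first, until the goal state is reached."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for town in links.get(path[-1], []):
            if town not in visited:
                visited.add(town)
                frontier.append(path + [town])
    return None  # goal unreachable from start

print(forward_chain("Nottingham", "Kendal"))
```

The search has no idea where Kendal is; it just fans out from Nottingham in all directions until it stumbles on the goal, which is exactly the "weak" behaviour discussed later.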
Which search method you use depends on the problem.
You use backward chaining when
You use forward chaining when
In the lecture we looked at a problem of trying to find out if you were a direct descendant of Isaac Newton. We decided that backward chaining was better for this problem given the (unrealistic) assumption that everybody has three children.
Like forward and backward chaining, we will not explicitly refer to strong and weak problem solving in this course. But we will be using these techniques and, as the terms appear in the AI literature, it is as well to describe what they are.
Weak problem solving is a technique where we simply search for an answer to a problem without using any domain knowledge about the problem.
Think again about the problem we described above where we need to get from Nottingham University to Kendal. We'll assume the start and end points of our journey are the railway stations. We could provide the program with a list of towns that have train stations along with a list of towns that it is possible to travel to and from using the train.
A weak problem solving method would simply start at Nottingham (assuming forward chaining) and explore the search space until one of the towns it reached was Kendal. In doing this we might have gone via Portsmouth but as the search has no knowledge of what it is trying to do this is perfectly acceptable.
A strong problem solving technique would have some domain knowledge. For example,
it might explore stations North of Nottingham first as the knowledge we give
the problem is that Kendal is North of Nottingham.
Or we might tell it that the straight line distance between the two towns is 180 miles and any route must not exceed that by a certain percentage.
The benefit of strong problem solving methods is that they are often quicker and can lead to better solutions. The downside is that they are problem specific. If we change our destination to London then we need to change the knowledge we supply, as London is South of Nottingham, not North.
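A strong version of the journey search might be sketched like this: it explores towns in order of an assumed straight-line distance to Kendal, so it tends to head north first. The mileages and rail links below are invented for illustration, not real data.

```python
import heapq

# Invented rail links and straight-line distances (miles) to Kendal.
links = {
    "Nottingham": ["Derby", "Sheffield"],
    "Derby": ["Crewe"],
    "Sheffield": ["Leeds"],
    "Leeds": ["Carlisle"],
    "Crewe": ["Oxenholme"],
    "Oxenholme": ["Kendal"],
    "Carlisle": [],
    "Kendal": [],
}
miles_to_kendal = {
    "Nottingham": 180, "Derby": 170, "Sheffield": 140, "Leeds": 110,
    "Crewe": 90, "Carlisle": 45, "Oxenholme": 10, "Kendal": 0,
}

def greedy_search(start, goal):
    """Greedy best-first search: always expand the town the domain
    knowledge (the distance estimate) says is closest to the goal."""
    frontier = [(miles_to_kendal[start], [start])]
    visited = {start}
    while frontier:
        _, path = heapq.heappop(frontier)
        if path[-1] == goal:
            return path
        for town in links.get(path[-1], []):
            if town not in visited:
                visited.add(town)
                heapq.heappush(frontier, (miles_to_kendal[town], path + [town]))
    return None

print(greedy_search("Nottingham", "Kendal"))
```

Note that the knowledge is only guidance: in this toy network the search is briefly lured up the Sheffield branch (a dead end here) before settling on the Crewe route, and the distance table itself would have to be rebuilt if we changed the destination - exactly the problem-specific fragility described above.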
GPS (Newell and Simon, 1963) was the first AI program to exploit an idea called Means-End Analysis (see [rich91, chapter 3.6, pages 94-97] and [caw98, chapter 4.3.2, pages 87-90]).
The idea behind Means-End Analysis is that it is often appropriate to use forward and backward chaining within the context of the same problem. Using such an approach we could solve separate parts of the problem and, eventually, bring all the parts together into one solution.
The main idea behind means-end analysis is that it identifies differences between the initial state and the goal state and then tries to find an operator that will reduce those differences. If a suitable operator cannot be applied directly, the problem is reduced to sub-problems, and these sub-problems are solved using the same idea.
You can appreciate that means-end analysis can be applied recursively.
Like many other AI techniques, GPS (and means-end analysis) uses a set of rules
but it also uses a difference table.
Consider the following problem. We have a room with a desk in it. The desk has two objects on top of it. We want to move the desk and the objects to another room. Before we can move the desk we need to move the two objects. We'll assume that a robot will do the work for us.
The operators available to the robot might be as follows.
|Operator||Preconditions||Results|
|PUSH(obj,loc)||At(robot,obj) AND Large(obj) AND Clear(obj) AND ArmEmpty||At(obj,loc) AND At(robot,loc)|
|CARRY(obj,loc)||At(robot,obj) AND Small(obj)||At(obj,loc) AND At(robot,loc)|
|PLACE(obj1,obj2)||At(robot,obj2) AND Holding(obj1)||On(obj1,obj2)|
The difference table, which links each difference to the operators that can reduce it, might (in part) be defined as follows
|Move Object||PUSH, CARRY|
|Get Object on Object||PLACE|
|Get Arm Empty||
|Be Holding Object||
Given the operators and the difference table we can now start (or the robot can) to solve the problem.
The main difference between the initial state and the goal state is the location of the desk. To reduce this difference we must move the desk and, looking at the difference table, we can see that we can either PUSH or CARRY the desk.
Assume we choose to carry the desk. We need to check that the pre-conditions of the operator are satisfied. There are two of these. One specifies that the robot is at the desk. The other specifies that the desk is small. We can take steps (pun intended) to ensure that the robot is at the desk but we cannot make the desk "small" (we are assuming it is classified as large) as there is no operator that allows us to reduce the size of the desk.
Therefore, we are forced to look at the PUSH operator.
So, our main goal is to PUSH the desk. But to do this we need to ensure we
satisfy the pre-conditions of the PUSH operator.
You can see that the initial problem has already led to a number of sub-problems and these in turn lead to other sub-problems. But as we resolve each problem we are reducing the overall difference for each problem, until we get ourselves in a position where we can solve the larger problem.
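The recursive reduce-the-difference idea in this walkthrough can be sketched as follows. The operators here are simplified, invented stand-ins for PUSH and the clearing steps (states are flat sets of fact strings, and there are no delete lists), so this is a toy illustration of means-end analysis rather than a faithful GPS implementation.

```python
# Toy means-end analysis: states and goals are sets of fact strings.
# The operators are simplified, hypothetical stand-ins for those above.
OPERATORS = [
    {"name": "WALK-TO-DESK", "pre": set(), "add": {"at(robot,desk)"}},
    {"name": "CLEAR-DESK", "pre": {"at(robot,desk)"}, "add": {"clear(desk)"}},
    {"name": "PUSH-DESK", "pre": {"at(robot,desk)", "clear(desk)"},
     "add": {"at(desk,room2)"}},
]

def mea(state, goal, depth=8):
    """Return (plan, resulting_state), or None if no plan is found.
    Pick an operator that reduces the difference between state and
    goal, recursively achieve its preconditions (the sub-problems),
    apply it, then recursively achieve whatever difference remains."""
    if goal <= state:              # no difference left to reduce
        return [], state
    if depth == 0:                 # guard against endless recursion
        return None
    for op in OPERATORS:
        if op["add"] & (goal - state):   # op reduces the difference
            sub = mea(state, op["pre"], depth - 1)
            if sub is None:
                continue
            sub_plan, sub_state = sub
            rest = mea(sub_state | op["add"], goal, depth - 1)
            if rest is None:
                continue
            rest_plan, final_state = rest
            return sub_plan + [op["name"]] + rest_plan, final_state
    return None

plan, _ = mea(set(), {"at(desk,room2)"})
print(plan)  # ['WALK-TO-DESK', 'CLEAR-DESK', 'PUSH-DESK']
```

The plan comes out in the same order as the walkthrough above: the robot must first be at the desk, then clear it, and only then can it push it, each precondition having been solved as a sub-problem.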
The basic means-end analysis algorithm presented here does have its limitations.
The Turing Test and The Chinese Room are both covered in separate pages.
Last Updated : 11 Sep 2001