Intelligent Essay Assessor

The Intelligent Essay Assessor (IEA) from Knowledge Analysis Technologies (KAT) is still one of the few commercially available automated essay-scoring products.

Knowledge Analysis Technologies' Intelligent Essay Assessor scores essay content using Latent Semantic Analysis (LSA) to identify semantic similarities between human-graded exemplars and submitted text. IEA stems from research by Landauer et al. (1998) and is currently KAT's cornerstone product. Because the process requires 1 GB or more of RAM, it is offered as a Web-based service that gives evaluation and advice on the conceptual content of submitted essays. Key features include relatively low unit cost, quick customised feedback, and plagiarism detection. Human/automated score correlations are quoted at 0.85 to 0.91.

The Technical Bit

LSA represents documents and their word contents in a large two-dimensional matrix, referred to as the semantic space. Using a matrix algebra technique known as Singular Value Decomposition (SVD), new relationships between words and documents are uncovered, and existing relationships are modified to represent their true significance more accurately.

The words and their contexts are represented by a matrix. Each word being considered for the analysis is represented as a row of a matrix, and the columns of the matrix represent the sentences, paragraphs, or other subdivisions of the contexts in which the words occur. The cells contain the frequencies of the words in each context.
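The word-by-context matrix described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_term_context_matrix`, the sample sentences, and the whitespace tokenisation are assumptions made for the example, and nothing here is taken from IEA itself.

```python
from collections import Counter


def build_term_context_matrix(contexts):
    """Return (vocabulary, matrix); matrix[i][j] is how often
    vocabulary[i] occurs in contexts[j]."""
    # Assumption: naive tokenisation (lowercase, split on whitespace).
    counts = [Counter(text.lower().split()) for text in contexts]
    # One row per distinct word, in alphabetical order.
    vocabulary = sorted(set().union(*counts))
    # One column per context; a Counter returns 0 for absent words.
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


# Hypothetical contexts (sentences), invented for this sketch.
contexts = [
    "the heart pumps blood",
    "blood carries oxygen",
    "the heart is a muscle",
]
vocabulary, matrix = build_term_context_matrix(contexts)

for word, row in zip(vocabulary, matrix):
    print(f"{word:8s} {row}")
```

Each printed row is one word and each position in the row is one context. LSA implementations commonly apply a weighting to these raw frequencies before the SVD; that step is omitted here.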

The SVD is then applied to the matrix. SVD breaks the original matrix into three component matrices that, when multiplied together, reproduce the original matrix. If only a reduced number of dimensions of these three matrices is kept, multiplying them back together gives a close approximation to the original matrix rather than an exact copy, and new relationships between words and contexts are induced in the process. These relationships, which were hidden or latent before the SVD, are thereby made manifest.
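In symbols, the decomposition and its reduced-dimension reconstruction can be written as follows. The notation is an assumption introduced here for illustration: X is the word-by-context matrix with m words and n contexts, r is its rank, and k is the number of dimensions retained.

```latex
% Full decomposition: U is m x r, \Sigma is r x r, V is n x r
X = U \Sigma V^{\mathsf{T}}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Reduced reconstruction: keep only the k largest singular values
X_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad k < r
```

Here U_k and V_k hold the first k columns of U and V, and Σ_k is the top-left k-by-k block of Σ. By the Eckart-Young theorem, X_k is the closest rank-k matrix to X in the Frobenius norm, which is the sense in which the reconstruction is a "close approximation".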

Previous Evaluation

Landauer et al. report that LSA has been tried with five scoring methods, each varying the manner in which student essays were compared with sample essays. The methods differed mainly in how the cosines between the relevant vectors were computed. For each method an LSA space was constructed from domain-specific material and the student essays. Foltz also reports that LSA grading performance is about as reliable as that of human graders (Foltz, 1996). Landauer reports another test on GMAT essays in which adjacent agreement with human graders was between 85% and 91% (Landauer, 1999).
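The five methods are not spelled out here, so the following is only a hypothetical sketch of one cosine-based approach: an essay is given the similarity-weighted average of the human scores of its k most similar pre-graded essays. The names `cosine` and `predict_score`, the value of k, and the toy vectors are all assumptions, and the sketch should not be read as IEA's actual procedure.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def predict_score(essay_vec, graded, k=2):
    """graded is a list of (vector, human_score) pairs.

    Returns the similarity-weighted mean score of the k most similar
    graded essays, or None if no graded essay has positive similarity.
    """
    sims = [(cosine(essay_vec, vec), score) for vec, score in graded]
    # Keep the k best matches, ignoring zero or negative similarities.
    top = [(s, g) for s, g in sorted(sims, reverse=True)[:k] if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return None
    return sum(s * g for s, g in top) / total


# Toy two-dimensional vectors standing in for essays in the LSA space.
graded = [([1.0, 0.0], 5), ([0.0, 1.0], 1), ([1.0, 1.0], 3)]
predicted = predict_score([1.0, 0.0], graded)
print(round(predicted, 2))
```

In a real system the vectors would be the essays' positions in the reduced LSA space rather than hand-written pairs of numbers.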

Online Demonstration

http://www.knowledge-technologies.com/IEAdemo/Heart.html

The user is given a choice of five essay topics:

  1. Biology: Function of Heart & Circulatory System (College Freshmen)
  2. Psychology 1: Attachment in Children (College Freshmen)
  3. Psychology 2: Types of Aphasia (College Freshmen)
  4. Psychology 3: Operant Conditioning (College Freshmen)
  5. History: The Great Depression (11th Grade High School)

The essay on biology was chosen for this test:

Please write down what you know about the human heart and circulatory system. Your essay should be approximately 250 words. We would like for you to be as specific as possible in discussing the anatomy, function, and purpose of the heart and circulatory system.

The user can then either write an essay in the space provided or choose one of three example essays.

These are the example essays and the scores they receive:

IEA Example 1

IEA Example 2

IEA Example 3

Results

It is not known how the IEA allocates marks, but in this case the essays are graded much as a human marker would probably grade them.

The first essay contains all the technical words but is not so well written. The second essay is average all round and the third essay is generally poor.

From this testing it appears that the IEA works, but it would be necessary to run it on a batch of longer, real-world essays to establish how closely IEA marking correlates with human marking.
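A batch comparison of that kind could be summarised with two simple statistics, sketched below: the Pearson correlation between the two sets of marks, and the proportion of essays on which the marks differ by no more than one point (adjacent agreement). The function names and the marks are made up purely to exercise the code; they are not results from IEA.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length mark lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all marks are equal")
    return cov / math.sqrt(var_x * var_y)


def adjacent_agreement(xs, ys, tolerance=1):
    """Fraction of essays whose two marks differ by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(xs, ys)) / len(xs)


# Invented marks on a 1-5 scale, for illustration only.
human_marks = [4, 3, 5, 2, 4]
machine_marks = [4, 4, 5, 1, 2]

r = pearson(human_marks, machine_marks)
agreement = adjacent_agreement(human_marks, machine_marks)
print(f"correlation: {r:.2f}, adjacent agreement: {agreement:.0%}")
```

The two figures answer different questions: correlation measures whether the marks rise and fall together, while adjacent agreement measures how often the marks are close in absolute terms.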