The Microsoft Research Sentence Completion Challenge

Geoffrey Zweig and Christopher J.C. Burges
Microsoft Research Technical Report MSR-TR-2011-129
December 8th, 2011

Abstract Work on modeling semantics in text is progressing quickly, yet currently there are few public datasets which authors can use to measure and compare their systems. This work takes a step towards addressing this issue. We present the MSR Sentence Completion Challenge Data, which consists of 1,040 sentences, each of which has four impostor sentences, in which a single (fixed) word in the original sentence has been replaced by an impostor word with similar occurrence statistics. For each sentence the task is then to determine which of the five choices for that word is the correct one. This dataset was constructed from Project Gutenberg data. Seed sentences were selected from five of Sir Arthur Conan Doyle’s Sherlock Holmes novels, and then impostor words were suggested with the aid of a language model trained on over 500 19th-century novels. The language model was used to compute 30 alternative words for a given low-frequency word in a sentence, and human judges then picked the 4 best impostor words, based on a set of provided guidelines. Although the data presented here will not be changed, this is still a work in progress, and we plan to add similar datasets based on other sources. This technical report is a living document and will be updated appropriately as new datasets are constructed and new results on existing datasets (for example, using human subjects) are reported.

1 Introduction Interest in semantic modeling for text is growing rapidly (see for example [1, 2, 3, 4]). However, currently there are few publicly available large datasets with which researchers can compare results, and those that are available focus on isolated word pairs. For example, WordSimilarity-353 [5] consists of 353 word pairs whose degree of similarity has been determined by human judges. In [6], the authors make available a test set consisting of 950 questions in which the goal is to find the word that is most opposite in meaning to another.

Geoffrey Zweig and Christopher J.C. Burges Microsoft Research, Redmond, WA., e-mail: {gzweig, chris.burges}@microsoft.com


Authors Suppressed Due to Excessive Length

As a step towards addressing this problem, we present a set of 1,040 English sentences, taken from five novels written by Sir Arthur Conan Doyle. Each sentence has associated with it four impostor sentences, in which a single (fixed) word in the original sentence has been replaced by an impostor word with similar occurrence statistics. For each sentence the task is then to determine which of the five choices for that word is the correct one. The task is thus similar to a language SAT test. Our dataset was constructed from 19th century novel data from Project Gutenberg. We chose to use this source because of the high quality of the English, and also to avoid any copyright issues. We chose to use a single author (Conan Doyle) for the target sentences to give a consistent style of writing. We plan to construct similar datasets in the future to help explore other axes (multiple authors, and modern English, such as is typical in Wikipedia). Our data can be found at http://research.microsoft.com/scc/.
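The task format described above can be made concrete with a small evaluation harness. This is only a minimal sketch: the `Question` class, its field names, and the `_____` blank marker are hypothetical choices for illustration and do not describe the layout of the distributed data files. Any scoring function that maps a full sentence to a number can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Question:
    """One sentence-completion item (hypothetical layout, for illustration)."""
    template: str        # sentence with "_____" marking the target position
    options: List[str]   # five candidate words: one correct, four impostors
    answer: int          # index of the correct word in `options`

    def candidates(self) -> List[str]:
        """Return the five full sentences obtained by filling the blank."""
        return [self.template.replace("_____", w, 1) for w in self.options]


def accuracy(questions: Sequence[Question],
             score: Callable[[str], float]) -> float:
    """Fraction of questions whose highest-scoring candidate is the correct one."""
    correct = 0
    for q in questions:
        scores = [score(s) for s in q.candidates()]
        if scores.index(max(scores)) == q.answer:
            correct += 1
    return correct / len(questions)


# Example item taken from the guidelines given later in this report.
q = Question(
    template="Was she his _____ , his friend , or his mistress ?",
    options=["client", "musings", "discomfiture", "choice", "opportunity"],
    answer=0,
)
```

With five options per question, a scorer that picks uniformly at random is expected to reach 20% under this measure, which is the chance level quoted in the baseline results.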

2 The Question Generation Process Question generation was done in two steps. First, a candidate sentence containing an infrequent word was selected, and alternates for that word were automatically determined by sampling with an n-gram language model. The n-gram model used the immediate history as context, thus resulting in words that may “look good” locally, but for which there is no a priori reason to expect them to make sense globally. In the second step, we eliminated choices which are obviously incorrect because they constitute grammatical errors. Choices requiring semantic knowledge and logical inference were preferred, as described in the guidelines, which we give in Section 2.3. Note that an important desideratum guiding the data generation process was that a researcher who knows exactly how the data was created, including which data was used to train the language model, should nevertheless not be able to use that information to solve the problem. We now describe the data that was used, and then describe the two steps in more detail.

2.1 Data Used Seed sentences were selected from five of Conan Doyle’s Sherlock Holmes novels: The Sign of the Four (1890), The Hound of the Baskervilles (1902), The Adventures of Sherlock Holmes (1892), The Memoirs of Sherlock Holmes (1894), and The Valley of Fear (1915). Once a focus word within the sentence was selected, alternates to that word were generated using an n-gram language model. This model was trained on approximately 540 texts from the Project Gutenberg collection, consisting mainly of 19th-century novels. A full list is provided in the Appendix.


2.2 Automatically Generating Alternates Alternates were generated for every sentence containing an infrequent word. A state-of-the-art class-based maximum entropy n-gram model [7] was used to generate the alternates. The following procedure was used:

1. Select a word with overall frequency less than 10^-4. For example, we might select “extraordinary” in “It is really the most extraordinary and inexplicable business.”
2. Use the two-word history immediately preceding the selected focus word to predict alternates. We sampled 150 unique alternates at this stage, requiring that they all have frequency less than 10^-4. For example, “the most” predicts “handsome” and “luminous.”
3. If the original sentence (the one containing the target word) has the lowest score of the 150 candidates, with scores computed as in step 4, reject the sentence.
4. Otherwise, score each option according to how well it and its immediate predecessor predict the next word. For example, the probability of “and” following “most handsome” might be 0.012.
5. Sort the predicted words according to this score, and retain the top 30 options.

Rejecting the original sentence when it attains the lowest score of the 150 candidates helps remove possible bias: since the impostors are chosen by score, it is possible that, on average, they will have a higher score than the target word. We checked that the remaining bias was small by performing a test using the language model, but choosing the lowest-scoring candidate as the answer. This gave an accuracy of 26% (as opposed to 31%, found by taking the highest-scoring candidate). Thus, although there is some remaining bias for the answer to be low scoring, it is small. When a language model other than the precise one used to generate the data is used, the score reversal test yields 17% correct (the correct polarity gives 39%). The overall procedure has the effect of providing options which are both well predicted by the immediate history and predictive of the immediate future. However, in total it uses just four consecutive words, and cannot be expected to provide globally coherent alternates.
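The sampling and scoring steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure's actual implementation: the report uses a class-based maximum entropy model [7], whereas the sketch swaps in unsmoothed trigram counts, and the function names (`train_trigrams`, `cond_prob`, `generate_alternates`) are hypothetical. Step 1 (choosing the focus word) is assumed to have been done by the caller, the strict comparison used for the rejection test is one possible reading of step 3, and returning `None` when no alternate can be sampled is an added convention.

```python
import random
from collections import Counter, defaultdict


def train_trigrams(tokens):
    """Collect unigram counts and trigram continuation counts from a token list."""
    unigrams = Counter(tokens)
    trigrams = defaultdict(Counter)  # (w1, w2) -> Counter of following words
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(a, b)][c] += 1
    return unigrams, trigrams


def cond_prob(trigrams, history, word):
    """Unsmoothed maximum-likelihood estimate of P(word | history)."""
    following = trigrams.get(history)
    if not following:
        return 0.0
    return following[word] / sum(following.values())


def generate_alternates(sentence, focus, unigrams, trigrams,
                        max_freq=1e-4, n_samples=150, n_keep=30, seed=0):
    """Return up to n_keep impostor words for sentence[focus], or None if rejected.

    Assumes 2 <= focus < len(sentence) - 1 so that a two-word history and a
    following word both exist.
    """
    rng = random.Random(seed)
    total = sum(unigrams.values())
    target = sentence[focus]
    history = (sentence[focus - 2], sentence[focus - 1])
    next_word = sentence[focus + 1]

    # Step 2: sample unique low-frequency alternates from the two-word history
    # (weighted sampling without replacement).
    following = trigrams.get(history, Counter())
    remaining = {w: c for w, c in following.items()
                 if w != target and unigrams[w] / total < max_freq}
    sampled = []
    while remaining and len(sampled) < n_samples:
        words = list(remaining)
        pick = rng.choices(words, weights=[remaining[w] for w in words])[0]
        sampled.append(pick)
        del remaining[pick]
    if not sampled:
        return None

    # Step 4 score: how well (predecessor, option) predicts the next word.
    def score(word):
        return cond_prob(trigrams, (sentence[focus - 1], word), next_word)

    # Step 3: reject the sentence if the true word scores below every alternate.
    if score(target) < min(score(w) for w in sampled):
        return None

    # Step 5: keep the best-scoring alternates.
    return sorted(sampled, key=score, reverse=True)[:n_keep]
```

On a realistic corpus the default `max_freq=1e-4` matches the threshold in the text; on a toy corpus it has to be raised for any word to qualify as infrequent.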

2.3 Human Grooming The human judges (who picked the best four choices of impostor sentences from the automatically generated list of thirty) were given the following instructions:

1. All chosen sentences should be grammatically correct. For example, “He dances while he ate his pipe” would be illegal.
2. Each correct answer should be unambiguous. In other words, the correct answer should always be a significantly better fit for that sentence than each of the four impostors; it should be possible to write down an explanation as to why the correct answer is the correct answer, that would persuade most reasonable people.
3. Sentences that might cause offense or controversy should be avoided.
4. Ideally the alternatives will require some thought in order to determine the correct answer. For example:
   • Was she his [ client | musings | discomfiture | choice | opportunity ] , his friend , or his mistress?
   would constitute a good test sentence. In order to arrive at the correct answer, the student must notice that, while “musings” and “discomfiture” are both clearly wrong, the terms “friend” and “mistress” both describe people, which therefore makes “client” a more likely choice than “choice” or “opportunity”.
5. Alternatives that require understanding properties of entities that are mentioned in the sentence are desirable. For example:
   • All red-headed men who are above the age of [ 800 | seven | twenty-one | 1,200 | 60,000 ] years , are eligible.
   requires that the student realize that a man cannot be seven years old, or 800 or more. However, such examples are rare: most often, arriving at the answer will still require thought, but will not require detailed entity knowledge, such as:
   • That is his [ generous | mother’s | successful | favorite | main ] fault , but on the whole he’s a good worker.
6. We encourage the use of a dictionary, if necessary.
7. A given sentence should only occur once. If more than one target word has been identified for a sentence (i.e. different targets have been identified, in different positions), choose the set of sentences that generates the best challenge, according to the above guidelines.

Note that the impostors sometimes constitute a perfectly fine completion, but that in those cases, the correct completion is still clearly identifiable as the most likely completion.

3 Guidelines for Use It is important for users of this data to realize the following: since the test data was taken from five 19th-century novels, the test data itself is likely to occur in the index of most Web search engines, and in other large-scale datasets that were constructed from web data (for example, the Google N-gram project). For example, entering the string “That is his fault , but on the whole he’s a good worker” (one of the sentence examples given above, but with the target word removed) into the Bing search engine results in the correct (full) sentence at the top position. It is important to realize that researchers may inadvertently get better results than truly warranted because they have used data that is thus tainted by the test set. To help prevent any such criticism from being leveled at a particular publication, we recommend that in any set of published results, the exact data used for training and validation be specified.
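One way to screen a training corpus for the kind of tainting described above is to look for long word sequences shared with the test sentences. The sketch below is an assumption-laden illustration rather than a recommended protocol: the function names and the choice of a 6-word window are hypothetical, and a match only signals that a closer look at the data is warranted.

```python
def ngram_set(tokens, n):
    """Return the set of all n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_overlaps(test_sentences, training_tokens, n=6):
    """Return the test sentences that share at least one n-gram with the training text."""
    train = ngram_set(training_tokens, n)
    flagged = []
    for sentence in test_sentences:
        if ngram_set(sentence.split(), n) & train:
            flagged.append(sentence)
    return flagged
```

A shorter window flags more sentences but also more coincidental matches, so the window length is a trade-off that each user would need to set for their own data.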


4 Baseline Results 4.1 A Simple 4-gram model As a sanity check we constructed a very simple N-gram model as follows: given a test sentence (with the position of the target word known), the score for that sentence was initialized to zero, and then incremented by one for each bigram match, by two for each trigram match, and by three for each 4-gram match, where a match means that the N-gram in the test sentence containing the target word occurs at least once in the background data. This simple method achieved 34% correct (compared to 20% by random choice) on the test set.
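The scoring rule just described can be written out directly. The following is a minimal sketch of that rule as stated in the text; the function names and the representation of the background data as a set of token tuples are assumptions made for illustration.

```python
def build_background(tokens, orders=(2, 3, 4)):
    """Collect every bigram, trigram and 4-gram of the background text in one set."""
    seen = set()
    for n in orders:
        for i in range(len(tokens) - n + 1):
            seen.add(tuple(tokens[i:i + n]))
    return seen


def simple_score(sentence, target_idx, background):
    """Score a candidate sentence: +1 per bigram match, +2 per trigram, +3 per 4-gram.

    Only N-grams that contain the word at target_idx are considered, and a match
    means the N-gram occurs at least once in the background data.
    """
    score = 0
    for n in (2, 3, 4):
        # Every window of length n that covers the target position.
        for start in range(target_idx - n + 1, target_idx + 1):
            if start < 0 or start + n > len(sentence):
                continue
            if tuple(sentence[start:start + n]) in background:
                score += n - 1
    return score
```

To answer a question, each of the five candidate sentences is scored and the highest-scoring one is chosen. Because the rule has no smoothing, candidates whose N-grams never occur in the background data all tie at zero.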

4.2 Smoothed N-gram model As a somewhat more sophisticated baseline, we use the CMU language modeling toolkit (see footnote 1) to build a 4-gram language model using Good-Turing smoothing. We kept all bigrams and trigrams occurring in the data, as well as four-grams occurring at least twice. We used a vocabulary of the 126k words that occurred five or more times, and this resulted in a total of 26M N-grams. This improved by 5% absolute on the simple baseline to achieve 39% correct.

4.3 Latent Semantic Analysis Similarity As a final benchmark, we present scores for a novel method based on latent semantic analysis. In this approach, we treated each sentence in the training data as a “document” and performed latent semantic analysis [8] to obtain a 300-dimensional vector representation of each word in the vocabulary. Denoting two words by their vectors x, y, their similarity is defined as the cosine of the angle between them:

$$\mathrm{sim}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}.$$

To decide which option to select, we computed the average similarity to every other word in the sentence, and then output the word with the greatest overall similarity. This results in our best baseline performance, at 49% correct.
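The selection rule can be sketched as follows, assuming the word vectors have already been obtained by latent semantic analysis. The function names and the `vectors` dictionary (word to list of floats) are hypothetical, and treating words without a vector as contributing nothing is an added convention that the report does not specify.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def average_similarity(option, context_words, vectors):
    """Mean cosine similarity between an option and the other words in the sentence."""
    if option not in vectors:
        return 0.0
    sims = [cosine(vectors[option], vectors[w])
            for w in context_words if w in vectors]
    return sum(sims) / len(sims) if sims else 0.0


def choose(options, context_words, vectors):
    """Return the option with the greatest average similarity to the context."""
    return max(options,
               key=lambda o: average_similarity(o, context_words, vectors))
```

Here `context_words` is the list of words in the sentence other than the blank, so the same context is used for all five options of a question.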

4.4 Benchmark Summary Table 1 summarizes our benchmark study. First, for reference, we had an unaffiliated human answer a random subset of 100 questions. Ninety-one percent were answered correctly, showing that scores in the range of 90% are reasonable to expect. Secondly, we tested the performance of the same model (Model M) that was used to generate the data. Because this model outputs alternates to which it assigns high probability, there is a bias against it, and it scored 31%. Smoothed 3- and 4-gram models built with the CMU toolkit achieved 36 to 39 percent. The simple 4-gram model described earlier did slightly worse (hampered by a lack of smoothing), and the LSA similarity model did best with 49%. As a further check on this data, we have run the same tests on 203 sentence completion questions from a practice SAT exam and achieve similar results (Princeton Review, 11 Practice Tests for the SAT & PSAT, 2011 Edition). To train language models for the SAT question task, we used 1.2 billion words of Los Angeles Times data taken from the years 1985 through 2002.

Method                    % Correct (N=1040)
Human                     91
Generating Model          31
Smoothed 3-gram           36
Smoothed 4-gram           39
Simple 4-gram             34
Average LSA Similarity    49

Table 1 Summary of Benchmarks

Footnote 1: http://www.speech.cs.cmu.edu/SLM/toolkit.html

These results indicate that the “Holmes” sentence completion set is indeed a challenging problem, with a level of difficulty roughly comparable to that of SAT questions. Simple models based on N-gram statistics do quite poorly, and even a relatively sophisticated semantic-coherence model struggles to beat the 50% mark.

5 Conclusions and Future Work We plan to add a similarly sized dataset based on Wikipedia, and also to present results found by asking human judges (who have only a non-electronic dictionary at hand) to perform the test. These human tests will be done in-house, since using M-Turk raises the problem that it is not clear how to construct the correct incentive: paying by the sentence alone will give poor accuracy, while paying by the correct sentence gives an incentive to taint the results, e.g. by using a search engine. The in-house testing will also enable us to provide additional statistics regarding the judges’ backgrounds, for example, their level of education, and whether or not they are native-born English speakers.

References
[1] Y. Bengio, R. Ducharme and P. Vincent. A Neural Probabilistic Language Model. Advances in Neural Information Processing Systems, 2001.
[2] R. Collobert and J. Weston and L. Bottou and M. Karlen and K. Kavukcuoglu and P.P. Kuksa. Natural Language Processing (almost) from Scratch. CoRR, http://arxiv.org/abs/1103.0398, 2011.
[3] R. Socher and J. Pennington and E.H. Huang and A.Y. Ng and C.D. Manning. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions. EMNLP, 2011.
[4] P. Blackburn and J. Bos. Representation and Inference for Natural Language. A First Course in Computational Semantics. CSLI Publications, 1999.
[5] Finkelstein, L. and Gabrilovich, Y.M. and Rivlin, E. and Solan, Z. and Wolfman, G. and Ruppin, E. Placing search in context: The concept revisited. ACM TOIS 20(1), 2002.
[6] Saif Mohammad, Bonnie Dorr, and Graeme Hirst. Computing Word-Pair Antonymy. EMNLP, 2008.
[7] Stanley Chen. Shrinking Exponential Language Models. HLT 2009.
[8] Deerwester, S. and Dumais, S.T. and Furnas, G.W. and Landauer, T.K. and Harshman, R. Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, Vol. 41, 1990.


Appendix: Full List of Training Data

Tom Swift And His Submarine Boat
Tom Swift And His Electric Runabout
The Patchwork Girl of Oz by Baum
Tik-Tok of Oz by Baum
Tom Swift And His Sky Racer
The Scarecrow of Oz by Baum
Stories from Everybody’s Magazine
Rinkitink In Oz by L. Frank Baum
The Lost Princess of Oz, by Baum
Tom Swift And His Air Glider
The Tin Woodman of Oz, by Baum
Glinda of Oz, by L. Frank Baum
Tom Swift And His Big Tunnel
First Book of Adam and Eve, by Platt
The Argonautica by Apollonius Rhodius
Adventures of Col. Daniel Boone
One Basket, by Edna Ferber
Volume 1: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
Democracy In America, Volume 1
Don Quixote by Miquel de Cervantes
Project Gutenberg Anthology #1,
Robin Hood by J. Walker McSpadden
20,000 Leagues Under the Sea
Tom Swift And His War Tank
Volume 2: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
Democracy In America, Volume 2
To-morrow, by Joseph Conrad
The Merry Adventures of Robin Hood
Twice Told Tales by Hawthorne
The Thirty-Nine Steps
Three Men in a Boat by Jerome K. Jerome
Volume 3: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
Three Elephant Power and Other Stories
Volume 4: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
Volume 5: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
Volume 6: The History of The Decline and Fall of the Roman Empire by Edward Gibbon
House of Seven Gables by Hawthorne
Around the World in 80 Days
A Child’s History of England
Acres of Diamonds, by Conwell
Adam Bede, by George Eliot
Aesop’s Fables
Amy Foster, by Joseph Conrad
The Secret Agent, by Joseph Conrad
The Age of Innocence by Wharton
Agnes Grey, by Anne Bronte
A Hero of Our Time, by Lermontov
Aladdin and the Lamp
Alice Adams, by Booth Tarkington
Remember the Alamo by Amelia E. Barr
Alexander’s Bridge
Alice in Wonderland

Allan Quatermain, by H. Rider Haggard
The Altar of the Dead, by James
The Amateur Cracksman by Hornung
The Ambassadors, by Henry James
American Notes, by Charles Dickens
American Notes by Rudyard Kipling
Under the Andes by Rex Stout
Anne of Green Gables
Anne’s House of Dreams
Ann Veronica, by H. G. Wells
The Autocrat of the Breakfast-Table
A Personal Record by Joseph Conrad
Arabian Nights, by Andrew Lang
The Aeroplane Speaks, by Barber
The Aspern Papers by Henry James
At the Earth’s Core by Burroughs
Anne of Avonlea
Awakening & Selected Short Stories
Where There’s A Will
Arizona Nights by Stewart Edward White
Bab:A Sub-Deb, Mary Roberts Rinehart
Before Adam by Jack London [badam10]
The Red Badge of Courage
The Battle of Life by Charles Dickens
Black Beauty, by Anna Sewell
Bayard Taylor’s Beauty and The Beast
Beyond the City, by A. Conan Doyle
The Boy Captives
The Brother of Daphne, by Dornford Yates
Burning Daylight, by Jack London
Beasts and Super-Beasts by Saki
The $30,000 Bequest, by Twain
The Burial of the Guns, by Page
In The Bishop’s Carriage
Barlaam and Ioasaph
The Blue Lagoon: A Romance
The Black Arrow, by Stevenson
Black Experience in America
The Blue Fairy Book
Blix, by Frank Norris
The Master of Ballantrae by R.L.S.
Baby Mine, by Margaret Mayo
Boyhood in Norway, by Hjalmar Boyesen
The Bride of Lammermoor, by Scott
"Buttered Side Down," by Ferber
Bunner Sisters by Edith Wharton
The Call of the Wild
A Christmas Carol.
The Cash Boy by Horatio Alger Jr.
Reprinted Pieces, by Charles Dickens
Travels with a Donkey in the Cevenne
Child Christopher by Morris
The Certain Hour, James Branch Cabell
Charlotte Temple by Susanna Rowson


Robin Hood by J. Walker McSpadden
The Circular Staircase
The Cook’s Decameron:
Clotelle; or The Colored Heroine
The Count’s Millions by Emile Gaboriau
Confidence by Henry James
The Coral Island, R. M. Ballantyne
The Castle of Otranto
David Copperfield, by Charles Dickens
Cranford, by Elizabeth Gaskell
Far From The Madding Crowd
Catriona, by R. L. Stevenson
Cast Upon the Breakers, by Alger
The Old Curiosity Shop
Daisy Miller, by Henry James
Desert Gold, by Zane Grey
Dorian Gray
The Duchesse de Langeais
The Death of the Lion, by James
The Island of Doctor Moreau
Doctor Dolittle by Hugh Lofting
Dombey and Son, by Dickens
Dorothy and the Wizard in Oz
De Profundis, by Oscar Wilde
Dracula, by Bram Stoker
Dream Days, by Kenneth Grahame
The Mystery of Edwin Drood
Driven From Home by Horatio Alger
Dsown’s Own Story by Don Marquis
Dust by Mr. And Mrs. Haldeman-Julius
Four Arthurian Romances by DeTroyes
Fredercik Douglass
The Damnation of Theron Ware by Frederic
DOWN WITH THE CITIES!
The Dynamiter by The Stevensons
At the Earth’s Core, by Burroughs
Edingburgh Picturesque Notes
Emma McChesney & Co. by Edna Ferber
The Emerald City of Oz
Emma, by Jane Austen
Enoch Soames, by Max Beerbohm
The Enchanted Island of Yew
The Tarn of Eternity, by Frank Tymon
Falk by Joseph Conrad
Fantastic Fables, by Ambrose Bierce
Fables, by Robert L. Stevenson
Flower Fables, by Louisa May Alcott
Five Children and It, by E. Nesbit
The Flirt, By Booth Tarkington
Fanny Herself, by Edna Ferber
Little Lord Fauntleroy
The Forged Coupon by Leo Tolstoy
Frankenstein by Mary Shelley
Frivolous Cupid, by Anthony Hope

Gene Stratton-Porter’s Freckles
The Fortune Hunter by Phillips
Father Sergius by Leo Tolstoy
The Insidious Dr. Fu Manchu
Frances Waldeaux, by Rebecca Davis
The Go Ahead Boys and Their Racing Motorboat by Ross Kay
The Quest of the Golden Girl
The Great God Pan, by Arthur Machen
Heroes, by Charles Kingsley
Ginx’s Baby, by Edward Jenkins
The Golden Age, by Grahame
Gulliver’s Travels by Jonathan Swift
Green Mansions, by W. H. Hudson
The Gods of Mars by Burroughs
Greenmantle, by John Buchan
Good Indian by B. M. Bower
The Golden Road by L. M. Montgomery
Tom Grogan by F. Hopkinson Smith
Grettir The Strong, author unknown
George Silverman’s Explanation
Gulliver of Mars by Edwin Arnold
Hard Times, by Charles Dickens
The Haunted Bookshop by Morley
Hans Brinker or The Silver Skates
Heart of Darkness, by Conrad
The University of Hard Knocks
Her Father’s Daughter, by Porter
Hell Fer Sartain & Other Stories
Huckleberry Finn by Twain/Clemens
The High History of the Holy Graal
The Haunted Hotel by Collins
The History and Practice of the Art of Photography
Holiday Romance, by Charles Dickens
The Holy War, by John Bunyan
Heimskringla, by Snorri Sturlson
Hunted Down, by Charles Dickens
Haunted Man/Ghost’s Bargain
Hunting Sketches by Trollope
The Happy Prince and Other Tales
A House of Pomegranates, by Wilde
Howard Pyle’s Book of Pirates
Charlotte Gilman’s Herland
Dr. Jekyll and Mr. Hyde
Anne of The Island
Old Indian Days by Charles Eastman
Indian Boyhood, by Charles Eastman
Indian Heroes and Great Chieftains
Old Indian Legends by Zitkala-Sa
The Soul of the Indian by Eastman
The Innocence of Father Brown
An International Episode
Indian Why Stories, by Linderman
Life in the Iron-Mills by Davis
Island Nights’ Entertainments


Ivanhoe, by Sir Walter Scott
A Dream of John Ball--A King’s Lesson
John Barleycorn, by Jack London
Evergreens, by Jerome K. Jerome
Idle Thoughts of an Idle Fellow (1886)
The Cost of Kindness
Mrs. Korner Sins Her Mercies
Passing of the Third Floor Back
The Philosopher’s Joke
The Soul of Nicholas Snyders
The Love of Ulrich Nebendahl
The Jungle Book by Kipling
Joe The Hotel Boy, by Horatio Alger Jr.
A Journal of the Plague Year
The Cruise of the Jasper B.
Jude the Obscure, by Thomas Hardy
The Jungle, by Upton Sinclair
Just David, by Eleanor H. Porter
A Knight of the Cumberland
Kidnapped by R. L. Stevenson
A Kidnapped Santa Claus
Stories To Tell Children
The King’s Jackal, by Davis
The Kreutzer Sonata
Laddie, by Gene Stratton Porter
The Lady, or the Tiger? by Stockton
A. V. Laider, by Max Beerbohm
The Little Lame Prince by Miss Mulock
The Life of Lazarillo of Tormes
Jean of the Lazy A, By B. M. Bower
The Lost Continent by
Lord Arthur Savile’s Crime, etc.
Little Dorrit, by Charles Dickens
Susan Lenox: Her Rise and Fall
Manon Lescaut by the Abbe Prevost
"Les Miserables"
The Life of Me
Life/Adventures of Santa Claus
Through the Looking-Glass
Looking Backward From 2000 to 1887
Love of Life And Other Stories
A Lady’s Life in the Rocky Mountains
Life on the Mississippi, by Mark Twain
The Lamplighter, by Charles Dickens
Almayer’s Folly by Joseph Conrad
The Land of Little Rain by Mary Austin
The Lost City by Joseph E. Badger Jr
Lorna Doone, A Romance of Exmoor
Lost Continent by C. J. Cutcliffe Hyne
The Lost World by Arthur Conan Doyle
The Lost Prince, by Burnett
A Little Princess by Burnett
Lady Susan, by Jane Austen
The Light Princess

Lazy Tour of Two Idle Apprentices
Lavengro, by George Borrow
Lemorne Versus Huell
Little Women by Louisa May Alcott
The Gift of the Magi.
The Magic of Oz, by L. Frank Baum
Maid Marian by Thomas Love Peacock
Malbone: An Oldport Romance
Mosses From An Old Manse #4 in our series by Nathaniel Hawthorne
Mansfield Park, by Austen
Maria, by Mary Wollstonecraft
The Market-Place by Harold Frederic
Margaret Ogilvy, by J. M. Barrie
The Mysterious Affair at Styles
The Mayor of Casterbridge
McTeague, by Frank Norris
Mudfog & Other Sketches, by Dickens
The Mad King by Edgar Rice Burroughs
Middlemarch by George Eliot
Our Mutual Friend, by Charles Dickens
My Garden Acquaintance
Daisy Miller, by Henry James
Maggie, by Stephen Crane
A Story of To-day, by Margret Howth
Master Humphrey’s Clock
Thuvia, Maid of Mars by Burroughs
The Man Between, by Amelia E. Barr
Main Street, by Sinclair Lewis
The Last of the Mohicans
Moll Flanders, by Daniel Defoe
The Monster Men by Burroughs
Jules Verne’s Classic Books
Moon and Sixpence by Somerset Maugham
Moran of the Lady Letty by Frank Norris
The Moon Pool by A. Merritt
Master and Man by Leo Tolstoy
Merry Men by Robert Louis Stevenson
Miss Billie’s Decision
Miss Billie Married
The Master Key, by L. Frank Baum
Mr. Standfast, by John Buchan
The Moonstone, by Wilkie Collins
My Antonia by Willa Cather
Northanger Abbey by Austen
New Arabian Nights, by Stevenson
Return of the Native by Hardy
The Unbearable Bassington, by "Saki"
Uncles Josh’s Punkin Centre Stories
The Underdogs, by Mariano Azuela
The Moon Endureth by Buchan
Njal’s Saga by Unknown Icelanders
Notes from the Underground #1 in our series by Feodor Dostoevsky
Intentions, by Oscar Wilde
An Inland Voyage by Stevenson


At the Back of the North Wind
The Oakdale Affair, by Burroughs
The Octopus, by Frank Norris
Oliver Twist by Charles Dickens
Out of Time’s Abyss by Burroughs
O Pioneers by Willa Cather
Orthodoxy by G. K. Chesterton
The Outlaw of Torn, by Burroughs
Our Nig by Harriet E. Wilson
An Outcast of the Islands
An Occurrence at Owl Creek Bridge
The Marvelous Land of Oz
Ozma of Oz, by L. Frank Baum
"The Princess Aline"
A Pair of Blue Eyes by Thomas Hardy
The Princess of Cleves
The Purcell Papers, Volume 1
The Purcell Papers, Volume 2
The Purcell Papers, Volume 3
Polly of the Circus by Margaret Mayo
Pellucidar by Edgar Rice Burroughs
Persuasion by Jane Austen
The Adventures of Peter Pan
James Pethel, by Max Beerbohm
"The Country of the Pointed Firs"
Phantastes, by George MacDonald.
The Phantom of the Opera"
The Phoenix and the Carpet by Nesbit
Phil, the Fiddler, by Alger
Philosophy 4, by Owen Wister
Pictures From Italy, by Dickens
Puck of Pook’s Hill
PARADISE LOST
The Pilgrim’s Progress, by Bunyan
Burroughs’ "A Princess of Mars"
The Adventures of Pinocchio
The People That Time Forgot
The Poison Belt by Doyle
The Captain of the Polestar
Poor and Proud by Oliver Optic
Memoirs of Extraordinary Popular
The Princess and Curdie
Paul Prescott’s Charge, by Alger
The Princess and the Goblin
Prince Otto, by R. L. Stevenson
The Parasite, by Arthur Conan Doyle
Prester John, by John Buchan
Paul the Peddler, by Alger
The Pickwick Papers
Raffles by E. W. Hornung
Robinson Crusoe [Part 2]
Robinson Crusoe, by Daniel Defoe
A Book of Remarkable Criminals
The Road to Oz, by L. Frank Baum

To Be Read at Dusk by Charles Dickens
The Red Fairy Book
Rezanov by Gertrude Atherton
The Return of Sherlock Holmes
The Redheaded Outfield by Zane Grey
Roderick Hudson by Henry James
"The Reporter Who Made Himself King"
The Roadmender, by Michael Fairless
The Errand Boy, by Horatio Alger
The Round-Up: A Romance of Arizona
Round The Red Lamp, by Doyle
Rasselas, Prince of Abyssinia
The Rose and the Ring
Reminiscences of Tolstoy
Barnaby Rudge, by Charles Dickens
Running a Thousand Miles for Freedom
Rewards and Fairies
George Sand by by Rene Doumic
Sara Crewe, by Burnett
TOM SAWYER ABROAD by MARK TWAIN
TOM SAWYER, DETECTIVE by MARK TWAIN
Tom Sawyer, by Twain/Clemens
Sketches by Boz, by Charles Dickens
Sylvie and Bruno, by Lewis Carroll
The Song of the Cardinal
The Scarlet Pimpernel
Sister Carrie by Theodore Dreiser
The Scarlet Car, by R. H. Davis
The Scarlet Letter, by Hawthorne
A Sentimental Journey, by Sterne
Sense and Sensibility, by Austen
Stories From the Old Attic
The Shadow Line, by Joseph Conrad
The Rise of Silas Lapham
The Silverado Squatters
Myths and Legends of the Sioux,
The Sisters’ Tragedy
The Works of Samuel Johnson
Sketches of Young Gentlemen
Somebody’s Little Girl
Silas Marner by George Eliot
The Snow Image #5 in our series by Nathaniel Hawthorne
Rebecca Of Sunnybrook Farm
Soldiers of Fortune, by Davis
Sons and Lovers, by D. H. Lawrence
She Stands Accused by Victor MacClure
In the South Seas
The Secret Sharer by Joseph Conrad
Good Stories for Holidays
St Ives, by Robert Louis Stevenson
The Story of a Pioneer
Doyle’s The Stark Munro Letters
The Fifth String by J. P. Sousa
The Goodness of St. Rocque and Other Stories by Alice Dunbar


A Study In Scarlet, by Doyle
Summer by Edith Wharton
Sunday Under Three Heads by Dickens
Selected Writings of Guy De Maupassant
The Return of Tarzan by Burroughs
The Beasts of Tarzan by Burroughs
The Son of Tarzan
Tarzan and the Jewels of Opar.
Jungle Tales of Tarzan
Tarzan of the Apes by Burroughs
The Birds’ Christmas Carol
The Bobbsey Twins at School
The Bobbsey Twins in the Country
The Black Tulip by Alexandre Dumas
The Chimes, by Charles Dickens
The Contrast by Royall Tyler
The Conflict, by David Phillips
The Cost, By David Graham Phillips
The Cricket on the Hearth
The Crossing, by Winston Churchill
The Door in the Wall, et. al.
The Dust by David Graham Phillips
"Terminal Compromise" by Winn Schwartau
Tess of the d’Urbervilles
End of the Tether, by Joseph Conrad
Andrew Steinmetz’s The Gaming Table: Its Votaries and Victims Vol. I
Andrew Steinmetz’s The Gaming Table: Its Votaries and Victims Vol. II
The Gathering of Brother Hilarius
The Harvester, by Gene Stratton Porter
The House Behind The Cedars
The American by Henry James
The Europeans, By Henry James
H. G. Wells’ The Time Machine
The King of the Golden River
Tales and Fantasies
The Lesson of the Master by James
The Land that Time Forgot
Daisy Miller, by Henry James
The Monk, by Matthew Lewis
The Mountains by Stewart Edward White
The Mucker by Edgar Rice Burroughs
Tanglewood Tales, by Hawthorne
Tom Swift in the Land of Wonders
Tono Bungay, by H. G. Wells
The Travels of Sir John Mandeville Author Unknown [circa 1500]
Tales of Terror and Mystery
Tales of the Fish Patrol, by London
The Touchstone by Edith Wharton
The Price She Paid, by Phillips
A Tramp Abroad, by Mark Twain
Treasure Island
The Red One, by Jack London
The Reef, by Edith Wharton
Take Me For a Ride

Tracks of a Rolling Stone, by Coke
The Troll Garden and Selected
The Romany Rye, by George Borrow
Baron Trigault’s Vengeance
The Story of the Amulet, by E. Nesbit
Tales of Shakespeare
This Side of Paradise
The Story of the Treasure Seekers
Sinking of the Titanic et al
The Turn of the Screw
The Wouldbegoods, by E. Nesbit
Twilight Stories, by Various Authors
Tales From Two Hemispheres
The White People, by Burnett
Two Years in the Forbidden City
The Uncommercial Traveller by Dickens
"Undo", a novel by Joe Hutsko
Uncle Tom’s Cabin Harriet Beecher Stowe
Episodes In Van Bibber’s Life
Vanity Fair, by William Thackary
The Violet Fairy Book
The Village Watch-Tower by Wiggin
Violists, by Richard McGowan
The Voyage Out
Walden by Henry David Thoreau
H. G. Wells’ War of the Worlds
The Ways of Men by Eliot Gregory
Weir of Hermiston, by Stevenson
Wieland, by Charles Brockden Brown
The White Knight: Tirant lo Blanc
Early Short Fiction of Edith Wharton Parts 1 and 2
The White Company by Doyle
Wild Justice by Ruth M. Sprague
The Wizard of Oz
The Tenant of Wildfell Hall
The Warlord of Mars by Burroughs
Ohio by Sherwood Anderson
Within the Law by Marvin Dana
Wonderful Balloon Ascents; or the
The Woodlanders, by Thomas Hardy
The War in the Air by H. G. Wells
White Fang by Jack London
When the Sleeper Wakes, by Wells
Wuthering Heights by Emily Bronte
Wild Wales by George Borrow
The Well at the World’s End by Morris
The Woman in White by Wilkie Collins
The Wind in the Willows
A Connecticut Yankee, by Twain
The Yellow Fairy Book
Yankee Gypsies, by Whittier
Youth, by Joseph Conrad
The Prisoner of Zenda, by Anthony Hope
