Few readers know that Edgar had an older brother. Typically going by the name Henry, he was a poet, like his famous sibling, and a hard-drinking sailor. Orphaned and raised apart, the two reunited in adulthood and roomed together in Baltimore, where Henry staggered from alcoholism into an early death at the age of twenty-four, in 1831. Aside from a 1926 collection (“Poe’s Brother”) issued in a thousand copies, Henry’s slender body of work has never been reprinted, and he remains the most obscure corner of Poe studies.
For more than thirty years, Henry’s initials hid those two poems by Edgar, and it was by sheer luck that their true authorship was uncovered. Without the discovery in 1859 of “Tamerlane and Other Poems,” neither “Dreams” nor “The Happiest Day” would be known today as attributable to Edgar. But why had they run under Henry’s name in the North American? Most likely, Edgar was still hiding from creditors, since he’d vanished altogether by enlisting in the Army under an assumed name—as “Henri,” in fact.
But Henry Poe’s North American work also includes pieces of prose. “Monte Video” is a travel letter about Henry’s experiences as a sailor, but the other three—“The Pirate,” “Recollections,” and “A Fragment”—are fiction. The editors of the 1926 reprint of Henry’s work claimed, on no particular evidence, that he based the maritime drama of “The Pirate” on an early love affair of Edgar’s. More perceptively, they ventured that “some of the sentences seem to indicate that Edgar may have written the tale at least in part.” (It helps that the title character announces himself thus: “I am Edgar Leonard!”) “Recollections,” perhaps tellingly, recounts travelling abroad to find a long-lost brother named Leonard. But these hints were pursued no further by the editors in 1926—and one tale to run in North American, “A Fragment,” was pretty well ignored.
Yet it is strangely familiar. At just five hundred and forty-two words, “A Fragment” is a fevered first-person account by a despairing man about to shoot himself: “Heavens! my hand does tremble—No! tis only the flickering of the lamp. … No more—the pistol—I have loaded it—the balls are new—quite bright—they will soon be in my heart—Incomprehensible death—what art thou? …” Influenced by the suspenseful “Blackwood’s tales” of the day, right down to a ludicrous only-a-dream ending, it’s remarkably similar to the mad, insistent narrators of Edgar’s work.
Bear in mind that “Metzengerstein,” an 1832 Gothic tale of a family feud, is considered Edgar’s first fiction. And yet here, five years earlier—in what is ostensibly Henry’s only stint as a short-story writer—are three pieces worthy of an eighteen-year-old Edgar, and evincing elements of his later work. Edgar was, in fact, trying to start a writing career in 1827: the one account of his teen-age attempt to go it alone in Boston noted that his landlady “had no patience with a boarder who sat up nights writing on paper which he could not afterward sell … He then tried literary work, but failed to obtain employment on any of the large journals.” These three stories might be the remains of that failed effort—or they might indeed be by his older brother. So the puzzle becomes not whether the North American stories are Poe’s first published fiction: they are.
But which Poe?
In a past era, any suspicions about Edgar’s authorship of these pieces would be dutifully wrapped in supporting quotes and biographical context—and short of finding other documentation, that would be the end of it. But as J. K. Rowling discovered recently with the unmasking of “The Cuckoo’s Calling,” author attribution is becoming a very different game altogether. “The Professor Who Declared, It’s J.K. Rowling” announced a headline in the aftermath of Rowling’s confession. Patrick Juola is not a lit professor: he teaches computer science at Duquesne University, where he’s developed JGAAP, the Java Graphical Author Attribution Program. The idea behind the software, called stylometry, is an old one: the mathematician Augustus De Morgan proposed in 1851 that authorship might be sussed out through word frequencies, and the early nineteen-sixties saw the use of manual counts to determine authorship of the Federalist Papers—researchers noticed, for instance, that Hamilton used “while” and Madison used “whilst.” Modern stylometrists can deploy programs to seek subtle patterns in how individuals tend to use language—for instance, the recourse to certain chunks of words (“word stems”), as well as clustered “n-grams” of characters, words, and parts of speech. By comparing an unknown text to a group of known texts, the program can then rank the known authors by similarity. “The Cuckoo’s Calling,” for instance, was compared with work by Rowling, Ruth Rendell, P. D. James, and Val McDermid. While Rowling didn’t score No. 1 on every test—McDermid did well on some measures—Rowling had a high rank consistently across different measures. Once confronted, she confessed.
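For readers curious what counting such "language events" looks like in practice, here is a minimal Python sketch. It is not JGAAP's code: the function names are hypothetical, the tokenizing is a bare whitespace split, and the sample sentence is invented purely for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping word n-grams, splitting on whitespace."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def relative_frequency(text: str, word: str) -> float:
    """Share of whitespace-separated tokens equal to word (case-insensitive)."""
    words = text.lower().split()
    return words.count(word.lower()) / len(words) if words else 0.0


# Invented sample sentence, not a quotation from any author.
sample = "whilst the lamp burned, the clock struck whilst I wrote"

print(char_ngrams("tremble", 3))           # the five trigrams of "tremble"
print(word_ngrams(sample, 2)[("the", "lamp")])
print(relative_frequency(sample, "whilst"))
```

A marker like "while" versus "whilst" is simply one such frequency; a program can tally thousands of them at once.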
Could it work for Poe, too?
Though JGAAP’s options are bewildering to the newcomer, the basics are simple. You enter the unknown and known texts; then you apply “canonizers” that strip noise like extra spacing and case from the work. Then you choose which language events to look for, and which algorithmic drivers to analyze patterns with. The result is a ranked list, with No. 1 as the closest resemblance to the unknown work. After I prepared a set of texts, I emailed Professor Juola, unsure of which algorithmic drivers to apply. “Probably our best overall analysis method, time-tested and all that, is the Author Centroid Driver,” Professor Juola advised. “You can use it with a lot of different methods and distance functions.”
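The steps just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not JGAAP's implementation: the "canonizer" only lowercases and collapses whitespace, the events are plain words, the centroid is taken to be the average of an author's word-frequency profiles, and the distance is cosine. All function names and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def canonize(text: str) -> str:
    """Strip noise: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def profile(text: str) -> dict[str, float]:
    """Relative frequency of each word in the canonized text."""
    words = canonize(text).split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


def centroid(profiles: list[dict[str, float]]) -> dict[str, float]:
    """Average several profiles into a single point for one author."""
    total: dict[str, float] = defaultdict(float)
    for p in profiles:
        for word, freq in p.items():
            total[word] += freq
    return {w: v / len(profiles) for w, v in total.items()}


def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """1 minus cosine similarity; 0 means identical direction."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_authors(unknown: str, known: dict[str, list[str]]) -> list[str]:
    """Return author names ordered from closest to farthest."""
    target = profile(unknown)
    centroids = {
        author: centroid([profile(t) for t in texts])
        for author, texts in known.items()
    }
    return sorted(centroids, key=lambda a: cosine_distance(target, centroids[a]))


# Invented toy texts standing in for real samples.
known = {
    "author_a": ["the lamp the pistol the lamp", "The  lamp flickers"],
    "author_b": ["upon a hill upon a river"],
}
print(rank_authors("the lamp trembles", known))
```

With real samples of thousands of words, the same ranking would be repeated for each combination of event type and distance function, yielding one ranked list per test.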
Any one test isn’t a reliable indication of authorship. But with a likely author, the results across multiple tests start to show a pattern. And with Edgar Allan Poe and six other comparable authors placed in comparison to Henry’s prose works, Juola pointed out, there was a specific set of ranks to look for if Edgar was the likely author: namely, the top three, and particularly the first or second rank. “What you want to see, ideally, is that Edgar comes out as the most likely author every time,” Juola noted. “You won’t see this, but if he comes out as the most likely or second most likely almost every time, it’s still highly likely that it’s him.”
For my analysis of Henry’s collected fiction, I picked as my known texts works by Poe and six contemporaries: “The Last of the Mohicans,” by James Fenimore Cooper; “Twice-Told Tales,” by Nathaniel Hawthorne; “The Sketch Book of Geoffrey Crayon,” by Washington Irving; “The Quaker City,” by George Lippard; “Idiosyncrasies,” by John Neal; “The Wigwam and the Cabin,” by William Gilmore Simms; and Poe’s “Tales of the Grotesque and Arabesque.” To keep the computer from crashing, I used the first ten thousand words of each work—a good sample size. Simple matters of timing meant that some of these authors couldn’t have created Henry Poe’s prose, but no matter: they were, in the parlance of stylometry, my distracters. If Edgar could consistently rank above his peers, then maybe I had my author. After selecting for word occurrences, word stems, and n-grams for parts of speech, characters, and words, I used Juola’s recommendations for a driver (“Centroid”), for functions (“Cosine,” “Histogram,” and “Manhattan”), for a culler (“Most Common Event”), and then clicked the final screen’s button: PROCESS.
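Two of those settings are easy to picture in code. The Python sketch below is an assumption-laden illustration, not JGAAP's own logic: it treats a "most common event" culler as keeping only the most frequent words across all texts, and it computes a Manhattan distance (the sum of absolute differences) between two frequency profiles. JGAAP's "Histogram" function is not reproduced here, and the function names and sample strings are hypothetical.

```python
from collections import Counter


def most_common_events(texts: list[str], keep: int) -> set[str]:
    """Return the `keep` most frequent words pooled across all texts."""
    pooled: Counter = Counter()
    for text in texts:
        pooled.update(text.lower().split())
    return {word for word, _ in pooled.most_common(keep)}


def frequencies(text: str, vocabulary: set[str]) -> dict[str, float]:
    """Relative frequency in text of each word in the culled vocabulary."""
    words = text.lower().split()
    counts = Counter(w for w in words if w in vocabulary)
    if not words:
        return {w: 0.0 for w in vocabulary}
    return {w: counts[w] / len(words) for w in vocabulary}


def manhattan(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum of absolute differences over every word in either profile."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Invented strings standing in for real ten-thousand-word samples.
known_texts = ["the lamp and the pistol", "the river and the hill"]
vocabulary = most_common_events(known_texts, keep=2)

unknown = frequencies("the the the lamp", vocabulary)
candidate = frequencies(known_texts[0], vocabulary)
print(sorted(vocabulary))
print(manhattan(unknown, candidate))
```

Culling to the commonest events keeps the comparison on words every author uses, which tend to reveal habit rather than subject matter.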
For a moment my computer appeared frozen, and my heart sank. Then came my first screen of results. I scrolled down through fifteen different test outcomes: Edgar, Edgar, Edgar, Edgar. It was a shutout: he’d swept all the No. 1 rankings.
To be fair, I hadn’t included Henry in this round. The one prose work that is biographically specific to him, “Monte Video,” is problematic—at only one thousand two hundred and sixty-three words, it’s not an ideal sample. Still, this time I ran the test again with Henry taking over James Fenimore Cooper’s slot. Edgar’s ranks didn’t budge, and Henry’s did not look especially promising: six, six, six, four, five, three, four, two, two, two, two, three, four, three.
So to the extent that his sample size could be trusted, the alleged author of this work was scoring an average rank of 3.86 out of seven—or about what one would expect from a random distracter.
The results were good for Edgar—so good that I became suspicious. Would JGAAP correctly identify a known Edgar Allan Poe story? I tossed “The Pit and the Pendulum” in as the unknown text, and sure enough, Edgar’s “Tales of the Grotesque and Arabesque” sample (which did not include the story) came up as the favorite—he had nine out of fifteen possible No. 1 rankings. Remarkably, this also meant that JGAAP had ranked Edgar higher for authoring Henry’s stories than for one of his own.
If Henry’s stories are really Edgar Allan Poe’s, though, then why did they remain unattributed, even after Edgar would have no longer needed to hide his authorship?
That may be the simplest question to answer. 1827 was a turbulent year for Edgar, who even at the best of times could neglect his manuscripts. The 1841 copy of “The Murders in the Rue Morgue” exists only because a printer’s apprentice saved it from a wastebasket. (It was still nearly incinerated on three different occasions in the following decades.) Edgar later resorted to borrowing an old copy of the Southern Literary Messenger to retrieve work for “The Raven and Other Poems”; he also had to borrow his own 1829 volume, “Poems,” from a cousin. Tracking down the North American would hardly have been so simple—it collapsed after just twenty-eight issues. For Poe, finding prose juvenilia from 1827 simply may not have been worth his effort.
But are they his works? I found that Edgar still remained on top when run against an entirely different set of distracters. Assembling everyone into a fifteen-author battle royal still couldn’t dislodge Edgar from his perch—along with Henry and the original seven, there were N. P. Willis, Catharine Maria Sedgwick, Robert Montgomery Bird, Lydia Maria Child, Charles Fenno Hoffman, Augustus Longstreet, and John Pendleton Kennedy. But when I changed Edgar’s comparison sample to the next ten thousand words of “Tales of the Grotesque and Arabesque,” his ranking finally got dented. Even so, he remained the top pick in some runs, and among likely authors over all.
So the results are suggestive—but perhaps they are only that. Sample sizes and the choice of texts can be treacherous things. As Juola himself has pointed out, even his results for J. K. Rowling were ultimately resolved by a living author’s confession. “In the event that we were studying a long-dead author,” he mused, “this is the kind of thing that could and would be argued about in the journals for decades.” In the hands of a novice, maybe stylometric software can’t produce certainties, but it can inspire good questions—not just about attribution, but about the subtle currents of language that run even deeper than subject matter or genre.
As I finished my JGAAP session, I had a bit of fun with it: what if I threw Edgar’s “Tales of the Grotesque and Arabesque” into the hopper as the unknown, and then removed him from known authors? In short, which contemporary would JGAAP pick as its likely author if it didn’t know Edgar existed? I clicked, and my eyebrows went up. A dutiful guess of Irving or Hawthorne, it turns out, would be wrong. But then, so was my own guess of George Lippard or John Neal. The work’s overwhelming affinity was with John Pendleton Kennedy, who judged the 1833 story contest that handed Edgar Allan Poe his first big break—and made him a known author at last.
By Paul Collins via the New Yorker