Lexical FreeNet
Connected thesaurus



TECHNICAL is a synonym of TECHNOLOGICAL triggers CHANGE is more general than CHORD triggers NOTE
The following paper describes the development and intended usage of Lexical FreeNet. It includes many examples of each of the major query types in action.

D. Beeferman. Lexical discovery with an enriched semantic network. In Proceedings of the Workshop on Applications of WordNet in Natural Language Processing Systems, ACL/COLING 1998.



USER triggers USERS triggers INTERNET triggers PAGE triggers NOTES

In this section I'll publish insightful or entertaining comments I've received about Lexical FreeNet, and my responses. Hopefully these notes will also shed some light on the issues I thought about while developing the program. (If you send me mail about a query result, please include permission to publish it on this page if you feel others would enjoy it.)

In defense of rhymes

Why the need for "rhyme" since sounds-like is a superset of it?

This is a very astute observation, and you're almost right: "sounds like" is basically a superset, but not quite. "Sounds like", the way I've implemented it, actually means "similar in terms of phonetic transcription", where "similar" is measured with a distance metric called "edit distance". Not all rhymes are similar-sounding by this measure; for example, fascination RHY interpretation, but not fascination SIM interpretation. More importantly, it is useful to separate the two relations because certain queries (e.g. the "rhyme coercion" query) benefit from the distinction.
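To make the "edit distance over phonetic transcriptions" idea concrete, here is a minimal sketch in Python. It is not the code behind Lexical FreeNet: the unit-cost Levenshtein distance, the function names, and the threshold of 2 are all assumptions chosen for illustration. The two transcriptions are the FLOP and FOP entries quoted elsewhere on this page.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]                           # deleting the first i phonemes of a
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,         # delete pa
                            curr[j - 1] + 1,     # insert pb
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]


def sounds_like(a, b, threshold=2):
    """Hypothetical SIM test: the threshold value is an assumption."""
    return edit_distance(a, b) <= threshold


FLOP = "F L AA1 P".split()
FOP = "F AO1 P".split()

print(edit_distance(FLOP, FOP))  # one deletion (L) plus one substitution (AA1 -> AO1)
```

Because the distance counts every differing phoneme, two long words that share only a final syllable can rhyme and still be far apart by this measure, which is the situation described for "fascination" and "interpretation".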

Mystifying triggers

From: Michael Turniansky (turnip@mail.bcpl.lib.md.us)
To: dougb+lexfn@cs.cmu.edu
Subject: More stuff from the lexical freenet....

Well, some of the trigger relationships still mystify (palmer => peak => ion?), but anyway, nice job with the GENs and SPECs. According to the book of Psalms, man is little more than a beast, and little less than an angel. According to LFN, he is closer to beast:

Man SPEC male SPEC beast

while:

Man TRIG God SPEC Spiritual_being GEN angel

Yes, this is the unfortunate fact about anything derived from data: noise. It doesn't seem possible to truncate the trigger relation at any reasonable place to retain only non-bogus-seeming arcs, because bogus-seeming arcs appear quite early on. The problem is that since these triggers were originally based on 150 million words of news transcripts, they are in some sense specialized to the set of news stories that appeared over the last few years. I don't know the actual reason for the PALMER TRG PEAK TRG ION thing, but if, say, there was a single news story about a guy named Palmer who scaled Mt. Everest, and a single news story about some chemical engineering advance that mentioned peak performance of something related to ions, then these pairs might have appeared sufficiently often to make the cut. My co-workers and I have spent a lot of research effort recently trying to derive "quality" triggers that are more robust to happenstance co-occurrence.

My brother mentions that "Laura Palmer" was a major character on Twin Peaks, and that probably explains the first....

And now, your surprising revelation of the day:

learn SYN check SYN retard

Didn't know learn and retard were so close, did you?

There are no accidents

To: Michael Turniansky (turnip@mail.bcpl.lib.md.us)
Subject: Re: Just when you thought you were safe from me....
From: Douglas Beeferman (Douglas_Beeferman@bobo.link.cs.cmu.edu)
Date: Wed, 21 Jan 1998 00:08:26 -0500

Hi Mike,

Antonyms are up now. I also optimized the database a little and removed duplicate arcs that I'd accidentally been keeping, so it's 4% smaller now (and SPs should be accordingly faster).

I have had a lot of fun issuing SP queries with just the synonym relation checked. It really amazes me how many different senses a single lexeme can take on. And far from being accidents of spelling, I think these overloaded words reveal quite a lot about how humanity (or English speakers, anyway) has come to equate certain concepts. I just tried a simple SP query from GOOD to PLENTIFUL and got "GOOD <=> RESPECTABLE <=> SIZABLE <=> AMPLE <=> PLENTIFUL"... Consider that "respectable" has come to mean both "morally GOOD", e.g. "a respectable way to earn a living", and "of SIZABLE quantity", e.g. "that's a respectable portion of tofu!".
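An SP (shortest path) query restricted to one relation can be sketched as a breadth-first search over an undirected graph. The sketch below is an illustration under stated assumptions, not the site's implementation: the function names are made up, the graph holds only the four synonym links from the path quoted above, and the GOOD/KIND pair is a hypothetical distractor added so the search has a dead end to skip.

```python
from collections import deque


def symmetric(pairs):
    """Build an undirected adjacency map from (word, word) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for nxt in graph.get(word, ()):
            if nxt in parent:
                continue                      # already reached by a path at least as short
            parent[nxt] = word
            if nxt == target:
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


SYN = symmetric([
    ("GOOD", "RESPECTABLE"),
    ("RESPECTABLE", "SIZABLE"),
    ("SIZABLE", "AMPLE"),
    ("AMPLE", "PLENTIFUL"),
    ("GOOD", "KIND"),          # hypothetical extra arc, not from the database
])

print(" <=> ".join(shortest_path(SYN, "GOOD", "PLENTIFUL")))
```

Breadth-first search visits words in order of distance from the source, so the first time the target is reached the path is guaranteed to be of minimal length.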

Superfluous bovine droppings?

I figured out in the shower Friday that cow can become flop with only rhyme links (cow flop, get it?). Running it through your lexical freenet confirmed it, but there was a strangeness about it:

COW -> BOW -> PERNOD -> COD -> FOP -> DROP -> FLOP

The "drop" is superfluous, and makes the path one longer than it need be. How did that happen? (Incidentally, in my shower version, I used SOP not FOP, but otherwise the path was the same.)

I can't believe you discovered this in the shower! I have investigated this, and here are my findings. COD->FOP is due to the "cash-on-demand" sense of COD, which rhymes with F.O.P., the meaning of which I'm uncertain. The OP in FOP ("a vain, affected man") is pronounced very slightly differently from the OP in FLOP, according to the pronunciation dictionary from which I derive the phonetic relations:

FLOP  F L AA1 P
FOP   F AO1 P

The "AA1" vowel is the sound you make when your doctor tells you to open wide and say "AA1". The "AO1" vowel is a kind of "diphthong", a subtle combination of two pure vowel sounds, and is more like the first vowel in "SOFTWARE". So, technically, FOP and FLOP don't rhyme, and the search instead went through DROP, which according to the dictionary can be pronounced both ways.

At this point you might think, how can I trust the vagaries of a dictionary that would say that pronunciation variants exist for seemingly arbitrary classes of words but not others? Your suspicion would probably be justified, and all I can say is that the CMU pronouncing dictionary, as the fruit of a massive amount of human engineering, is just as fallible as any other massive amount of human engineering. One thing I intend to do soon for both my online rhyming dictionary and for FreeNet is to tweak the definition of "rhyme" slightly so that it allows for very close calls of this nature, to preempt some of the irate letters I get occasionally from would-be songwriters who can't find rhymes with my program that they are certain exist.
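The FOP/DROP/FLOP detour can be reproduced with a small sketch. Everything here is an assumption made for illustration: the rhyme rule (identical phonemes from the last primary-stressed vowel onward, matched across any pronunciation variant), the two DROP variants (inferred from "can be pronounced both ways"), and the `CLOSE_VOWELS` table used for the looser "close call" test. None of it is taken from the actual program.

```python
# Toy pronunciation table. FLOP and FOP are quoted in the text above;
# the two DROP variants are assumed from "can be pronounced both ways".
PRON = {
    "FLOP": ["F L AA1 P"],
    "FOP": ["F AO1 P"],
    "DROP": ["D R AA1 P", "D R AO1 P"],
}

# Hypothetical table of vowel pairs treated as "close calls".
CLOSE_VOWELS = {frozenset(("AA1", "AO1"))}


def rhyme_tail(pron):
    """Phonemes from the last primary-stressed vowel (suffix '1') to the end."""
    phones = pron.split()
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return tuple(phones[i:])
    return tuple(phones)


def tails(word):
    return {rhyme_tail(p) for p in PRON[word]}


def rhymes(a, b):
    """Strict rhyme: some variant of a shares an exact tail with some variant of b."""
    return a != b and bool(tails(a) & tails(b))


def near_rhymes(a, b):
    """Looser rhyme: tails may differ only by a vowel pair in CLOSE_VOWELS."""
    for ta in tails(a):
        for tb in tails(b):
            if len(ta) == len(tb) and all(
                x == y or frozenset((x, y)) in CLOSE_VOWELS
                for x, y in zip(ta, tb)
            ):
                return a != b
    return False


print(rhymes("FOP", "FLOP"), rhymes("FOP", "DROP"), rhymes("DROP", "FLOP"))
print(near_rhymes("FOP", "FLOP"))
```

Under the strict rule FOP reaches FLOP only through DROP, which matches the extra hop in the reported path; under the looser rule the direct link would exist.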

Precedence

> When both Rhymes and SIM are allowed, and two words pair both ways,
> (i.e. dash and flash) output the rhyme link in preference to the SIM
> link. Personal preference, but I think that "rhyme" sounds 'stronger'
> than sounds-like.

Another good point. It turns out that, contrary to what I say in my documentation, the order in which I input the relations to the program that builds the graph actually does affect the results of the four "template" queries I've implemented on the web page. I agree with your sentiments about rhyme being "stronger", and I've reordered the relations as follows:

SYN, TRG, COM, PAR, SPC, GEN, RHY, SIM

This ordering affects the efficiency of the search slightly, as well as how "ties" are broken. Currently if two words are joined by two different relations, only the higher-precedence relation will appear in a query result (except for "SHOW" queries, in which everything is shown). Maybe the right thing to do is to output both, but it's a little trickier implementation-wise.

Oh, COM and PAR are two more relations that are in the database now, but I haven't made little arrows for them so they're not on the web page. COM is "comprises", and PAR is the inverse, "part of". These are derived from WordNet like SYN, GEN, and SPC. I should have that up on the page later today. I think they're pretty cool.

> Allow the user to see ALL paths that are of the smallest length. In
> other words, iterate. (too much strain on resources to do this?), so
> they can choose what seems best to them.

Excellent idea -- a good compromise between showing a single shortest path and showing ALL paths (which would number in the trillions of trillions in some cases, I bet). You're right that this would be a little more compute-intensive, and in fact at the moment I'm not even sure how I'd implement it. But I'll definitely keep it in mind.

> You mentioned that you have been tweaking the SIM distance. Perhaps
> allow user control? (or is that impossible, because the database is
> static?)

Indeed the database currently is blind to the "weight" of an arc after the graph is built. Keeping this info and allowing users to threshold it is certainly possible if I were willing to suck up a few dozen megs more disk space, and if I get more resources I'll do this.
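Two of the ideas in this exchange, precedence-based tie-breaking and listing every path of minimal length, can be sketched together. This is one possible approach under stated assumptions, not a description of how the program works: the function names and the toy arcs (DASH, FLASH, CASH, FLESH) are made up, and only the relation ordering is taken from the text. The search is a breadth-first search that records every predecessor lying on a shortest route, then unwinds those predecessors into paths.

```python
from collections import deque

# Relation ordering quoted in the text; earlier means higher precedence.
PRECEDENCE = ["SYN", "TRG", "COM", "PAR", "SPC", "GEN", "RHY", "SIM"]
RANK = {rel: i for i, rel in enumerate(PRECEDENCE)}


def build_graph(arcs):
    """Keep only the highest-precedence relation for each (source, target)."""
    graph = {}
    for src, rel, dst in arcs:
        best = graph.setdefault(src, {})
        if dst not in best or RANK[rel] < RANK[best[dst]]:
            best[dst] = rel
    return graph


def all_shortest_paths(graph, source, target):
    """Return every minimal-length path from source to target."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        word = queue.popleft()
        if word == target:
            break                  # all closer words are already expanded
        for nxt in graph.get(word, {}):
            if nxt not in dist:
                dist[nxt] = dist[word] + 1
                preds[nxt] = [word]
                queue.append(nxt)
            elif dist[nxt] == dist[word] + 1:
                preds[nxt].append(word)   # another equally short route
    if target not in dist:
        return []

    def unwind(word):
        if word == source:
            return [[source]]
        return [p + [word] for pred in preds[word] for p in unwind(pred)]

    return unwind(target)


# Made-up arcs for illustration only.
GRAPH = build_graph([
    ("DASH", "SIM", "FLASH"),
    ("DASH", "RHY", "FLASH"),
    ("DASH", "RHY", "CASH"),
    ("FLASH", "SIM", "FLESH"),
    ("CASH", "SIM", "FLESH"),
])

for path in all_shortest_paths(GRAPH, "DASH", "FLESH"):
    print(" -> ".join(path))
```

The number of shortest paths can still grow quickly in a dense graph, so a real implementation would probably cap how many it unwinds.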

Directionality

Date: Thu, 22 Jan 1998 00:15:39 -0500
From: Michael Turniansky (turnip@mail.bcpl.lib.md.us)
To: dougb+lexfn@cs.cmu.edu

Doug, you should probably note somewhere on your page that simply switching the two arguments on a page can give a drastically different result. This is because of a) the hierarchical preference given to each relationship type and b) the fact that triggers are one-way relationships with no corresponding reverse operation. (BTW, Al, this solves the problem of the ugly palmer TRG peaks TRG ion SPEC atom, since the reverse gives atom TRG bomb TRG bomber RHY Palmer.) Beagle -> Buster looks like this:

Beagle TRG Chilean (!) TRG Newman (! Hello, Newman!) RHY human GEN buster

Strange that it goes through three people to get to my dog....

--Mike T.

The trigger relation, you'll notice, doesn't have an inverse -- I decided not to add this, since I didn't think the inverse (which could be interpreted as "triggers in the backwards direction") provides a lot of marginal usefulness over the existing trigger relation. But it might make sense instead to replace the trigger relation with a "co-triggers" relation, which is either (1) the union of the trigger relation with its inverse, so that it is symmetric, or (2) the symmetric subset of the current trigger relation, i.e. the set of trigger pairs whose inverses are also in the set.

Yes, I did notice that (as I mentioned in my last letter, all the others do, or are their own inverse). I also agree that it would make little sense to add the backwards trigger. But I also felt that changing the trigger to a co-trigger (only keeping those that trigger both ways, your sense 2) would go a long way to weeding out some of those bogus relationships of which you speak. BTW, onion Triggers crotch, which I think, from a search on news stories, comes from the O.J. Simpson trial, where he testifies about (Nicole's???) crotch flashed in The Red Onion.

Hey, here's a thought: beagle -> chilean from the HMS Beagle, Charles Darwin's ship that visited the Galapagos, off Chilean waters? Almost all the altavista refs that have both Beagle and Chilean are references to these....

--Mike Turniansky

Hi Mike,

On modifying the trigger relation: The solution of filtering out the non-symmetric links has the disadvantage that the "meaningful" one-way links would be deleted. In particular, the pairs that are due to close-range contextual concordance in data, such as "WREAK HAVOC", "CONCENTRIC CIRCLES", "ABSOLUTE NONSENSE", would necessarily be axed along with the bogus noise. With the data in the state that I now have it, I can't distinguish these two roles easily. Of course, I could re-run the trigger derivation (which would take a couple weeks) and separate out the pairs that are due to bigram boundness, but here's the solution I have in mind for now. (First, I should mention that the trigger pairs, about 8 million of them, are now sorted by a statistical measure of independence called mutual information and truncated at about N=320,000.)

Lower N dramatically. Include all the top N mutually informative triggers, and of the remaining triggers, include those amongst them whose inverses are in the top N. The relation overall would still be asymmetric, but cleaner due to the smaller N. Hmmm...
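The three candidate trigger sets discussed in this thread (symmetric union, symmetric subset, and the "top N plus rescued inverses" proposal) are easy to compare side by side. The sketch below is illustrative only: the function names are made up, and the scores in `SCORES` are invented numbers standing in for mutual information, not values from the real data.

```python
def symmetric_union(triggers):
    """Option (1): the trigger relation together with its inverse."""
    return triggers | {(b, a) for a, b in triggers}


def symmetric_subset(triggers):
    """Option (2): only pairs whose inverse is also a trigger."""
    return {(a, b) for a, b in triggers if (b, a) in triggers}


def top_n_with_inverses(scores, n):
    """Keep the n best-scoring pairs, plus any lower-ranked pair
    whose inverse made it into the top n."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:n])
    rescued = {(a, b) for a, b in ranked[n:] if (b, a) in top}
    return top | rescued


# Invented scores for illustration; higher means more mutually informative.
SCORES = {
    ("WREAK", "HAVOC"): 9.0,
    ("CONCENTRIC", "CIRCLES"): 8.5,
    ("ATOM", "BOMB"): 8.0,
    ("BOMB", "ATOM"): 2.0,
    ("PALMER", "PEAK"): 1.5,
    ("PEAK", "ION"): 1.0,
}

kept = top_n_with_inverses(SCORES, n=3)
print(sorted(kept))
```

With these toy numbers the one-way pairs WREAK/HAVOC and CONCENTRIC/CIRCLES survive, the weak BOMB/ATOM pair is rescued because its inverse ranks highly, and the PALMER/PEAK/ION chain is dropped. The symmetric subset alone would have kept only the ATOM/BOMB pair in both directions.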

Makes for a great conspiracy theory!

From: Robert Harper (rwh@cs.cmu.edu)
To: doug.beeferman@cs.cmu.edu
Subject: yet another funny path
Date: Wed, 4 Feb 1998 11:40:33 -0500

Try Clinton to Lewinsky using all available relations. The path is quite amusing.

Bob

Copyright © 2003 Datamuse Corporation. All rights reserved.