
Similarity of Semantic Relations


Peter D. Turney∗

National Research Council Canada

arXiv:cs/0608100v1 [cs.CL] 25 Aug 2006

There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

1. Introduction

There are at least two kinds of similarity. Attributional similarity is correspondence between attributes and relational similarity is correspondence between relations (Medin, Goldstone, and Gentner, 1990). When two words have a high degree of attributional similarity, we call them synonyms. When two word pairs have a high degree of relational similarity, we say they are analogous.

Verbal analogies are often written in the form A:B::C:D, meaning A is to B as C is to D; for example, traffic:street::water:riverbed. Traffic flows over a street; water flows over a riverbed. A street carries traffic; a riverbed carries water. There is a high degree of relational similarity between the word pair traffic:street and the word pair water:riverbed. In fact, this analogy is the basis of several mathematical theories of traffic flow (Daganzo, 1994).

In Section 2, we look more closely at the connections between attributional and relational similarity. In analogies such as mason:stone::carpenter:wood, it seems that relational similarity can be reduced to attributional similarity, since mason and carpenter are attributionally similar, as are stone and wood. In general, this reduction fails. Consider the analogy traffic:street::water:riverbed. Traffic and water are not attributionally similar. Street and riverbed are only moderately attributionally similar.


Many algorithms have been proposed for measuring the attributional similarity between two words (Lesk, 1969; Resnik, 1995; Landauer and Dumais, 1997; Jiang and Conrath, 1997; Lin, 1998b; Turney, 2001; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003). Measures of attributional similarity have been studied extensively, due to their applications in problems such as recognizing synonyms (Landauer and Dumais, 1997), information retrieval (Deerwester et al., 1990), determining semantic orientation (Turney, 2002), grading student essays (Rehder et al., 1998), measuring textual cohesion (Morris and Hirst, 1991), and word sense disambiguation (Lesk, 1986).

On the other hand, since measures of relational similarity are not as well developed as measures of attributional similarity, the potential applications of relational similarity are not as well known. Many problems that involve semantic relations would benefit from an algorithm for measuring relational similarity. We discuss related problems in natural language processing, information retrieval, and information extraction in more detail in Section 3.

This paper builds on the Vector Space Model (VSM) of information retrieval. Given a query, a search engine produces a ranked list of documents. The documents are ranked in order of decreasing attributional similarity between the query and each document. Almost all modern search engines measure attributional similarity using the VSM (Baeza-Yates and Ribeiro-Neto, 1999). Turney and Littman (2005) adapted the VSM approach to measuring relational similarity. They used a vector of frequencies of patterns in a corpus to represent the relation between a pair of words. Section 4 presents the VSM approach to measuring similarity.

In Section 5, we present an algorithm for measuring relational similarity, which we call Latent Relational Analysis (LRA). The algorithm learns from a large corpus of unlabeled, unstructured text, without supervision. LRA extends the VSM approach of Turney and Littman (2005) in three ways: (1) The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns. (2) Singular Value Decomposition (SVD) is used to smooth the frequency data. (3) Given a word pair such as traffic:street, LRA considers transformations of the word pair, generated by replacing one of the words by synonyms, such as traffic:road, traffic:highway.

Section 6 presents our experimental evaluation of LRA with a collection of 374 multiple-choice word analogy questions from the SAT college entrance exam. An example of a typical SAT question appears in Table 1. In the educational testing literature, the first pair (mason:stone) is called the stem of the analogy. The correct choice is called the solution and the incorrect choices are distractors. We evaluate LRA by testing its ability to select the solution and avoid the distractors. The average performance of college-bound senior high school students on verbal SAT questions corresponds to an accuracy of about 57%. LRA achieves an accuracy of about 56%. On these same questions, the VSM attained 47%.

One application for relational similarity is classifying semantic relations in noun-modifier pairs (Turney and Littman, 2005). In Section 7, we evaluate the performance of LRA with a set of 600 noun-modifier pairs from Nastase and Szpakowicz (2003). The problem is to classify a noun-modifier pair, such as “laser printer”, according to the semantic relation between the head noun (printer) and the modifier (laser). The 600 pairs have been manually labeled with 30 classes of semantic relations. For example, “laser printer” is classified as instrument; the printer uses the laser as an instrument for printing.


Table 1
An example of a typical SAT question (only the stem and the solution are preserved here).
Stem: mason:stone
Solution: (b) carpenter:wood

The amount of attributional similarity between two words, A and B, depends on the degree of correspondence between the properties of A and B. A measure of attributional similarity is a function that maps two words, A and B, to a real number, sim_a(A, B) ∈ ℜ. The more correspondence there is between the properties of A and B, the greater their attributional similarity. For example, dog and wolf have a relatively high degree of attributional similarity.

The amount of relational similarity between two pairs of words, A:B and C:D, depends on the degree of correspondence between the relations between A and B and the relations between C and D. A measure of relational similarity is a function that maps two pairs, A:B and C:D, to a real number, sim_r(A:B, C:D) ∈ ℜ. The more correspondence there is between the relations of A:B and C:D, the greater their relational similarity. For example, dog:bark and cat:meow have a relatively high degree of relational similarity.

Cognitive scientists distinguish words that are semantically associated (bee–honey) from words that are semantically similar (deer–pony), although they recognize that some words are both associated and similar (doctor–nurse) (Chiarello et al., 1990). Both of these are types of attributional similarity, since they are based on correspondence between attributes (e.g., bees and honey are both found in hives; deer and ponies are both mammals).

Budanitsky and Hirst (2001) describe semantic relatedness as follows:

Recent research on the topic in computational linguistics has emphasized the perspective of semantic relatedness of two lexemes in a lexical resource, or its inverse, semantic distance. It's important to note that semantic relatedness is a more general concept than similarity; similar entities are usually assumed to be related by virtue of their likeness (bank–trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car–wheel) and antonymy (hot–cold), or just by any kind of functional relationship or frequent association (pencil–paper, penguin–Antarctica).

As these examples show, semantic relatedness is the same as attributional similarity (e.g., hot and cold are both kinds of temperature, pencil and paper are both used for writing). Here we prefer to use the term attributional similarity, because it emphasizes the contrast with relational similarity. The term semantic relatedness may lead to confusion when the term relational similarity is also under discussion.

Resnik (1995) describes semantic similarity as follows:

Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar. Rada et al. (1989) suggest that the assessment of similarity in semantic networks can in fact be thought of as involving just taxonomic (IS-A) links, to the exclusion of other link types; that view will also be taken here, although admittedly it excludes some potentially useful information.

Thus semantic similarity is a specific type of attributional similarity. The term semantic similarity is misleading, because it refers to a type of attributional similarity, yet relational similarity is not any less semantic than attributional similarity.

To avoid confusion, we will use the terms attributional similarity and relational similarity, following Medin, Goldstone, and Gentner (1990). Instead of semantic similarity (Resnik, 1995) or semantically similar (Chiarello et al., 1990), we prefer the term taxonomical similarity, which we take to be a specific type of attributional similarity. We interpret synonymy as a high degree of attributional similarity. Analogy is a high degree of relational similarity.

2.2 Measuring Attributional Similarity

Algorithms for measuring attributional similarity can be lexicon-based (Lesk, 1986; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003), corpus-based (Lesk, 1969; Landauer and Dumais, 1997; Lin, 1998a; Turney, 2001), or a hybrid of the two (Resnik, 1995; Jiang and Conrath, 1997; Turney et al., 2003). Intuitively, we might expect that lexicon-based algorithms would be better at capturing synonymy than corpus-based algorithms, since lexicons, such as WordNet, explicitly provide synonymy information that is only implicit in a corpus. However, experiments do not support this intuition.

Several algorithms have been evaluated using 80 multiple-choice synonym questions taken from the Test of English as a Foreign Language (TOEFL). An example of one of the 80 TOEFL questions appears in Table 2. Table 3 shows the best performance on the TOEFL questions for each type of attributional similarity algorithm. The results support the claim that lexicon-based algorithms have no advantage over corpus-based algorithms for recognizing synonymy.

Table 2
An example of one of the 80 TOEFL questions (only the stem and the solution are preserved here).
Stem: levied
Solution: (a) imposed

Table 3
Performance of attributional similarity measures on the 80 TOEFL questions. (The average non-English US college applicant's performance is included in the bottom row, for comparison.)

Reference                                 Type                          Score (%)
Jarmasz and Szpakowicz (2003)             best lexicon-based algorithm  78.75
Terra and Clarke (2003)                   best corpus-based algorithm   81.25
Turney et al. (2003)                      best hybrid algorithm         97.50
Average non-English US college applicant                                64.5

2.3 Using Attributional Similarity to Solve Analogies

We may distinguish near analogies (mason:stone::carpenter:wood) from far analogies (traffic:street::water:riverbed) (Gentner, 1983; Medin, Goldstone, and Gentner, 1990). In an analogy A:B::C:D, where there is a high degree of relational similarity between A:B and C:D, if there is also a high degree of attributional similarity between A and C, and between B and D, then A:B::C:D is a near analogy; otherwise, it is a far analogy.

It seems possible that SAT analogy questions might consist largely of near analogies, in which case they can be solved using attributional similarity measures. We could score each candidate analogy by the average of the attributional similarity, sim_a, between A and C and between B and D:

score(A:B::C:D) = (sim_a(A, C) + sim_a(B, D)) / 2    (1)

Performance is measured by precision, recall, and F:

precision = number of correct guesses / total number of guesses made    (2)

recall = number of correct guesses / maximum possible number correct    (3)

F = 2 × precision × recall / (precision + recall)    (4)

Performance on the 374 SAT analogy questions:

Algorithm                  Type              Precision  Recall  F
Turney and Littman (2005)  relational (VSM)  47.7       47.1    47.4
random                     random            20.0       20.0    20.0
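To make equation (1) concrete, here is a minimal Python sketch that scores each choice pair and picks the best; `sim_a` is a stand-in for any attributional similarity measure from Section 2.2, and the toy similarity values are invented purely for illustration.

```python
# A minimal sketch of solving an SAT-style analogy with equation (1).
# `sim_a` is a placeholder for any attributional similarity measure;
# the toy values below are illustrative, not from the paper.

def score(stem, choice, sim_a):
    """score(A:B::C:D) = (sim_a(A, C) + sim_a(B, D)) / 2"""
    (a, b), (c, d) = stem, choice
    return 0.5 * (sim_a(a, c) + sim_a(b, d))

def solve(stem, choices, sim_a):
    """Return the choice pair with the highest score."""
    return max(choices, key=lambda ch: score(stem, ch, sim_a))

# Toy similarity table for the near analogy mason:stone::carpenter:wood.
toy = {("mason", "carpenter"): 0.8, ("stone", "wood"): 0.7,
       ("mason", "teacher"): 0.3, ("stone", "chalk"): 0.4}
sim_a = lambda x, y: toy.get((x, y), toy.get((y, x), 0.0))

best = solve(("mason", "stone"),
             [("carpenter", "wood"), ("teacher", "chalk")], sim_a)
print(best)  # ('carpenter', 'wood')
```

As the text notes, this reduction works only for near analogies; far analogies such as traffic:street::water:riverbed defeat it.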

(See http://www.d.umn.edu/∼tpederse/similarity.html.)


3.1 Recognizing Word Analogies

The task of recognizing word analogies was first attempted by a system called Argus (Reitman, 1965), using a small hand-built semantic network. Argus could only solve the limited set of analogy questions that its programmer had anticipated. Argus was based on a spreading activation model and did not explicitly attempt to measure relational similarity.

Turney et al. (2003) combined 13 independent modules to answer SAT questions. The final output of the system was based on a weighted combination of the outputs of each individual module. The best of the 13 modules was the VSM, which is described in detail in Turney and Littman (2005). The VSM was evaluated on a set of 374 SAT questions, achieving a score of 47%.

In contrast with the corpus-based approach of Turney and Littman (2005), Veale (2004) applied a lexicon-based approach to the same 374 SAT questions, attaining a score of 43%. Veale evaluated the quality of a candidate analogy A:B::C:D by looking for paths in WordNet, joining A to B and C to D. The quality measure was based on the similarity between the A:B paths and the C:D paths.

Turney (2005) introduced Latent Relational Analysis (LRA), an enhanced version of the VSM approach, which reached 56% on the 374 SAT questions. Here we go beyond Turney (2005) by describing LRA in more detail, performing more extensive experiments, and analyzing the algorithm and related work in more depth.

3.2 Structure Mapping Theory

French (2002) cites Structure Mapping Theory (SMT) (Gentner, 1983) and its implementation in the Structure Mapping Engine (SME) (Falkenhainer, Forbus, and Gentner, 1989) as the most influential work on modeling of analogy-making. The goal of computational modeling of analogy-making is to understand how people form complex, structured analogies. SME takes representations of a source domain and a target domain, and produces an analogical mapping between the source and target. The domains are given structured propositional representations, using predicate logic. These descriptions include attributes, relations, and higher-order relations (expressing relations between relations). The analogical mapping connects source domain relations to target domain relations.

For example, there is an analogy between the solar system and Rutherford's model of the atom (Falkenhainer, Forbus, and Gentner, 1989). The solar system is the source domain and Rutherford's model of the atom is the target domain. The basic objects in the source model are the planets and the sun. The basic objects in the target model are the electrons and the nucleus. The planets and the sun have various attributes, such as mass(sun) and mass(planet), and various relations, such as revolve(planet, sun) and attracts(sun, planet). Likewise, the nucleus and the electrons have attributes, such as charge(electron) and charge(nucleus), and relations, such as revolve(electron, nucleus) and attracts(nucleus, electron). SME maps revolve(planet, sun) to revolve(electron, nucleus) and attracts(sun, planet) to attracts(nucleus, electron).

Each individual connection (e.g., from revolve(planet, sun) to revolve(electron, nucleus)) in an analogical mapping implies that the connected relations are similar; thus, SMT requires a measure of relational similarity, in order to form maps. Early versions of SME only mapped identical relations, but later versions of SME allowed similar, non-identical relations to match (Falkenhainer, 1990). However, the focus of research in analogy-making has been on the mapping process as a whole, rather than measuring the similarity between any two particular relations, hence the similarity measures used in SME at the level of individual connections are somewhat rudimentary.

We believe that a more sophisticated measure of relational similarity, such as LRA, may enhance the performance of SME. Likewise, the focus of our work here is on the similarity between particular relations, and we ignore systematic mapping between sets of relations.

Examples of metaphorical sentences expressed as SAT-style verbal analogies:

Metaphorical sentence                SAT-style verbal analogy
I demolished his argument.           building:demolish::argument:refute
You need to budget your time.        money:budget::time:schedule
I've invested a lot of time in her.  money:invest::time:allocate
My mind just isn't operating today.  machine:operate::mind:think
Life has cheated me.                 charlatan:cheat::life:disappoint
Inflation is eating up our profits.  animal:eat::inflation:reduce

... to some extent. Barker and Szpakowicz (1998) tried a corpus-based approach that explicitly used a measure of relational similarity, but their measure was based on literal matching, which limited its ability to generalize. Moldovan et al. (2004) also used a measure of relational similarity, based on mapping each noun and modifier into semantic classes in WordNet. The noun-modifier pairs were taken from a corpus and the surrounding context in the corpus was used in a word sense disambiguation algorithm, to improve the mapping of the noun and modifier into WordNet. Turney and Littman (2005) used the VSM (as a component in a single nearest neighbour learning algorithm) to measure relational similarity. We take the same approach here, substituting LRA for the VSM, in Section 7.

Lauer (1995) used a corpus-based approach (using the BNC) to paraphrase noun-modifier pairs, by inserting the prepositions of, for, in, at, on, from, with, and about. For example, reptile haven was paraphrased as haven for reptiles. Lapata and Keller (2004) achieved improved results on this task, by using the database of AltaVista's search engine as a corpus.

3.5 Word Sense Disambiguation

We believe that the intended sense of a polysemous word is determined by its semantic relations with the other words in the surrounding text. If we can identify the semantic relations between the given word and its context, then we can disambiguate the given word. Yarowsky's (1993) observation that collocations are almost always monosemous is evidence for this view. Federici, Montemagni, and Pirrelli (1997) present an analogy-based approach to word sense disambiguation.

For example, consider the word plant. Out of context, plant could refer to an industrial plant or a living organism. Suppose plant appears in some text near food. A typical approach to disambiguating plant would compare the attributional similarity of food and industrial plant to the attributional similarity of food and living organism (Lesk, 1986; Banerjee and Pedersen, 2003). In this case, the decision may not be clear, since industrial plants often produce food and living organisms often serve as food. It would be very helpful to know the relation between food and plant in this example. In the phrase “food for the plant”, the relation between food and plant strongly suggests that the plant is a living organism, since industrial plants do not need food. In the text “food at the plant”, the relation strongly suggests that the plant is an industrial plant, since living organisms are not usually considered as locations. Thus an algorithm for classifying semantic relations (as in Section 7) should be helpful for word sense disambiguation.

3.6 Information Extraction

The problem of relation extraction is, given an input document and a specific relation R, extract all pairs of entities (if any) that have the relation R in the document. The problem was introduced as part of the Message Understanding Conferences (MUC) in 1998. Zelenko, Aone, and Richardella (2003) present a kernel method for extracting the relations person-affiliation and organization-location. For example, in the sentence “John Smith is the chief scientist of the Hardcom Corporation,” there is a person-affiliation relation between “John Smith” and “Hardcom Corporation” (Zelenko, Aone, and Richardella, 2003). This is similar to the problem of classifying semantic relations (Section 3.4), except that information extraction focuses on the relation between a specific pair of entities in a specific document, rather than a general pair of words in general text. Therefore an algorithm for classifying semantic relations should be useful for information extraction.

In the VSM approach to classifying semantic relations (Turney and Littman, 2005), we would have a training set of labeled examples of the relation person-affiliation, for instance. Each example would be represented by a vector of pattern frequencies. Given a specific document discussing “John Smith” and “Hardcom Corporation”, we could construct a vector representing the relation between these two entities, and then measure the relational similarity between this unlabeled vector and each of our labeled training vectors. It would seem that there is a problem here, because the training vectors would be relatively dense, since they would presumably be derived from a large corpus, but the new unlabeled vector for “John Smith” and “Hardcom Corporation” would be very sparse, since these entities might be mentioned only once in the given document. However, this is not a new problem for the Vector Space Model; it is the standard situation when the VSM is used for information retrieval. A query to a search engine is represented by a very sparse vector whereas a document is represented by a relatively dense vector. There are well-known techniques in information retrieval for coping with this disparity, such as weighting schemes for query vectors that are different from the weighting schemes for document vectors (Salton and Buckley, 1988).

3.7 Question Answering

In their paper on classifying semantic relations, Moldovan et al. (2004) suggest that an important application of their work is Question Answering. As defined in the Text REtrieval Conference (TREC) Question Answering (QA) track, the task is to answer simple questions, such as “Where have nuclear incidents occurred?”, by retrieving a relevant document from a large corpus and then extracting a short string from the document, such as “The Three Mile Island nuclear incident caused a DOE policy crisis.” Moldovan et al. (2004) propose to map a given question to a semantic relation and then search for that relation in a corpus of semantically tagged text. They argue that the desired semantic relation can easily be inferred from the surface form of the question. A question of the form “Where ...?” is likely to be seeking for entities with a location relation and a question of the form “What did ... make?” is likely to be looking for entities with a product relation. In Section 7, we show how LRA can recognize relations such as location and product (see Table 19).

3.8 Automatic Thesaurus Generation

Hearst (1992) presents an algorithm for learning hyponym (type of) relations from a corpus and Berland and Charniak (1999) describe how to learn meronym (part of) relations from a corpus. These algorithms could be used to automatically generate a thesaurus or dictionary, but we would like to handle more relations than hyponymy and meronymy. WordNet distinguishes more than a dozen semantic relations between words (Fellbaum, 1998) and Nastase and Szpakowicz (2003) list 30 semantic relations for noun-modifier pairs. Hearst (1992) and Berland and Charniak (1999) use manually generated rules to mine text for semantic relations. Turney and Littman (2005) also use a manually generated set of 64 patterns.

LRA does not use a predefined set of patterns; it learns patterns from a large corpus. Instead of manually generating new rules or patterns for each new semantic relation, it is possible to automatically learn a measure of relational similarity that can handle arbitrary semantic relations. A nearest neighbour algorithm can then use this relational similarity measure to learn to classify according to any set of classes of relations, given the appropriate labeled training data.

Girju, Badulescu, and Moldovan (2003) present an algorithm for learning meronym relations from a corpus. Like Hearst (1992) and Berland and Charniak (1999), they use manually generated rules to mine text for their desired relation. However, they supplement their manual rules with automatically learned constraints, to increase the precision of the rules.


3.9 Information Retrieval

Veale (2003) has developed an algorithm for recognizing certain types of word analogies, based on information in WordNet. He proposes to use the algorithm for analogical information retrieval. For example, the query “Muslim church” should return “mosque” and the query “Hindu bible” should return “the Vedas”. The algorithm was designed with a focus on analogies of the form adjective:noun::adjective:noun, such as Christian:church::Muslim:mosque.

A measure of relational similarity is applicable to this task. Given a pair of words, A and B, the task is to return another pair of words, X and Y, such that there is high relational similarity between the pair A:X and the pair Y:B. For example, given A = “Muslim” and B = “church”, return X = “mosque” and Y = “Christian”. (The pair Muslim:mosque has a high relational similarity to the pair Christian:church.)

Marx et al. (2002) developed an unsupervised algorithm for discovering analogies by clustering words from two different corpora. Each cluster of words in one corpus is coupled one-to-one with a cluster in the other corpus. For example, one experiment used a corpus of Buddhist documents and a corpus of Christian documents. A cluster of words such as {Hindu, Mahayana, Zen, ...} from the Buddhist corpus was coupled with a cluster of words such as {Catholic, Protestant, ...} from the Christian corpus. Thus the algorithm appears to have discovered an analogical mapping between Buddhist schools and traditions and Christian schools and traditions. This is interesting work, but it is not directly applicable to SAT analogies, because it discovers analogies between clusters of words, rather than individual words.

3.10 Identifying Semantic Roles

A semantic frame for an event such as judgement contains semantic roles such as judge, evaluee, and reason, whereas an event such as statement contains roles such as speaker, addressee, and message (Gildea and Jurafsky, 2002). The task of identifying semantic roles is to label the parts of a sentence according to their semantic roles. We believe that it may be helpful to view semantic frames and their semantic roles as sets of semantic relations; thus a measure of relational similarity should help us to identify semantic roles. Moldovan et al. (2004) argue that semantic roles are merely a special case of semantic relations (Section 3.4), since semantic roles always involve verbs or predicates, but semantic relations can involve words of any part of speech.

4. The Vector Space Model

This section examines past work on measuring attributional and relational similarity using the Vector Space Model (VSM).

4.1 Measuring Attributional Similarity with the Vector Space Model

The VSM was first developed for information retrieval (Salton and McGill, 1983; Salton and Buckley, 1988; Salton, 1989) and it is at the core of most modern search engines (Baeza-Yates and Ribeiro-Neto, 1999). In the VSM approach to information retrieval, queries and documents are represented by vectors. Elements in these vectors are based on the frequencies of words in the corresponding queries and documents. The frequencies are usually transformed by various formulas and weights, tailored to improve the effectiveness of the search engine (Salton, 1989). The attributional similarity between a query and a document is measured by the cosine of the angle between their corresponding vectors. For a given query, the search engine sorts the matching documents in order of decreasing cosine.

The VSM approach has also been used to measure the attributional similarity of words (Lesk, 1969; Ruge, 1992; Pantel and Lin, 2002). Pantel and Lin (2002) clustered words according to their attributional similarity, as measured by a VSM. Their algorithm is able to discover the different senses of polysemous words, using unsupervised learning.

Latent Semantic Analysis enhances the VSM approach to information retrieval by using the Singular Value Decomposition (SVD) to smooth the vectors, which helps to handle noise and sparseness in the data (Deerwester et al., 1990; Dumais, 1993; Landauer and Dumais, 1997). SVD improves both document-query attributional similarity measures (Deerwester et al., 1990; Dumais, 1993) and word-word attributional similarity measures (Landauer and Dumais, 1997). LRA also uses SVD to smooth vectors, as we discuss in Section 5.

4.2 Measuring Relational Similarity with the Vector Space Model

Let R_1 be the semantic relation (or set of relations) between a pair of words, A and B, and let R_2 be the semantic relation (or set of relations) between another pair, C and D. We wish to measure the relational similarity between R_1 and R_2. The relations R_1 and R_2 are not given to us; our task is to infer these hidden (latent) relations and then compare them.

In the VSM approach to relational similarity (Turney and Littman, 2005), we create vectors, r_1 and r_2, that represent features of R_1 and R_2, and then measure the similarity of R_1 and R_2 by the cosine of the angle θ between r_1 and r_2:

r_1 = ⟨r_{1,1}, ..., r_{1,n}⟩    (5)

r_2 = ⟨r_{2,1}, ..., r_{2,n}⟩    (6)

cosine(θ) = (Σ_{i=1..n} r_{1,i} · r_{2,i}) / (√(Σ_{i=1..n} (r_{1,i})²) · √(Σ_{i=1..n} (r_{2,i})²)) = (r_1 · r_2) / (√(r_1 · r_1) · √(r_2 · r_2)) = (r_1 · r_2) / (‖r_1‖ · ‖r_2‖)    (7)

We create a vector, r, to characterize the relationship between two words, X and Y, by counting the frequencies of various short phrases containing X and Y. Turney and Littman (2005) use a list of 64 joining terms, such as “of”, “for”, and “to”, to form 128 phrases that contain X and Y, such as “X of Y”, “Y of X”, “X for Y”, “Y for X”, “X to Y”, and “Y to X”. These phrases are then used as queries for a search engine and the number of hits (matching documents) is recorded for each query. This process yields a vector of 128 numbers. If the number of hits for a query is x, then the corresponding element in the vector r is log(x + 1). Several authors report that the logarithmic transformation of frequencies improves cosine-based similarity measures (Salton and Buckley, 1988; Ruge, 1992; Lin, 1998b).
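The vector construction just described can be sketched as follows. The `count_hits` function is a hypothetical stand-in for the search engine's hit counts (the real system queried AltaVista), and only three of the 64 joining terms are shown.

```python
import math

# Illustrative subset of the 64 joining terms of Turney and Littman (2005).
JOINING_TERMS = ["of", "for", "to"]

def relation_vector(x, y, count_hits):
    """Characterize the relation X:Y by log-transformed pattern frequencies.

    `count_hits(phrase)` is a hypothetical function returning the number of
    documents matching the quoted phrase.
    """
    vec = []
    for term in JOINING_TERMS:
        for phrase in (f"{x} {term} {y}", f"{y} {term} {x}"):
            vec.append(math.log(count_hits(phrase) + 1))  # log(x + 1)
    return vec

def cosine(r1, r2):
    """Equation (7): cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(r1, r2))
    norm = math.sqrt(sum(a * a for a in r1)) * math.sqrt(sum(b * b for b in r2))
    return dot / norm if norm else 0.0
```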

Turney and Littman (2005) evaluated the VSM approach by its performance on 374 SAT analogy questions, achieving a score of 47%. Since there are five choices for each question, the expected score for random guessing is 20%. To answer a multiple-choice analogy question, vectors are created for the stem pair and each choice pair, and then cosines are calculated for the angles between the stem pair and each choice pair. The best guess is the choice pair with the highest cosine. We use the same set of analogy questions to evaluate LRA in Section 6.

The VSM was also evaluated by its performance as a distance (nearness) measure in a supervised nearest neighbour classifier for noun-modifier semantic relations (Turney and Littman, 2005). The evaluation used 600 hand-labeled noun-modifier pairs from Nastase and Szpakowicz (2003). A testing pair is classified by searching for its single nearest neighbour in the labeled training data. The best guess is the label for the training pair with the highest cosine. LRA is evaluated with the same set of noun-modifier pairs in Section 7.

Turney and Littman (2005) used the AltaVista search engine to obtain the frequency information required to build vectors for the VSM. Thus their corpus was the set of all web pages indexed by AltaVista. At the time, the English subset of this corpus consisted of about 5 × 10^11 words. Around April 2004, AltaVista made substantial changes to their search engine, removing their advanced search operators. Their search engine no longer supports the asterisk operator, which was used by Turney and Littman (2005) for stemming and wild-card searching. AltaVista also changed their policy towards automated searching, which is now forbidden.

Turney and Littman (2005) used AltaVista's hit count, which is the number of documents (web pages) matching a given query, but LRA uses the number of passages (strings) matching a query. In our experiments with LRA (Sections 6 and 7), we use a local copy of the Waterloo MultiText System (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003), running on a 16 CPU Beowulf Cluster, with a corpus of about 5 × 10^10 English words. The Waterloo MultiText System (WMTS) is a distributed (multiprocessor) search engine, designed primarily for passage retrieval (although document retrieval is possible, as a special case of passage retrieval). The text and index require approximately one terabyte of disk space. Although AltaVista only gives a rough estimate of the number of matching documents, the Waterloo MultiText System gives exact counts of the number of matching passages.

Turney et al. (2003) combine 13 independent modules to answer SAT questions. The performance of LRA significantly surpasses this combined system, but there is no real contest between these approaches, because we can simply add LRA to the combination, as a fourteenth module. Since the VSM module had the best performance of the thirteen modules (Turney et al., 2003), the following experiments focus on comparing VSM and LRA.

5. Latent Relational Analysis

LRA takes as input a set of word pairs and produces as output a measure of the relational similarity between any two of the input pairs. LRA relies on three resources: a search engine with a very large corpus of text, a broad-coverage thesaurus of synonyms, and an efficient implementation of SVD.

We first present a short description of the core algorithm. Later, in the following subsections, we will give a detailed description of the algorithm, as it is applied in the experiments in Sections 6 and 7.

• Given a set of word pairs as input, look in a thesaurus for synonyms for each word in each word pair. For each input pair, make alternate pairs by replacing the original words with their synonyms. The alternate pairs are intended to form near analogies with the corresponding original pairs (see Section 2.3).

• Filter out alternate pairs that do not form near analogies, by dropping alternate pairs that co-occur rarely in the corpus. In the preceding step, if a synonym replaced an ambiguous original word, but the synonym captures the wrong sense of the original word, it is likely that there is no significant relation between the words in the alternate pair, so they will rarely co-occur.

• For each original and alternate pair, search in the corpus for short phrases that begin with one member of the pair and end with the other. These phrases characterize the relation between the words in each pair.

• For each phrase from the previous step, create several patterns, by replacing words in the phrase with wildcards.

• Build a pair-pattern frequency matrix, in which each cell represents the number of times that the corresponding pair (row) appears in the corpus with the corresponding pattern (column). The number will usually be zero, resulting in a sparse matrix.

• Apply the Singular Value Decomposition to the matrix. This reduces noise in the matrix and helps with sparse data.

• Suppose that we wish to calculate the relational similarity between any two of the original pairs. Start by looking for the two row vectors in the pair-pattern frequency matrix that correspond to the two original pairs. Calculate the cosine of the angle between these two row vectors. Then merge the cosine of the two original pairs with the cosines of their corresponding alternate pairs, as follows. If an analogy formed with alternate pairs has a higher cosine than the original pairs, we assume that we have found a better way to express the analogy, but we have not significantly changed its meaning. If the cosine is lower, we assume that we may have changed the meaning, by inappropriately replacing words with synonyms. Filter out inappropriate alternates by dropping all analogies formed of alternates, such that the cosines are less than the cosine for the original pairs. The relational similarity between the two original pairs is then calculated as the average of all of the remaining cosines.
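As a rough, self-contained illustration of the core of this pipeline, the sketch below builds a tiny pair-pattern frequency matrix, smooths it with a truncated SVD, and compares pairs by the cosine of their row vectors. The pairs, patterns, and counts are invented; the alternate-pair machinery and the corpus search (detailed in Section 5.5) are omitted.

```python
import numpy as np

# Toy pair-pattern frequency matrix: rows are word pairs, columns are
# patterns; the counts are invented for illustration.
pairs = ["quart:volume", "mile:distance", "carpenter:wood"]
patterns = ["X unit of Y", "X of Y", "X works with Y"]
X = np.array([[8.0, 20.0, 0.0],
              [6.0, 15.0, 1.0],
              [0.0, 2.0, 9.0]])

# Smooth with a rank-k truncated SVD (k=2 here; LRA uses k=300).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
projected = U[:, :k] * s[:k]   # rows of U_k Sigma_k, one per word pair

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# quart:volume should be closer to mile:distance than to carpenter:wood.
print(cos(projected[0], projected[1]))  # high cosine
print(cos(projected[0], projected[2]))  # low cosine
```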

The motivation for the alternate pairs is to handle cases where the original pairs co-occur rarely in the corpus. The hope is that we can find near analogies for the original pairs, such that the near analogies co-occur more frequently in the corpus. The danger is that the alternates may have different relations from the originals. The filtering steps above aim to reduce this risk.

5.1 Input and Output

In our experiments, the input set contains from 600 to 2,244 word pairs. The output similarity measure is based on cosines, so the degree of similarity can range from −1 (dissimilar; θ = 180°) to +1 (similar; θ = 0°). Before applying SVD, the vectors are completely nonnegative, which implies that the cosine can only range from 0 to +1, but SVD introduces negative values, so it is possible for the cosine to be negative, although we have never observed this in our experiments.

5.2 Search Engine and Corpus

In the following experiments, we use a local copy of the Waterloo MultiText System (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003). The corpus consists of about 5 × 10^10 English words, gathered by a web crawler, mainly from US academic web sites. The web pages cover a very wide range of topics, styles, genres, quality, and writing skill. The WMTS is well suited to LRA, because the WMTS scales well to large corpora (one terabyte, in our case), it gives exact frequency counts (unlike most web search engines), it is designed for passage retrieval (rather than document retrieval), and it has a powerful query syntax.

5.3 Thesaurus

As a source of synonyms, we use Lin's (1998a) automatically generated thesaurus. This thesaurus is available through an online interactive demonstration or it can be downloaded. We used the online demonstration, since the downloadable version seems to contain fewer words. For each word in the input set of word pairs, we automatically query the online demonstration and fetch the resulting list of synonyms. As a courtesy to other users of Lin's online system, we insert a 20 second delay between each query.

Lin's thesaurus was generated by parsing a corpus of about 5 × 10^7 English words, consisting of text from the Wall Street Journal, San Jose Mercury, and AP Newswire (Lin, 1998a). The parser was used to extract pairs of words and their grammatical relations. Words were then clustered into synonym sets, based on the similarity of their grammatical relations. Two words were judged to be highly similar when they tended to have the same kinds of grammatical relations with the same sets of words. Given a word and its part of speech, Lin's thesaurus provides a list of words, sorted in order of decreasing attributional similarity. This sorting is convenient for LRA, since it makes it possible to focus on words with higher attributional similarity and ignore the rest. WordNet, in contrast, given a word and its part of speech, provides a list of words grouped by the possible senses of the given word, with groups sorted by the frequencies of the senses. WordNet's sorting does not directly correspond to sorting by degree of attributional similarity, although various algorithms have been proposed for deriving attributional similarity from WordNet (Resnik, 1995; Jiang and Conrath, 1997; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003).

5.4 Singular Value Decomposition

We use Rohde's SVDLIBC implementation of the Singular Value Decomposition, which is based on SVDPACKC (Berry, 1992). In LRA, SVD is used to reduce noise and compensate for sparseness.

5.5 The Algorithm

We will go through each step of LRA, using an example to illustrate the steps. Assume that the input to LRA is the 374 multiple-choice SAT word analogy questions of Turney and Littman (2005). Since there are six word pairs per question (the stem and five choices), the input consists of 2,244 word pairs. Let's suppose that we wish to calculate the relational similarity between the pair quart:volume and the pair mile:distance, taken from the SAT question in Table 6. The LRA algorithm consists of the following twelve steps:

1. Find alternates: For each word pair A:B in the input set, look in Lin's (1998a) thesaurus for the top num_sim words (num_sim is 10) that are most similar to A. For each A′ that is similar to A, make a new word pair A′:B. Likewise, look for the top num_sim words that are most similar to B, and for each B′, make a new word pair A:B′. A:B is the original pair and each A′:B or A:B′ is an alternate pair. When looking for similar words in Lin's (1998a) thesaurus, avoid words that seem unusual (e.g., hyphenated words, words with three characters or less, words with non-alphabetical characters).

Table 6
An example of an SAT question (only the stem and the solution are preserved here).
Stem: quart:volume
Solution: (b) mile:distance

2. Filter alternates: For each original pair A:B, filter its 2 × num_sim alternates as follows. For each alternate pair, send a query to the WMTS, to find the frequency of phrases that begin with one member of the pair and end with the other. The phrases cannot have more than max_phrase words (we use max_phrase = 5). Sort the alternate pairs by the frequency of their phrases. Select the top num_filter most frequent alternate pairs and discard the rest (we use num_filter = 3, so 17 alternates are dropped). This step tends to eliminate alternates that have no clear semantic relation. The third column in Table 7 shows the frequency with which each pair co-occurs in a window of max_phrase words.

3. Find phrases: For each pair (the original pairs and the alternates that passed the filter), search in the corpus for phrases that begin with one member of the pair and end with the other. A phrase can contain no more than max_phrase words and there must be at least one word between the two members of the word pair. These phrases give us information about the semantic relations between the words in each pair. A phrase with no words between the two members of the word pair would give us very little information about the semantic relations (other than that the words occur together with a certain frequency in a certain order). Table 8 gives some examples of phrases in the corpus that match the pair quart:volume.

4. Find patterns: For each phrase found in the previous step, build patterns from the intervening words. A pattern is constructed by replacing any or all or none of the intervening words with wildcards (one wildcard can only replace one word). If a phrase is n words long, there are n − 2 intervening words between the members of the given word pair (e.g., between quart and volume). Thus a phrase with n words generates 2^(n−2) patterns. Select the num_patterns most frequent patterns and discard the rest (we use num_patterns = 4,000).
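The pattern-building rule in step 4 can be sketched as follows: each subset of the intervening words is replaced by wildcards, giving 2^(n−2) patterns for an n-word phrase. This is an illustration of the rule as stated, not the system's actual code.

```python
from itertools import product

def patterns(phrase):
    """Generate the 2**(n-2) wildcard patterns for an n-word phrase.

    The first and last words (the word pair) are kept; each intervening
    word is either kept or replaced by a wildcard "*".
    """
    words = phrase.split()
    first, middle, last = words[0], words[1:-1], words[-1]
    result = []
    for mask in product([False, True], repeat=len(middle)):
        inner = ["*" if wild else w for w, wild in zip(middle, mask)]
        result.append(" ".join([first] + inner + [last]))
    return result

print(patterns("quart of the volume"))
# ['quart of the volume', 'quart of * volume',
#  'quart * the volume', 'quart * * volume']
```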

Table 7
Alternate pairs for quart:volume, with the similarity of the replacement word (from Lin's thesaurus) and the co-occurrence frequency of the pair. (At most max_phrase words can appear in a phrase.)

Word pair          Similarity  Frequency  Filtering step
pint:volume        0.210       372
gallon:volume      0.159       1500       accept (top alternate)
liter:volume       0.122       3323       accept (top alternate)
squirt:volume      0.084       54
pail:volume        0.084       28
vial:volume        0.084       373
pumping:volume     0.073       1386       accept (top alternate)
ounce:volume       0.071       430
spoonful:volume    0.070       42
tablespoon:volume  0.069       96

Table 9
Frequencies of various patterns for quart:volume (fragment): the patterns shown include P = “in”, P = “* of”, P = “of *”, and P = “* *”; the frequency values are not preserved here.

5. Map pairs to rows: In preparation for building the matrix X, create a mapping of word pairs to row numbers. For each pair A:B, create a row for A:B and another row for B:A. This will make the matrix more symmetrical, reflecting our knowledge that the relational similarity between A:B and C:D should be the same as the relational similarity between B:A and D:C. This duplication of rows is examined in Section 6.6.

6. Map patterns to columns: Create a mapping of the top num_patterns patterns to column numbers. For each pattern P, create a column for “word1 P word2” and another column for “word2 P word1”. Thus there are 2 × num_patterns columns in X. This duplication of columns is examined in Section 6.6.

7. Generate a sparse matrix: Generate a matrix X in sparse matrix format, suitable for input to SVDLIBC. The value for the cell in row i and column j is the frequency of the j-th pattern (see step 6) in phrases that contain the i-th word pair (see step 5). Table 9 gives some examples of pattern frequencies for quart:volume.

8. Calculate entropy: Apply log and entropy transformations to the sparse matrix (Landauer and Dumais, 1997). These transformations have been found to be very helpful for information retrieval (Harman, 1986; Dumais, 1990). Let x_{i,j} be the cell in row i and column j of the matrix X from step 7. Let m be the number of rows in X and let n be the number of columns. We wish to weight the cell x_{i,j} by the entropy of the j-th column. To calculate the entropy of the column, we need to convert the column into a vector of probabilities. Let p_{i,j} be the probability of x_{i,j}, calculated by normalizing the column vector so that the sum of the elements is one, p_{i,j} = x_{i,j} / Σ_{k=1..m} x_{k,j}. The entropy of the j-th column is then H_j = −Σ_{k=1..m} p_{k,j} log(p_{k,j}). Entropy is at its maximum when p_{i,j} is a uniform distribution, p_{i,j} = 1/m, in which case H_j = log(m). Entropy is at its minimum when p_{i,j} is 1 for some value of i and 0 for all other values of i, in which case H_j = 0. We want to give more weight to columns (patterns) with frequencies that vary substantially from one row (word pair) to the next, and less weight to columns that are uniform. Therefore we weight the cell x_{i,j} by w_j = 1 − H_j / log(m), which varies from 0 when p_{i,j} is uniform to 1 when entropy is minimal. We also apply the log transformation to frequencies, log(x_{i,j} + 1). (Entropy is calculated with the original frequency values, before the log transformation is applied.) For all i and all j, replace the original value x_{i,j} in X by the new value w_j log(x_{i,j} + 1). This is an instance of the TF-IDF (Term Frequency-Inverse Document Frequency) family of transformations, which is familiar in information retrieval (Salton and Buckley, 1988; Baeza-Yates and Ribeiro-Neto, 1999): log(x_{i,j} + 1) is the TF term and w_j is the IDF term.

9. Apply SVD: After the log and entropy transformations have been applied to the matrix X, run SVDLIBC. SVD decomposes a matrix X into a product of three matrices UΣV^T, where U and V are in column orthonormal form (i.e., the columns are orthogonal and have unit length: U^T U = V^T V = I) and Σ is a diagonal matrix of singular values (hence SVD) (Golub and Van Loan, 1996). If X is of rank r, then Σ is also of rank r. Let Σ_k, where k < r, be the diagonal matrix formed from the top k singular values, and let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V. The matrix U_k Σ_k V_k^T is the matrix of rank k that best approximates the original matrix X, in the sense that it minimizes the approximation errors. That is, X̂ = U_k Σ_k V_k^T minimizes ‖X̂ − X‖_F over all matrices X̂ of rank k, where ‖·‖_F denotes the Frobenius norm (Golub and Van Loan, 1996). We may think of this matrix U_k Σ_k V_k^T as a “smoothed” or “compressed” version of the original matrix. In the subsequent steps, we will be calculating cosines for row vectors. For this purpose, we can simplify calculations by dropping V. The cosine of two vectors is their dot product, after they have been normalized to unit length. The matrix XX^T contains the dot products of all of the row vectors. We can find the dot product of the i-th and j-th row vectors by looking at the cell in row i, column j of the matrix XX^T. Since V^T V = I, we have XX^T = UΣV^T(UΣV^T)^T = UΣV^T VΣ^T U^T = UΣ(UΣ)^T, which means that we can calculate cosines with the smaller matrix UΣ, instead of using X = UΣV^T (Deerwester et al., 1990).

10. Projection: Calculate U_k Σ_k (we use k = 300). This matrix has the same number of rows as X, but only k columns (instead of 2 × num_patterns columns).
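The claim in step 9 that cosines can be computed from UΣ alone, dropping V, is easy to verify numerically; a small check on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 8))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
US = U * s  # U Sigma: same row dot products as X, since V^T V = I

# X X^T equals (U Sigma)(U Sigma)^T, so row cosines are unchanged.
# With k < r, the same identity relates U_k Sigma_k to the rank-k
# reconstruction U_k Sigma_k V_k^T, because V_k^T V_k = I as well.
print(np.allclose(X @ X.T, US @ US.T))  # True
```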

11. Evaluate alternates: Let A:B and C:D be any two word pairs in the input set. From step 2, we have (num_filter + 1) versions of A:B, the original pair and num_filter alternates. Likewise, we have (num_filter + 1) versions of C:D. Therefore there are (num_filter + 1)^2 ways to pair a version of A:B with a version of C:D. Look for the corresponding row vectors in U_k Σ_k and calculate the (num_filter + 1)^2 cosines (in our experiments, there are 16 cosines). For example, suppose A:B is quart:volume and C:D is mile:distance. Table 10 gives the cosines for the sixteen combinations.

12. Calculate relational similarity: The relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine of the original pairs, A:B and C:D.

Table 10
Cosines for the sixteen combinations of versions of quart:volume and mile:distance (fragment): the columns were the word pairs, the cosine, and whether the cosine is ≥ the cosine of the original pairs.

The requirement that an alternate's cosine must be greater than or equal to the original pairs' cosine guards against bad alternates. For example, pumping:volume has slipped through the filtering in step 2, although it is not a good alternate for quart:volume. However, Table 10 shows that all four analogies that involve pumping:volume are dropped here, in step 12.

Steps 11 and 12 can be repeated for each two input pairs that are to be compared. This completes the description of LRA.
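Steps 11 and 12 can be condensed into a few lines. In this sketch, `versions_a` and `versions_b` are lists of row vectors from U_k Σ_k, with the original pair's vector first and its surviving alternates after it.

```python
import numpy as np

def relational_similarity(versions_a, versions_b):
    """Average the cosines that are >= the cosine of the original pairs.

    `versions_a[0]` and `versions_b[0]` are the row vectors (from U_k Sigma_k)
    of the original pairs; the rest are their alternates from step 2.
    """
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    baseline = cos(versions_a[0], versions_b[0])
    cosines = [cos(u, v) for u in versions_a for v in versions_b]
    kept = [c for c in cosines if c >= baseline]  # drop suspect alternates
    return sum(kept) / len(kept)  # never empty: baseline passes its own test
```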

Table 11 gives the cosines for the sample SAT question. The choice pair with the highest average cosine (the choice with the largest value in column #1), choice (b), is the solution for this question; LRA answers the question correctly. For comparison, column #2 gives the cosines for the original pairs and column #3 gives the highest cosine. For this particular SAT question, there is one choice that has the highest cosine for all three columns, choice (b), although this is not true in general. Note that the gap between the first choice (b) and the second choice (d) is largest for the average cosines (column #1). This suggests that the average of the cosines (column #1) is better at discriminating the correct choice than either the original cosine (column #2) or the highest cosine (column #3).

6. Experiments with Word Analogy Questions

This section presents various experiments with 374 multiple-choice SAT word analogy questions.

6.1 Baseline LRA System

Table 12 shows the performance of the baseline LRA system on the 374 SAT questions, using the parameter settings and configuration described in Section 5. LRA correctly answered 210 of the 374 questions. 160 questions were answered incorrectly and 4 questions were skipped, because the stem pair and its alternates were represented by zero vectors.

Table 11
Average, original, and highest cosines for the sample SAT question (fragment; only the stem and solution rows are preserved).

                             #1 Average  #2 Original  #3 Highest
Stem: quart:volume
Solution: (b) mile:distance  0.677       0.525        0.781

Algorithm                        Precision  Recall  F
Veale (2004)                     42.8       42.8    42.8
best attributional similarity    35.0       35.0    35.0
random guessing                  20.0       20.0    20.0
lowest co-occurrence frequency   16.8       16.8    16.8
highest co-occurrence frequency  11.8       11.8    11.8

Running times for the steps of LRA (fragment): the columns were Step, Description, Time H:M:S, and Hardware; only the total, 209:49:36, is preserved here.

Table 14
LRA versus VSM with 374 SAT analogy questions.

System    Correct  Incorrect  Skipped  Precision  Recall  F
VSM-AV    176      193        5        47.7       47.1    47.4
VSM-WMTS  144      196        34       42.4       38.5    40.3
LRA       210      160        4        56.8       56.1    56.5

Table 15 (header fragment)
95% confidence intervals for recall (% correct), with human-level performance (57%) for comparison; the columns were System, Recall (% correct), and 95% confidence interval for recall. The individual rows are not preserved here.

The differences in precision between LRA and the two VSM variations are also significant, but the difference in precision between the two VSM variations (42.4% versus 47.7%) is not significant. Although VSM-AV has a corpus ten times larger than LRA's, LRA still performs better than VSM-AV.

Comparing VSM-AV to VSM-WMTS, the smaller corpus has reduced the score of the VSM, but much of the drop is due to the larger number of questions that were skipped (34 for VSM-WMTS versus 5 for VSM-AV). With the smaller corpus, many more of the input word pairs simply do not appear together in short phrases in the corpus. LRA is able to answer as many questions as VSM-AV, although it uses the same corpus as VSM-WMTS, because Lin's thesaurus allows LRA to substitute synonyms for words that are not in the corpus.

VSM-AV required 17 days to process the 374 analogy questions (Turney and Littman, 2005), compared to 9 days for LRA. As a courtesy to AltaVista, Turney and Littman (2005) inserted a five second delay between each query. Since the WMTS is running locally, there is no need for delays. VSM-WMTS processed the questions in only one day.

6.3 Human Performance

The average performance of college-bound senior high school students on verbal SAT questions corresponds to a recall (percent correct) of about 57% (Turney and Littman, 2005). The SAT I test consists of 78 verbal questions and 60 math questions (there is also an SAT II test, covering specific subjects, such as chemistry). Analogy questions are only a subset of the 78 verbal SAT questions. If we assume that the difficulty of our 374 analogy questions is comparable to the difficulty of the 78 verbal SAT I questions, then we can estimate that the average college-bound senior would correctly answer about 57% of the 374 analogy questions.

Of our 374 SAT questions, 190 are from a collection of ten official SAT tests (Claman, 2000). On this subset of the questions, LRA has a recall of 61.1%, compared to a recall of 51.1% on the other 184 questions. The 184 questions that are not from Claman (2000) seem to be more difficult. This indicates that we may be underestimating how well LRA performs, relative to college-bound senior high school students. Claman (2000) suggests that the analogy questions may be somewhat harder than other verbal SAT questions, so we may be slightly overestimating the mean human score on the analogy questions.

Table 15 gives the 95% confidence intervals for LRA, VSM-AV, and VSM-WMTS, calculated by the Binomial Exact Test (Agresti, 1990). There is no significant difference between LRA and human performance, but VSM-AV and VSM-WMTS are significantly below human-level performance.


Table 16
Variation in the performance of LRA as the parameter values are adjusted (fragment). The parameters varied were num_sim (step 1), max_phrase and num_filter (step 2), and num_patterns (step 4, with values from 1,000 to 7,000); across the eleven alternative settings tested, precision ranged from 54.1 to 58.4, recall from 53.5 to 57.8, and F from 53.8 to 58.1. The row-by-row alignment of parameter values and scores is not preserved here.

6.4 Varying the Parameters in LRA

There are several parameters in the LRA algorithm (see Section 5.5). The parameter values were determined by trying a small number of possible values on a small set of questions that were set aside. Since LRA is intended to be an unsupervised learning algorithm, we did not attempt to tune the parameter values to maximize the precision and recall on the 374 SAT questions. We hypothesized that LRA is relatively insensitive to the values of the parameters.

Table 16 shows the variation in the performance of LRA as the parameter values are adjusted. We take the baseline parameter settings (given in Section 5.5) and vary each parameter, one at a time, while holding the remaining parameters fixed at their baseline values. None of the precision and recall values are significantly different from the baseline, according to the Fisher Exact Test (Agresti, 1990), at the 95% confidence level. This supports the hypothesis that the algorithm is not sensitive to the parameter values.

Although a full run of LRA on the 374 SAT questions takes nine days, for some of the parameters it is possible to reuse cached data from previous runs. We limited the experiments with max_phrase and num_filter because caching was not as helpful for these parameters, so experimenting with them required several weeks.

Table 17 (header fragment)
Results of the ablation experiments: column #1 is the baseline LRA system, #2 is LRA without SVD, #3 is LRA without synonyms, #4 is LRA without SVD and without synonyms, and #5 is VSM-WMTS. The precision, recall, and F values are not preserved here; see the text.

6.5 Ablation Experiments

As mentioned in the introduction, LRA extends the VSM approach of Turney and Littman (2005) by (1) exploring variations on the analogies by replacing words with synonyms (step 1), (2) automatically generating connecting patterns (step 4), and (3) smoothing the data with SVD (step 9). In this subsection, we ablate each of these three components to assess their contribution to the performance of LRA. Table 17 shows the results.

Without SVD (compare column #1 to #2 in Table 17), performance drops, but the drop is not statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990). However, we hypothesize that the drop in performance would be significant with a larger set of word pairs. More word pairs would increase the sample size, which would decrease the 95% confidence interval, which would likely show that SVD is making a significant contribution. Furthermore, more word pairs would increase the matrix size, which would give SVD more leverage. For example, Landauer and Dumais (1997) apply SVD to a matrix of 30,473 columns by 60,768 rows, but our matrix here is 8,000 columns by 17,232 rows. We are currently gathering more SAT questions, to test this hypothesis.

Without synonyms (compare column #1 to #3 in Table 17), recall drops significantly (from 56.1% to 49.5%), but the drop in precision is not significant. When the synonym component is dropped, the number of skipped questions rises from 4 to 22, which demonstrates the value of the synonym component of LRA for compensating for sparse data.

When both SVD and synonyms are dropped (compare column #1 to #4 in Table 17), the decrease in recall is significant, but the decrease in precision is not significant. Again, we believe that a larger sample size would show the drop in precision is significant.

If we eliminate both synonyms and SVD from LRA, all that distinguishes LRA from VSM-WMTS is the patterns (step 4). The VSM approach uses a fixed list of 64 patterns to generate 128 dimensional vectors (Turney and Littman, 2005), whereas LRA uses a dynamically generated set of 4,000 patterns, resulting in 8,000 dimensional vectors. We can see the value of the automatically generated patterns by comparing LRA without synonyms and SVD (column #4) to VSM-WMTS (column #5). The difference in both precision and recall is statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).

The ablation experiments support the value of the patterns (step 4) and synonyms (step 1) in LRA, but the contribution of SVD (step 9) has not been proven, although we believe more data will support its effectiveness. Nonetheless, the three components together result in a 16% increase in F (compare #1 to #5).

6.6 Matrix Symmetry

We know a priori that, if A:B::C:D, then B:A::D:C. For example, “mason is to stone as carpenter is to wood” implies “stone is to mason as wood is to carpenter”. Therefore a good measure of relational similarity, sim_r, should obey the following equation:

sim_r(A:B, C:D) = sim_r(B:A, D:C)    (8)

In steps 5 and 6 of the LRA algorithm (Section 5.5), we ensure that the matrix X is symmetrical, so that equation (8) is necessarily true for LRA. The matrix is designed so that the row vector for A:B is different from the row vector for B:A only by a permutation of the elements. The same permutation distinguishes the row vectors for C:D and D:C. Therefore the cosine of the angle between A:B and C:D must be identical to the cosine of the angle between B:A and D:C (see equation (7)).
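This permutation argument is easy to check numerically; in the sketch below, a random permutation stands in for the row-element reordering induced by swapping word order:

```python
import numpy as np

rng = np.random.default_rng(1)
ab, cd = rng.random(8), rng.random(8)  # row vectors for A:B and C:D
perm = rng.permutation(8)              # the same element permutation
ba, dc = ab[perm], cd[perm]            # row vectors for B:A and D:C

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(np.isclose(cos(ab, cd), cos(ba, dc)))  # True: equation (8) holds
```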

To discover the consequences of this design decision, we altered steps 5 and 6 so that symmetry is no longer preserved. In step 5, for each word pair A:B that appears in the input set, we only have one row. There is no row for B:A unless B:A also appears in the input set. Thus the number of rows in the matrix dropped from 17,232 to 8,616.

In step 6, we no longer have two columns for each pattern P, one for “word1 P word2” and another for “word2 P word1”. However, to be fair, we kept the total number of columns at 8,000. In step 4, we selected the top 8,000 patterns (instead of the top 4,000), distinguishing the pattern “word1 P word2” from the pattern “word2 P word1” (instead of considering them equivalent). Thus a pattern P with a high frequency is likely to appear in two columns, in both possible orders, but a lower frequency pattern might appear in only one column, in only one possible order.

These changes resulted in a slight decrease in performance. Recall dropped from 56.1% to 55.3% and precision dropped from 56.8% to 55.9%. The decrease is not statistically significant. However, the modified algorithm no longer obeys equation (8). Although dropping symmetry appears to cause no significant harm to the performance of the algorithm on the SAT questions, we prefer to retain symmetry, to ensure that equation (8) is satisfied.

Note that, if A:B::C:D, it does not follow that B:A::C:D. For example, it is false that “stone is to mason as carpenter is to wood”. In general (except when the semantic relations between A and B are symmetrical), we have the following inequality:

sim_r(A:B, C:D) ≠ sim_r(B:A, C:D)    (9)

Therefore we do not want A:B and B:A to be represented by identical row vectors, although it would ensure that equation (8) is satisfied.

6.7 All Alternates versus Better Alternates

In step 12 of LRA, the relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine of the original pairs.

Table 18 (header fragment)
The columns were N, Correct, Incorrect, Skipped, Precision, Recall, and F; the rows for the various values of N are not preserved here.

6.8 Interpreting Vectors

Suppose a word pair A:B corresponds to a vector r in the matrix X. It would be convenient if inspection of r gave us a simple explanation or description of the relation between A and B. For example, suppose the word pair ostrich:bird maps to the row vector r. It would be pleasing to look in r and find that the largest element corresponds to the pattern “is the largest” (i.e., “ostrich is the largest bird”). Unfortunately, inspection of r reveals no such convenient patterns.

We hypothesize that the semantic content of a vector is distributed over the whole vector; it is not concentrated in a few elements. To test this hypothesis, we modified step 10 of LRA. Instead of projecting the 8,000 dimensional vectors into the 300 dimensional space U_k Σ_k, we use the matrix U_k Σ_k V_k^T. This matrix yields the same cosines as U_k Σ_k, but preserves the original 8,000 dimensions, making it easier to interpret the row vectors. For each row vector in U_k Σ_k V_k^T, we select the N largest values and set all other values to zero. The idea here is that we will only pay attention to the N most important patterns in r; the remaining patterns will be ignored. This reduces the length of the row vectors, but the cosine is the dot product of normalized vectors (all vectors are normalized to unit length; see equation (7)), so the change to the vector lengths has no impact; only the angle of the vectors is important. If most of the semantic content is in the N largest elements of r, then setting the remaining elements to zero should have relatively little impact.
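The modification to step 10 can be sketched as follows: reconstruct the smoothed matrix at its full width, then zero all but the N largest elements of each row before comparing vectors. The data here is random, for illustration only.

```python
import numpy as np

def keep_top_n(row, n):
    """Zero every element of `row` except the n largest."""
    out = np.zeros_like(row)
    idx = np.argsort(row)[-n:]  # indices of the n largest values
    out[idx] = row[idx]
    return out

rng = np.random.default_rng(2)
X = rng.random((4, 10))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
smoothed = (U[:, :k] * s[:k]) @ Vt[:k]  # U_k Sigma_k V_k^T, full width
truncated = np.array([keep_top_n(r, 2) for r in smoothed])

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# How much of the angle survives when only N=2 elements are kept?
print(cos(smoothed[0], truncated[0]))
```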

Table 18 shows the performance as N varies from 1 to 3,000. The precision and recall are significantly below the baseline LRA until N ≥ 300 (95% confidence, Fisher Exact Test). In other words, for a typical SAT analogy question, we need to examine the top 300 patterns to explain why LRA selected one choice instead of another.

We are currently working on an extension of LRA that will explain with a single pattern why one choice is better than another. We have had some promising results, but this work is not yet mature. However, we can confidently claim that interpreting the vectors is not trivial.

6.9 Manual Patterns versus Automatic Patterns

Turney and Littman (2005) used 64 manually generated patterns whereas LRA uses 4,000 automatically generated patterns. We know from Section 6.5 that the automatically generated patterns are significantly better than the manually generated patterns. It may be interesting to see how many of the manually generated patterns appear within the automatically generated patterns. If we require an exact match, 50 of the 64 manual patterns can be found in the automatic patterns. If we are lenient about wildcards, and count the pattern “not the” as matching “* not the” (for example), then 60 of the 64 manual patterns appear within the automatic patterns. This suggests that the improvement in performance with the automatic patterns is due to the increased quantity of patterns, rather than a qualitative difference in the patterns.

Turney and Littman (2005) point out that some of their 64 patterns have been used by other researchers. For example, Hearst (1992) used the pattern “such as” to discover hyponyms and Berland and Charniak (1999) used the pattern “of the” to discover meronyms. Both of these patterns are included in the 4,000 patterns automatically generated by LRA.

The novelty in Turney and Littman (2005) is that their patterns are not used to mine text for instances of word pairs that fit the patterns (Hearst, 1992; Berland and Charniak, 1999); instead, they are used to gather frequency data for building vectors that represent the relation between a given pair of words. The results in Section 6.8 show that a vector contains more information than any single pattern or small set of patterns; a vector is a distributed representation. LRA is distinct from Hearst (1992) and Berland and Charniak (1999) in its focus on distributed representations, which it shares with Turney and Littman (2005), but LRA goes beyond Turney and Littman (2005) by finding patterns automatically.

Riloff and Jones (1999) and Yangarber (2003) also find patterns automatically, but their goal is to mine text for instances of word pairs; the same goal as Hearst (1992) and Berland and Charniak (1999). Because LRA uses patterns to build distributed vector representations, it can exploit patterns that would be much too noisy and unreliable for the kind of text mining instance extraction that is the objective of Hearst (1992), Berland and Charniak (1999), Riloff and Jones (1999), and Yangarber (2003). Therefore LRA can simply select the highest frequency patterns (step 4 in Section 5.5); it does not need the more sophisticated selection algorithms of Riloff and Jones (1999) and Yangarber (2003).

7. Experiments with Noun-Modifier Relations

This section describes experiments with 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations (Nastase and Szpakowicz, 2003). In the following experiments, LRA is used with the baseline parameter values, exactly as described in Section 5.5. No adjustments were made to tune LRA to the noun-modifier pairs. LRA is used as a distance (nearness) measure in a single nearest neighbour supervised learning algorithm.

7.1 Classes of Relations

The following experiments use the 600 labeled noun-modifier pairs of Nastase and Szpakowicz (2003). This dataset includes information about the part of speech and WordNet synset (synonym set; i.e., word sense tag) of each word, but our algorithm does not use this information.

Table 19 lists the 30 classes of semantic relations. The table is based on Appendix A of Nastase and Szpakowicz (2003), with some simplifications. The original table listed several semantic relations for which there were no instances in the dataset. These were relations that are typically expressed with longer phrases (three or more words), rather than noun-modifier word pairs. For clarity, we decided not to include these relations in Table 19.

In this table, H represents the head noun and M represents the modifier. For example, in "flu virus", the head noun (H) is "virus" and the modifier (M) is "flu" (*). In English, the modifier (typically a noun or adjective) usually precedes the head noun. In the description of purpose, V represents an arbitrary verb. In "concert hall", the hall is for presenting concerts (V is "present") or holding concerts (V is "hold") (†).


Nastase and Szpakowicz (2003) organized the relations into groups. The five capitalized terms in the "Relation" column of Table 19 are the names of five groups of semantic relations. (The original table had a sixth group, but there are no examples of this group in the dataset.) We make use of this grouping in the following experiments.

7.2 Baseline LRA with Single Nearest Neighbour

The following experiments use single nearest neighbour classification with leave-one-out cross-validation. For leave-one-out cross-validation, the testing set consists of a single noun-modifier pair and the training set consists of the 599 remaining noun-modifiers. The dataset is split 600 times, so that each noun-modifier gets a turn as the testing word pair. The predicted class of the testing pair is the class of the single nearest neighbour in the training set. As the measure of nearness, we use LRA to calculate the relational similarity between the testing pair and the training pairs. The single nearest neighbour algorithm is a supervised learning algorithm (i.e., it requires a training set of labeled data), but we are using LRA to measure the distance between a pair and its potential neighbours, and LRA is itself determined in an unsupervised fashion (i.e., LRA does not need labeled data).
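In outline, the evaluation loop looks like this (a sketch; `relational_similarity` stands in for the LRA computation, which is not reproduced here):

```python
def leave_one_out_1nn(pairs, labels, relational_similarity):
    """Leave-one-out single nearest neighbour classification.

    pairs: list of word pairs, e.g. ("flu", "virus");
    labels: the class of each pair;
    relational_similarity: a function scoring two word pairs."""
    correct = 0
    for i, test_pair in enumerate(pairs):
        # The training set is everything except the held-out pair.
        neighbours = [(relational_similarity(test_pair, pairs[j]), labels[j])
                      for j in range(len(pairs)) if j != i]
        # The nearest neighbour is the training pair with highest similarity.
        _, predicted = max(neighbours, key=lambda n: n[0])
        if predicted == labels[i]:
            correct += 1
    return correct / len(pairs)
```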

Each SAT question has five choices, so answering 374 SAT questions required calculating 374 × 5 × 16 = 29,920 cosines. The factor of 16 comes from the alternate pairs, step 11 in LRA. With the noun-modifier pairs, using leave-one-out cross-validation, each test pair has 599 choices, so an exhaustive application of LRA would require calculating 600 × 599 × 16 = 5,750,400 cosines. To reduce the amount of computation required, we first find the 30 nearest neighbours for each pair, ignoring the alternate pairs (600 × 599 = 359,400 cosines), and then apply the full LRA, including the alternates, to just those 30 neighbours (600 × 30 × 16 = 288,000 cosines), which requires calculating only 359,400 + 288,000 = 647,400 cosines.
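The savings from the two-stage search are easy to verify with a worked check of the counts above:

```python
# Cosines for the SAT questions: 5 choices per question, 16 cosines
# per choice (the alternate pairs of step 11).
sat = 374 * 5 * 16                  # 29,920

# Exhaustive leave-one-out on the noun-modifier pairs.
exhaustive = 600 * 599 * 16         # 5,750,400

# Two-stage search: a cheap first pass without alternates, then the
# full LRA on the 30 nearest neighbours only.
first_pass = 600 * 599              # 359,400
second_pass = 600 * 30 * 16         # 288,000
print(sat, exhaustive, first_pass + second_pass)  # 29920 5750400 647400
```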

There are 600 word pairs in the input set for LRA. In step 2, introducing alternate pairs multiplies the number of pairs by four, resulting in 2,400 pairs. In step 5, for each pair A:B, we add B:A, yielding 4,800 pairs. However, some pairs are dropped because they correspond to zero vectors and a few words do not appear in Lin's thesaurus. The sparse matrix (step 7) has 4,748 rows and 8,000 columns, with a density of 8.4%.

Following Turney and Littman (2005), we evaluate the performance by accuracy and also by the macroaveraged F measure (Lewis, 1991). Macroaveraging calculates the precision, recall, and F for each class separately, and then calculates the average across all classes. Microaveraging combines the true positive, false positive, and false negative counts for all of the classes, and then calculates precision, recall, and F from the combined counts. Macroaveraging gives equal weight to all classes, but microaveraging gives more weight to larger classes. We use macroaveraging (giving equal weight to all classes), because we have no reason to believe that the class sizes in the dataset reflect the actual distribution of the classes in a real corpus.
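For concreteness, here is a small sketch of macroaveraging, written directly from the definitions above (our illustration, not the original evaluation code):

```python
from collections import Counter

def macro_f(gold, predicted):
    """Macroaveraged F: compute precision, recall, and F per class,
    then average F across classes with equal weight."""
    classes = set(gold)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1   # p was predicted but wrong
            fn[g] += 1   # g was missed
    f_scores = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f_scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f_scores) / len(classes)
```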

Classification with 30 distinct classes is a hard problem. To make the task easier, we can collapse the 30 classes to 5 classes, using the grouping that is given in Table 19. For example, agent and beneficiary both collapse to participant. On the 30 class problem, LRA with the single nearest neighbour algorithm achieves an accuracy of 39.8% (239/600) and a macroaveraged F of 36.6%. Always guessing the majority class would result in an accuracy of 8.2% (49/600). On the 5 class problem, the accuracy is 58.0% (348/600) and the macroaveraged F is 54.6%. Always guessing the majority class would give an accuracy of 43.3% (260/600). For both the 30 class and 5 class problems, LRA's accuracy is significantly higher than guessing the majority class, with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).


Table 19. Classes of semantic relations (based on Appendix A of Nastase and Szpakowicz, 2003).

Relation | Abbr. | Example phrase | Description
cause | cs | flu virus (*) | H makes M occur or exist, H is necessary and sufficient
effect | eff | exam anxiety | M makes H occur or exist, M is necessary and sufficient
purpose | prp | concert hall (†) | H is for V-ing M, M does not necessarily occur or exist
detraction | detr | headache pill | H opposes M, H is not sufficient to prevent M
frequency | freq | daily exercise | H occurs every time M occurs
time at | tat | morning exercise | H occurs when M occurs
time through | tthr | six-hour meeting | H existed while M existed, M is an interval of time
direction | dir | outgoing mail | H is directed towards M, M is not the final point
location | loc | home town | H is the location of M
location at | lat | desert storm | H is located at M
location from | lfr | foreign capital | H originates at M
agent | ag | student protest | M performs H, M is animate or natural phenomenon
beneficiary | ben | student discount | M benefits from H
instrument | inst | laser printer | H uses M
object | obj | metal separator | M is acted upon by H

[The remaining rows of Table 19, from the object property relation through the QUALITY group, are not recoverable from the source; of the five group labels mentioned in Section 7.1, only QUALITY survives in the extracted text.]

[Table 20: accuracy and macroaveraged F of VSM-AV, VSM-WMTS, and LRA on the 30 class problem. The row labels survive in the source, but the numeric values are not recoverable.]

[Table 21: accuracy and macroaveraged F of VSM-AV, VSM-WMTS, and LRA on the 5 class problem. The row labels survive in the source, but the numeric values are not recoverable.]

7.3 LRA versus VSM

Table 20 shows the performance of LRA and VSM on the 30 class problem. VSM-AV is VSM with the AltaVista corpus and VSM-WMTS is VSM with the WMTS corpus. The results for VSM-AV are taken from Turney and Littman (2005). All three pairwise differences in the three F measures are statistically significant at the 95% level, according to the Paired T-Test (Feelders and Verkooijen, 1995). The accuracy of LRA is significantly higher than the accuracies of VSM-AV and VSM-WMTS, according to the Fisher Exact Test (Agresti, 1990), but the difference between the two VSM accuracies is not significant.
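To illustrate how such an accuracy comparison is tested, the following sketch applies SciPy's Fisher Exact Test to two systems' correct/incorrect counts (SciPy is our choice of tool, not one the paper names; since the Table 20 values are not recoverable here, the counts shown are the LRA and majority-class figures from Section 7.2):

```python
from scipy.stats import fisher_exact

# 2x2 contingency table: correct vs. incorrect answers for two systems.
# Rows: LRA (239/600 correct on the 30 class problem) and the
# majority-class baseline (49/600 correct).
table = [[239, 600 - 239],
         [49, 600 - 49]]
odds_ratio, p_value = fisher_exact(table)
print(p_value < 0.05)  # True: significant at the 95% level
```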

Table 21 compares the performance of LRA and VSM on the 5 class problem. The accuracy and F measure of LRA are significantly higher than the accuracies and F measures of VSM-AV and VSM-WMTS, but the differences between the two VSM accuracies and F measures are not significant.

8. Discussion

The experimental results in Sections 6 and 7 demonstrate that LRA performs significantly better than the VSM, but it is also clear that there is room for improvement. The accuracy might not yet be adequate for practical applications, although past work has shown that it is possible to adjust the tradeoff of precision versus recall (Turney and Littman, 2005). For some of the applications, such as information extraction, LRA might be suitable if it is adjusted for high precision, at the expense of low recall.

Another limitation is speed; it took almost nine days for LRA to answer 374 analogy questions. However, with progress in computer hardware, speed will gradually become less of a concern. Also, the software has not been optimized for speed; there are several places where the efficiency could be increased and many operations are parallelizable. It may also be possible to precompute much of the information for LRA, although this would require substantial changes to the algorithm.

The difference in performance between VSM-AV and VSM-WMTS shows that VSM is sensitive to the size of the corpus. Although LRA is able to surpass VSM-AV when the WMTS corpus is only about one tenth the size of the AV corpus, it seems likely that LRA would perform better with a larger corpus. The WMTS corpus requires one terabyte of hard disk space, but progress in hardware will likely make ten or even one hundred terabytes affordable in the relatively near future.

For noun-modifier classification, more labeled data should yield performance improvements. With 600 noun-modifier pairs and 30 classes, the average class has only 20 examples. We expect that the accuracy would improve substantially with five or ten times more examples. Unfortunately, it is time consuming and expensive to acquire hand-labeled data.

Another issue with noun-modifier classification is the choice of classification scheme for the semantic relations. The 30 classes of Nastase and Szpakowicz (2003) might not be the best scheme. Other researchers have proposed different schemes (Vanderwende, 1994; Barker and Szpakowicz, 1998; Rosario and Hearst, 2001; Rosario, Hearst, and Fillmore, 2002). It seems likely that some schemes are easier for machine learning than others. For some applications, 30 classes may not be necessary; the 5 class scheme may be sufficient.

LRA, like VSM, is a corpus-based approach to measuring relational similarity. Past work suggests that a hybrid approach, combining multiple modules, some corpus-based, some lexicon-based, will surpass any purebred approach (Turney et al., 2003). In future work, it would be natural to combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004), perhaps using the combination method of Turney et al. (2003).

The Singular Value Decomposition is only one of many methods for handling sparse, noisy data. We have also experimented with Nonnegative Matrix Factorization (NMF) (Lee and Seung, 1999), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), Kernel Principal Components Analysis (KPCA) (Scholkopf, Smola, and Muller, 1997), and Iterative Scaling (IS) (Ando, 2000). We had some interesting results with small matrices (around 2,000 rows by 1,000 columns), but none of these methods seemed substantially better than SVD and none of them scaled up to the matrix sizes we are using here (e.g., 17,232 rows and 8,000 columns; see Section 6.1).
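Readers who want to reproduce this kind of comparison can use off-the-shelf implementations. The following sketch contrasts truncated SVD with NMF on a random sparse matrix standing in for the pair-by-pattern frequency matrix (scikit-learn and SciPy are our choices here, not tools the paper mentions):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import NMF, TruncatedSVD

# A sparse nonnegative matrix standing in for the pair-by-pattern
# frequency matrix; the small-matrix size from the text above.
X = sparse_random(2000, 1000, density=0.05, random_state=0)

# Truncated SVD (as used by LRA) and NMF both give rank-k factorizations
# that can be used to smooth the frequency data.
k = 300
svd_vectors = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)
nmf_vectors = NMF(n_components=k, init="nndsvd", max_iter=200).fit_transform(X)
print(svd_vectors.shape, nmf_vectors.shape)  # (2000, 300) (2000, 300)
```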

In step 4 of LRA, we simply select the top num

[The remainder of this paragraph, and the text from this point to the middle of the concluding section, are not recoverable from the source.]

(1) the patterns are generated dynamically from the corpus, (2) SVD is used to smooth the data, and (3) a thesaurus is used to explore variations of the word pairs. With the WMTS corpus (about 5 × 10^10 English words), LRA achieves an F of 56.5%, whereas the F of VSM is 40.3%.

We have presented several examples of the many potential applications for measures of relational similarity. Just as attributional similarity measures have proven to have many practical uses, we expect that relational similarity measures will soon become widely used. Gentner et al. (2001) argue that relational similarity is essential to understanding novel metaphors (as opposed to conventional metaphors). Many researchers have argued that metaphor is the heart of human thinking (Lakoff and Johnson, 1980; Hofstadter and the Fluid Analogies Research Group, 1995; Gentner et al., 2001; French, 2002). We believe that relational similarity plays a fundamental role in the mind and therefore relational similarity measures could be crucial for artificial intelligence.

In future work, we plan to investigate some potential applications for LRA. It is possible that the error rate of LRA is still too high for practical applications, but the fact that LRA matches average human performance on SAT analogy questions is encouraging.

Acknowledgments

Thanks to Michael Littman for sharing the 374 SAT analogy questions and for inspiring me to tackle them. Thanks to Vivi Nastase and Stan Szpakowicz for sharing their 600 classified noun-modifier phrases. Thanks to Egidio Terra, Charlie Clarke, and the School of Computer Science of the University of Waterloo, for giving us a copy of the Waterloo MultiText System and their Terabyte Corpus. Thanks to Dekang Lin for making his Dependency-Based Word Similarity lexicon available online. Thanks to Doug Rohde for SVDLIBC and Michael Berry for SVDPACK. Thanks to Ted Pedersen for making his WordNet::Similarity package available. Thanks to Joel Martin for comments on the paper. Thanks to the anonymous reviewers of Computational Linguistics for their very helpful comments and suggestions.

References

Agresti, Alan. 1990. Categorical Data Analysis. Wiley.

Ando, Rie Kubota. 2000. Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2000), pages 216–223.

Baeza-Yates, Ricardo A. and Berthier A. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press.

Banerjee, Satanjeev and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 805–810, Acapulco, Mexico.

Barker, Ken and Stan Szpakowicz. 1998. Semi-automatic recognition of noun modifier relationships. In Christian Boitet and Pete Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics (COLING-ACL'98), pages 96–102, San Francisco, California. Morgan Kaufmann Publishers.

Berland, Matthew and Eugene Charniak. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), pages 57–64, New Brunswick, NJ.

Berry, Michael W. 1992. Large scale singular value computations. International Journal of Supercomputer Applications, 6(1):13–49.

Budanitsky, Alexander and Graeme Hirst. 2001. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pages 29–34, Pittsburgh, PA.

Chiarello, Christine, Curt Burgess, Lorie Richards, and Alma Pollock. 1990. Semantic and associative priming in the cerebral hemispheres: Some words do, some words don't ... sometimes, some places. Brain and Language, 38:75–104.

Claman, Cathy. 2000. 10 Real SATs. College Entrance Examination Board.

Clarke, Charles L.A., Gordon V. Cormack, and Christopher R. Palmer. 1998. An overview of MultiText. ACM SIGIR Forum, 32(2):14–15.

Daganzo, Carlos F. 1994. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269–287.

Deerwester, Scott C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science (JASIS), 41(6):391–407.

Dolan, William B. 1995. Metaphor as an emergent property of machine-readable dictionaries. In Proceedings of the AAAI 1995 Spring Symposium Series: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity, pages 27–32.

Dumais, Susan T. 1990. Enhancing performance in latent semantic indexing (LSI) retrieval. Technical Report TM-ARH-017527, Bellcore, Morristown, NJ.

Dumais, Susan T. 1993. Latent semantic indexing (LSI) and TREC-2. In D.K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), pages 105–115. National Institute of Standards and Technology.

Falkenhainer, Brian. 1990. Analogical interpretation in context. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 69–76. Lawrence Erlbaum Associates.

Falkenhainer, Brian, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1):1–63.

Federici, Stefano, Simonetta Montemagni, and Vito Pirrelli. 1997. Inferring semantic similarity from distributional evidence: An analogy-based approach to word sense disambiguation. In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 90–97, Madrid, Spain.

Feelders, Ad and William Verkooijen. 1995. Which method learns the most from data? Methodological issues in the analysis of comparative studies. In Fifth International Workshop on Artificial Intelligence and Statistics, pages 219–225, Ft. Lauderdale, Florida.

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

French, Robert M. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Gentner, Dedre. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Gentner, Dedre, Brian Bowdle, Phillip Wolff, and Consuelo Boronat. 2001. Metaphor is like analogy. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 199–253, Cambridge, MA. MIT Press.

Gildea, Daniel and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Girju, Roxana, Adriana Badulescu, and Dan I. Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 80–87.

Goldenberg, David. 2005. The emperor's new clothes: Undressing the new and unimproved SAT. Gelf Magazine, March. http://www.gelfmagazine.com/mt/archives/the_new...

[Entries between Goldenberg (2005) and Lauer (1995) are not recoverable from the source.]

Lauer, Mark. 1995. Designing Statistical Language Learners: Experiments on Compound Nouns. Ph.D. thesis, Macquarie University.

Leacock, Claudia and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press.

Lee, Daniel D. and H. Sebastian Seung. 1999. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791.

Lesk, Michael E. 1969. Word-word associations in document retrieval systems. American Documentation, 20(1):27–38.

Lesk, Michael E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC '86, pages 24–26.

Lewis, David D. 1991. Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop, pages 312–318, Asilomar, CA. Morgan Kaufmann.

Lin, Dekang. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL'98), pages 768–774, Montreal, Canada.

Lin, Dekang. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML'98), pages 296–304. Morgan Kaufmann, San Francisco, CA.

Marx, Zvika, Ido Dagan, Joachim Buhmann, and Eli Shamir. 2002. Coupled clustering: A method for detecting structural correspondence. Journal of Machine Learning Research, 3:747–780.

Medin, Douglas L., Robert L. Goldstone, and Dedre Gentner. 1990. Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1(1):64–69.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pages 60–67, Boston, MA.

Morris, Jane and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48.

Nastase, Vivi and Stan Szpakowicz. 2003. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, The Netherlands.

Pantel, Patrick and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613–619.

Rada, Roy, Hafedh Mili, Ellen Bicknell, and Maria Blettner. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30.

Rehder, Bob, M.E. Schreiner, Michael B.W. Wolfe, Darrell Laham, Thomas K. Landauer, and Walter Kintsch. 1998. Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354.

Reitman, Walter R. 1965. Cognition and Thought: An Information Processing Approach. John Wiley and Sons, New York, NY.

Resnik, Philip. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453, San Mateo, CA. Morgan Kaufmann.

Riloff, Ellen and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474–479.

Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82–90.

Rosario, Barbara, Marti Hearst, and Charles Fillmore. 2002. The descent of hierarchy, and selection in relational semantics. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 417–424, Philadelphia, PA.

Ruge, Gerda. 1992. Experiments on linguistically-based term associations. Information Processing and Management, 28(3):317–332.

Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.

Salton, Gerard and Chris Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523.

Salton, Gerard and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.

Scholkopf, Bernhard, Alexander J. Smola, and Klaus-Robert Muller. 1997. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997), pages 583–588, Berlin.

Terra, Egidio and Charles L.A. Clarke. 2003. Frequency estimates for statistical word similarity measures. In Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pages 244–251.

Turney, Peter D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning, pages 491–502, Berlin. Springer.

Turney, Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 417–424.

Turney, Peter D. 2005. Measuring semantic similarity by latent relational analysis. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1136–1141, Edinburgh, Scotland.

Turney, Peter D. and Michael L. Littman. 2005. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278.

Turney, Peter D., Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the Fifteenth International Conference on Computational Linguistics, pages 782–788, Kyoto, Japan.

Veale, Tony. 2003. The analogical thesaurus. In Proceedings of the 15th Innovative Applications of Artificial Intelligence Conference (IAAI 2003), pages 137–142, Acapulco, Mexico.

Veale, Tony. 2004. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain.

Yangarber, Roman. 2003. Counter-training in discovery of semantic patterns. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003), pages 343–350, Sapporo, Japan.

Yarowsky, David. 1993. One sense per collocation. In Proceedings of the ARPA Human Language Technology Workshop, pages 266–271, Princeton, NJ.

Zelenko, Dmitry, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.
