搜索
您的当前位置:首页正文

Similarity of semantic relations

来源:榕意旅游网
SimilarityofSemanticRelations

PeterD.Turney

NationalResearchCouncilCanada

Thereareatleasttwokindsofsimilarity.Relationalsimilarityiscorrespondencebetweenre-lations,incontrastwithattributionalsimilarity,whichiscorrespondencebetweenattributes.Whentwowordshaveahighdegreeofattributionalsimilarity,wecallthemsynonyms.Whentwopairsofwordshaveahighdegreeofrelationalsimilarity,wesaythattheirrelationsareanal-ogous.Forexample,thewordpairmason:stoneisanalogoustothepaircarpenter:wood.ThispaperintroducesLatentRelationalAnalysis(LRA),amethodformeasuringrelationalsimilar-ity.LRAhaspotentialapplicationsinmanyareas,includinginformationextraction,wordsensedisambiguation,andinformationretrieval.RecentlytheVectorSpaceModel(VSM)ofinforma-tionretrievalhasbeenadaptedtomeasuringrelationalsimilarity,achievingascoreof47%onacollectionof374college-levelmultiple-choicewordanalogyquestions.IntheVSMapproach,therelationbetweenapairofwordsischaracterizedbyavectoroffrequenciesofpredefinedpatternsinalargecorpus.LRAextendstheVSMapproachinthreeways:(1)thepatternsarederivedautomaticallyfromthecorpus,(2)theSingularValueDecomposition(SVD)isusedtosmooththefrequencydata,and(3)automaticallygeneratedsynonymsareusedtoexplorevariationsofthewordpairs.LRAachieves56%onthe374analogyquestions,statisticallyequivalenttotheaveragehumanscoreof57%.Ontherelatedproblemofclassifyingsemanticrelations,LRAachievessimilargainsovertheVSM.1.Introduction

Thereareatleasttwokindsofsimilarity.Attributionalsimilarityiscorrespondencebe-tweenattributesandrelationalsimilarityiscorrespondencebetweenrelations(Medin,Goldstone,andGentner,1990).Whentwowordshaveahighdegreeofattributionalsimilarity,wecallthemsynonyms.Whentwowordpairshaveahighdegreeofrela-tionalsimilarity,wesaytheyareanalogous.

VerbalanalogiesareoftenwrittenintheformA:B::C:D,meaningAistoBasCistoD;forexample,traffic:street::water:riverbed.Trafficflowsoverastreet;waterflowsoverariverbed.Astreetcarriestraffic;ariverbedcarrieswater.Thereisahighdegreeofrela-tionalsimilaritybetweenthewordpairtraffic:streetandthewordpairwater:riverbed.Infact,thisanalogyisthebasisofseveralmathematicaltheoriesoftrafficflow(Da-ganzo,1994).

InSection2,welookmorecloselyattheconnectionsbetweenattributionalandre-lationalsimilarity.Inanalogiessuchasmason:stone::carpenter:wood,itseemsthatre-lationalsimilaritycanbereducedtoattributionalsimilarity,sincemasonandcarpenterareattributionallysimilar,asarestoneandwood.Ingeneral,thisreductionfails.Con-sidertheanalogytraffic:street::water:riverbed.Trafficandwaterarenotattributionallysimilar.Streetandriverbedareonlymoderatelyattributionallysimilar.

ComputationalLinguisticsVolume1,Number1

Manyalgorithmshavebeenproposedformeasuringtheattributionalsimilaritybe-tweentwowords(Lesk,1969;Resnik,1995;LandauerandDumais,1997;JiangandConrath,1997;Lin,1998b;Turney,2001;BudanitskyandHirst,2001;BanerjeeandPed-ersen,2003).Measuresofattributionalsimilarityhavebeenstudiedextensively,duetotheirapplicationsinproblemssuchasrecognizingsynonyms(LandauerandDumais,1997),informationretrieval(Deerwesteretal.,1990),determiningsemanticorientation(Turney,2002),gradingstudentessays(Rehderetal.,1998),measuringtextualcohesion(MorrisandHirst,1991),andwordsensedisambiguation(Lesk,1986).

Ontheotherhand,sincemeasuresofrelationalsimilarityarenotaswelldevelopedasmeasuresofattributionalsimilarity,thepotentialapplicationsofrelationalsimilarityarenotaswellknown.Manyproblemsthatinvolvesemanticrelationswouldbenefitfromanalgorithmformeasuringrelationalsimilarity.Wediscussrelatedproblemsinnaturallanguageprocessing,informationretrieval,andinformationextractioninmoredetailinSection3.

ThispaperbuildsontheVectorSpaceModel(VSM)ofinformationretrieval.Givenaquery,asearchengineproducesarankedlistofdocuments.Thedocumentsarerankedinorderofdecreasingattributionalsimilaritybetweenthequeryandeachdoc-ument.AlmostallmodernsearchenginesmeasureattributionalsimilarityusingtheVSM(Baeza-YatesandRibeiro-Neto,1999).TurneyandLittman(2005)adapttheVSMapproachtomeasuringrelationalsimilarity.Theyusedavectoroffrequenciesofpat-ternsinacorpustorepresenttherelationbetweenapairofwords.Section4presentstheVSMapproachtomeasuringsimilarity.

InSection5,wepresentanalgorithmformeasuringrelationalsimilarity,whichwecallLatentRelationalAnalysis(LRA).Thealgorithmlearnsfromalargecorpusofunlabeled,unstructuredtext,withoutsupervision.LRAextendstheVSMapproachofTurneyandLittman(2005)inthreeways:(1)Theconnectingpatternsarederivedautomaticallyfromthecorpus,insteadofusingafixedsetofpatterns.(2)SingularValueDecomposition(SVD)isusedtosmooththefrequencydata.(3)Givenawordpairsuchastraffic:street,LRAconsiderstransformationsofthewordpair,generatedbyreplacingoneofthewordsbysynonyms,suchastraffic:road,traffic:highway.

Section6presentsourexperimentalevaluationofLRAwithacollectionof374multiple-choicewordanalogyquestionsfromtheSATcollegeentranceexam.1Anex-ampleofatypicalSATquestionappearsinTable1.Intheeducationaltestingliterature,thefirstpair(mason:stone)iscalledthestemoftheanalogy.Thecorrectchoiceiscalledthesolutionandtheincorrectchoicesaredistractors.WeevaluateLRAbytestingitsabil-itytoselectthesolutionandavoidthedistractors.Theaverageperformanceofcollege-boundseniorhighschoolstudentsonverbalSATquestionscorrespondstoanaccuracyofabout57%.LRAachievesanaccuracyofabout56%.Onthesesamequestions,theVSMattained47%.

Oneapplicationforrelationalsimilarityisclassifyingsemanticrelationsinnoun-modifierpairs(TurneyandLittman,2005).InSection7,weevaluatetheperformanceofLRAwithasetof600noun-modifierpairsfromNastaseandSzpakowicz(2003).Theproblemistoclassifyanoun-modifierpair,suchas“laserprinter”,accordingtothesemanticrelationbetweentheheadnoun(printer)andthemodifier(laser).The600pairshavebeenmanuallylabeledwith30classesofsemanticrelations.Forexample,“laserprinter”isclassifiedasinstrument;theprinterusesthelaserasaninstrumentfor

SimilarityofSemanticRelationsTurney

Stem:mason:stone

Solution:(b)carpenter:wood

,).

Theamountofattributionalsimilaritybetweentwowords,and,dependsonthedegreeofcorrespondencebetweenthepropertiesofand.Ameasureofattributionalsimilarityisafunctionthatmapstwowords,and,toarealnumber,

.Themorecorrespondencethereisbetweenthepropertiesofand,thegreatertheirattributionalsimilarity.Forexample,dogandwolfhavearelativelyhighdegreeofattributionalsimilarity.

Theamountofrelationalsimilaritybetweentwopairsofwords,A:BandC:D,de-pendsonthedegreeofcorrespondencebetweentherelationsbetweenandandtherelationsbetweenand.Ameasureofrelationalsimilarityisafunctionthatmapstwopairs,A:BandC:D,toarealnumber,.Themorecor-respondencethereisbetweentherelationsofA:BandC:D,thegreatertheirrelationalsimilarity.Forexample,dog:barkandcat:meowhavearelativelyhighdegreeofrelational

THAN(

3

ComputationalLinguisticsVolume1,Number1

similarity.

Cognitivescientistsdistinguishwordsthataresemanticallyassociated(bee–honey)fromwordsthataresemanticallysimilar(deer–pony),althoughtheyrecognizethatsomewordsarebothassociatedandsimilar(doctor–nurse)(Chiarelloetal.,1990).Bothofthesearetypesofattributionalsimilarity,sincetheyarebasedoncorrespondencebe-tweenattributes(e.g.,beesandhoneyarebothfoundinhives;deerandponiesarebothmammals).

BudanitskyandHirst(2001)describesemanticrelatednessasfollows:

Recentresearchonthetopicincomputationallinguisticshasemphasizedtheperspectiveofsemanticrelatednessoftwolexemesinalexicalresource,orits

inverse,semanticdistance.It’simportanttonotethatsemanticrelatednessisamoregeneralconceptthansimilarity;similarentitiesareusuallyassumedtoberelatedbyvirtueoftheirlikeness(bank-trustcompany),butdissimilarentitiesmayalsobesemanticallyrelatedbylexicalrelationshipssuchasmeronymy(car-wheel)andantonymy(hot-cold),orjustbyanykindoffunctionalrelationshiporfrequentassociation(pencil-paper,penguin-Antarctica).

Astheseexamplesshow,semanticrelatednessisthesameasattributionalsimilarity(e.g.,hotandcoldarebothkindsoftemperature,pencilandpaperarebothusedforwriting).Hereweprefertousethetermattributionalsimilarity,becauseitemphasizesthecontrastwithrelationalsimilarity.Thetermsemanticrelatednessmayleadtoconfu-sionwhenthetermrelationalsimilarityisalsounderdiscussion.

Resnik(1995)describessemanticsimilarityasfollows:

Semanticsimilarityrepresentsaspecialcaseofsemanticrelatedness:forexample,carsandgasolinewouldseemtobemorecloselyrelatedthan,say,carsandbicycles,butthelatterpairarecertainlymoresimilar.Radaetal.(1989)suggestthattheassessmentofsimilarityinsemanticnetworkscaninfactbethoughtofasinvolvingjusttaxonimic(IS-A)links,totheexclusionofotherlinktypes;thatviewwillalsobetakenhere,althoughadmittedlyitexcludessomepotentiallyusefulinformation.

Thussemanticsimilarityisaspecifictypeofattributionalsimilarity.Thetermsemanticsimilarityismisleading,becauseitreferstoatypeofattributionalsimilarity,yetrela-tionalsimilarityisnotanylesssemanticthanattributionalsimilarity.

Toavoidconfusion,wewillusethetermsattributionalsimilarityandrelationalsim-ilarity,followingMedin,Goldstone,andGentner(1990).Insteadofsemanticsimilarity(Resnik,1995)orsemanticallysimilar(Chiarelloetal.,1990),wepreferthetermtaxonomi-calsimilarity,whichwetaketobeaspecifictypeofattributionalsimilarity.Weinterpretsynonymyasahighdegreeofattributionalsimilarity.Analogyisahighdegreeofrela-tionalsimilarity.

2.2MeasuringAttributionalSimilarity

Algorithmsformeasuringattributionalsimilaritycanbelexicon-based(Lesk,1986;Bu-danitskyandHirst,2001;BanerjeeandPedersen,2003),corpus-based(Lesk,1969;Lan-dauerandDumais,1997;Lin,1998a;Turney,2001),orahybridofthetwo(Resnik,1995;JiangandConrath,1997;Turneyetal.,2003).Intuitively,wemightexpectthatlexicon-basedalgorithmswouldbebetteratcapturingsynonymythancorpus-basedalgorithms,sincelexicons,suchasWordNet,explicitlyprovidesynonymyinformationthatisonlyimplicitinacorpus.However,experimentsdonotsupportthisintuition.

Severalalgorithmshavebeenevaluatedusing80multiple-choicesynonymques-

4

SimilarityofSemanticRelationsTurney

Stem:levied

Solution:(a)imposed

Table3

Performanceofattributionalsimilaritymeasuresonthe80TOEFLquestions.(Theaverage

non-EnglishUScollegeapplicant’sperformanceisincludedinthebottomrow,forcomparison.)

JarmaszandSzpakowicz(2003)TerraandClarke(2003)Turneyetal.(2003)bestlexicon-basedalgorithmbestcorpus-basedalgorithmbesthybridalgorithm78.7581.2597.50

tionstakenfromtheTestofEnglishasaForeignLanguage(TOEFL).Anexampleofoneofthe80TOEFLquestionsappearsinTable2.Table3showsthebestperformanceontheTOEFLquestionsforeachtypeofattributionalsimilarityalgorithm.Theresultssupporttheclaimthatlexicon-basedalgorithmshavenoadvantageovercorpus-basedalgorithmsforrecognizingsynonymy.

2.3UsingAttributionalSimilaritytoSolveAnalogies

Wemaydistinguishnearanalogies(mason:stone::carpenter:wood)fromfaranalogies(traffic:street::water:riverbed)(Gentner,1983;Medin,Goldstone,andGentner,1990).InananalogyA:B::C:D,wherethereisahighdegreeofrelationalsimilaritybetweenA:BandC:D,ifthereisalsoahighdegreeofattributionalsimilaritybetweenand,andbetweenand,thenA:B::C:Disanearanalogy;otherwise,itisafaranalogy.

ItseemspossiblethatSATanalogyquestionsmightconsistlargelyofnearanalogies,inwhichcasetheycanbesolvedusingattributionalsimilaritymeasures.Wecouldscoreeachcandidateanalogybytheaverageoftheattributionalsimilarity,,betweenandandbetweenand:

(2)

5

ComputationalLinguisticsVolume1,Number1

AlgorithmTypePrecisionRecall

TurneyandLittman(2005)randomrelational(VSM)random47.720.047.120.047.420.0

(3)

Seehttp://www.d.umn.edu/tpederse/similarity.html.

6

SimilarityofSemanticRelationsTurney

wasfirstattemptedbyasystemcalledArgus(Reitman,1965),usingasmallhand-builtsemanticnetwork.Arguscouldonlysolvethelimitedsetofanalogyquestionsthatitsprogrammerhadanticipated.Arguswasbasedonaspreadingactivationmodelanddidnotexplicitlyattempttomeasurerelationalsimilarity.

Turneyetal.(2003)combined13independentmodulestoanswerSATquestions.Thefinaloutputofthesystemwasbasedonaweightedcombinationoftheoutputsofeachindividualmodule.Thebestofthe13moduleswastheVSM,whichisdescribedindetailinTurneyandLittman(2005).TheVSMwasevaluatedonasetof374SATquestions,achievingascoreof47%.

Incontrastwiththecorpus-basedapproachofTurneyandLittman(2005),Veale(2004)appliedalexicon-basedapproachtothesame374SATquestions,attainingascoreof43%.VealeevaluatedthequalityofacandidateanalogyA:B::C:DbylookingforpathsinWordNet,joiningtoandto.ThequalitymeasurewasbasedonthesimilaritybetweentheA:BpathsandtheC:Dpaths.

Turney(2005)introducedLatentRelationalAnalysis(LRA),anenhancedversionoftheVSMapproach,whichreached56%onthe374SATquestions.HerewegobeyondTurney(2005)bydescribingLRAinmoredetail,performingmoreextensiveexperi-ments,andanalyzingthealgorithmandrelatedworkinmoredepth.

3.2StructureMappingTheory

French(2002)citesStructureMappingTheory(SMT)(Gentner,1983)anditsimplemen-tationintheStructureMappingEngine(SME)(Falkenhainer,Forbus,andGentner,1989)asthemostinfluentialworkonmodelingofanalogy-making.Thegoalofcomputationalmodelingofanalogy-makingistounderstandhowpeopleformcomplex,structuredanalogies.SMEtakesrepresentationsofasourcedomainandatargetdomain,andpro-ducesananalogicalmappingbetweenthesourceandtarget.Thedomainsaregivenstructuredpropositionalrepresentations,usingpredicatelogic.Thesedescriptionsin-cludeattributes,relations,andhigher-orderrelations(expressingrelationsbetweenre-lations).Theanalogicalmappingconnectssourcedomainrelationstotargetdomainrelations.

Forexample,thereisananalogybetweenthesolarsystemandRutherford’smodeloftheatom(Falkenhainer,Forbus,andGentner,1989).ThesolarsystemisthesourcedomainandRutherford’smodeloftheatomisthetargetdomain.Thebasicobjectsinthesourcemodelaretheplanetsandthesun.Thebasicobjectsinthetargetmodelaretheelectronsandthenucleus.Theplanetsandthesunhavevariousattributes,suchasmass(sun)andmass(planet),andvariousrelations,suchasrevolve(planet,sun)andattracts(sun,planet).Likewise,thenucleusandtheelectronshaveattributes,suchascharge(electron)andcharge(nucleus),andrelations,suchasrevolve(electron,nucleus)andattracts(nucleus,electron).SMEmapsrevolve(planet,sun)torevolve(electron,nu-cleus)andattracts(sun,planet)toattracts(nucleus,electron).

Eachindividualconnection(e.g.,fromrevolve(planet,sun)torevolve(electron,nu-cleus))inananalogicalmappingimpliesthattheconnectedrelationsaresimilar;thus,SMTrequiresameasureofrelationalsimilarity,inordertoformmaps.EarlyversionsofSMEonlymappedidenticalrelations,butlaterversionsofSMEallowedsimilar,non-identicalrelationstomatch(Falkenhainer,1990).However,thefocusofresearchinanalogy-makinghasbeenonthemappingprocessasawhole,ratherthanmeasuringthesimilaritybetweenanytwoparticularrelations,hencethesimilaritymeasuresusedinSMEatthelevelofindividualconnectionsaresomewhatrudimentary.

Webelievethatamoresophisticatedmeasureofrelationalsimilarity,suchasLRA,mayenhancetheperformanceofSME.Likewise,thefocusofourworkhereisonthesimilaritybetweenparticularrelations,andweignoresystematicmappingbetweensets

7

ComputationalLinguisticsVolume1,Number1

Metaphoricalsentence

Idemolishedhisargument.Youneedtobudgetyourtime.I’veinvestedalotoftimeinher.Mymindjustisn’toperatingtoday.Lifehascheatedme.

Inflationiseatingupourprofits.SAT-styleverbalanalogy

down::argument:refute

building:demolish::argument:refutemoney:budget::time:schedulemoney:invest::time:allocatemachine:operate::mind:thinkcharlatan:cheat::life:disappointanimal:eat::inflation:reduce

SimilarityofSemanticRelationsTurney

extent.BarkerandSzpakowicz(1998)triedacorpus-basedapproachthatexplicitlyusedameasureofrelationalsimilarity,buttheirmeasurewasbasedonliteralmatching,whichlimiteditsabilitytogeneralize.Moldovanetal.(2004)alsousedameasureofrelationalsimilarity,basedonmappingeachnounandmodifierintosemanticclassesinWordNet.Thenoun-modifierpairsweretakenfromacorpusandthesurroundingcontextinthecorpuswasusedinawordsensedisambiguationalgorithm,toimprovethemappingofthenounandmodifierintoWordNet.TurneyandLittman(2005)usedtheVSM(asacomponentinasinglenearestneighbourlearningalgorithm)tomeasurerelationalsimilarity.Wetakethesameapproachhere,substitutingLRAfortheVSM,inSection7.

Lauer(1995)usedacorpus-basedapproach(usingtheBNC)toparaphrasenoun-modifierpairs,byinsertingtheprepositionsof,for,in,at,on,from,with,andabout.Forexample,reptilehavenwasparaphrasedashavenforreptiles.LapataandKeller(2004)achievedimprovedresultsonthistask,byusingthedatabaseofAltaVista’ssearchen-gineasacorpus.

3.5WordSenseDisambiguation

Webelievethattheintendedsenseofapolysemouswordisdeterminedbyitssemanticrelationswiththeotherwordsinthesurroundingtext.Ifwecanidentifythesemanticrelationsbetweenthegivenwordanditscontext,thenwecandisambiguatethegivenword.Yarowsky’s(1993)observationthatcollocationsarealmostalwaysmonosemousisevidenceforthisview.Federici,Montemagni,andPirrelli(1997)presentananalogy-basedapproachtowordsensedisambiguation.

Forexample,considerthewordplant.Outofcontext,plantcouldrefertoanin-dustrialplantoralivingorganism.Supposeplantappearsinsometextnearfood.Atypicalapproachtodisambiguatingplantwouldcomparetheattributionalsimilarityoffoodandindustrialplanttotheattributionalsimilarityoffoodandlivingorganism(Lesk,1986;BanerjeeandPedersen,2003).Inthiscase,thedecisionmaynotbeclear,sincein-dustrialplantsoftenproducefoodandlivingorganismsoftenserveasfood.Itwouldbeveryhelpfultoknowtherelationbetweenfoodandplantinthisexample.Inthephrase“foodfortheplant”,therelationbetweenfoodandplantstronglysuggeststhattheplantisalivingorganism,sinceindustrialplantsdonotneedfood.Inthetext“foodattheplant”,therelationstronglysuggeststhattheplantisanindustrialplant,sincelivingorganismsarenotusuallyconsideredaslocations.Thusanalgorithmforclassifyingsemanticrelations(asinSection7)shouldbehelpfulforwordsensedisambiguation.3.6InformationExtraction

Theproblemofrelationextractionis,givenaninputdocumentandaspecificrelation,extractallpairsofentities(ifany)thathavetherelationinthedocument.Theprob-lemwasintroducedaspartoftheMessageUnderstandingConferences(MUC)in1998.Zelenko,Aone,andRichardella(2003)presentakernelmethodforextractingtherela-tionsperson-affiliationandorganization-location.Forexample,inthesentence“JohnSmithisthechiefscientistoftheHardcomCorporation,”thereisaperson-affiliationrelationbetween“JohnSmith”and“HardcomCorporation”(Zelenko,Aone,andRichardella,2003).Thisissimilartotheproblemofclassifyingsemanticrelations(Section3.4),ex-ceptthatinformationextractionfocusesontherelationbetweenaspecificpairofentitiesinaspecificdocument,ratherthanageneralpairofwordsingeneraltext.Thereforeanalgorithmforclassifyingsemanticrelationsshouldbeusefulforinformationextraction.

IntheVSMapproachtoclassifyingsemanticrelations(TurneyandLittman,2005),wewouldhaveatrainingsetoflabeledexamplesoftherelationperson-affiliation,forin-stance.Eachexamplewouldberepresentedbyavectorofpatternfrequencies.Givena

9

ComputationalLinguisticsVolume1,Number1

specificdocumentdiscussing“JohnSmith”and“HardcomCorporation”,wecouldcon-structavectorrepresentingtherelationbetweenthesetwoentities,andthenmeasuretherelationalsimilaritybetweenthisunlabeledvectorandeachofourlabeledtrainingvectors.Itwouldseemthatthereisaproblemhere,becausethetrainingvectorswouldberelativelydense,sincetheywouldpresumablybederivedfromalargecorpus,butthenewunlabeledvectorfor“JohnSmith”and“HardcomCorporation”wouldbeverysparse,sincetheseentitiesmightbementionedonlyonceinthegivendocument.How-ever,thisisnotanewproblemfortheVectorSpaceModel;itisthestandardsituationwhentheVSMisusedforinformationretrieval.Aquerytoasearchengineisrep-resentedbyaverysparsevectorwhereasadocumentisrepresentedbyarelativelydensevector.Therearewell-knowntechniquesininformationretrievalforcopingwiththisdisparity,suchasweightingschemesforqueryvectorsthataredifferentfromtheweightingschemesfordocumentvectors(SaltonandBuckley,1988).

3.7QuestionAnswering

Intheirpaperonclassifyingsemanticrelations,Moldovanetal.(2004)suggestthatanimportantapplicationoftheirworkisQuestionAnswering.AsdefinedintheTextRE-trievalConference(TREC)QuestionAnswering(QA)track,thetaskistoanswersimplequestions,suchas“Wherehavenuclearincidentsoccurred?”,byretrievingarelevantdocumentfromalargecorpusandthenextractingashortstringfromthedocument,suchas“TheThreeMileIslandnuclearincidentcausedaDOEpolicycrisis.”Moldovanetal.(2004)proposetomapagivenquestiontoasemanticrelationandthensearchforthatrelationinacorpusofsemanticallytaggedtext.Theyarguethatthedesiredse-manticrelationcaneasilybeinferredfromthesurfaceformofthequestion.Aquestionoftheform“Where...?”islikelytobeseekingforentitieswithalocationrelationandaquestionoftheform“Whatdid...make?”islikelytobelookingforentitieswithaproductrelation.InSection7,weshowhowLRAcanrecognizerelationssuchaslocationandproduct(seeTable19).

3.8AutomaticThesaurusGeneration

Hearst(1992)presentsanalgorithmforlearninghyponym(typeof)relationsfromacor-pusandBerlandandCharniak(1999)describehowtolearnmeronym(partof)relationsfromacorpus.Thesealgorithmscouldbeusedtoautomaticallygenerateathesaurusordictionary,butwewouldliketohandlemorerelationsthanhyponymyandmeronymy.WordNetdistinguishesmorethanadozensemanticrelationsbetweenwords(Fellbaum,1998)andNastaseandSzpakowicz(2003)list30semanticrelationsfornoun-modifierpairs.Hearst(1992)andBerlandandCharniak(1999)usemanuallygeneratedrulestominetextforsemanticrelations.TurneyandLittman(2005)alsouseamanuallygener-atedsetof64patterns.

LRAdoesnotuseapredefinedsetofpatterns;itlearnspatternsfromalargecorpus.Insteadofmanuallygeneratingnewrulesorpatternsforeachnewsemanticrelation,itispossibletoautomaticallylearnameasureofrelationalsimilaritythatcanhandlearbitrarysemanticrelations.Anearestneighbouralgorithmcanthenusethisrelationalsimilaritymeasuretolearntoclassifyaccordingtoanysetofclassesofrelations,giventheappropriatelabeledtrainingdata.

Girju,Badulescu,andMoldovan(2003)presentanalgorithmforlearningmeronymrelationsfromacorpus.LikeHearst(1992)andBerlandandCharniak(1999),theyusemanuallygeneratedrulestominetextfortheirdesiredrelation.However,theysupple-menttheirmanualruleswithautomaticallylearnedconstraints,toincreasetheprecisionoftherules.

10

SimilarityofSemanticRelationsTurney

3.9InformationRetrieval

Veale(2003)hasdevelopedanalgorithmforrecognizingcertaintypesofwordanalo-gies,basedoninformationinWordNet.Heproposestousethealgorithmforana-logicalinformationretrieval.Forexample,thequery“Muslimchurch”shouldreturn“mosque”andthequery“Hindubible”shouldreturn“theVedas”.Thealgorithmwasdesignedwithafocusonanalogiesoftheformadjective:noun::adjective:noun,suchasChristian:church::Muslim:mosque.

Ameasureofrelationalsimilarityisapplicabletothistask.Givenapairofwords,and,thetaskistoreturnanotherpairofwords,and,suchthatthereishighrelationalsimilaritybetweenthepair:andthepair:.Forexample,given=“Muslim”and=“church”,return=“mosque”and=“Christian”.(ThepairMuslim:mosquehasahighrelationalsimilaritytothepairChristian:church.)

Marxetal.(2002)developedanunsupervisedalgorithmfordiscoveringanalogiesbyclusteringwordsfromtwodifferentcorpora.Eachclusterofwordsinonecorpusiscoupledone-to-onewithaclusterintheothercorpus.Forexample,oneexperimentusedacorpusofBuddhistdocumentsandacorpusofChristiandocuments.AclusterofwordssuchasHindu,Mahayana,Zen,...fromtheBuddhistcorpuswascoupledwithaclusterofwordssuchasCatholic,Protestant,...fromtheChristiancorpus.ThusthealgorithmappearstohavediscoveredananalogicalmappingbetweenBuddhistschoolsandtraditionsandChristianschoolsandtraditions.Thisisinterestingwork,butitisnotdirectlyapplicabletoSATanalogies,becauseitdiscoversanalogiesbetweenclustersofwords,ratherthanindividualwords.

3.10IdentifyingSemanticRoles

Asemanticframeforaneventsuchasjudgementcontainssemanticrolessuchasjudge,evaluee,andreason,whereasaneventsuchasstatementcontainsrolessuchasspeaker,addressee,andmessage(GildeaandJurafsky,2002).Thetaskofidentifyingsemanticrolesistolabelthepartsofasentenceaccordingtotheirsemanticroles.Webelievethatitmaybehelpfultoviewsemanticframesandtheirsemanticrolesassetsofsemanticrelations;thusameasureofrelationalsimilarityshouldhelpustoidentifysemanticroles.Moldovanetal.(2004)arguethatsemanticrolesaremerelyaspecialcaseofsemanticrelations(Section3.4),sincesemanticrolesalwaysinvolveverbsorpredicates,butsemanticrelationscaninvolvewordsofanypartofspeech.4.TheVectorSpaceModel

ThissectionexaminespastworkonmeasuringattributionalandrelationalsimilarityusingtheVectorSpaceModel(VSM).

4.1MeasuringAttributionalSimilaritywiththeVectorSpaceModel

TheVSMwasfirstdevelopedforinformationretrieval(SaltonandMcGill,1983;SaltonandBuckley,1988;Salton,1989)anditisatthecoreofmostmodernsearchengines(Baeza-YatesandRibeiro-Neto,1999).IntheVSMapproachtoinformationretrieval,queriesanddocumentsarerepresentedbyvectors.Elementsinthesevectorsarebasedonthefrequenciesofwordsinthecorrespondingqueriesanddocuments.Thefrequen-ciesareusuallytransformedbyvariousformulasandweights,tailoredtoimprovetheeffectivenessofthesearchengine(Salton,1989).Theattributionalsimilaritybetweenaqueryandadocumentismeasuredbythecosineoftheanglebetweentheircorrespond-ingvectors.Foragivenquery,thesearchenginesortsthematchingdocumentsinorderofdecreasingcosine.

TheVSMapproachhasalsobeenusedtomeasuretheattributionalsimilarityof

11

ComputationalLinguisticsVolume1,Number1

words(Lesk,1969;Ruge,1992;PantelandLin,2002).PantelandLin(2002)clusteredwordsaccordingtotheirattributionalsimilarity,asmeasuredbyaVSM.Theiralgo-rithmisabletodiscoverthedifferentsensesofpolysemouswords,usingunsupervisedlearning.

LatentSemanticAnalysisenhancestheVSMapproachtoinformationretrievalbyusingtheSingularValueDecomposition(SVD)tosmooththevectors,whichhelpstohandlenoiseandsparsenessinthedata(Deerwesteretal.,1990;Dumais,1993;Lan-dauerandDumais,1997).SVDimprovesbothdocument-queryattributionalsimilaritymeasures(Deerwesteretal.,1990;Dumais,1993)andword-wordattributionalsimilar-itymeasures(LandauerandDumais,1997).LRAalsousesSVDtosmoothvectors,aswediscussinSection5.

4.2MeasuringRelationalSimilaritywiththeVectorSpaceModelLetbethesemanticrelation(orsetofrelations)betweenapairofwords,and,andletbethesemanticrelation(orsetofrelations)betweenanotherpair,and.Wewishtomeasuretherelationalsimilaritybetweenand.Therelationsandarenotgiventous;ourtaskistoinferthesehidden(latent)relationsandthencomparethem.

IntheVSMapproachtorelationalsimilarity(TurneyandLittman,2005),wecreatevectors,and,thatrepresentfeaturesofand,andthenmeasurethesimilarityofandbythecosineoftheanglebetweenand:

(5)(6)

(7)

Wecreateavector,,tocharacterizetherelationshipbetweentwowords,and,bycountingthefrequenciesofvariousshortphrasescontainingand.TurneyandLittman(2005)usealistof64joiningterms,suchas“of”,“for”,and“to”,toform128phrasesthatcontainand,suchas“of”,“of”,“for”,“for”,“to”,and“to”.Thesephrasesarethenusedasqueriesforasearchengineandthenumberofhits(matchingdocuments)isrecordedforeachquery.Thisprocessyieldsavectorof128numbers.Ifthenumberofhitsforaqueryis,thenthecorrespondingelementinthevectoris.Severalauthorsreportthatthelogarithmictransformationoffrequenciesimprovescosine-basedsimilaritymeasures(SaltonandBuckley,1988;Ruge,1992;Lin,1998b).

TurneyandLittman(2005)evaluatedtheVSMapproachbyitsperformanceon374SATanalogyquestions,achievingascoreof47%.Sincetherearefivechoicesforeachquestion,theexpectedscoreforrandomguessingis20%.Toansweramultiple-choiceanalogyquestion,vectorsarecreatedforthestempairandeachchoicepair,andthencosinesarecalculatedfortheanglesbetweenthestempairandeachchoicepair.Thebestguessisthechoicepairwiththehighestcosine.WeusethesamesetofanalogyquestionstoevaluateLRAinSection6.

TheVSMwasalsoevaluatedbyitsperformanceasadistance(nearness)measureinasupervisednearestneighbourclassifierfornoun-modifiersemanticrelations(Turney

12

SimilarityofSemanticRelationsTurney

andLittman,2005).Theevaluationused600hand-labelednoun-modifierpairsfromNastaseandSzpakowicz(2003).Atestingpairisclassifiedbysearchingforitssinglenearestneighbourinthelabeledtrainingdata.Thebestguessisthelabelforthetrainingpairwiththehighestcosine.LRAisevaluatedwiththesamesetofnoun-modifierpairsinSection7.

TurneyandLittman(2005)usedtheAltaVistasearchenginetoobtainthefrequencyinformationrequiredtobuildvectorsfortheVSM.ThustheircorpuswasthesetofallwebpagesindexedbyAltaVista.Atthetime,theEnglishsubsetofthiscorpusconsistedofaboutwords.AroundApril2004,AltaVistamadesubstantialchangestotheirsearchengine,removingtheiradvancedsearchoperators.Theirsearchenginenolongersupportstheasteriskoperator,whichwasusedbyTurneyandLittman(2005)forstemmingandwild-cardsearching.AltaVistaalsochangedtheirpolicytowardsautomatedsearching,whichisnowforbidden.3

TurneyandLittman(2005)usedAltaVista’shitcount,whichisthenumberofdocu-ments(webpages)matchingagivenquery,butLRAusesthenumberofpassages(strings)matchingaquery.InourexperimentswithLRA(Sections6and7),weusealocalcopyoftheWaterlooMultiTextSystem(Clarke,Cormack,andPalmer,1998;TerraandClarke,2003),runningona16CPUBeowulfCluster,withacorpusofaboutEnglishwords.TheWaterlooMultiTextSystem(WMTS)isadistributed(multiprocessor)searchengine,designedprimarilyforpassageretrieval(althoughdocumentretrievalispossi-ble,asaspecialcaseofpassageretrieval).Thetextandindexrequireapproximatelyoneterabyteofdiskspace.AlthoughAltaVistaonlygivesaroughestimateofthenumberofmatchingdocuments,theWaterlooMultiTextSystemgivesexactcountsofthenumberofmatchingpassages.

Turneyetal.(2003)combine13independentmodulestoanswerSATquestions.TheperformanceofLRAsignificantlysurpassesthiscombinedsystem,butthereisnorealcontestbetweentheseapproaches,becausewecansimplyaddLRAtothecombination,asafourteenthmodule.SincetheVSMmodulehadthebestperformanceofthethirteenmodules(Turneyetal.,2003),thefollowingexperimentsfocusoncomparingVSMandLRA.

5.LatentRelationalAnalysis

LRAtakesasinputasetofwordpairsandproducesasoutputameasureoftherela-tionalsimilaritybetweenanytwooftheinputpairs.LRAreliesonthreeresources,asearchenginewithaverylargecorpusoftext,abroad-coveragethesaurusofsynonyms,andanefficientimplementationofSVD.

Wefirstpresentashortdescriptionofthecorealgorithm.Later,inthefollowingsubsections,wewillgiveadetaileddescriptionofthealgorithm,asitisappliedintheexperimentsinSections6and7.

Givenasetofwordpairsasinput,lookinathesaurusforsynonymsforeachwordineachwordpair.Foreachinputpair,makealternatepairsbyreplacingtheoriginalwordswiththeirsynonyms.Thealternatepairsareintendedtoformnearanalogieswiththecorrespondingoriginalpairs(seeSection2.3).Filteroutalternatepairsthatdonotformnearanalogies,bydroppingalternatepairsthatco-occurrarelyinthecorpus.Intheprecedingstep,ifasynonym

ComputationalLinguisticsVolume1,Number1

replacedanambiguousoriginalword,butthesynonymcapturesthewrongsenseoftheoriginalword,itislikelythatthereisnosignificantrelationbetweenthewordsinthealternatepair,sotheywillrarelyco-occur.

Foreachoriginalandalternatepair,searchinthecorpusforshortphrasesthatbeginwithonememberofthepairandendwiththeother.Thesephraseschar-acterizetherelationbetweenthewordsineachpair.

Foreachphrasefromthepreviousstep,createseveralpatterns,byreplacingwordsinthephrasewithwildcards.

Buildapair-patternfrequencymatrix,inwhicheachcellrepresentsthenumberoftimesthatthecorrespondingpair(row)appearsinthecorpuswiththecor-respondingpattern(column).Thenumberwillusuallybezero,resultinginasparsematrix.

ApplytheSingularValueDecompositiontothematrix.Thisreducesnoiseinthematrixandhelpswithsparsedata.

Supposethatwewishtocalculatetherelationalsimilaritybetweenanytwooftheoriginalpairs.Startbylookingforthetworowvectorsinthepair-patternfrequencymatrixthatcorrespondtothetwooriginalpairs.Calculatetheco-sineoftheanglebetweenthesetworowvectors.Thenmergethecosineofthetwooriginalpairswiththecosinesoftheircorrespondingalternatepairs,asfol-lows.Ifananalogyformedwithalternatepairshasahighercosinethantheoriginalpairs,weassumethatwehavefoundabetterwaytoexpresstheanal-ogy,butwehavenotsignificantlychangeditsmeaning.Ifthecosineislower,weassumethatwemayhavechangedthemeaning,byinappropriatelyreplac-ingwordswithsynonyms.Filteroutinappropriatealternatesbydroppingallanalogiesformedofalternates,suchthatthecosinesarelessthanthecosinefortheoriginalpairs.Therelationalsimilaritybetweenthetwooriginalpairsisthencalculatedastheaverageofalloftheremainingcosines.

Themotivationforthealternatepairsistohandlecaseswheretheoriginalpairsco-occurrarelyinthecorpus.Thehopeisthatwecanfindnearanalogiesfortheoriginalpairs,suchthatthenearanalogiesco-occurmorefrequentlyinthecorpus.Thedangeristhatthealternatesmayhavedifferentrelationsfromtheoriginals.Thefilteringstepsaboveaimtoreducethisrisk.

5.1InputandOutput

Inourexperiments,theinputsetcontainsfrom600to2,244wordpairs.Theoutputsimilaritymeasureisbasedoncosines,sothedegreeofsimilaritycanrangefrom(dissimilar;)to(similar;).BeforeapplyingSVD,thevectorsarecompletelynonnegative,whichimpliesthatthecosinecanonlyrangefromto,butSVDintroducesnegativevalues,soitispossibleforthecosinetobenegative,althoughwehaveneverobservedthisinourexperiments.

5.2SearchEngineandCorpus

Inthefollowingexperiments,weusealocalcopyoftheWaterlooMultiTextSystem(Clarke,Cormack,andPalmer,1998;TerraandClarke,2003).4ThecorpusconsistsofaboutEnglishwords,gatheredbyawebcrawler,mainlyfromUSacademic

SimilarityofSemanticRelationsTurney

websites.Thewebpagescoveraverywiderangeoftopics,styles,genres,quality,andwritingskill.TheWMTSiswellsuitedtoLRA,becausetheWMTSscaleswelltolargecorpora(oneterabyte,inourcase),itgivesexactfrequencycounts(unlikemostwebsearchengines),itisdesignedforpassageretrieval(ratherthandocumentretrieval),andithasapowerfulquerysyntax.

5.3Thesaurus

Asasourceofsynonyms,weuseLin’s(1998a)automaticallygeneratedthesaurus.Thisthesaurusisavailablethroughanonlineinteractivedemonstrationoritcanbedown-loaded.5Weusedtheonlinedemonstration,sincethedownloadableversionseemstocontainfewerwords.Foreachwordintheinputsetofwordpairs,weautomaticallyquerytheonlinedemonstrationandfetchtheresultinglistofsynonyms.AsacourtesytootherusersofLin’sonlinesystem,weinserta20seconddelaybetweeneachquery.

Lin’sthesauruswasgeneratedbyparsingacorpusofaboutEnglishwords,consistingoftextfromtheWallStreetJournal,SanJoseMercury,andAPNewswire(Lin,1998a).Theparserwasusedtoextractpairsofwordsandtheirgrammaticalrelations.Wordswerethenclusteredintosynonymsets,basedonthesimilarityoftheirgrammat-icalrelations.Twowordswerejudgedtobehighlysimilarwhentheytendedtohavethesamekindsofgrammaticalrelationswiththesamesetsofwords.Givenawordanditspartofspeech,Lin’sthesaurusprovidesalistofwords,sortedinorderofdecreasingattributionalsimilarity.ThissortingisconvenientforLRA,sinceitmakesitpossibletofocusonwordswithhigherattributionalsimilarityandignoretherest.WordNet,incontrast,givenawordanditspartofspeech,providesalistofwordsgroupedbythepossiblesensesofthegivenword,withgroupssortedbythefrequenciesofthesenses.WordNet’ssortingdoesnotdirectlycorrespondtosortingbydegreeofattributionalsimilarity,althoughvariousalgorithmshavebeenproposedforderivingattributionalsimilarityfromWordNet(Resnik,1995;JiangandConrath,1997;BudanitskyandHirst,2001;BanerjeeandPedersen,2003).

5.4SingularValueDecomposition

WeuseRohde’sSVDLIBCimplementationoftheSingularValueDecomposition,whichisbasedonSVDPACKC(Berry,1992).6InLRA,SVDisusedtoreducenoiseandcom-pensateforsparseness.

5.5TheAlgorithm

WewillgothrougheachstepofLRA,usinganexampletoillustratethesteps.AssumethattheinputtoLRAisthe374multiple-choiceSATwordanalogyquestionsofTur-neyandLittman(2005).Sincetherearesixwordpairsperquestion(thestemandfivechoices),theinputconsistsof2,244wordpairs.Let’ssupposethatwewishtocalculatetherelationalsimilaritybetweenthepairquart:volumeandthepairmile:distance,takenfromtheSATquestioninTable6.TheLRAalgorithmconsistsofthefollowingtwelvesteps:

1.Findalternates:Foreachwordpair:intheinputset,lookinLin’s(1998a)thesaurusforthetop

is10)thataremostsimilarto.Foreachthatissimilarto,makeanewwordpair:.Likewise,lookforthetop

Theonlinedemonstrationisathttp://www.cs.ualberta.ca/lindek/demos/depsim.htmandthe

downloadableversionisathttp://armena.cs.ualberta.ca/lindek/downloads/sims.lsp.gz.

SVDLIBCisavailableathttp://tedlab.mit.edu/dr/SVDLIBC/andSVDPACKCisavailableat

http://www.netlib.org/svdpack/.

15

ComputationalLinguisticsVolume1,Number1

Stem:quart:volume

Solution:(b)mile:distance

alternatepairs.Whenlookingforsimilarwords

inLin’s(1998a)thesaurus,avoidwordsthatseemunusual(e.g.,hyphenatedwords,wordswiththreecharactersorless,wordswithnon-alphabeticalchar-acters,multi-wordphrases,andcapitalizedwords).ThefirstcolumninTable7showsthealternatepairsthataregeneratedfortheoriginalpairquart:volume.

2.Filteralternates:Foreachoriginalpair

:,filterthe

words(weuse

(weuse

mostfrequentalternatesanddiscardtheremainder

words.ThelastcolumninTable7showsthepairsthatareselected.

3.Findphrases:Foreachpair(originalsandalternates),makealistofphrasesinthecorpusthatcontainthepair.QuerytheWMTSforallphrasesthatbeginwithonememberofthepairandendwiththeother(ineitherorder).Weig-noresuffixeswhensearchingforphrasesthatmatchagivenpair.Thephrasescannothavemorethan

,soa

phrasegeneratesatmosteightpatterns.)Foreachpattern,countthenumber

16

SimilarityofSemanticRelationsTurney

Wordpairpint:volumegallon:volumeliter:volumesquirt:volumepail:volumevial:volume

pumping:volumeounce:volumespoonful:volumetablespoon:volume

Similarity0.2100.1590.1220.0840.0840.0840.0730.0710.0700.069

Frequency

37215003323542837313864304296

Filteringstep

accept(topalternate)accept(topalternate)

accept(topalternate)

wordscanappearinaphrase.

17

ComputationalLinguisticsVolume1,Number1

=“in”

=“*of”

=“of*”

=“**”

ofpairs(originalsandalternates)withphrasesthatmatchthepattern(awildcardmustmatchexactlyoneword).Keepthetop

).Typicallythere

willbemillionsofpatterns,soitisnotfeasibletokeepthemall.

5.Mappairstorows:Inpreparationforbuildingthematrix,createamappingofwordpairstorownumbers.Foreachpair:,createarowfor:andanotherrowfor:.Thiswillmakethematrixmoresymmetrical,reflectingourknowledgethattherelationalsimilaritybetween:and:shouldbethesameastherelationalsimilaritybetween:and:.ThisduplicationofrowsisexaminedinSection6.6.

6.Mappatternstocolumns:Createamappingofthetopcolumnsin

.ThisduplicationofcolumnsisexaminedinSection6.6.

7.Generateasparsematrix:Generateamatrixinsparsematrixformat,suit-ableforinputtoSVDLIBC.Thevalueforthecellinrowandcolumnisthefrequencyofthe-thpattern(seestep6)inphrasesthatcontainthe-thwordpair(seestep5).Table9givessomeexamplesofpatternfrequenciesforquart:volume.

8.Calculateentropy:Applylogandentropytransformationstothesparsematrix(LandauerandDumais,1997).Thesetransformationshavebeenfoundtobeveryhelpfulforinformationretrieval(Harman,1986;Dumais,1990).Letbethecellinrowandcolumnofthematrixfromstep7.Letbethenumberofrowsinandletbethenumberofcolumns.Wewishtoweightthecellbytheentropyofthe-thcolumn.Tocalculatetheentropyofthecolumn,weneedtoconvertthecolumnintoavectorofprobabilities.Letbetheprobabilityof,calculatedbynormalizingthecolumnvectorsothatthesumoftheelementsisone,.Theentropyofthe-thcolumnisthen.Entropyisatitsmaximumwhen

isauniformdistribution,,inwhichcase.Entropyisatitsminimumwhenis1forsomevalueofand0forallothervaluesof,inwhichcase.Wewanttogivemoreweighttocolumns(pat-terns)withfrequenciesthatvarysubstantiallyfromonerow(wordpair)tothenext,andlessweighttocolumnsthatareuniform.Thereforeweweightthecellby,whichvariesfrom0whenisuniformto1whenentropyisminimal.Wealsoapplythelogtransformationtofrequen-cies,.(Entropyiscalculatedwiththeoriginalfrequencyvalues,beforethelogtransformationisapplied.)Forallandall,replacetheoriginal

18

SimilarityofSemanticRelationsTurney

valueinbythenewvalue.ThisisaninstanceoftheTF-IDF(TermFrequency-InverseDocumentFrequency)familyoftransformations,whichisfamiliarininformationretrieval(SaltonandBuckley,1988;Baeza-YatesandRibeiro-Neto,1999):istheTFtermandistheIDFterm.

9.ApplySVD:Afterthelogandentropytransformationshavebeenappliedtothematrix,runSVDLIBC.SVDdecomposesamatrixintoaproductofthreematrices,whereandareincolumnorthonormalform(i.e.,thecolumnsareorthogonalandhaveunitlength:)andisadiagonalmatrixofsingularvalues(henceSVD)(GolubandVanLoan,1996).Ifisofrank,thenisalsoofrank.Let,where,bethediagonalmatrixformedfromthetopsingularvalues,andletand

bethematricesproducedbyselectingthecorrespondingcolumnsfromand.Thematrixisthematrixofrankthatbestapproximatestheoriginalmatrix,inthesensethatitminimizestheapproximationerrors.

Thatis,

minimizes

overallmatrices

ofrank,

wheredenotestheFrobeniusnorm(GolubandVanLoan,1996).Wemaythinkofthismatrixasa“smoothed”or“compressed”versionoftheoriginalmatrix.Inthesubsequentsteps,wewillbecalculatingcosinesforrowvectors.Forthispurpose,wecansimplifycalculationsbydropping.Thecosineoftwovectorsistheirdotproduct,aftertheyhavebeennormalizedtounitlength.Thematrixcontainsthedotproductsofalloftherowvectors.Wecanfindthedotproductofthe-thand-throwvectorsbylookingatthecellinrow,columnofthematrix.Since,wehave

,whichmeansthatwecan

calculatecosineswiththesmallermatrix,insteadofusing(Deerwesteretal.,1990).

10.Projection:Calculate

ofrowsas,butonly

(weuse).Thismatrixhasthesamenumbercolumns(insteadof

versionsof

Thereforewehave

:,theoriginalandversionsof:.

cosines(inourexperiments,thereare16cosines).Forexample,suppose:

isquart:volumeand:ismile:distance.Table10givesthecosinesforthesixteencombinations.

12.Calculaterelationalsimilarity:Therelationalsimilaritybetween:

istheaverageofthecosines,amongthe

and

:

ComputationalLinguisticsVolume1,Number1

WordpairsCosineCosine

originalpairs

1andmayhaveslippedthroughthefilteringinstep2.Averagingthecosines,asopposedtotakingtheirmaximum,isintendedtoprovidesomeresistancetonoise.Forquart:volumeandmile:distance,thethirdcolumninTable10showswhichalternatesareusedtocalculatetheaverage.Forthesetwopairs,theaverageoftheselectedcosinesis0.677.InTable7,weseethatpumping:volumehasslippedthroughthefilteringinstep2,althoughitisnotagoodalternateforquart:volume.However,Table10showsthatallfouranalogiesthatinvolvepumping:volumearedroppedhere,instep12.

Steps11and12canberepeatedforeachtwoinputpairsthataretobecompared.ThiscompletesthedescriptionofLRA.

Table11givesthecosinesforthesampleSATquestion.Thechoicepairwiththehighestaveragecosine(thechoicewiththelargestvalueincolumn#1),choice(b),isthesolutionforthisquestion;LRAanswersthequestioncorrectly.Forcomparison,column#2givesthecosinesfortheoriginalpairsandcolumn#3givesthehighestcosine.ForthisparticularSATquestion,thereisonechoicethathasthehighestcosineforallthreecolumns,choice(b),althoughthisisnottrueingeneral.Notethatthegapbetweenthefirstchoice(b)andthesecondchoice(d)islargestfortheaveragecosines(column#1).Thissuggeststhattheaverageofthecosines(column#1)isbetteratdiscriminatingthecorrectchoicethaneithertheoriginalcosine(column#2)orthehighestcosine(column#3).

6.ExperimentswithWordAnalogyQuestions

Thissectionpresentsvariousexperimentswith374multiple-choiceSATwordanalogyquestions.

20

SimilarityofSemanticRelationsTurney

Stem:quart:volume

Averagecosines#1Originalcosines#2Highestcosines#3

Solution:(b)mile:distance0.6770.5250.781

Algorithm

Veale(2004)

bestattributionalsimilarityrandomguessing

lowestco-occurrencefrequencyhighestco-occurrencefrequency

Precision42.835.020.016.811.8

Recall42.835.020.016.811.8

42.835.020.016.811.8

ComputationalLinguisticsVolume1,Number1

StepDescriptionTimeH:M:SHardware

Total209:49:36

SimilarityofSemanticRelationsTurney

AlgorithmCorrectIncorrectSkippedPrecisionRecall

ComparingVSM-AVtoVSM-WMTS,thesmallercorpushasreducedthescoreoftheVSM,butmuchofthedropisduetothelargernumberofquestionsthatwereskipped(34forVSM-WMTSversus5forVSM-AV).Withthesmallercorpus,manymoreoftheinputwordpairssimplydonotappeartogetherinshortphrasesinthecorpus.LRAisabletoanswerasmanyquestionsasVSM-AV,althoughitusesthesamecorpusasVSM-WMTS,becauseLin’sthesaurusallowsLRAtosubstitutesynonymsforwordsthatarenotinthecorpus.

VSM-AVrequired17daystoprocessthe374analogyquestions(TurneyandLittman,2005),comparedto9daysforLRA.AsacourtesytoAltaVista,TurneyandLittman(2005)insertedafiveseconddelaybetweeneachquery.SincetheWMTSisrunninglocally,thereisnoneedfordelays.VSM-WMTSprocessedthequestionsinonlyoneday.

6.3HumanPerformance

Theaverageperformanceofcollege-boundseniorhighschoolstudentsonverbalSATquestionscorrespondstoarecall(percentcorrect)ofabout57%(TurneyandLittman,2005).TheSATItestconsistsof78verbalquestionsand60mathquestions(thereisalsoanSATIItest,coveringspecificsubjects,suchaschemistry).Analogyquestionsareonlyasubsetofthe78verbalSATquestions.Ifweassumethatthedifficultyofour374analogyquestionsiscomparabletothedifficultyofthe78verbalSATIquestions,thenwecanestimatethattheaveragecollege-boundseniorwouldcorrectlyanswerabout57%ofthe374analogyquestions.

Ofour374SATquestions,190arefromacollectionoftenofficialSATtests(Claman,2000).Onthissubsetofthequestions,LRAhasarecallof61.1%,comparedtoarecallof51.1%ontheother184questions.The184questionsthatarenotfromClaman(2000)seemtobemoredifficult.ThisindicatesthatwemaybeunderestimatinghowwellLRAperforms,relativetocollege-boundseniorhighschoolstudents.Claman(2000)suggeststhattheanalogyquestionsmaybesomewhatharderthanotherverbalSATquestions,sowemaybeslightlyoverestimatingthemeanhumanscoreontheanalogyquestions.

Table15givesthe95%confidenceintervalsforLRA,VSM-AV,andVSM-WMTS,calculatedbytheBinomialExactTest(Agresti,1990).ThereisnosignificantdifferencebetweenLRAandhumanperformance,butVSM-AVandVSM-WMTSaresignificantlybelowhuman-levelperformance.

6.4VaryingtheParametersinLRA

ThereareseveralparametersintheLRAalgorithm(seeSection5.5).Theparametervaluesweredeterminedbytryingasmallnumberofpossiblevaluesonasmallsetofquestionsthatweresetaside.SinceLRAisintendedtobeanunsupervisedlearningalgorithm,wedidnotattempttotunetheparametervaluestomaximizetheprecisionandrecallonthe374SATquestions.WehypothesizedthatLRAisrelativelyinsensitivetothevaluesoftheparameters.

23

ComputationalLinguisticsVolume1,Number1

System

Recall(%correct)95%confidenceintervalforrecallHuman-level

(57%)

Table16showsthevariationintheperformanceofLRAastheparametervaluesareadjusted.Wetakethebaselineparametersettings(giveninSection5.5)andvaryeachparameter,oneatatime,whileholdingtheremainingparametersfixedattheirbaselinevalues.Noneoftheprecisionandrecallvaluesaresignificantlydifferentfromthebaseline,accordingtotheFisherExactTest(Agresti,1990),atthe95%confidencelevel.Thissupportsthehypothesisthatthealgorithmisnotsensitivetotheparametervalues.

AlthoughafullrunofLRAonthe374SATquestionstakesninedays,forsomeoftheparametersitispossibletoreusecacheddatafrompreviousruns.Welimitedtheexperimentswithbecausecachingwasnotashelpfulfortheseparameters,soexperimentingwiththemrequiredseveralweeks.

6.5AblationExperiments

Asmentionedintheintroduction,LRAextendstheVSMapproachofTurneyandLittman(2005)by(1)exploringvariationsontheanalogiesbyreplacingwordswithsynonyms(step1),(2)automaticallygeneratingconnectingpatterns(step4),and(3)smoothingthedatawithSVD(step9).Inthissubsection,weablateeachofthesethreecomponentstoassesstheircontributiontotheperformanceofLRA.Table17showstheresults.

WithoutSVD(comparecolumn#1to#2inTable17),performancedrops,butthedropisnotstatisticallysignificantwith95%confidence,accordingtotheFisherExactTest(Agresti,1990).However,wehypothesizethatthedropinperformancewouldbesignificantwithalargersetofwordpairs.Morewordpairswouldincreasethesamplesize,whichwoulddecreasethe95%confidenceinterval,whichwouldlikelyshowthatSVDismakingasignificantcontribution.Furthermore,morewordpairswouldincreasethematrixsize,whichwouldgiveSVDmoreleverage.Forexample,LandauerandDumais(1997)applySVDtoamatrixofof30,473columnsby60,768rows,butourmatrixhereis8,000columnsby17,232rows.WearecurrentlygatheringmoreSATquestions,totestthishypothesis.

Withoutsynonyms(comparecolumn#1to#3inTable17),recalldropssignificantly(from56.1%to49.5%),butthedropinprecisionisnotsignificant.Whenthesynonymcomponentisdropped,thenumberofskippedquestionsrisesfrom4to22,whichdemonstratesthevalueofthesynonymcomponentofLRAforcompensatingforsparsedata.

WhenbothSVDandsynonymsaredropped(comparecolumn#1to#4inTable17),thedecreaseinrecallissignificant,butthedecreaseinprecisionisnotsignificant.Again,webelievethatalargersamplesizewouldshowthedropinprecisionissignificant.

IfweeliminatebothsynonymsandSVDfromLRA,allthatdistinguishesLRAfrom

24

SimilarityofSemanticRelationsTurney

Parameter

Baseline

Value515461

Step11222224444

Precision54.254.155.856.254.356.854.355.958.457.058.1

Recall53.553.555.155.653.756.153.755.357.856.457.5

53.853.855.555.954.056.554.055.658.156.757.8

351000300050007000

LRAbaselinesystem#1

LRAnoSVD#2LRAnosynonyms

#3

LRAnoSVDnosynonyms

#4

VSM-WMTS

#5

25

ComputationalLinguisticsVolume1,Number1

VSM-WMTSisthepatterns(step4).TheVSMapproachusesafixedlistof64patternstogenerate128dimensionalvectors(TurneyandLittman,2005),whereasLRAusesadynamicallygeneratedsetof4,000patterns,resultingin8,000dimensionalvectors.WecanseethevalueoftheautomaticallygeneratedpatternsbycomparingLRAwithoutsynonymsandSVD(column#4)toVSM-WMTS(column#5).Thedifferenceinbothpre-cisionandrecallisstatisticallysignificantwith95%confidence,accordingtotheFisherExactTest(Agresti,1990).

Theablationexperimentssupportthevalueofthepatterns(step4)andsynonyms(step1)inLRA,butthecontributionofSVD(step9)hasnotbeenproven,althoughwebelievemoredatawillsupportitseffectiveness.Nonetheless,thethreecomponentstogetherresultina16%increasein(compare#1to#5).

6.6MatrixSymmetry

Weknowapriorithat,ifA:B::C:D,thenB:A::D:C.Forexample,“masonistostoneascarpenteristowood”implies“stoneistomasonaswoodistocarpenter”.Thereforeagoodmeasureofrelationalsimilarity,,shouldobeythefollowingequation:

(8)

Insteps5and6oftheLRAalgorithm(Section5.5),weensurethatthematrixissymmetrical,sothatequation(8)isnecessarilytrueforLRA.ThematrixisdesignedsothattherowvectorforA:BisdifferentfromtherowvectorforB:Aonlybyapermutationoftheelements.ThesamepermutationdistinguishestherowvectorsforC:DandD:C.ThereforethecosineoftheanglebetweenA:BandC:DmustbeidenticaltothecosineoftheanglebetweenB:AandD:C(seeequation(7)).

Todiscovertheconsequencesofthisdesigndecision,wealteredsteps5and6sothatsymmetryisnolongerpreserved.Instep5,foreachwordpairA:Bthatappearsintheinputset,weonlyhaveonerow.ThereisnorowforB:AunlessB:Aalsoappearsintheinputset.Thusthenumberofrowsinthematrixdroppedfrom17,232to8,616.

Instep6,wenolongerhavetwocolumnsforeachpattern,onefor“”andanotherfor“”.However,tobefair,wekeptthetotalnumberofcolumnsat8,000.Instep4,weselectedthetop8,000patterns(insteadofthetop4,000),distinguishingthepattern“”fromthepattern“”(in-steadofconsideringthemequivalent).Thusapatternwithahighfrequencyislikelytoappearintwocolumns,inbothpossibleorders,butalowerfrequencypatternmightappearinonlyonecolumn,inonlyonepossibleorder.

Thesechangesresultedinaslightdecreaseinperformance.Recalldroppedfrom56.1%to55.3%andprecisiondroppedfrom56.8%to55.9%.Thedecreaseisnotsta-tisticallysignificant.However,themodifiedalgorithmnolongerobeysequation(8).AlthoughdroppingsymmetryappearstocausenosignificantharmtotheperformanceofthealgorithmontheSATquestions,weprefertoretainsymmetry,toensurethatequation(8)issatisfied.

Notethat,ifA:B::C:D,itdoesnotfollowthatB:A::C:D.Forexample,itisfalsethat“stoneistomasonascarpenteristowood”.Ingeneral(exceptwhenthesemanticrelationsbetweenandaresymmetrical),wehavethefollowinginequality:

(9)

ThereforewedonotwantA:BandB:Atoberepresentedbyidenticalrowvectors,al-thoughitwouldensurethatequation(8)issatisfied.

26

SimilarityofSemanticRelationsTurney

CorrectIncorrectSkippedPrecisionRecall

6.7AllAlternatesversusBetterAlternates

Instep12ofLRA,therelationalsimilaritybetweencosines,amongthe

:

and

:

istheaverageofthe

ComputationalLinguisticsVolume1,Number1

vectorsisnottrivial.

6.9ManualPatternsversusAutomaticPatterns

TurneyandLittman(2005)used64manuallygeneratedpatternswhereasLRAuses4,000automaticallygeneratedpatterns.WeknowfromSection6.5thattheautomati-callygeneratedpatternsaresignificantlybetterthanthemanuallygeneratedpatterns.Itmaybeinterestingtoseehowmanyofthemanuallygeneratedpatternsappearwithintheautomaticallygeneratedpatterns.Ifwerequireanexactmatch,50ofthe64manualpatternscanbefoundintheautomaticpatterns.Ifwearelenientaboutwildcards,andcountthepattern“notthe”asmatching“*notthe”(forexample),then60ofthe64man-ualpatternsappearwithintheautomaticpatterns.Thissuggeststhattheimprovementinperformancewiththeautomaticpatternsisduetotheincreasedquantityofpatterns,ratherthanaqualitativedifferenceinthepatterns.

TurneyandLittman(2005)pointoutthatsomeoftheir64patternshavebeenusedbyotherresearchers.Forexample,Hearst(1992)usedthepattern“suchas”todis-coverhyponymsandBerlandandCharniak(1999)usedthepattern“ofthe”todiscovermeronyms.Bothofthesepatternsareincludedinthe4,000patternsautomaticallygen-eratedbyLRA.

ThenoveltyinTurneyandLittman(2005)isthattheirpatternsarenotusedtominetextforinstancesofwordpairsthatfitthepatterns(Hearst,1992;BerlandandCharniak,1999);instead,theyareusedtogatherfrequencydataforbuildingvectorsthatrepresenttherelationbetweenagivenpairofwords.TheresultsinSection6.8showthatavectorcontainsmoreinformationthananysinglepatternorsmallsetofpatterns;avectorisadistributedrepresentation.LRAisdistinctfromHearst(1992)andBerlandandChar-niak(1999)initsfocusondistributedrepresentations,whichitshareswithTurneyandLittman(2005),butLRAgoesbeyondTurneyandLittman(2005)byfindingpatternsautomatically.

RiloffandJones(1999)andYangarber(2003)alsofindpatternsautomatically,buttheirgoalistominetextforinstancesofwordpairs;thesamegoalasHearst(1992)andBerlandandCharniak(1999).BecauseLRAusespatternstobuilddistributedvectorrepresentations,itcanexploitpatternsthatwouldbemuchtoonoisyandunreliableforthekindoftextmininginstanceextractionthatistheobjectiveofHearst(1992),BerlandandCharniak(1999),RiloffandJones(1999),andYangarber(2003).ThereforeLRAcansimplyselectthehighestfrequencypatterns(step4inSection5.5);itdoesnotneedthemoresophisticatedselectionalgorithmsofRiloffandJones(1999)andYangarber(2003).7.ExperimentswithNoun-ModifierRelations

Thissectiondescribesexperimentswith600noun-modifierpairs,hand-labeledwith30classesofsemanticrelations(NastaseandSzpakowicz,2003).Inthefollowingexper-iments,LRAisusedwiththebaselineparametervalues,exactlyasdescribedinSec-tion5.5.NoadjustmentsweremadetotuneLRAtothenoun-modifierpairs.LRAisusedasadistance(nearness)measureinasinglenearestneighboursupervisedlearningalgorithm.

7.1ClassesofRelations

Thefollowingexperimentsusethe600labelednoun-modifierpairsofNastaseandSz-pakowicz(2003).ThisdatasetincludesinformationaboutthepartofspeechandWord-Netsynset(synonymset;i.e.,wordsensetag)ofeachword,butouralgorithmdoesnotusethisinformation.

Table19liststhe30classesofsemanticrelations.ThetableisbasedonAppendixA

28

SimilarityofSemanticRelationsTurney

ofNastaseandSzpakowicz(2003),withsomesimplifications.Theoriginaltablelistedseveralsemanticrelationsforwhichtherewerenoinstancesinthedataset.Thesewererelationsthataretypicallyexpressedwithlongerphrases(threeormorewords),ratherthannoun-modifierwordpairs.Forclarity,wedecidednottoincludetheserelationsinTable19.

Inthistable,representstheheadnounandrepresentsthemodifier.Forex-ample,in“fluvirus”,theheadnoun()is“virus”andthemodifier()is“flu”(*).InEnglish,themodifier(typicallyanounoradjective)usuallyprecedestheheadnoun.Inthedescriptionofpurpose,representsanarbitraryverb.In“concerthall”,thehallisforpresentingconcerts(is“present”)orholdingconcerts(is“hold”)().

NastaseandSzpakowicz(2003)organizedtherelationsintogroups.Thefivecap-italizedtermsinthe“Relation”columnofTable19arethenamesoffivegroupsofsemanticrelations.(Theoriginaltablehadasixthgroup,buttherearenoexamplesofthisgroupinthedataset.)Wemakeuseofthisgroupinginthefollowingexperiments.

7.2BaselineLRAwithSingleNearestNeighbour

Thefollowingexperimentsusesinglenearestneighbourclassificationwithleave-one-outcross-validation.Forleave-one-outcross-validation,thetestingsetconsistsofasin-glenoun-modifierpairandthetrainingsetconsistsofthe599remainingnoun-modifiers.Thedatasetissplit600times,sothateachnoun-modifiergetsaturnasthetestingwordpair.Thepredictedclassofthetestingpairistheclassofthesinglenearestneighbourinthetrainingset.Asthemeasureofnearness,weuseLRAtocalculatetherelationalsimilaritybetweenthetestingpairandthetrainingpairs.Thesinglenearestneighbouralgorithmisasupervisedlearningalgorithm(i.e.,itrequiresatrainingsetoflabeleddata),butweareusingLRAtomeasurethedistancebetweenapairanditspotentialneighbours,andLRAisitselfdeterminedinanunsupervisedfashion(i.e.,LRAdoesnotneedlabeleddata).

EachSATquestionhasfivechoices,soanswering374SATquestionsrequiredcal-culatingcosines.Thefactorof16comesfromthealternatepairs,step11inLRA.Withthenoun-modifierpairs,usingleave-one-outcross-validation,eachtestpairhas599choices,soanexhaustiveapplicationofLRAwouldrequirecalculat-ingcosines.Toreducetheamountofcomputationre-quired,wefirstfindthe30nearestneighboursforeachpair,ignoringthealternatepairs(cosines),andthenapplythefullLRA,includingthealternates,tojustthose30neighbours(cosines),whichrequirescalculatingonlycosines.

Thereare600wordpairsintheinputsetforLRA.Instep2,introducingalternatepairsmultipliesthenumberofpairsbyfour,resultingin2,400pairs.Instep5,foreachpair:,weadd:,yielding4,800pairs.However,somepairsaredroppedbecausetheycorrespondtozerovectorsandafewwordsdonotappearinLin’sthesaurus.Thesparsematrix(step7)has4,748rowsand8,000columns,withadensityof8.4%.

Following Turney and Littman (2005), we evaluate the performance by accuracy and also by the macroaveraged F measure (Lewis, 1991). Macroaveraging calculates the precision, recall, and F for each class separately, and then calculates the average across all classes. Microaveraging combines the true positive, false positive, and false negative counts for all of the classes, and then calculates precision, recall, and F from the combined counts. Macroaveraging gives equal weight to all classes, but microaveraging gives more weight to larger classes. We use macroaveraging (giving equal weight to all classes), because we have no reason to believe that the class sizes in the data set reflect the actual distribution of the classes in a real corpus.
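A minimal sketch of the distinction, given per-class true positive, false positive, and false negative counts:

    def f_measure(tp, fp, fn):
        # F is the harmonic mean of precision and recall.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    def macro_f(counts):
        # counts maps each class to its (tp, fp, fn) triple.
        # Macroaveraging: compute F per class, then average the F values.
        return sum(f_measure(*c) for c in counts.values()) / len(counts)

    def micro_f(counts):
        # Microaveraging: pool the counts, then compute a single F value,
        # which weights larger classes more heavily.
        tp = sum(c[0] for c in counts.values())
        fp = sum(c[1] for c in counts.values())
        fn = sum(c[2] for c in counts.values())
        return f_measure(tp, fp, fn)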

Table 19. Classes of semantic relations, based on Appendix A of Nastase and Szpakowicz (2003). H is the head noun, M is the modifier, and V is an arbitrary verb. Rows that are not legible in this copy are marked with ellipses.

Relation           Abbr.  Example phrase     Description
CAUSALITY
  cause            cs     flu virus (*)      H makes M occur or exist, H is necessary and sufficient
  effect           eff    exam anxiety       M makes H occur or exist, M is necessary and sufficient
  purpose          prp    concert hall       H is for V-ing M, M does not necessarily occur or exist
  detraction       detr   headache pill      H opposes M, H is not sufficient to prevent M
TEMPORALITY
  frequency        freq   daily exercise     H occurs every time M occurs
  time at          tat    morning exercise   H occurs when M occurs
  time through     tthr   six-hour meeting   H existed while M existed, M is an interval of time
SPATIAL
  direction        dir    outgoing mail      H is directed towards M, M is not the final point
  location         loc    home town          H is the location of M
  location at      lat    desert storm       H is located at M
  location from    lfr    foreign capital    H originates at M
PARTICIPANT
  agent            ag     student protest    M performs H, M is animate or natural phenomenon
  beneficiary      ben    student discount   M benefits from H
  instrument       inst   laser printer      H uses M
  object           obj    metal separator    M is acted upon by H
  object property  ...    ...                ...
QUALITY
  ...              ...    ...                ...

Classification with 30 distinct classes is a hard problem. To make the task easier, we can collapse the 30 classes to 5 classes, using the grouping that is given in Table 19. For example, agent and beneficiary both collapse to PARTICIPANT. On the 30 class problem, LRA with the single nearest neighbour algorithm achieves an accuracy of 39.8% (239/600) and a macroaveraged F of 36.6%. Always guessing the majority class would result in an accuracy of 8.2% (49/600). On the 5 class problem, the accuracy is 58.0% (348/600) and the macroaveraged F is 54.6%. Always guessing the majority class would give an accuracy of 43.3% (260/600). For both the 30 class and 5 class problems, LRA's accuracy is significantly higher than guessing the majority class, with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).

Table 20. Performance of VSM-AV, VSM-WMTS, and LRA on the 30 class problem (values not legible in this copy).

Table 21. Performance of VSM-AV, VSM-WMTS, and LRA on the 5 class problem (values not legible in this copy).
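The collapse from 30 classes to 5 groups and the majority-class baseline are simple to express in code; a minimal sketch follows, in which the GROUP dictionary is abbreviated to a few of the 30 relations.

    from collections import Counter

    # Abbreviated mapping from relations to their groups; the full
    # dictionary would cover all 30 relations in Table 19.
    GROUP = {"cause": "CAUSALITY", "frequency": "TEMPORALITY",
             "direction": "SPATIAL", "agent": "PARTICIPANT",
             "beneficiary": "PARTICIPANT"}

    def collapse_to_groups(labels):
        # Map each 30-class label to its 5-class group label.
        return [GROUP[label] for label in labels]

    def majority_class_accuracy(labels):
        # Accuracy of always guessing the most frequent class:
        # 49/600 = 8.2% for the 30 classes, 260/600 = 43.3% for the 5 groups.
        return Counter(labels).most_common(1)[0][1] / len(labels)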

7.3 LRA versus VSM

Table 20 shows the performance of LRA and VSM on the 30 class problem. VSM-AV is VSM with the AltaVista corpus and VSM-WMTS is VSM with the WMTS corpus. The results for VSM-AV are taken from Turney and Littman (2005). All three pairwise differences in the three F measures are statistically significant at the 95% level, according to the Paired T-Test (Feelders and Verkooijen, 1995). The accuracy of LRA is significantly higher than the accuracies of VSM-AV and VSM-WMTS, according to the Fisher Exact Test (Agresti, 1990), but the difference between the two VSM accuracies is not significant.

Table 21 compares the performance of LRA and VSM on the 5 class problem. The accuracy and F measure of LRA are significantly higher than the accuracies and F measures of VSM-AV and VSM-WMTS, but the differences between the two VSM accuracies and F measures are not significant.
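As an illustration of this kind of significance test, the sketch below compares the accuracies of two systems with the Fisher Exact Test using scipy. The 239 correct answers for the first system match LRA's 30 class result above; the second system's counts are hypothetical.

    from scipy.stats import fisher_exact

    # 2x2 contingency table: rows are systems, columns are
    # correct/incorrect counts out of 600 test pairs.
    table = [[239, 361],   # system A: 239 correct, 361 incorrect
             [171, 429]]   # system B: hypothetical counts

    odds_ratio, p_value = fisher_exact(table)
    # The difference is significant at the 95% level when p_value < 0.05.
    print(p_value)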


8. Discussion

The experimental results in Sections 6 and 7 demonstrate that LRA performs significantly better than the VSM, but it is also clear that there is room for improvement. The accuracy might not yet be adequate for practical applications, although past work has shown that it is possible to adjust the tradeoff of precision versus recall (Turney and Littman, 2005). For some of the applications, such as information extraction, LRA might be suitable if it is adjusted for high precision, at the expense of low recall.

Another limitation is speed; it took almost nine days for LRA to answer 374 analogy questions. However, with progress in computer hardware, speed will gradually become less of a concern. Also, the software has not been optimized for speed; there are several places where the efficiency could be increased and many operations are parallelizable. It may also be possible to precompute much of the information for LRA, although this would require substantial changes to the algorithm.

The difference in performance between VSM-AV and VSM-WMTS shows that VSM is sensitive to the size of the corpus. Although LRA is able to surpass VSM-AV when the WMTS corpus is only about one tenth the size of the AV corpus, it seems likely that LRA would perform better with a larger corpus. The WMTS corpus requires one terabyte of hard disk space, but progress in hardware will likely make ten or even one hundred terabytes affordable in the relatively near future.

For noun-modifier classification, more labeled data should yield performance improvements. With 600 noun-modifier pairs and 30 classes, the average class has only 20 examples. We expect that the accuracy would improve substantially with five or ten times more examples. Unfortunately, it is time consuming and expensive to acquire hand-labeled data.

Another issue with noun-modifier classification is the choice of classification scheme for the semantic relations. The 30 classes of Nastase and Szpakowicz (2003) might not be the best scheme. Other researchers have proposed different schemes (Vanderwende, 1994; Barker and Szpakowicz, 1998; Rosario and Hearst, 2001; Rosario, Hearst, and Fillmore, 2002). It seems likely that some schemes are easier for machine learning than others. For some applications, 30 classes may not be necessary; the 5 class scheme may be sufficient.

LRA, like VSM, is a corpus-based approach to measuring relational similarity. Past work suggests that a hybrid approach, combining multiple modules, some corpus-based, some lexicon-based, will surpass any purebred approach (Turney et al., 2003). In future work, it would be natural to combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004), perhaps using the combination method of Turney et al. (2003).

The Singular Value Decomposition is only one of many methods for handling sparse, noisy data. We have also experimented with Nonnegative Matrix Factorization (NMF) (Lee and Seung, 1999), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), Kernel Principal Components Analysis (KPCA) (Schölkopf, Smola, and Müller, 1997), and Iterative Scaling (IS) (Ando, 2000). We had some interesting results with small matrices (around 2,000 rows by 1,000 columns), but none of these methods seemed substantially better than SVD and none of them scaled up to the matrix sizes we are using here (e.g., 17,232 rows and 8,000 columns; see Section 6.1).
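For readers who wish to experiment along these lines, the sketch below smooths a sparse matrix with truncated SVD and with NMF using scikit-learn. The random matrix, its dimensions, and the choice of 300 components are our illustrative assumptions, not the setup used in the experiments reported here.

    import numpy as np
    from scipy.sparse import rand as sparse_rand
    from sklearn.decomposition import TruncatedSVD, NMF

    # A random sparse matrix standing in for a pattern-frequency matrix
    # (the small-matrix experiments used roughly this size).
    X = sparse_rand(2000, 1000, density=0.084, format="csr", random_state=0)

    # Truncated SVD: project the rows onto the top singular vectors.
    svd = TruncatedSVD(n_components=300, random_state=0)
    X_svd = svd.fit_transform(X)

    # NMF: an alternative low-rank factorization with nonnegative factors.
    nmf = NMF(n_components=300, init="nndsvd", max_iter=200)
    X_nmf = nmf.fit_transform(X)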

In step 4 of LRA, we simply select the most frequent patterns and discard the rest. This is a simple selection method, but it is possible that future work will find a method that yields significant improvement in performance.

9. Conclusion

This paper has introduced a new method for calculating relational similarity, Latent Relational Analysis. The experiments demonstrate that LRA performs better than the VSM approach, when evaluated with SAT word analogy questions and with the task of classifying noun-modifier expressions. The VSM approach represents the relation between a pair of words with a vector, in which the elements are based on the frequencies of 64 hand-built patterns in a large corpus. LRA extends this approach in three ways: (1) the patterns are generated dynamically from the corpus, (2) SVD is used to smooth the data, and (3) a thesaurus is used to explore variations of the word pairs. With the WMTS corpus (about 5 × 10^10 English words), LRA achieves an F of 56.5%, whereas the F of VSM is 40.3%.

We have presented several examples of the many potential applications for measures of relational similarity. Just as attributional similarity measures have proven to have many practical uses, we expect that relational similarity measures will soon become widely used. Gentner et al. (2001) argue that relational similarity is essential to understanding novel metaphors (as opposed to conventional metaphors). Many researchers have argued that metaphor is the heart of human thinking (Lakoff and Johnson, 1980; Hofstadter and the Fluid Analogies Research Group, 1995; Gentner et al., 2001; French, 2002). We believe that relational similarity plays a fundamental role in the mind and therefore relational similarity measures could be crucial for artificial intelligence.

In future work, we plan to investigate some potential applications for LRA. It is possible that the error rate of LRA is still too high for practical applications, but the fact that LRA matches average human performance on SAT analogy questions is encouraging.

Acknowledgments

Thanks to Michael Littman for sharing the 374 SAT analogy questions and for inspiring me to tackle them. Thanks to Vivi Nastase and Stan Szpakowicz for sharing their 600 classified noun-modifier phrases. Thanks to Egidio Terra, Charlie Clarke, and the School of Computer Science of the University of Waterloo, for giving us a copy of the Waterloo MultiText System and their Terabyte Corpus. Thanks to Dekang Lin for making his Dependency-Based Word Similarity lexicon available online. Thanks to Doug Rohde for SVDLIBC and Michael Berry for SVDPACK. Thanks to Ted Pedersen for making his WordNet::Similarity package available. Thanks to Joel Martin for comments on the paper. Thanks to the anonymous reviewers of Computational Linguistics for their very helpful comments and suggestions.

References

Agresti, Alan. 1990. Categorical Data Analysis. Wiley.

Ando, Rie Kubota. 2000. Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2000), pages 216–223.

Baeza-Yates, Ricardo A. and Berthier A. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press.

Banerjee, Satanjeev and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 805–810, Acapulco, Mexico.

Barker, Ken and Stan Szpakowicz. 1998. Semi-automatic recognition of noun modifier relationships. In Christian Boitet and Pete Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics (COLING-ACL '98), pages 96–102, San Francisco, California. Morgan Kaufmann Publishers.

Berland, Matthew and Eugene Charniak. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL '99), pages 57–64, New Brunswick, NJ.

Berry, Michael W. 1992. Large scale singular value computations. International Journal of Supercomputer Applications, 6(1):13–49.

Budanitsky, Alexander and Graeme Hirst. 2001. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pages 29–34, Pittsburgh, PA.

Chiarello, Christine, Curt Burgess, Lorie Richards, and Alma Pollock. 1990. Semantic and associative priming in the cerebral hemispheres: Some words do, some words don't ... sometimes, some places. Brain and Language, 38:75–104.

Claman, Cathy. 2000. 10 Real SATs. College Entrance Examination Board.

Clarke, Charles L. A., Gordon V. Cormack, and Christopher R. Palmer. 1998. An overview of MultiText. ACM SIGIR Forum, 32(2):14–15.

Daganzo, Carlos F. 1994. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269–287.

Deerwester, Scott C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science (JASIS), 41(6):391–407.

Dolan, William B. 1995. Metaphor as an emergent property of machine-readable dictionaries. In Proceedings of the AAAI 1995 Spring Symposium Series: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity, pages 27–32.

Dumais, Susan T. 1990. Enhancing performance in latent semantic indexing (LSI) retrieval. Technical Report TM-ARH-017527, Bellcore, Morristown, NJ.

Dumais, Susan T. 1993. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), pages 105–115. National Institute of Standards and Technology.

Falkenhainer, Brian. 1990. Analogical interpretation in context. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 69–76. Lawrence Erlbaum Associates.

Falkenhainer, Brian, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1):1–63.

Federici, Stefano, Simonetta Montemagni, and Vito Pirrelli. 1997. Inferring semantic similarity from distributional evidence: An analogy-based approach to word sense disambiguation. In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 90–97, Madrid, Spain.

Feelders, Ad and William Verkooijen. 1995. Which method learns the most from data? Methodological issues in the analysis of comparative studies. In Fifth International Workshop on Artificial Intelligence and Statistics, pages 219–225, Ft. Lauderdale, Florida.

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

French, Robert M. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Gentner, Dedre. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Gentner, Dedre, Brian Bowdle, Phillip Wolff, and Consuelo Boronat. 2001. Metaphor is like analogy. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 199–253, Cambridge, MA. MIT Press.

Gildea, Daniel and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Girju, Roxana, Adriana Badulescu, and Dan I. Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 80–87.

Goldenberg, David. 2005. The emperor's new clothes: Undressing the new and unimproved SAT. Gelf Magazine, March. http://www.gelfmagazine.com/mt/archives/thenew

[Author, year, and title not legible in this copy.] In Proceedings of the Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '86), pages 186–193, Pisa, Italy.

Hearst, Marti A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 539–545, Nantes, France.

Hirst, Graeme and David St-Onge. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 305–332. MIT Press.

Hofmann, Thomas. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual ACM Conference on Research and Development in Information Retrieval (SIGIR '99), pages 50–57, Berkeley, California, August.

Hofstadter, Douglas and the Fluid Analogies Research Group. 1995. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. Basic Books, New York, NY.

Jarmasz, Mario and Stan Szpakowicz. 2003. Roget's thesaurus and semantic similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 212–219, Borovets, Bulgaria.

Jiang, Jay J. and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), pages 19–33, Taipei, Taiwan.

Kurtz, Stanley. 2002. Testing debate. National Review Magazine, August. http://www.nationalreview.com/kurtz/kurtz082102.asp.

Lakoff, George and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press, Chicago, IL.

Landauer, Thomas K. and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Lapata, Mirella and Frank Keller. 2004. The Web as a baseline: Evaluating the performance of unsupervised Web-based models for a range of NLP tasks. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2004), pages 121–128.

Lauer, Mark. 1995. Designing Statistical Language Learners: Experiments on Compound Nouns. Ph.D. thesis, Macquarie University.

Leacock, Claudia and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press.

Lee, Daniel D. and H. Sebastian Seung. 1999. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791.

Lesk, Michael E. 1969. Word-word associations in document retrieval systems. American Documentation, 20(1):27–38.

Lesk, Michael E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC '86, pages 24–26.

Lewis, David D. 1991. Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop, pages 312–318, Asilomar, CA. Morgan Kaufmann.

Lin, Dekang. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL '98), pages 768–774, Montreal, Canada.

Lin, Dekang. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML '98), pages 296–304. Morgan Kaufmann, San Francisco, CA.

Marx, Zvika, Ido Dagan, Joachim Buhmann, and Eli Shamir. 2002. Coupled clustering: A method for detecting structural correspondence. Journal of Machine Learning Research, 3:747–780.

Medin, Douglas L., Robert L. Goldstone, and Dedre Gentner. 1990. Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1(1):64–69.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pages 60–67, Boston, MA.

Morris, Jane and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48.

Nastase, Vivi and Stan Szpakowicz. 2003. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, The Netherlands.

Pantel, Patrick and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613–619.

Rada, Roy, Hafedh Mili, Ellen Bicknell, and Maria Blettner. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30.

Rehder, Bob, M. E. Schreiner, Michael B. W. Wolfe, Darrell Laham, Thomas K. Landauer, and Walter Kintsch. 1998. Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354.

Reitman, Walter R. 1965. Cognition and Thought: An Information Processing Approach. John Wiley and Sons, New York, NY.

Resnik, Philip. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453, San Mateo, CA. Morgan Kaufmann.

Riloff, Ellen and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474–479.

Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82–90.

Rosario, Barbara, Marti Hearst, and Charles Fillmore. 2002. The descent of hierarchy, and selection in relational semantics. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pages 417–424, Philadelphia, PA.

Ruge, Gerda. 1992. Experiments on linguistically-based term associations. Information Processing and Management, 28(3):317–332.

Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.

Salton, Gerard and Chris Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523.

Salton, Gerard and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.

Schölkopf, Bernhard, Alexander J. Smola, and Klaus-Robert Müller. 1997. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997), pages 583–588, Berlin.

Terra, Egidio and Charles L. A. Clarke. 2003. Frequency estimates for statistical word similarity measures. In Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pages 244–251.

Turney, Peter D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning, pages 491–502, Berlin. Springer.

Turney, Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pages 417–424.

Turney, Peter D. 2005. Measuring semantic similarity by latent relational analysis. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1136–1141, Edinburgh, Scotland.

Turney, Peter D. and Michael L. Littman. 2005. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278.

Turney, Peter D., Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the Fifteenth International Conference on Computational Linguistics, pages 782–788, Kyoto, Japan.

Veale, Tony. 2003. The analogical thesaurus. In Proceedings of the 15th Innovative Applications of Artificial Intelligence Conference (IAAI 2003), pages 137–142, Acapulco, Mexico.

Veale, Tony. 2004. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain.

Yangarber, Roman. 2003. Counter-training in discovery of semantic patterns. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003), pages 343–350, Sapporo, Japan.

Yarowsky, David. 1993. One sense per collocation. In Proceedings of the ARPA Human Language Technology Workshop, pages 266–271, Princeton, NJ.

Zelenko, Dmitry, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.
