Peter D. Turney
National Research Council Canada
There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

1. Introduction
There are at least two kinds of similarity. Attributional similarity is correspondence between attributes and relational similarity is correspondence between relations (Medin, Goldstone, and Gentner, 1990). When two words have a high degree of attributional similarity, we call them synonyms. When two word pairs have a high degree of relational similarity, we say they are analogous.
Verbal analogies are often written in the form A:B::C:D, meaning A is to B as C is to D; for example, traffic:street::water:riverbed. Traffic flows over a street; water flows over a riverbed. A street carries traffic; a riverbed carries water. There is a high degree of relational similarity between the word pair traffic:street and the word pair water:riverbed. In fact, this analogy is the basis of several mathematical theories of traffic flow (Daganzo, 1994).
In Section 2, we look more closely at the connections between attributional and relational similarity. In analogies such as mason:stone::carpenter:wood, it seems that relational similarity can be reduced to attributional similarity, since mason and carpenter are attributionally similar, as are stone and wood. In general, this reduction fails. Consider the analogy traffic:street::water:riverbed. Traffic and water are not attributionally similar. Street and riverbed are only moderately attributionally similar.
Computational Linguistics    Volume 1, Number 1
Many algorithms have been proposed for measuring the attributional similarity between two words (Lesk, 1969; Resnik, 1995; Landauer and Dumais, 1997; Jiang and Conrath, 1997; Lin, 1998b; Turney, 2001; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003). Measures of attributional similarity have been studied extensively, due to their applications in problems such as recognizing synonyms (Landauer and Dumais, 1997), information retrieval (Deerwester et al., 1990), determining semantic orientation (Turney, 2002), grading student essays (Rehder et al., 1998), measuring textual cohesion (Morris and Hirst, 1991), and word sense disambiguation (Lesk, 1986).
On the other hand, since measures of relational similarity are not as well developed as measures of attributional similarity, the potential applications of relational similarity are not as well known. Many problems that involve semantic relations would benefit from an algorithm for measuring relational similarity. We discuss related problems in natural language processing, information retrieval, and information extraction in more detail in Section 3.
This paper builds on the Vector Space Model (VSM) of information retrieval. Given a query, a search engine produces a ranked list of documents. The documents are ranked in order of decreasing attributional similarity between the query and each document. Almost all modern search engines measure attributional similarity using the VSM (Baeza-Yates and Ribeiro-Neto, 1999). Turney and Littman (2005) adapted the VSM approach to measuring relational similarity. They used a vector of frequencies of patterns in a corpus to represent the relation between a pair of words. Section 4 presents the VSM approach to measuring similarity.
In Section 5, we present an algorithm for measuring relational similarity, which we call Latent Relational Analysis (LRA). The algorithm learns from a large corpus of unlabeled, unstructured text, without supervision. LRA extends the VSM approach of Turney and Littman (2005) in three ways: (1) The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns. (2) Singular Value Decomposition (SVD) is used to smooth the frequency data. (3) Given a word pair such as traffic:street, LRA considers transformations of the word pair, generated by replacing one of the words with synonyms, such as traffic:road or traffic:highway.
Section 6 presents our experimental evaluation of LRA with a collection of 374 multiple-choice word analogy questions from the SAT college entrance exam. An example of a typical SAT question appears in Table 1. In the educational testing literature, the first pair (mason:stone) is called the stem of the analogy. The correct choice is called the solution and the incorrect choices are distractors. We evaluate LRA by testing its ability to select the solution and avoid the distractors. The average performance of college-bound senior high school students on verbal SAT questions corresponds to an accuracy of about 57%. LRA achieves an accuracy of about 56%. On these same questions, the VSM attained 47%.
One application for relational similarity is classifying semantic relations in noun-modifier pairs (Turney and Littman, 2005). In Section 7, we evaluate the performance of LRA with a set of 600 noun-modifier pairs from Nastase and Szpakowicz (2003). The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the head noun (printer) and the modifier (laser). The 600 pairs have been manually labeled with 30 classes of semantic relations. For example, "laser printer" is classified as instrument; the printer uses the laser as an instrument for printing.
Similarity of Semantic Relations    Turney
Table 1
Stem: mason:stone
Solution: (b) carpenter:wood
The amount of attributional similarity between two words, A and B, depends on the degree of correspondence between the properties of A and B. A measure of attributional similarity is a function that maps two words, A and B, to a real number, sim_a(A, B). The more correspondence there is between the properties of A and B, the greater their attributional similarity. For example, dog and wolf have a relatively high degree of attributional similarity.
The amount of relational similarity between two pairs of words, A:B and C:D, depends on the degree of correspondence between the relations between A and B and the relations between C and D. A measure of relational similarity is a function that maps two pairs, A:B and C:D, to a real number, sim_r(A:B, C:D). The more correspondence there is between the relations of A:B and C:D, the greater their relational similarity. For example, dog:bark and cat:meow have a relatively high degree of relational
similarity.
Cognitive scientists distinguish words that are semantically associated (bee–honey) from words that are semantically similar (deer–pony), although they recognize that some words are both associated and similar (doctor–nurse) (Chiarello et al., 1990). Both of these are types of attributional similarity, since they are based on correspondence between attributes (e.g., bees and honey are both found in hives; deer and ponies are both mammals).
Budanitsky and Hirst (2001) describe semantic relatedness as follows:
Recent research on the topic in computational linguistics has emphasized the perspective of semantic relatedness of two lexemes in a lexical resource, or its inverse, semantic distance. It's important to note that semantic relatedness is a more general concept than similarity; similar entities are usually assumed to be related by virtue of their likeness (bank-trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car-wheel) and antonymy (hot-cold), or just by any kind of functional relationship or frequent association (pencil-paper, penguin-Antarctica).
As these examples show, semantic relatedness is the same as attributional similarity (e.g., hot and cold are both kinds of temperature, pencil and paper are both used for writing). Here we prefer to use the term attributional similarity, because it emphasizes the contrast with relational similarity. The term semantic relatedness may lead to confusion when the term relational similarity is also under discussion.
Resnik (1995) describes semantic similarity as follows:
Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar. Rada et al. (1989) suggest that the assessment of similarity in semantic networks can in fact be thought of as involving just taxonomic (IS-A) links, to the exclusion of other link types; that view will also be taken here, although admittedly it excludes some potentially useful information.
Thus semantic similarity is a specific type of attributional similarity. The term semantic similarity is misleading, because it refers to a type of attributional similarity, yet relational similarity is not any less semantic than attributional similarity.
To avoid confusion, we will use the terms attributional similarity and relational similarity, following Medin, Goldstone, and Gentner (1990). Instead of semantic similarity (Resnik, 1995) or semantically similar (Chiarello et al., 1990), we prefer the term taxonomical similarity, which we take to be a specific type of attributional similarity. We interpret synonymy as a high degree of attributional similarity. Analogy is a high degree of relational similarity.
2.2 Measuring Attributional Similarity
Algorithms for measuring attributional similarity can be lexicon-based (Lesk, 1986; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003), corpus-based (Lesk, 1969; Landauer and Dumais, 1997; Lin, 1998a; Turney, 2001), or a hybrid of the two (Resnik, 1995; Jiang and Conrath, 1997; Turney et al., 2003). Intuitively, we might expect that lexicon-based algorithms would be better at capturing synonymy than corpus-based algorithms, since lexicons, such as WordNet, explicitly provide synonymy information that is only implicit in a corpus. However, experiments do not support this intuition.
Several algorithms have been evaluated using 80 multiple-choice synonym questions taken from the Test of English as a Foreign Language (TOEFL). An example of one of the 80 TOEFL questions appears in Table 2. Table 3 shows the best performance on the TOEFL questions for each type of attributional similarity algorithm. The results support the claim that lexicon-based algorithms have no advantage over corpus-based algorithms for recognizing synonymy.

Table 2
Stem: levied
Solution: (a) imposed

Table 3
Performance of attributional similarity measures on the 80 TOEFL questions. (The average non-English US college applicant's performance is included in the bottom row, for comparison.)

Jarmasz and Szpakowicz (2003)    best lexicon-based algorithm    78.75
Terra and Clarke (2003)          best corpus-based algorithm     81.25
Turney et al. (2003)             best hybrid algorithm           97.50
average non-English US college applicant                         64.50
2.3 Using Attributional Similarity to Solve Analogies
We may distinguish near analogies (mason:stone::carpenter:wood) from far analogies (traffic:street::water:riverbed) (Gentner, 1983; Medin, Goldstone, and Gentner, 1990). In an analogy A:B::C:D, where there is a high degree of relational similarity between A:B and C:D, if there is also a high degree of attributional similarity between A and C, and between B and D, then A:B::C:D is a near analogy; otherwise, it is a far analogy.
It seems possible that SAT analogy questions might consist largely of near analogies, in which case they can be solved using attributional similarity measures. We could score each candidate analogy by the average of the attributional similarity, sim_a, between A and C and between B and D:

score(A:B::C:D) = (sim_a(A, C) + sim_a(B, D)) / 2    (2)
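If SAT questions were mostly near analogies, this scoring rule could be implemented directly. A minimal sketch, assuming toy hand-built attribute sets and a Jaccard-overlap stand-in for sim_a (neither is from the paper):

```python
# Tiny attribute sets invented for illustration; a real system would use
# a corpus- or lexicon-based measure of attributional similarity.
ATTRIBUTES = {
    "mason":     {"person", "tradesperson", "builds"},
    "carpenter": {"person", "tradesperson", "builds"},
    "stone":     {"material", "hard", "natural"},
    "wood":      {"material", "hard", "natural"},
    "traffic":   {"flow", "vehicles"},
    "water":     {"flow", "liquid", "natural"},
    "street":    {"path", "paved", "urban"},
    "riverbed":  {"path", "natural"},
}

def sim_a(x, y):
    """Toy stand-in for sim_a: Jaccard overlap of attribute sets."""
    ax, ay = ATTRIBUTES[x], ATTRIBUTES[y]
    return len(ax & ay) / len(ax | ay)

def analogy_score(a, b, c, d):
    """Equation (2): the average of sim_a(A, C) and sim_a(B, D)."""
    return (sim_a(a, c) + sim_a(b, d)) / 2

near = analogy_score("mason", "stone", "carpenter", "wood")
far = analogy_score("traffic", "street", "water", "riverbed")
# The far analogy scores low even though it is a good analogy, which is
# why attributional similarity alone cannot solve all SAT questions.
```

The far analogy illustrates the limitation discussed above: traffic:street::water:riverbed is a good analogy, yet the attributional overlap of its terms is small, so equation (2) ranks it poorly.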
Algorithm                    Type                Precision    Recall    F
Turney and Littman (2005)    relational (VSM)    47.7         47.1      47.4
random                       random              20.0         20.0      20.0
See http://www.d.umn.edu/tpederse/similarity.html.
was first attempted by a system called Argus (Reitman, 1965), using a small hand-built semantic network. Argus could only solve the limited set of analogy questions that its programmer had anticipated. Argus was based on a spreading activation model and did not explicitly attempt to measure relational similarity.
Turney et al. (2003) combined 13 independent modules to answer SAT questions. The final output of the system was based on a weighted combination of the outputs of each individual module. The best of the 13 modules was the VSM, which is described in detail in Turney and Littman (2005). The VSM was evaluated on a set of 374 SAT questions, achieving a score of 47%.
In contrast with the corpus-based approach of Turney and Littman (2005), Veale (2004) applied a lexicon-based approach to the same 374 SAT questions, attaining a score of 43%. Veale evaluated the quality of a candidate analogy A:B::C:D by looking for paths in WordNet, joining A to B and C to D. The quality measure was based on the similarity between the A:B paths and the C:D paths.
Turney (2005) introduced Latent Relational Analysis (LRA), an enhanced version of the VSM approach, which reached 56% on the 374 SAT questions. Here we go beyond Turney (2005) by describing LRA in more detail, performing more extensive experiments, and analyzing the algorithm and related work in more depth.
3.2 Structure Mapping Theory
French (2002) cites Structure Mapping Theory (SMT) (Gentner, 1983) and its implementation in the Structure Mapping Engine (SME) (Falkenhainer, Forbus, and Gentner, 1989) as the most influential work on modeling of analogy-making. The goal of computational modeling of analogy-making is to understand how people form complex, structured analogies. SME takes representations of a source domain and a target domain, and produces an analogical mapping between the source and target. The domains are given structured propositional representations, using predicate logic. These descriptions include attributes, relations, and higher-order relations (expressing relations between relations). The analogical mapping connects source domain relations to target domain relations.
For example, there is an analogy between the solar system and Rutherford's model of the atom (Falkenhainer, Forbus, and Gentner, 1989). The solar system is the source domain and Rutherford's model of the atom is the target domain. The basic objects in the source model are the planets and the sun. The basic objects in the target model are the electrons and the nucleus. The planets and the sun have various attributes, such as mass(sun) and mass(planet), and various relations, such as revolve(planet, sun) and attracts(sun, planet). Likewise, the nucleus and the electrons have attributes, such as charge(electron) and charge(nucleus), and relations, such as revolve(electron, nucleus) and attracts(nucleus, electron). SME maps revolve(planet, sun) to revolve(electron, nucleus) and attracts(sun, planet) to attracts(nucleus, electron).
Each individual connection (e.g., from revolve(planet, sun) to revolve(electron, nucleus)) in an analogical mapping implies that the connected relations are similar; thus, SMT requires a measure of relational similarity, in order to form maps. Early versions of SME only mapped identical relations, but later versions of SME allowed similar, non-identical relations to match (Falkenhainer, 1990). However, the focus of research in analogy-making has been on the mapping process as a whole, rather than measuring the similarity between any two particular relations, hence the similarity measures used in SME at the level of individual connections are somewhat rudimentary.
We believe that a more sophisticated measure of relational similarity, such as LRA, may enhance the performance of SME. Likewise, the focus of our work here is on the similarity between particular relations, and we ignore systematic mapping between sets of relations.
Metaphorical sentence                     SAT-style verbal analogy
I demolished his argument.                building:demolish::argument:refute
You need to budget your time.             money:budget::time:schedule
I've invested a lot of time in her.       money:invest::time:allocate
My mind just isn't operating today.       machine:operate::mind:think
Life has cheated me.                      charlatan:cheat::life:disappoint
Inflation is eating up our profits.       animal:eat::inflation:reduce
extent. Barker and Szpakowicz (1998) tried a corpus-based approach that explicitly used a measure of relational similarity, but their measure was based on literal matching, which limited its ability to generalize. Moldovan et al. (2004) also used a measure of relational similarity, based on mapping each noun and modifier into semantic classes in WordNet. The noun-modifier pairs were taken from a corpus and the surrounding context in the corpus was used in a word sense disambiguation algorithm, to improve the mapping of the noun and modifier into WordNet. Turney and Littman (2005) used the VSM (as a component in a single nearest neighbour learning algorithm) to measure relational similarity. We take the same approach here, substituting LRA for the VSM, in Section 7.
Lauer (1995) used a corpus-based approach (using the BNC) to paraphrase noun-modifier pairs, by inserting the prepositions of, for, in, at, on, from, with, and about. For example, reptile haven was paraphrased as haven for reptiles. Lapata and Keller (2004) achieved improved results on this task, by using the database of AltaVista's search engine as a corpus.
3.5 Word Sense Disambiguation
We believe that the intended sense of a polysemous word is determined by its semantic relations with the other words in the surrounding text. If we can identify the semantic relations between the given word and its context, then we can disambiguate the given word. Yarowsky's (1993) observation that collocations are almost always monosemous is evidence for this view. Federici, Montemagni, and Pirrelli (1997) present an analogy-based approach to word sense disambiguation.
For example, consider the word plant. Out of context, plant could refer to an industrial plant or a living organism. Suppose plant appears in some text near food. A typical approach to disambiguating plant would compare the attributional similarity of food and industrial plant to the attributional similarity of food and living organism (Lesk, 1986; Banerjee and Pedersen, 2003). In this case, the decision may not be clear, since industrial plants often produce food and living organisms often serve as food. It would be very helpful to know the relation between food and plant in this example. In the phrase "food for the plant", the relation between food and plant strongly suggests that the plant is a living organism, since industrial plants do not need food. In the text "food at the plant", the relation strongly suggests that the plant is an industrial plant, since living organisms are not usually considered as locations. Thus an algorithm for classifying semantic relations (as in Section 7) should be helpful for word sense disambiguation.

3.6 Information Extraction
The problem of relation extraction is, given an input document and a specific relation, to extract all pairs of entities (if any) that have the relation in the document. The problem was introduced as part of the Message Understanding Conferences (MUC) in 1998. Zelenko, Aone, and Richardella (2003) present a kernel method for extracting the relations person-affiliation and organization-location. For example, in the sentence "John Smith is the chief scientist of the Hardcom Corporation," there is a person-affiliation relation between "John Smith" and "Hardcom Corporation" (Zelenko, Aone, and Richardella, 2003). This is similar to the problem of classifying semantic relations (Section 3.4), except that information extraction focuses on the relation between a specific pair of entities in a specific document, rather than a general pair of words in general text. Therefore an algorithm for classifying semantic relations should be useful for information extraction.
In the VSM approach to classifying semantic relations (Turney and Littman, 2005), we would have a training set of labeled examples of the relation person-affiliation, for instance. Each example would be represented by a vector of pattern frequencies. Given a
specific document discussing "John Smith" and "Hardcom Corporation", we could construct a vector representing the relation between these two entities, and then measure the relational similarity between this unlabeled vector and each of our labeled training vectors. It would seem that there is a problem here, because the training vectors would be relatively dense, since they would presumably be derived from a large corpus, but the new unlabeled vector for "John Smith" and "Hardcom Corporation" would be very sparse, since these entities might be mentioned only once in the given document. However, this is not a new problem for the Vector Space Model; it is the standard situation when the VSM is used for information retrieval. A query to a search engine is represented by a very sparse vector whereas a document is represented by a relatively dense vector. There are well-known techniques in information retrieval for coping with this disparity, such as weighting schemes for query vectors that are different from the weighting schemes for document vectors (Salton and Buckley, 1988).
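The asymmetry between sparse queries and dense documents can be sketched as follows. The mini-corpus is invented, and the tf-idf-for-queries, raw-tf-for-documents split is one simple illustration of the kind of differential weighting Salton and Buckley (1988) discuss, not their exact scheme:

```python
import math
from collections import Counter

# Hypothetical mini-corpus; the documents are invented for illustration.
DOCS = ["smith joined hardcom as chief scientist",
        "the annual report of hardcom corporation",
        "local weather report for monday"]
TOKENS = [d.split() for d in DOCS]
N = len(DOCS)

def idf(term):
    """Smoothed inverse document frequency."""
    df = sum(term in toks for toks in TOKENS)
    return math.log((N + 1) / (df + 1)) + 1

def weight_query(q):
    """Sparse query vector: tf * idf boosts its few informative terms."""
    return {t: tf * idf(t) for t, tf in Counter(q.split()).items()}

def weight_doc(toks):
    """Denser document vector: raw term frequencies."""
    return dict(Counter(toks))

def cos(u, v):
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

q = weight_query("hardcom scientist")
best = max(range(N), key=lambda i: cos(q, weight_doc(TOKENS[i])))
```

Here the rare term "scientist" gets a higher idf weight than the common term "report", so the two-word query still ranks the relevant document first.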
3.7 Question Answering
In their paper on classifying semantic relations, Moldovan et al. (2004) suggest that an important application of their work is Question Answering. As defined in the Text REtrieval Conference (TREC) Question Answering (QA) track, the task is to answer simple questions, such as "Where have nuclear incidents occurred?", by retrieving a relevant document from a large corpus and then extracting a short string from the document, such as "The Three Mile Island nuclear incident caused a DOE policy crisis." Moldovan et al. (2004) propose to map a given question to a semantic relation and then search for that relation in a corpus of semantically tagged text. They argue that the desired semantic relation can easily be inferred from the surface form of the question. A question of the form "Where ...?" is likely to be seeking entities with a location relation and a question of the form "What did ... make?" is likely to be looking for entities with a product relation. In Section 7, we show how LRA can recognize relations such as location and product (see Table 19).
3.8 Automatic Thesaurus Generation
Hearst (1992) presents an algorithm for learning hyponym (type of) relations from a corpus and Berland and Charniak (1999) describe how to learn meronym (part of) relations from a corpus. These algorithms could be used to automatically generate a thesaurus or dictionary, but we would like to handle more relations than hyponymy and meronymy. WordNet distinguishes more than a dozen semantic relations between words (Fellbaum, 1998) and Nastase and Szpakowicz (2003) list 30 semantic relations for noun-modifier pairs. Hearst (1992) and Berland and Charniak (1999) use manually generated rules to mine text for semantic relations. Turney and Littman (2005) also use a manually generated set of 64 patterns.
LRA does not use a predefined set of patterns; it learns patterns from a large corpus. Instead of manually generating new rules or patterns for each new semantic relation, it is possible to automatically learn a measure of relational similarity that can handle arbitrary semantic relations. A nearest neighbour algorithm can then use this relational similarity measure to learn to classify according to any set of classes of relations, given the appropriate labeled training data.
Girju, Badulescu, and Moldovan (2003) present an algorithm for learning meronym relations from a corpus. Like Hearst (1992) and Berland and Charniak (1999), they use manually generated rules to mine text for their desired relation. However, they supplement their manual rules with automatically learned constraints, to increase the precision of the rules.
3.9 Information Retrieval
Veale (2003) has developed an algorithm for recognizing certain types of word analogies, based on information in WordNet. He proposes to use the algorithm for analogical information retrieval. For example, the query "Muslim church" should return "mosque" and the query "Hindu bible" should return "the Vedas". The algorithm was designed with a focus on analogies of the form adjective:noun::adjective:noun, such as Christian:church::Muslim:mosque.
A measure of relational similarity is applicable to this task. Given a pair of words, A and B, the task is to return another pair of words, C and D, such that there is high relational similarity between the pair A:C and the pair D:B. For example, given A = "Muslim" and B = "church", return C = "mosque" and D = "Christian". (The pair Muslim:mosque has a high relational similarity to the pair Christian:church.)
Marx et al. (2002) developed an unsupervised algorithm for discovering analogies by clustering words from two different corpora. Each cluster of words in one corpus is coupled one-to-one with a cluster in the other corpus. For example, one experiment used a corpus of Buddhist documents and a corpus of Christian documents. A cluster of words such as Hindu, Mahayana, Zen, ... from the Buddhist corpus was coupled with a cluster of words such as Catholic, Protestant, ... from the Christian corpus. Thus the algorithm appears to have discovered an analogical mapping between Buddhist schools and traditions and Christian schools and traditions. This is interesting work, but it is not directly applicable to SAT analogies, because it discovers analogies between clusters of words, rather than individual words.
3.10 Identifying Semantic Roles
A semantic frame for an event such as judgement contains semantic roles such as judge, evaluee, and reason, whereas an event such as statement contains roles such as speaker, addressee, and message (Gildea and Jurafsky, 2002). The task of identifying semantic roles is to label the parts of a sentence according to their semantic roles. We believe that it may be helpful to view semantic frames and their semantic roles as sets of semantic relations; thus a measure of relational similarity should help us to identify semantic roles. Moldovan et al. (2004) argue that semantic roles are merely a special case of semantic relations (Section 3.4), since semantic roles always involve verbs or predicates, but semantic relations can involve words of any part of speech.

4. The Vector Space Model
This section examines past work on measuring attributional and relational similarity using the Vector Space Model (VSM).
4.1 Measuring Attributional Similarity with the Vector Space Model
The VSM was first developed for information retrieval (Salton and McGill, 1983; Salton and Buckley, 1988; Salton, 1989) and it is at the core of most modern search engines (Baeza-Yates and Ribeiro-Neto, 1999). In the VSM approach to information retrieval, queries and documents are represented by vectors. Elements in these vectors are based on the frequencies of words in the corresponding queries and documents. The frequencies are usually transformed by various formulas and weights, tailored to improve the effectiveness of the search engine (Salton, 1989). The attributional similarity between a query and a document is measured by the cosine of the angle between their corresponding vectors. For a given query, the search engine sorts the matching documents in order of decreasing cosine.
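The ranking procedure described above can be sketched with raw term frequencies (a real engine would add the weighting formulas mentioned):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, documents):
    """Sort documents by decreasing cosine with the query."""
    q = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in documents]
    order = sorted(range(len(documents)),
                   key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [documents[i] for i in order]

docs = ["the cat sat on the mat",
        "stock market report",
        "a cat and a dog"]
```

Calling rank("cat mat", docs) returns the three toy documents in decreasing order of cosine with the query, with the first document ranked highest.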
The VSM approach has also been used to measure the attributional similarity of
words (Lesk, 1969; Ruge, 1992; Pantel and Lin, 2002). Pantel and Lin (2002) clustered words according to their attributional similarity, as measured by a VSM. Their algorithm is able to discover the different senses of polysemous words, using unsupervised learning.
Latent Semantic Analysis enhances the VSM approach to information retrieval by using the Singular Value Decomposition (SVD) to smooth the vectors, which helps to handle noise and sparseness in the data (Deerwester et al., 1990; Dumais, 1993; Landauer and Dumais, 1997). SVD improves both document-query attributional similarity measures (Deerwester et al., 1990; Dumais, 1993) and word-word attributional similarity measures (Landauer and Dumais, 1997). LRA also uses SVD to smooth vectors, as we discuss in Section 5.
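A minimal sketch of the SVD smoothing step, assuming numpy is available and using a toy frequency matrix in place of real corpus counts:

```python
import numpy as np

# Toy sparse frequency matrix (rows: words or pairs, columns: contexts).
X = np.array([
    [3.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 4.0, 0.0, 2.0],
    [0.0, 3.0, 1.0, 0.0],
])

# Keep only the k largest singular values and reconstruct.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_smooth = (U[:, :k] * s[:k]) @ Vt[:k, :]

# X_smooth is the best rank-k least-squares approximation of X; cells
# that were zero acquire small values, mitigating noise and sparseness.
```

By the Eckart-Young theorem, the Frobenius-norm error of the rank-k reconstruction equals the norm of the discarded singular values, which is why truncation removes the "noise" directions while keeping the dominant structure.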
4.2 Measuring Relational Similarity with the Vector Space Model

Let R1 be the semantic relation (or set of relations) between a pair of words, A and B, and let R2 be the semantic relation (or set of relations) between another pair of words, C and D. We wish to measure the relational similarity between R1 and R2. The relations R1 and R2 are not given to us; our task is to infer these hidden (latent) relations and then compare them.
In the VSM approach to relational similarity (Turney and Littman, 2005), we create vectors, r1 and r2, that represent features of R1 and R2, and then measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2:
r1 = <x1, x2, ..., xn>    (5)
r2 = <y1, y2, ..., yn>    (6)

cosine(θ) = (r1 · r2) / (|r1| |r2|)    (7)
We create a vector, r, to characterize the relationship between two words, X and Y, by counting the frequencies of various short phrases containing X and Y. Turney and Littman (2005) use a list of 64 joining terms, such as "of", "for", and "to", to form 128 phrases that contain X and Y, such as "X of Y", "Y of X", "X for Y", "Y for X", "X to Y", and "Y to X". These phrases are then used as queries for a search engine and the number of hits (matching documents) is recorded for each query. This process yields a vector of 128 numbers. If the number of hits for a query is x, then the corresponding element in the vector is log(x + 1). Several authors report that the logarithmic transformation of frequencies improves cosine-based similarity measures (Salton and Buckley, 1988; Ruge, 1992; Lin, 1998b).
Turney and Littman (2005) evaluated the VSM approach by its performance on 374 SAT analogy questions, achieving a score of 47%. Since there are five choices for each question, the expected score for random guessing is 20%. To answer a multiple-choice analogy question, vectors are created for the stem pair and each choice pair, and then cosines are calculated for the angles between the stem pair and each choice pair. The best guess is the choice pair with the highest cosine. We use the same set of analogy questions to evaluate LRA in Section 6.
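A toy sketch of the pattern-vector construction and the multiple-choice procedure: three joining terms and a one-line corpus stand in for the 64 terms and the web-scale corpus, and substring counts play the role of search-engine hit counts.

```python
import math

JOINING_TERMS = ["of", "for", "to"]
CORPUS = (" mason cuts the stone . stone for mason . "
          "the carpenter cuts wood . wood for carpenter . ")

def pair_vector(x, y):
    """2 * len(JOINING_TERMS) phrase frequencies (128 in the paper)."""
    phrases = ([f"{x} {j} {y}" for j in JOINING_TERMS] +
               [f"{y} {j} {x}" for j in JOINING_TERMS])
    # log(hits + 1), with substring counts standing in for hit counts
    return [math.log(CORPUS.count(f" {p} ") + 1) for p in phrases]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

stem = pair_vector("mason", "stone")
choices = {"carpenter:wood": pair_vector("carpenter", "wood"),
           "doctor:wood": pair_vector("doctor", "wood")}
best = max(choices, key=lambda c: cosine(stem, choices[c]))
```

Here the stem pair and the solution pair share the "Y for X" pattern, so their vectors point in the same direction and the correct choice gets the highest cosine.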
The VSM was also evaluated by its performance as a distance (nearness) measure in a supervised nearest neighbour classifier for noun-modifier semantic relations (Turney
and Littman, 2005). The evaluation used 600 hand-labeled noun-modifier pairs from Nastase and Szpakowicz (2003). A testing pair is classified by searching for its single nearest neighbour in the labeled training data. The best guess is the label for the training pair with the highest cosine. LRA is evaluated with the same set of noun-modifier pairs in Section 7.
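The single-nearest-neighbour step can be sketched as follows; the three training vectors and their labels are invented for illustration (in the paper the vectors come from pattern frequencies):

```python
import math

# Invented training data: (feature vector, semantic relation label).
TRAIN = [
    ((0.9, 0.1, 0.0), "instrument"),   # e.g. "laser printer"
    ((0.1, 0.8, 0.1), "purpose"),      # e.g. "concert hall"
    ((0.0, 0.2, 0.9), "cause"),        # e.g. "exam anxiety"
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify(vec):
    """Label of the single nearest neighbour under cosine similarity."""
    return max(TRAIN, key=lambda tv: cosine(vec, tv[0]))[1]

label = classify((0.85, 0.15, 0.05))
```

A testing vector close to the first training vector receives the label "instrument", exactly the best-guess rule described above.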
Turney and Littman (2005) used the AltaVista search engine to obtain the frequency information required to build vectors for the VSM. Thus their corpus was the set of all web pages indexed by AltaVista. Around April 2004, AltaVista made substantial changes to their search engine, removing their advanced search operators. Their search engine no longer supports the asterisk operator, which was used by Turney and Littman (2005) for stemming and wild-card searching. AltaVista also changed their policy towards automated searching, which is now forbidden.
Turney and Littman (2005) used AltaVista's hit count, which is the number of documents (web pages) matching a given query, but LRA uses the number of passages (strings) matching a query. In our experiments with LRA (Sections 6 and 7), we use a local copy of the Waterloo MultiText System (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003), running on a 16-CPU Beowulf cluster. The Waterloo MultiText System (WMTS) is a distributed (multiprocessor) search engine, designed primarily for passage retrieval (although document retrieval is possible, as a special case of passage retrieval). The text and index require approximately one terabyte of disk space. Although AltaVista only gives a rough estimate of the number of matching documents, the Waterloo MultiText System gives exact counts of the number of matching passages.
Turney et al. (2003) combine 13 independent modules to answer SAT questions. The performance of LRA significantly surpasses this combined system, but there is no real contest between these approaches, because we can simply add LRA to the combination, as a fourteenth module. Since the VSM module had the best performance of the thirteen modules (Turney et al., 2003), the following experiments focus on comparing VSM and LRA.
5. Latent Relational Analysis
LRA takes as input a set of word pairs and produces as output a measure of the relational similarity between any two of the input pairs. LRA relies on three resources: a search engine with a very large corpus of text, a broad-coverage thesaurus of synonyms, and an efficient implementation of SVD.
We first present a short description of the core algorithm. Later, in the following subsections, we will give a detailed description of the algorithm, as it is applied in the experiments in Sections 6 and 7.
Given a set of word pairs as input, look in a thesaurus for synonyms for each word in each word pair. For each input pair, make alternate pairs by replacing the original words with their synonyms. The alternate pairs are intended to form near analogies with the corresponding original pairs (see Section 2.3). Filter out alternate pairs that do not form near analogies, by dropping alternate pairs that co-occur rarely in the corpus. In the preceding step, if a synonym
replaced an ambiguous original word, but the synonym captures the wrong sense of the original word, it is likely that there is no significant relation between the words in the alternate pair, so they will rarely co-occur.
For each original and alternate pair, search in the corpus for short phrases that begin with one member of the pair and end with the other. These phrases characterize the relation between the words in each pair.
For each phrase from the previous step, create several patterns, by replacing words in the phrase with wildcards.
Build a pair-pattern frequency matrix, in which each cell represents the number of times that the corresponding pair (row) appears in the corpus with the corresponding pattern (column). The number will usually be zero, resulting in a sparse matrix.
Apply the Singular Value Decomposition to the matrix. This reduces noise in the matrix and helps with sparse data.
Suppose that we wish to calculate the relational similarity between any two of the original pairs. Start by looking for the two row vectors in the pair-pattern frequency matrix that correspond to the two original pairs. Calculate the cosine of the angle between these two row vectors. Then merge the cosine of the two original pairs with the cosines of their corresponding alternate pairs, as follows. If an analogy formed with alternate pairs has a higher cosine than the original pairs, we assume that we have found a better way to express the analogy, but we have not significantly changed its meaning. If the cosine is lower, we assume that we may have changed the meaning, by inappropriately replacing words with synonyms. Filter out inappropriate alternates by dropping all analogies formed of alternates, such that the cosines are less than the cosine for the original pairs. The relational similarity between the two original pairs is then calculated as the average of all of the remaining cosines.
The motivation for the alternate pairs is to handle cases where the original pairs co-occur rarely in the corpus. The hope is that we can find near analogies for the original pairs, such that the near analogies co-occur more frequently in the corpus. The danger is that the alternates may have different relations from the originals. The filtering steps above aim to reduce this risk.
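The steps above can be condensed into a runnable sketch. The five-sentence corpus and one-entry thesaurus are toy stand-ins, the raw connecting phrases serve as patterns (rather than wildcard patterns), and the final merging of alternate-pair cosines is omitted:

```python
import math
import re
import numpy as np

# Toy stand-ins for the real resources (terabyte corpus, Lin's thesaurus).
CORPUS = ("the mason cut the stone . the carpenter cut the wood . "
          "the artisan cut the stone . traffic flows over the street . "
          "water flows over the riverbed .")
THESAURUS = {"mason": ["artisan"]}

def phrases_between(x, y, max_gap=3):
    """Short phrases that begin with x and end with y."""
    pat = rf"\b{x}\b((?:\s+\w+){{1,{max_gap}}})\s+{y}\b"
    return [m.strip() for m in re.findall(pat, CORPUS)]

# Original pairs plus alternates built from the thesaurus.
pairs = [("mason", "stone"), ("carpenter", "wood"),
         ("traffic", "street"), ("water", "riverbed")]
pairs += [(s, b) for (a, b) in pairs for s in THESAURUS.get(a, [])]

# Pair-by-pattern matrix of log frequencies.
patterns = sorted({p for (a, b) in pairs for p in phrases_between(a, b)})
M = np.array([[math.log(phrases_between(a, b).count(p) + 1)
               for p in patterns] for (a, b) in pairs])

# Smooth with a truncated SVD; rows become latent vectors.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
rows = U[:, :k] * s[:k]

def sim(i, j):
    """Cosine between the latent row vectors of pairs i and j."""
    u, v = rows[i], rows[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

In this toy setting mason:stone and carpenter:wood share the "cut the" pattern and come out maximally similar, while mason:stone and traffic:street share no pattern and come out dissimilar.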
5.1 Input and Output
In our experiments, the input set contains from 600 to 2,244 word pairs. The output similarity measure is based on cosines, so the degree of similarity can range from −1 (dissimilar) to +1 (similar). Before applying SVD, the vectors are completely nonnegative, which implies that the cosine can only range from 0 to +1, but SVD introduces negative values, so it is possible for the cosine to be negative, although we have never observed this in our experiments.
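As a quick illustration of the cosine's range (a generic sketch, not code from LRA):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: their dot product after
    normalizing each to unit length. Ranges from -1 to +1."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

# Parallel vectors give +1, opposite vectors give -1. For nonnegative
# vectors (e.g., raw frequency counts, before SVD) the dot product cannot
# be negative, so the cosine stays in [0, +1].
```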
5.2 Search Engine and Corpus
In the following experiments, we use a local copy of the Waterloo MultiText System (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003).4 The corpus consists of about 5 × 10^10 English words, gathered by a web crawler, mainly from US academic
web sites. The web pages cover a very wide range of topics, styles, genres, quality, and writing skill. The WMTS is well suited to LRA, because the WMTS scales well to large corpora (one terabyte, in our case), it gives exact frequency counts (unlike most web search engines), it is designed for passage retrieval (rather than document retrieval), and it has a powerful query syntax.
5.3 Thesaurus
As a source of synonyms, we use Lin's (1998a) automatically generated thesaurus. This thesaurus is available through an online interactive demonstration or it can be downloaded.5 We used the online demonstration, since the downloadable version seems to contain fewer words. For each word in the input set of word pairs, we automatically query the online demonstration and fetch the resulting list of synonyms. As a courtesy to other users of Lin's online system, we insert a 20 second delay between each query.
Lin's thesaurus was generated by parsing a large corpus of English text, consisting of text from the Wall Street Journal, San Jose Mercury, and AP Newswire (Lin, 1998a). The parser was used to extract pairs of words and their grammatical relations. Words were then clustered into synonym sets, based on the similarity of their grammatical relations. Two words were judged to be highly similar when they tended to have the same kinds of grammatical relations with the same sets of words. Given a word and its part of speech, Lin's thesaurus provides a list of words, sorted in order of decreasing attributional similarity. This sorting is convenient for LRA, since it makes it possible to focus on words with higher attributional similarity and ignore the rest. WordNet, in contrast, given a word and its part of speech, provides a list of words grouped by the possible senses of the given word, with groups sorted by the frequencies of the senses. WordNet's sorting does not directly correspond to sorting by degree of attributional similarity, although various algorithms have been proposed for deriving attributional similarity from WordNet (Resnik, 1995; Jiang and Conrath, 1997; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003).
5.4 Singular Value Decomposition
We use Rohde's SVDLIBC implementation of the Singular Value Decomposition, which is based on SVDPACKC (Berry, 1992).6 In LRA, SVD is used to reduce noise and compensate for sparseness.
5.5 The Algorithm
We will go through each step of LRA, using an example to illustrate the steps. Assume that the input to LRA is the 374 multiple-choice SAT word analogy questions of Turney and Littman (2005). Since there are six word pairs per question (the stem and five choices), the input consists of 2,244 word pairs. Let's suppose that we wish to calculate the relational similarity between the pair quart:volume and the pair mile:distance, taken from the SAT question in Table 6. The LRA algorithm consists of the following twelve steps:
1. Find alternates: For each word pair A:B in the input set, look in Lin's (1998a) thesaurus for the top num_sim words (num_sim is 10) that are most similar to A. For each A' that is similar to A, make a new word pair A':B. Likewise, look for the top num_sim words that are most similar to B, and for each such B', make a new word pair A:B'.
5. The online demonstration is at http://www.cs.ualberta.ca/lindek/demos/depsim.htm and the downloadable version is at http://armena.cs.ualberta.ca/lindek/downloads/sims.lsp.gz.
6. SVDLIBC is available at http://tedlab.mit.edu/dr/SVDLIBC/ and SVDPACKC is available at http://www.netlib.org/svdpack/.
[Table 6. A sample SAT word analogy question. Stem: quart:volume; solution: (b) mile:distance.]
These new pairs are the alternate pairs. When looking for similar words in Lin's (1998a) thesaurus, avoid words that seem unusual (e.g., hyphenated words, words with three characters or less, words with non-alphabetical characters, multi-word phrases, and capitalized words). The first column in Table 7 shows the alternate pairs that are generated for the original pair quart:volume.
2. Filter alternates: For each original pair A:B, filter the 2 × num_sim alternate pairs from step 1, keeping the num_filter most frequent alternates and discarding the remainder (we use num_filter = 3), where the frequency of a pair is the number of phrases in the corpus that begin with one member of the pair and end with the other. The last column in Table 7 shows the pairs that are selected.
3. Find phrases: For each pair (originals and alternates), make a list of phrases in the corpus that contain the pair. Query the WMTS for all phrases that begin with one member of the pair and end with the other (in either order). We ignore suffixes when searching for phrases that match a given pair. The phrases cannot have more than three intervening words between the members of the pair.
4. Find patterns: For each phrase from the previous step, generate patterns by replacing any or all of the intervening words with wildcards. (A phrase with n intervening words generates 2^n patterns; n is at most three, so a phrase generates at most eight patterns.) For each pattern, count the number
Table 7. Alternate pairs for the original pair quart:volume, with their similarity to the original word, their co-occurrence frequency, and the result of the filtering step.

Word pair           Similarity   Frequency   Filtering step
pint:volume         0.210        372         reject
gallon:volume       0.159        1500        accept (top alternate)
liter:volume        0.122        3323        accept (top alternate)
squirt:volume       0.084        54          reject
pail:volume         0.084        28          reject
vial:volume         0.084        373         reject
pumping:volume      0.073        1386        accept (top alternate)
ounce:volume        0.071        430         reject
spoonful:volume     0.070        42          reject
tablespoon:volume   0.069        96          reject
(At most three intervening words can appear in a phrase.)
[Examples of patterns generated from a phrase, with wildcards (*) replacing intervening words: "in", "* of", "of *", "* *".]
of pairs (originals and alternates) with phrases that match the pattern (a wildcard must match exactly one word). Keep the top num_patterns most frequent patterns and discard the rest (we use num_patterns = 4,000). Typically there will be millions of patterns, so it is not feasible to keep them all.
5. Map pairs to rows: In preparation for building the matrix, create a mapping of word pairs to row numbers. For each pair A:B, create a row for A:B and another row for B:A. This will make the matrix more symmetrical, reflecting our knowledge that the relational similarity between A:B and C:D should be the same as the relational similarity between B:A and D:C. This duplication of rows is examined in Section 6.6.
6. Map patterns to columns: Create a mapping of the top num_patterns patterns to column numbers. For each pattern P, create a column for "word1 P word2" and another column for "word2 P word1", so there are 2 × num_patterns columns in total. This duplication of columns is examined in Section 6.6.
7. Generate a sparse matrix: Generate a matrix X in sparse matrix format, suitable for input to SVDLIBC. The value for the cell in row i and column j is the frequency of the j-th pattern (see step 6) in phrases that contain the i-th word pair (see step 5). Table 9 gives some examples of pattern frequencies for quart:volume.
8. Calculate entropy: Apply log and entropy transformations to the sparse matrix (Landauer and Dumais, 1997). These transformations have been found to be very helpful for information retrieval (Harman, 1986; Dumais, 1990). Let x_ij be the cell in row i and column j of the matrix X from step 7. Let m be the number of rows in X and let n be the number of columns. We wish to weight the cell x_ij by the entropy of the j-th column. To calculate the entropy of the column, we need to convert the column into a vector of probabilities. Let p_ij be the probability of x_ij, calculated by normalizing the column vector so that the sum of the elements is one, p_ij = x_ij / Σ_k x_kj. The entropy of the j-th column is then H_j = −Σ_i p_ij log(p_ij). Entropy is at its maximum when p_ij is a uniform distribution, p_ij = 1/m, in which case H_j = log(m). Entropy is at its minimum when p_ij is 1 for some value of i and 0 for all other values of i, in which case H_j = 0. We want to give more weight to columns (patterns) with frequencies that vary substantially from one row (word pair) to the next, and less weight to columns that are uniform. Therefore we weight the cell x_ij by w_j = 1 − H_j / log(m), which varies from 0 when p_ij is uniform to 1 when entropy is minimal. We also apply the log transformation to frequencies, log(x_ij + 1). (Entropy is calculated with the original frequency values, before the log transformation is applied.) For all i and all j, replace the original
value x_ij in X by the new value w_j log(x_ij + 1). This is an instance of the TF-IDF (Term Frequency-Inverse Document Frequency) family of transformations, which is familiar in information retrieval (Salton and Buckley, 1988; Baeza-Yates and Ribeiro-Neto, 1999): log(x_ij + 1) is the TF term and w_j is the IDF term.
9. Apply SVD: After the log and entropy transformations have been applied to the matrix X, run SVDLIBC. SVD decomposes a matrix X into a product of three matrices, X = U Σ V^T, where U and V are in column orthonormal form (i.e., the columns are orthogonal and have unit length: U^T U = V^T V = I) and Σ is a diagonal matrix of singular values (hence SVD) (Golub and Van Loan, 1996). If X is of rank r, then Σ is also of rank r. Let Σ_k, where k < r, be the diagonal matrix formed from the top k singular values, and let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V. The matrix U_k Σ_k V_k^T is the matrix of rank k that best approximates the original matrix X, in the sense that it minimizes the approximation errors. That is, X̂ = U_k Σ_k V_k^T minimizes ||X̂ − X||_F over all matrices X̂ of rank k, where ||·||_F denotes the Frobenius norm (Golub and Van Loan, 1996). We may think of this matrix as a "smoothed" or "compressed" version of the original matrix. In the subsequent steps, we will be calculating cosines for row vectors. For this purpose, we can simplify calculations by dropping V. The cosine of two vectors is their dot product, after they have been normalized to unit length. The matrix X X^T contains the dot products of all of the row vectors. We can find the dot product of the i-th and j-th row vectors by looking at the cell in row i, column j of the matrix X X^T. Since V^T V = I, we have X X^T = U Σ V^T (U Σ V^T)^T = U Σ V^T V Σ^T U^T = U Σ (U Σ)^T, which means that we can calculate cosines with the smaller matrix U Σ, instead of using X (Deerwester et al., 1990).
10. Projection: Calculate U_k Σ_k (we use k = 300). This matrix has the same number of rows as X, but only k columns (instead of 8,000 columns).
11. Evaluate alternates: Let A:B and C:D be any two word pairs in the input set. From step 2, we have (num_filter + 1) versions of A:B, the original and num_filter alternates. Likewise, we have (num_filter + 1) versions of C:D. Therefore we have (num_filter + 1)^2 ways to compare a version of A:B with a version of C:D. Look for the corresponding rows in U_k Σ_k and calculate the (num_filter + 1)^2 cosines (in our experiments, there are 16 cosines). For example, suppose A:B is quart:volume and C:D is mile:distance. Table 10 gives the cosines for the sixteen combinations.
12. Calculate relational similarity: The relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine of the original pairs, A:B and C:D. The requirement that a cosine must be greater than or equal to the original cosine filters out poor analogies, which may be introduced in step
[Table 10. The sixteen cosines for the versions of quart:volume and mile:distance; the columns give the word pairs, the cosine, and whether the cosine is greater than or equal to the cosine of the original pairs.]
1 and may have slipped through the filtering in step 2. Averaging the cosines, as opposed to taking their maximum, is intended to provide some resistance to noise. For quart:volume and mile:distance, the third column in Table 10 shows which alternates are used to calculate the average. For these two pairs, the average of the selected cosines is 0.677. In Table 7, we see that pumping:volume has slipped through the filtering in step 2, although it is not a good alternate for quart:volume. However, Table 10 shows that all four analogies that involve pumping:volume are dropped here, in step 12.
Steps 11 and 12 can be repeated for each two input pairs that are to be compared. This completes the description of LRA.
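The matrix transformations in steps 7 through 10 can be sketched with NumPy; the small dense matrix and function names below are illustrative, and numpy.linalg.svd stands in for SVDLIBC:

```python
import numpy as np

def log_entropy(X):
    """Step 8: replace each cell x_ij by w_j * log(x_ij + 1), where w_j is
    one minus the normalized entropy of column j (entropy is computed on
    the raw frequencies, before the log transformation)."""
    X = np.asarray(X, dtype=float)
    m, _ = X.shape
    col_sums = X.sum(axis=0)
    P = np.divide(X, col_sums, out=np.zeros_like(X), where=col_sums > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.sum(np.where(P > 0, P * np.log(P), 0.0), axis=0)
    w = 1.0 - H / np.log(m)        # 0 for uniform columns, 1 for peaked ones
    return np.log(X + 1.0) * w

def smoothed_rows(X, k):
    """Steps 9-10: truncated SVD. The rows of U_k @ diag(S_k) give the same
    dot products (hence cosines) as the rows of the rank-k approximation."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

At full rank, the rows of U_k Σ_k reproduce the cosines of the original rows exactly; choosing k below the rank gives the smoothed version that LRA uses.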
Table 11 gives the cosines for the sample SAT question. The choice pair with the highest average cosine (the choice with the largest value in column #1), choice (b), is the solution for this question; LRA answers the question correctly. For comparison, column #2 gives the cosines for the original pairs and column #3 gives the highest cosine. For this particular SAT question, there is one choice that has the highest cosine for all three columns, choice (b), although this is not true in general. Note that the gap between the first choice (b) and the second choice (d) is largest for the average cosines (column #1). This suggests that the average of the cosines (column #1) is better at discriminating the correct choice than either the original cosine (column #2) or the highest cosine (column #3).
6. Experiments with Word Analogy Questions
This section presents various experiments with 374 multiple-choice SAT word analogy questions.
[Table 11. Cosines for the sample SAT question (stem: quart:volume). For the solution, choice (b) mile:distance: average cosine (#1) = 0.677, original cosine (#2) = 0.525, highest cosine (#3) = 0.781.]
Table 12. Performance of comparison algorithms on the 374 SAT questions.

Algorithm                          Precision   Recall   F
Veale (2004)                       42.8        42.8     42.8
best attributional similarity      35.0        35.0     35.0
random guessing                    20.0        20.0     20.0
lowest co-occurrence frequency     16.8        16.8     16.8
highest co-occurrence frequency    11.8        11.8     11.8
[Table 13. Processing time for each step of LRA (description, time in H:M:S, hardware). Total: 209:49:36.]
[Table 14. Performance on the 374 SAT questions: correct, incorrect, and skipped counts, with precision and recall, for each algorithm.]
Comparing VSM-AV to VSM-WMTS, the smaller corpus has reduced the score of the VSM, but much of the drop is due to the larger number of questions that were skipped (34 for VSM-WMTS versus 5 for VSM-AV). With the smaller corpus, many more of the input word pairs simply do not appear together in short phrases in the corpus. LRA is able to answer as many questions as VSM-AV, although it uses the same corpus as VSM-WMTS, because Lin's thesaurus allows LRA to substitute synonyms for words that are not in the corpus.
VSM-AV required 17 days to process the 374 analogy questions (Turney and Littman, 2005), compared to 9 days for LRA. As a courtesy to AltaVista, Turney and Littman (2005) inserted a five second delay between each query. Since the WMTS is running locally, there is no need for delays. VSM-WMTS processed the questions in only one day.
6.3 Human Performance
The average performance of college-bound senior high school students on verbal SAT questions corresponds to a recall (percent correct) of about 57% (Turney and Littman, 2005). The SAT I test consists of 78 verbal questions and 60 math questions (there is also an SAT II test, covering specific subjects, such as chemistry). Analogy questions are only a subset of the 78 verbal SAT questions. If we assume that the difficulty of our 374 analogy questions is comparable to the difficulty of the 78 verbal SAT I questions, then we can estimate that the average college-bound senior would correctly answer about 57% of the 374 analogy questions.
Of our 374 SAT questions, 190 are from a collection of ten official SAT tests (Claman, 2000). On this subset of the questions, LRA has a recall of 61.1%, compared to a recall of 51.1% on the other 184 questions. The 184 questions that are not from Claman (2000) seem to be more difficult. This indicates that we may be underestimating how well LRA performs, relative to college-bound senior high school students. Claman (2000) suggests that the analogy questions may be somewhat harder than other verbal SAT questions, so we may be slightly overestimating the mean human score on the analogy questions.
Table 15 gives the 95% confidence intervals for LRA, VSM-AV, and VSM-WMTS, calculated by the Binomial Exact Test (Agresti, 1990). There is no significant difference between LRA and human performance, but VSM-AV and VSM-WMTS are significantly below human-level performance.
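For reference, the Binomial Exact Test interval is the Clopper-Pearson construction, which can be computed by bisection on the binomial tail probabilities; this is a generic sketch, and the counts used in the test are illustrative rather than the paper's exact tallies:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(s, n, alpha=0.05):
    """'Exact' (Clopper-Pearson) confidence interval for a binomial
    proportion, with s successes out of n trials."""
    def boundary(f):
        lo, hi = 0.0, 1.0          # invariant: f(lo) is True, f(hi) is False
        for _ in range(60):        # bisection to high precision
            mid = (lo + hi) / 2
            if f(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if s == 0 else boundary(lambda p: 1 - binom_cdf(s - 1, n, p) <= alpha / 2)
    upper = 1.0 if s == n else boundary(lambda p: binom_cdf(s, n, p) > alpha / 2)
    return lower, upper
```

With roughly 210 correct answers out of 374 (about 56% recall), the interval has a width of about ten percentage points, which is why differences of a few percent are not significant at this sample size.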
6.4 Varying the Parameters in LRA
There are several parameters in the LRA algorithm (see Section 5.5). The parameter values were determined by trying a small number of possible values on a small set of questions that were set aside. Since LRA is intended to be an unsupervised learning algorithm, we did not attempt to tune the parameter values to maximize the precision and recall on the 374 SAT questions. We hypothesized that LRA is relatively insensitive to the values of the parameters.
[Table 15. Recall (percent correct) and 95% confidence intervals for the recall of LRA, VSM-AV, and VSM-WMTS, compared with human-level recall (57%).]
Table 16 shows the variation in the performance of LRA as the parameter values are adjusted. We take the baseline parameter settings (given in Section 5.5) and vary each parameter, one at a time, while holding the remaining parameters fixed at their baseline values. None of the precision and recall values are significantly different from the baseline, according to the Fisher Exact Test (Agresti, 1990), at the 95% confidence level. This supports the hypothesis that the algorithm is not sensitive to the parameter values.
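The Fisher Exact Test used for these significance comparisons can be implemented directly for 2×2 tables. A minimal sketch, using the common two-sided convention that sums the probabilities of all tables (with the same margins) that are no more probable than the observed one:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher Exact Test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def prob(x):
        # Hypergeometric probability that cell (1,1) equals x,
        # given the fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Small tolerance guards against float ties when comparing probabilities.
    return sum(p for x in range(lo, hi + 1) if (p := prob(x)) <= p_obs * (1 + 1e-9))
```

For perfectly balanced counts the p-value is 1, matching the intuition that identical proportions show no evidence of a difference.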
Although a full run of LRA on the 374 SAT questions takes nine days, for some of the parameters it is possible to reuse cached data from previous runs. We limited the experiments with some of the parameters because caching was not as helpful for them, so experimenting with them required several weeks.
6.5 Ablation Experiments
As mentioned in the introduction, LRA extends the VSM approach of Turney and Littman (2005) by (1) exploring variations on the analogies by replacing words with synonyms (step 1), (2) automatically generating connecting patterns (step 4), and (3) smoothing the data with SVD (step 9). In this subsection, we ablate each of these three components to assess their contribution to the performance of LRA. Table 17 shows the results.
Without SVD (compare column #1 to #2 in Table 17), performance drops, but the drop is not statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990). However, we hypothesize that the drop in performance would be significant with a larger set of word pairs. More word pairs would increase the sample size, which would decrease the 95% confidence interval, which would likely show that SVD is making a significant contribution. Furthermore, more word pairs would increase the matrix size, which would give SVD more leverage. For example, Landauer and Dumais (1997) apply SVD to a matrix of 30,473 columns by 60,768 rows, but our matrix here is 8,000 columns by 17,232 rows. We are currently gathering more SAT questions, to test this hypothesis.
Without synonyms (compare column #1 to #3 in Table 17), recall drops significantly (from 56.1% to 49.5%), but the drop in precision is not significant. When the synonym component is dropped, the number of skipped questions rises from 4 to 22, which demonstrates the value of the synonym component of LRA for compensating for sparse data.
When both SVD and synonyms are dropped (compare column #1 to #4 in Table 17), the decrease in recall is significant, but the decrease in precision is not significant. Again, we believe that a larger sample size would show the drop in precision is significant.
If we eliminate both synonyms and SVD from LRA, all that distinguishes LRA from
[Table 16. Variation in LRA's performance as individual parameters from steps 1, 2, and 4 are varied around their baseline values. Across all settings, precision ranges from 54.1 to 58.4, recall from 53.5 to 57.8, and F from 53.8 to 58.1.]
[Table 17. Ablation results for the LRA baseline system (#1), LRA without SVD (#2), LRA without synonyms (#3), LRA without both SVD and synonyms (#4), and VSM-WMTS (#5).]
VSM-WMTS is the patterns (step 4). The VSM approach uses a fixed list of 64 patterns to generate 128 dimensional vectors (Turney and Littman, 2005), whereas LRA uses a dynamically generated set of 4,000 patterns, resulting in 8,000 dimensional vectors. We can see the value of the automatically generated patterns by comparing LRA without synonyms and SVD (column #4) to VSM-WMTS (column #5). The difference in both precision and recall is statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).
The ablation experiments support the value of the patterns (step 4) and synonyms (step 1) in LRA, but the contribution of SVD (step 9) has not been proven, although we believe more data will support its effectiveness. Nonetheless, the three components together result in a 16% increase in F (compare #1 to #5).
6.6 Matrix Symmetry
We know a priori that, if A:B::C:D, then B:A::D:C. For example, "mason is to stone as carpenter is to wood" implies "stone is to mason as wood is to carpenter". Therefore a good measure of relational similarity, sim_r, should obey the following equation:

sim_r(A:B, C:D) = sim_r(B:A, D:C)    (8)
In steps 5 and 6 of the LRA algorithm (Section 5.5), we ensure that the matrix is symmetrical, so that equation (8) is necessarily true for LRA. The matrix is designed so that the row vector for A:B is different from the row vector for B:A only by a permutation of the elements. The same permutation distinguishes the row vectors for C:D and D:C. Therefore the cosine of the angle between A:B and C:D must be identical to the cosine of the angle between B:A and D:C (see equation (7)).
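The permutation argument can be checked directly: applying one and the same permutation to two vectors leaves the angle between them unchanged. A small sketch with invented vectors:

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(x * x for x in v)))

# The row for B:A is the row for A:B with its elements permuted, and the
# same permutation relates C:D to D:C, so cos(B:A, D:C) = cos(A:B, C:D).
perm = [2, 0, 3, 1]                       # hypothetical column permutation
ab, cd = [1.0, 2.0, 0.0, 3.0], [2.0, 1.0, 1.0, 0.0]
ba = [ab[i] for i in perm]
dc = [cd[i] for i in perm]
```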
To discover the consequences of this design decision, we altered steps 5 and 6 so that symmetry is no longer preserved. In step 5, for each word pair A:B that appears in the input set, we only have one row. There is no row for B:A unless B:A also appears in the input set. Thus the number of rows in the matrix dropped from 17,232 to 8,616.
In step 6, we no longer have two columns for each pattern P, one for "word1 P word2" and another for "word2 P word1". However, to be fair, we kept the total number of columns at 8,000. In step 4, we selected the top 8,000 patterns (instead of the top 4,000), distinguishing the pattern "word1 P word2" from the pattern "word2 P word1" (instead of considering them equivalent). Thus a pattern with a high frequency is likely to appear in two columns, in both possible orders, but a lower frequency pattern might appear in only one column, in only one possible order.
These changes resulted in a slight decrease in performance. Recall dropped from 56.1% to 55.3% and precision dropped from 56.8% to 55.9%. The decrease is not statistically significant. However, the modified algorithm no longer obeys equation (8). Although dropping symmetry appears to cause no significant harm to the performance of the algorithm on the SAT questions, we prefer to retain symmetry, to ensure that equation (8) is satisfied.
Note that, if A:B::C:D, it does not follow that B:A::C:D. For example, it is false that "stone is to mason as carpenter is to wood". In general (except when the semantic relations between A and B are symmetrical), we have the following inequality:

sim_r(A:B, C:D) ≠ sim_r(B:A, C:D)    (9)
Therefore we do not want A:B and B:A to be represented by identical row vectors, although it would ensure that equation (8) is satisfied.
[Table 18. Correct, incorrect, and skipped counts, with precision and recall.]
6.7 All Alternates versus Better Alternates
In step 12 of LRA, the relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine for the original pairs.
vectors is not trivial.
6.9 Manual Patterns versus Automatic Patterns
Turney and Littman (2005) used 64 manually generated patterns whereas LRA uses 4,000 automatically generated patterns. We know from Section 6.5 that the automatically generated patterns are significantly better than the manually generated patterns. It may be interesting to see how many of the manually generated patterns appear within the automatically generated patterns. If we require an exact match, 50 of the 64 manual patterns can be found in the automatic patterns. If we are lenient about wildcards, and count the pattern "not the" as matching "* not the" (for example), then 60 of the 64 manual patterns appear within the automatic patterns. This suggests that the improvement in performance with the automatic patterns is due to the increased quantity of patterns, rather than a qualitative difference in the patterns.
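The lenient wildcard matching described above might be implemented along these lines (a hypothetical helper, not the code used in the paper):

```python
def lenient_match(manual, automatic):
    """True if the manual pattern occurs as a contiguous run of words inside
    the automatic pattern, where a wildcard '*' matches any single word."""
    m, a = manual.split(), automatic.split()
    for start in range(len(a) - len(m) + 1):
        window = a[start:start + len(m)]
        if all(p == w or p == "*" or w == "*" for p, w in zip(m, window)):
            return True
    return False
```

Under this definition, "not the" matches "* not the" because the two manual words align exactly with the last two words of the automatic pattern.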
Turney and Littman (2005) point out that some of their 64 patterns have been used by other researchers. For example, Hearst (1992) used the pattern "such as" to discover hyponyms and Berland and Charniak (1999) used the pattern "of the" to discover meronyms. Both of these patterns are included in the 4,000 patterns automatically generated by LRA.
The novelty in Turney and Littman (2005) is that their patterns are not used to mine text for instances of word pairs that fit the patterns (Hearst, 1992; Berland and Charniak, 1999); instead, they are used to gather frequency data for building vectors that represent the relation between a given pair of words. The results in Section 6.8 show that a vector contains more information than any single pattern or small set of patterns; a vector is a distributed representation. LRA is distinct from Hearst (1992) and Berland and Charniak (1999) in its focus on distributed representations, which it shares with Turney and Littman (2005), but LRA goes beyond Turney and Littman (2005) by finding patterns automatically.
Riloff and Jones (1999) and Yangarber (2003) also find patterns automatically, but their goal is to mine text for instances of word pairs; the same goal as Hearst (1992) and Berland and Charniak (1999). Because LRA uses patterns to build distributed vector representations, it can exploit patterns that would be much too noisy and unreliable for the kind of text mining instance extraction that is the objective of Hearst (1992), Berland and Charniak (1999), Riloff and Jones (1999), and Yangarber (2003). Therefore LRA can simply select the highest frequency patterns (step 4 in Section 5.5); it does not need the more sophisticated selection algorithms of Riloff and Jones (1999) and Yangarber (2003).

7. Experiments with Noun-Modifier Relations
This section describes experiments with 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations (Nastase and Szpakowicz, 2003). In the following experiments, LRA is used with the baseline parameter values, exactly as described in Section 5.5. No adjustments were made to tune LRA to the noun-modifier pairs. LRA is used as a distance (nearness) measure in a single nearest neighbour supervised learning algorithm.
7.1 Classes of Relations
The following experiments use the 600 labeled noun-modifier pairs of Nastase and Szpakowicz (2003). This data set includes information about the part of speech and WordNet synset (synonym set; i.e., word sense tag) of each word, but our algorithm does not use this information.
Table 19 lists the 30 classes of semantic relations. The table is based on Appendix A
of Nastase and Szpakowicz (2003), with some simplifications. The original table listed several semantic relations for which there were no instances in the data set. These were relations that are typically expressed with longer phrases (three or more words), rather than noun-modifier word pairs. For clarity, we decided not to include these relations in Table 19.
In this table, H represents the head noun and M represents the modifier. For example, in "flu virus", the head noun (H) is "virus" and the modifier (M) is "flu". In English, the modifier (typically a noun or adjective) usually precedes the head noun. In the description of purpose, V represents an arbitrary verb. In "concert hall", the hall is for presenting concerts (V is "present") or holding concerts (V is "hold").
Nastase and Szpakowicz (2003) organized the relations into groups. The five capitalized terms in the "Relation" column of Table 19 are the names of five groups of semantic relations. (The original table had a sixth group, but there are no examples of this group in the data set.) We make use of this grouping in the following experiments.
7.2 Baseline LRA with Single Nearest Neighbour
The following experiments use single nearest neighbour classification with leave-one-out cross-validation. For leave-one-out cross-validation, the testing set consists of a single noun-modifier pair and the training set consists of the 599 remaining noun-modifiers. The data set is split 600 times, so that each noun-modifier gets a turn as the testing word pair. The predicted class of the testing pair is the class of the single nearest neighbour in the training set. As the measure of nearness, we use LRA to calculate the relational similarity between the testing pair and the training pairs. The single nearest neighbour algorithm is a supervised learning algorithm (i.e., it requires a training set of labeled data), but we are using LRA to measure the distance between a pair and its potential neighbours, and LRA is itself determined in an unsupervised fashion (i.e., LRA does not need labeled data).
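A minimal sketch of single nearest neighbour classification with leave-one-out cross-validation, using an invented similarity function in place of LRA:

```python
def loo_nearest_neighbour_accuracy(items, labels, similarity):
    """Leave-one-out single nearest neighbour: each item is classified by
    the label of its most similar other item."""
    correct = 0
    for i, test in enumerate(items):
        best = max((j for j in range(len(items)) if j != i),
                   key=lambda j: similarity(test, items[j]))
        correct += labels[best] == labels[i]
    return correct / len(items)
```

In the experiments described here, the items would be noun-modifier pairs and the similarity function would be the relational similarity computed by LRA.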
Each SAT question has five choices, so answering 374 SAT questions required calculating 374 × 5 × 16 = 29,920 cosines. The factor of 16 comes from the alternate pairs, step 11 in LRA. With the noun-modifier pairs, using leave-one-out cross-validation, each test pair has 599 choices, so an exhaustive application of LRA would require calculating 600 × 599 × 16 = 5,750,400 cosines. To reduce the amount of computation required, we first find the 30 nearest neighbours for each pair, ignoring the alternate pairs (600 × 599 = 359,400 cosines), and then apply the full LRA, including the alternates, to just those 30 neighbours (600 × 30 × 16 = 288,000 cosines), which requires calculating only 359,400 + 288,000 = 647,400 cosines.
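The computation counts can be worked out from the figures given in this subsection (374 questions with five choices, 16 cosines per comparison, 600 pairs with 599 potential neighbours each, and 30 nearest neighbours in the second pass):

```python
# Cosines for the 374 SAT questions: five choices per question and
# 16 cosine combinations per comparison (step 11 of LRA).
sat_cosines = 374 * 5 * 16      # 29,920

# Exhaustive leave-one-out on the 600 noun-modifier pairs:
exhaustive = 600 * 599 * 16     # 5,750,400

# Two-stage shortcut: a first pass without alternates, then full LRA
# on only the 30 nearest neighbours of each pair.
first_pass = 600 * 599          # 359,400 cosines
second_pass = 600 * 30 * 16     # 288,000 cosines
total = first_pass + second_pass  # 647,400 cosines
```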
There are 600 word pairs in the input set for LRA. In step 2, introducing alternate pairs multiplies the number of pairs by four, resulting in 2,400 pairs. In step 5, for each pair A:B, we add B:A, yielding 4,800 pairs. However, some pairs are dropped because they correspond to zero vectors and a few words do not appear in Lin's thesaurus. The sparse matrix (step 7) has 4,748 rows and 8,000 columns, with a density of 8.4%.
Following Turney and Littman (2005), we evaluate the performance by accuracy and also by the macroaveraged F measure (Lewis, 1991). Macroaveraging calculates the precision, recall, and F for each class separately, and then calculates the average across all classes. Microaveraging combines the true positive, false positive, and false negative counts for all of the classes, and then calculates precision, recall, and F from the combined counts. Macroaveraging gives equal weight to all classes, but microaveraging gives more weight to larger classes. We use macroaveraging (giving equal weight to all classes), because we have no reason to believe that the class sizes in the data set reflect the actual distribution of the classes in a real corpus.
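The difference between the two averaging schemes can be made concrete. A generic sketch, where each class contributes true positive, false positive, and false negative counts:

```python
def per_class_prf(counts):
    """counts: {class: (tp, fp, fn)} -> {class: (precision, recall, f)}."""
    out = {}
    for c, (tp, fp, fn) in counts.items():
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        out[c] = (p, r, f)
    return out

def macro_f(counts):
    """Macroaveraging: average the per-class F scores (equal class weight)."""
    scores = per_class_prf(counts)
    return sum(f for _, _, f in scores.values()) / len(scores)

def micro_f(counts):
    """Microaveraging: pool the counts first, so large classes dominate."""
    tp = sum(t for t, _, _ in counts.values())
    fp = sum(x for _, x, _ in counts.values())
    fn = sum(x for _, _, x in counts.values())
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)
```

With one large accurate class and one small inaccurate class, macroaveraging is pulled down by the small class while microaveraging tracks the large one.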
Classification with 30 distinct classes is a hard problem. To make the task easier, we
[Table 19. The 30 classes of semantic relations, each with an abbreviation, an example phrase, and a description, organized into five groups whose names are the capitalized terms in the Relation column (e.g., QUALITY). Examples include cause (cs, "flu virus"), effect (eff, "exam anxiety"), purpose (prp, "concert hall"), detraction (detr, "headache pill"), frequency (freq, "daily exercise"), agent (ag, "student protest"), beneficiary (ben, "student discount"), instrument (inst, "laser printer"), and object (obj, "metal separator").]
[Tables 20 and 21. Performance of VSM-AV, VSM-WMTS, and LRA on the 30-class problem (Table 20) and the 5-class problem (Table 21).]
can collapse the 30 classes to 5 classes, using the grouping that is given in Table 19. For example, agent and beneficiary both collapse to participant. On the 30 class problem, LRA with the single nearest neighbour algorithm achieves an accuracy of 39.8% (239/600) and a macroaveraged F of 36.6%. Always guessing the majority class would result in an accuracy of 8.2% (49/600). On the 5 class problem, the accuracy is 58.0% (348/600) and the macroaveraged F is 54.6%. Always guessing the majority class would give an accuracy of 43.3% (260/600). For both the 30 class and 5 class problems, LRA's accuracy is significantly higher than guessing the majority class, with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).
7.3 LRA versus VSM
Table 20 shows the performance of LRA and VSM on the 30 class problem. VSM-AV is VSM with the AltaVista corpus and VSM-WMTS is VSM with the WMTS corpus. The results for VSM-AV are taken from Turney and Littman (2005). All three pairwise differences in the F measures are statistically significant at the 95% level, according to the Paired T-Test (Feelders and Verkooijen, 1995). The accuracy of LRA is significantly higher than the accuracies of VSM-AV and VSM-WMTS, according to the Fisher Exact Test (Agresti, 1990), but the difference between the two VSM accuracies is not significant.
Table 21 compares the performance of LRA and VSM on the 5 class problem. The accuracy and F measure of LRA are significantly higher than the accuracies and F measures of VSM-AV and VSM-WMTS, but the differences between the two VSM accuracies and F measures are not significant.
8. Discussion
The experimental results in Sections 6 and 7 demonstrate that LRA performs significantly better than the VSM, but it is also clear that there is room for improvement. The accuracy might not yet be adequate for practical applications, although past work has shown that it is possible to adjust the tradeoff of precision versus recall (Turney and Littman, 2005). For some of the applications, such as information extraction, LRA might be suitable if it is adjusted for high precision, at the expense of low recall.
Another limitation is speed; it took almost nine days for LRA to answer 374 analogy questions. However, with progress in computer hardware, speed will gradually become less of a concern. Also, the software has not been optimized for speed; there are several places where the efficiency could be increased and many operations are parallelizable. It may also be possible to precompute much of the information for LRA, although this would require substantial changes to the algorithm.
The difference in performance between VSM-AV and VSM-WMTS shows that VSM is sensitive to the size of the corpus. Although LRA is able to surpass VSM-AV when the WMTS corpus is only about one tenth the size of the AV corpus, it seems likely that LRA would perform better with a larger corpus. The WMTS corpus requires one terabyte of hard disk space, but progress in hardware will likely make ten or even one hundred terabytes affordable in the relatively near future.
For noun-modifier classification, more labeled data should yield performance improvements. With 600 noun-modifier pairs and 30 classes, the average class has only 20 examples. We expect that the accuracy would improve substantially with five or ten times more examples. Unfortunately, it is time consuming and expensive to acquire hand-labeled data.
Another issue with noun-modifier classification is the choice of classification scheme for the semantic relations. The 30 classes of Nastase and Szpakowicz (2003) might not be the best scheme. Other researchers have proposed different schemes (Vanderwende, 1994; Barker and Szpakowicz, 1998; Rosario and Hearst, 2001; Rosario, Hearst, and Fillmore, 2002). It seems likely that some schemes are easier for machine learning than others. For some applications, 30 classes may not be necessary; the 5 class scheme may be sufficient.
LRA, like VSM, is a corpus-based approach to measuring relational similarity. Past work suggests that a hybrid approach, combining multiple modules, some corpus-based, some lexicon-based, will surpass any purebred approach (Turney et al., 2003). In future work, it would be natural to combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004), perhaps using the combination method of Turney et al. (2003).
The Singular Value Decomposition is only one of many methods for handling sparse, noisy data. We have also experimented with Nonnegative Matrix Factorization (NMF) (Lee and Seung, 1999), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), Kernel Principal Components Analysis (KPCA) (Scholkopf, Smola, and Muller, 1997), and Iterative Scaling (IS) (Ando, 2000). We had some interesting results with small matrices (around 2,000 rows by 1,000 columns), but none of these methods seemed substantially better than SVD and none of them scaled up to the matrix sizes we are using here (e.g., 17,232 rows and 8,000 columns; see Section 6.1).
In step 4 of LRA, we simply select the top num_patterns most frequent patterns and discard the rest. We have not yet found a pattern selection method that performs significantly better than this simple frequency-based
method, but it is possible that future work will find a method that yields significant improvement in performance.

9. Conclusion
This paper has introduced a new method for calculating relational similarity, Latent Relational Analysis. The experiments demonstrate that LRA performs better than the VSM approach, when evaluated with SAT word analogy questions and with the task of classifying noun-modifier expressions. The VSM approach represents the relation between a pair of words with a vector, in which the elements are based on the frequencies of 64 hand-built patterns in a large corpus. LRA extends this approach in three ways: (1) the patterns are generated dynamically from the corpus, (2) SVD is used to smooth the data, and (3) a thesaurus is used to explore variations of the word pairs. With the WMTS corpus (about 5 × 10^10 English words), LRA achieves an F of 56.5%, whereas the F of VSM is 40.3%.
We have presented several examples of the many potential applications for measures of relational similarity. Just as attributional similarity measures have proven to have many practical uses, we expect that relational similarity measures will soon become widely used. Gentner et al. (2001) argue that relational similarity is essential to understanding novel metaphors (as opposed to conventional metaphors). Many researchers have argued that metaphor is the heart of human thinking (Lakoff and Johnson, 1980; Hofstadter and the Fluid Analogies Research Group, 1995; Gentner et al., 2001; French, 2002). We believe that relational similarity plays a fundamental role in the mind and therefore relational similarity measures could be crucial for artificial intelligence.
In future work, we plan to investigate some potential applications for LRA. It is possible that the error rate of LRA is still too high for practical applications, but the fact that LRA matches average human performance on SAT analogy questions is encouraging.
Computational Linguistics    Volume 1, Number 1

Acknowledgments
Thanks to Michael Littman for sharing the 374 SAT analogy questions and for inspiring me to tackle them. Thanks to Vivi Nastase and Stan Szpakowicz for sharing their 600 classified noun-modifier phrases. Thanks to Egidio Terra, Charlie Clarke, and the School of Computer Science of the University of Waterloo, for giving us a copy of the Waterloo MultiText System and their Terabyte Corpus. Thanks to Dekang Lin for making his Dependency-Based Word Similarity lexicon available online. Thanks to Doug Rohde for SVDLIBC and Michael Berry for SVDPACK. Thanks to Ted Pedersen for making his WordNet::Similarity package available. Thanks to Joel Martin for comments on the paper. Thanks to the anonymous reviewers of Computational Linguistics for their very helpful comments and suggestions.

References

Agresti, Alan. 1990. Categorical Data Analysis. Wiley.

Ando, Rie Kubota. 2000. Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2000), pages 216–223.

Baeza-Yates, Ricardo A. and Berthier A. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press.

Banerjee, Satanjeev and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 805–810, Acapulco, Mexico.

Barker, Ken and Stan Szpakowicz. 1998. Semi-automatic recognition of noun modifier relationships. In Christian Boitet and Pete Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics (COLING-ACL'98), pages 96–102, San Francisco, California. Morgan Kaufmann Publishers.

Berland, Matthew and Eugene Charniak. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), pages 57–64, New Brunswick, NJ.

Berry, Michael W. 1992. Large scale singular value computations. International Journal of Supercomputer Applications, 6(1):13–49.

Budanitsky, Alexander and Graeme Hirst. 2001. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pages 29–34, Pittsburgh, PA.

Chiarello, Christine, Curt Burgess, Lorie Richards, and Alma Pollock. 1990. Semantic and associative priming in the cerebral hemispheres: Some words do, some words don't ... sometimes, some places. Brain and Language, 38:75–104.

Claman, Cathy. 2000. 10 Real SATs. College Entrance Examination Board.

Clarke, Charles L. A., Gordon V. Cormack, and Christopher R. Palmer. 1998. An overview of MultiText. ACM SIGIR Forum, 32(2):14–15.

Daganzo, Carlos F. 1994. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269–287.

Deerwester, Scott C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science (JASIS), 41(6):391–407.

Dolan, William B. 1995. Metaphor as an emergent property of machine-readable dictionaries. In Proceedings of the AAAI 1995 Spring Symposium Series: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity, pages 27–32.

Dumais, Susan T. 1990. Enhancing performance in latent semantic indexing (LSI) retrieval. Technical Report TM-ARH-017527, Bellcore, Morristown, NJ.

Dumais, Susan T. 1993. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), pages 105–115. National Institute of Standards and Technology.

Falkenhainer, Brian. 1990. Analogical interpretation in context. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 69–76. Lawrence Erlbaum Associates.

Falkenhainer, Brian, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1):1–63.

Federici, Stefano, Simonetta Montemagni, and Vito Pirrelli. 1997. Inferring semantic similarity from distributional evidence: An analogy-based approach to word sense disambiguation. In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 90–97, Madrid, Spain.

Feelders, Ad and William Verkooijen. 1995. Which method learns the most from data? Methodological issues in the analysis of comparative studies. In Fifth International Workshop on Artificial Intelligence and Statistics, pages 219–225, Ft. Lauderdale, Florida.

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

French, Robert M. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Gentner, Dedre. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Gentner, Dedre, Brian Bowdle, Phillip Wolff, and Consuelo Boronat. 2001. Metaphor is like analogy. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 199–253, Cambridge, MA. MIT Press.

Gildea, Daniel and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Girju, Roxana, Adriana Badulescu, and Dan I. Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 80–87.

Goldenberg, David. 2005. The emperor's new clothes: Undressing the new and unimproved SAT. Gelf Magazine, March. http://www.gelfmagazine.com/mt/archives/thenew

Harman, Donna. 1986. An experimental study of factors important in document ranking. In Proceedings of the Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'86), pages 186–193, Pisa, Italy.

Hearst, Marti A. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 539–545, Nantes, France.

Hirst, Graeme and David St-Onge. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 305–332. MIT Press.

Hofmann, Thomas. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual ACM Conference on Research and Development in Information Retrieval (SIGIR'99), pages 50–57, Berkeley, California, August.

Hofstadter, Douglas and the Fluid Analogies Research Group. 1995. Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. Basic Books, New York, NY.

Jarmasz, Mario and Stan Szpakowicz. 2003. Roget's thesaurus and semantic similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 212–219, Borovets, Bulgaria.

Jiang, Jay J. and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), pages 19–33, Taipei, Taiwan.

Kurtz, Stanley. 2002. Testing debate. National Review Magazine, August. http://www.nationalreview.com/kurtz/kurtz082102.asp

Lakoff, George and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press, Chicago, IL.

Landauer, Thomas K. and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.

Lapata, Mirella and Frank Keller. 2004. The web as a baseline: Evaluating the performance of unsupervised web-based models for a range of NLP tasks. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2004), pages 121–128.

Lauer, Mark. 1995. Designing Statistical Language Learners: Experiments on Compound Nouns. Ph.D. thesis, Macquarie University.

Leacock, Claudia and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press.

Lee, Daniel D. and H. Sebastian Seung. 1999. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791.

Lesk, Michael E. 1969. Word-word associations in document retrieval systems. American Documentation, 20(1):27–38.

Lesk, Michael E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC '86, pages 24–26.

Lewis, David D. 1991. Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop, pages 312–318, Asilomar, CA. Morgan Kaufmann.

Lin, Dekang. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL'98), pages 768–774, Montreal, Canada.

Lin, Dekang. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML'98), pages 296–304. Morgan Kaufmann, San Francisco, CA.

Marx, Zvika, Ido Dagan, Joachim Buhmann, and Eli Shamir. 2002. Coupled clustering: A method for detecting structural correspondence. Journal of Machine Learning Research, 3:747–780.

Medin, Douglas L., Robert L. Goldstone, and Dedre Gentner. 1990. Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1(1):64–69.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pages 60–67, Boston, MA.

Morris, Jane and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48.

Nastase, Vivi and Stan Szpakowicz. 2003. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, The Netherlands.

Pantel, Patrick and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613–619.

Rada, Roy, Hafedh Mili, Ellen Bicknell, and Maria Blettner. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30.

Rehder, Bob, M. E. Schreiner, Michael B. W. Wolfe, Darrell Laham, Thomas K. Landauer, and Walter Kintsch. 1998. Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354.

Reitman, Walter R. 1965. Cognition and Thought: An Information Processing Approach. John Wiley and Sons, New York, NY.

Resnik, Philip. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453, San Mateo, CA. Morgan Kaufmann.

Riloff, Ellen and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474–479.

Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun-compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82–90.

Rosario, Barbara, Marti Hearst, and Charles Fillmore. 2002. The descent of hierarchy, and selection in relational semantics. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 417–424, Philadelphia, PA.

Ruge, Gerda. 1992. Experiments on linguistically-based term associations. Information Processing and Management, 28(3):317–332.

Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.

Salton, Gerard and Chris Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523.

Salton, Gerard and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.

Scholkopf, Bernhard, Alexander J. Smola, and Klaus-Robert Muller. 1997. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997), pages 583–588, Berlin.

Terra, Egidio and Charles L. A. Clarke. 2003. Frequency estimates for statistical word similarity measures. In Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pages 244–251.

Turney, Peter D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning, pages 491–502, Berlin. Springer.

Turney, Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 417–424.

Turney, Peter D. 2005. Measuring semantic similarity by latent relational analysis. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1136–1141, Edinburgh, Scotland.

Turney, Peter D. and Michael L. Littman. 2005. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278.

Turney, Peter D., Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the Fifteenth International Conference on Computational Linguistics, pages 782–788, Kyoto, Japan.

Veale, Tony. 2003. The analogical thesaurus. In Proceedings of the 15th Innovative Applications of Artificial Intelligence Conference (IAAI 2003), pages 137–142, Acapulco, Mexico.

Veale, Tony. 2004. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain.

Yangarber, Roman. 2003. Counter-training in discovery of semantic patterns. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003), pages 343–350, Sapporo, Japan.

Yarowsky, David. 1993. One sense per collocation. In Proceedings of the ARPA Human Language Technology Workshop, pages 266–271, Princeton, NJ.

Zelenko, Dmitry, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.