Peter D. Turney∗
National Research Council Canada

arXiv:cs/0608100v1 [cs.CL] 25 Aug 2006

There are at least two kinds of similarity. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason:stone is analogous to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, and information retrieval. Recently the Vector Space Model (VSM) of information retrieval has been adapted to measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data, and (3) automatically generated synonyms are used to explore variations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying semantic relations, LRA achieves similar gains over the VSM.

1. Introduction
There are at least two kinds of similarity. Attributional similarity is correspondence between attributes and relational similarity is correspondence between relations (Medin, Goldstone, and Gentner, 1990). When two words have a high degree of attributional similarity, we call them synonyms. When two word pairs have a high degree of relational similarity, we say they are analogous.
Verbal analogies are often written in the form A:B::C:D, meaning A is to B as C is to D; for example, traffic:street::water:riverbed. Traffic flows over a street; water flows over a riverbed. A street carries traffic; a riverbed carries water. There is a high degree of relational similarity between the word pair traffic:street and the word pair water:riverbed. In fact, this analogy is the basis of several mathematical theories of traffic flow (Daganzo, 1994).
In Section 2, we look more closely at the connections between attributional and relational similarity. In analogies such as mason:stone::carpenter:wood, it seems that relational similarity can be reduced to attributional similarity, since mason and carpenter are attributionally similar, as are stone and wood. In general, this reduction fails. Consider the analogy traffic:street::water:riverbed. Traffic and water are not attributionally similar. Street and riverbed are only moderately attributionally similar.
Computational Linguistics, Volume 1, Number 1
Many algorithms have been proposed for measuring the attributional similarity between two words (Lesk, 1969; Resnik, 1995; Landauer and Dumais, 1997; Jiang and Conrath, 1997; Lin, 1998b; Turney, 2001; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003). Measures of attributional similarity have been studied extensively, due to their applications in problems such as recognizing synonyms (Landauer and Dumais, 1997), information retrieval (Deerwester et al., 1990), determining semantic orientation (Turney, 2002), grading student essays (Rehder et al., 1998), measuring textual cohesion (Morris and Hirst, 1991), and word sense disambiguation (Lesk, 1986).
On the other hand, since measures of relational similarity are not as well developed as measures of attributional similarity, the potential applications of relational similarity are not as well known. Many problems that involve semantic relations would benefit from an algorithm for measuring relational similarity. We discuss related problems in natural language processing, information retrieval, and information extraction in more detail in Section 3.
This paper builds on the Vector Space Model (VSM) of information retrieval. Given a query, a search engine produces a ranked list of documents. The documents are ranked in order of decreasing attributional similarity between the query and each document. Almost all modern search engines measure attributional similarity using the VSM (Baeza-Yates and Ribeiro-Neto, 1999). Turney and Littman (2005) adapted the VSM approach to measuring relational similarity. They used a vector of frequencies of patterns in a corpus to represent the relation between a pair of words. Section 4 presents the VSM approach to measuring similarity.
In Section 5, we present an algorithm for measuring relational similarity, which we call Latent Relational Analysis (LRA). The algorithm learns from a large corpus of unlabeled, unstructured text, without supervision. LRA extends the VSM approach of Turney and Littman (2005) in three ways: (1) The connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns. (2) Singular Value Decomposition (SVD) is used to smooth the frequency data. (3) Given a word pair such as traffic:street, LRA considers transformations of the word pair, generated by replacing one of the words by synonyms, such as traffic:road, traffic:highway.
Section 6 presents our experimental evaluation of LRA with a collection of 374 multiple-choice word analogy questions from the SAT college entrance exam.1 An example of a typical SAT question appears in Table 1. In the educational testing literature, the first pair (mason:stone) is called the stem of the analogy. The correct choice is called the solution and the incorrect choices are distractors. We evaluate LRA by testing its ability to select the solution and avoid the distractors. The average performance of college-bound senior high school students on verbal SAT questions corresponds to an accuracy of about 57%. LRA achieves an accuracy of about 56%. On these same questions, the VSM attained 47%.
One application for relational similarity is classifying semantic relations in noun-modifier pairs (Turney and Littman, 2005). In Section 7, we evaluate the performance of LRA with a set of 600 noun-modifier pairs from Nastase and Szpakowicz (2003). The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the head noun (printer) and the modifier (laser). The 600 pairs have been manually labeled with 30 classes of semantic relations. For example, "laser printer" is classified as instrument; the printer uses the laser as an instrument for printing.
Similarity of Semantic Relations (Turney)
Table 1
Stem: mason:stone
Solution: (b) carpenter:wood
The amount of attributional similarity between two words, A and B, depends on the degree of correspondence between the properties of A and B. A measure of attributional similarity is a function that maps two words, A and B, to a real number, sim_a(A, B) ∈ ℜ. The more correspondence there is between the properties of A and B, the greater their attributional similarity. For example, dog and wolf have a relatively high degree of attributional similarity.
The amount of relational similarity between two pairs of words, A:B and C:D, depends on the degree of correspondence between the relations between A and B and the relations between C and D. A measure of relational similarity is a function that maps two pairs, A:B and C:D, to a real number, sim_r(A:B, C:D) ∈ ℜ. The more correspondence there is between the relations of A:B and C:D, the greater their relational similarity. For example, dog:bark and cat:meow have a relatively high degree of relational similarity.
Cognitive scientists distinguish words that are semantically associated (bee–honey) from words that are semantically similar (deer–pony), although they recognize that some words are both associated and similar (doctor–nurse) (Chiarello et al., 1990). Both of these are types of attributional similarity, since they are based on correspondence between attributes (e.g., bees and honey are both found in hives; deer and ponies are both mammals).
Budanitsky and Hirst (2001) describe semantic relatedness as follows:
Recent research on the topic in computational linguistics has emphasized the perspective of semantic relatedness of two lexemes in a lexical resource, or its inverse, semantic distance. It's important to note that semantic relatedness is a more general concept than similarity; similar entities are usually assumed to be related by virtue of their likeness (bank–trust company), but dissimilar entities may also be semantically related by lexical relationships such as meronymy (car–wheel) and antonymy (hot–cold), or just by any kind of functional relationship or frequent association (pencil–paper, penguin–Antarctica).
As these examples show, semantic relatedness is the same as attributional similarity (e.g., hot and cold are both kinds of temperature, pencil and paper are both used for writing). Here we prefer to use the term attributional similarity, because it emphasizes the contrast with relational similarity. The term semantic relatedness may lead to confusion when the term relational similarity is also under discussion.
Resnik (1995) describes semantic similarity as follows:
Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar. Rada et al. (1989) suggest that the assessment of similarity in semantic networks can in fact be thought of as involving just taxonomic (IS-A) links, to the exclusion of other link types; that view will also be taken here, although admittedly it excludes some potentially useful information.
Thus semantic similarity is a specific type of attributional similarity. The term semantic similarity is misleading, because it refers to a type of attributional similarity, yet relational similarity is not any less semantic than attributional similarity.
To avoid confusion, we will use the terms attributional similarity and relational similarity, following Medin, Goldstone, and Gentner (1990). Instead of semantic similarity (Resnik, 1995) or semantically similar (Chiarello et al., 1990), we prefer the term taxonomical similarity, which we take to be a specific type of attributional similarity. We interpret synonymy as a high degree of attributional similarity. Analogy is a high degree of relational similarity.
2.2 Measuring Attributional Similarity
Algorithms for measuring attributional similarity can be lexicon-based (Lesk, 1986; Budanitsky and Hirst, 2001; Banerjee and Pedersen, 2003), corpus-based (Lesk, 1969; Landauer and Dumais, 1997; Lin, 1998a; Turney, 2001), or a hybrid of the two (Resnik, 1995; Jiang and Conrath, 1997; Turney et al., 2003). Intuitively, we might expect that lexicon-based algorithms would be better at capturing synonymy than corpus-based algorithms, since lexicons, such as WordNet, explicitly provide synonymy information that is only implicit in a corpus. However, experiments do not support this intuition.
Several algorithms have been evaluated using 80 multiple-choice synonym questions taken from the Test of English as a Foreign Language (TOEFL). An example of one of the 80 TOEFL questions appears in Table 2. Table 3 shows the best performance on the TOEFL questions for each type of attributional similarity algorithm. The results support the claim that lexicon-based algorithms have no advantage over corpus-based algorithms for recognizing synonymy.

Table 2
Stem: levied
Solution: (a) imposed

Table 3
Performance of attributional similarity measures on the 80 TOEFL questions. (The average non-English US college applicant's performance is included in the bottom row, for comparison.)

Reference                        Algorithm type                  Score
Jarmasz and Szpakowicz (2003)    best lexicon-based algorithm    78.75
Terra and Clarke (2003)          best corpus-based algorithm     81.25
Turney et al. (2003)             best hybrid algorithm           97.50
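Answering such a multiple-choice question with any attributional similarity measure reduces to an argmax over the choices. A minimal sketch, where the sim_a scores are invented stand-ins and the distractors are hypothetical:

```python
# A minimal sketch of answering a multiple-choice synonym question:
# pick the choice with the highest attributional similarity to the stem.
# The sim_a scores below are invented, and the distractors are hypothetical.

def answer_synonym_question(stem, choices, sim_a):
    """Return the choice with the highest attributional similarity to the stem."""
    return max(choices, key=lambda choice: sim_a(stem, choice))

# Invented similarity scores for illustration only.
toy_scores = {
    ("levied", "imposed"): 0.90,
    ("levied", "whispered"): 0.05,
    ("levied", "sailed"): 0.02,
    ("levied", "blossomed"): 0.01,
}
sim_a = lambda a, b: toy_scores.get((a, b), 0.0)

best = answer_synonym_question(
    "levied", ["imposed", "whispered", "sailed", "blossomed"], sim_a)
# best == "imposed"
```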
2.3 Using Attributional Similarity to Solve Analogies
We may distinguish near analogies (mason:stone::carpenter:wood) from far analogies (traffic:street::water:riverbed) (Gentner, 1983; Medin, Goldstone, and Gentner, 1990). In an analogy A:B::C:D, where there is a high degree of relational similarity between A:B and C:D, if there is also a high degree of attributional similarity between A and C, and between B and D, then A:B::C:D is a near analogy; otherwise, it is a far analogy.
It seems possible that SAT analogy questions might consist largely of near analogies, in which case they can be solved using attributional similarity measures. We could score each candidate analogy by the average of the attributional similarity, sim_a, between A and C and between B and D:
score(A:B::C:D) = (1/2) (sim_a(A, C) + sim_a(B, D))    (1)

precision = number of correct guesses / total number of guesses made    (2)
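The averaging scheme described above is easy to sketch; the sim_a values below are invented stand-ins for a real attributional similarity measure:

```python
# Sketch of scoring a candidate analogy A:B::C:D by the average of the
# attributional similarities sim_a(A, C) and sim_a(B, D).

def analogy_score(a, b, c, d, sim_a):
    return 0.5 * (sim_a(a, c) + sim_a(b, d))

# Invented scores: mason/carpenter and stone/wood are attributionally
# similar, so mason:stone::carpenter:wood is a near analogy.
toy_scores = {("mason", "carpenter"): 0.8, ("stone", "wood"): 0.6}
sim_a = lambda x, y: toy_scores.get((x, y), 0.0)

score = analogy_score("mason", "stone", "carpenter", "wood", sim_a)
# score ≈ 0.7
```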
Algorithm                    Type                Precision   Recall   F
Turney and Littman (2005)    relational (VSM)    47.7        47.1     47.4
random                       random              20.0        20.0     20.0

recall = number of correct guesses / maximum possible number correct

F = (2 × precision × recall) / (precision + recall)    (3)
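These evaluation measures can be written down directly; the counts below are hypothetical:

```python
# Sketch of the evaluation measures for a system that is allowed to
# skip questions: precision, recall, and the F measure.

def precision(num_correct, num_guesses):
    return num_correct / num_guesses

def recall(num_correct, max_possible):
    return num_correct / max_possible

def f_measure(p, r):
    return 2 * p * r / (p + r)

# Hypothetical counts: 40 correct answers out of 80 guesses,
# with 100 questions in total.
p = precision(40, 80)   # 0.5
r = recall(40, 100)     # 0.4
f = f_measure(p, r)     # 2 * 0.5 * 0.4 / 0.9, roughly 0.444
```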
2. See http://www.d.umn.edu/~tpederse/similarity.html.
was first attempted by a system called Argus (Reitman, 1965), using a small hand-built semantic network. Argus could only solve the limited set of analogy questions that its programmer had anticipated. Argus was based on a spreading activation model and did not explicitly attempt to measure relational similarity.
Turney et al. (2003) combined 13 independent modules to answer SAT questions. The final output of the system was based on a weighted combination of the outputs of each individual module. The best of the 13 modules was the VSM, which is described in detail in Turney and Littman (2005). The VSM was evaluated on a set of 374 SAT questions, achieving a score of 47%.
In contrast with the corpus-based approach of Turney and Littman (2005), Veale (2004) applied a lexicon-based approach to the same 374 SAT questions, attaining a score of 43%. Veale evaluated the quality of a candidate analogy A:B::C:D by looking for paths in WordNet, joining A to B and C to D. The quality measure was based on the similarity between the A:B paths and the C:D paths.
Turney (2005) introduced Latent Relational Analysis (LRA), an enhanced version of the VSM approach, which reached 56% on the 374 SAT questions. Here we go beyond Turney (2005) by describing LRA in more detail, performing more extensive experiments, and analyzing the algorithm and related work in more depth.
3.2 Structure Mapping Theory
French (2002) cites Structure Mapping Theory (SMT) (Gentner, 1983) and its implementation in the Structure Mapping Engine (SME) (Falkenhainer, Forbus, and Gentner, 1989) as the most influential work on modeling of analogy-making. The goal of computational modeling of analogy-making is to understand how people form complex, structured analogies. SME takes representations of a source domain and a target domain, and produces an analogical mapping between the source and target. The domains are given structured propositional representations, using predicate logic. These descriptions include attributes, relations, and higher-order relations (expressing relations between relations). The analogical mapping connects source domain relations to target domain relations.
For example, there is an analogy between the solar system and Rutherford's model of the atom (Falkenhainer, Forbus, and Gentner, 1989). The solar system is the source domain and Rutherford's model of the atom is the target domain. The basic objects in the source model are the planets and the sun. The basic objects in the target model are the electrons and the nucleus. The planets and the sun have various attributes, such as mass(sun) and mass(planet), and various relations, such as revolve(planet, sun) and attracts(sun, planet). Likewise, the nucleus and the electrons have attributes, such as charge(electron) and charge(nucleus), and relations, such as revolve(electron, nucleus) and attracts(nucleus, electron). SME maps revolve(planet, sun) to revolve(electron, nucleus) and attracts(sun, planet) to attracts(nucleus, electron).
Each individual connection (e.g., from revolve(planet, sun) to revolve(electron, nucleus)) in an analogical mapping implies that the connected relations are similar; thus, SMT requires a measure of relational similarity, in order to form maps. Early versions of SME only mapped identical relations, but later versions of SME allowed similar, non-identical relations to match (Falkenhainer, 1990). However, the focus of research in analogy-making has been on the mapping process as a whole, rather than measuring the similarity between any two particular relations, hence the similarity measures used in SME at the level of individual connections are somewhat rudimentary.
We believe that a more sophisticated measure of relational similarity, such as LRA, may enhance the performance of SME. Likewise, the focus of our work here is on the similarity between particular relations, and we ignore systematic mapping between sets
Metaphorical sentence                      SAT-style verbal analogy
I demolished his argument.                 building:demolish::argument:refute
You need to budget your time.              money:budget::time:schedule
I've invested a lot of time in her.        money:invest::time:allocate
My mind just isn't operating today.        machine:operate::mind:think
Life has cheated me.                       charlatan:cheat::life:disappoint
Inflation is eating up our profits.        animal:eat::inflation:reduce
some extent. Barker and Szpakowicz (1998) tried a corpus-based approach that explicitly used a measure of relational similarity, but their measure was based on literal matching, which limited its ability to generalize. Moldovan et al. (2004) also used a measure of relational similarity, based on mapping each noun and modifier into semantic classes in WordNet. The noun-modifier pairs were taken from a corpus and the surrounding context in the corpus was used in a word sense disambiguation algorithm, to improve the mapping of the noun and modifier into WordNet. Turney and Littman (2005) used the VSM (as a component in a single nearest neighbour learning algorithm) to measure relational similarity. We take the same approach here, substituting LRA for the VSM, in Section 7.
Lauer (1995) used a corpus-based approach (using the BNC) to paraphrase noun-modifier pairs, by inserting the prepositions of, for, in, at, on, from, with, and about. For example, reptile haven was paraphrased as haven for reptiles. Lapata and Keller (2004) achieved improved results on this task, by using the database of AltaVista's search engine as a corpus.
3.5 Word Sense Disambiguation
We believe that the intended sense of a polysemous word is determined by its semantic relations with the other words in the surrounding text. If we can identify the semantic relations between the given word and its context, then we can disambiguate the given word. Yarowsky's (1993) observation that collocations are almost always monosemous is evidence for this view. Federici, Montemagni, and Pirrelli (1997) present an analogy-based approach to word sense disambiguation.
For example, consider the word plant. Out of context, plant could refer to an industrial plant or a living organism. Suppose plant appears in some text near food. A typical approach to disambiguating plant would compare the attributional similarity of food and industrial plant to the attributional similarity of food and living organism (Lesk, 1986; Banerjee and Pedersen, 2003). In this case, the decision may not be clear, since industrial plants often produce food and living organisms often serve as food. It would be very helpful to know the relation between food and plant in this example. In the phrase "food for the plant", the relation between food and plant strongly suggests that the plant is a living organism, since industrial plants do not need food. In the text "food at the plant", the relation strongly suggests that the plant is an industrial plant, since living organisms are not usually considered as locations. Thus an algorithm for classifying semantic relations (as in Section 7) should be helpful for word sense disambiguation.
3.6 Information Extraction
The problem of relation extraction is, given an input document and a specific relation R, extract all pairs of entities (if any) that have the relation R in the document. The problem was introduced as part of the Message Understanding Conferences (MUC) in 1998. Zelenko, Aone, and Richardella (2003) present a kernel method for extracting the relations person-affiliation and organization-location. For example, in the sentence "John Smith is the chief scientist of the Hardcom Corporation," there is a person-affiliation relation between "John Smith" and "Hardcom Corporation" (Zelenko, Aone, and Richardella, 2003). This is similar to the problem of classifying semantic relations (Section 3.4), except that information extraction focuses on the relation between a specific pair of entities in a specific document, rather than a general pair of words in general text. Therefore an algorithm for classifying semantic relations should be useful for information extraction.
In the VSM approach to classifying semantic relations (Turney and Littman, 2005), we would have a training set of labeled examples of the relation person-affiliation, for instance. Each example would be represented by a vector of pattern frequencies. Given a specific document discussing "John Smith" and "Hardcom Corporation", we could construct a vector representing the relation between these two entities, and then measure the relational similarity between this unlabeled vector and each of our labeled training vectors. It would seem that there is a problem here, because the training vectors would be relatively dense, since they would presumably be derived from a large corpus, but the new unlabeled vector for "John Smith" and "Hardcom Corporation" would be very sparse, since these entities might be mentioned only once in the given document. However, this is not a new problem for the Vector Space Model; it is the standard situation when the VSM is used for information retrieval. A query to a search engine is represented by a very sparse vector, whereas a document is represented by a relatively dense vector. There are well-known techniques in information retrieval for coping with this disparity, such as weighting schemes for query vectors that are different from the weighting schemes for document vectors (Salton and Buckley, 1988).
3.7 Question Answering
In their paper on classifying semantic relations, Moldovan et al. (2004) suggest that an important application of their work is Question Answering. As defined in the Text REtrieval Conference (TREC) Question Answering (QA) track, the task is to answer simple questions, such as "Where have nuclear incidents occurred?", by retrieving a relevant document from a large corpus and then extracting a short string from the document, such as "The Three Mile Island nuclear incident caused a DOE policy crisis." Moldovan et al. (2004) propose to map a given question to a semantic relation and then search for that relation in a corpus of semantically tagged text. They argue that the desired semantic relation can easily be inferred from the surface form of the question. A question of the form "Where...?" is likely to be seeking entities with a location relation and a question of the form "What did...make?" is likely to be looking for entities with a product relation. In Section 7, we show how LRA can recognize relations such as location and product (see Table 19).
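The surface-form heuristic described above can be sketched as a trivial rule table; the rules below cover only the two question forms mentioned and are illustrative, not Moldovan et al.'s implementation:

```python
# Toy sketch of mapping a question's surface form to the semantic
# relation it is likely seeking (illustrative rules only).

def expected_relation(question):
    q = question.lower()
    if q.startswith("where"):
        return "location"
    if q.startswith("what did") and "make" in q:
        return "product"
    return None  # no rule matches

rel = expected_relation("Where have nuclear incidents occurred?")
# rel == "location"
```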
3.8 Automatic Thesaurus Generation
Hearst (1992) presents an algorithm for learning hyponym (type of) relations from a corpus and Berland and Charniak (1999) describe how to learn meronym (part of) relations from a corpus. These algorithms could be used to automatically generate a thesaurus or dictionary, but we would like to handle more relations than hyponymy and meronymy. WordNet distinguishes more than a dozen semantic relations between words (Fellbaum, 1998) and Nastase and Szpakowicz (2003) list 30 semantic relations for noun-modifier pairs. Hearst (1992) and Berland and Charniak (1999) use manually generated rules to mine text for semantic relations. Turney and Littman (2005) also use a manually generated set of 64 patterns.
LRA does not use a predefined set of patterns; it learns patterns from a large corpus. Instead of manually generating new rules or patterns for each new semantic relation, it is possible to automatically learn a measure of relational similarity that can handle arbitrary semantic relations. A nearest neighbour algorithm can then use this relational similarity measure to learn to classify according to any set of classes of relations, given the appropriate labeled training data.
Girju, Badulescu, and Moldovan (2003) present an algorithm for learning meronym relations from a corpus. Like Hearst (1992) and Berland and Charniak (1999), they use manually generated rules to mine text for their desired relation. However, they supplement their manual rules with automatically learned constraints, to increase the precision of the rules.
3.9 Information Retrieval
Veale (2003) has developed an algorithm for recognizing certain types of word analogies, based on information in WordNet. He proposes to use the algorithm for analogical information retrieval. For example, the query "Muslim church" should return "mosque" and the query "Hindu bible" should return "the Vedas". The algorithm was designed with a focus on analogies of the form adjective:noun::adjective:noun, such as Christian:church::Muslim:mosque.
A measure of relational similarity is applicable to this task. Given a pair of words, A and B, the task is to return another pair of words, X and Y, such that there is high relational similarity between the pair A:X and the pair Y:B. For example, given A = "Muslim" and B = "church", return X = "mosque" and Y = "Christian". (The pair Muslim:mosque has a high relational similarity to the pair Christian:church.)
Marx et al. (2002) developed an unsupervised algorithm for discovering analogies by clustering words from two different corpora. Each cluster of words in one corpus is coupled one-to-one with a cluster in the other corpus. For example, one experiment used a corpus of Buddhist documents and a corpus of Christian documents. A cluster of words such as {Hindu, Mahayana, Zen, ...} from the Buddhist corpus was coupled with a cluster of words such as {Catholic, Protestant, ...} from the Christian corpus. Thus the algorithm appears to have discovered an analogical mapping between Buddhist schools and traditions and Christian schools and traditions. This is interesting work, but it is not directly applicable to SAT analogies, because it discovers analogies between clusters of words, rather than individual words.
3.10 Identifying Semantic Roles
A semantic frame for an event such as judgement contains semantic roles such as judge, evaluee, and reason, whereas an event such as statement contains roles such as speaker, addressee, and message (Gildea and Jurafsky, 2002). The task of identifying semantic roles is to label the parts of a sentence according to their semantic roles. We believe that it may be helpful to view semantic frames and their semantic roles as sets of semantic relations; thus a measure of relational similarity should help us to identify semantic roles. Moldovan et al. (2004) argue that semantic roles are merely a special case of semantic relations (Section 3.4), since semantic roles always involve verbs or predicates, but semantic relations can involve words of any part of speech.

4. The Vector Space Model
This section examines past work on measuring attributional and relational similarity using the Vector Space Model (VSM).
4.1 Measuring Attributional Similarity with the Vector Space Model
The VSM was first developed for information retrieval (Salton and McGill, 1983; Salton and Buckley, 1988; Salton, 1989) and it is at the core of most modern search engines (Baeza-Yates and Ribeiro-Neto, 1999). In the VSM approach to information retrieval, queries and documents are represented by vectors. Elements in these vectors are based on the frequencies of words in the corresponding queries and documents. The frequencies are usually transformed by various formulas and weights, tailored to improve the effectiveness of the search engine (Salton, 1989). The attributional similarity between a query and a document is measured by the cosine of the angle between their corresponding vectors. For a given query, the search engine sorts the matching documents in order of decreasing cosine.
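A minimal sketch of this ranking scheme, using raw term frequencies and omitting the weighting formulas:

```python
# Sketch of VSM document ranking: represent the query and each document
# as word-frequency vectors and sort documents by decreasing cosine.
import math
from collections import Counter

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_documents(query, documents):
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in documents]
    return [d for _, d in sorted(scored, key=lambda pair: -pair[0])]

docs = ["traffic on the street",
        "water in the riverbed",
        "the street carries traffic"]
ranked = rank_documents("street traffic", docs)
# the riverbed document shares no query words, so it ranks last
```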
The VSM approach has also been used to measure the attributional similarity of words (Lesk, 1969; Ruge, 1992; Pantel and Lin, 2002). Pantel and Lin (2002) clustered words according to their attributional similarity, as measured by a VSM. Their algorithm is able to discover the different senses of polysemous words, using unsupervised learning.
Latent Semantic Analysis enhances the VSM approach to information retrieval by using the Singular Value Decomposition (SVD) to smooth the vectors, which helps to handle noise and sparseness in the data (Deerwester et al., 1990; Dumais, 1993; Landauer and Dumais, 1997). SVD improves both document-query attributional similarity measures (Deerwester et al., 1990; Dumais, 1993) and word-word attributional similarity measures (Landauer and Dumais, 1997). LRA also uses SVD to smooth vectors, as we discuss in Section 5.
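This smoothing step can be sketched with NumPy's SVD; the toy matrix and the choice k = 2 are arbitrary, illustrative values:

```python
# Sketch of SVD smoothing: decompose the matrix, keep the top k
# singular values, and reconstruct a low-rank approximation.
import numpy as np

def svd_smooth(matrix, k):
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Toy word-context frequency matrix; the first two rows are nearly
# parallel, so a rank-2 approximation preserves most of the structure.
x = np.array([[2.0, 0.0, 1.0],
              [1.8, 0.1, 0.9],
              [0.0, 3.0, 0.1]])
smoothed = svd_smooth(x, k=2)
```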
4.2 Measuring Relational Similarity with the Vector Space Model
Let R1 be the semantic relation (or set of relations) between a pair of words, A and B, and let R2 be the semantic relation (or set of relations) between another pair, C and D. We wish to measure the relational similarity between R1 and R2. The relations R1 and R2 are not given to us; our task is to infer these hidden (latent) relations and then compare them.
In the VSM approach to relational similarity (Turney and Littman, 2005), we create vectors, r1 and r2, that represent features of R1 and R2, and then measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2:

\[ \mathbf{r}_1 = \langle r_{1,1}, \ldots, r_{1,n} \rangle \quad (5) \]

\[ \mathbf{r}_2 = \langle r_{2,1}, \ldots, r_{2,n} \rangle \quad (6) \]

\[ \cos(\theta) = \frac{\sum_{i=1}^{n} r_{1,i} \, r_{2,i}}{\sqrt{\sum_{i=1}^{n} (r_{1,i})^2} \, \sqrt{\sum_{i=1}^{n} (r_{2,i})^2}} = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\sqrt{\mathbf{r}_1 \cdot \mathbf{r}_1} \, \sqrt{\mathbf{r}_2 \cdot \mathbf{r}_2}} \quad (7) \]
We create a vector, r, to characterize the relationship between two words, X and Y, by counting the frequencies of various short phrases containing X and Y. Turney and Littman (2005) use a list of 64 joining terms, such as "of", "for", and "to", to form 128 phrases that contain X and Y, such as "X of Y", "Y of X", "X for Y", "Y for X", "X to Y", and "Y to X". These phrases are then used as queries for a search engine and the number of hits (matching documents) is recorded for each query. This process yields a vector of 128 numbers. If the number of hits for a query is x, then the corresponding element in the vector r is log(x + 1). Several authors report that the logarithmic transformation of frequencies improves cosine-based similarity measures (Salton and Buckley, 1988; Ruge, 1992; Lin, 1998b).
Turney and Littman (2005) evaluated the VSM approach by its performance on 374 SAT analogy questions, achieving a score of 47%. Since there are five choices for each question, the expected score for random guessing is 20%. To answer a multiple-choice analogy question, vectors are created for the stem pair and each choice pair, and then cosines are calculated for the angles between the stem pair and each choice pair. The best guess is the choice pair with the highest cosine. We use the same set of analogy questions to evaluate LRA in Section 6.
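The whole procedure can be sketched end-to-end on toy data. The joining-term list below is truncated to three terms, and hits is an invented stand-in for a search engine's hit counts:

```python
# Sketch of the VSM relation vectors and the multiple-choice procedure:
# form "X term Y" and "Y term X" phrases, log-transform hit counts, and
# pick the choice pair whose vector has the highest cosine with the stem.
import math

JOINING_TERMS = ["of", "for", "to"]  # the paper's list has 64 terms

def phrases(x, y):
    return ([f"{x} {t} {y}" for t in JOINING_TERMS] +
            [f"{y} {t} {x}" for t in JOINING_TERMS])

def relation_vector(x, y, hits):
    return [math.log(hits(p) + 1) for p in phrases(x, y)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer_analogy(stem, choices, hits):
    stem_vec = relation_vector(stem[0], stem[1], hits)
    return max(choices,
               key=lambda c: cosine(stem_vec, relation_vector(c[0], c[1], hits)))

# Invented hit counts: mason:stone and carpenter:wood share a pattern profile.
counts = {"mason of stone": 50, "stone for mason": 20,
          "carpenter of wood": 40, "wood for carpenter": 15}
hits = lambda p: counts.get(p, 0)
best = answer_analogy(("mason", "stone"),
                      [("carpenter", "wood"), ("bird", "sky")], hits)
# best == ("carpenter", "wood")
```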
The VSM was also evaluated by its performance as a distance (nearness) measure in a supervised nearest neighbour classifier for noun-modifier semantic relations (Turney and Littman, 2005). The evaluation used 600 hand-labeled noun-modifier pairs from Nastase and Szpakowicz (2003). A testing pair is classified by searching for its single nearest neighbour in the labeled training data. The best guess is the label for the training pair with the highest cosine. LRA is evaluated with the same set of noun-modifier pairs in Section 7.
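The classification step reduces to a one-nearest-neighbour lookup under cosine; the vectors and labels below are invented:

```python
# Sketch of single-nearest-neighbour classification: label a test vector
# with the label of the training vector that gives the highest cosine.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(test_vector, training_data):
    """training_data: list of (pattern_vector, relation_label) pairs."""
    _, label = max(training_data,
                   key=lambda item: cosine(test_vector, item[0]))
    return label

# Invented pattern-frequency vectors for labeled noun-modifier pairs.
training_data = [
    ([3.0, 0.1, 0.0], "instrument"),
    ([0.0, 2.5, 1.0], "location"),
]
label = classify([2.0, 0.2, 0.1], training_data)
# label == "instrument"
```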
Turney and Littman (2005) used the AltaVista search engine to obtain the frequency information required to build vectors for the VSM. Thus their corpus was the set of all web pages indexed by AltaVista. At the time, the English subset of this corpus consisted of about 5 × 10^11 words. Around April 2004, AltaVista made substantial changes to their search engine, removing their advanced search operators. Their search engine no longer supports the asterisk operator, which was used by Turney and Littman (2005) for stemming and wild-card searching. AltaVista also changed their policy towards automated searching, which is now forbidden.3
Turney and Littman (2005) used AltaVista's hit count, which is the number of documents (web pages) matching a given query, but LRA uses the number of passages (strings) matching a query. In our experiments with LRA (Sections 6 and 7), we use a local copy of the Waterloo MultiText System (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003), running on a 16 CPU Beowulf Cluster, with a corpus of about 5 × 10^10 English words. The Waterloo MultiText System (WMTS) is a distributed (multiprocessor) search engine, designed primarily for passage retrieval (although document retrieval is possible, as a special case of passage retrieval). The text and index require approximately one terabyte of disk space. Although AltaVista only gives a rough estimate of the number of matching documents, the Waterloo MultiText System gives exact counts of the number of matching passages.
Turney et al. (2003) combine 13 independent modules to answer SAT questions. The performance of LRA significantly surpasses this combined system, but there is no real contest between these approaches, because we can simply add LRA to the combination, as a fourteenth module. Since the VSM module had the best performance of the thirteen modules (Turney et al., 2003), the following experiments focus on comparing VSM and LRA.
5. Latent Relational Analysis
LRA takes as input a set of word pairs and produces as output a measure of the relational similarity between any two of the input pairs. LRA relies on three resources: a search engine with a very large corpus of text, a broad-coverage thesaurus of synonyms, and an efficient implementation of SVD.
We first present a short description of the core algorithm. Later, in the following subsections, we will give a detailed description of the algorithm, as it is applied in the experiments in Sections 6 and 7.
• Given a set of word pairs as input, look in a thesaurus for synonyms for each word in each word pair. For each input pair, make alternate pairs by replacing the original words with their synonyms. The alternate pairs are intended to form near analogies with the corresponding original pairs (see Section 2.3).

• Filter out alternate pairs that do not form near analogies, by dropping alternate pairs that co-occur rarely in the corpus. In the preceding step, if a synonym replaced an ambiguous original word, but the synonym captures the wrong sense of the original word, it is likely that there is no significant relation between the words in the alternate pair, so they will rarely co-occur.
• For each original and alternate pair, search in the corpus for short phrases that begin with one member of the pair and end with the other. These phrases characterize the relation between the words in each pair.
• For each phrase from the previous step, create several patterns, by replacing words in the phrase with wildcards.
• Build a pair-pattern frequency matrix, in which each cell represents the number of times that the corresponding pair (row) appears in the corpus with the corresponding pattern (column). The number will usually be zero, resulting in a sparse matrix.
• Apply the Singular Value Decomposition to the matrix. This reduces noise in the matrix and helps with sparse data.
• Suppose that we wish to calculate the relational similarity between any two of the original pairs. Start by looking for the two row vectors in the pair-pattern frequency matrix that correspond to the two original pairs. Calculate the cosine of the angle between these two row vectors. Then merge the cosine of the two original pairs with the cosines of their corresponding alternate pairs, as follows. If an analogy formed with alternate pairs has a higher cosine than the original pairs, we assume that we have found a better way to express the analogy, but we have not significantly changed its meaning. If the cosine is lower, we assume that we may have changed the meaning, by inappropriately replacing words with synonyms. Filter out inappropriate alternates by dropping all analogies formed of alternates, such that the cosines are less than the cosine for the original pairs. The relational similarity between the two original pairs is then calculated as the average of all of the remaining cosines.
The motivation for the alternate pairs is to handle cases where the original pairs co-occur rarely in the corpus. The hope is that we can find near analogies for the original pairs, such that the near analogies co-occur more frequently in the corpus. The danger is that the alternates may have different relations from the originals. The filtering steps above aim to reduce this risk.
5.1 Input and Output
In our experiments, the input set contains from 600 to 2,244 word pairs. The output similarity measure is based on cosines, so the degree of similarity can range from −1 (dissimilar; θ = 180°) to +1 (similar; θ = 0°). Before applying SVD, the vectors are completely nonnegative, which implies that the cosine can only range from 0 to +1, but SVD introduces negative values, so it is possible for the cosine to be negative, although we have never observed this in our experiments.
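The claimed range of the cosine can be checked with a minimal stdlib sketch (the vectors here are hypothetical, not taken from the experiments):

```python
import math

def cosine(u, v):
    # Cosine of the angle between u and v: dot product of the
    # unit-length (normalized) vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Nonnegative frequency vectors: the cosine is confined to [0, +1].
print(cosine([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # parallel, cosine near 1
print(cosine([1.0, 0.0], [0.0, 3.0]))            # orthogonal, cosine 0
# Once SVD introduces negative entries, the cosine may fall below 0.
print(cosine([1.0, -1.0], [-1.0, 1.0]))          # opposite, cosine near -1
```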
5.2 Search Engine and Corpus
In the following experiments, we use a local copy of the Waterloo MultiText System (WMTS) (Clarke, Cormack, and Palmer, 1998; Terra and Clarke, 2003). The corpus consists of about 5 × 10^10 English words, gathered by a web crawler, mainly from US academic web sites. The web pages cover a very wide range of topics, styles, genres, quality, and writing skill. The WMTS is well suited to LRA, because the WMTS scales well to large corpora (one terabyte, in our case), it gives exact frequency counts (unlike most web search engines), it is designed for passage retrieval (rather than document retrieval), and it has a powerful query syntax.
Similarity of Semantic Relations (Turney)
5.3 Thesaurus
As a source of synonyms, we use Lin's (1998a) automatically generated thesaurus. This thesaurus is available through an online interactive demonstration or it can be downloaded. We used the online demonstration, since the downloadable version seems to contain fewer words. For each word in the input set of word pairs, we automatically query the online demonstration and fetch the resulting list of synonyms. As a courtesy to other users of Lin's online system, we insert a 20 second delay between each query.
Lin’sthesauruswasgeneratedbyparsingacorpusofabout5×107Englishwords,consistingoftextfromtheWallStreetJournal,SanJoseMercury,andAPNewswire(Lin,1998a).Theparserwasusedtoextractpairsofwordsandtheirgrammaticalre-lations.Wordswerethenclusteredintosynonymsets,basedonthesimilarityoftheirgrammaticalrelations.Twowordswerejudgedtobehighlysimilarwhentheytendedtohavethesamekindsofgrammaticalrelationswiththesamesetsofwords.Givenawordanditspartofspeech,Lin’sthesaurusprovidesalistofwords,sortedinor-derofdecreasingattributionalsimilarity.ThissortingisconvenientforLRA,sinceitmakesitpossibletofocusonwordswithhigherattributionalsimilarityandignoretherest.WordNet,incontrast,givenawordanditspartofspeech,providesalistofwordsgroupedbythepossiblesensesofthegivenword,withgroupssortedbythefrequenciesofthesenses.WordNet’ssortingdoesnotdirectlycorrespondtosortingbydegreeofattributionalsimilarity,althoughvariousalgorithmshavebeenproposedforderivingattributionalsimilarityfromWordNet(Resnik,1995;JiangandConrath,1997;BudanitskyandHirst,2001;BanerjeeandPedersen,2003).
5.4 Singular Value Decomposition
We use Rohde's SVDLIBC implementation of the Singular Value Decomposition, which is based on SVDPACKC (Berry, 1992). In LRA, SVD is used to reduce noise and compensate for sparseness.
5.5 The Algorithm
We will go through each step of LRA, using an example to illustrate the steps. Assume that the input to LRA is the 374 multiple-choice SAT word analogy questions of Turney and Littman (2005). Since there are six word pairs per question (the stem and five choices), the input consists of 2,244 word pairs. Let's suppose that we wish to calculate the relational similarity between the pair quart:volume and the pair mile:distance, taken from the SAT question in Table 6. The LRA algorithm consists of the following twelve steps:
1. Find alternates: For each word pair A:B in the input set, look in Lin's (1998a) thesaurus for the top num_sim words (in our experiments, num_sim is 10) that are most similar to A. For each A′ that is similar to A, make a new word pair A′:B. Likewise, look for the top num_sim words that are most similar to B, and for each such B′, make a new word pair A:B′. The pair A:B is called the original pair and each new pair is an alternate pair. When looking for similar words in Lin's (1998a) thesaurus, avoid words that seem unusual (e.g., hyphenated words, words with three characters or less, words with non-alphabetical characters).
Table 6 (excerpt). Stem: quart:volume. Solution: (b) mile:distance.
2. Filter alternates: Filter the alternate pairs as follows. For each alternate pair, send a query to the WMTS, to find the frequency of phrases that begin with one member of the pair and end with the other. The phrases cannot have more than max_phrase words (we use max_phrase = 5). Sort the alternate pairs by the frequency of their phrases. Select the top num_filter alternate pairs and discard the rest (we use num_filter = 3, so 17 alternates are dropped). This step tends to eliminate alternates that have no clear semantic relation. The third column in Table 7 shows the frequency with which each pair co-occurs in a window of max_phrase words.

3. Select phrases: For each of the original pairs and the filtered alternate pairs, search in the corpus for phrases that begin with one member of the pair and end with the other. A phrase can have no more than max_phrase words and there must be at least one word between the two members of the word pair. These phrases give us information about the semantic relations between the words in each pair. A phrase with no words between the two members of the word pair would give us very little information about the semantic relations (other than that the words occur together with a certain frequency in a certain order). Table 8 gives some examples of phrases in the corpus that match the pair quart:volume.
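Step 3 amounts to scanning for short windows that begin with one member of the pair and end with the other; a stdlib sketch over a toy token list (the real system queries the WMTS instead):

```python
def find_phrases(tokens, pair, max_phrase=5):
    """Collect phrases of at most max_phrase words that begin with one
    member of the pair and end with the other, with at least one
    intervening word between them."""
    a, b = pair
    phrases = []
    for i, tok in enumerate(tokens):
        if tok not in (a, b):
            continue
        other = b if tok == a else a
        # j ranges over possible end positions: at least one intervening
        # word (j >= i + 2), at most max_phrase words in total.
        for j in range(i + 2, min(i + max_phrase, len(tokens))):
            if tokens[j] == other:
                phrases.append(tokens[i:j + 1])
    return phrases

corpus = "a quart of volume and the volume of a quart".split()
for p in find_phrases(corpus, ("quart", "volume")):
    print(" ".join(p))
# quart of volume
# volume of a quart
```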
4. Find patterns: For each phrase found in the previous step, build patterns from the intervening words. A pattern is constructed by replacing any or all or none of the intervening words with wildcards (one wildcard can only replace one word). If a phrase is n words long, there are n − 2 intervening words between the members of the given word pair (e.g., between quart and volume). Thus a phrase with n words generates 2^(n−2) patterns. (We use max_phrase = 5, so a phrase generates at most 2^3 = 8 patterns.) Keep the num_patterns most frequent patterns and discard the rest (we use num_patterns = 4,000).
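The wildcard expansion in step 4 is a powerset over the intervening words; a minimal sketch:

```python
from itertools import combinations

def patterns(phrase):
    """Generate the 2**(n-2) patterns for an n-word phrase by replacing
    any subset of the intervening words with one-word wildcards."""
    inner = phrase[1:-1]  # intervening words between the pair members
    pats = []
    for r in range(len(inner) + 1):
        for idx in combinations(range(len(inner)), r):
            pats.append(["*" if i in idx else w for i, w in enumerate(inner)])
    return [" ".join(p) for p in pats]

# A 4-word phrase has 2 intervening words, hence 2**2 = 4 patterns.
print(patterns(["quart", "of", "the", "volume"]))
# ['of the', '* the', 'of *', '* *']
```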
Table 7. Alternate pairs for quart:volume, with their attributional similarity, corpus frequency, and the filtering decision from step 2.

Word pair           Similarity   Frequency   Filtering step
pint:volume         0.210        372
gallon:volume       0.159        1500        accept (top alternate)
liter:volume        0.122        3323        accept (top alternate)
squirt:volume       0.084        54
pail:volume         0.084        28
vial:volume         0.084        373
pumping:volume      0.073        1386        accept (top alternate)
ounce:volume        0.071        430
spoonful:volume     0.070        42
tablespoon:volume   0.069        96
Table 9 (excerpt). Example patterns: P = "in", P = "* of", P = "of *", P = "* *".
5. Map pairs to rows: In preparation for building a matrix X, create a mapping of word pairs to row numbers. For each pair A:B, create a row for A:B and another row for B:A. This will make the matrix more symmetrical, reflecting our knowledge that the relational similarity between A:B and C:D should be the same as the relational similarity between B:A and D:C. This duplication of rows is examined in Section 6.6.

6. Map patterns to columns: Create a mapping of the top num_patterns patterns to column numbers. For each pattern P, create a column for "word1 P word2" and another column for "word2 P word1". Thus there are 2 × num_patterns columns in X. This duplication of columns is examined in Section 6.6.

7. Generate a sparse matrix: Generate a matrix X in sparse matrix format, suitable for input to SVDLIBC. The value for the cell in row i and column j is the frequency of the j-th pattern (see step 6) in phrases that contain the i-th word pair (see step 5). Table 9 gives some examples of pattern frequencies for quart:volume.

8. Calculate entropy: Apply log and entropy transformations to the sparse matrix (Landauer and Dumais, 1997). These transformations have been found to be very helpful for information retrieval (Harman, 1986; Dumais, 1990). Let x_{i,j} be the cell in row i and column j of the matrix X from step 7. Let m be the number of rows in X and let n be the number of columns. We wish to weight the cell x_{i,j} by the entropy of the j-th column. To calculate the entropy of the column, we need to convert the column into a vector of probabilities. Let p_{i,j} be the probability of x_{i,j}, calculated by normalizing the column vector so that the sum of the elements is one, p_{i,j} = x_{i,j} / Σ_{k=1..m} x_{k,j}. The entropy of the j-th column is then H_j = −Σ_{k=1..m} p_{k,j} log(p_{k,j}). Entropy is at its maximum when p_{i,j} is a uniform distribution, p_{i,j} = 1/m, in which case H_j = log(m). Entropy is at its minimum when p_{i,j} is 1 for some value of i and 0 for all other values of i, in which case H_j = 0. We want to give more weight to columns (patterns) with frequencies that vary substantially from one row (word pair) to the next, and less weight to columns that are uniform. Therefore we weight the cell x_{i,j} by w_j = 1 − H_j / log(m), which varies from 0 when p_{i,j} is uniform to 1 when entropy is minimal. We also apply the log transformation to frequencies, log(x_{i,j} + 1). (Entropy is calculated with the original frequency values, before the log transformation is applied.) For all i and all j, replace the original value x_{i,j} in X by the new value w_j log(x_{i,j} + 1). This is an instance of the TF-IDF (Term Frequency-Inverse Document Frequency) family of transformations, which is familiar in information retrieval (Salton and Buckley, 1988; Baeza-Yates and Ribeiro-Neto, 1999): log(x_{i,j} + 1) is the TF term and w_j is the IDF term.

9. Apply SVD: After the log and entropy transformations have been applied to the matrix X, run SVDLIBC.
SVD decomposes X into a product of three matrices UΣV^T, where U and V are in column orthonormal form (i.e., the columns are orthogonal and have unit length: U^T U = V^T V = I) and Σ is a diagonal matrix of singular values (hence SVD) (Golub and Van Loan, 1996). If X is of rank r, then Σ is also of rank r. Let Σ_k, where k < r, be the diagonal matrix formed from the top k singular values, and let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V. The matrix U_k Σ_k V_k^T is the matrix of rank k that best approximates the original matrix X, in the sense that it minimizes the approximation errors. That is, X̂ = U_k Σ_k V_k^T minimizes ||X̂ − X||_F over all matrices X̂ of rank k, where ||...||_F denotes the Frobenius norm (Golub and Van Loan, 1996). We may think of this matrix U_k Σ_k V_k^T as a "smoothed" or "compressed" version of the original matrix. In the subsequent steps, we will be calculating cosines for row vectors. For this purpose, we can simplify calculations by dropping V. The cosine of two vectors is their dot product, after they have been normalized to unit length. The matrix XX^T contains the dot products of all of the row vectors. We can find the dot product of the i-th and j-th row vectors by looking at the cell in row i, column j of the matrix XX^T. Since V^T V = I, we have XX^T = UΣV^T (UΣV^T)^T = UΣV^T VΣ^T U^T = UΣ(UΣ)^T, which means that we can calculate cosines with the smaller matrix UΣ, instead of using X = UΣV^T (Deerwester et al., 1990).

10. Projection: Calculate U_k Σ_k (we use k = 300). This matrix has the same number of rows as X, but only k columns (instead of 2 × num_patterns columns).

11. Evaluate alternates: From step 2, we have (num_filter + 1) versions of A:B, the original and the num_filter alternates, and likewise (num_filter + 1) versions of C:D. For each combination of a version of A:B with a version of C:D, look up the corresponding row vectors in U_k Σ_k and calculate the (num_filter + 1)^2 cosines (in our experiments, there are 16 cosines). For example, suppose A:B is quart:volume and C:D is mile:distance. Table 10 gives the cosines for the sixteen combinations.

12. Calculate relational similarity: The relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine of the original pairs. (Table 10 lists each combination of word pairs, its cosine, and whether that cosine is greater than or equal to the cosine of the original pairs.) The pair pumping:volume has slipped through the filtering in step 2, although it is not a good alternate for quart:volume. However, Table 10 shows that all four analogies that involve pumping:volume are dropped here, in step 12.

Steps 11 and 12 can be repeated for each two input pairs that are to be compared. This completes the description of LRA.

Table 11 gives the cosines for the sample SAT question. The choice pair with the highest average cosine (the choice with the largest value in column #1), choice (b), is the solution for this question; LRA answers the question correctly. For comparison, column #2 gives the cosines for the original pairs and column #3 gives the highest cosine. For this particular SAT question, there is one choice that has the highest cosine for all three columns, choice (b), although this is not true in general. Note that the gap between the first choice (b) and the second choice (d) is largest for the average cosines (column #1). This suggests that the average of the cosines (column #1) is better at discriminating the correct choice than either the original cosine (column #2) or the highest cosine (column #3).

6. Experiments with Word Analogy Questions

This section presents various experiments with 374 multiple-choice SAT word analogy questions.
6.1 Baseline LRA System

Table 12 shows the performance of the baseline LRA system on the 374 SAT questions, using the parameter settings and configuration described in Section 5. LRA correctly answered 210 of the 374 questions. 160 questions were answered incorrectly and 4 questions were skipped, because the stem pair and its alternates were represented by zero vectors.

Table 11 (excerpt). Average, original, and highest cosines (columns #1, #2, #3) for the sample SAT question. Stem: quart:volume. Solution: (b) mile:distance: average 0.677, original 0.525, highest 0.781.

For comparison, several simpler approaches were also evaluated on the same questions:

Algorithm                          Precision   Recall   F
Veale (2004)                       42.8        42.8     42.8
best attributional similarity      35.0        35.0     35.0
random guessing                    20.0        20.0     20.0
lowest co-occurrence frequency     16.8        16.8     16.8
highest co-occurrence frequency    11.8        11.8     11.8

The elapsed time for a full run of LRA breaks down by step (columns: Step, Description, Time H:M:S, Hardware); the total is 209:49:36, almost nine days.

Table 14. LRA versus VSM with 374 SAT analogy questions.

            VSM-AV   VSM-WMTS   LRA
Correct     176      144        210
Incorrect   193      196        160
Skipped     5        34         4
Precision   47.7     42.4       56.8
Recall      47.1     38.5       56.1
F           47.4     40.3       56.5

Table 15 (excerpt). Columns: System, Recall (% correct), 95% confidence interval for recall; the human-level benchmark (57%) is marked.

6.2 LRA versus VSM

Table 14 compares LRA to the two VSM variations. The differences in precision between LRA and the two VSM variations are also significant, but the difference in precision between the two VSM variations (42.4% versus 47.7%) is not significant. Although VSM-AV has a corpus ten times larger than LRA's, LRA still performs better than VSM-AV.

Comparing VSM-AV to VSM-WMTS, the smaller corpus has reduced the score of the VSM, but much of the drop is due to the larger number of questions that were skipped (34 for VSM-WMTS versus 5 for VSM-AV). With the smaller corpus, many more of the input word pairs simply do not appear together in short phrases in the corpus. LRA is able to answer as many questions as VSM-AV, although it uses the same corpus as VSM-WMTS, because Lin's thesaurus allows LRA to substitute synonyms for words that are not in the corpus.

VSM-AV required 17 days to process the 374 analogy questions (Turney and Littman, 2005), compared to 9 days for LRA. As a courtesy to AltaVista, Turney and Littman (2005) inserted a five second delay between each query. Since the WMTS is running locally, there is no need for delays. VSM-WMTS processed the questions in only one day.
6.3 Human Performance

The average performance of college-bound senior high school students on verbal SAT questions corresponds to a recall (percent correct) of about 57% (Turney and Littman, 2005). The SAT I test consists of 78 verbal questions and 60 math questions (there is also an SAT II test, covering specific subjects, such as chemistry). Analogy questions are only a subset of the 78 verbal SAT questions. If we assume that the difficulty of our 374 analogy questions is comparable to the difficulty of the 78 verbal SAT I questions, then we can estimate that the average college-bound senior would correctly answer about 57% of the 374 analogy questions.

Of our 374 SAT questions, 190 are from a collection of ten official SAT tests (Claman, 2000). On this subset of the questions, LRA has a recall of 61.1%, compared to a recall of 51.1% on the other 184 questions. The 184 questions that are not from Claman (2000) seem to be more difficult. This indicates that we may be underestimating how well LRA performs, relative to college-bound senior high school students. Claman (2000) suggests that the analogy questions may be somewhat harder than other verbal SAT questions, so we may be slightly overestimating the mean human score on the analogy questions.

Table 15 gives the 95% confidence intervals for LRA, VSM-AV, and VSM-WMTS, calculated by the Binomial Exact Test (Agresti, 1990). There is no significant difference between LRA and human performance, but VSM-AV and VSM-WMTS are significantly below human-level performance.
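The exact binomial interval reported in Table 15 can be reproduced with a small stdlib sketch of the Clopper-Pearson construction, using LRA's 210 correct answers out of 374 from Section 6.1:

```python
from math import comb

def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_ci(k, n, alpha=0.05, steps=40):
    """Clopper-Pearson (exact binomial) confidence interval, by bisection."""
    def solve(f, lo, hi):
        # f is True at lo and False at hi; return the crossing point.
        for _ in range(steps):
            mid = (lo + hi) / 2.0
            lo, hi = (mid, hi) if f(mid) else (lo, mid)
        return (lo + hi) / 2.0
    # Lower bound: p where P(X >= k | p) rises through alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: tail_ge(k, n, p) <= alpha / 2, 0.0, 1.0)
    # Upper bound: p where P(X <= k | p) falls through alpha/2.
    upper = 1.0 if k == n else solve(lambda p: 1 - tail_ge(k + 1, n, p) > alpha / 2, 0.0, 1.0)
    return lower, upper

lo, hi = exact_ci(210, 374)  # LRA: 210 of 374 SAT questions correct
print(f"recall = {210 / 374:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```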
6.4 Varying the Parameters in LRA

There are several parameters in the LRA algorithm (see Section 5.5). The parameter values were determined by trying a small number of possible values on a small set of questions that were set aside. Since LRA is intended to be an unsupervised learning algorithm, we did not attempt to tune the parameter values to maximize the precision and recall on the 374 SAT questions. We hypothesized that LRA is relatively insensitive to the values of the parameters.

Table 16 shows the variation in the performance of LRA as the parameter values are adjusted. We take the baseline parameter settings (given in Section 5.5) and vary each parameter, one at a time, while holding the remaining parameters fixed at their baseline values. None of the precision and recall values are significantly different from the baseline, according to the Fisher Exact Test (Agresti, 1990), at the 95% confidence level. This supports the hypothesis that the algorithm is not sensitive to the parameter values.

Table 16 (excerpt). Columns: Parameter, Value, Step, Precision, Recall, F. The varied parameters are num_sim, max_phrase, num_filter, and num_patterns (with num_patterns values of 1,000, 3,000, 5,000, and 7,000); across all settings, precision ranges from 54.1 to 58.4, recall from 53.5 to 57.8, and F from 53.8 to 58.1, with the baseline at 56.8 / 56.1 / 56.5.

Although a full run of LRA on the 374 SAT questions takes nine days, for some of the parameters it is possible to reuse cached data from previous runs. We limited the experiments with num_sim and max_phrase because caching was not as helpful for these parameters, so experimenting with them required several weeks.

Table 17 (excerpt). Systems compared: #1 LRA baseline system, #2 LRA no SVD, #3 LRA no synonyms, #4 LRA no SVD no synonyms, #5 VSM-WMTS.

6.5 Ablation Experiments

As mentioned in the introduction, LRA extends the VSM approach of Turney and Littman (2005) by (1) exploring variations on the analogies by replacing words with synonyms (step 1), (2) automatically generating connecting patterns (step 4), and (3) smoothing the data with SVD (step 9). In this subsection, we ablate each of these three components to assess their contribution to the performance of LRA. Table 17 shows the results.
Without SVD (compare column #1 to #2 in Table 17), performance drops, but the drop is not statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990). However, we hypothesize that the drop in performance would be significant with a larger set of word pairs. More word pairs would increase the sample size, which would decrease the 95% confidence interval, which would likely show that SVD is making a significant contribution. Furthermore, more word pairs would increase the matrix size, which would give SVD more leverage. For example, Landauer and Dumais (1997) apply SVD to a matrix of 30,473 columns by 60,768 rows, but our matrix here is 8,000 columns by 17,232 rows. We are currently gathering more SAT questions, to test this hypothesis.

Without synonyms (compare column #1 to #3 in Table 17), recall drops significantly (from 56.1% to 49.5%), but the drop in precision is not significant. When the synonym component is dropped, the number of skipped questions rises from 4 to 22, which demonstrates the value of the synonym component of LRA for compensating for sparse data.

When both SVD and synonyms are dropped (compare column #1 to #4 in Table 17), the decrease in recall is significant, but the decrease in precision is not significant. Again, we believe that a larger sample size would show the drop in precision is significant.

If we eliminate both synonyms and SVD from LRA, all that distinguishes LRA from VSM-WMTS is the patterns (step 4). The VSM approach uses a fixed list of 64 patterns to generate 128 dimensional vectors (Turney and Littman, 2005), whereas LRA uses a dynamically generated set of 4,000 patterns, resulting in 8,000 dimensional vectors. We can see the value of the automatically generated patterns by comparing LRA without synonyms and SVD (column #4) to VSM-WMTS (column #5). The difference in both precision and recall is statistically significant with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).

The ablation experiments support the value of the patterns (step 4) and synonyms (step 1) in LRA, but the contribution of SVD (step 9) has not been proven, although we believe more data will support its effectiveness. Nonetheless, the three components together result in a 16% increase in F (compare #1 to #5).
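The smoothing whose contribution is assessed in this ablation (steps 8 and 9 of Section 5.5) can be sketched with NumPy; numpy.linalg.svd stands in for SVDLIBC, and the tiny dense matrix and k value are illustrative only:

```python
import numpy as np

def log_entropy(X):
    """Step 8: replace x_ij with w_j * log(x_ij + 1), where
    w_j = 1 - H_j / log(m) is the entropy weight of column j."""
    m = X.shape[0]
    sums = X.sum(axis=0)
    P = np.divide(X, sums, out=np.zeros_like(X), where=sums > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.sum(np.where(P > 0, P * np.log(P), 0.0), axis=0)
    w = 1.0 - H / np.log(m)
    return w * np.log1p(X)

def project(X, k):
    """Steps 9-10: SVD, then keep U_k Sigma_k. Since X X^T = (U S)(U S)^T,
    cosines between rows of U_k S_k approximate cosines between rows of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy pair-pattern frequency matrix (5 word pairs x 4 patterns).
X = log_entropy(np.array([[2.0, 0.0, 1.0, 0.0],
                          [1.0, 3.0, 0.0, 1.0],
                          [0.0, 1.0, 4.0, 2.0],
                          [2.0, 1.0, 1.0, 0.0],
                          [0.0, 2.0, 0.0, 3.0]]))
Z = project(X, k=2)  # the paper projects to k = 300 dimensions
print(cos(Z[0], Z[1]))  # smoothed similarity of row vectors 0 and 1
```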
6.6 Matrix Symmetry

We know a priori that, if A:B::C:D, then B:A::D:C. For example, "mason is to stone as carpenter is to wood" implies "stone is to mason as wood is to carpenter". Therefore a good measure of relational similarity, sim_r, should obey the following equation:

sim_r(A:B, C:D) = sim_r(B:A, D:C)    (8)

In steps 5 and 6 of the LRA algorithm (Section 5.5), we ensure that the matrix X is symmetrical, so that equation (8) is necessarily true for LRA. The matrix is designed so that the row vector for A:B is different from the row vector for B:A only by a permutation of the elements. The same permutation distinguishes the row vectors for C:D and D:C. Therefore the cosine of the angle between A:B and C:D must be identical to the cosine of the angle between B:A and D:C (see equation (7)).

To discover the consequences of this design decision, we altered steps 5 and 6 so that symmetry is no longer preserved. In step 5, for each word pair A:B that appears in the input set, we only have one row. There is no row for B:A unless B:A also appears in the input set. Thus the number of rows in the matrix dropped from 17,232 to 8,616.

In step 6, we no longer have two columns for each pattern P, one for "word1 P word2" and another for "word2 P word1". However, to be fair, we kept the total number of columns at 8,000. In step 4, we selected the top 8,000 patterns (instead of the top 4,000), distinguishing the pattern "word1 P word2" from the pattern "word2 P word1" (instead of considering them equivalent). Thus a pattern P with a high frequency is likely to appear in two columns, in both possible orders, but a lower frequency pattern might appear in only one column, in only one possible order.

These changes resulted in a slight decrease in performance. Recall dropped from 56.1% to 55.3% and precision dropped from 56.8% to 55.9%. The decrease is not statistically significant. However, the modified algorithm no longer obeys equation (8). Although dropping symmetry appears to cause no significant harm to the performance of the algorithm on the SAT questions, we prefer to retain symmetry, to ensure that equation (8) is satisfied.
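The property underlying equation (8), that applying one fixed permutation to both row vectors leaves their cosine unchanged, can be checked directly with toy vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical rows for A:B and C:D; the rows for B:A and D:C are the
# same vectors under a fixed permutation of the elements.
ab = [3.0, 1.0, 0.0, 2.0]
cd = [2.0, 0.0, 1.0, 2.0]
perm = [2, 3, 0, 1]  # swaps the "word1 P word2" / "word2 P word1" columns
ba = [ab[i] for i in perm]
dc = [cd[i] for i in perm]

# cos(A:B, C:D) equals cos(B:A, D:C), as equation (8) requires.
print(cosine(ab, cd), cosine(ba, dc))
```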
Note that, if A:B::C:D, it does not follow that B:A::C:D. For example, it is false that "stone is to mason as carpenter is to wood". In general (except when the semantic relations between A and B are symmetrical), we have the following inequality:

sim_r(A:B, C:D) ≠ sim_r(B:A, C:D)    (9)

Therefore we do not want A:B and B:A to be represented by identical row vectors, although it would ensure that equation (8) is satisfied.

6.7 All Alternates versus Better Alternates

In step 12 of LRA, the relational similarity between A:B and C:D is the average of the cosines, among the (num_filter + 1)^2 cosines from step 11, that are greater than or equal to the cosine of the original pairs.

Table 18 (excerpt). Columns: N, Correct, Incorrect, Skipped, Precision, Recall, F.

6.8 Interpreting Vectors

Suppose a word pair A:B corresponds to a vector r in the matrix X. It would be convenient if inspection of r gave us a simple explanation or description of the relation between A and B. For example, suppose the word pair ostrich:bird maps to the row vector r. It would be pleasing to look in r and find that the largest element corresponds to the pattern "is the largest" (i.e., "ostrich is the largest bird"). Unfortunately, inspection of r reveals no such convenient patterns.

We hypothesize that the semantic content of a vector is distributed over the whole vector; it is not concentrated in a few elements. To test this hypothesis, we modified step 10 of LRA. Instead of projecting the 8,000 dimensional vectors into the 300 dimensional space U_k Σ_k, we use the matrix U_k Σ_k V_k^T. This matrix yields the same cosines as U_k Σ_k, but preserves the original 8,000 dimensions, making it easier to interpret the row vectors. For each row vector in U_k Σ_k V_k^T, we select the N largest values and set all other values to zero. The idea here is that we will only pay attention to the N most important patterns in r; the remaining patterns will be ignored. This reduces the length of the row vectors, but the cosine is the dot product of normalized vectors (all vectors are normalized to unit length; see equation (7)), so the change to the vector lengths has no impact; only the angle of the vectors is important. If most of the semantic content is in the N largest elements of r, then setting the remaining elements to zero should have relatively little impact.
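The truncation used in this experiment, keeping the N largest elements of a row vector and zeroing the rest, can be sketched as follows (the vectors are toy examples):

```python
import math

def top_n(vec, n):
    """Keep the n largest-magnitude elements of vec; zero the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:n])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

r = [0.9, -0.1, 0.4, 0.05, -0.6, 0.2]
s = [0.8, 0.0, 0.3, 0.1, -0.5, 0.3]
for n in (1, 3, len(r)):
    print(n, cosine(top_n(r, n), s))
# If the semantic content were concentrated in a few elements, the
# cosine would stabilize at small n; Section 6.8 finds that it does not.
```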
Table 18 shows the performance as N varies from 1 to 3,000. The precision and recall are significantly below the baseline LRA until N ≥ 300 (95% confidence, Fisher Exact Test). In other words, for a typical SAT analogy question, we need to examine the top 300 patterns to explain why LRA selected one choice instead of another.

We are currently working on an extension of LRA that will explain with a single pattern why one choice is better than another. We have had some promising results, but this work is not yet mature. However, we can confidently claim that interpreting the vectors is not trivial.

6.9 Manual Patterns versus Automatic Patterns

Turney and Littman (2005) used 64 manually generated patterns whereas LRA uses 4,000 automatically generated patterns. We know from Section 6.5 that the automatically generated patterns are significantly better than the manually generated patterns. It may be interesting to see how many of the manually generated patterns appear within the automatically generated patterns. If we require an exact match, 50 of the 64 manual patterns can be found in the automatic patterns. If we are lenient about wildcards, and count the pattern "not the" as matching "* not the" (for example), then 60 of the 64 manual patterns appear within the automatic patterns. This suggests that the improvement in performance with the automatic patterns is due to the increased quantity of patterns, rather than a qualitative difference in the patterns.

Turney and Littman (2005) point out that some of their 64 patterns have been used by other researchers. For example, Hearst (1992) used the pattern "such as" to discover hyponyms and Berland and Charniak (1999) used the pattern "of the" to discover meronyms. Both of these patterns are included in the 4,000 patterns automatically generated by LRA.
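The lenient wildcard matching described above can be sketched as follows; lenient_match is a hypothetical helper, with one "*" standing for exactly one word:

```python
def lenient_match(manual, automatic):
    """True if the manual pattern occurs as a contiguous window of the
    automatic pattern, letting a wildcard in the automatic pattern
    stand for any single word."""
    m, a = manual.split(), automatic.split()
    for i in range(len(a) - len(m) + 1):
        window = a[i:i + len(m)]
        if all(x == y or x == "*" for x, y in zip(window, m)):
            return True
    return False

print(lenient_match("not the", "* not the"))  # True
print(lenient_match("such as", "of the *"))   # False
```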
The novelty in Turney and Littman (2005) is that their patterns are not used to mine text for instances of word pairs that fit the patterns (Hearst, 1992; Berland and Charniak, 1999); instead, they are used to gather frequency data for building vectors that represent the relation between a given pair of words. The results in Section 6.8 show that a vector contains more information than any single pattern or small set of patterns; a vector is a distributed representation. LRA is distinct from Hearst (1992) and Berland and Charniak (1999) in its focus on distributed representations, which it shares with Turney and Littman (2005), but LRA goes beyond Turney and Littman (2005) by finding patterns automatically.

Riloff and Jones (1999) and Yangarber (2003) also find patterns automatically, but their goal is to mine text for instances of word pairs; the same goal as Hearst (1992) and Berland and Charniak (1999). Because LRA uses patterns to build distributed vector representations, it can exploit patterns that would be much too noisy and unreliable for the kind of text mining instance extraction that is the objective of Hearst (1992), Berland and Charniak (1999), Riloff and Jones (1999), and Yangarber (2003). Therefore LRA can simply select the highest frequency patterns (step 4 in Section 5.5); it does not need the more sophisticated selection algorithms of Riloff and Jones (1999) and Yangarber (2003).

7. Experiments with Noun-Modifier Relations

This section describes experiments with 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations (Nastase and Szpakowicz, 2003). In the following experiments, LRA is used with the baseline parameter values, exactly as described in Section 5.5. No adjustments were made to tune LRA to the noun-modifier pairs. LRA is used as a distance (nearness) measure in a single nearest neighbour supervised learning algorithm.

7.1 Classes of Relations

The following experiments use the 600 labeled noun-modifier pairs of Nastase and Szpakowicz (2003). This dataset includes information about the part of speech and WordNet synset (synonym set; i.e., word sense tag) of each word, but our algorithm does not use this information.
Table 19 lists the 30 classes of semantic relations. The table is based on Appendix A of Nastase and Szpakowicz (2003), with some simplifications. The original table listed several semantic relations for which there were no instances in the dataset. These were relations that are typically expressed with longer phrases (three or more words), rather than noun-modifier word pairs. For clarity, we decided not to include these relations in Table 19.

In this table, H represents the head noun and M represents the modifier. For example, in "flu virus", the head noun (H) is "virus" and the modifier (M) is "flu" (*). In English, the modifier (typically a noun or adjective) usually precedes the head noun. In the description of purpose, V represents an arbitrary verb. In "concert hall", the hall is for presenting concerts (V is "present") or holding concerts (V is "hold") (†).

Nastase and Szpakowicz (2003) organized the relations into groups. The five capitalized terms in the "Relation" column of Table 19 are the names of five groups of semantic relations. (The original table had a sixth group, but there are no examples of this group in the dataset.) We make use of this grouping in the following experiments.

7.2 Baseline LRA with Single Nearest Neighbour

The following experiments use single nearest neighbour classification with leave-one-out cross-validation. For leave-one-out cross-validation, the testing set consists of a single noun-modifier pair and the training set consists of the 599 remaining noun-modifiers. The dataset is split 600 times, so that each noun-modifier gets a turn as the testing word pair. The predicted class of the testing pair is the class of the single nearest neighbour in the training set. As the measure of nearness, we use LRA to calculate the relational similarity between the testing pair and the training pairs. The single nearest neighbour algorithm is a supervised learning algorithm (i.e., it requires a training set of labeled data), but we are using LRA to measure the distance between a pair and its potential neighbours, and LRA is itself determined in an unsupervised fashion (i.e., LRA does not need labeled data).
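The leave-one-out single nearest neighbour procedure can be sketched as follows; the fixed similarity matrix S is a toy stand-in for LRA's relational similarity:

```python
def leave_one_out_1nn(pairs, labels, sim):
    """Accuracy of 1-NN classification with leave-one-out cross-validation."""
    correct = 0
    for i in range(len(pairs)):
        # Training set: every pair except the held-out pair i.
        nearest = max((j for j in range(len(pairs)) if j != i),
                      key=lambda j: sim(i, j))
        correct += labels[nearest] == labels[i]
    return correct / len(pairs)

# Toy stand-in for LRA's pairwise relational similarities.
S = [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.1, 0.2],
     [0.2, 0.1, 1.0, 0.8],
     [0.1, 0.2, 0.8, 1.0]]
pairs = ["mason:stone", "carpenter:wood", "traffic:street", "water:riverbed"]
labels = ["worker:material", "worker:material", "flow:path", "flow:path"]

print(leave_one_out_1nn(pairs, labels, lambda i, j: S[i][j]))  # 1.0
```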
Each SAT question has five choices, so answering 374 SAT questions required calculating 374 × 5 × 16 = 29,920 cosines. The factor of 16 comes from the alternate pairs, step 11 in LRA. With the noun-modifier pairs, using leave-one-out cross-validation, each test pair has 599 choices, so an exhaustive application of LRA would require calculating 600 × 599 × 16 = 5,750,400 cosines. To reduce the amount of computation required, we first find the 30 nearest neighbours for each pair, ignoring the alternate pairs (600 × 599 = 359,400 cosines), and then apply the full LRA, including the alternates, to just those 30 neighbours (600 × 30 × 16 = 288,000 cosines), which requires calculating only 359,400 + 288,000 = 647,400 cosines.

There are 600 word pairs in the input set for LRA. In step 2, introducing alternate pairs multiplies the number of pairs by four, resulting in 2,400 pairs. In step 5, for each pair A:B, we add B:A, yielding 4,800 pairs. However, some pairs are dropped because they correspond to zero vectors and a few words do not appear in Lin's thesaurus. The sparse matrix (step 7) has 4,748 rows and 8,000 columns, with a density of 8.4%.

Following Turney and Littman (2005), we evaluate the performance by accuracy and also by the macroaveraged F measure (Lewis, 1991). Macroaveraging calculates the precision, recall, and F for each class separately, and then calculates the average across all classes. Microaveraging combines the true positive, false positive, and false negative counts for all of the classes, and then calculates precision, recall, and F from the combined counts. Macroaveraging gives equal weight to all classes, but microaveraging gives more weight to larger classes. We use macroaveraging (giving equal weight to all classes), because we have no reason to believe that the class sizes in the dataset reflect the actual distribution of the classes in a real corpus.
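The difference between macroaveraging and microaveraging can be made concrete with hypothetical per-class counts:

```python
def f_measure(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical (tp, fp, fn) counts for a large class and a small class.
counts = {"large": (90, 10, 10), "small": (1, 4, 4)}

# Macroaveraging: compute F per class, then average (equal class weight).
macro_f = sum(f_measure(*c) for c in counts.values()) / len(counts)

# Microaveraging: pool the counts, then compute F (favors large classes).
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f = f_measure(tp, fp, fn)

print(round(macro_f, 3), round(micro_f, 3))  # 0.55 0.867
```

The small class drags the macroaveraged F down to 0.55, while the pooled microaveraged F stays near the large class's 0.9; this is why the choice of averaging matters when class sizes are skewed.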
Classification with 30 distinct classes is a hard problem. To make the task easier, we can collapse the 30 classes to 5 classes, using the grouping that is given in Table 19. For example, agent and beneficiary both collapse to participant. On the 30 class problem, LRA with the single nearest neighbour algorithm achieves an accuracy of 39.8% (239/600) and a macroaveraged F of 36.6%. Always guessing the majority class would result in an accuracy of 8.2% (49/600). On the 5 class problem, the accuracy is 58.0% (348/600) and the macroaveraged F is 54.6%. Always guessing the majority class would give an accuracy of 43.3% (260/600). For both the 30 class and 5 class problems, LRA's accuracy is significantly higher than guessing the majority class, with 95% confidence, according to the Fisher Exact Test (Agresti, 1990).

Table 19 (excerpt). H is the head noun, M the modifier, and V an arbitrary verb.

Relation        Abbr.   Example phrase     Description
cause           cs      flu virus (*)      H makes M occur or exist, H is necessary and sufficient
effect          eff     exam anxiety       M makes H occur or exist, M is necessary and sufficient
purpose         prp     concert hall (†)   H is for V-ing M, M does not necessarily occur or exist
detraction      detr    headache pill      H opposes M, H is not sufficient to prevent M
frequency       freq    daily exercise     H occurs every time M occurs
time at         tat     morning exercise   H occurs when M occurs
time through    tthr    six-hour meeting   H existed while M existed, M is an interval of time
direction       dir     outgoing mail      H is directed towards M, M is not the final point
location        loc     hometown           H is the location of M
location at     lat     desert storm       H is located at M
location from   lfr     foreign capital    H originates at M
agent           ag      student protest    M performs H, M is animate or natural phenomenon
beneficiary     ben     student discount   M benefits from H
instrument      inst    laser printer      H uses M
object          obj     metal separator    M is acted upon by H
QUALITY (group name)

Tables 20 and 21 (excerpts). Systems compared: VSM-AV, VSM-WMTS, LRA.

7.3 LRA versus VSM
Table 20 shows the performance of LRA and VSM on the 30 class problem. VSM-AV is VSM with the AltaVista corpus and VSM-WMTS is VSM with the WMTS corpus. The results for VSM-AV are taken from Turney and Littman (2005). All three pairwise differences in the three F measures are statistically significant at the 95% level, according to the Paired T-Test (Feelders and Verkooijen, 1995). The accuracy of LRA is significantly higher than the accuracies of VSM-AV and VSM-WMTS, according to the Fisher Exact Test (Agresti, 1990), but the difference between the two VSM accuracies is not significant.

Table 21 compares the performance of LRA and VSM on the 5 class problem. The accuracy and F measure of LRA are significantly higher than the accuracies and F measures of VSM-AV and VSM-WMTS, but the differences between the two VSM accuracies and F measures are not significant.

8. Discussion

The experimental results in Sections 6 and 7 demonstrate that LRA performs significantly better than the VSM, but it is also clear that there is room for improvement. The accuracy might not yet be adequate for practical applications, although past work has shown that it is possible to adjust the tradeoff of precision versus recall (Turney and Littman, 2005). For some of the applications, such as information extraction, LRA might be suitable if it is adjusted for high precision, at the expense of low recall.

Another limitation is speed; it took almost nine days for LRA to answer 374 analogy questions. However, with progress in computer hardware, speed will gradually become less of a concern. Also, the software has not been optimized for speed; there are several places where the efficiency could be increased and many operations are parallelizable. It may also be possible to precompute much of the information for LRA, although this would require substantial changes to the algorithm.
The difference in performance between VSM-AV and VSM-WMTS shows that VSM is sensitive to the size of the corpus. Although LRA is able to surpass VSM-AV when the WMTS corpus is only about one tenth the size of the AV corpus, it seems likely that LRA would perform better with a larger corpus. The WMTS corpus requires one terabyte of hard disk space, but progress in hardware will likely make ten or even one hundred terabytes affordable in the relatively near future.

For noun-modifier classification, more labeled data should yield performance improvements. With 600 noun-modifier pairs and 30 classes, the average class has only 20 examples. We expect that the accuracy would improve substantially with five or ten times more examples. Unfortunately, it is time consuming and expensive to acquire hand-labeled data.

Another issue with noun-modifier classification is the choice of classification scheme for the semantic relations. The 30 classes of Nastase and Szpakowicz (2003) might not be the best scheme. Other researchers have proposed different schemes (Vanderwende, 1994; Barker and Szpakowicz, 1998; Rosario and Hearst, 2001; Rosario, Hearst, and Fillmore, 2002). It seems likely that some schemes are easier for machine learning than others. For some applications, 30 classes may not be necessary; the 5 class scheme may be sufficient.

LRA, like VSM, is a corpus-based approach to measuring relational similarity. Past work suggests that a hybrid approach, combining multiple modules, some corpus-based, some lexicon-based, will surpass any purebred approach (Turney et al., 2003). In future work, it would be natural to combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004), perhaps using the combination method of Turney et al. (2003).
The Singular Value Decomposition is only one of many methods for handling sparse, noisy data. We have also experimented with Nonnegative Matrix Factorization (NMF) (Lee and Seung, 1999), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), Kernel Principal Components Analysis (KPCA) (Scholkopf, Smola, and Muller, 1997), and Iterative Scaling (IS) (Ando, 2000). We had some interesting results with small matrices (around 2,000 rows by 1,000 columns), but none of these methods seemed substantially better than SVD and none of them scaled up to the matrix sizes we are using here (e.g., 17,232 rows and 8,000 columns; see Section 6.1).

In step 4 of LRA, we simply select the top num

LRA extends the VSM approach in three ways: (1) the patterns are generated dynamically from the corpus, (2) SVD is used to smooth the data, and (3) a thesaurus is used to explore variations of the word pairs. With the WMTS corpus (about 5 × 10^10 English words), LRA achieves an F of 56.5%, whereas the F of VSM is 40.3%.

We have presented several examples of the many potential applications for measures of relational similarity. Just as attributional similarity measures have proven to have many practical uses, we expect that relational similarity measures will soon become widely used. Gentner et al. (2001) argue that relational similarity is essential to understanding novel metaphors (as opposed to conventional metaphors). Many researchers have argued that metaphor is the heart of human thinking (Lakoff and Johnson, 1980; Hofstadter and the Fluid Analogies Research Group, 1995; Gentner et al., 2001; French, 2002). We believe that relational similarity plays a fundamental role in the mind and therefore relational similarity measures could be crucial for artificial intelligence.

In future work, we plan to investigate some potential applications for LRA. It is possible that the error rate of LRA is still too high for practical applications, but the fact that LRA matches average human performance on SAT analogy questions is encouraging.
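The SVD-based smoothing discussed above replaces the raw frequency matrix with a low-rank approximation, which by the Eckart-Young theorem is the closest matrix of that rank in the least-squares sense. A minimal pure-Python sketch of a rank-1 approximation via power iteration on a toy dense matrix; a real system would use a sparse SVD package such as SVDLIBC (mentioned in the acknowledgments) at the matrix sizes reported here:

```python
from math import sqrt

def rank1_approx(A, iters=100):
    """Rank-1 truncated SVD of a small dense matrix via power iteration.

    Returns sigma * u * v^T, the best rank-1 least-squares approximation
    of A, where v is the top right singular vector (found by iterating
    v <- normalize(A^T A v)) and sigma is the top singular value.
    """
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols  # starting vector for the power iteration
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        v = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T u
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]      # A v
    sigma = sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]

# A noisy, nearly rank-1 frequency matrix (toy data): the approximation
# pulls each entry toward the dominant pattern shared by the rows.
A = [[4.0, 2.0, 0.0],
     [2.0, 1.1, 0.1],
     [8.0, 3.9, 0.0]]
S = rank1_approx(A)
print([[round(x, 2) for x in row] for row in S])
```

Keeping the top k singular components instead of one follows the same idea; smoothing helps because word pairs that share a latent relation get similar rows even when their raw pattern frequencies differ.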
Acknowledgments

Thanks to Michael Littman for sharing the 374 SAT analogy questions and for inspiring me to tackle them. Thanks to Vivi Nastase and Stan Szpakowicz for sharing their 600 classified noun-modifier phrases. Thanks to Egidio Terra, Charlie Clarke, and the School of Computer Science of the University of Waterloo, for giving us a copy of the Waterloo MultiText System and their Terabyte Corpus. Thanks to Dekang Lin for making his Dependency-Based Word Similarity lexicon available online. Thanks to Doug Rohde for SVDLIBC and Michael Berry for SVDPACK. Thanks to Ted Pedersen for making his WordNet::Similarity package available. Thanks to Joel Martin for comments on the paper. Thanks to the anonymous reviewers of Computational Linguistics for their very helpful comments and suggestions.

References

Agresti, Alan. 1990. Categorical Data Analysis. Wiley.

Ando, Rie Kubota. 2000. Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-2000), pages 216–223.

Baeza-Yates, Ricardo A. and Berthier A. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press.

Banerjee, Satanjeev and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pages 805–810, Acapulco, Mexico.

Barker, Ken and Stan Szpakowicz. 1998. Semi-automatic recognition of noun modifier relationships. In Christian Boitet and Pete Whitelock, editors, Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics (COLING-ACL '98), pages 96–102, San Francisco, California. Morgan Kaufmann Publishers.

Berland, Matthew and Eugene Charniak. 1999. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL '99), pages 57–64, New Brunswick, NJ.

Berry, Michael W. 1992. Large scale singular value computations. International Journal of Supercomputer Applications, 6(1):13–49.

Budanitsky, Alexander and Graeme Hirst. 2001. Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Proceedings of the Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pages 29–34, Pittsburgh, PA.

Chiarello, Christine, Curt Burgess, Lorie Richards, and Alma Pollock. 1990. Semantic and associative priming in the cerebral hemispheres: Some words do, some words don't ... sometimes, some places. Brain and Language, 38:75–104.

Claman, Cathy. 2000. 10 Real SATs. College Entrance Examination Board.

Clarke, Charles L. A., Gordon V. Cormack, and Christopher R. Palmer. 1998. An overview of MultiText. ACM SIGIR Forum, 32(2):14–15.

Daganzo, Carlos F. 1994. The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B: Methodological, 28(4):269–287.

Deerwester, Scott C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science (JASIS), 41(6):391–407.

Dolan, William B. 1995. Metaphor as an emergent property of machine-readable dictionaries. In Proceedings of the AAAI 1995 Spring Symposium Series: Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity and Generativity, pages 27–32.

Dumais, Susan T. 1990. Enhancing performance in latent semantic indexing (LSI) retrieval. Technical Report TM-ARH-017527, Bellcore, Morristown, NJ.

Dumais, Susan T. 1993. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), pages 105–115. National Institute of Standards and Technology.

Falkenhainer, Brian. 1990. Analogical interpretation in context. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 69–76. Lawrence Erlbaum Associates.

Falkenhainer, Brian, Kenneth D. Forbus, and Dedre Gentner. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41(1):1–63.

Federici, Stefano, Simonetta Montemagni, and Vito Pirrelli. 1997. Inferring semantic similarity from distributional evidence: An analogy-based approach to word sense disambiguation. In Proceedings of the ACL/EACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 90–97, Madrid, Spain.

Feelders, Ad and William Verkooijen. 1995. Which method learns the most from data? Methodological issues in the analysis of comparative studies. In Fifth International Workshop on Artificial Intelligence and Statistics, pages 219–225, Ft. Lauderdale, Florida.

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press.

French, Robert M. 2002. The computational modeling of analogy-making. Trends in Cognitive Sciences, 6(5):200–205.

Gentner, Dedre. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2):155–170.

Gentner, Dedre, Brian Bowdle, Phillip Wolff, and Consuelo Boronat. 2001. Metaphor is like analogy. In Dedre Gentner, Keith J. Holyoak, and Boicho N. Kokinov, editors, The Analogical Mind: Perspectives from Cognitive Science, pages 199–253, Cambridge, MA. MIT Press.

Gildea, Daniel and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

Girju, Roxana, Adriana Badulescu, and Dan I. Moldovan. 2003. Learning semantic constraints for the automatic discovery of part-whole relations. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 80–87.

Goldenberg, David. 2005. The emperor's new clothes: Undressing the new and unimproved SAT. Gelf Magazine, March. http://www.gelfmagazine.com/mt/archives/the new

Lauer, Mark. 1995. Designing Statistical Language Learners: Experiments on Compound Nouns. Ph.D. thesis, Macquarie University.

Leacock, Claudia and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press.

Lee, Daniel D. and H. Sebastian Seung. 1999. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791.

Lesk, Michael E. 1969. Word-word associations in document retrieval systems. American Documentation, 20(1):27–38.

Lesk, Michael E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC '86, pages 24–26.

Lewis, David D. 1991. Evaluating text categorization. In Proceedings of the Speech and Natural Language Workshop, pages 312–318, Asilomar, CA. Morgan Kaufmann.

Lin, Dekang. 1998a. Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING-ACL '98), pages 768–774, Montreal, Canada.

Lin, Dekang. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML '98), pages 296–304, San Francisco, CA. Morgan Kaufmann.

Marx, Zvika, Ido Dagan, Joachim Buhmann, and Eli Shamir. 2002. Coupled clustering: A method for detecting structural correspondence. Journal of Machine Learning Research, 3:747–780.

Medin, Douglas L., Robert L. Goldstone, and Dedre Gentner. 1990. Similarity involving attributes and relations: Judgments of similarity and difference are not inverses. Psychological Science, 1(1):64–69.

Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe, and Roxana Girju. 2004. Models for the semantic classification of noun phrases. In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pages 60–67, Boston, MA.

Morris, Jane and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48.

Nastase, Vivi and Stan Szpakowicz. 2003. Exploring noun-modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, The Netherlands.

Pantel, Patrick and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613–619.

Rada, Roy, Hafedh Mili, Ellen Bicknell, and Maria Blettner. 1989. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30.

Rehder, Bob, M. E. Schreiner, Michael B. W. Wolfe, Darrell Laham, Thomas K. Landauer, and Walter Kintsch. 1998. Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354.

Reitman, Walter R. 1965. Cognition and Thought: An Information Processing Approach. John Wiley and Sons, New York, NY.

Resnik, Philip. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453, San Mateo, CA. Morgan Kaufmann.

Riloff, Ellen and Rosie Jones. 1999. Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474–479.

Rosario, Barbara and Marti Hearst. 2001. Classifying the semantic relations in noun compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-01), pages 82–90.

Rosario, Barbara, Marti Hearst, and Charles Fillmore. 2002. The descent of hierarchy, and selection in relational semantics. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pages 417–424, Philadelphia, PA.

Ruge, Gerda. 1992. Experiments on linguistically-based term associations. Information Processing and Management, 28(3):317–332.

Salton, Gerard. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA.

Salton, Gerard and Chris Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523.

Salton, Gerard and Michael J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY.

Scholkopf, Bernhard, Alexander J. Smola, and Klaus-Robert Muller. 1997. Kernel principal component analysis. In Proceedings of the International Conference on Artificial Neural Networks (ICANN-1997), pages 583–588, Berlin.

Terra, Egidio and Charles L. A. Clarke. 2003. Frequency estimates for statistical word similarity measures. In Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003 (HLT/NAACL 2003), pages 244–251.

Turney, Peter D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning, pages 491–502, Berlin. Springer.

Turney, Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pages 417–424.

Turney, Peter D. 2005. Measuring semantic similarity by latent relational analysis. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1136–1141, Edinburgh, Scotland.

Turney, Peter D. and Michael L. Littman. 2005. Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1–3):251–278.

Turney, Peter D., Michael L. Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In Proceedings of the Fifteenth International Conference on Computational Linguistics, pages 782–788, Kyoto, Japan.

Veale, Tony. 2003. The analogical thesaurus. In Proceedings of the 15th Innovative Applications of Artificial Intelligence Conference (IAAI 2003), pages 137–142, Acapulco, Mexico.

Veale, Tony. 2004. WordNet sits the SAT: A knowledge-based approach to lexical analogy. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 606–612, Valencia, Spain.

Yangarber, Roman. 2003. Counter-training in discovery of semantic patterns. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-2003), pages 343–350, Sapporo, Japan.

Yarowsky, David. 1993. One sense per collocation. In Proceedings of the ARPA Human Language Technology Workshop, pages 266–271, Princeton, NJ.

Zelenko, Dmitry, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3:1083–1106.