WordNet

One of the most important projects in natural language processing over the years has been the construction of the English WordNet, or Princeton WordNet (PWN), at Princeton University under the direction of George A. Miller. English words in four categories (nouns, verbs, adjectives and adverbs) are grouped into sets of cognitive synonyms called synsets. Synsets in PWN are interlinked by means of conceptual-semantic and lexical relations. An English word may appear in several synsets in PWN, and each of these memberships corresponds to one sense of that word. Research in natural language processing and information retrieval has shown that the structure of PWN can be used to improve performance on such tasks.

In recent years, several efforts have been made to create a WordNet for Persian, using manual, semi-automatic and automatic approaches. FarsNet is a semi-automatically generated Persian WordNet built by the Natural Language Processing Research Laboratory of Shahid Beheshti University. Since FarsNet does not cover many Persian words, the purpose of this project is to construct a large-scale Persian WordNet using automatic approaches. Our experiments use statistical information extracted from corpora together with several heuristics.
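To make the synset and sense organization of PWN concrete, the following is a minimal sketch of how such a lexicon can be held in memory: each synset stores its lemmas, a gloss and named relations, and looking up a word returns every synset that contains it, one per sense. The classes `Synset` and `ToyWordNet`, the synset ids and the entries are hypothetical illustrations, not PWN's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One set of cognitive synonyms expressing a single concept (toy model)."""
    synset_id: str          # hypothetical identifier, not a real PWN offset
    pos: str                # "n", "v", "a" or "r": noun, verb, adjective, adverb
    lemmas: list[str]       # the synonymous words in this set
    gloss: str              # short definition
    # relation name (e.g. "hypernym") -> ids of the linked synsets
    relations: dict[str, list[str]] = field(default_factory=dict)


class ToyWordNet:
    """In-memory store that indexes each word by the synsets containing it."""

    def __init__(self) -> None:
        self.synsets: dict[str, Synset] = {}
        self._by_word = defaultdict(list)  # word -> list of synset ids

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        for lemma in synset.lemmas:
            self._by_word[lemma].append(synset.synset_id)

    def senses(self, word: str) -> list[Synset]:
        """Every synset that contains `word` counts as one sense of it."""
        return [self.synsets[sid] for sid in self._by_word.get(word, [])]

    def related(self, synset_id: str, relation: str) -> list[Synset]:
        """Follow one named relation from a synset to its target synsets."""
        targets = self.synsets[synset_id].relations.get(relation, [])
        return [self.synsets[t] for t in targets]


# Toy entries written for this example, not copied from PWN.
wn = ToyWordNet()
wn.add(Synset("toy-institution", "n", ["institution", "establishment"],
              "an organization founded for a particular purpose"))
wn.add(Synset("toy-bank-river", "n", ["bank"],
              "sloping land beside a body of water"))
wn.add(Synset("toy-bank-finance", "n", ["bank", "banking_company"],
              "a business that accepts deposits and lends money",
              relations={"hypernym": ["toy-institution"]}))

for sense in wn.senses("bank"):
    print(sense.synset_id, "-", sense.gloss)
```

Here "bank" belongs to two synsets and therefore has two senses, and the hypernym relation links the financial sense to a more general concept. For the real PWN data, NLTK's `nltk.corpus.wordnet` module offers comparable lookups such as `wordnet.synsets("bank")`.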

Automatic Persian WordNet Construction

Manual construction of a WordNet is time-consuming and requires linguistic expertise. The average time needed to build a lexical entry depends on the polysemy of the words in the synsets, on the available lexical resources and, above all, on the tools used to build the WordNet. Automated approaches to WordNet construction and enrichment have therefore been proposed to make development faster, cheaper and easier. In our experiments on Persian, links between Persian words and Princeton WordNet synsets were created automatically using knowledge extracted from corpora, word sense disambiguation methods and several heuristics.
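The linking step can be illustrated with a minimal sketch of one common heuristic for this kind of task; it is not necessarily the procedure used in our experiments. The Persian word is translated with a bilingual dictionary, every synset containing one of its English translations becomes a candidate, and each candidate is scored by the share of the translations it contains, so that a synset grouping several translations together is preferred. The dictionary entry, the synset ids, the function names and the 0.75 threshold are all hypothetical.

```python
from collections import defaultdict

# Toy resources (assumptions for illustration only): one Persian-English
# dictionary entry and a few synsets with hypothetical ids.
BILINGUAL = {
    "کتاب": {"book", "volume"},
}
SYNSET_LEMMAS = {
    "syn-book-written-work": {"book"},
    "syn-book-physical-object": {"book", "volume"},
    "syn-volume-quantity": {"volume", "bulk", "mass"},
    "syn-volume-loudness": {"volume", "loudness"},
}


def build_word_index(synset_lemmas):
    """Invert synset -> lemmas into English word -> candidate synset ids."""
    index = defaultdict(set)
    for synset_id, lemmas in synset_lemmas.items():
        for lemma in lemmas:
            index[lemma].add(synset_id)
    return index


def score_candidates(persian_word, bilingual, synset_lemmas):
    """Score each candidate synset by the share of translations it contains."""
    translations = bilingual.get(persian_word, set())
    if not translations:
        return {}
    index = build_word_index(synset_lemmas)
    candidates = set().union(*(index.get(t, set()) for t in translations))
    return {
        sid: len(synset_lemmas[sid] & translations) / len(translations)
        for sid in candidates
    }


def link_word(persian_word, bilingual, synset_lemmas, threshold=0.75):
    """Keep synsets whose score reaches the threshold, best score first."""
    scores = score_candidates(persian_word, bilingual, synset_lemmas)
    kept = [(sid, s) for sid, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


print(link_word("کتاب", BILINGUAL, SYNSET_LEMMAS))
```

Only the synset containing both "book" and "volume" passes the threshold; the others contain a single translation and score 0.5. Counting translations uniformly is the simplest choice; weights derived from corpus statistics, or a word sense disambiguation step over Persian contexts, could replace the plain overlap score when several candidates tie.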

Publications

  • Montazery, M., & Faili, H. (2010). Automatic Persian WordNet Construction. Proceedings of the 23rd International Conference on Computational Linguistics (pp. 846-850).
  • Montazery, M., & Faili, H. (2011). Unsupervised Learning for Persian WordNet Construction. Proceedings of the International Conference RANLP-2011.

University of Tehran

NLP Lab, University of Tehran

The Natural Language and Text Processing Laboratory at the University of Tehran has been involved in research in computational linguistics, mainly on Persian, focusing on parsing, text mining, machine translation, word sense disambiguation, spell and grammar checking, summarization and statistical parsing.

FarsNet project

FarsNet is the first published Persian WordNet. It organizes about 17,000 Persian words into about 10,000 synsets arranged in hierarchical structures, and it covers nouns, adjectives and verbs.

Princeton WordNet

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.