Author, Institution: Voldemaras Žitkus, Kaunas University of Technology
Science area, field of science: Technological Sciences, Informatics Engineering, T007
Scientific Advisor: Prof. Dr. Rita Butkienė (Kaunas University of Technology, Technological Sciences, Informatics Engineering, T007)
Dissertation Defense Board of Informatics Engineering Science Field:
Prof. Dr. Tomas Skersys (Kaunas University of Technology, Technological Sciences, Informatics Engineering, T007) – chairperson
Prof. Dr. Nikolaj Goranin (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering, T007)
Prof. Dr. Diana Kalibatienė (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering, T007)
Prof. Dr. Tomas Krilavičius (Vytautas Magnus University, Natural Sciences, Informatics, N009)
Dr. Zbigniew Marszałek (Silesian University of Technology, Poland, Natural Sciences, Informatics, N009)
The dissertation defense meeting will be held in the Rectorate Hall of Kaunas University of Technology (K. Donelaičio 73–402, Kaunas).
The doctoral dissertation is available at the library of Kaunas University of Technology (Gedimino g. 50, Kaunas)
Annotation: When information is extracted from text automatically, it is important to identify the different mentions of the same entity or object and to aggregate the semantic information surrounding these mentions. Mentions in a text that refer to the same object are said to corefer. The process of determining these relationships, usually with the help of computer programs, is called coreference resolution. Most coreference resolution approaches have been developed for English and other major languages, while smaller languages such as Lithuanian have received little attention. This dissertation analyses the problem of coreference resolution for English, Lithuanian and languages related to Lithuanian. Coreference resolution for Lithuanian is approached comprehensively, with the aim of creating not only a coreference resolution solution but also the groundwork for developing and improving new solutions. A four-level annotation scheme has been created that specifies what should be resolved and how, while preserving more linguistic information. The first Lithuanian coreference corpus (LCC) was created. A rule-based coreference resolution method for Lithuanian was developed; its rules were tested on related languages and formalized using first-order predicate logic. Both the corpus and the resolution methods use the new annotation scheme. The proposed evaluation methodology exploits the advantages of this annotation scheme: when evaluating resolution approaches, it takes into account not only which coreferences were resolved but also the quality of the resolution.
June 17, 10:00