Ongoing projects

SUMO — Automatic Text Summarization for Mobile Technologies

Description: Although mobile technologies (PDAs, Tablet PCs, mobile phones) are rapidly becoming part of everyday life, they remain impractical for a variety of applications, even basic ones such as reading content from the web. The trend towards electronic miniaturization brings more and more applications, such as internet access, to smaller and smaller devices. However, these devices are not designed to present web page content in WYSIWYG form: they still cannot display all the information available on the web as it is and, as a consequence, need automatic summarization techniques for content presentation. The absence of suitable human-computer interfaces is clearly holding back the growth of these new technologies. Unfortunately, the development of such interfaces has been largely neglected by the research community, with only a few relevant papers on the topic appearing since 2001. Our project aims to fill this gap.

Duration: 2005–2007

Leader: Gaël Harry Dias

Participants: Pavel Brázdil (University of Porto), Simão Melo de Sousa (University of Beira Interior), João Paulo Cordeiro (University of Beira Interior), Райчо Мукелов (University of Beira Interior), Румен Моралийски (University of Beira Interior)

SITE-O-MATIC — Web Automation and Adaptive Web

Description: The Web currently poses a number of interesting research problems. From the User’s point of view, the Web is becoming too large, too dynamic and increasingly unknown. From the Editor’s point of view, the Web places a constant demand for new information and timely updates. Moreover, the Editor should not only maintain the contents of the site, but also continually choose the navigational structure that best helps achieve the aims of the site’s Owner, User, or both. From the Owner’s point of view, the need for such a constant, labour-intensive effort implies very high financial or personal costs. In this project we aim to develop a platform and a methodology for automating, as much as possible, the management activities of a Web site, taking into account the behaviour of the Users and the aims of the Owner. One effect of automation is a reduction in the Editor’s effort, and consequently in the costs for the Owner. The other effect is that the site can adapt in a more timely way to the behaviour of the User, improving the browsing experience and helping the User achieve his/her own goals when these are in accordance with the goals of the Owner of the site.

Duration: 2005–2007

Leader: Alípio Jorge (University of Porto)

Participants: Gaël Harry Dias, José Paulo Leal (University of Porto), Carlos Soares (University of Porto), José Luís Borges (University of Porto), Nuno Escudeiro (Instituto Politécnico do Porto), Mário Alves (Universidade Aberta), Jorge Morais (University of Minho), Ricardo Campos (University of Beira Interior), Hugo Veiga (University of Beira Interior)

Funding agency: Fundação para a Ciência e a Tecnologia (Portugal)

Reference: POSC/EIA/58367/2004

SEASON — Semantics and Structure of Languages

Description: The current project aims at gathering know-how from two different fields, computational linguistics and formal methods, in order to tackle the problem of the semantics of languages, in particular natural languages. Language modelling raises interesting mathematical and computational problems: (Ⅰ) the ever-growing volume of texts (e.g. texts from the internet) and the dynamic nature of the natural languages encountered demand systematic and structured treatment of an ocean of (possibly unstructured) information; (Ⅱ) while research on morphology and syntax has produced successful results (e.g. automatic summarization, machine translation, etc.), semantics still remains an open problem. One reason for this situation was the lack of advanced logical methodologies and tools. This situation has changed. Indeed, research in Formal Methods and Computational Logic offers a set of powerful methodologies and tools that allow the mathematical modelling of, and reasoning about, computational systems, in particular formal languages. As proof of their maturity, formal methods have been successfully applied in industry (e.g. reliability of critical systems, Common Criteria, security evaluation, etc.). Nevertheless, extending and applying these methodologies and tools still remains a strategic and theoretical challenge, in particular applying them to natural language processing. In summary, we aim to develop methodologies and tools in both natural language processing and formal methods and to study their connections.

Duration: 2003–2005 (extended until 2007)

Leader: Gaël Harry Dias

Participants: Simão Melo de Sousa (University of Beira Interior)

Funding agency: Fundação para a Ciência e a Tecnologia (Portugal)

Concluded projects

LEILA — Learning Lexical Associations

Description: Lexical associations cover a wide range of linguistic phenomena, such as compound nouns (e.g. interior designer), phrasal verbs (e.g. run through), adverbial locutions (e.g. on purpose), compound determiners (e.g. an amount of), prepositional locutions (e.g. in front of) and institutionalized phrases (e.g. con carne). Lexical associations are frequently used in everyday language, usually to express precisely ideas and concepts that cannot be compressed into a single word. As a consequence, their identification is a crucial issue for applications that require some degree of semantic processing (e.g. Information Retrieval, Machine Translation, and Summarization). In recent years, there has been a growing awareness in the Natural Language Processing (NLP) community of the problems that lexical associations pose and of the need to handle them robustly. For that purpose, syntactic, statistical and hybrid syntactico-statistical methodologies have been proposed. However, few works have attempted to tackle this problem through the machine learning paradigm. This project aims to respond to this situation by introducing machine learning techniques to identify and classify lexical associations in texts.

Duration: 2004–2005

Leaders: Gaël Harry Dias and Sylvie Billot (University of Orléans, France)

Participants: Guillaume Cleuziou (University of Orléans, France), Christel Vrain (University of Orléans, France), Lionel Martin (University of Orléans, France), Cláudia Santos (University of Beira Interior)

Funding agency: CRUP — Conselho de Reitores das Universidades Portuguesas (Portugal)

Reference: F‒20/05

MULTILEXI — Multilingual Term Extraction for Lexicographic Purposes

Description: Multilingual terminology resources are indispensable for the development of multilingual language technologies such as machine translation, cross-language information retrieval systems and other language-based applications. In a time of rapid and global development of technological domains, the creation of efficient and up-to-date terminological resources must be supported by appropriate computational resources and tools. Therefore, the aim of the project is the development of robust tools for bilingual terminology extraction from domain-specific parallel corpora that could be applied to lexicographic and terminological purposes on a broad scale. The proposed cooperation builds on past exchanges between Slovenian and Portuguese experts, including joint publications and presentations at international conferences as well as research cooperation within various projects.

Duration: 2004–2005

Leaders: Gaël Harry Dias and Špela Vintar (University of Ljubljana, Slovenia)

Funding agency: GRICES — Gabinete de Relações Internacionais da Ciência e do Ensino Superior (Portugal)