HULTIG

Human Language Technology Information Group

The Center for Human Language Technology and Bioinformatics (HULTIG) is a research group of the Department of Informatics, University of Beira Interior. Over time, we have worked on a variety of topics related to the automatic processing of human language, with a particular focus on applications. Among the various sub-domains, we have devoted special attention to Information Retrieval, Text Mining and Extraction, Automatic Summarization, Automatic Plagiarism Detection, Sentiment Analysis in Text, Lexical Semantics, Textual Similarity and Alignment, and Aesthetic Characterization of Text.

Besides our main concern with applications and technology, we also consider the more theoretical and conceptual aspects of the study of human language, especially computational linguistics. To this end, we collaborate with other groups and research units, in a spirit of constant openness to multidisciplinarity, knowing that it is at the intersection of fields of knowledge that the most interesting solutions to the most complex problems often emerge.


Historical Context

The Center for Human Language Technology and Bioinformatics (HULTIG) was founded in 2003 by Gaël Harry Dias, then Assistant Professor at the Department of Informatics of the University of Beira Interior (UBI), Covilhã, Portugal. Dr. Gaël Dias remained in charge of the center until June 2011, when he moved to the University of Caen Basse-Normandie, France, to continue his teaching and research work. He currently runs the HULTECH - GREYC group there.
Since July 2011, HULTIG has been directed by Dr. João Paulo Cordeiro, professor at the Department of Informatics of UBI, who has been a member of HULTIG since 2004 and was one of its first members.
Over time, HULTIG has promoted cutting-edge research in Natural Language Processing, in areas such as Text Search, Information Retrieval and Extraction, Automatic Summarization, Cognitive Sciences and Bioinformatics. It also promotes interdisciplinary and applied research, having collaborated with relevant national and international research institutions, such as Carnegie Mellon University, the University of North Texas, the Massachusetts Institute of Technology, Microsoft, the University of Caen, the University of Porto and INESC-TEC, among others.
The wealth of any institution lies in the people who populate it. Even those who pass by leave something of themselves and always take something with them. They contribute to our scientific and cultural enrichment and develop as well. As Antoine de Saint-Exupéry said:

"(...) Those who pass by do not go alone, they do not leave us alone. They leave a little of themselves, they take a little of us (...)"

Several researchers and students have collaborated significantly with HULTIG over time. Several have completed their Master's and Doctoral degrees here. Others have worked, or still work, in its research lines and programs. All of them, without exception, have been very important, so here is a small mural of the people who have been part of HULTIG:
Professors and Researchers: Célia Nunes, Fátima Simões, Gaël Dias, João Cordeiro, Paulo Osório, Ricardo Campos, Sebastião Pais.
Doctoral Degrees: Dinko Lambov, Isabel Marcelino, João Cordeiro, José Moreno, Ricardo Campos, Raicho Mukelov, Rumen Moraliyski, Sebastião Pais.
Masters: Ângelo Santos, Bruno Fernandes, Cláudia Santos, Daniel Rodrigues, David Machado, Elsa Alves, João Gouveia, Henrique Mendes, Leopoldo Ismael, Manuel Lourenço, Nuno Guimarães, Sérgio Costa, Sebastião Pais, Ruben Costa, Sónia Bastos, Tiago Barbosa
Students: Bruno Conde, Bruno Martins, Cármen Barroso, Daniel Malaca, Fernando Cunha, Hélio Santos, Hugo Costa, José Chantre, Luís Almeida, Victor Gonçalves.


Publications



Books

  • Dias, G., Sousa, S. & Crochemore, M. (2006). Special issue of the TAL journal on Scaling Natural Language Processing: Complexity, Algorithms, and Architectures. Hermes Science Publications.
  • Bento, C., Cardoso, A. & Dias, G. (2005). Proceedings of the 12th Portuguese Conference on Artificial Intelligence. Progress in Artificial Intelligence Series. Springer LNAI 3008. ISSN: 0302-9743.
  • Bento, C., Cardoso, A. & Dias, G. (2005). Proceedings of the 2005 Portuguese Conference on Artificial Intelligence. IEEE Press. ISBN: 078039365.
  • Jones, R. & Dias, G. (2005). Methodologies and Evaluation of Lexical Cohesion Techniques in Real-world Applications - Beyond Bag of Words. In association with ACM editions. ISBN: 1595930345.
  • Dias, G., Lopes, J.G.L. & Vintar, Š. (2004). Methodologies and Evaluation of Multiword Units in Real-world Applications. ELRA editions. ISBN: 2951740816. EAN: 0782951740815.

Chapters in Books

  • Dias, G., Nunes, C., Cordeiro, J.P., Moraliyski, R., Marcelino, I., Mukelov, R., Campos, R., Santos, C., Alves, E., Conde, B. & Nonchev, B. (2006). Language Independent Methodologies to Tackle Multilinguality. In Readings in Multilinguality. Selected Papers from Young Researchers in BIS-21++. Galia Angelova, Kiril Simov, Milena Slavcheva (Editors). Incoma Ltd. Shoumen, Bulgaria.
  • Dias, G. & Alves, E. (2006). Multilingual Topic Segmentation. In Special Volume of the International Symposium on Social Communication edited by Eloina Miyares Bermudez and Leonel Ruiz Miyares. Cambridge Scholars Press, UK.
  • Dias, G. (2005). Extracção Automática de Unidades Polilexicais para o Português. In "A Língua Portuguesa no Computador". Tony Beber Sardinha (Editor). Mercado de Letras. Collection "As Faces da Linguística Aplicada". ISBN: 8575910442.
  • Dias, G., Madeira, S. & Lopes, J.G.P. (2005). Extracting Concepts from Dynamic Legislative Text Collections. In "Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora". Geoff Barnbrook, Pernilla Danielsson & Michaela Mahlberg (Editors). ISBN: 0826491812. Continuum International Publishing Group.

Master's Theses

  • Fernandes, B. (2014). Sumarização Personalizada e Subjetiva de Texto. Faculdade de Engenharia, Universidade da Beira Interior.
  • Ismael, L. (2014). Processamento Automático de Requisitos de Software Crítico, Expressos Em Linguagem Natural. Faculdade de Engenharia, Universidade da Beira Interior.
  • Mendes, H. (2013). Similaridade Documental e Deteção de Plágio. Faculdade de Engenharia, Universidade da Beira Interior.
  • Santos, A. (2012). Sumarização Automática de Texto. Faculdade de Engenharia, Universidade da Beira Interior.
  • Santos, C. (2005). Alexia: Acquisition of Lexical Chains for Text Summarization. Universidade da Beira Interior.
  • Campos, R. (2005). Agrupamento Automático de Páginas Web Utilizando Técnicas de Web Content Mining. Universidade da Beira Interior.

Doctoral Theses

  • Moraliyski, R. (2013). Discovery of Word Semantic Relations based on Sentential Context Analysis. Universidade da Beira Interior. Jury: Antoine Doucet (University of Caen Basse Normandie, France), José Gabriel Lopes (New University of Lisbon, Portugal), António Branco (University of Lisbon, Portugal) and Pablo Gamallo (University of Santiago de Compostela, Spain).
  • Lambov, D. (2011). Cross Domain Multi-View Sentiment Classification. Universidade da Beira Interior. Jury: Mohand Boughanem (University Paul Sabatier, France), Pavel Brazdil (University of Porto, Portugal), Nuno Marques (New University of Lisbon, Portugal), João Graça (INESC-ID, Portugal), Veska Noncheva (University of Plovdiv, Bulgaria) and Abel Gomes (University of Beira Interior, Portugal).
  • Cordeiro, J. (2011). Rule Induction for Sentence Reduction. Universidade da Beira Interior. Jury: Marie-Francine Moens (Catholic University of Leuven, Belgium), Paulo Quaresma (University of Evora, Portugal), Luísa Coheur (Technical University of Lisbon, Portugal) and Abel Gomes (University of Beira Interior, Portugal).
  • Dias, G. (2002). Extraction Automatique d’Associations Lexicales à Partir de Corpora. Universidade Nova de Lisboa and Universidade de Orléans (France).

Journals

  • Brazdil, P., Trigo, L., Cordeiro, J., Sarmento, R., & Valizadeh, M. (2015). Affinity mining of documents sets via network analysis, keywords and summaries. Oslo Studies in Language, 7(1).
  • Dias, G., Moraliyski, R., Cordeiro, J.P., Doucet, A. & Ahonen-Myka, H. (2010). Automatic Discovery of Word Semantic Relations using Paraphrase Alignment and Distributional Lexical Semantics Analysis. In Journal of Natural Language Engineering. Special Issue on Distributional Lexical Semantics. (Guest Eds) Roberto Basili & Marco Pennacchiotti. Volume 16, issue 04, pp. 439-467. Cambridge University Press. ISSN 1351-3249.
  • Campos, R., Dias, G., Nunes, C. & Nonchev, B. (2008). Clustering of Web Page Search Results: A Full Text Based Approach. In International Journal of Computer and Information Science (IJCIS). International Association for Computer and Information Science. Vol. 9 (4). December 2008. ISSN: 1525-9293.
  • Cordeiro, J.P., Dias, G., Cleuziou, G. & Brazdil, P. (2007). New Functions for Unsupervised Asymmetrical Paraphrase Detection. In Journal of Software. Volume 2, Issue 4, pp. 12-23. Academy Publisher. Finland. ISSN: 1796-217X. October 2007.

Conferences

2015
  • Cordeiro, J., Inácio, P., Fernandes, D. (2015). Fractal Beauty in Text. Progress in Artificial Intelligence, 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, LNAI (To appear), Springer Verlag.
  • Felipe, B., Cordeiro, J. (2015). Automatic detection of plagiarism in two acts. Computer Symposium - INForum 2015. (To appear)
2014
  • Pais, S., Dias, G., Moraliyski, R. & Cordeiro, J. (2014). Unsupervised and Language-Independent Method to Recognize Textual Entailment by Generality. In Proceedings of the First International Conference Computational Linguistics in Bulgaria (CLIB), vol. 1, no. 1, pp. 82-90, September.
2013
  • Cordeiro, J.P., Dias, G. & Brazdil, P. (2013). Rule Induction for Sentence Reduction. Progress in Artificial Intelligence, 16th Portuguese Conference on Artificial Intelligence, EPIA 2013, LNAI 8154, pp. 528-539, Springer Verlag.
2010
  • Grigonyté, G., Cordeiro, J.P., Moraliyski, R., Dias, G., & Brazdil, P. (2010). A Paraphrase Alignment for Synonym Evidence Discovery. 23rd International Conference on Computational Linguistics (COLING 2010). Beijing, China, August 23-27.
  • Cleuziou, G., Dias, G. & Levorato, V. (2010). Pretopological Modeling for Semantic-Lexical Structuring. 17th Meeting of the Francophone Classification Society (SFC 2010). Saint-Denis de la Réunion, France, June 9-11.
2009
  • Campos, R. (2009). The Regional Information System of Digital Cities and Regions in the infrastructure side. In Proceedings of the Portuguese Information Systems Association (CAPSI 2009) Viseu, Portugal, 28 – 30 October.
  • Dias, G., Pais, S., Cunha, F., Costa, H., Machado, D., Barbosa, T. & Martins, B. (2009). Hierarchical Soft Clustering and Automatic Text Summarization for Accessing the Web on Mobile Devices for Visually Impaired People. 22nd International FLAIRS Conference (FLAIRS 2009). Sanibel Island, USA. May 19 – 21.
  • Machado, D., Barbosa, T., Pais, S., Martins, B. & Dias, G. (2009). Universal Mobile Information Retrieval. 13th International Conference on Human Computer Interaction (HCII 2009). San Diego, USA. July 19 – 24.
  • Lambov, D., Dias, G. & Noncheva, V. (2009). High Level Features for Learning Subjective Language. 3rd International AAAI Conference on Weblogs and Social Media (ICWSM 2009). San José, USA. May 17 – 20.
  • Dias, G. & Moraliyski, R. (2009). Relieving Polysemy Problem for Synonymy Detection. In Proceedings of 14th Portuguese Conference in Artificial Intelligence (EPIA 2009), Aveiro, Portugal, October 12-15.
  • Lambov, D., Dias, G. & Noncheva, V. (2009). Sentiment Classification across Domains. In Proceedings of 14th Portuguese Conference in Artificial Intelligence (EPIA 2009), Aveiro, Portugal, October 12-15.
  • Cordeiro, J.P., Dias, G. & Brazdil, P. (2009). Unsupervised Induction of Sentence Compression Rules. In Proceedings of the Workshop on Language Generation and Summarization associated to the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL/IJCNLP 2009). Singapore, Singapore, August 6.
  • Campos, R., Dias, G. & Jorge, A. (2009). Disambiguating Web Search Results by Topic and Temporal Clustering: A Proposal. In Proceedings of International Conference on Knowledge Discovery and Information Retrieval (KDIR 2009), Funchal, Portugal, October 6-8.
2008
  • Campos, R. (2008). Definition of a Framework to Evaluate the Business, Tourism and Municipal Mayoral of Digital Cities Portals. In Proceedings of the Portuguese Information Systems Association (CAPSI 2008), Setúbal, Portugal, 29 – 31 October.
  • Marques, C., Campos, R. and Carvalho, A. (2008). Search-based Learning Objects Metadata. In JAV Iturbide, FJG Peñalvo and ABG González (Eds.). X International Symposium on Educational Informatics Salamanca, Spain, 01 - 03 October. ISBN: 978-84-7800-312-9
  • Dias, G., Mukelov, R. & Cleuziou, G. (2008). Unsupervised Graph-Based Discovery of General-Specific Noun Relationships from Web Corpora Frequency Counts. 12th International Conference on Natural Language Learning (CoNLL 2008). Manchester, UK. August 16-17.
  • Dias, G., Mukelov, R. & Cleuziou, G. (2008). Unsupervised Learning of General-Specific Noun Relations from the Web. 21st International FLAIRS Conference (FLAIRS 2008). AAAI Press. Coconut Grove, Florida, USA.
  • Dias, G., Mukelov, R., Cleuziou, G. & Noncheva, V. (2008). Semantic Similarities and General-Specific Noun Relations from the Web. In Proceedings of the Workshop on “Semantic Similarity Measurements” of the 8èmes Journées Francophones Extraction et Gestion des Connaissances (EGC 2008). INRIA Sophia Antipolis, France, January 29.
  • Marcelino, I., Dias, G., Casteleiro, J. & Martinez, J. (2008). Semi-Controlled Construction of the European Portuguese Unified Medical Language System. In proceedings of the Workshop on Finding the Hidden Knowledge: Text Mining for Biology and Medicine (FTHK 2008). Glasgow, Scotland, February 21-22.
2007
  • Moraliyski, R. & Dias, G. (2007). One Sense per Discourse for Synonym Detection. International Conference On Recent Advances in Natural Language Processing (RANLP 2007). Borovets, Bulgaria, September 27-29. ISBN: 978-954-91743-7-3. pp. 383-387.
  • Dias, G., Alves, E. & Pereira Lopes, G. (2007). Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation. 22nd AAAI Conference on Artificial Intelligence. AAAI Press. Vancouver, British Columbia, Canada.
  • Cordeiro, J.P., Dias, G. & Cleuziou G. (2007). Biology Based Alignments of Paraphrases for Sentence Compression. In Proceedings of the Workshop on Textual Entailment and Paraphrasing (ACL-PASCAL / ACL2007). Prague, Czech Republic. ISBN: 978-1-932432-88-6.
  • Cordeiro, J.P., Dias, G. & Brazdil, P. (2007). Learning Paraphrases from WNS Corpora. 20th International FLAIRS Conference. AAAI Press. Key West, Florida, USA.
  • Cordeiro, J.P., Dias, G. & Brazdil, P. (2007). A Metric for Paraphrase Detection. 2nd International Multi-Conference on Computing in the Global Information Technology. IEEE Computer Society Press. Guadeloupe, France.
  • Moraliyski, R. & Dias, G. (2007). Combination of Global and Local Attributional Similarities for Synonym Detection. In A Pliska Studia Mathematica Bulgarica Journal. N. Yanev (ed.). Vol. 18. ISSN: 02049805. pp. 239-254.
  • Dias, G. & Conde, B. (2007). Towards Web Browsing for Visually Impaired People. 4th International Conference on Information Technology: New Generations. IEEE Computer Society Press.
  • Campos, R. (2007). As Bibliotecas Digitais e os Motores de Busca: novos Sistemas de Informação no Contexto da Preservação Digital. In ACM-DL Proceedings of the EATIS 2007 – Euro American Conference on Telematics and Information Systems. Faro, Portugal, May 14 – 17. ACM-DL. ISBN 978-1-59593-598-4.
  • Campos, R. & Marques, C. (2007). A Evolução e o Futuro do Governo Electrónico. In ACM-DL Proceedings of the EATIS 2007 – Euro American Conference on Telematics and Information Systems. Faro, Portugal, May 14 – 17. ACM-DL. ISBN 978-1-59593-598-4.
  • Marques, C., Gestosa, V. and Campos, R. (2007). A case study with high school students about e-Gov in Portugal. In ACM-DL Proceedings of the EATIS 2007 – Euro American Conference on Telematics and Information Systems Faro, Portugal, 14 – 17 May. ACM-DL. ISBN 978-1-59593-598-4.
2006
  • Campos, R., Dias, G. and Nunes, C. (2006). WISE: Hierarchical Soft Clustering of Web Page Search Results based on Web Content Mining Techniques. 2006 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (WIC 2006). Hong Kong, 18-22 December. IEEE Computer Society Press. ISBN:0-7695-2747-7. pp. 301-305.
  • Campos, R. & Marques, C. (2006). O Governo Electrónico e os Sistemas de Informação Públicos em Portugal. In Proceedings of the 1.ª Conferência Ibérica de Sistemas e Tecnologias de Informação 2006. Ofir, Portugal, 21 -23 July. pp. 421-438 (Volume I). ISBN:978-989-20-0271-2.
  • Dias, G. & Conde, B. (2006). Efficient Text Summarization for Web Browsing On Mobile Devices. In Proceedings of the Workshop on Ubiquitous User Modeling associated to the 17th European Conference on Artificial Intelligence. Riva del Garda, Italy, August 28.
  • Dias, G., Santos, C. & Cleuziou, G. (2006). Automatic Knowledge Representation using a Graph based Algorithm for Language Independent Lexical Chaining. In Proceedings of the Workshop on Information Extraction Beyond the Document associated to the Joint Conference of the International Committee of Computational Linguistics and the Association for Computational Linguistics (COLING/ACL 2006). Sydney, Australia, July 22nd.
2005
  • Dias, G. & Vintar, S. (2005). Unsupervised Learning of Multiword Units from Part-of-Speech Tagged Corpora: Does Quantity Mean Quality? 12th Portuguese Conference on Artificial Intelligence (EPIA 2005). Covilhã, Portugal, 5-8 December. Bento, C., Cardoso, A. & Dias, G. (eds). Progress in Artificial Intelligence Series. Springer LNAI 3008. pp. 669-680. ISSN: 0302-9743.
  • Dias, G. & Alves, E. (2005). Discovering Topic Boundaries for Text Summarization based on Word Co-occurrence. International Conference On Recent Advances in Natural Language Processing (RANLP 2005). Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Nicolas Nicolov, Nikolai Nikolov (eds), Borovets, Bulgaria, September 21-23. pp. 187-191. ISBN: 9549174336.
  • Campos, R. & Dias, G. (2005). Automatic Hierarchical Clustering of Web Pages. In Proceedings of the ELECTRA Workshop associated to 28th Annual International ACM SIGIR Conference, Salvador da Bahia, Brazil, August 19th. pp. 83–85. In association with ACM editions. ISBN: 1595930345.
  • Dias, G. & Alves, E. (2005). Unsupervised Topic Segmentation Based on Word Co-occurrence and Multi-Word Units for Text Summarization. In Proceedings of the ELECTRA Workshop associated to 28th Annual International ACM SIGIR Conference, Salvador da Bahia, Brazil, August 19th. pp. 41–48. In association with ACM editions. ISBN: 1595930345.
  • Dias, G. & Alves, E. (2005). Language-Independent Informative Topic Segmentation. 9th International Symposium on Social Communication. Santiago de Cuba, Cuba, January 24-28. pp. 588-592. ISBN: 9597174057.
  • Dias, G., Alves, E. & Nunes, C. (2005). Topic Segmentation: How Much Can We Do By Counting Words And Sequences of Words. In A Pliska Studia Mathematica Bulgarica Journal. Vol. 17. pp. 39–70.
2004
  • Dias, G. & Nunes, S. (2004). Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment. In Proceedings of the 4th International Conference On Languages Resources and Evaluation, M.T. Lino, M.F. Xavier, F. Pereira, R. Costa, and R. Silva (Editors), Lisbon, Portugal, May 26–28. pp. 1717–1721. ISBN: 2951740816. EAN: 0782951740815.
  • Alexandre, L., Pereira, M., Madeira, C.S., Cordeiro, J.P. & Dias, G. (2004). Web Image Indexing: Combining Image Analysis with Text Processing. In Proceedings of the 5th International Workshop on Image Analysis for Multimedia Interactive Services, Instituto Superior Técnico, Lisbon, Portugal. April 21-23. CD-Version. ISBN: 9729811571.
  • Cordeiro, J.P. & Brazdil, P. (2004). Learning Text Extraction Rules Without Ignoring Stop Words. 4th International Workshop on Pattern Recognition in Information Systems (PRIS 2004). Porto, Portugal.
  • Pereira, R., Crocker, P. & Dias, G. (2004). A Parallel Multikey Quicksort Algorithm for Mining Multiword Units. Workshop on Methodologies and Evaluation of Multiword Units in Real-world Applications (MEMURA Workshop) associated with the 4th International Conference On Languages Resources and Evaluation. Dias, G., Lopes, J.G.L. & Vintar, Š. (Editors), Lisbon, Portugal. May 25. pp. 17–24. ISBN: 2951740816. EAN: 0782951740815.
2003
  • Dias, G. (2003). Multiword Unit Hybrid Extraction. In Proceedings of the Workshop on Multiword Expressions of the 41st Annual Meeting of the Association for Computational Linguistics, Japan, July 7–12. pp. 41–49. ISBN: 1932432205. Funded by Fundação Oriente.
  • Gil, A. & Dias, G. (2003). Using Masks, Suffix Array based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora. In Proceedings of the Workshop on Multiword Expressions of the 41st Annual Meeting of the Association for Computational Linguistics, Japan, July 7–12. pp. 25–33. ISBN: 1932432205. Funded by Fundação Oriente.
  • Dias, G. (2003). Aquisição Automática de Associações Textuais. In proceedings of the CP3A Workshop on Corpora Paralelos, Aplicações e Algoritmos Associados, J.J. Almeida (Editors), Braga, Portugal, June 3rd. pp. 59–65.
  • Gil, A. & Dias, G. (2003). Efficient Mining of Textual Associations. In Proceedings of the International Conference on Natural Language Processing and Knowledge Engineering. China, IEEE Press, October 26–29. pp. 549–555. ISBN: 0780379020.
  • Dias, G. & Carapinha, L. & Trindade, R. & Mota, S. & Ribeiro, M. & Dias, J. (2003). Construire et Accéder à une Base de Données d’Expressions Figées à partir de Ressources de la Toile. 5èmes Rencontres Terminologie et Intelligence Artificielle (TIA), Strasbourg, France, March 31st – April 1st. pp. 92–102.
  • Dias, G. & Kaalep H. (2003). Automatic Extraction of Multiword Units for Estonian: Phrasal Verbs. In: Languages in Development, H. Metslang and M. Rannut (Editors), Linguistics Edition 41, Lincom-Europa, München, Germany. pp. 81–91. ISBN: 3895867039.
2001
  • Dias, G., Kaalep, H. & Muischnek, K. (2001). Automatic Extraction of Multiword Units for Estonian: a Comparison Between Annotated and Non Annotated Corpora. International Futuristic Conference on Language Development. 12–14 March. Tallinn, Estonia.
  • Dias, G. & Nunes, S. (2001). Does Natural Selection Apply to Natural Language Processing? An Experiment for Multiword Unit Extraction. In Proceedings of the Natural Language Processing Knowledge Engineering Workshop of the 2001 IEEE Systems, Man, and Cybernetics Conference, Tucson, USA, October 7–11. CD Version. ISBN: 0780370899. Funded by Fundação Calouste Gulbenkian.
  • Ribeiro, A. & Dias, G. & Lopes, J.G.P., and Mexia, J. (2001). Cognates Alignment. In proceedings of Machine Translation Summit, B. Maegaard (Editors), Santiago de Compostela, Spain. September 18–22. pp. 287–293. ISBN: 8790708083.
  • Dias, G., Kaalep, H. & Muischnek, K. (2001). Automatic Extraction of Verb Phrases from Annotated Corpora: A Linguistic Evaluation for Estonian. In Proceedings of the Workshop on Collocation of the joint 39th Annual Meeting of the Association for Computational Linguistics and 10th Conference of the European Chapter of the Association for Computational Linguistics, Toulouse, France. July 6–11. ISBN: 1558607676.
  • Dias, G., Kaalep, H. & Muischnek, K. (2001). Automatic Extraction of Verbal Locutions for Estonian: Validating Results with Pre-existing Phrasal Lexicons. In Proceedings of the 6th Conference on Computational Lexicography and Corpus Research (COMPLEX 2001), Birmingham, England. June 28th – July 1st.
2000
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (2000). Mining Textual Associations in Text Corpora. In Proceedings of the Workshop on Text Mining associated to the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), Boston, MA, USA. August 20th. pp. 92–95.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (2000). Benefiting from multi domain corpora for extracting terminologically relevant multiword lexical units. In Proceedings of the 9th EURALEX International Congress, U. Heid, S. Evert, E. Lehmann, and C. Rohrer (Editors), Stuttgart, Germany, August 8–12. pp. 339–350. ISBN: 3000065741.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (2000). Extracting Textual Associations from Part Of Speech Tagged Corpora. In Proceedings of the European Association for Machine Translation Workshop on Harvesting Existing Resources, Ljubljana, Slovenia.
  • Dias, G. & Guilloré, S. & Bassano, J.C. & Lopes, J.G.P. (2000). Combining Linguistics with Statistics for Multiword Term Extraction: A Fruitful Association? In Proceedings of 6ème Conférence sur la Recherche d’Informations Assistée par Ordinateur (RIAO 2000). Paris, France, April 12–14. pp. 1–20. ISBN: 290545007X.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (2000). Extraction Automatique d’Associations Textuelles à Partir de Corpora Non Traités. In Proceedings of 5èmes Journées Internationales d’Analyse Statistique des Données Textuelles, M. Rajman and J.-C. Chappelier (Editors), Lausanne, Switzerland, March 9–11. pp. 213–221.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (2000). Normalisation of Association Measures for Multiword Lexical Unit Extraction. In Proceedings of International Conference on Artificial and Computational Intelligence for Decision, Control and Automation in Engineering, and Industrial Applications, Tunisia. March 22–24. pp. 207–216.
  • Dias, G. & Guilloré, S. & Bassano, J.C. & Lopes, J.G.P. (2000). Extraction Automatique d’unités Lexicales Complexes: Un Enjeu Fondamental pour la Recherche Documentaire. In Traitement Automatique des Langues, Vol. 41:2, Christian Jacquemin (Editors). Paris, France. pp. 447–473. ISBN: 2746202255.
1999
  • Dias, G. & Guilloré, S. & Vintar, Š. & Lopes, J.G.P. (1999). Identifying and Integrating Terminologically Relevant Multiword Units in the IJS ELAN Slovene English Parallel Corpus. In Selected Papers of the 10th Computational Linguistics In the Netherlands (CLIN), P. Monachesi (Editor), Utrecht, Netherlands. pp. 29–40.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (1999). Mutual Expectation: a Measure for Multiword Lexical Unit Extraction. In Proceedings of VEXTAL Venezia per il Trattamento Automatico delle Lingue, Venezia, Italy. November 22–24. pp. 133–139. ISBN: 8880981129.
  • Silva, J. & Dias, G. & Guilloré, S. & Lopes, J.G.P. (1999). Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units. In Proceedings of 9th Portuguese Conference in Artificial Intelligence, Pedro Barahona and Júlio Alferes (Editors), Lecture Notes in Artificial Intelligence, Springer Verlag, Évora, Portugal, September 21–24. pp. 113–132.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (1999). Multilingual Aspects of Multiword Lexical Units. In Proceedings of Workshop on Language Technologies of the 32nd annual meeting of the Societas Linguistica Europaea, Vintar, Š. (Editor), Ljubljana, Slovenia, July 8–11. pp. 11–21. ISBN: 9612270031.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (1999). Language Independent Automatic Acquisition of Rigid Multiword Units from Unrestricted Text corpora. In Proceedings of 6ème Conférence Annuelle sur le Traitement Automatique des Langues Naturelles, Cargèse, France, July 12–17. pp. 333–339.
  • Dias, G. & Guilloré, S. & Lopes, J.G.P. (1999). Multiword Lexical Units Extraction. In Proceedings of the International Conference on Machine Translation and Computer Language Information Processing, China, June 26–28. pp. 119–126.

MOVES

Ongoing (48 months) - Started: 2018 Ends: 2022

Within this project, we propose to develop a multilingual surveillance system capable of detecting emerging crowds by identifying rising events that foster high focus, high energy and high emotion on social media. Our fundamental hypothesis is that virtual crowds exhibit characteristics similar to those of real crowds, which may allow them to be modelled as complex computer systems by relying on advanced natural language processing and machine learning techniques. The project lies at the intersection of important scientific research topics, namely urban informatics, natural language processing for social media, predictive analytics over big social data and image sentiment analysis.

HULTIG-C

Ongoing (24 months) - Started: 2019 Ends: 2021

The objective of this project is to build an NLP cloud platform that enables researchers and users to use language processing components and resources, following the software-as-a-service paradigm. The focus is on multilingual text analysis, based on an open-source infrastructure and compliant with relevant NLP standards.


Resources

"Nuggets" for Text Processing in Java
Version 1.1

Crawler for Social networks.

Text Web Crawler with Python & MySQL.

Lexicon of extreme sentiments created based on SentiWordNet and SenticNet.


People

João Paulo Cordeiro
Assistant Professor (UBI)
Sebastião Pais
Assistant Professor (UBI)
Irfan Khan Tanoli
Postdoc. Researcher (UBI)
Abdullatif A. Abolohom
Postdoc. Researcher (UBI)
Yaseen Khan Tanoli
PhD Student (UBI)
Muhammad Luqman Jamil
MS Student (UBI)

Contact

HULTIG

Departamento de Informática

Universidade da Beira Interior,
Rua Marquês d'Ávila e Bolama
6200-001 Covilhã
Portugal

Telephone/Fax

Telephone: +351 275 242081 (ext.: 1601)
Fax: +351 275 319 899

Email

Responsible: João Cordeiro, PhD

Email : [jpaulo] @ [di] . [ubi] . [pt]