Research team on Data & Web Mining

Latest News

  • Prof. Vazirgiannis presents a tutorial entitled "Graph-of-words: boosting text mining with graphs" at the European Summer School in Information Retrieval (ESSIR)

  • Presentation: Web content archiving and preservation of digital memory: "the AUEB experience", M. Vazirgiannis, Scientific Workshop: The contribution of economic libraries to research and development, Bank of Greece, Friday 6 March 2015. More: here.

  • Margarita Karkali successfully defended her PhD thesis “Efficient Novelty Detection in Document Streams” on 7/7/2014.


Current Projects
2011-2014 8NEWE2009: "S-Suite: A Multipart Service Oriented Architecture for The Car Rental Sector."
Funded by the "Support of new enterprises for R&D activities" program in the context of NSRF 2007-2013

S-Suite is a service-oriented framework that mediates between brokers and service providers in the car rental industry. It handles the full life cycle of reservations, enabling automatic reservation processing and incorporating the most advanced functional features demanded by brokers and service providers. The benefits of the system are twofold: a. efficiency and transparency, b. optimal matching between reservation demands and service offers at a local level.
The S-Suite objective is to provide brokers with a large number of rental options for their customers, since multiple partners may be available in each region. The S-Suite framework includes a complete administration panel with a large number of accounting, logistics and statistics capabilities for monitoring and managing the service operation. In addition, two types of partner panels are implemented so that partners can monitor their reservations and accounting from within S-Suite. The accounting module provides full logistics management, including monthly payments among the participating entities and multidimensional statistics.

2010-2014 Heracleitus II. Investing in knowledge society through the European Social Fund

Real-Time Web Personalization, Ph.D. Candidate Margarita Karkali

Machine Learning Methods for Online Advertising Campaigns, Ph.D. Candidate Matina Thomaidou

2010-2013 DIGITEO Chair Grant on Web Mining (France)

The overall objective of the proposed project is mining and learning from the large scale and dynamically evolving data and graphs generated in the Web 2.0 context. We seek to understand the structure and dynamics of a Web 2.0 information collection (using the Web graph and social networks as use cases) and to learn how to predict future properties and behavior based on its past evolution. We focus on the users and the content of the evolving Web 2.0 collections taking into account temporal evolution towards valid rankings.

Selected Concluded Projects
2006-2008 FP6/IST-2005-33331: SQO-OSS - Source Quality Observatory for Open Source Software

The project seeks to use as many sources of quality indicators as possible so as to create a set of metrics that can be applied automatically to a software project's repository in order to extract quantifiable measurements of its quality. The project leverages existing tools and will also create new ones in an effort to build an integrated quality assessment platform. It is foreseen that data mining techniques will play an important role in the project. Official Project Page Success Classifier

MARIE CURIE Intra-European Fellowship - NGWeMiS - Next Generation Web Mining & Searching

The project lies in the area of knowledge extraction and management from the massive and heterogeneous document collections on the World Wide Web. The main objective of the proposed project is to produce design guidelines and develop prototypes for next generation web mining and searching techniques based on the P2P paradigm. The innovation lies in i. the use of the P2P paradigm at the various levels of web content management and searching, ii. the study and development of novel similarity measures among web documents that take into account multiple facets including structure and semantics, iii. clustering the web data and metadata taking into account their P2P organization paradigm. Official Project Page

2005-2006 PYTHAGORAS II/GSRT: Innovative Aspects in Web Content Ranking: Time/Trends and Topic Classification

The project objective is to improve web ranking algorithms by taking into account a) the temporal features of web pages, b) the size and dynamism of the web graph and c) the semantics of web pages. Statistical learning techniques will be used to develop a framework of algorithms that classify web content subject to specified constraints on timeliness and on accuracy (in terms of the topic). Official Project Page

2004-2006 MARIE CURIE International Fellowship - New Techniques For Handling Quality and Uncertainty in Spatial Mining

Dr. M. Halkidi visits the University of California, Riverside, as a postdoctoral researcher.

2001-2004 IST/h-TechSight (IST-2001-33174) - subcontractor

The overall objective of the project is to develop a platform for Knowledge Management in the manufacturing and process industries sector. Our mission in this context is the effective extraction of keywords and semantics from large and evolving collections of web documents, followed by appropriate organization of the corpus to allow high quality search with multiple similarity criteria (among which semantic similarity).

2001-2004 IST/NEMIS (IST-2001-37574) - subcontractor

NEMIS is a network of excellence (NoE) that brings together researchers, academics and practitioners of the Text Mining domain. NEMIS aims to improve knowledge sharing, research efficiency and knowledge discovery, to promote the exchange of opinions, debate, knowledge and expertise among the domain experts, and to co-ordinate ongoing research efforts at the EU level. Our participation aimed at advising the consortium on text mining for documents, with emphasis on the web.

2001-2004 IST/FET - PANDA - Patterns for Next-Generation Database Systems - FET Working Group (IST-2001-33058)

With the advent of hardware advances, complex information resulting from data intensive applications is posing new requirements for today's (and tomorrow's) Database Management Systems (DBMSs). Such information possesses a number of key features such as huge volume of data, diversity and complexity. We claim that the term pattern is a good candidate for a generic representation of these novel information types. The PANDA Working Group will study the current state of the art in pattern management and explore novel theoretical and practical aspects of a pattern-based management system (a so-called PBMS). Aided by an Industrial Advisory Board, PANDA-WG will address the application of the novel PBMS technology to real needs, such as those found in the financial and telecommunications fields.

2001-2004 IST/FET - DB/GLOBE (IST-2001-32645)

The DBGlobe project aims at developing novel data management techniques to deal with the challenge of global computing. Our premise is that global computing is a database problem: how to design, build and analyse systems that manage large amounts of data. However, the traditional database approach of storing data of interest in monolithic database management systems becomes obsolete in such environments. In current database research, data are relatively homogeneous, exhibit a small degree of distribution (just a few network sites) and are passive in that they remain unchanged unless explicitly updated. None of these assumptions holds in the global computing world. This creates the need for new theoretical foundations in all aspects of data management: modelling, storage and querying. Local Project Page

2001-2003 IST/I-KnowUMine (IST-2000-31077)

The project aims to develop a novel generic platform for Automated Re-engineering of on-line services based on Content, Knowledge and Web Site Metrics. I-KnowUMine will provide the basis for the development of numerous on-line services ranging from Electronic Information Services and Electronic Publishing to e-commerce and on-line collaborative applications. It will evaluate, customize and integrate cutting-edge technologies and incorporate research results on Content Management, Knowledge Management and Web Usage Mining in order to semi-automatically reconstruct the underlying information structure. The revised structure reflects the needs of end-users or user groups for targeted information while improving overall performance, usability and user retention.

1999-2001 PENED/GSRT Research & Development for Knowledge Discovery in Medical Data (with Medical School, Univ. of Athens)

1999-2001 Bilateral Cooperation between Greece and Germany

The objective of the project is research on uncertainty management in spatial and temporal database systems.

1999 ESPRIT/VRSHOP Design and development of WWW enabled Virtual Reality interfaces for e-commerce applications

Interactive distributed design of 3D spaces with definition of a multitude of spatiotemporal parameters.