Ongoing project: Approaches and tools for online analytical processing of Big Data

Technological advances in recent years have led to the production of huge amounts of data, known as Big Data. This data is generated not only within companies, but also by machines (Internet of Things) and by people (smartphones, Web 2.0, tablets). It is characterised by its volume, the velocity at which it is produced, and the variety of its formats.

The inherent characteristics of Big Data pose new scientific challenges for online analysis in support of decision-making. For forty years, decision-making has relied on data warehouses that organise structured data and allow it to be analysed through OLAP cubes.

However, it is now established that conventional data warehouses are unable to organise Big Data and support its analysis, mainly because of its variety of formats, volume and velocity. As a result, companies tend to extract data and load it into data lakes built on dedicated storage solutions (HDFS, NoSQL databases, etc.), and only then transform it, in particular into OLAP cubes for decision-making.

The purpose of this project is to develop new approaches and tools for the design and analysis of Big Data OLAP cubes. We are particularly interested in NoSQL databases (graph-oriented, document-oriented, column-oriented), without neglecting other types of sources. For the design of cubes, we aim to develop the following approaches:

  • An approach based on the use of metadata for the integration of relational data, documents, and graphs in OLAP cubes.
  • An approach to designing and analysing OLAP cubes based entirely on graphs.

For OLAP, we are motivated by recent advances in artificial intelligence, especially in machine learning, deep learning and natural language processing. We aim to develop two approaches based on artificial intelligence:

  • Analysis of relational OLAP cubes in natural language.
  • Analysis of graph-oriented data for decision-making.