Curriculum vitae

Colazzo Dario

Full Professor (Professeur des universités)


My research and teaching focus on the efficient processing of semi-structured big data through parallelism and distribution (MapReduce, Hadoop, Spark, Flink, ...). I am particularly interested in formal aspects and in applying static analysis techniques to the processing of massive graphs and JSON data.

I am currently working on the following projects:

- Schema inference for massive JSON datasets.
- Pre-filtering techniques and algorithms for context-aware recommender systems.
- Incremental and streaming saturation of massive RDF data.
- Type systems for graph-structured databases.


Latest publications


Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2024), Validation of Modern JSON Schema: Formalization and Complexity, Proceedings of the ACM on Programming Languages, vol. 8, p. 1451-1481

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2023), Negation-closure for JSON Schema, Theoretical Computer Science, vol. 955, p. 113823

Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2022), Witness Generation for JSON Schema, Proceedings of the VLDB Endowment, vol. 15, n°13, p. 4002-4014

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Parametric schema inference for massive JSON datasets, The VLDB Journal, vol. 28, n°4, p. 497-521

Bidoit N., Colazzo D., Malla N., Sartiani C. (2018), Evaluating Queries and Updates on Big XML Documents, Information Systems Frontiers, vol. 20, n°1, p. 63-90

Colazzo D., Ghelli G., Sartiani C. (2017), Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation, ACM Transactions on Database Systems, vol. 42, n°4, p. 1-44

Camacho-Rodriguez J., Colazzo D., Manolescu I. (2015), PAXQuery: Efficient Parallel Processing of Complex XQuery, IEEE Transactions on Knowledge and Data Engineering, vol. 27, n°7, p. 1977-1991

Nguyen B., Dudouet F-X., Colazzo D., Vion A., Manolescu I., Senellart P. (2011), XML content warehousing: Improving sociological studies of mailing lists and web data, BMS: Bulletin de méthodologie sociologique, vol. 112, n°1, p. 5-31

Dudouet F-X., Nguyen B., Colazzo D., Manolescu I., Vion A. (2010), Webstand, une plateforme de gestion de données web pour applications sociologiques, TSI : Technique et Science Informatiques, vol. 29, n°8-9, p. 1055-1080

Book chapters

Attouche L., Baazizi M-A., Colazzo D. (2024), Optimistic Data Generation for JSON Schema, in Abdelkader Hameurlain, A Min Tjoa, Reza Akbarinia, Angela Bonifati, Transactions on Large-Scale Data- and Knowledge-Centered Systems LVI. Special Issue on Data Management - Principles, Technologies, and Applications, Springer Nature Switzerland, p. 119-152

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2020), Scalable Saturation of Streaming RDF Triples, in Abdelkader Hameurlain, A Min Tjoa, Philippe Lamarre, Karine Zeitouni, Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV: Special Issue on Data Management – Principles, Technologies, and Applications, Springer, p. 1-40

Conference papers with proceedings

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2021), An Empirical Study on the “Usage of Not” in Real-World JSON Schema Documents, in Aditya Ghose ; Jennifer Horkoff ; Vítor E. Silva Souza ; Jeffrey Parsons ; Joerg Evermann, Berlin Heidelberg, Springer International Publishing, 102-112 p.

Baazizi M-A., Berti C., Colazzo D., Ghelli G., Sartiani C. (2020), Human-in-the-Loop Schema Inference for Massive JSON Datasets, in 23rd International Conference on Extending Database Technology, EDBT 2020, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 635-638 p.

Fruth M., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2020), Challenges in Checking JSON Schema Containment over Evolving Real-World Schemas, in Georg Grossmann ; Sudha Ram, Berlin Heidelberg, Springer International Publishing, 220-230 p.

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2019), Streaming saturation for large RDF graphs with dynamic schema information, in Alvin Cheung, Kim Nguyễn, Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages (DBPL 2019), New York, NY, ACM - Association for Computing Machinery, 42-52 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), A Type System for Interactive JSON Schema Inference (Extended Abstract), in Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, Stefano Leonardi, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 101:1-101:13 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Schemas and Types for JSON Data: From Theory to Practice, in Peter Boncz, Stefan Manegold, SIGMOD '19 Proceedings of the 2019 International Conference on Management of Data, New York, NY, ACM - Association for Computing Machinery, 2060-2063 p.

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2019), Schemas And Types For JSON Data, in Melanie Herschel, Helena Galhardas, Berthold Reinwald, 22nd International Conference on Extending Database Technology (EDBT 2019), Konstanz, 437-439 p.

Vahidi Ferdousi Z., Colazzo D., Negre E. (2018), CBPF: Leveraging Context and Content Information for Better Recommendations, in Guojun Gan, Bohan Li, Xue Li, Shuliang Wang, Advanced Data Mining and Applications 14th International Conference, ADMA 2018, Springer, 381-391 p.

Vahidi Ferdousi Z., Colazzo D., Negre E. (2018), Correlation-Based Pre-Filtering for Context-Aware Recommendation, in George Roussos, Achilles Kameas, 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), IEEE - Institute of Electrical and Electronics Engineers

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C. (2017), Counting types for massive JSON datasets, New York, NY, ACM - Association for Computing Machinery, 1-12 p.

Vahidi Ferdousi Z., Negre E., Colazzo D. (2017), Context factors in context-aware recommender systems, in AISR 2017: Atelier interdisciplinaire sur les systèmes de recommandation, Paris, Conservatoire national des arts et métiers

Camacho-Rodríguez J., Colazzo D., Herschel M., Manolescu I., Roy Chowdhury S. (2016), Reuse-based Optimization for Pig Latin, in Snehasis Mukhopadhyay, ChengXiang Zhai, Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM'16), New York, ACM - Association for Computing Machinery, 2215-2220 p.

Colazzo D., Sartiani C. (2015), Typing regular path query languages for data graphs, in James Cheney, Thomas Neumann, DBPL 2015 Proceedings of the 15th Symposium on Database Programming Languages, ACM - Association for Computing Machinery, 69-78 p.

Camacho-Rodríguez J., Colazzo D., Manolescu I. (2014), PAXQuery: A Massively Parallel XQuery Processor, in DanaC'14 Proceedings of Workshop on Data analytics in the Cloud, ACM - Association for Computing Machinery, 1-4 p.

Colazzo D., Sartiani C. (2014), Typing query languages for data graphs, in Elisa Bertino, Goce Trajcevski, 2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW), IEEE - Institute of Electrical and Electronics Engineers, 28-31 p.

Colazzo D., Roatis A., Manolescu I., Goasdoué F. (2014), RDF Analytics: Lenses over Semantic Graphs, in Torsten Suel, WWW '14 Proceedings of the 23rd international conference on World wide web, Seoul, ACM, 467-478 p.

Bidoit N., Colazzo D., Malla N., Ulliana F., Nolè M., Sartiani C. (2013), Processing XML queries and updates on map/reduce clusters, in Giovanna Guerrini, Norman W. Paton, EDBT '13 Proceedings of the 16th International Conference on Extending Database Technology, ACM - Association for Computing Machinery, 745-748 p.

Conference papers without proceedings

Attouche L., Baazizi M-A., Colazzo D., Ding Y., Fruth M., Ghelli G., Sartiani C., Scherzinger S. (2021), A Test Suite for JSON Schema Containment, ER Posters/Demo 2021 - 40th International Conference on Conceptual Modeling, St. John's, Canada

Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2020), Not Elimination and Witness Generation for JSON Schema (short version), 36ème Conférence sur la Gestion de Données – Principes, Technologies et Applications, Paris, France

Camacho-Rodriguez J., Colazzo D., Herschel M., Manolescu I., Roy Chowdhury S. (2014), Reuse-based Optimization for Pig Latin, BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France

Camacho-Rodriguez J., Colazzo D., Manolescu I. (2014), PAXQuery: Efficient Parallel Processing of Complex XQuery, BDA'2014: 30e journées Bases de Données Avancées, Oct 2014, Grenoble-Autrans, France

Colazzo D., Goasdoué F., Manolescu I., Roatis A. (2013), Warehousing RDF Graphs, BDA'2013: 29e journées Bases de Données Avancées, Oct 2013, Nantes, France

Vion A., Manolescu I., Nguyen B., Colazzo D., Senellart P., Dudouet F-X. (2009), The WebStand Project, WebSci'09: Society On-Line Conference, Athens, Greece

Preprints / Working papers

Attouche L., Baazizi M-A., Colazzo D., Ghelli G., Sartiani C., Scherzinger S. (2023), Validation of Modern JSON Schema: Formalization and Complexity, Paris, Preprint Lamsade, 1-49 p.


Nguyen B., Dudouet F-X., Colazzo D., Manolescu I. (2008), A Source Centric Temporal Model, 6 p.
