Belhajjame Khalid

Lecturer (Maître de conférences)
Office: B214


I am a lecturer (Maître de conférences) at Paris-Dauphine University, where I am a member of the LAMSADE research lab. Before moving to Paris, I was a researcher for several years at the University of Manchester, and prior to that a PhD student at the University of Grenoble. My research interests lie in the areas of information and knowledge management. In particular, I have made key contributions to the areas of pay-as-you-go data integration, e-Science, scientific workflow management, provenance tracking and exploitation, and semantic web services. I have published over 50 papers on these topics. Most of my research proposals were validated against real-world applications from the fields of astronomy, biodiversity and life sciences. I am a member of the editorial board of the Elsevier journal MethodsX. I have participated in multiple European-, French- and UK-funded projects, and have been an active member of the W3C Provenance working group, the NSF-funded DataONE working group on scientific workflows and provenance, and more recently the Research Object for Scholarly Communication Community Group. I am also co-leading the provenance benchmarking activity ProvBench, which seeks to produce a family of benchmarks for testing provenance proposals.

Latest publications


Schintke F., Belhajjame K., De Mecquenem N., Frantz D., Guarino V., Hilbrich M., Lehmann F., Missier P., Sattler R., Sparka J., Speckhard D., Stolte H., Vu A., Leser U. (2024), Validity constraints for data analysis workflows, Future Generation Computer Systems, vol. 157, p. 82-97

Belhajjame K., Mejri M. (2023), Online maintenance of evolving knowledge graphs with RDFS-based saturation and why-provenance support, Journal of Web Semantics, vol. 78

Djaffardjy M., Marchment G., Sebe C., Blanchet R., Belhajjame K., Gaignard A., Lemoine F., Cohen-Boulakia S. (2023), Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems, Computational and Structural Biotechnology Journal, vol. 21, p. 2075-2085

Belhajjame K. (2022), On the Anonymization of Workflow Provenance without Compromising the Transparency of Lineage, Journal of Data and Information Quality, vol. 14, n°1, p. 1–27

Belhajjame K., Faci N., Maamar Z., Burégio V., Soares E., Barhamgi M. (2020), On privacy-aware eScience workflows, Computing, vol. 102, n°5, p. 1171–1185

Gaignard A., Skaf-Molli H., Belhajjame K. (2020), Findable and reusable workflow data products: A genomic workflow case study, Semantic Web Journal, vol. 11, n°5, p. 751-763

Belhajjame K., De Castro V., Espinosa-Oviedo J., Musicante M., Souza Da Costa U., Souza Neto P. (2018), πSOD-M: Building SOC Applications in the Presence of Non-Functional Requirements, International Journal of Web and Grid Services, vol. 14, n°4, p. 400-431

Alper P., Belhajjame K., Curcin V., Goble C. (2018), LabelFlow Framework for Annotating Workflow Provenance, Informatics, vol. 5, n°1

Alper P., Belhajjame K., Goble C. (2017), Static analysis of Taverna workflows to predict provenance patterns, Future Generation Computer Systems, vol. 75, p. 310-329

Cohen-Boulakia S., Belhajjame K., Collin O., Chopard J., Froidevaux C., Gaignard A. (2017), Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities, Future Generation Computer Systems, vol. 75, p. 284-298

Barhamgi M., Bandara A., Yu Y., Belhajjame K., Nuseibeh B. (2016), Protecting Privacy in the Cloud: Current Practices, Future Directions, Computer, vol. 49, n°2, p. 68-72

Paton N., Fernandes A., Belhajjame K., Hedeler C. (2015), Enabling community-driven information integration through clustering, Distributed and Parallel Databases, vol. 33, n°1, p. 33-67

van Schouwen R., Thompson M., Zhao J., Belhajjame K., Roos M., Goble C., Bechhofer S., ‘t Hoen P., Klyne G., Corcho O., de Roure D., Garrido J., Verdes-Montenegro L., Cruickshank D., Mina E., Soiland-Reyes S., Wolstencroft K., Dharuri H., Hettne K. (2014), Structuring research methods and data with the research object model: genomics workflows as a case study, Journal of Biomedical Semantics, vol. 5, n°1, p. 41

Garijo D., Alper P., Belhajjame K., Corcho O., Gil Y., Goble C. (2014), Common motifs in scientific workflows: An empirical analysis, Future Generation Computer Systems, vol. 36, p. 338-351

Embury S., Belhajjame K., Paton N. (2014), Verification of Semantic Web Service Annotations Using Ontology-Based Partitioning, IEEE Transactions on Services Computing, vol. 7, p. 515-528

Fernandes A., Paton N., Belhajjame K., Hedeler C., Embury S. (2013), Incrementally improving dataspaces based on user feedback, Information Systems, vol. 38, n°5, p. 656-687

Wolstencroft K., Fisher P., Soiland-Reyes S., Haines R., Williams A., Fellows D., Belhajjame K., Goble C., Sufi S., Balcazar Vargas M., Nieva de la Hidalga A., Hardisty A., Bacall F., Bhagat J., Nenadic A., Dunlop I., Owen S., Withers D. (2013), The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol. 41, n°W1, p. W557-W561

Ciccarese P., Belhajjame K., Clark T., Goble C., Gray A., Soiland-Reyes S. (2013), PAV ontology: provenance, authoring and versioning, Journal of Biomedical Semantics, vol. 4, n°1, p. 37

Book chapters

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2020), Scalable Saturation of Streaming RDF Triples, in Abdelkader Hameurlain, A Min Tjoa, Philippe Lamarre, Karine Zeitouni, Transactions on Large-Scale Data- and Knowledge-Centered Systems XLIV: Special Issue on Data Management – Principles, Technologies, and Applications, Springer, p. 1-40

Hedeler C., Fernandes A., Belhajjame K., Mao L., Guo C., Paton N., Embury S. (2013), A Functional Model for Dataspace Management Systems, in Barbara Catania and Lakhmi C. Jain, Advanced Query Processing, Berlin: Springer, p. 305-341

Conference papers with proceedings

Maligeay N., Bossut N., Belhajjame K. (2024), Why Do Scientific Workflows Still Break?, in Shadi Ibrahim, Suren Byna, Tristan Allard, Jay Lofstead, Amelie Chi Zhou, Tassadit Bouadi, Jalil Boukhobza, Diana Moise, Cédric Tedeschi, SSDBM 2024: 36th International Conference on Scientific and Statistical Database Management, ACM - Association for Computing Machinery, 1-4 p.

Belhajjame K., Barhamgi M., Camacho D. (2024), Exploring Data Preparation Modules by Examples, Springer, 52-69 p.

Manouvrier M., Belhajjame K. (2024), PG-FD: Mapping Functional Dependencies to the Future Property Graph Schema Standard, in Joe Tekli, Johann Gamper, Richard Chbeir, Yannis Manolopoulos, Advances in Databases and Information Systems, Springer Nature Switzerland, 45-59 p.

Belhajjame K. (2023), Efficient Maintenance of Agree-Sets Against Dynamic Datasets, Konstanz, 14-26 p.

Azorin R., Grigori D., Belhajjame K. (2022), A Reproducible Approach for Mining Business Activities from Emails for Process Analytics, Berlin Heidelberg, Springer International Publishing, 77–91 p.

Bethaz P., Belhajjame K., Vargas-Solar G., Cerquitelli T. (2021), DS4ALL: All you need for democratizing data exploration and analysis, Elsevier, Piscataway, NJ, IEEE - Institute of Electrical and Electronics Engineers

Belhajjame K. (2020), Lineage-Preserving Anonymization of the Provenance of Collection-Based Workflows, in Bonifati, Angela; Zhou, Yongluan; Salles, Marcos Antonio Vaz, Konstanz, 229-240 p.

Alili H., Drira R., Belhajjame K., Ben Ghezala H., Grigori D. (2019), A Model-Driven Framework for the Modeling and the Description of Data-as-a-Service to Assist Service Selection and Composition, in Sven Hartmann, Josef Küng, Sharma Chakravarthy, Gabriele Anderst-Kotsis, A Min Tjoa, Ismail Khalil, Database and Expert Systems Applications 30th International Conference, DEXA 2019, Springer, 396-406 p.

Farvardin M., Colazzo D., Belhajjame K., Sartiani C. (2019), Streaming saturation for large RDF graphs with dynamic schema information, in Alvin Cheung, Kim Nguyễn, Proceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages (DBPL 2019), New York, NY, ACM - Association for Computing Machinery, 42-52 p.

Al Jlailaty D., Grigori D., Belhajjame K. (2019), On the elicitation and annotation of business activities based on emails, in Hung, Chih-Cheng; Papadopoulos, George A., New York, NY, ACM - Association for Computing Machinery, 101–103 p.

Aloulen Z., Belhajjame K., Grigori D., Acker R. (2019), A Domain-Independent Ontology for Capturing Scientific Experiments, in Dimitris Kotzinos, Dominique Laurent, Nicolas Spyratos, Yuzuru Tanaka, Rin-ichiro Taniguchi, Berlin Heidelberg, Springer International Publishing, 53-68 p.

Belhajjame K. (2018), On Answering Why-Not Queries Against Scientific Workflow Provenance, Elsevier, Konstanz, 465-468 p.

Jlailaty D., Grigori D., Belhajjame K. (2018), Email Business Activities Extraction and Annotation, in Dimitris Kotzinos, Dominique Laurent, Nicolas Spyratos, Yuzuru Tanaka, Rin-ichiro Taniguchi, Information Search, Integration, and Personalization 12th International Workshop, ISIP 2018, Springer, 69-86 p.

Alili H., Belhajjame K., Drira R., Grigori D., Ben Ghezala H. (2018), Quality Based Data Integration for Enriching User Data Sources in Service Lakes, in 2018 IEEE International Conference on Web Services, ICWS 2018, San Francisco, CA, USA, July 2-7, 2018, Piscataway, NJ, IEEE - Institute of Electrical and Electronics Engineers, 163-170 p.

Jlailaty D., Grigori D., Belhajjame K. (2017), Business Process Instances Discovery from Email Logs, in 2017 IEEE International Conference on Services Computing (SCC), Hawaii, IEEE - Institute of Electrical and Electronics Engineers, 19-26 p.

Jlailaty D., Grigori D., Belhajjame K. (2017), Mining Business Process Activities from Email Logs, in 2017 IEEE International Conference on Cognitive Computing (ICCC), Hawaii, IEEE - Institute of Electrical and Electronics Engineers, 112-119 p.

Jlailaty D., Grigori D., Belhajjame K. (2017), Multi-level clustering for extracting process-related information from email logs, in Saïd Assar, Oscar Pastor, Haralambos Mouratidis, 11th IEEE International Conference on Research Challenges in Information Science (RCIS 2017), Brighton, IEEE - Institute of Electrical and Electronics Engineers, 455-456 p.

Alili H., Belhajjame K., Grigori D., Drira R., Ben Ghezala H. (2017), On Enriching User-Centered Data Integration Schemas in Service Lakes, in Witold Abramowicz, Business Information Systems, 20th International Conference, BIS 2017, Poznan, Poland, June 28–30, 2017, Proceedings, Berlin Heidelberg, Springer, 3-15 p.

Cortés Ríos J., Paton N., Fernandes A., Belhajjame K. (2016), Efficient Feedback Collection for Pay-as-you-go Source Selection, in Peter Baumann, Ioana Manolescu-Goujot, Luca Trani [et al.], Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM '16), New York, ACM - Association for Computing Machinery

Belhajjame K., Bonifati A. (2016), Data Exchange with MapReduce: A First Cut, in Peter Baumann, Ioana Manolescu-Goujot, Luca Trani [et al.], Proceedings of the 28th International Conference on Scientific and Statistical Database Management (SSDBM), New York, ACM - Association for Computing Machinery, 4 p.

Paton N., Belhajjame K., Embury S., Fernandes A., Maskat R. (2016), Pay-as-you-go Data Integration: Experiences and Recurring Themes, in Rūsiņš Mārtiņš Freivalds, Gregor Engels, Barbara Catania, SOFSEM 2016: Theory and Practice of Computer Science 42nd International Conference on Current Trends in Theory and Practice of Computer Science, Harrachov, Czech Republic, January 23-28, 2016, Proceedings, Berlin Heidelberg, Springer, 81-92 p.

Harmassi M., Grigori D., Belhajjame K. (2015), Mining Workflow Repositories for Improving Fragments Reuse, in Jorge Cardoso, Francesco Guerra, Geert-Jan Houben, Alexandre Miguel Pinto, Yannis Velegrakis, Semantic Keyword-based Search on Structured Data Sources. First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers, Springer, 76-87 p.

Alper P., Belhajjame K., Goble C., Karagoz P. (2015), LabelFlow: Exploiting Workflow Provenance to Surface Scientific Data Provenance, in Bertram Ludäscher, Beth Plale, Provenance and Annotation of Data and Processes. 5th International Provenance and Annotation Workshop, IPAW 2014, Cologne, Germany, June 9-13, 2014. Revised Selected Papers, Berlin Heidelberg, Springer, 84-96 p.

Belhajjame K. (2014), Annotating the Behavior of Scientific Modules Using Data Examples: A Practical Approach, in Advances in Database Technology - EDBT 2014, 17th International Conference on Extending Database Technology, Athens, Greece, March 24-28, Proceedings, Konstanz, 726-737 p.

Missier P., Dey S., Belhajjame K., Cuevas-Vicenttin V., Ludäscher B. (2013), D-PROV: extending the PROV provenance model with workflow structure, in TaPP '13, 5th USENIX Workshop on the Theory and Practice of Provenance, Chicago, ACM - Association for Computing Machinery

Holl S., Garijo D., Belhajjame K., Zimmermann O., De Giovanni R., Obst M., Goble C. (2013), On specifying and sharing scientific workflow optimization results using research objects, in WORKS '13 Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science, ACM - Association for Computing Machinery, 28-37 p.

Missier P., Belhajjame K., Cheney J. (2013), The W3C PROV family of specifications for modelling provenance metadata, in EDBT '13 Proceedings of the 16th International Conference on Extending Database Technology, Genoa, ACM - Association for Computing Machinery, 773-776 p.

Alper P., Belhajjame K., Goble C., Karagoz P. (2013), Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations, in 2013 IEEE International Congress on Big Data (BigData Congress), IEEE - Institute of Electrical and Electronics Engineers, 318-325 p.

Alper P., Goble C., Belhajjame K. (2013), On assisting scientific data curation in collection-based dataflows using labels, in WORKS '13 Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science, ACM - Association for Computing Machinery, 7-16 p.

Belhajjame K., Zhao J., Garijo D., Garrido A., Soiland-Reyes S., Alper P., Corcho O. (2013), A workflow PROV-corpus based on Taverna and Wings, in EDBT '13 Proceedings of the Joint EDBT/ICDT 2013 Workshops, ACM - Association for Computing Machinery, 331-332 p.

Alper P., Belhajjame K., Goble C., Karagoz P. (2013), Enhancing and abstracting scientific workflow provenance for data publishing, in EDBT '13 Proceedings of the Joint EDBT/ICDT 2013 Workshops, ACM - Association for Computing Machinery, 313-318 p.

Conference papers without proceedings

Gaignard A., Skaf-Molli H., Belhajjame K. (2021), Découvrabilité et réutilisation de données produites par des workflows : un cas d’usage en génomique, Journées Francophones d'Ingénierie des Connaissances (IC) Plate-Forme Intelligence Artificielle (PFIA'21), Bordeaux, France

Dey S., Belhajjame K., Koop D., Song T., Missier P., Ludäscher B. (2014), UP & DOWN: Improving Provenance Precision by Combining Workflow- and Trace-Level Information, 6th USENIX Workshop on the Theory and Practice of Provenance TaPP 2014, Cologne, Germany

Conference proceedings

Belhajjame K., Gehani A., Alper P. (2018), Provenance and Annotation of Data and Processes, Berlin Heidelberg, Springer International Publishing, 272 p.

Preprints / Working papers

Vargas-Solar G., Negrete-Yankelevich S., Espinosa-Oviedo J., Belhajjame K., Zechinelli-Martini J-L. (2023), MATILDA: Inclusive Data Science Pipelines Design through Computational Creativity, Paris, Preprint Lamsade

Jlailaty D., Grigori D., Belhajjame K. (2016), A framework for mining process models from emails logs, Preprint Lamsade, 18 p.


Belhajjame K., Brambilla M., Grigori D., Mauri A. (2014), Community Profiling for Crowdsourcing Queries, COST European cooperation in science and technology

