internships

Artificial Cohort Generation for Clinical Research at CHUV using Machine Learning

The Lausanne University Hospital (CHUV) is a key player in medical care and biomedical research at both national and international levels. The mission of the Data Science team within CHUV’s Department of Information Systems is to foster the adoption and use of data science and innovative medical informatics tools inside the hospital in order to significantly improve biomedical research and key hospital processes.

To achieve this ambitious goal, one of the group's core tasks is to develop the IT infrastructure that gives physicians and scientists smooth access to the massive volume of clinical data generated every day within the hospital, so that they can use it in their clinical research projects. Our current and future challenges lie at the intersection of big data, medical informatics, data protection and artificial intelligence. Collaboration and data sharing have become core elements of biomedical research. Protective measures like anonymization and synthetic dataset generation are paramount for gaining patients’ trust and bringing together data from healthcare and research to conduct advanced studies.

To strengthen the Data Science group's ability to provide researchers with high-quality data while complying with current data protection guidelines, you will implement a tool that generates artificial data points that preserve the statistical properties of the CHUV datasets. This tool will be built using state-of-the-art machine learning methods. To achieve this goal, you will:

1) explore existing methods and tools to generate synthetic clinical data
2) test the selected approaches on simple datasets
3) explore the extension capabilities of the tool to more complex datasets and data types
4) assess the compliance of the generated data with data privacy guidelines such as HIPAA
5) implement a robust version of the final method and deploy it for potential CHUV end users

Type: Internship, master project
Period: February – September (2020)
Required skills:
• Knowledge of state-of-the-art machine learning methods and statistics
• Strong Linux background
• Good proficiency in Python and ML-related modules (Jupyter, scikit-learn, pandas, NumPy, etc.); knowledge of a lower-level programming language is a plus
• Knowledge of open-source machine learning libraries such as TensorFlow, Keras, or PyTorch
• Interest in all cycles of software development
• Good command of English is required, knowledge of French is a plus
• Team spirit and initiative, ability to work autonomously
• Knowledge of generative adversarial networks (GANs), Docker, HPC, or clinical data processing is a plus
Contact

Jérémie Despraz
Jean Louis Raisaro

Development of a de-identification service for structured and unstructured clinical data

Collaboration and data sharing have become core elements of biomedical research. Protective measures like anonymization are paramount for gaining patients’ trust and for bringing together healthcare and research data for use in further research.

In recent years, several easy-to-use tools have been developed that make data anonymization methods available to a broad range of users. However, robust implementations that can be applied in a production environment within a hospital are lacking. To address this problem, the Department of Information Systems of CHUV has started to develop a service for the de-identification of structured clinical data, based on the ARX anonymization toolbox [1]. ARX is the most widespread state-of-the-art open-source tool for anonymizing structured biomedical data; it uses formal statistical methods and privacy models to de-identify data and to quantify the remaining risk of re-identification. Although significant progress has already been made in developing this de-identification service, substantial work remains before users can take advantage of it. Moreover, the service must be extended so that researchers can also de-identify unstructured medical data, most of which is contained in free-text reports.

Therefore, in this project the selected intern will:

1) become familiar with the current CHUV de-identification service and the ARX anonymization toolbox [1] on which the service is based
2) define the specification of a robust and user-friendly Web front-end that enables data engineers and researchers to use the service in their daily work
3) implement this Web front-end, validate it with end users, and write a detailed user guide
4) design and implement new functionalities to enable knowledge-driven de-identification of clinical narratives

Type: Internship, master project
Period: February – September (2020)
Required skills:
• Excellent programming experience with Java, SQL and Python
• Excellent programming experience with a front-end Web development framework (e.g., React, Angular, Vue.js)
• Good programming experience with SQL and experience with modern database engines
• Interest in all cycles of software development
• Good command of English is required, knowledge of French is a plus
• Team spirit and initiative, ability to work autonomously
• Knowledge of Docker, HPC, or clinical data processing is a plus
• Knowledge of the main privacy frameworks (k-anonymity, differential privacy) is a plus
Contact

Jean Louis Raisaro