Operational tools for data management in biosciences
Institution: ECOLE DU NUMERIQUE
Language: English
Programme(s) in which the course appears:
- Master Data Management in Biosciences [ECTS: 4.00]
Period: S3 (semester 3)
Prerequisites:
- Python programming knowledge (intermediate level).
- Understanding of databases: Familiarity with SQL and basic database management would be helpful.
- Basic biology: Given the bioscience focus, students should have foundational knowledge in life sciences to understand biological datasets and their context.
- Familiarity with command-line interfaces: necessary for Bash scripting and for using tools such as Git.
By the end of the course, students should be able to:
- Understand the principles of data management, including planning, acquisition, processing, and sharing of scientific data.
- Implement basic data pipelines following ETL (Extract-Transform-Load) principles.
- Work with large-scale biological datasets and employ modern software tools to analyze, organize, and document research.
- Collaborate effectively in research using electronic lab notebook (ELN) tools and version control systems to ensure reproducibility and transparency.
- Deploy APIs and containerized applications for sharing and managing data in an open-access framework.
Chapter 1. Introduction to data management
a. What is data management?
b. The importance of data management
c. Data management in science
1. Practical work: Research data management at the ETH
d. The data lifecycle
1. Managing the data
2. Example of a data lifecycle in the biomedical field
e. The main characteristics of well-managed data
1. The FAIR data principles
Chapter 2. Planning for data management
a. Creating a data management plan
b. Data policies (Quick overview, details in course B0908 European environment and policies in life sciences and public health)
c. Case studies
Chapter 3. Data acquisition and pre-processing
a. Raw vs analysed data
b. Bronze, silver, and gold levels of data
c. Data integration and data aggregation
d. The Extract-Transform-Load and Extract-Load-Transform processes
1. Preparing data for analysis
2. Examples of tools used for ETL
e. Practical work: ETL with Python on Cedrus data
1. The JSON data format
2. Manipulating a JSON file using Bash and Python
3. Using SQLAlchemy to manage a SQL database from Python
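The extract, transform and load steps covered in this chapter can be illustrated with a minimal sketch. It assumes a hypothetical JSON export whose records carry `sample_id`, `species` and `concentration` fields (the actual structure of the Cedrus data is not described here), and it uses Python's standard-library `sqlite3` module in place of SQLAlchemy so that the example runs without extra dependencies.

```python
import json
import sqlite3

# Hypothetical raw export: field names and values are illustrative only.
RAW_JSON = """
[
  {"sample_id": "S1", "species": " Homo sapiens ", "concentration": "12.5"},
  {"sample_id": "S2", "species": "mus musculus", "concentration": null},
  {"sample_id": "S3", "species": "Mus musculus", "concentration": "7.0"}
]
"""


def extract(text):
    """Extract: parse the raw JSON text into a list of dicts."""
    return json.loads(text)


def transform(records):
    """Transform: drop incomplete records, normalise labels and types."""
    clean = []
    for rec in records:
        if rec.get("concentration") is None:
            continue  # skip records with a missing measurement (JSON null)
        clean.append((
            rec["sample_id"],
            rec["species"].strip().capitalize(),  # " Homo sapiens " -> "Homo sapiens"
            float(rec["concentration"]),           # "12.5" -> 12.5
        ))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQL table (sqlite3 stands in for SQLAlchemy)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "sample_id TEXT PRIMARY KEY, species TEXT, concentration REAL)"
    )
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
load(transform(extract(RAW_JSON)), conn)
print(conn.execute("SELECT * FROM samples ORDER BY sample_id").fetchall())
# prints [('S1', 'Homo sapiens', 12.5), ('S3', 'Mus musculus', 7.0)]
```

With SQLAlchemy, the load step would typically go through an engine created with `create_engine(...)` rather than a raw `sqlite3` connection, while the extract and transform functions stay unchanged. In an ELT variant, the raw records would be loaded first and the cleaning expressed in SQL inside the database.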
Chapter 4. Data analysis
a. Analysis pipelines
b. Managing code
c. Workflow systems
d. Practical work with Nextflow/Circos
Chapter 5. Organization
a. File organization
b. Naming convention
c. Databases
d. Storage and backups
e. Version control systems (File versioning)
f. Practical work: collaborative Python coding using Git
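A naming convention is easiest to apply consistently when it is generated and checked by code. The sketch below assumes a hypothetical convention of the form `YYYY-MM-DD_project_sample_vNN.ext` (ISO date, lowercase project code, sample ID, two-digit version, extension); it is an illustration only, not a convention prescribed by the course.

```python
import re
from datetime import date

# Hypothetical convention: YYYY-MM-DD_project_sample_vNN.ext
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<sample>[A-Za-z0-9-]+)_"
    r"v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_name(day, project, sample, version, ext):
    """Build a file name that follows the (hypothetical) convention."""
    name = f"{day.isoformat()}_{project.lower()}_{sample}_v{version:02d}.{ext}"
    if not PATTERN.match(name):
        # e.g. an underscore or space in the sample ID would break the convention
        raise ValueError(f"cannot build a valid name from {name!r}")
    return name


def parse_name(name):
    """Return the fields encoded in a conforming name, or None if it does not conform."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


print(make_name(date(2024, 3, 5), "RNAseq", "S01", 2, "csv"))
# prints 2024-03-05_rnaseq_S01_v02.csv
print(parse_name("final data (copy).csv"))
# prints None
```

Putting the ISO 8601 date first makes an alphabetical listing also chronological. A `vNN` suffix is only a fallback for files kept outside a version control system such as Git.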
Chapter 6. Managing sensitive data
a. Types of sensitive data
b. Keeping data secure (Quick overview, details in course B0906 Mechanisms of data protection)
c. Anonymizing data (Quick overview, details in course B0906 Mechanisms of data protection)
1. Practical work: data a