The Center for Biomedical Informatics & Biostatistics (CB2) seeks a Data Scientist III to support the research enterprise and mission with informatics, tools, clinical data sources, and shared human and technical resources. This position will collaborate with researchers to gather, transform, and analyze data from various databases and sources; perform programming; evaluate new platforms, techniques, and scientific programs; develop reports and visual representations; and develop and maintain databases for clinical research. The incumbent will need to understand clinical and biomedical research domains in order to support and serve the users of research studies and projects across campus, as well as university-wide initiatives.
Duties & Responsibilities
- Work with researchers to determine the best data and informatics solutions and designs for clinical research, and implement the resulting data and informatics solutions.
- Build, deploy, and manage research data pipelines, including Extract, Transform, and Load (ETL) pipelines, to migrate data from source systems to the data warehouse/repository for biomedical research datasets and to support researchers with limited/identified datasets from both internal and external sources.
- Provide Honest Broker services.
- Design, modify, and/or adapt one-of-a-kind data systems in support of biomedical research or instruction.
- Create technical and user documentation for all systems.
- Create data visualizations and dashboards using PowerBI, Python, and R.
- Provide custom API support to projects.
- Assist with server management duties to ensure backups and continuity of services.
- Operate and conduct performance checks on highly complex research projects.
- Develop and integrate instruments and forms within data collection platforms (e.g., REDCap projects) to collect, manage, and distribute data for various research studies.
Knowledge, Skills, and Abilities
- Proficiency in Python, R, PHP, and/or Go. Python is required.
- Proficient in SQL and database administration for at least one major RDBMS platform (MySQL, PostgreSQL, Oracle, MS SQL); MySQL/MariaDB preferred.
- Proficient in creating and accessing APIs using either Python/FastAPI or R/Plumber.
- Proficient in the use of version control systems such as Git.
- Strong computer skills, including experience with database management systems (MySQL, PostgreSQL, Oracle, MS SQL, etc.).
- Ability to gather, listen to, and respond to feedback to ensure the data system meets stakeholder needs.
- Strong organizational skills and the ability to self-direct and meet project deadlines.
- Ability to learn new skills quickly with little to no guidance.
- Strong verbal and written communication skills.
Minimum Qualifications
- Master’s degree in data science, computer science, informatics, or a closely related field is required.
- Minimum of 5 years of relevant work experience, or an equivalent combination of education and work experience.
Preferred Qualifications
- Experience with biomedical data and ontologies.
- Experience with HIPAA and clinical research data privacy and security.
- Experience with Microsoft Server and Linux environments.
- Experience with data visualization using PowerBI, Python, or R.