Clinical Data Scientist / Clinical Informaticist
New York or Remote / Full-Time / Early to Mid-Career / Early to Mid-2023 Start
Resumes, questions, and requests for assistance or an accommodation due to a disability may be directed to Dandelion’s Director of Clinical Informatics, firstname.lastname@example.org.
References are required.
Salary Range: $150K - $175K + equity
Dandelion Health was founded in 2020 by experts in health tech, hospital systems, academia, and clinical AI. We are building the world’s largest AI training and validation platform. Today, we pride ourselves on our ability to make data access as easy as possible for AI developers, while raising the bar for patient safety and data quality. Tomorrow, we will be the place where any AI developer can go to build a responsible clinical AI product. Our culture is all about learning from data and improving, so we can help our clients improve health through AI.
Our Data and Cloud Stack
We partner with health systems to safely and ethically make their de-identified patient data available to AI developers. Currently, our data comes from Sharp HealthCare and Sanford Health in the United States, with three additional U.S. health systems joining soon.
We have clinical data dating back to July 1, 2016. This data represents over 10 million patients and includes, but is not limited to:
- Structured data (e.g., 100% of the EMR, including some claims)
- Unstructured text (e.g., clinical notes, radiology reports)
- Images and video (e.g., PACS, pathology)
- Waveforms and streaming inpatient data (e.g., ECGs, ICU and bed monitors)
We are an AWS and Python shop. Our clients, partners, and other internal teams may work with other languages and platforms from time to time (e.g., R, MATLAB, GCP, Azure).
You are a residency-trained physician who also knows your way around electronic health record data and can code. Your primary responsibility is to create AI-ready datasets for our clients in our AWS environment. This will involve working across the organization with both engineers and clinical scientists to build our own clean, reproducible datasets and code base. Your team’s ultimate goal is to deliver the highest-quality data products possible to our clients, who are AI developers building products that improve patient health.
You bring your clinical and data skill sets together in the following ways:
- Interacting with clients, from AI startups to established life sciences companies, to understand their clinical data needs and translate semantic concepts (e.g., “fatty liver disease”) into queries that can be executed against our data (e.g., ICD code ranges, regular expression matching on radiology reports)
- Querying complex source systems across a range of health data types (e.g., EMRs, ECG data, DICOM data) to create high-quality datasets that can be used to train and validate AI algorithms. You have exceptional attention to detail but also know how to keep the big picture in mind.
- Working with non-medically-trained data scientists as they help create and validate these datasets.
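To give a flavor of the concept-to-query translation described in the first item above, here is a minimal Python sketch for “fatty liver disease.” It assumes diagnosis codes arrive as dotted ICD-10-CM strings and radiology reports as plain text. The code prefixes, phrase list, negation rule, and function names are hypothetical placeholders chosen for illustration; they are not a validated phenotype or Dandelion’s actual method.

```python
import re

# Hypothetical concept definition for "fatty liver disease".
# The ICD-10-CM prefixes and phrases below are illustrative assumptions,
# not a clinically validated phenotype. Verify against the current code set.
FATTY_LIVER_ICD10_PREFIXES = (
    "K70.0",   # alcoholic fatty liver
    "K75.81",  # nonalcoholic steatohepatitis (NASH)
    "K76.0",   # fatty (change of) liver, not elsewhere classified
)

# Phrases a radiologist might use; matched case-insensitively on word boundaries.
FATTY_LIVER_PATTERN = re.compile(
    r"\b(?:hepatic steatosis|fatty liver|fatty infiltration of the liver)\b",
    re.IGNORECASE,
)

# Rough negation heuristic: a cue word followed by at most three words,
# separated only by spaces or tabs, right before the mention. Punctuation
# such as a period ends the negation scope.
NEGATION_BEFORE_MENTION = re.compile(
    r"\b(?:no|without|negative for)\b(?:[ \t]+\w+){0,3}[ \t]*$",
    re.IGNORECASE,
)


def has_fatty_liver_code(icd10_codes):
    """True if any dotted ICD-10-CM code starts with one of the prefixes above."""
    return any(
        code.strip().upper().startswith(FATTY_LIVER_ICD10_PREFIXES)
        for code in icd10_codes
    )


def report_mentions_fatty_liver(report_text):
    """True if the report contains at least one non-negated fatty-liver phrase."""
    for match in FATTY_LIVER_PATTERN.finditer(report_text):
        preceding_text = report_text[: match.start()]
        if not NEGATION_BEFORE_MENTION.search(preceding_text):
            return True
    return False


# Example usage on toy inputs.
print(has_fatty_liver_code(["E11.9", "K76.0"]))  # True
print(report_mentions_fatty_liver("No evidence of hepatic steatosis."))  # False
```

A production cohort definition would be reviewed by clinicians and would need more robust negation and uncertainty handling (for example, a NegEx-style approach); the sentence-bounded check here is only a rough heuristic.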
You are not afraid to dig into massive, confusing, disorganized new datasets and get them under control. You are excited to learn new environments, languages, and skills. This is a small, early-stage company with enormous ambitions, and everyone pitches in to do everything.
Your specific responsibility is to supervise the process from customer request to dataset delivery. This will include the following:
- Perform complex extraction, manipulation, and summarization of large volumes of data to create analytical datasets and provide a range of solutions that support customers’ AI activities
- Support the design, testing, validation, analysis, and merging of multimodal data structures from a variety of source systems
- Support all phases of SQL/analytical programming, data management, quality control, and reporting
- Collaborate with customers to define and deliver on data requests with accountability for timely and high-quality delivery
- Develop code to deliver high-quality data products on time to customers
- Create summarized findings and recommendations that are clearly presented and adapted for audiences that have a varying range of technical and clinical experience
- Ensure accuracy, data integrity, and validity of data and analysis in all work
- Summarize the complexity of these data structures and operations into clear explanations and documentation for internal and external audiences
- Present to senior leadership as well as external audiences
Qualifications
- MD/DO and completed residency
- Board certification or board eligibility in Clinical Informatics preferred
- 1+ years of experience querying EHR data (fluency in SQL required)
- Experience with extracting, curating, and analyzing data created within the HIT and healthcare delivery ecosystem (e.g., EMR, claims, registry); this may include knowledge of the roles of data exchange and content standards (e.g., FHIR, CDA, CQL) and clinical terminology standards (e.g., ICD, CPT, LOINC, SNOMED-CT, NDC, RxNorm)
- Awareness and understanding of data privacy, anonymization, data protection, security, data ethics and how to address these in data analyses
- Hands-on technical and project leadership experience
- Strategic and outcomes-oriented mindset that balances depth and breadth to demonstrate value
- Strong technical writing, editing, and communication skills along with a collaborative mindset
- Excellent organizational skills, with the ability to embrace change, manage multiple projects effectively, and consistently plan work to meet deadlines
- Experience working in or with startups is a plus
Technology Experiences and Skills
We don’t expect anyone to have all of the following skills or experiences, but we do seek candidates who are interested in growing their skill sets and working with healthcare data in all its glorious complexity. The Data Team works closely with our Engineering Team to put our work into production and meet client needs.
- Python and/or R
- Git and version control
- Familiarity with writing regular expressions
- Prior experience querying EDWs or databases and creating reports or analytics for healthcare data
- Familiarity with the data aspects of electronic medical records such as Epic, Cerner, and Allscripts
- Prior experience working with insurance claims data
- Familiarity with medical terminologies or controlled vocabularies such as ICD-10, SNOMED-CT, LOINC, CPT/HCPCS, NDC, and RxNorm
- Any medical ontology experience
- Any NLP experience
- Any experience working with DICOM or other imaging modalities
- Experience with OMOP common data model
- Data viz in Python or R
- Automated reporting (e.g., Quarto)
- Familiarity with machine learning concepts
- Experience with AWS
Note that familiarity with machine learning model development and deployment is not required for this position, but familiarity with high level ML concepts is a plus.
Potential Work Examples
- Learning about EHR data models and querying data based on client needs
- Researching client requests and investigating how the needed data is collected, stored, and used by the health system
- Working with analytic and clinical staff to understand how data is stored
- Cleaning and wrangling data
- Normalizing data from across healthcare systems
- Using and deploying our own ML tools
- Automating tasks that can be automated
- Documenting your work for future reuse and sharing knowledge across the company
- Refining and iterating work based on internal and client feedback
Nature of our work
Our work is fast-paced and iterative. We are growing, and we want to support our team members to grow in their skills as well. We are building a team that approaches problems with a diversity of perspectives, values experimentation, and refines its approach based on that experimentation. We work with the full spectrum of healthcare data, including tabular data, video, images, and waveforms. If a health system collects it, we might work with it!
If this role looks like even a partial fit, please reach out. We would love to share more about our work so you can decide whether it is a good fit for you.
There is occasional travel for in-person company working days on roughly a quarterly basis.
Benefits
- Professional development days to build your skills
- Collegial work environment
- An academic bent toward inquiry and problem solving, combined with start-up speed and flexibility
- Remote work and flexible hours (you need to be available for meetings, which we try to keep to a healthy minimum)
- A good balance of focus time for projects and easy access to team members for discussing issues and collaborating
- Dandelion is a mission-driven company focused on improving patient care!