Healthcare
Medical Knowledge
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation
Developments: The authors develop a powerful graph-RAG based system called MedGraphRAG for medical discussions, reaching SOTA results. To do so, they most notably introduce 'Triple Graph Construction':
- Dynamically selected chunk partitioning.
- Entity extraction with structured output: name, type, and a short context on how the entity matters (see the sketch after this list).
- Triple link construction.
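A minimal sketch of what the structured entity-extraction step could look like. The prompt wording, the JSON schema, and the `llm` callable are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of structured entity extraction per chunk.
import json
from dataclasses import dataclass

@dataclass
class Entity:
    name: str      # entity surface form, e.g. "metformin"
    type: str      # category, e.g. "drug", "disease", "procedure"
    context: str   # short note on why the entity matters in this chunk

EXTRACTION_PROMPT = """Extract the medical entities from the text below.
Return a JSON list of objects with fields "name", "type", and "context"
(one sentence on how the entity matters in this passage).

TEXT:
{chunk}
"""

def extract_entities(chunk: str, llm) -> list[Entity]:
    """`llm` is any callable that takes a prompt string and returns text."""
    raw = llm(EXTRACTION_PROMPT.format(chunk=chunk))
    return [Entity(**item) for item in json.loads(raw)]
```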
Graph Construction ('Triple Link'): They create a Triple Link Repository Graph (RepoGraph) consisting of three graph layers:
- Medical papers/books
- Medical dictionaries
- UMLS, made of well-defined medical vocabularies and their relationships.
They link the extracted entities to the entities from medical books/papers (denoted E2) based on relevance, which they compute via cosine similarity. They then create a relation as a concise phrase describing the link between the entities, together with the associated references.
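A hedged sketch of that linking step: embed entities from both layers and keep the most similar cross-layer match when cosine similarity clears a threshold. The embedding function and the threshold value are assumptions for illustration.

```python
# Illustrative cross-layer entity linking by cosine similarity of embeddings.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_entities(extracted, reference, embed, threshold=0.8):
    """Return (extracted_entity, best_reference_entity) pairs above threshold.

    `embed` maps an entity name to a vector (any sentence-embedding model).
    """
    ref_vecs = {r: embed(r) for r in reference}
    links = []
    for e in extracted:
        v = embed(e)
        best = max(ref_vecs, key=lambda r: cosine(v, ref_vecs[r]))
        if cosine(v, ref_vecs[best]) >= threshold:
            links.append((e, best))
    return links
```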
Next, they tag the graphs, starting with some pre-defined tags, and iteratively generate further tag summaries for closely-related graphs. These tags span multiple categories. Language models then tag the system using a prompt akin to the following:
Generate a structured summary from the provided medical content,
strictly adhering to the following categories...
{Tag Name: Description of the tag}...
They then iterate using hierarchical clustering based on tag similarity to group the graphs and generate synthesized tag summaries, with each graph having its own groups. In each iteration they take the top 20% most similar pairs and merge them if the pairwise similarity crosses a specified threshold.
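A rough sketch of one such merge round: score pairwise tag-summary similarity, look only at the top 20% most similar pairs, and merge those above a threshold. The `similarity` function and the 0.9 threshold are assumptions, and plain concatenation stands in for the LLM-synthesized tag summary.

```python
# Hedged sketch of one round of tag-similarity-based graph merging.
def merge_round(graph_tags: dict[str, str], similarity, threshold=0.9):
    """graph_tags maps a graph id to its tag summary; returns the updated map."""
    ids = list(graph_tags)
    pairs = [(similarity(graph_tags[a], graph_tags[b]), a, b)
             for i, a in enumerate(ids) for b in ids[i + 1:]]
    pairs.sort(reverse=True)
    top = pairs[: max(1, len(pairs) // 5)]   # top 20% most similar pairs
    merged = dict(graph_tags)
    for score, a, b in top:
        if score >= threshold and a in merged and b in merged:
            combined = merged.pop(a) + "\n" + merged.pop(b)
            merged[a] = combined   # an LLM would synthesize a new summary here
    return merged
```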
Retrieval: To perform retrieval, they generate tags based on the query Q, find graphs with similar tags, and construct a prompt that includes the concatenated entity names and relationships:
Given QUESTION: {Q}. GRAPH: {e_i[na] + R_{e_i,e_j} + e_j[na], ...}.
Answer the user question: QUESTION using the graph: GRAPH...
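A sketch of how such a prompt could be assembled from (entity, relation, entity) triples of the best-matching graph. The tag-matching step is omitted, and the way triples are serialized is an assumption based on the template above (with e_i[na] read as the entity name and R as the relation phrase).

```python
# Illustrative assembly of the retrieval prompt from graph triples.
def build_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    graph = ", ".join(f"{e_i} {rel} {e_j}" for e_i, rel, e_j in triples)
    return (f"Given QUESTION: {question}. GRAPH: {{{graph}}}. "
            f"Answer the user question: QUESTION using the graph: GRAPH.")

# Example:
# build_prompt("What does metformin treat?",
#              [("metformin", "first-line therapy for", "type 2 diabetes")])
```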
Patient Care
Doctor GPT implements advanced LLM prompting for organizing, indexing, and discussing PDFs, and does so without using any opinionated prompt-processing frameworks.
Disease prediction
Genome-wide prediction of disease variant effects with a deep protein language model 'A model that predicts deleterious genetic variants'
Here we implemented a workflow generalizing ESM1b to protein sequences of any length and used it to predict all ~450 million possible missense variant effects across all 42,336 protein isoforms in the human genome.
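As a rough illustration of how a masked protein language model can score a missense variant: mask the mutated position and compare the log-probabilities of the alternate versus the reference amino acid. The `masked_log_probs` helper below is a placeholder for whatever ESM1b-style model is used, not the paper's code.

```python
# Hedged sketch of variant-effect scoring with a masked protein language model.
# `masked_log_probs(sequence, pos)` is assumed to return a dict mapping each
# amino acid to its log-probability at the masked position `pos`.
def missense_effect_score(sequence: str, pos: int, alt_aa: str,
                          masked_log_probs) -> float:
    """Log-likelihood ratio of the variant vs. the reference residue.

    Strongly negative scores suggest the substitution is poorly tolerated.
    """
    ref_aa = sequence[pos]
    log_probs = masked_log_probs(sequence, pos)
    return log_probs[alt_aa] - log_probs[ref_aa]
```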
The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. A quality set of JAX-enabled transformer models for downstream genomics tasks.
They use 6-mer tokenization and embeddings. Non-commercial license. GitHub
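A minimal sketch of 6-mer tokenization: split a DNA sequence into 6-base tokens that index an embedding table. The non-overlapping split and dropping of leftover bases shorter than 6 are simplifying assumptions for this sketch.

```python
# Illustrative non-overlapping 6-mer tokenizer for DNA sequences.
from itertools import product

KMER_VOCAB = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=6))}

def tokenize_6mer(seq: str, k: int = 6) -> list[int]:
    seq = seq.upper()
    tokens = []
    for i in range(0, len(seq) - len(seq) % k, k):   # leftover bases dropped here
        tokens.append(KMER_VOCAB[seq[i:i + k]])
    return tokens  # token ids then index a learned embedding table

# Example: tokenize_6mer("ACGTACGTACGT") -> two 6-mer token ids
```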
Drug synthesis
Generative AI for designing and validating easily synthesizable and structurally novel antibiotics
The authors demonstrate a generative AI system that helps discover antibiotic candidates, several of which show antibacterial activity.
"""Many generative AI models for drug design are only tested in silico because the molecules they design are synthetically intractable. Without synthesis and wet lab validation, it’s hard to know whether AI-generated molecules actually work as expected.
SyntheMol was built exclusively to design easy-to-synthesize molecules to enable wet lab validation. SyntheMol creates molecules using molecular building blocks and chemical reactions from the Enamine REAL Space of 30 billion molecules, which ensures easy synthesis.
SyntheMol uses a Monte Carlo Tree Search (MCTS) to explore the vast space of easily synthesizable compounds for promising drug candidates. The MCTS is guided by a trained molecular property prediction model such as a graph neural network (GNN).
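A heavily reduced sketch of that idea: search over combinations of building blocks with MCTS, using a trained property model to score complete candidates and guide which branches get explored. The `building_blocks`, `combine`, and `predict_activity` stand-ins are assumptions; the real system searches Enamine REAL reactions rather than simple tuples of blocks.

```python
# Hedged sketch of property-model-guided MCTS over building-block combinations.
import math
from collections import defaultdict

def mcts_search(building_blocks, combine, predict_activity,
                max_blocks=3, n_iter=1000, c=1.4):
    """Return the best-scoring block combination found and its score."""
    N = defaultdict(int)     # visit count per partial molecule (tuple of blocks)
    W = defaultdict(float)   # total predicted activity per partial molecule

    def ucb(parent, child):
        if N[child] == 0:
            return float("inf")          # always try unvisited children first
        return W[child] / N[child] + c * math.sqrt(math.log(N[parent]) / N[child])

    best, best_score = None, float("-inf")
    for _ in range(n_iter):
        state, path = (), [()]
        # Selection: descend by UCB until the molecule has max_blocks blocks.
        while len(state) < max_blocks:
            children = [state + (b,) for b in building_blocks]
            state = max(children, key=lambda ch: ucb(path[-1], ch))
            path.append(state)
        # Evaluation: the property model (e.g. a GNN) scores the full molecule.
        score = predict_activity(combine(state))
        if score > best_score:
            best, best_score = state, score
        # Backpropagation: update statistics along the explored path.
        for s in path:
            N[s] += 1
            W[s] += score
    return best, best_score
```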
We applied SyntheMol to design antibiotics for A. baumannii, a bacterium with few treatments. We screened ~13,500 compounds against A. baumannii and used the results to train our property prediction models. We then ran SyntheMol to design tens of thousands of antibiotic candidates.
We filtered our AI-generated molecules for novelty, predicted efficacy, and diversity, and we worked with Enamine to synthesize 58 compounds. We found that 6 of those 58 compounds (a 10% hit rate) were highly potent against A. baumannii and a range of other bacterial species.
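One way such a novelty and diversity filter is commonly implemented is with molecular fingerprints and Tanimoto similarity; the sketch below uses RDKit Morgan fingerprints with assumed thresholds and is not the paper's filtering code.

```python
# Illustrative novelty/diversity filter using Morgan fingerprints (RDKit).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles: str):
    # Assumes valid SMILES input for simplicity.
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, 2048)

def select_candidates(generated, known_actives, score,
                      novelty_max=0.4, diversity_max=0.6, min_score=0.5):
    """Keep high-scoring molecules dissimilar to known actives and to each other."""
    known_fps = [fingerprint(s) for s in known_actives]
    kept, kept_fps = [], []
    for smi in sorted(generated, key=score, reverse=True):
        if score(smi) < min_score:
            break                         # remaining molecules score even lower
        fp = fingerprint(smi)
        if known_fps and max(DataStructs.TanimotoSimilarity(fp, k) for k in known_fps) > novelty_max:
            continue                      # too close to a known active: not novel
        if kept_fps and max(DataStructs.TanimotoSimilarity(fp, k) for k in kept_fps) > diversity_max:
            continue                      # too close to a kept candidate: not diverse
        kept.append(smi)
        kept_fps.append(fp)
    return kept
```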
SyntheMol can generate molecules that are easy to synthesize and are effective in the wet lab, bridging the gap between AI-based drug design and experimental validation.
SyntheMol’s code is fully open-source at https://github.com/swansonk14/SyntheMol. Data, models, and generated molecules for our antibiotics application are at https://zenodo.org/records/10257839.