Blog

• Large Language Models in Healthcare: The front line perspective

    [Headline image (Source)]

Headlines like this have become all too common in the healthcare space over the last couple of years.

    The fear is real. I understand it. So let’s delve into what’s happening.


    First, what are LLMs?

Large Language Models (LLMs) are Artificial Intelligence systems that have learned to understand and generate human language by ingesting huge amounts of text—like books, articles, and websites. Think of them as very smart “text predictors” that use statistics to guess the next word in a sentence based on what came before. By practicing this guessing game billions of times, they get really good at producing sentences that make sense.

These models are built on a special kind of Artificial Intelligence architecture called the transformer, which helps them understand context – how words connect to each other across long sentences or paragraphs. This means they can answer questions, summarize information, or even write new text that sounds natural.

In healthcare and computational biology, LLMs can help scientists and doctors by quickly reading and summarizing tons of medical papers or patient records, assisting in writing reports, and taking on other tedious tasks.

They do not “think” like people; they are essentially pattern-seeking machines that use the patterns they learned during training to generate useful language. Sometimes their answers can surprise people by being very creative or very wrong, so humans ALWAYS need to check their work.


But can they be useful? Let’s explore some uses of LLMs in medicine, along with the benefits and drawbacks of that use.

Uses of LLMs in Healthcare

    • Clinical Decision Support: LLMs like GPT-3.5 and GPT-4 analyze electronic health records and medical imaging results to assist clinicians with diagnosis and treatment planning. For example, LLMs have been used as medical chatbots across many specialties, helping interpret patient data and medical literature. (Source)
    • Medical Education and Research: LLMs facilitate rapid literature review, summarize complex biomedical information, and aid in clinical trial design and patient recruitment, supporting research workflows in computational biology and medicine. (Source)
    • Patient Communication: LLMs generate personalized, understandable explanations for patients, convert complex medical jargon into plain language, and provide tailored health guidance to empower patients and caregivers. (Source)
    • Health Promotion and Disease Prevention: By integrating patient data, LLMs identify high-risk groups and recommend preventive strategies, fostering proactive medical care. (Source)
    • Clinical Workflow Automation: Tasks such as summarizing discharge instructions and generating clinical documentation have been streamlined with LLM support. (Source)
    The current distribution of projects applying LLMs in a healthcare setting (Source)

Let’s talk first about the benefits. I understand the fears, but to address the use of these models properly, we also need to understand what they offer:

    • Efficiency and Scalability: LLMs can process massive datasets quickly, enabling faster clinical decision-making and literature synthesis.
• Personalization: In the future, models may be able to provide individualized patient advice based on medical history and current guidelines, enhancing treatment adherence.
    • Broadened Access: LLM-driven chatbots can become educational tools to increase healthcare accessibility, especially in remote or underserved areas.
• Advanced Research Tools: They provide massive support in literature and database tasks such as drug discovery and gene function prediction.

    Finally, let’s discuss the dangers:

• Diagnostic Limitations: LLMs have severe limitations and are prone to hallucinations. Despite successes on exams like the USMLE, they can lack clinical context and reasoning, leading to errors in real-world settings.
• Lack of Transparency: Proprietary models often do not disclose their training data, impeding trust and validation by clinicians. This is why public research is paramount as a counterweight to proprietary models.
• Non-reproducibility: Model output may vary across runs due to inherent stochasticity, which presents clinical risk, especially for minority groups that are underrepresented in the data.
• Bias and Safety Concerns: Bias in the training data can lead to inaccurate or harmful recommendations, and hallucinations (fabricated content) remain a frequent issue. Minorities tend to see the biases they already face reflected in an LLM’s output.
    • Technical and Ethical Issues: Data privacy, the need for human oversight, and evolving regulatory frameworks pose challenges for widespread adoption.

    LLMs represent a transformative, powerful tool that can augment healthcare professionals and researchers, but their effective and safe use depends on awareness of their strengths and limitations. Human expertise remains essential to interpret, verify, and ethically integrate these AI systems at the frontlines of medicine.

• 🧬 Reproducible Bioinformatics/Computational Biology: How to Structure, Document, and Share Your Work

    Boring, you think?

Until you face the daunting task of dealing with a project without documentation or properly commented code, and then it’s full-blown panic 😨…

So hear me out: I’ve done my fair share of computational projects in biology and medicine, so let me give you some basics of good project management. Let’s go!

• Properly Structure the Project – A good structure makes all the difference down the line, and if you find a consistent one across your projects, it will be easy to know where each file lives and in which folder. Here’s my typical one (with a small bash sketch to bootstrap it after the list):
• /Data – here I often create two subfolders: /Data/rawdata and /Data/prepareddata. All the data of the project goes in this folder. If the project demands data preparation, creating the aforementioned subfolders lets you keep an untouched copy of the raw data while you fine-tune the data preparation pipeline over and over again.
• /Scripts or /Functions – all functions and reusable scripts go in here.
• /Results or /Outputs – to save the results of your code.
• /Docs – to save documentation on procedures and the project.
• /Logs – when running tools such as multiple sequence aligners, logs are important for debugging errors; save them in their own folder.
• Outside of all these folders should sit your main or master script, which ties the whole pipeline together, and your README file with all the basic instructions needed to understand and run the project.
Example folder structure for organizing a bioinformatics project with data, scripts, results, and documentation.
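If you want to bootstrap this layout in one go, here’s a minimal bash sketch; the project name my_project and the script name main.sh are placeholders, not a convention you have to follow:

# Create the folder structure described above.
# "my_project" and "main.sh" are hypothetical names - use your own.
mkdir -p my_project/{Data/{rawdata,prepareddata},Scripts,Results,Docs,Logs}

# Top-level files: the master script and the README.
touch my_project/main.sh my_project/README.md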
    • Use Version Control Tools – Keeping track of your code and analysis steps is crucial in any bioinformatics/computational biology project, where pipelines evolve and mistakes happen. Version control systems like Git help you record changes, backtrack when something breaks, and collaborate with others without overwriting each other’s work. Start by initializing a Git repository in your project folder, commit changes regularly with clear messages, and consider hosting your repository on platforms like GitHub, GitLab, or Bitbucket for easy sharing and backup. Even for solo projects, version control is an invaluable safety net. And it helps you build a profile of projects and pipelines you’ve built along the way!
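As a minimal sketch of that workflow (assuming Git is installed; the remote URL and branch name are placeholders for your own setup):

# Turn the project folder into a Git repository.
git init

# Stage everything and record a first snapshot with a clear message.
git add .
git commit -m "Initial project structure and README"

# Optional: connect a hosting platform for backup and sharing.
# The URL is hypothetical - point it at your own repository.
git remote add origin https://github.com/<user>/my_project.git
git push -u origin main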
    • Keep a Clear README File:
      A well-written README is the roadmap to your project — it tells others (and your future self) exactly what your project is about and how to navigate it. At a minimum, your README should explain the purpose of the project, describe the folder structure, list all software and dependencies (including version numbers), and provide simple instructions on how to run your analysis step-by-step. This single file can save countless hours of confusion, especially when you revisit the project months later or share it with collaborators. Treat it as your project’s user manual.
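If it helps, here’s a bare-bones skeleton you could adapt, written as a bash here-doc so you can drop it straight into a new project; every section below is a suggestion, not a requirement:

# Write a minimal README skeleton; all contents are illustrative.
cat > README.md <<'EOF'
# <Project Name>

## Purpose
One or two sentences on what this project does and why.

## Folder structure
Data/, Scripts/, Results/, Docs/, Logs/ (see Docs/ for details).

## Dependencies
List software and versions, e.g. <tool_name> v1.2.3.

## How to run
Step-by-step commands, starting from the raw data in Data/rawdata.
EOF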
• Make a Concise Header for Each File – A simple header at the top of each file is crucial to keep it easy to understand. Below is an example you can use.
    #***********************************************
    # Project : <Project Name>
    #
    # Script name : <name of the file> 
    #
    # Author: <name of the author>
    #
    # Date created: YYYYMMDD
    #
# Summary: <short summary of what's in the file>
    #
    #
    # Revision History: <add updates here>
    #
    # Date        Author      Num    Summary
    # YYYYMMDD    <author>      1      Created
    # 
    #**********************************************

• Use Clear and Consistent Naming Conventions – Good file names make it easy to understand your data at a glance and prevent mix-ups as your project grows. Use descriptive but concise names — for example, sample1_trimmed.fastq is far clearer than file1.fastq. Including version numbers or dates can help track different stages of your analysis. Avoid spaces in file names; instead, use underscores or hyphens to keep everything script-friendly and easy to read. A consistent naming system saves time and confusion for both you and anyone else working with your data.
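As one concrete bash sketch, here’s a way to retrofit script-friendly names onto existing files by swapping spaces for underscores (test on a copy first; the loop assumes you run it inside the folder in question):

# Replace spaces with underscores in every matching file name;
# ${f// /_} is bash's global substitution on the variable f.
for f in *\ *; do
    [ -e "$f" ] || continue   # skip if no file names contain spaces
    mv -- "$f" "${f// /_}"
done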

• Archive and Share Data Responsibly – Large bioinformatics datasets can quickly eat up storage and become difficult to manage. Compress intermediate or final results using tools like tar or gzip to save space and make transfers more efficient. When sharing data with collaborators or moving files between servers, create checksum files (such as with md5sum) to ensure that files remain intact and uncorrupted during transfer. For long-term storage and to make your work accessible to the scientific community, consider depositing data in trusted repositories like NCBI SRA, ENA, or Zenodo. Proper archiving and sharing help maintain data integrity and support open, reproducible science.
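A minimal sketch of that archive-and-verify routine (the archive name is a placeholder):

# Compress the results folder into a single gzip'd archive.
tar -czf results_v1.tar.gz Results/

# Record a checksum so anyone receiving the file can verify it...
md5sum results_v1.tar.gz > results_v1.tar.gz.md5

# ...and on the receiving side, after the transfer:
md5sum -c results_v1.tar.gz.md5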

• Document Each Step of Your Workflow – Don’t rely on memory — write things down as you go! For every step in your pipeline, keep track of what you did, why you did it, and which tool and parameters you used. You can document this in a simple Markdown file, a lab notebook, or directly as comments in your scripts. It’s also a good habit to save the terminal output or log files for each run; this helps with debugging when something goes wrong and gives you a clear audit trail for your results.
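For instance, one lightweight habit is to redirect each run’s terminal output into the /Logs folder as you go; the aligner and file names below are illustrative only:

# Run an alignment step and keep a timestamped log of its
# progress messages (mafft writes those to stderr).
mafft --auto Data/prepareddata/sequences.fasta \
    > Results/alignment.fasta \
    2> Logs/mafft_$(date +%Y%m%d).log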
• Write a Short Final Report – Once your project is finished (or ready to share), take the time to write a short summary. It doesn’t need to be a full manuscript — just a clear, concise overview of what you did, what you found, and where to find all the relevant scripts, data, and results. This little report can be a lifesaver when you revisit the project months later, or when you want to turn your pipeline into a figure or table for publication.

    ✅ Wrapping Up


    I know it might feel tedious at first, but good project organization and clear documentation are truly your best friends in bioinformatics and computational biology. They save you from countless headaches, prevent lost data and forgotten parameters, and make it so much easier to share your work with collaborators or your future self.

    So next time you start a new project, invest a bit of time up front to structure it well, use version control, write helpful READMEs, keep your code tidy, and back up your data properly. Your future self will thank you — trust me!

    💡 What about you?


    I’d love to hear your favorite tricks or horror stories about messy projects. Drop a comment below and let’s help each other stay organized!

• What is this Website About?

    Computational Biology is a fascinating area, one that has been steadily developing since the 1970s and the introduction of the Needleman-Wunsch Algorithm.

    Visual Representation of the Needleman-Wunsch Algorithm

However, it has always been hard to find a source that helps compile knowledge in this area. Thus the birth of this site and its companion personal blog, https://insilicobiology.blog/.

On this website you will find curated articles on this area I love and cherish, as well as a growing collection of resources (curated or created by me) so you can learn more about it. They are organized into Tech Resources, Computational Biology Resources, and Precision Medicine Resources.

On https://insilicobiology.blog/ you will find more free-form discussion of the interconnections between computational biology and medical research, and more personal prose on what it is like to be a computational biologist in the 21st century.

Hope you enjoy these bytes of the natural world!

