Computational biology is hard.
Computational tools evolve, data becomes more abundant, more people get into the field, and the competition gets fiercer every day. However, none of these is the major challenge of computational biology.
It’s time.
You can only know so much, but in reality the information out there is seemingly infinite. The simple tasks are the actual bottlenecks in publishing good papers. Ask any PhD student:
What did you spend more time on?
A: Actual experiment and discovery process
B: Editorial process and creating nice figures
You will truly be surprised!
—
But that’s how it actually works. As a PhD student, you are in training to become a scientist. In science, there are certain ways to do things. You can’t scribble a few sentences on a piece of paper and expect them to settle the matter. It has to be precise. It needs to be checked by others and validated, to make sure that what you are presenting to the scientific community (and eventually to everyone) is actually true.
This process requires you to be trained in a certain way. YOU need to understand the necessities and YOU are responsible for making sure that everything is set up correctly. That brings me back to my title…
My job as a computational biologist is analyzing big complex datasets, inferring valuable information and presenting my findings. The problem is: I can’t do this alone, and neither can you (the reader).
This requires collaboration with other experts, in my case with doctors and hematologists, as I am analyzing datasets on hematological malignancies.
Okay cool bro, but so what?
So… the topics most PhD candidates cover are very niche, with only a handful of experts around the world. And one can only become an expert by going through this process, which makes it impossible for any of us to be an expert at everything.
You know what is coming in the next sentence… yes AI.
Recently, I wrote a pipeline where a language model annotates cell types in single-cell data. It is fairly simple. No deep learning, no hardcore machine learning, no fancy methods. Pure, hard-coded automation, and it kinda works.
It is not perfect, but it CAN be perfect.
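To make the idea concrete, here is a minimal sketch of what such hard-coded automation could look like. It is not my actual pipeline: the label list, the prompt wording, and every function name here are assumptions for illustration, and `fake_model` is a stand-in so the example runs without any real language model.

```python
from typing import Callable, Dict, List

# Hypothetical label set; in practice the experts would define this.
ALLOWED_LABELS = ["T cell", "B cell", "Monocyte", "Unknown"]


def build_prompt(cluster_id: str, marker_genes: List[str]) -> str:
    """Turn one cluster's top marker genes into a plain-text question."""
    return (
        f"Cluster {cluster_id} has these top marker genes: "
        f"{', '.join(marker_genes)}. "
        f"Answer with exactly one of: {', '.join(ALLOWED_LABELS)}."
    )


def parse_label(reply: str) -> str:
    """Map a free-text reply onto an allowed label, else 'Unknown'."""
    lowered = reply.lower()
    for label in ALLOWED_LABELS:
        if label != "Unknown" and label.lower() in lowered:
            return label
    return "Unknown"


def annotate_clusters(
    markers: Dict[str, List[str]],
    ask_model: Callable[[str], str],
) -> Dict[str, str]:
    """Ask the model about each cluster and keep only validated labels."""
    return {
        cluster_id: parse_label(ask_model(build_prompt(cluster_id, genes)))
        for cluster_id, genes in markers.items()
    }


def fake_model(prompt: str) -> str:
    """Offline stand-in for a real language model call (assumption)."""
    if "CD3E" in prompt:
        return "This looks like a T cell population."
    if "MS4A1" in prompt:
        return "Most likely a B cell cluster."
    return "I am not sure."


if __name__ == "__main__":
    toy_markers = {"0": ["CD3E", "CD3D"], "1": ["MS4A1", "CD79A"]}
    print(annotate_clusters(toy_markers, fake_model))
```

The point of `parse_label` is that the model never gets the last word: anything outside the allowed list falls back to "Unknown" and goes to a human.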
—
“My job as a computational biologist is analyzing big complex datasets, inferring valuable information and presenting my findings.”
Hmm. (intrusive thoughts emerge here)
Can I automate ME?
Am I really the bottleneck? F*ck.
Okay, let’s just imagine a real life scenario.
Let’s say we have a new single-cell dataset produced by a wet-lab group, and they need an expert to analyze this dataset. Here is a realistic breakdown of these types of analysis:
1. You start with the quality control, and possible integration.
2. Clustering, differential and enrichment analysis across clusters.
3. Additional analysis? Show genes in plots.
4. You need experts to see the results, probably then cell type annotation.
5. Experts don’t like your annotations 😦
6. Go back to (3) till experts are satisfied.
7. Move on with analysis across conditions.
8. Make pretty figures ❤
9. Confirm and discuss findings with experts.
10. Experts don’t like your figures 😦
11. Go back to (8) till experts are satisfied.
12. Wrap up your findings, and write everything down.
13. Send it to experts.
14. Experts don’t like your writing 😦
15. Question your life choices.
16. Go back to (12) till experts are satisfied.
17. Send it to other experts.
18. Other experts are not happy with your annotations, figures, and writing.
19. AaAaAAAa!!!111!
20. Go back to (3) till all experts you know are satisfied.
21. Send it to other experts that you don’t know so you can go back to (1).
Phew. Yes. This is more or less how a single-cell dataset gets properly analyzed.
As I said earlier, this takes time.
Depending on how busy these experts are: a couple of months to a couple of years.
But, on more realistic terms, there are literally NO steps here that one can’t automate with LLM agents.
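As a rough sketch of why the breakdown above looks automatable: it is mostly a loop of "produce something, get feedback, redo it". The toy code below simplifies it to three stages, each repeated until a reviewer returns no feedback. All names (`run_with_review`, `toy_produce`, `toy_review`) are hypothetical, and the producer and reviewer are stubs standing in for an agent or a human expert.

```python
from typing import Callable, Dict, List

# Simplified stages from the breakdown: annotate, make figures, write up.
STAGES = ["annotation", "figures", "writing"]


def run_with_review(
    stages: List[str],
    produce: Callable[[str, str], str],
    review: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Dict[str, int]:
    """Run stages in order, redoing each until the reviewer is satisfied.

    `produce(stage, feedback)` returns a draft; `review(stage, draft)`
    returns feedback text, or an empty string when the draft is accepted.
    Returns how many rounds each stage needed.
    """
    rounds: Dict[str, int] = {}
    for stage in stages:
        feedback = ""
        for attempt in range(1, max_rounds + 1):
            draft = produce(stage, feedback)
            feedback = review(stage, draft)
            if feedback == "":
                rounds[stage] = attempt
                break
        else:
            # Never accepted: stop instead of looping forever.
            raise RuntimeError(f"{stage} not accepted after {max_rounds} rounds")
    return rounds


def toy_produce(stage: str, feedback: str) -> str:
    """Stub producer: revises the draft only when it received feedback."""
    return f"{stage} draft" + (" (revised)" if feedback else "")


def toy_review(stage: str, draft: str) -> str:
    """Stub reviewer: rejects every first draft, accepts revised ones."""
    return "" if "revised" in draft else "please revise"


if __name__ == "__main__":
    print(run_with_review(STAGES, toy_produce, toy_review))
```

The real process also jumps back to earlier stages when later reviewers complain, which this sketch leaves out. The `max_rounds` cap is there because an unattended loop needs some point at which it gives up and asks a person.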

Last Remarks
Am I really the bottleneck? No, not yet.
But it is definitely not crazy to expect a future where scientific work is rapidly accelerated as analysis becomes easier. And personally, I would like to be a builder of this future.
