By Kenichi Hartman, Ph.D.

In my previous article, I alluded to the fact that the technology Immunai is developing has two basic components: (1) single-cell profiling of immune cells using high-throughput next-generation sequencing (NGS) and (2) application of machine learning (ML; also referred to as “AI”) to the terabytes of data generated by the profiling. As I noted there, they have (to the best of my knowledge) in-licensed rights to IP relating to the single-cell profiling. What about IP for the other half of their tech efforts, applying machine learning to the database of single-cell profiles? Can inventions based on the application of ML to biological data even be patented? The answer is YES.

There are different flavors of ML patents relating to bioinformatics. Broadly, they can be grouped into two basic categories: (1) methods and computational tools for analyzing bioinformatics data and (2) insights gained as a result of data analysis.

I have not found ML patents that are owned by, or have likely been in-licensed by, Immunai thus far (though it’s something I intend to keep an eye on!). However, to flesh out what bioinformatics ML patents can look like, I’d like to share an illustrative example of a patent from Illumina, an industry-leading provider of NGS platforms and reagents. They are the proverbial 800-pound gorilla in this space, with a market cap of about $45 billion. In addition to their sequencing platform tech, they also provide software tools such as BaseSpace and DRAGEN for managing and analyzing the massive amounts of data produced by their sequencers. It should therefore come as no surprise that they are actively developing ML-based tools to analyze large volumes of sequencing data and produce health-related insights.

Illumina has been conducting collaborative research projects with academic research institutions, including Stanford and Harvard, on various aspects of sequence analysis. Two years ago, one of these collaborative projects resulted in a Nature Genetics paper entitled “Predicting the clinical impact of human mutation with deep neural networks” (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6237276/). The paper describes a convolutional neural network (CNN; a type of deep learning network) that was trained to determine the pathogenicity of missense mutations in rare disease patients with 88% accuracy. The CNN was trained using hundreds of thousands of common missense variants, including benign variants derived from population sequencing of six non-human primate species. This study also resulted in a GitHub repository for an open-source version of the trained CNN called “PrimateAI” (https://github.com/Illumina/PrimateAI) and an announcement from Illumina that PrimateAI will be integrated into their BaseSpace software platform.

Not only that, Illumina filed a related group of patent applications (a “patent family”) based on three simultaneously filed International PCT applications that are all directed to the technology behind PrimateAI. The PCT applications were published as WO2019079166A1, WO2019079180A1, and WO2019079182A1, and are directed to embodiments of a deep convolutional neural network-based variant pathogenicity classifier and methods for training such a classifier.

From this large patent family, Illumina was awarded US patent 10,558,915, entitled “Deep learning-based techniques for training deep convolutional neural networks”, in February of this year. Its main claim reads as follows:

1. A method of constructing a variant pathogenicity classifier, the method including:

training a convolutional neural network-based variant pathogenicity classifier, which runs on numerous processors coupled to memory, using as input benign training example pairs and pathogenic training example pairs of reference protein sequences and alternative protein sequences, wherein the alternative protein sequences are generated from benign variants and pathogenic variants; and

wherein the benign variants include common human missense variants and non-human primate missense variants occurring on alternative non-human primate codon sequences that share matching reference codon sequences with humans.

What’s particularly notable about this patent is the brevity of the main claim, from which I’d like to tease out some general take-aways about ML patents:

  1. It is possible to obtain a claim about a neural network that is limited primarily by the nature of the training data. The only limitation in the claim regarding the architecture of the classifier being trained is that it’s a CNN. While the application itself (which is well over 100 pages) goes into great detail about the specifics of the logical structure of the CNN, these details were not relevant to the invention’s patentability (this point is made especially apparent in the prosecution history of the parent application, now US patent 10,423,861). More importantly, claim 1 is not limited by such constraints.
  2. There is no need to develop a fundamentally new type of neural network architecture in order to obtain a patent about neural networks. A new, clever application of an existing class of networks can be enough. Many aspects of the CNN described in the application are “off-the-shelf”, and the basic idea of applying neural networks, even CNNs in particular, to analyze sequence data is not in itself new. No problem. The patentable innovation in this case was in choosing, and effectively providing, a new type of training data to train a functioning CNN-based pathogenicity classifier.
  3. An abundance of detail in the application text, counter-intuitively, can support broader claims. Details that appear in the application to be vital for enabling the claimed invention do not necessarily have to be recited as limitations in the main claim. For example, the application text goes into great detail on how the training of the PrimateAI CNN was achieved. That training was a complicated process that included semi-automatically labeling sequences to generate usable training data from a very large sequencing data set, and making use of two sub-networks that were trained to determine, respectively, secondary structure and solvent accessibility from the amino acid sequence. Nevertheless, the Examiner did not require these aspects to be added to claim 1. This matches my own experience with ML and big-data patents, in which I have found that “over-explaining” the technology in the application and providing one or more very detailed and very particular examples can be a key factor in achieving allowance for claims that appear counter-intuitively broad (provided that the broader embodiments are also explicitly noted in the application text).
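
To make the “limited primarily by the nature of the training data” point more concrete, here is a minimal Python sketch of what assembling the labeled reference/alternative sequence pairs recited in claim 1 might look like. This is a toy illustration under my own assumptions, not Illumina’s implementation and not a restatement of the claim’s legal scope: the `MissenseVariant` class, its field names, the source tags, and the `build_training_pairs` helper are all hypothetical, and the example sequence and variants are made up.

```python
from dataclasses import dataclass

# Hypothetical source tags; the patent does not define identifiers like these.
BENIGN_SOURCES = {"human_common", "primate"}


@dataclass(frozen=True)
class MissenseVariant:
    position: int          # 0-based index into the reference protein sequence
    ref_aa: str            # amino acid in the human reference
    alt_aa: str            # substituted amino acid
    source: str            # "human_common", "primate", or "pathogenic"
    human_ref_codon: str   # reference codon in human
    source_ref_codon: str  # reference codon in the species the variant came from


def apply_variant(reference: str, variant: MissenseVariant) -> str:
    """Return the alternative protein sequence carrying the substitution."""
    if reference[variant.position] != variant.ref_aa:
        raise ValueError("variant does not match the reference sequence")
    return (
        reference[: variant.position]
        + variant.alt_aa
        + reference[variant.position + 1 :]
    )


def build_training_pairs(reference, variants):
    """Build (reference, alternative, label) tuples; 0 = benign, 1 = pathogenic."""
    pairs = []
    for v in variants:
        # Assumption: keep a primate variant only when its reference codon
        # matches the human reference codon, loosely mirroring the claim's
        # "matching reference codon sequences" language.
        if v.source == "primate" and v.source_ref_codon != v.human_ref_codon:
            continue
        label = 0 if v.source in BENIGN_SOURCES else 1
        pairs.append((reference, apply_variant(reference, v), label))
    return pairs


# Made-up example data, for illustration only.
reference = "MKTAYIAK"
variants = [
    MissenseVariant(2, "T", "S", "human_common", "ACC", "ACC"),
    MissenseVariant(4, "Y", "H", "primate", "TAC", "TAC"),
    MissenseVariant(5, "I", "V", "primate", "ATC", "ATT"),  # codon mismatch: skipped
    MissenseVariant(6, "A", "P", "pathogenic", "GCT", "GCT"),
]

pairs = build_training_pairs(reference, variants)
for ref_seq, alt_seq, label in pairs:
    print(ref_seq, alt_seq, label)
```

Note that nothing in this sketch says anything about the classifier’s layers or hyperparameters. The labeled pairs could be fed to any CNN-based classifier, which is the sense in which the claim turns on the training data rather than on the network architecture.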

Zooming out now, I’d like to discuss what the information above means for the bioinformatics AI startups that are popping up like mushrooms these days. Given the ever-maturing state of the art, the number of startups already in play, and the substantial investments in AI/ML that incumbents like Illumina are committing to, a generic call to apply AI/ML to bioinformatics is not a sufficient value proposition on its own to justify creating and funding a new startup. The good news is that relatively broad patents that are directed to clever applications of ML to sequencing and other large-scale biological data, and that are not overly limited to particular implementations, can be obtained (provided the idea is novel and non-obvious, of course!). The flip side is that startups should start thinking very early on about what products they eventually hope to market and what moats (be they patents, proprietary datasets, or trade secrets and know-how) they can build to either prevent an incumbent from replicating the product in its existing platforms (say, Illumina’s BaseSpace) or induce the incumbent to pay a good price for an acquisition.

*This article is NOT meant to be legal advice! Whether or not a particular invention can be patented, and what degree of breadth can be achieved, is dependent on many factors that cannot be fully described in any one article, or even a series of articles. Anyone interested in pursuing a patent should consult with a qualified patent agent or attorney.
