We proposed an Open Knowledge Network for NASA Space Biology that uses RAG, embeddings, knowledge graphs, and human validation to answer scientific questions with evidence instead of hallucinated confidence.
Dreamers designed an NSF Proto-OKN concept in collaboration with NASA Space Biology to connect research publications, datasets, ontologies, and biological knowledge into a system researchers could query in natural language. The goal was not just better search. It was a reliable scientific question-answering layer that could retrieve relevant evidence, map concepts across domains, and make AI systems less prone to hallucination in fields where bad answers can cause real harm.
The technical plan started with a broad graph foundation: 22 priority datasets plus 7 existing knowledge graphs, tools, and networks. The target integrations included GeneLab / OSDR, NCBI, PubMed, NIH RePORTER, OrthoDB, NSF public-access repositories, SPOKE, OBO Foundry, DBpedia, NIEM, Bio2RDF, SemMedDB, Hetionet, Monarch Initiative, Wikidata, and KnowWhereGraph. The data model extended the ISA (Investigation-Study-Assay) framework with RDF triples, OWL relationship constraints, SHACL validation, JSON-LD and Turtle (TTL) compatibility, and source-linked identifiers, so that every statement could be traced to its origin, reviewed, and removed by source when needed.
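The idea of source-linked identifiers can be illustrated with a minimal sketch. It assumes a toy in-memory store rather than a real RDF engine, and the class names, predicates, and identifiers (`study:S1`, `isa:hasAssay`, `osdr`, and so on) are hypothetical placeholders, not taken from the proposal.

```python
from dataclasses import dataclass, field


# Hypothetical source-tagged statement; field names are illustrative only.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    source: str  # identifier of the dataset or graph the statement came from


@dataclass
class SourceLinkedStore:
    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj, source):
        self.triples.add(Triple(subject, predicate, obj, source))

    def by_source(self, source):
        """Return every statement contributed by one source."""
        return {t for t in self.triples if t.source == source}

    def remove_source(self, source):
        """Drop every statement from one source and return how many were removed."""
        doomed = self.by_source(source)
        self.triples -= doomed
        return len(doomed)


store = SourceLinkedStore()
store.add("study:S1", "isa:hasAssay", "assay:A1", source="osdr")
store.add("study:S1", "ex:organism", "organism:mouse", source="osdr")
store.add("gene:M1", "ex:orthologOf", "gene:H1", source="orthodb")

# Withdrawing one source leaves statements from other sources untouched.
removed = store.remove_source("osdr")
```

Because each statement carries its source, tracing an answer back to its origin and withdrawing a retracted dataset become simple filters rather than graph surgery.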
The OKN was designed around NASA Space Biology use cases, including GeneLab, NIH data, NIEM, ISA-style study models, biomedical ontologies, and cross-species biological relationships. Instead of forcing researchers to know every schema or ontology in advance, the system would use AI to extract terms, match synonyms, normalize concepts, and surface connected data through a transparent graph.
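A minimal sketch of the synonym-matching step, assuming a small hand-written lookup table. In the proposed system this mapping would come from ontologies and AI-assisted extraction; the table contents and concept IDs below are hypothetical.

```python
import re

# Hypothetical synonym table mapping surface terms to canonical concept IDs.
SYNONYMS = {
    "mouse": "taxon:mouse",
    "mice": "taxon:mouse",
    "mus musculus": "taxon:mouse",
    "human": "taxon:human",
    "homo sapiens": "taxon:human",
    "ionizing radiation": "exposure:ionizing_radiation",
    "ionising radiation": "exposure:ionizing_radiation",
}


def normalize_terms(text):
    """Return canonical concept IDs for known terms in text, checking longer terms first."""
    lowered = text.lower()
    found = []
    for term in sorted(SYNONYMS, key=len, reverse=True):
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            concept = SYNONYMS[term]
            if concept not in found:
                found.append(concept)
    return found


concepts = normalize_terms("Gene expression in mice after ionising radiation")
```

Here both the plural "mice" and the British spelling "ionising" resolve to the same concept IDs a researcher would get from "Mus musculus" or "ionizing radiation", so the query does not depend on knowing the preferred label in advance.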
Scientific knowledge is fragmented across papers, datasets, ontologies, and domain-specific repositories. A researcher asking a critical question may need to connect radiation exposure, gene expression, organism models, assay metadata, biomedical terms, and prior studies across multiple incompatible systems. Standard search is too shallow for that. LLMs are too unreliable by themselves. In critical fields like biology, medicine, aerospace research, and public health, a fluent answer is not enough. The system needs source grounding, traceability, confidence signals, and expert review.
Dreamers proposed a RAG-centered OKN that combines semantic search, embeddings, generative AI, and graph data modeling. Publications and datasets would be parsed into a flexible knowledge graph, with AI extracting named concepts, mapping synonyms, identifying relationships, and translating natural-language questions into graph-aware queries. Example workflows included finding studies above a radiation threshold across organisms, connecting upregulated genes in radiation-exposed mice to human disease datasets, and using orthology mapping to bridge model-organism evidence back to human biology.
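The first and third example workflows can be sketched over toy data. Everything here is an assumption for illustration: the study records, the `dose_gy` field, the gene names, and the orthology table are placeholders, and a real implementation would issue graph queries rather than filter Python lists.

```python
# Hypothetical study records; fields, units, and values are illustrative only.
STUDIES = [
    {"id": "study-A", "organism": "mouse", "dose_gy": 2.0,
     "upregulated": ["mouse_gene_1", "mouse_gene_2"]},
    {"id": "study-B", "organism": "fruit fly", "dose_gy": 0.1,
     "upregulated": ["fly_gene_1"]},
    {"id": "study-C", "organism": "mouse", "dose_gy": 0.5,
     "upregulated": ["mouse_gene_3"]},
]

# Hypothetical orthology edges: (organism, gene) -> human gene.
ORTHOLOGS = {
    ("mouse", "mouse_gene_1"): "HUMAN_GENE_1",
    ("mouse", "mouse_gene_2"): "HUMAN_GENE_2",
    ("mouse", "mouse_gene_3"): "HUMAN_GENE_3",
}


def studies_above(threshold_gy):
    """Studies whose recorded dose is strictly above the threshold, across all organisms."""
    return [s for s in STUDIES if s["dose_gy"] > threshold_gy]


def human_orthologs(studies):
    """Map upregulated model-organism genes to human genes where an orthology edge exists."""
    genes = set()
    for study in studies:
        for gene in study["upregulated"]:
            human_gene = ORTHOLOGS.get((study["organism"], gene))
            if human_gene is not None:
                genes.add(human_gene)
    return genes


high_dose = studies_above(1.0)
bridged = human_orthologs(high_dose)
```

The resulting set of human genes is what a follow-up step would use to look up human disease datasets; genes without an orthology edge are simply left out rather than guessed.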
The system would use retrieval first, then generation. Answers would be grounded in relevant datasets, research papers, ontology mappings, graph relationships, and confidence-scored connections before any natural-language response was produced. Human experts would remain in the loop for validation and correction, and manually reviewed relationships would take precedence wherever AI-generated relationships conflicted with them.
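One way to picture manual precedence is a small conflict-resolution pass over candidate edges. This is a sketch under stated assumptions, not the proposal's actual rule: it treats two edges as conflicting when they share a subject and predicate, and the `Edge` fields, the `origin` labels, and the 0.7 confidence cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    confidence: float
    origin: str  # "reviewed" (human-validated) or "ai" (machine-extracted)


def rank(edge):
    """Reviewed edges always outrank AI edges; confidence breaks ties within an origin."""
    return (edge.origin == "reviewed", edge.confidence)


def resolve(edges, min_confidence=0.7):
    """Keep one edge per (subject, predicate), dropping low-confidence AI edges first."""
    chosen = {}
    for edge in edges:
        if edge.origin == "ai" and edge.confidence < min_confidence:
            continue  # too uncertain to ground an answer on
        key = (edge.subject, edge.predicate)
        current = chosen.get(key)
        if current is None or rank(edge) > rank(current):
            chosen[key] = edge
    return list(chosen.values())


candidates = [
    Edge("gene:X", "ex:associatedWith", "condition:A", 0.95, "ai"),
    Edge("gene:X", "ex:associatedWith", "condition:B", 0.80, "reviewed"),
    Edge("gene:Y", "ex:associatedWith", "condition:C", 0.40, "ai"),
]
grounded = resolve(candidates)
```

In this toy run the reviewed edge wins even though the AI edge reports a higher confidence, and the low-confidence AI edge never reaches the generation step.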
The architecture combined practical product work with hard scientific infrastructure. The graph layer used Neo4j or RDF-compatible storage, edge attributes for confidence scores, and Neolace-style manual graph review. The AI layer used Faiss-scale embedding search and GPT-assisted concept extraction. Deployment relied on AWS Lambda, EC2, or Kubernetes autoscaling, with CDN-backed APIs, OAuth, SSL/TLS, WAF protection, and rate-limited public access. The proposal also included an auto-update mechanism for integrated sources and a plain-language interface for both data importing and graph-aware querying.
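The embedding-search component can be sketched as brute-force cosine similarity over a tiny index. This is only an illustration of the retrieval step: the three-dimensional vectors and document IDs are made up, and at production scale a vector-search library such as Faiss would replace the loop.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k=2):
    """Return the IDs of the k items most similar to the query, best first."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones would come from an embedding model.
INDEX = {
    "paper-1": [1.0, 0.0, 0.0],
    "dataset-7": [0.9, 0.1, 0.0],
    "paper-2": [0.0, 1.0, 0.0],
}

hits = top_k([1.0, 0.05, 0.0], INDEX, k=2)
```

The retrieved IDs are what the generation step would be constrained to cite, which is the sense in which retrieval comes first.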
The broader ambition was a scientific source-of-truth layer: an open, extensible network that AI models, researchers, educators, and the public could query to reduce misinformation, improve research access, and keep answers tied to the best available human knowledge.
"Very promising." — NASA Ames Research Center planning discussions