Pancreatic Cancer Biomedical Knowledge Graph
Description
This dataset comprises approximately 1 million high-confidence biomedical triples focused on pancreatic cancer. The triples are constructed from a curated set of 23 relevant biomedical entities (e.g., KRAS, TP53, gemcitabine) and 11 common relation types (e.g., mutated_in, treats, interacts_with). Each triple is embedded in a synthetic natural-language sentence that mimics scientific phrasing. Each is also paired with a simulated attention score between 0.75 and 1.00, intended to stand in for the confidence of a transformer-based model. Heuristic boosting was applied to biologically plausible combinations, which brings the average attention score to approximately 0.88. The resource is intended for training, validating, or benchmarking biomedical NLP models and knowledge extraction systems in the context of pancreatic cancer.
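To make the record structure and the boosting step concrete, here is a minimal Python sketch of how records of this kind could be generated. It is not the authors' pipeline, and everything in it beyond the score range is an assumption. The field names, the sentence templates, the set of "plausible" triples, and the boost size (BOOST = 0.05) are all hypothetical. Only three of the 23 entities and three of the 11 relations are named in this card. The fourth entity, "pancreatic cancer", is added purely for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical subsets: the card names only KRAS, TP53, gemcitabine and three relations.
# "pancreatic cancer" is an assumed extra entity added for illustration.
ENTITIES = ["KRAS", "TP53", "gemcitabine", "pancreatic cancer"]
RELATIONS = ["mutated_in", "treats", "interacts_with"]

# Hypothetical sentence templates mimicking scientific phrasing.
TEMPLATES = {
    "mutated_in": "Mutations in {h} are frequently observed in {t}.",
    "treats": "{h} is used in the treatment of {t}.",
    "interacts_with": "{h} has been reported to interact with {t}.",
}

# Hypothetical plausibility rules; the card does not publish its own heuristics.
PLAUSIBLE = {
    ("KRAS", "mutated_in", "pancreatic cancer"),
    ("TP53", "mutated_in", "pancreatic cancer"),
    ("gemcitabine", "treats", "pancreatic cancer"),
    ("KRAS", "interacts_with", "TP53"),
}
BOOST = 0.05  # assumed boost size for plausible triples


@dataclass
class Record:
    head: str
    relation: str
    tail: str
    sentence: str
    attention_score: float


def make_record(rng: random.Random) -> Record:
    head, tail = rng.sample(ENTITIES, 2)  # two distinct entities
    relation = rng.choice(RELATIONS)
    # Base score drawn uniformly from the card's stated range [0.75, 1.00].
    score = rng.uniform(0.75, 1.0)
    if (head, relation, tail) in PLAUSIBLE:
        # Heuristic boost, clipped so the score never exceeds 1.00.
        score = min(1.0, score + BOOST)
    sentence = TEMPLATES[relation].format(h=head, t=tail)
    return Record(head, relation, tail, sentence, score)


def generate(n: int, seed: int = 0) -> list[Record]:
    # A seeded generator makes the synthetic sample reproducible.
    rng = random.Random(seed)
    return [make_record(rng) for _ in range(n)]


if __name__ == "__main__":
    records = generate(10_000)
    mean = sum(r.attention_score for r in records) / len(records)
    print(f"mean simulated score: {mean:.3f}")
```

In this sketch, a uniform draw on [0.75, 1.00] alone has a mean of 0.875. Boosting only a small fraction of combinations therefore nudges the average slightly upward. The exact average depends on how many combinations the heuristics treat as plausible and on the boost size, both of which are assumed here.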