by sib-swiss
🦜✨ Chat system, MCP server, and reusable components to improve LLM capabilities when generating SPARQL queries
# Add to your Claude Code skills
git clone https://github.com/sib-swiss/sparql-llm
This project provides tools to enhance the capabilities of Large Language Models (LLMs) in generating SPARQL queries for specific endpoints:
- sparql-llm pip package
The system integrates Retrieval-Augmented Generation (RAG) and SPARQL query validation through endpoint schemas, to ensure more accurate and relevant query generation on large scale knowledge graphs.
The components are designed to work either independently or as part of a full chat-based system that can be deployed for a set of SPARQL endpoints. It requires endpoints to include metadata such as SPARQL query examples and endpoint descriptions using the Vocabulary of Interlinked Datasets (VoID), which can be automatically generated using the void-generator project.
[!TIP]
You can quickly check if an endpoint contains the expected metadata at sib-swiss.github.io/sparql-editor/check
The server exposes a Model Context Protocol (MCP) endpoint at chat.expasy.org/mcp to access biodata resources at the SIB through their SPARQL endpoints, such as UniProt, Bgee, OMA, SwissLipids, and Cellosaurus.
- question (string): the user's question
- potential_classes (list[string]): high level concepts and potential classes that could be found in the SPARQL endpoints
- steps (list[string]): split the question in standalone smaller parts if relevant
- classes (list[string]): high level concepts and potential classes that could be found in the SPARQL endpoints
- query (string): a valid SPARQL query string
- endpoint (string): the SPARQL endpoint URL to execute the query against
Follow the instructions of your client, and use the URL of the public server: https://chat.expasy.org/mcp
For example, for GitHub Copilot in VSCode, to add a new MCP server through the VSCode UI:
- Open the side panel chat (ctrl+shift+i or cmd+shift+i), and make sure the mode is set to Agent in the bottom right
- Open the command palette (ctrl+shift+p or cmd+shift+p), and search for MCP: Open User Configuration, this will open a mcp.json file
Connect to a running streamable HTTP MCP server, such as the publicly available chat.expasy.org/mcp.
In your VSCode mcp.json you should have the following:
{
  "servers": {
    "expasy-mcp-http": {
      "url": "https://chat.expasy.org/mcp/",
      "type": "http"
    }
  }
}
uvx sparql-llm
Optionally you can provide the path to a custom settings JSON file to configure the server (e.g. the list of endpoints that will be indexed and available through the server), see the Settings class for detailed available settings.
Example settings file for your MCP server deployment:
{
  "app_org": "Your organization",
  "app_topics": "genes, proteins, lipids, chemical reactions, and metabolomics data",
  "endpoints": [
    {
      "label": "UniProt",
      "endpoint_url": "https://sparql.uniprot.org/sparql/",
      "description": "UniProt is a comprehensive resource for protein sequence and annotation data."
    },
    {
      "label": "Bgee",
      "description": "Bgee is a database for retrieval and comparison of gene expression patterns across multiple animal species.",
      "endpoint_url": "https://www.bgee.org/sparql/",
      "homepage_url": "https://www.bgee.org/"
    }
  ]
}
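Before pointing the server at a settings file like the one above, it can help to catch obvious mistakes early. The following is a minimal sketch, not the package's own `Settings` validation: the `check_settings` helper and the choice of `label` and `endpoint_url` as required keys are assumptions made for illustration.

```python
import json

# Assumption: these are the keys every endpoint entry should carry.
REQUIRED_ENDPOINT_KEYS = ("label", "endpoint_url")


def check_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems found in a settings dict."""
    problems = []
    endpoints = settings.get("endpoints")
    if not isinstance(endpoints, list) or not endpoints:
        return ["'endpoints' must be a non-empty list"]
    for i, endpoint in enumerate(endpoints):
        for key in REQUIRED_ENDPOINT_KEYS:
            if not endpoint.get(key):
                problems.append(f"endpoints[{i}] is missing '{key}'")
        url = endpoint.get("endpoint_url", "")
        if url and not url.startswith(("http://", "https://")):
            problems.append(f"endpoints[{i}] has a non-HTTP endpoint_url: {url}")
    return problems


# Hypothetical settings content with one incomplete endpoint entry.
example = json.loads("""
{
  "app_org": "Your organization",
  "endpoints": [
    {"label": "UniProt", "endpoint_url": "https://sparql.uniprot.org/sparql/"},
    {"label": "Bgee", "homepage_url": "https://www.bgee.org/"}
  ]
}
""")
problems = check_settings(example)
print(problems)
```

Here the second entry is reported because it has no `endpoint_url`. The Settings class remains the reference for which fields are actually supported.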
Example mcp.json file to add and configure the MCP server in a client (e.g. VSCode):
{
  "servers": {
    "expasy-mcp": {
      "type": "stdio",
      "command": "uvx",
      "env": {
        "SETTINGS_FILEPATH": "/Users/you/sparql-mcp.json"
      },
      "args": [
        "sparql-llm"
      ]
    }
  }
}
[!IMPORTANT]
Click on Start just on top of the server name (e.g. "expasy-mcp") to start the connection to the MCP server.
You can click the wrench and screwdriver button 🛠️ (Configure Tools...) to enable/disable specific tools.
[!NOTE]
More details available in the VSCode MCP official docs.
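If you manage several machines or settings files, the stdio client configuration can also be generated rather than written by hand. This is a small sketch using only the standard library. The `build_mcp_config` helper is hypothetical, and the server name and settings path are the placeholder values from the example configuration.

```python
import json


def build_mcp_config(settings_path: str, server_name: str = "expasy-mcp") -> dict:
    """Build a VSCode-style mcp.json dict that starts the server over stdio via uvx."""
    return {
        "servers": {
            server_name: {
                "type": "stdio",
                "command": "uvx",
                "args": ["sparql-llm"],
                # Path to the custom settings JSON file read by the server
                "env": {"SETTINGS_FILEPATH": settings_path},
            }
        }
    }


config = build_mcp_config("/Users/you/sparql-mcp.json")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the file opened by MCP: Open User Configuration.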
Requires Python >=3.10
pip install sparql-llm
Or with uv:
uv add sparql-llm
Load SPARQL query examples defined using the SHACL ontology from a SPARQL endpoint. See github.com/sib-swiss/sparql-examples for more details on how to define the examples.
from sparql_llm import SparqlExamplesLoader
loader = SparqlExamplesLoader("https://sparql.uniprot.org/sparql/")
docs = loader.load()
print(len(docs))
print(docs[0].metadata)
You can provide the examples as a file if it is not integrated in the endpoint, e.g.:
loader = SparqlExamplesLoader("https://sparql.uniprot.org/sparql/", examples_file="uniprot_examples.ttl")
Refer to the LangChain documentation to figure out how to best integrate document loaders into your system.
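To illustrate how loaded documents might feed a retrieval step, here is a rough sketch that ranks documents by word overlap with a question. It is a stand-in for a real embeddings-based vector store, not what the chat system does. `Doc` mimics the `page_content` and `metadata` attributes of a LangChain document so the example runs without extra dependencies, and the sample texts and `id` metadata key are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (same two attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rank_by_overlap(question: str, docs: list, top_k: int = 2) -> list:
    """Rank documents by how many lowercase words they share with the question."""
    question_words = set(question.lower().split())
    scored = []
    for doc in docs:
        overlap = len(question_words & set(doc.page_content.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    # Sort by score only; the sort is stable, so ties keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical example questions standing in for loader.load() output
docs = [
    Doc("List all human proteins", {"id": 1}),
    Doc("Find genes expressed in the mouse brain", {"id": 2}),
    Doc("List proteins associated with a disease", {"id": 3}),
]
best = rank_by_overlap("which proteins are linked to a disease", docs)
print([doc.metadata["id"] for doc in best])  # → [3, 1]
```

In practice you would replace `rank_by_overlap` with a similarity search over embeddings, and pass the best matches to the LLM as context.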
[!NOTE]
You can check the completeness of your examples against the endpoint schema using this notebook.
Generate a human-readable schema using the ShEx format to describe all classes of a SPARQL endpoint based on the VoID description present in the endpoint. Ideally the endpoint should also contain the ontology describing the classes, so the rdfs:label and rdfs:comment of the classes can be used to generate embeddings and improve semantic matching.
[!TIP]
Check out the void-generator project to automatically generate a VoID description for your endpoint.
from sparql_llm import SparqlVoidShapesLoader
loader = SparqlVoidShapesLoader("https://sparql.uniprot.org/sparql/")
docs = loader.load()
print(len(docs))
print(docs[0].metadata)
You can provide the VoID description as a file if it is not integrated in the endpoint, e.g.:
loader = SparqlVoidShapesLoader("https://sparql.uniprot.org/sparql/", void_file="uniprot_void.ttl")
The generated shapes are well-suited for use with an LLM or a human, as they provide clear information about which predicates are available for a class, and the corresponding classes or datatypes those predicates point to. Each object property references a list of classes rather than another shape, making each shape self-contained and interpretable on its own, e.g. for a Disease Annotation in UniProt:
up:Disease_Annotation {
  a [ up:Disease_Annotation ] ;
  up:sequence [ up:Chain_Annotation up:Modified_Sequence ] ;
  rdfs:comment xsd:string ;
  up:disease IRI
}
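Because each shape is self-contained, it is easy to turn one into a lookup of predicates and their allowed targets. The sketch below handles only simple shapes like the example above. `parse_shape` is a hypothetical helper, not part of the package, and it is not a full ShEx parser: cardinalities, nested shapes, and IRIs containing `;` are not supported.

```python
import re

shape = (
    "up:Disease_Annotation { a [ up:Disease_Annotation ] ; "
    "up:sequence [ up:Chain_Annotation up:Modified_Sequence ] ; "
    "rdfs:comment xsd:string ; up:disease IRI }"
)


def parse_shape(shape_str: str) -> tuple[str, dict[str, list[str]]]:
    """Split a simple one-class shape into (class, {predicate: [targets]})."""
    match = re.fullmatch(r"\s*(\S+)\s*\{(.*)\}\s*", shape_str, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a simple '<class> { ... }' shape")
    class_name, body = match.groups()
    predicates = {}
    for constraint in body.split(";"):
        tokens = constraint.split()
        if not tokens:
            continue
        predicate, targets = tokens[0], tokens[1:]
        # Drop the brackets that wrap a list of allowed classes
        predicates[predicate] = [t for t in targets if t not in ("[", "]")]
    return class_name, predicates


class_name, predicates = parse_shape(shape)
print(class_name)
for predicate, targets in predicates.items():
    print(f"  {predicate} -> {targets}")
```

A mapping like this is one way to check whether a predicate used in a generated query is known for a given class.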
You can also generate the complete ShEx shapes for a SPARQL endpoint with:
from sparql_llm import get_shex_from_void
shex_str = get_shex_from_void("https://sparql.uniprot.org/sparql/")
print(shex_str)
This takes a SPARQL query and validates the predicates/types used are compliant with the VoID descriptio