Open Science Data & API Hub

Direct access to raw datasets, API endpoints, and semantic mapping for research and academic integration.

Data Infrastructure

Standardized machine-readable datasets for researchers, automated data processing, and quantitative modeling.

Relational & Structured Data

Standardized Frictionless Data packages for integration with institutional databases and research management systems.

FRICTIONLESS DATA PACKAGE

datapackage.json

Contains dataset schema, type definitions, and metadata for automated table generation.

https://yourselftoscience.org/datapackage.json
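
As a rough illustration of how the descriptor could drive automated table generation, the sketch below lists each resource's field names and types. It relies only on the general Data Package layout (a top-level "resources" array whose entries may carry a "schema" object with "fields"); the actual contents of this site's descriptor are not shown here, so treat the helper names fetch_descriptor and list_fields as hypothetical.

```python
import json
from urllib.request import urlopen

DATAPACKAGE_URL = "https://yourselftoscience.org/datapackage.json"


def fetch_descriptor(url: str = DATAPACKAGE_URL) -> dict:
    """Download and parse the Data Package descriptor (network access required)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def list_fields(descriptor: dict) -> dict:
    """Map each resource name to a list of (field name, field type) pairs."""
    summary = {}
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema")
        # A schema may also be given as a path or URL string; this sketch skips those.
        fields = schema.get("fields", []) if isinstance(schema, dict) else []
        summary[resource.get("name", "<unnamed>")] = [
            (field["name"], field.get("type", "unspecified")) for field in fields
        ]
    return summary


# Example (performs a network request):
#   for name, fields in list_fields(fetch_descriptor()).items():
#       print(name, fields)
```

The (name, type) pairs are enough to generate a CREATE TABLE statement or a dataframe dtype mapping, depending on the target system.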

Machine-Readable Context (AI & LLM Modeling)

Raw text and metadata formatted specifically for automated processing and large-scale language modeling. Supports structured search and information retrieval.

LLM NATIVE CONTEXT

llms.txt

Raw markdown optimized for feeding directly into system prompts and vector databases.

https://yourselftoscience.org/llms.txt

OPENAPI SPECIFICATION

openapi.json

Allows Custom GPTs and AI Agents to natively call and query the dataset via static endpoints.

https://yourselftoscience.org/openapi.json
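
Before wiring the specification into an agent, it can help to see which operations it declares. The following is a minimal sketch that walks the standard OpenAPI "paths" object; the paths and summaries this particular file contains are not known here, and list_operations is a hypothetical helper name.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list:
    """Return (METHOD, path, summary) for every operation in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key, operation in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS and isinstance(operation, dict):
                operations.append((key.upper(), path, operation.get("summary", "")))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Example (performs a network request):
#   import json
#   from urllib.request import urlopen
#   with urlopen("https://yourselftoscience.org/openapi.json", timeout=30) as r:
#       for method, path, summary in list_operations(json.load(r)):
#           print(method, path, summary)
```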

Tabular Datasets & JSON Endpoints

Static data endpoints and bulk export files for Python/Pandas workflows, statistical modeling, and local ingestion.

STATIC JSON ENDPOINT

JSON array of all active resource profiles.

https://yourselftoscience.org/resources.json
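
A minimal sketch of local ingestion from this endpoint, assuming each array element is an object keyed by the field names listed under Dataset Schema (for example dataTypes and compensationType). The helper names load_resources and filter_resources are hypothetical.

```python
import json
from urllib.request import urlopen

RESOURCES_URL = "https://yourselftoscience.org/resources.json"


def load_resources(url: str = RESOURCES_URL) -> list:
    """Download the JSON array of resources (network access required)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def filter_resources(resources, data_type=None, compensation=None) -> list:
    """Keep resources matching an optional data type and compensation type."""
    matches = []
    for resource in resources:
        # Missing or null fields are treated as "no match" for an active filter.
        if data_type is not None and data_type not in (resource.get("dataTypes") or []):
            continue
        if compensation is not None and resource.get("compensationType") != compensation:
            continue
        matches.append(resource)
    return matches


# Example (performs a network request):
#   genome_studies = filter_resources(load_resources(), data_type="Genome")
```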

RAW CSV EXPORT

Static CSV file for offline spreadsheet workflows.

https://yourselftoscience.org/resources.csv
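
For a quick offline summary without a spreadsheet, a downloaded copy of the CSV can be tallied with the standard library. This sketch assumes the header row uses the schema's field names (such as compensationType); count_by_column is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Tally the values of one CSV column; blank cells count as '(missing)'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in CSV header")
    return Counter((row[column] or "").strip() or "(missing)" for row in reader)


# Example, using a local copy of the export:
#   with open("resources.csv", encoding="utf-8", newline="") as handle:
#       print(count_by_column(handle.read(), "compensationType"))
```

In a Pandas workflow, pandas.read_csv accepts the URL above directly, after which value_counts() on a column gives the same kind of tally.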

Linked Data & Semantic Mapping

RDF Turtle graphs and VoID metadata aligned with Wikidata identifiers to ensure semantic interoperability with global research repositories.

RDF TURTLE GRAPH

.ttl

Full ontology and instance data in semantic format.

https://yourselftoscience.org/resources.ttl

VoID DESCRIPTOR

Linked Data

Vocabulary of Interlinked Datasets (VoID) metadata mapping.

https://yourselftoscience.org/void.ttl

Data Licensing (CC0 1.0)

Our dataset is dedicated to the public domain under the Creative Commons CC0 1.0 Universal Public Domain Dedication. You can ingest, modify, and distribute the data for any purpose without limitation.

Wikidata Integration

Each resource is mapped to stable Wikidata QIDs. This alignment is maintained to ensure our dataset remains interoperable with global knowledge graphs and research databases.

The catalogue currently serves as the verification source for 32 upstream Wikidata entities.
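
To join this dataset against a knowledge graph, a QID usually has to be turned into a Wikidata entity URI. The sketch below validates a QID (as found in a field such as resourceWikidataId) and builds that URI; the helper name wikidata_entity_uri is hypothetical.

```python
import re

# A QID is the letter Q followed by a positive integer without leading zeros.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def wikidata_entity_uri(qid: str) -> str:
    """Return the Wikidata entity URI for a QID such as 'Q42'."""
    qid = qid.strip()
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return f"http://www.wikidata.org/entity/{qid}"
```

The entity URI (http scheme, /entity/ path) is the form used in Wikidata's RDF data; the human-readable page for the same item lives under https://www.wikidata.org/wiki/.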

Dataset Schema

The dataset contains the following fields for each resource. For detailed definitions of each data type, visit our Full Data Dictionary.

  • id: A persistent, unique identifier (UUID) for the resource.
  • permalink: The permanent URI linking directly to the resource's dataset page.
  • slug: A user-friendly identifier used in the URL.
  • title: The name of the resource or study.
  • organizations: An array of organizations conducting the research, each with a name and optional Wikidata ID.
  • link: A URL to the resource's website.
  • dataTypes: An array of strings describing the types of data collected (e.g., "Genome", "Health data").
  • compensationType: The type of compensation offered ("donation", "payment", or "mixed").
  • origin: The country where the organization is headquartered.
  • countries: An array of countries where the resource is available.
  • description: A brief description of the resource.
  • citations: An array of academic citations related to the resource.
  • compatibleSources: Data sources the resource is known to accept (e.g., "WGS", "23andMe").
  • resourceWikidataId: The main Wikidata QID aligned with the project.
  • entityCategory: The general type of the organization (e.g., "Non-Profit", "Government").
  • entitySubType: A more specific classification of the organization (e.g., "Research Foundation", "Regulatory Agency").
  • isCitedOnWikidata: Boolean flag indicating whether the resource's Wikidata item currently cites the catalogue as a reference URL (P854).
  • wikidataReferenceUrl: The specific Wikidata URL connecting the resource to the catalogue citation (if applicable).
  • rorId: The Research Organization Registry (ROR) identifier for the primary organization, when available.
  • rorTypes: Organization types from ROR (e.g., "education", "government", "healthcare", "company", "nonprofit").
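
The field list above can be turned into a light sanity check before ingestion. This is a sketch only: it tests the constraints stated in the list (UUID id, the three compensation values, fields described as arrays, the boolean flag) and assumes records are plain dictionaries keyed by these field names. validate_resource is a hypothetical helper, not part of any published tooling.

```python
import uuid

ALLOWED_COMPENSATION = {"donation", "payment", "mixed"}
# Fields the schema explicitly describes as arrays.
ARRAY_FIELDS = ("organizations", "dataTypes", "countries", "citations")


def validate_resource(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    try:
        uuid.UUID(str(record.get("id", "")))
    except ValueError:
        problems.append("id is not a valid UUID")
    compensation = record.get("compensationType")
    if compensation is not None and compensation not in ALLOWED_COMPENSATION:
        problems.append(f"unexpected compensationType: {compensation!r}")
    for name in ARRAY_FIELDS:
        if name in record and not isinstance(record[name], list):
            problems.append(f"{name} should be an array")
    flag = record.get("isCitedOnWikidata")
    if flag is not None and not isinstance(flag, bool):
        problems.append("isCitedOnWikidata should be a boolean")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a bulk import at once.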