Commit 2756f3f8 authored by Joshua Gay

Creates initial release of BioCompute Object Schema in prep for ballot

Signed-off-by: Joshua Gay <j.gay@ieee.org>
.DS_Store
# This is the list of BioCompute Object Schema authors for copyright purposes.
#
# This does not necessarily list everyone who has contributed code,
# since in some cases, their employer may be the copyright holder.
# To see the full list of contributors, see the file CONTRIBUTORS.
The Translational Genomics Research Institute
Gil Alterovitz
Michael Crusoe
Jeremy Goecks
John Quackenbush
Marco Schito
Hiroki Morizono
Paul Walsh
Hadley King
Dennis Dean II
Stian Soiland-Reyes
Raja Mazumder
Jonal Almeida
Carole Goble
Joseph Sayed Nooraga
Janisha Patel
Robel Kahsay
# This is the list of BioCompute Object Schema contributors
#
# This does not necessarily list the copyright holders, since in some
# cases, an employer may be the copyright holder. To see the full
# list of copyright holders, see the file AUTHORS
Jason Travis
Gil Alterovitz
Michael Crusoe
Jeremy Goecks
John Quackenbush
Marco Schito
Hiroki Morizono
Paul Walsh
Hadley King
Dennis Dean II
Stian Soiland-Reyes
Raja Mazumder
Jonal Almeida
Carole Goble
Joseph Sayed Nooraga
Janisha Patel
Robel Kahsay
Copyright 2019 The BioCompute Schema Authors
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SPDX-License-Identifier: BSD-3-Clause
# BioCompute Object Schema
BioCompute Object Schema is a project of the IEEE P2791 BioCompute
Working Group intended for use as part of IEEE P2791 (Standard for
Bioinformatics Computations and Analyses Generated by High-Throughput
Sequencing (HTS) to Facilitate Communication).
## License
All source files (.json files) in this repository are subject to the
following copyright and licensing terms.
Copyright 2019 The BioCompute Object Schema Authors.
See the LICENSE file distributed with this work for copyright and
licensing information, the AUTHORS file for a list of copyright
holders, and the CONTRIBUTORS file for the list of contributors.
## Disclaimer
This open source repository contains material that may be included in
or referenced by an unapproved draft of a proposed IEEE Standard. All
material in this repository is subject to change. The material in this
repository is presented "as is" and with all faults. Use of the
material is at the sole risk of the user. IEEE specifically disclaims
all warranties and representations with respect to all material
contained in this repository and shall not be liable, under any
theory, for any use of the material. Unapproved drafts of proposed
IEEE standards must not be utilized for any conformance/compliance
purposes.
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/biocomputeobject.json",
"type": "object",
"title": "Base type for all BioCompute Objects",
"description": "All BioCompute object types must adhere to this type in order to be compliant with the BioCompute framework",
"required": [
"bco_id",
"bco_spec_version",
"checksum",
"provenance_domain",
"usability_domain",
"description_domain",
"execution_domain",
"io_domain",
"error_domain"
],
"definitions": {
"bco_id": {
"type": "string",
"description": "A unique identifier that should be applied to each BCO instance, generated and assigned by a BCO database engine. IDs should never be reused",
"examples": [
"https://w3id.org/biocompute/examples/HCV1a.json"
]
},
"uri": {
"type": "object",
"description": "A Uniform Resource Identifier",
"additionalProperties": false,
"required": [
"uri"
],
"properties": {
"filename": {
"type": "string"
},
"uri": {
"type": "string",
"format": "uri"
},
"access_time": {
"type": "string",
"format": "date-time"
},
"sha1_chksum": {
"type": "string",
"description": "hash function that produces a message digest",
"pattern": "[A-Za-z0-9]+"
}
}
},
"contributor": {
"type": "object",
"description": "Contributor identifier and type of contribution (determined according to PAV ontology) are required",
"required": [
"contribution",
"name"
],
"additionalProperties": false,
"properties": {
"name": {
"type": "string",
"description": "Name of contributor",
"examples": [
"Charles Hadley King"
]
},
"affiliation": {
"type": "string",
"description": "Organization the particular contributor is affiliated with",
"examples": [
"George Washington University"
]
},
"email": {
"type": "string",
"description": "electronic means for identification and communication purposes",
"examples": [
"hadley_king@gwu.edu"
],
"format": "email"
},
"contribution": {
"type": "array",
"description": "type of contribution determined according to PAV ontology",
"reference": "https://doi.org/10.1186/2041-1480-4-37",
"items": {
"type": "string",
"enum": [
"authoredBy",
"contributedBy",
"createdAt",
"createdBy",
"createdWith",
"curatedBy",
"derivedFrom",
"importedBy",
"importedFrom",
"providedBy",
"retrievedBy",
"retrievedFrom",
"sourceAccessedBy"
]
}
},
"orcid": {
"type": "string",
"description": "Field to record author information. ORCID identifiers allow for the author to curate their information after submission. ORCID identifiers must be valid and must have the prefix ‘https://orcid.org/’",
"examples": [
"https://orcid.org/0000-0003-1409-4549"
],
"format": "uri"
}
}
}
},
"additionalProperties": false,
"properties": {
"bco_id": {
"$ref": "#/definitions/bco_id",
"readOnly": true
},
"bco_spec_version": {
"type": "string",
"description": "Version of the BCO specification used to define this document",
"examples": [
"https://w3id.org/biocompute/spec/v1.2"
],
"readOnly": true,
"format": "uri"
},
"checksum": {
"type": "string",
"description": "A string-type, read-only value generated with a SHA-256 hash function, protecting the object from internal or external alterations without proper validation.",
"examples": [
"5986B05969341343E77A95B4023600FC8FEF48B7E79F355E58B0B404A4F50995"
],
"readOnly": true,
"pattern": "^([A-Za-z0-9]+)$"
},
"provenance_domain": {
"$ref": "provenance_domain.json"
},
"usability_domain": {
"$ref": "usability_domain.json"
},
"extension_domain": {
"properties": {
"fhir_extension": {
"type": "array",
"items": {
"$ref": "extension_domain/fhir_extension.json"
}
},
"scm_extension": {
"$ref": "extension_domain/scm_extension.json"
}
}
},
"description_domain": {
"$ref": "description_domain.json"
},
"execution_domain": {
"$ref": "execution_domain.json"
},
"parametric_domain": {
"$ref": "parametric_domain.json"
},
"io_domain": {
"$ref": "io_domain.json"
},
"error_domain": {
"$ref": "error_domain.json"
}
}
}
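The top-level constraints of the schema above can be illustrated with a minimal Python sketch. It assumes a BCO has already been parsed into a plain dict; the helper names `missing_required_keys` and `checksum_matches_pattern` are hypothetical and not part of the BioCompute project. The sketch only mirrors the `required` list and the `checksum` pattern; it does not replace validation with a full JSON Schema draft-07 validator, and it does not recompute the SHA-256 digest.

```python
import re

# Top-level keys listed under "required" in biocomputeobject.json.
REQUIRED_KEYS = [
    "bco_id",
    "bco_spec_version",
    "checksum",
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
    "error_domain",
]

# Character class taken from the schema's "checksum" pattern.
# fullmatch() plays the role of the ^...$ anchors in the schema.
CHECKSUM_CHARS = re.compile(r"[A-Za-z0-9]+")


def missing_required_keys(bco):
    """Return the required top-level keys that are absent from a BCO dict."""
    return [key for key in REQUIRED_KEYS if key not in bco]


def checksum_matches_pattern(checksum):
    """Check only the alphanumeric pattern, not the hash value itself."""
    return CHECKSUM_CHARS.fullmatch(checksum) is not None
```

A caller might run `missing_required_keys(bco)` first and report the returned names before attempting full schema validation.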
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/description_domain.json",
"type": "object",
"title": "Description Domain",
"description": "Structured field for description of external references, the pipeline steps, and the relationship of I/O objects.",
"required": [
"keywords",
"pipeline_steps"
],
"properties": {
"keywords": {
"type": "array",
"description": "Keywords to aid in searchability and description of the object.",
"items": {
"type": "string",
"description": "This field should take free text value using common biological research terminology.",
"examples": [
"HCV1a",
"Ledipasvir",
"antiviral resistance",
"SNP",
"amino acid substitutions"
]
}
},
"xref": {
"type": "array",
"description": "List of the databases or ontology IDs that are cross-referenced in the BCO.",
"items": {
"type": "object",
"description": "External references are stored in the form of prefixed identifiers (CURIEs). These CURIEs map directly to the URIs maintained by Identifiers.org.",
"reference": "https://identifiers.org/",
"required": [
"namespace",
"name",
"ids",
"access_time"
],
"properties": {
"namespace": {
"type": "string",
"description": "External resource vendor prefix",
"examples": [
"pubchem.compound"
]
},
"name": {
"type": "string",
"description": "Name of external reference",
"examples": [
"PubChem-compound"
]
},
"ids": {
"type": "array",
"description": "List of reference identifiers",
"items": {
"type": "string",
"description": "Reference identifier",
"examples": [
"67505836"
]
}
},
"access_time": {
"type": "string",
"description": "Date and time the external reference was accessed",
"format": "date-time"
}
}
}
},
"platform": {
"type": "array",
"description": "reference to a particular deployment of an existing platform where this BCO can be reproduced.",
"items": {
"type": "string",
"examples": [
"hive"
]
}
},
"pipeline_steps": {
"type": "array",
"description": "Each individual tool (or a well-defined and reusable script) is represented as a step. Parallel processes are given the same step number.",
"items": {
"additionalProperties": false,
"type": "object",
"required": [
"step_number",
"name",
"description",
"input_list",
"output_list"
],
"properties": {
"step_number": {
"type": "integer",
"description": "Non-negative integer value representing the position of the tool in a one-dimensional representation of the pipeline."
},
"name": {
"type": "string",
"description": "This is a recognized name of the software tool",
"examples": [
"HIVE-hexagon"
]
},
"description": {
"type": "string",
"description": "Specific purpose of the tool.",
"examples": [
"Alignment of reads to a set of references"
]
},
"version": {
"type": "string",
"description": "Version assigned to the instance of the tool used corresponding to the upstream release.",
"examples": [
"1.3"
]
},
"prerequisite": {
"type": "array",
"description": "Reference or required prereqs",
"items": {
"type": "object",
"description": "Text value to indicate a package or prerequisite for running the tool used.",
"required": [
"name",
"uri"
],
"properties": {
"name": {
"type": "string",
"description": "Public searchable name for reference or prereq.",
"examples": [
"Hepatitis C virus genotype 1"
]
},
"uri": {
"$ref": "biocomputeobject.json#/definitions/uri"
}
}
}
},
"input_list": {
"type": "array",
"description": "URIs (expressed as a URN or URL) of the input files for each tool.",
"items": {
"$ref": "biocomputeobject.json#/definitions/uri"
}
},
"output_list": {
"type": "array",
"description": "URIs (expressed as a URN or URL) of the output files for each tool.",
"items": {
"$ref": "biocomputeobject.json#/definitions/uri"
}
}
}
}
}
}
}
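The `xref` description above says external references are stored as prefixed identifiers (CURIEs) that map to URIs maintained by Identifiers.org. A minimal Python sketch of that mapping follows. The function names are hypothetical, the `access_time` value is a placeholder, and the URL form (the CURIE appended to `https://identifiers.org/`) is an assumption about how the resolver is addressed rather than something the schema states.

```python
def xref_to_curies(xref):
    """Build one CURIE ("namespace:id") per entry in an xref item's "ids" list."""
    return [f'{xref["namespace"]}:{identifier}' for identifier in xref["ids"]]


def curie_to_identifiers_org_url(curie):
    """Append a CURIE to the Identifiers.org base URL (assumed resolver form)."""
    return "https://identifiers.org/" + curie


# Example item using the values given under "examples" in the schema.
# The access_time below is a placeholder, not taken from the source.
example_xref = {
    "namespace": "pubchem.compound",
    "name": "PubChem-compound",
    "ids": ["67505836"],
    "access_time": "2019-01-01T00:00:00Z",
}

curies = xref_to_curies(example_xref)
urls = [curie_to_identifiers_org_url(c) for c in curies]
```

Keeping the namespace and the identifiers in separate fields, as the schema does, lets a single `xref` item carry several identifiers from the same resource.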
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/error_domain.json",
"type": "object",
"title": "Error Domain",
"description": "",
"required": [
"empirical_error",
"algorithmic_error"
],
"properties": {
"empirical_error": {
"type": "object",
"title": "Empirical Error",
"description": "empirically determined values such as limits of detectability, false positives, false negatives, statistical confidence of outcomes, etc. This can be measured by running the algorithm on multiple data samples of the usability domain or through the use of carefully designed in-silico data."
},
"algorithmic_error": {
"type": "object",
"title": "Algorithmic Error",
"description": "Describes errors that originate from fuzziness of the algorithms, driven by stochastic processes, in dynamically parallelized multi-threaded executions, or in machine learning methodologies where the state of the machine can affect the outcome."
}
}
}
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/execution_domain.json",
"type": "object",
"title": "Execution Domain",
"description": "The fields required for execution of the BCO are herein encapsulated together in order to clearly separate information needed for deployment, software configuration, and running applications in a dependent environment",
"required": [
"script",
"script_driver",
"software_prerequisites",
"external_data_endpoints",
"environment_variables"
],
"additionalProperties": false,
"properties": {
"script": {
"type": "array",
"description": "points to internal or external references to a script object that was used to perform computations for this BCO instance.",
"items": {
"additionalProperties": false,
"properties": {
"uri": {
"$ref": "biocomputeobject.json#/definitions/uri"
}
}
}
},
"script_driver": {
"type": "string",
"description": "Specification of the kind of executable that can be launched to perform the sequence of commands described in the script in order to run the pipeline",
"examples": [
"hive",
"cwl-runner",
"shell"
]
},
"software_prerequisites": {
"type": "array",
"description": "Minimal necessary prerequisites, libraries, and tool versions needed to successfully run the script to produce the BCO.",
"items": {
"type": "object",
"description": "A necessary prerequisite, library, or tool version.",
"required": [
"name",
"version",
"uri"
],
"additionalProperties": false,
"properties": {
"name": {
"type": "string",
"description": "Names of software prerequisites",
"examples": [
"HIVE-hexagon"
]
},
"version": {
"type": "string",
"description": "Versions of the software prerequisites",
"examples": [
"babajanian.1"
]
},
"uri": {
"$ref": "biocomputeobject.json#/definitions/uri"
}
}
}
},
"external_data_endpoints": {
"type": "array",
"description": "Minimal necessary domain-specific external data source access in order to successfully run the script to produce BCO.",
"items": {
"type": "object",
"description": "Requirement for network protocol endpoints used by a pipeline’s scripts, or other software.",
"required": [
"name",
"url"
],
"additionalProperties": false,
"properties": {
"name": {
"type": "string",
"description": "Description of the service that is accessed",
"examples": [
"HIVE",
"access to e-utils"
]
},
"url": {
"type": "string",
"description": "The endpoint to be accessed.",
"examples": [
"https://hive.biochemistry.gwu.edu/dna.cgi?cmd=login"
]
}
}
}
},
"environment_variables": {
"type": "object",
"description": "Environmental parameters that are useful to configure the execution environment on the target platform.",
"additionalProperties": false,
"patternProperties": {
"^[a-zA-Z_]+[a-zA-Z0-9_]*$": {
"type": "string"
}
}
}
}
}
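The `environment_variables` rule above (string values only, with names restricted by the `patternProperties` regular expression and `additionalProperties` set to false) can be sketched in Python as follows. The function name `invalid_environment_variables` and the sample values are hypothetical; this is an illustration of the rule under those assumptions, not a substitute for a JSON Schema validator.

```python
import re

# Same expression as the schema's patternProperties key.
# fullmatch() plays the role of the ^...$ anchors.
ENV_NAME = re.compile(r"[a-zA-Z_]+[a-zA-Z0-9_]*")


def invalid_environment_variables(env):
    """Return the names the schema would reject.

    A name is rejected when it does not match the pattern (because
    additionalProperties is false) or when its value is not a string.
    """
    rejected = []
    for name, value in env.items():
        if ENV_NAME.fullmatch(name) is None or not isinstance(value, str):
            rejected.append(name)
    return rejected
```

For instance, a name that starts with a digit or contains a hyphen would be reported, as would a numeric value under an otherwise valid name.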
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/extension_domain/fhir_extension.json",
"type": "object",
"required": [
"fhir_endpoint",
"fhir_version",
"fhir_resources"
],
"properties": {
"fhir_endpoint": {
"type": "string",
"description": "Base URI of FHIR server where the resources are stored",
"examples": [
"http://fhirtest.uhn.ca/baseDstu3"
],
"format": "uri"
},
"fhir_version": {
"type": "string",
"description": "FHIR version of the server endpoint"
},
"fhir_resources": {
"type": "array",
"items": {
"type": "object",
"required": [
"fhir_resource",
"fhir_id"
],
"properties": {
"fhir_resource": {
"type": "string",
"description": "Type of FHIR resource used"
},
"fhir_id": {
"type": "string",
"description": "Server-specific identifier string"
}
}
}
}
}
}
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://biocomputeobject.org/schemas/extension_domain/scm_extension.json",