Schema Version Identifier

From WikiSTEP

This is a discussion page for the STEP modules team about identification of EXPRESS schemas and STEP files (ISO 10303-21). In ISO 10303-1:1994 "Overview and fundamental principles" clause "4.3 Information object registration" the "Abstract Syntax Notation One (ASN.1)" (ISO/IEC 8824-1) is declared to be used for the "unambiguous identification of (Express) schemas and other information objects". There have been some problems over the years with doing so, and this page tries to clarify the current state and to discuss possible changes.

Use Cases

The following use cases are included:

  • UC1: Verifying that a local copy of an EXPRESS file is the same version that is referenced in a standard
  • UC2: Verifying that a local copy of an EXPRESS file is the same version that is stored in a remote repository
  • UC3: Verifying the EXPRESS schema and version that an implementation is compliant with

The locations of schema name assignments and schema version assignments for Modular APs (current process)

  • The short form 400 level schema (EXPRESS ed2) input data:
AP210_ELECTRONIC_ASSEMBLY_INTERCONNECT_AND_PACKAGING_DESIGN_MIM
  • The longform generation result (EXPRESS ed1):
AP210_ELECTRONIC_ASSEMBLY_INTERCONNECT_AND_PACKAGING_DESIGN_MIM_LF
  • Annex C.1 in AP document (2xx parts) specifies the FILE_SCHEMA value in the part 21 file:
"The FILE_SCHEMA element of the header shall specify the name of the EXPRESS schema used and include its object information identifier (see Annex E)."
  • EXAMPLE The instance below identifies the Ap210_electronic_assembly_interconnect_and_packaging_design schema:
FILE_SCHEMA (('AP210_ELECTRONIC_ASSEMBLY_INTERCONNECT_AND_PACKAGING_DESIGN_MIM_LF {1 0 10303 410 2 1 4}'))
     
  • Annex B.2 in Top level implementation module (4xx part) assigns object identifiers:
{ iso standard 10303 part(410) version(2) schema(1) ap210-electronic-assembly-interconnect-and-packaging-design-mim-lf(4) }
  • AP development guidelines [need a doc reference here] require that ISO 10303-410 Longform schema has the following rule which requires data population in the part 21 data section:
RULE application_protocol_definition_required FOR (application_context);
WHERE
	WR1 : SIZEOF( QUERY( ac <* application_context |
              (SIZEOF (QUERY (apd <* USEDIN(ac,'AP210_ELECTRONIC_ASSEMBLY_INTERCONNECT_AND_PACKAGING_DESIGN_MIM_LF.APPLICATION_PROTOCOL_DEFINITION.APPLICATION') |
                (apd.application_interpreted_model_schema_name = 'ap210_electronic_assembly_interconnect_and_packaging_design')
                )) > 0)
              )) > 0;
 END_RULE;
  • The current process will result in a part 21 file with the following when the implementors follow Annex C.1 (and also abide by the implementor agreement to populate the publication year value):
FILE_SCHEMA(('AP210_ELECTRONIC_ASSEMBLY_INTERCONNECT_AND_PACKAGING_DESIGN_MIM_LF {1 0 10303 410 2 1 4}'));
#887=APPLICATION_PROTOCOL_DEFINITION(,'ap210_electronic_assembly_interconnect_and_packaging_design',2009,#888);
#888=APPLICATION_CONTEXT('EM pilot');
  • The application_context string value is an example; actual values depend on the context of the implementation and may be subject to further agreements among exchange parties.

Identification Methods

The purpose of identification is to allow a user to map a local copy of a file or document to a reference. SC4 standards using EXPRESS edition one provide a robust mechanism for uniquely identifying schema names, since the name of the schema is included in the standard, in the EXPRESS file and in the implementation forms.

A schema version included as part of a published standard can theoretically be identified by any of the following depending on the reference location:

  • Standard designator + schema type
  • ISO URN
  • CVS Id (for schemas stored in stepmod)
  • Subversion Id (for published modules)
  • ASN.1 Identifier (for non-modular APs). The ASN.1 identifier appears only in the edition one long form schemas of APs.
  • WG N-number (for modules and AICs only. This is not UNIQUE for IRs.)

ASN.1 Object Identifier format

See http://www.obj-sys.com/asn1tutorial/node10.html for an explanation of the format
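The two notations used on this page, the named form from Annex B.2 and the numeric form used in FILE_SCHEMA, can be related mechanically. The following is a minimal Python sketch, assuming only the arc names iso(1) and standard(0) need to be resolved by name; the function name oid_to_numeric and the KNOWN_ARCS table are hypothetical and not taken from any standard or tool.

```python
import re

# Arc names that may appear without a number in parentheses.
# Assumption: only the two names used on this page are needed.
KNOWN_ARCS = {"iso": 1, "standard": 0}

# A named arc such as part(410) or ap210-...-mim-lf(4).
NAMED_ARC = re.compile(r"[A-Za-z][A-Za-z0-9-]*\((\d+)\)")


def oid_to_numeric(oid_text):
    """Convert a brace-enclosed ASN.1 object identifier to a tuple of integers."""
    inner = oid_text.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("object identifier must be enclosed in braces")
    arcs = []
    for token in inner[1:-1].split():
        named = NAMED_ARC.fullmatch(token)
        if token.isdigit():
            arcs.append(int(token))          # plain number, e.g. 10303
        elif named:
            arcs.append(int(named.group(1)))  # name(number), keep the number
        elif token.lower() in KNOWN_ARCS:
            arcs.append(KNOWN_ARCS[token.lower()])
        else:
            raise ValueError(f"cannot resolve arc: {token!r}")
    return tuple(arcs)
```

Applied to the Annex B.2 identifier for the AP210 long form schema, this yields the integer sequence 1 0 10303 410 2 1 4 that appears in the FILE_SCHEMA example.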

ISO 10303 Description and implementation methods

A schema version identifier is supported by all ISO 10303 description and implementation methods, at least in their most recent editions.

ISO 10303-11 EXPRESS

The first edition of EXPRESS (1994) had no support for schema version identification; the second edition (2004) introduced an optional "schema_version_id" (see clause 9.3 Schema):

296 schema_decl = SCHEMA schema_id [ schema_version_id ] ';' schema_body END_SCHEMA ';' .
298 schema_version_id = string_literal .

Examples:

SCHEMA geometry_schema 'version_2';
END_SCHEMA;
SCHEMA support_resource_schema '{ISO standard 10303 part(41) object(1) version(8)}';
END_SCHEMA;

EXPRESS defines neither the format of the schema version identifier nor any logic for how these identifiers interact when several versions are available.

Only Annex G, "Generating a single schema from multiple schemas", addresses the identifier: when the long form schema is created, the schema version identifier, if available, shall be converted into an equivalent embedded remark:

(* Original 2003 schemas: \n
   schema = <schema_id> [schema_version_id = '<version>'] ; \n
   ...
 *)

So far the EXPRESS schemas defined within ISO 10303 do not take advantage of this capability; the use of this capability is not yet enforced in any way.

ISO 10303-21 STEP File

Part 21 has supported a schema object identifier since its first edition in 1994: "The attribute schema_identifiers shall consist of a list of strings, each of which shall contain the name of the schema optionally followed by the object identifier assigned to that schema."

In addition, it is required ("shall") that the object identifier be given in the ASN.1 format, which is stricter than the equivalent wording in EXPRESS. On the other hand, there is no formal mapping between what EXPRESS calls a "schema version identifier" and what part 21 calls an "object identifier". The normative text also recommends the use of the object identifier when one is available, which is the case for all schemas defined within ISO 10303.

A note explains that the object identifier shall be enclosed within braces ("{", "}"). Example:

FILE_SCHEMA (('AUTOMOTIVE DESIGN { 1 0 10303 214 1 1 1 }'));

It is important to note that Part 21 does not require use of object identifiers. Here is a quote from Part 21:2001, clause 8.2.3:

"If an object identifier is provided, it shall have the form specified in ISO/IEC 8824-1. The use of object identifiers within this International Standard is described in clause 3 of ISO 10303-1. When available, the use of the object identifier is recommended as it provides unambiguous identification of the schema. NOTE The general form of an object identifier is a sequence of space-delimited integers. The sequence is enclosed within braces ("{", "}")."
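The general form quoted above (a schema name, optionally followed by a brace-enclosed sequence of space-delimited integers) can be checked with a few lines of code. The following Python sketch is illustrative only: the function name parse_schema_identifier is hypothetical, and tolerating blanks before, after and inside the braces is an assumption rather than something Part 21 states.

```python
import re

# Schema name, optionally followed by one brace-enclosed object identifier.
# Assumption: surrounding blanks are ignored.
SCHEMA_ID = re.compile(r"^\s*(?P<name>[^{}]*?)\s*(?:\{(?P<oid>[^{}]*)\})?\s*$")


def parse_schema_identifier(text):
    """Split one schema_identifiers string into (name, oid or None)."""
    match = SCHEMA_ID.match(text)
    if match is None or not match.group("name"):
        raise ValueError(f"not a schema identifier: {text!r}")
    name = match.group("name")
    oid = match.group("oid")
    if oid is None:
        return name, None  # the object identifier is optional
    arcs = oid.split()
    # The general form is a sequence of space-delimited integers.
    if not arcs or not all(arc.isdigit() for arc in arcs):
        raise ValueError(f"not a sequence of space-delimited integers: {oid!r}")
    return name, tuple(int(arc) for arc in arcs)
```

With this sketch, the AUTOMOTIVE DESIGN example parses into the schema name and the integers 1 0 10303 214 1 1 1, while a brace-enclosed value such as {1.2} is rejected because it is not a sequence of space-delimited integers.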

Here is an application of version identification for the PDM Schema that does not use object identifiers: "Schema version identification: version identification for the PDM Schema shall be encoded in the header section of the STEP Part 21 exchange file to identify the version of the schema to which the file conforms. This is done with the header entity file_schema, which identifies the EXPRESS schemas that specify the entity instances in the data section. The attribute schema_identifiers contains a list of strings that name the schema, optionally followed by the object identifier assigned to that schema. In place of the object identifier, the PDM Schema version identification number shall be enclosed within curly braces. Only capital letters shall be used in schema name strings.

EXAMPLE: To indicate PDM Schema version 1.2, the following instance of the header entity file_schema should be used.

ISO-10303-21;
HEADER;
...
FILE_SCHEMA(('PDM_SCHEMA {1.2}'));"

The syntax of the part 21 file for the PDM Schema is wrong. The syntax should be:

FILE_SCHEMA(('PDM_SCHEMA 1.2'));

since the use of braces indicates that an object identifier is used.

ISO 10303-22 SDAI

The Standard Data Access Interface (SDAI) has supported schema identification from the very beginning. The SDAI dictionary schema defines:

 ENTITY schema_definition;
   name : express_id;
   identification : OPTIONAL info_object_id;
   ...
 END_ENTITY;

This capability is e.g. supported by the open source JSDAI implementation.

ISO 10303-28 STEP-XML

In p28 "XML representation of EXPRESS schemas and data" a "schema_version" attribute is supported on the XML element "express":

<xs:element name="express" nillable="true">
 <xs:complexType>
  <xs:simpleContent>
   <xs:extension base="xs:string">
    <xs:attribute name="schemaLocation" type="ex:Seq-anyURI" use="optional"/>
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="schema_identifier" type="xs:normalizedString" use="optional"/>
    <xs:attribute name="schema_name" type="xs:normalizedString" use="optional"/>
    <xs:attribute name="schema_version" type="xs:normalizedString" use="optional"/>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:element>

Like p21, p28 requires that the optional schema_version be given in the form of an ASN.1 identifier when the schema is part of ISO 10303.

Verification Methods

"A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value. The data to be encoded is often called the "message", and the hash value is sometimes called the message digest or simply digest."

(Source: Wikipedia, "Cryptographic hash function")

Comparing the hash value of a local copy of a schema against a published hash value will enable a user to determine that they have the correct version of the schema.

Even a small change in the source will drastically change the output of the hash function, so it is important to preprocess the EXPRESS before feeding it into the hash function:

  • Remove all comments
  • Translate keywords to upper case
  • Remove unnecessary white space
  • Address the last line/first line concatenation issue for concatenated EXPRESS files. (probably easiest to just have one blank line at end of schema)--Tom Thurman 01:07, 7 September 2009 (UTC)
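The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an agreed SC4 procedure: SHA-256 is an arbitrary choice of hash function, the KEYWORDS set is only a small subset of the EXPRESS reserved words, whitespace runs are collapsed to a single blank rather than removed entirely, and the names normalize_express and express_digest are hypothetical.

```python
import hashlib
import re

# Illustrative subset of EXPRESS reserved words (assumption: a real tool
# would use the complete list from ISO 10303-11).
KEYWORDS = {"schema", "end_schema", "entity", "end_entity", "type", "end_type",
            "rule", "end_rule", "where", "for", "of", "optional", "use", "from"}

STRING = re.compile(r"'(?:[^']|'')*'")      # string literal, '' is a quote
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # keyword or identifier


def normalize_express(source):
    """Strip remarks, upper-case keywords and collapse whitespace runs."""
    out, depth, pos = [], 0, 0

    def blank():
        # Emit at most one separating blank between tokens.
        if out and out[-1] != " ":
            out.append(" ")

    while pos < len(source):
        if source.startswith("(*", pos):     # embedded remarks may nest
            depth += 1
            pos += 2
        elif depth:
            if source.startswith("*)", pos):
                depth -= 1
                pos += 2
                blank()
            else:
                pos += 1
        elif source.startswith("--", pos):   # tail remark runs to end of line
            end = source.find("\n", pos)
            pos = len(source) if end == -1 else end
            blank()
        elif source[pos].isspace():
            pos += 1
            blank()
        else:
            match = STRING.match(source, pos) or WORD.match(source, pos)
            if match is None:                # punctuation, digits, operators
                out.append(source[pos])
                pos += 1
            else:
                token = match.group()        # string literals stay verbatim
                out.append(token.upper() if token.lower() in KEYWORDS else token)
                pos = match.end()
    # End with exactly one newline so concatenated files do not run together.
    return "".join(out).strip() + "\n"


def express_digest(source):
    """Hex digest of the normalized EXPRESS text (SHA-256 is an assumption)."""
    return hashlib.sha256(normalize_express(source).encode("utf-8")).hexdigest()
```

Two copies of a schema that differ only in remarks, keyword case or layout then produce the same digest, while a change inside a string literal such as the schema_version_id changes it. Differences such as a blank before a semicolon are not removed by this sketch and would need a stricter rule.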

Issues

SC4 does not currently enforce the inclusion of a "schema_version_id" in the EXPRESS file for all the EXPRESS schemas. This should be enforced.

As stated above, the schema_version_id is available only for EXPRESS edition 2, not for files according to edition 1. So far the "long form" EXPRESS schemas have to be in accordance with edition 1.

P21 lacks a formal statement that the object identifier is the same as the schema_version_id in EXPRESS.

IRs do not have UNIQUE WG N number for each schema. (The root cause is that IRs don't have a file per schema architecture like APs and modules.)

But they have a unique ASN.1. This should be sufficient. Lothar Klein 15:07, 5 September 2009 (UTC)

Agreed, as long as each schema is represented by an individual EXPRESS file and as long as the ASN.1 is included in the EXPRESS file.--Tom Thurman 00:24, 6 September 2009 (UTC)

The ASN.1 notation does not support minor changes to schemas.

With the clarification given here [1], each published (final) version has a unique id, including editions, technical corrigenda and amendments. What is not covered are ballot (CD, DIS) and other internal versions. Part 1 allows the use of version 0 only for the very first DIS ballot; consequently it is not available for DIS ballots of later editions. There is also no way to specify CD, CD-TS or FDIS ballots.

If we allow the ASN.1 to include both unpublished and published versions (e.g., unpublished versions {0,2,4,6,7,8,9,10,11,13,14,15,16...} and published versions {1, 3, 5, 12}) then the identification problem for minor changes is at least addressed since the unpublished versions are some form of intermediate, presumably minor change. Since the published SC4 standard explicitly specifies the version number (rather than an incremental reference to a previous version number) this should be possible to enable immediately with an SC4 process change.--Tom Thurman 00:43, 6 September 2009 (UTC)

If we want to publish minor changes, that obviously invalidates the above proposal.--Tom Thurman 00:43, 6 September 2009 (UTC)

The longform generation process needs to be reviewed. One idea is to add a constant to the top level schema that will be converted by the longform generation process into a schema version identifier.

Not all EXPRESS tools support the schema_version_id construct.

Clause C.1 in a modular AP is redundant with clause B.x in a 400 level module.

Blanks sometimes appear before and after the ASN.1 sequence in ISO 10303 documents. Need to query the CAX-IF for opinion on ignoring leading and trailing blanks.

Possible Future Changes
