RCSB PDB Protein Data Bank A Member of the wwPDB
An Information Portal to Biological Macromolecular Structures

PROPOSED DATA ITEMS DESCRIBING PROTEIN PRODUCTION and CRYSTALLIZATION

(Revised: 8-July-2003 J. Westbrook)

 

These items are intended to allow reproduction of the protein production and

are aimed at the same level of detail as a J. Mol. Biol. publication.

 

Please email jon@strubi.ox.ac.uk with any comments/questions on these items.

Further details are available from http://www.oppf.ox.ac.uk/ (mmCIF2.htm).

 

 

01-Jul-02: Starting dictionary obtained from http://pdb.rutgers.edu/mmcif.

               Original dictionary developed with significant input from

               Cathy Lawson (Rutgers), Rosalind Kim (Berkeley),

               Kim Henrick (EBI/MSD), and John Westbrook (Rutgers).

 

Changes

 

12-Jul-02:  The *method items are likely to be enumerated lists rather than free text

            Development stage dictionary item name has changed to _entity_src_gen.gene_src_dev_stage –

            it was not a .pdbx_ item

            _entity_src_gen_clone.gene_insert_method has been added and mention of

            _entity_src_gen.pdbx_gene_insert_method removed

 

15-Jul-02:  Several item name changes

            ENTITY_SRC_GEN_CLONE_LIGATION_FREE has changed name to ENTITY_SRC_GEN_CLONE_RECOMBINATION.

            _entity_src_gen_express.protein_location removed – covered by

            _entity_src_gen.pdbx_host_org_cellular_location.

            Added _entity_src_gen_lysis.time

            Added _entity_src_gen_fract.protein_volume

            Removed _entity_src_gen_chrom.fraction_volume

            Added _entity_src_gen_refold.time

            Protein characterisation split out of ENTITY_SRC_GEN_PURE into ENTITY_SRC_GEN_PURE_CHARACTER

 

20-Aug-02: Corrections as pointed out by Ulrich Harttig, PSF (Berlin)

            Removed _entity_src_gen.pdbx_host_org_gene_fusion_details

            Added _entity_src_gen_express.inducer

            Added _entity_src_gen_express.inducer_concentration

            Removed _entity_src_gen_express.expression_level – value unknowable at this time

            Added _entity_src_gen_fract.protein_yield_method

            Renamed _entity_src_gen_chrom.protein_concentration to _entity_src_gen_chrom.sample_concentration

            Added _entity_src_gen_chrom.elution_buffer_id

            Added _entity_src_gen_chrom.sample_conc_method

            Added _entity_src_gen_chrom.yield_method

            Renamed _entity_src_gen_remove_tag.tag_removal_details to _entity_src_gen_remove_tag.details

            Removed _entity_src_gen_remove_tag.cleavage_site – derivable

            Removed _entity_src_gen_pure_character.pure_id

            Added _entity_src_gen_pure_character.prod_step_id

 

08-Oct-02: Several alterations arising from discussions with John Ionides, EBI (Cambridge)

            Entry has been typed – this is to facilitate data exchange using mmCIF

            Entity references a globally unique target – source of this is a point for discussion

            The sequence-storage category is a new category, PDBX_CONSTRUCT

            Sequence annotation is in PDBX_CONSTRUCT_FEATURE

            The expression category has been extended to contain expression strain information – seems sensible

            The OD timepoints have been split off into a sub category – previous attempt was not valid mmCIF

            ENTITY_SRC_GEN_REMOVE_TAG has been renamed ENTITY_SRC_GEN_PROTEOLYSIS – improve generality

            ENTITY_SRC_GEN_CHARACTER has been reworded to be applicable at any stage

            A generic process step category has been added – ENTITY_SRC_GEN_PROD_OTHER +

            ENTITY_SRC_GEN_PROD_OTHER_PARAMETER

            The workflow-tracing item prod_step_id has been renamed next_step_id – seems a more useful item

 

7-July-2003  Added crystallization data items.

 

Known Omissions and Issues

 

Cell-free expression is missing

 

Transformation method for expression is not explicitly mentioned – it was hoped that this could be covered by the transformation items in the cloning category but it may be easier to simply add the requisite data items to the expression category also.

 

Data Items

 

The *details items generally allow free text entry. Other items generally allow numeric data or selection from a predefined enumerated list. Current thinking is more clearly expressed in the web pages available from http://www.oppf.ox.ac.uk/ (mmCIF2.htm).

 

Further information about the dictionaries and mmCIF can be found at http://pdb.rutgers.edu/mmcif/ .


ENTRY

NB: Items are only detailed here where they modify or extend the pre-existing ENTRY category

 

Item name

Description

_entry.id

The value of _entry.id identifies the data block. Note that this item need not be a number; it can be any unique identifier. In the context of a structural genomics project this identifier, when prefixed by the value of _entry.type, should be globally unique. For protein production it is envisaged that the value of _entry.id should uniquely define a product from a protein production run. This will normally be a protein sample but may also be an expression vector.

_entry.type

For exchange within a structural genomics project, the value of _entry.type identifies the type of data given in the data block. A null value for this item indicates that this is not a structural genomics exchange mmCIF but rather a complete entry that conforms to the _pdbx_style definitions. It is envisaged that both 'P' and 'E' are possible products of a protein production facility and would be identical in all aspects except that 'P' describes a sample that is (predominantly) protein and 'E' describes a sample that is a nucleic acid.
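As an illustration of the prefixing rule described for _entry.id, the sketch below builds a project-wide identifier from the two items. The function name and the sample identifier are hypothetical, and plain concatenation with no separator is an assumption; the text only says that the id is "prefixed" by the type.

```python
def global_entry_id(entry_type, entry_id):
    """Return _entry.id prefixed by _entry.type (hypothetical helper).

    A null type (None here) marks a complete entry rather than a
    structural genomics exchange file, so the id is returned unchanged.
    """
    if entry_type is None:
        return entry_id
    if entry_type not in ("P", "E"):
        # 'P' (protein) and 'E' (nucleic acid) are the only types named in the text.
        raise ValueError(f"unexpected _entry.type: {entry_type!r}")
    # Assumption: simple concatenation, no separator character.
    return f"{entry_type}{entry_id}"


# Hypothetical identifiers, for illustration only.
protein_sample = global_entry_id("P", "0001")
expression_vector = global_entry_id("E", "0001")
```

With this convention a protein sample and an expression vector from the same production run can share the same _entry.id and still be told apart.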

ENTITY

NB: Items are only detailed here where they modify or extend the pre-existing ENTITY category

 

Item name

Description

_entity.id

The value of _entity.id must uniquely identify a record in the ENTITY list. Note that this item need not be a number; it can be any unique identifier.

_entity.target_id

The value of _entity.target_id points to the target identifier from which this entity was generated.

 


ENTITY_SRC_GEN

NB: Items are only detailed here where they modify or extend the pre-existing ENTITY_SRC_GEN category

 

Item name

Description

_entity_src_gen.entity_id

This data item is a pointer to _entity.id in the ENTITY category.

_entity_src_gen.host_org_common_name

The common name of the organism that served as host for the production of the entity. Where full details of the protein production are available it would be expected that this item be derived from _entity_src_gen_express.host_org_common_name or via _entity_src_gen_express.host_org_tax_id

_entity_src_gen.host_org_details

A description of special aspects of the organism that served as host for the production of the entity. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.host_org_details.

_entity_src_gen.host_org_strain

The strain of the organism in which the entity was expressed. Where full details of the protein production are available it would be expected that this item be derived from _entity_src_gen_express.host_org_strain or via _entity_src_gen_express.host_org_tax_id

_entity_src_gen.plasmid_details

A description of special aspects of the plasmid that produced the entity in the host organism. Where full details of the protein production are available it would be expected that this item would be derived from _pdbx_construct.details of the construct pointed to from _entity_src_gen_express.plasmid_id.

_entity_src_gen.plasmid_name

The name of the plasmid that produced the entity in the host organism. Where full details of the protein production are available it would be expected that this item would be derived from _pdbx_construct.name of the construct pointed to from _entity_src_gen_express.plasmid_id.

_entity_src_gen.pdbx_host_org_variant

Variant of the organism used as the expression system. Where full details of the protein production are available it would be expected that this item be derived from _entity_src_gen_express.host_org_variant or via _entity_src_gen_express.host_org_tax_id.

_entity_src_gen.pdbx_host_org_cell_line

A specific line of cells used as the expression system. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.host_org_cell_line.

_entity_src_gen.pdbx_host_org_atcc

American Type Culture Collection (ATCC) identifier of the expression system. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.host_org_culture_collection.

_entity_src_gen.pdbx_host_org_culture_collection

Culture collection of the expression system. Where full details of the protein production are available it would be expected that this item would be derived from another item, but exactly which one is not yet clear.

_entity_src_gen.pdbx_host_org_cell

Cell type from which the gene is derived. Where _entity.target_id is provided this should be derived from details of the target.

_entity_src_gen.pdbx_host_org_scientific_name

The scientific name of the organism that served as host for the production of the entity. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.host_org_scientific_name or via _entity_src_gen_express.host_org_tax_id

_entity_src_gen.pdbx_host_org_tissue

The specific tissue which expressed the molecule. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.host_org_tissue

_entity_src_gen.pdbx_host_org_vector

Identifies the vector used. Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_clone.vector_name.

_entity_src_gen.pdbx_host_org_vector_type

Identifies the type of vector used (plasmid, virus, or cosmid). Where full details of the protein production are available it would be expected that this item would be derived from _entity_src_gen_express.vector_type.

_entity_src_gen.expression_system_id

A unique identifier for the expression system. This should be extracted from a local list of expression systems.

_entity_src_gen.gene_src_dev_stage

A string to indicate the life-cycle or cell development cycle in which the gene is expressed and the mature protein is active.

_entity_src_gen.start_construct_id

A pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The identified sequence is the initial construct.
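Several ENTITY_SRC_GEN descriptions above say that a summary item "would be derived from" an item in ENTITY_SRC_GEN_EXPRESS. The sketch below shows one way such a derivation could be written. The mapping is read off those descriptions; the function name, the dictionary-based row representation and the sample values are assumptions made for illustration.

```python
# Summary item in ENTITY_SRC_GEN -> source item in ENTITY_SRC_GEN_EXPRESS,
# as suggested by the item descriptions above.
DERIVED_FROM = {
    "host_org_common_name": "host_org_common_name",
    "host_org_details": "host_org_details",
    "host_org_strain": "host_org_strain",
    "pdbx_host_org_variant": "host_org_variant",
    "pdbx_host_org_cell_line": "host_org_cell_line",
    "pdbx_host_org_atcc": "host_org_culture_collection",
    "pdbx_host_org_scientific_name": "host_org_scientific_name",
    "pdbx_host_org_tissue": "host_org_tissue",
    "pdbx_host_org_vector_type": "vector_type",
}

# mmCIF uses '?' for unknown and '.' for inapplicable values.
MISSING = (None, "?", ".")


def derive_summary(express_row):
    """Build ENTITY_SRC_GEN summary values from one expression row (a dict)."""
    summary = {}
    for target, source in DERIVED_FROM.items():
        value = express_row.get(source)
        if value not in MISSING:
            summary[f"_entity_src_gen.{target}"] = value
    return summary


# Illustrative values only.
row = {
    "host_org_scientific_name": "Escherichia coli",
    "host_org_strain": "BL21(DE3)",
    "vector_type": "plasmid",
    "host_org_tissue": "?",
}
summary = derive_summary(row)
```

Items whose source is a taxonomy look-up or a construct pointer (for example the plasmid name) are left out, since they need data beyond the expression row.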

ENTITY_SRC_GEN_PROD_DIGEST

This category contains details for the DIGEST steps used in the overall protein production process. The digestion is assumed to be applied to the result of the previous production step, or the gene source if this is the first production step.

Item name

Description

_entity_src_gen_prod_digest.entry_id

The value of _entity_src_gen_prod_digest.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_prod_digest.entity_id

The value of _entity_src_gen_prod_digest.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_prod_digest.step_id

This item is the unique identifier for this digestion step.

_entity_src_gen_prod_digest.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_prod_digest.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the digest product.

_entity_src_gen_prod_digest.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_prod_digest.date

The date of this production step.

_entity_src_gen_prod_digest.restriction_enzyme_1

The first enzyme used in the restriction digestion. The sites at which this cuts can be derived from the sequence.

_entity_src_gen_prod_digest.restriction_enzyme_2

The second enzyme used in the restriction digestion. The sites at which this cuts can be derived from the sequence.

_entity_src_gen_prod_digest.purification_details

String value containing details of any purification of the product of the digestion.
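The restriction enzyme descriptions note that the sites at which an enzyme cuts can be derived from the sequence. A minimal sketch of that derivation follows. It reports where each recognition site starts (0-based), not the exact cut position within the site. EcoRI (GAATTC) and BamHI (GGATCC) are used as examples; the function name, the two-entry lookup table and the test sequence are illustrative assumptions.

```python
# Recognition sequences for two common enzymes. Both are palindromic,
# so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}


def find_sites(sequence, enzyme):
    """Return 0-based start positions of the enzyme's recognition site."""
    site = RECOGNITION_SITES[enzyme]
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping sites are not missed.
        start = seq.find(site, start + 1)
    return positions


# Made-up sequence for illustration.
construct = "AAGAATTCTTGGATCCAAGAATTC"
ecori_sites = find_sites(construct, "EcoRI")   # [2, 18]
bamhi_sites = find_sites(construct, "BamHI")   # [10]
```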


ENTITY_SRC_GEN_PROD_PCR

This category contains details for the PCR steps used in the overall protein production process. The PCR is assumed to be applied to the result of the previous production step, or the gene source if this is the first production step.

Item name

Description

_entity_src_gen_prod_pcr.entry_id

The value of _entity_src_gen_prod_pcr.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_prod_pcr.entity_id

The value of _entity_src_gen_prod_pcr.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_prod_pcr.step_id

This item is the unique identifier for this PCR step.

_entity_src_gen_prod_pcr.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_prod_pcr.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the PCR product.

_entity_src_gen_prod_pcr.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category. The referenced robot is the robot responsible for the PCR reaction (normally the thermal cycler).

_entity_src_gen_prod_pcr.date

The date of this production step.

_entity_src_gen_prod_pcr.forward_primer_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the forward primer.

_entity_src_gen_prod_pcr.reverse_primer_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the reverse primer.

_entity_src_gen_prod_pcr.reaction_details

String value containing details of the PCR reaction.

_entity_src_gen_prod_pcr.purification_details

String value containing details of any purification of the product of the PCR reaction.
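The step_id and next_step_id items in each production category link the steps into a workflow that may have several entry points and one product. The sketch below shows how such a workflow could be traced. It assumes that step identifiers are unique across all production categories and that the final step carries a null next_step_id (None here); neither assumption is stated in the text, and the step names are hypothetical.

```python
def entry_points(steps):
    """Steps that no other step names as its next_step_id."""
    targets = {nxt for nxt in steps.values() if nxt is not None}
    return sorted(step for step in steps if step not in targets)


def trace(steps, start):
    """Follow next_step_id from `start` until the final step is reached."""
    path = [start]
    seen = {start}
    while steps[path[-1]] is not None:
        nxt = steps[path[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at step {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
    return path


# step_id -> next_step_id, gathered from all production categories.
# The insert (via PCR and digestion) and the vector digest both lead
# to the same cloning step.
steps = {
    "pcr1": "digest1",
    "digest1": "clone1",
    "vector_digest": "clone1",
    "clone1": None,
}
starts = entry_points(steps)    # ['pcr1', 'vector_digest']
insert_path = trace(steps, "pcr1")
```

Storing the forward pointer rather than a backward one is what lets two branches converge on a single step with only one value per row.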

ENTITY_SRC_GEN_CLONE

This category contains details for the cloning steps used in the overall protein production process. Each row in ENTITY_SRC_GEN_CLONE should have an equivalent row in either ENTITY_SRC_GEN_CLONE_LIGATION or ENTITY_SRC_GEN_CLONE_RECOMBINATION.

Item name

Description

_entity_src_gen_clone.entry_id

The value of _entity_src_gen_clone.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_clone.entity_id

The value of _entity_src_gen_clone.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_clone.step_id

This item is the unique identifier for this cloning step.

_entity_src_gen_clone.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_clone.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the cloned product.

_entity_src_gen_clone.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_clone.date

The date of this production step.

_entity_src_gen_clone.gene_insert_method

The method used to insert the gene into the vector. For 'Ligation', an ENTITY_SRC_GEN_CLONE_LIGATION entry with matching .step_id is expected. For 'Recombination', an ENTITY_SRC_GEN_CLONE_RECOMBINATION entry with matching .step_id is expected.

_entity_src_gen_clone.vector_name

The name of the vector used in this cloning step.

_entity_src_gen_clone.vector_details

Details of any modifications made to the named vector.

_entity_src_gen_clone.transformation_method

The method used to transform the expression cell line with the vector.

_entity_src_gen_clone.marker

The type of marker included to allow selection of transformed cells.

_entity_src_gen_clone.verification_method

The method used to verify that the incorporated gene is correct.

_entity_src_gen_clone.purification_details

Details of any purification of the product.

ENTITY_SRC_GEN_CLONE_LIGATION

This category contains details for the ligation-based cloning steps used in the overall protein production process. _entity_src_gen_clone_ligation.step_id in this category must point at a defined _entity_src_gen_clone.step_id. The details in ENTITY_SRC_GEN_CLONE_LIGATION extend the details in ENTITY_SRC_GEN_CLONE to cover ligation-dependent cloning steps.

Item name

Description

_entity_src_gen_clone_ligation.entry_id

This item is a pointer to _entity_src_gen_clone.entry_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_ligation.entity_id

This item is a pointer to _entity_src_gen_clone.entity_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_ligation.step_id

This item is a pointer to _entity_src_gen_clone.step_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_ligation.cleavage_enzymes

The names of the enzymes used to cleave the vector. In addition an enzyme used to blunt the cut ends, etc., should be named here.

_entity_src_gen_clone_ligation.ligation_enzymes

The names of the enzymes used to ligate the gene into the cleaved vector.

_entity_src_gen_clone_ligation.temperature

The temperature at which the ligation experiment was performed, in degrees Celsius.

_entity_src_gen_clone_ligation.time

The duration of the ligation reaction in minutes.

_entity_src_gen_clone_ligation.details

Any details to be associated with this ligation step, e.g. the protocol.

ENTITY_SRC_GEN_CLONE_RECOMBINATION

This category contains details for the recombination-based cloning steps used in the overall protein production process. It is assumed that these reactions will use commercially available kits. _entity_src_gen_clone_recombination.step_id in this category must point at a defined _entity_src_gen_clone.step_id. The details in ENTITY_SRC_GEN_CLONE_RECOMBINATION extend the details in ENTITY_SRC_GEN_CLONE to cover recombination-dependent cloning steps.

Item name

Description

_entity_src_gen_clone_recombination.entry_id

This item is a pointer to _entity_src_gen_clone.entry_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_recombination.entity_id

This item is a pointer to _entity_src_gen_clone.entity_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_recombination.step_id

This item is a pointer to _entity_src_gen_clone.step_id in the ENTITY_SRC_GEN_CLONE category.

_entity_src_gen_clone_recombination.system

The name of the recombination system.

_entity_src_gen_clone_recombination.recombination_enzymes

The names of the enzymes used for this recombination step.

_entity_src_gen_clone_recombination.details

Any details to be associated with this recombination step, e.g. the protocol or differences from the manufacturer's specified protocol.
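Each ENTITY_SRC_GEN_CLONE row is expected to have a matching row in ENTITY_SRC_GEN_CLONE_LIGATION or ENTITY_SRC_GEN_CLONE_RECOMBINATION, selected by gene_insert_method. A consistency check along those lines might look like the sketch below. Rows are represented as plain dictionaries, and the function name and sample identifiers are assumptions made for illustration.

```python
def row_key(row):
    """The three pointer items shared by a cloning row and its extension row."""
    return (row["entry_id"], row["entity_id"], row["step_id"])


def missing_extension_rows(clone_rows, ligation_rows, recombination_rows):
    """List cloning steps whose expected extension row is absent."""
    available = {
        "Ligation": {row_key(r) for r in ligation_rows},
        "Recombination": {row_key(r) for r in recombination_rows},
    }
    problems = []
    for row in clone_rows:
        method = row["gene_insert_method"]
        # Only the two methods named in the text are checked.
        if method in available and row_key(row) not in available[method]:
            problems.append((row_key(row), method))
    return problems


# Hypothetical rows.
clone_rows = [
    {"entry_id": "E1", "entity_id": "1", "step_id": "c1",
     "gene_insert_method": "Ligation"},
    {"entry_id": "E1", "entity_id": "1", "step_id": "c2",
     "gene_insert_method": "Recombination"},
]
ligation_rows = [{"entry_id": "E1", "entity_id": "1", "step_id": "c1"}]
recombination_rows = []

problems = missing_extension_rows(clone_rows, ligation_rows, recombination_rows)
```

Here the second cloning step is reported because no recombination row carries its step_id.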

ENTITY_SRC_GEN_EXPRESS

This category contains details for the EXPRESSION steps used in the overall protein production process. It is hoped that this category will cover all forms of cell-based expression by reading induction as induction/transformation/transfection.

Item name

Description

_entity_src_gen_express.entry_id

The value of _entity_src_gen_express.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_express.entity_id

The value of _entity_src_gen_express.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_express.step_id

This item is the unique identifier for this expression step.

_entity_src_gen_express.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_express.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product.

_entity_src_gen_express.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_express.date

The date of this production step.

_entity_src_gen_express.promoter_type

The nature of the promoter controlling expression of the gene.

_entity_src_gen_express.plasmid_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced entry will contain the nucleotide sequence that is to be expressed, including tags.

_entity_src_gen_express.vector_type

Identifies the type of vector used (plasmid, virus, or cosmid) in the expression system.

_entity_src_gen_express.N_terminal_seq_tag

Any N-terminal sequence tag as a string of one letter amino acid codes.

_entity_src_gen_express.C_terminal_seq_tag

Any C-terminal sequence tag as a string of one letter amino acid codes.

_entity_src_gen_express.host_org_scientific_name

The scientific name of the organism that served as host for the expression system. It is expected that either this item or _entity_src_gen_express.host_org_tax_id should be populated.

_entity_src_gen_express.host_org_common_name

The common name of the organism that served as host for the expression system. Where _entity_src_gen_express.host_org_tax_id is populated it is expected that this item may be derived by look up against the taxonomy database.

_entity_src_gen_express.host_org_variant

The variant of the organism that served as host for the expression system. Where _entity_src_gen_express.host_org_tax_id is populated it is expected that this item may be derived by a look up against the taxonomy database.

_entity_src_gen_express.host_org_strain

The strain of the organism that served as host for the expression system. Where _entity_src_gen_express.host_org_tax_id is populated it is expected that this item may be derived by a look up against the taxonomy database.

_entity_src_gen_express.host_org_tissue

The specific tissue which expressed the molecule.

_entity_src_gen_express.host_org_culture_collection

Culture collection of the expression system.

_entity_src_gen_express.host_org_cell_line

A specific line of cells used as the expression system.

_entity_src_gen_express.host_org_tax_id

The id for the NCBI taxonomy node corresponding to the organism that served as host for the expression system.

_entity_src_gen_express.host_org_details

A description of special aspects of the organism that served as host for the expression system.

_entity_src_gen_express.culture_base_media

The name of the base media in which the expression host was grown.

_entity_src_gen_express.culture_additives

Any additives to the base media in which the expression host was grown.

_entity_src_gen_express.culture_volume

The volume of media in millilitres in which the expression host was grown.

_entity_src_gen_express.culture_time

The time in hours for which the expression host was allowed to grow prior to induction/transformation/transfection.

_entity_src_gen_express.culture_temperature

The temperature in degrees Celsius at which the expression host was allowed to grow prior to induction/transformation/transfection.

_entity_src_gen_express.inducer

The chemical name of the inducing agent.

_entity_src_gen_express.inducer_concentration

Concentration of the inducing agent.

_entity_src_gen_express.induction_details

Details of induction/transformation/transfection.

_entity_src_gen_express.multiplicity_of_infection

The multiplicity of infection for genes introduced by transfection, e.g. for baculovirus-based expression.

_entity_src_gen_express.induction_timepoint

The time in hours after induction/transformation/transfection at which the optical density of the culture was measured.

_entity_src_gen_express.induction_temperature

The temperature in degrees Celsius at which the induced/transformed/transfected cells were grown.

_entity_src_gen_express.harvesting_details

Details of the harvesting protocol.

_entity_src_gen_express.storage_details

Details of how the harvested culture was stored.
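Two constraints stated in the ENTITY_SRC_GEN_EXPRESS descriptions lend themselves to an automatic check: sequence tags are strings of one letter amino acid codes, and either the host scientific name or the NCBI taxonomy id should be populated. The sketch below checks both. The function name, the restriction to the 20 standard residues and the sample rows are assumptions.

```python
# The 20 standard amino acids. Accepting only these is an assumption;
# the text does not say whether non-standard codes are allowed.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_express_row(row):
    """Return a list of problems found in one expression row (a dict)."""
    problems = []
    for item in ("N_terminal_seq_tag", "C_terminal_seq_tag"):
        tag = row.get(item)
        if tag and not set(tag.upper()) <= AMINO_ACIDS:
            problems.append(f"{item}: not a string of one letter amino acid codes")
    if not row.get("host_org_scientific_name") and not row.get("host_org_tax_id"):
        problems.append("host organism: scientific name or tax id expected")
    return problems


# Illustrative rows. 562 is the NCBI taxonomy id for Escherichia coli.
good = {"N_terminal_seq_tag": "MHHHHHH", "host_org_tax_id": "562"}
bad = {"N_terminal_seq_tag": "HIS6"}

good_problems = check_express_row(good)
bad_problems = check_express_row(bad)
```

The second row fails both checks: "HIS6" is a tag name rather than a residue string, and no host organism is given.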

ENTITY_SRC_GEN_EXPRESS_TIMEPOINT

This category contains details for OD time series used to monitor a given EXPRESSION step used in the overall protein production process.

Item name

Description

_entity_src_gen_express_timepoint.entry_id

The value of _entity_src_gen_express_timepoint.entry_id is a pointer to _entity_src_gen_express.entry_id

_entity_src_gen_express_timepoint.entity_id

The value of _entity_src_gen_express_timepoint.entity_id is a pointer to _entity_src_gen_express.entity_id

_entity_src_gen_express_timepoint.step_id

This item is a pointer to _entity_src_gen_express.step_id

_entity_src_gen_express_timepoint.serial

This item uniquely defines a timepoint within a series.

_entity_src_gen_express_timepoint.OD

The optical density of the expression culture in arbitrary units at the timepoint specified.

_entity_src_gen_express_timepoint.time

The time in hours after induction/transformation/transfection at which the optical density of the culture was measured.
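Because the OD measurements live in their own category, a reader of the file has to reassemble each series from its rows. The sketch below groups timepoint rows by the three pointer items and orders them by serial. The function name and the measurements are invented for illustration, and ordering by the numeric value of serial is an assumption.

```python
from collections import defaultdict


def od_series(timepoint_rows):
    """Group OD timepoints by expression step, ordered by serial.

    Returns {(entry_id, entity_id, step_id): [(time_hours, od), ...]}.
    """
    grouped = defaultdict(list)
    for row in timepoint_rows:
        key = (row["entry_id"], row["entity_id"], row["step_id"])
        grouped[key].append(
            (int(row["serial"]), float(row["time"]), float(row["OD"]))
        )
    # Sort on serial, then drop it from the output.
    return {
        key: [(time, od) for _, time, od in sorted(points)]
        for key, points in grouped.items()
    }


# Invented measurements; values arrive as strings, as in an mmCIF loop.
rows = [
    {"entry_id": "E1", "entity_id": "1", "step_id": "expr1",
     "serial": "2", "time": "2.0", "OD": "0.8"},
    {"entry_id": "E1", "entity_id": "1", "step_id": "expr1",
     "serial": "1", "time": "0.0", "OD": "0.4"},
]
series = od_series(rows)
```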

ENTITY_SRC_GEN_LYSIS

This category contains details for the cell lysis steps used in the overall protein production process.

Item name

Description

_entity_src_gen_lysis.entry_id

The value of _entity_src_gen_lysis.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_lysis.entity_id

The value of _entity_src_gen_lysis.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_lysis.step_id

This item is the unique identifier for this lysis step.

_entity_src_gen_lysis.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_lysis.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product after lysis.

_entity_src_gen_lysis.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_lysis.date

The date of this production step.

_entity_src_gen_lysis.method

The lysis method.

_entity_src_gen_lysis.buffer_id

This item is a pointer to _pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the lysis was performed.

_entity_src_gen_lysis.buffer_volume

The volume in millilitres of buffer in which the lysis was performed.

_entity_src_gen_lysis.temperature

The temperature in degrees Celsius at which the lysis was performed.

_entity_src_gen_lysis.time

The time in seconds of the lysis experiment.

_entity_src_gen_lysis.details

String value containing details of the lysis protocol.

ENTITY_SRC_GEN_REFOLD

This category contains details for the refolding steps used in the overall protein production process.

Item name

Description

_entity_src_gen_refold.entry_id

The value of _entity_src_gen_refold.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_refold.entity_id

The value of _entity_src_gen_refold.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_refold.step_id

This item is the unique identifier for this refolding step.

_entity_src_gen_refold.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_refold.end_construct_id

This item is a pointer to _pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product after the refolding step.

_entity_src_gen_refold.robot_id

This data item is a pointer to _pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_refold.date

The date of this production step.

_entity_src_gen_refold.denature_buffer_id

This item is a pointer to _pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the protein was denatured.

_entity_src_gen_refold.refold_buffer_id

This item is a pointer to _pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the protein was refolded.

_entity_src_gen_refold.temperature

The temperature in degrees Celsius at which the protein was refolded.

_entity_src_gen_refold.time

The time in hours over which the protein was refolded.

_entity_src_gen_refold.storage_buffer_id

This item is a pointer to _pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the refolded protein was stored.

_entity_src_gen_refold.details

String value containing details of the refolding.
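The time items defined so far use different units: minutes for ligation, hours for culture, OD timepoints and refolding, and seconds for lysis. Software that compares durations across steps may want to normalise them first. The sketch below does so with a lookup table transcribed from the item descriptions; the function name and the sample values are assumptions.

```python
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600}

# Units as given in the item descriptions above.
TIME_UNITS = {
    "_entity_src_gen_clone_ligation.time": "minutes",
    "_entity_src_gen_express.culture_time": "hours",
    "_entity_src_gen_express_timepoint.time": "hours",
    "_entity_src_gen_lysis.time": "seconds",
    "_entity_src_gen_refold.time": "hours",
}


def to_seconds(item_name, value):
    """Convert a time value to seconds using the unit defined for its item."""
    unit = TIME_UNITS[item_name]
    return float(value) * SECONDS_PER_UNIT[unit]


# Sample values for illustration.
ligation_s = to_seconds("_entity_src_gen_clone_ligation.time", 30)
lysis_s = to_seconds("_entity_src_gen_lysis.time", 90)
refold_s = to_seconds("_entity_src_gen_refold.time", 1.5)
```

Keeping the units in one table also makes the inconsistency between categories easy to spot when new time items are proposed.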

ENTITY_SRC_GEN_PROTEOLYSIS

This category contains details for the protein purification tag removal steps used in the overall protein production process.

Item name

Description

_entity_src_gen_proteolysis.entry_id

The value of _entity_src_gen_proteolysis.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_proteolysis.entity_id

The value of _entity_src_gen_proteolysis.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_proteolysis.step_id

This item is the unique identifier for this tag removal step.

_entity_src_gen_proteolysis.next_step_id

This item unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_proteolysis.end_construct_id

This item is a pointer to pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product after the proteolysis step.

_entity_src_gen_proteolysis.robot_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_proteolysis.date

The date of this production step.

_entity_src_gen_proteolysis.details

Details of this tag removal step.

_entity_src_gen_proteolysis.protease

The name of the protease used for cleavage.

_entity_src_gen_proteolysis.protein_protease_ratio

The ratio of protein to protease used for the cleavage. = mol protein / mol protease

_entity_src_gen_proteolysis.cleavage_buffer_id

This item is a pointer to pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the cleavage was performed.

_entity_src_gen_proteolysis.cleavage_temperature

The temperature in degrees Celsius at which the cleavage was performed.

_entity_src_gen_proteolysis.cleavage_time

The time in minutes for the cleavage reaction.
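As a worked illustration of _entity_src_gen_proteolysis.protein_protease_ratio (mol protein / mol protease), the following minimal Python sketch derives the ratio from masses and molecular weights. The function name, argument names and example figures are assumptions made for illustration only; they are not part of the dictionary.

```python
def protein_protease_ratio(protein_mg, protein_mw_da, protease_mg, protease_mw_da):
    """Return mol protein / mol protease.

    Masses are in milligrams; molecular weights are in daltons (g/mol).
    """
    values = (protein_mg, protein_mw_da, protease_mg, protease_mw_da)
    if min(values) <= 0:
        raise ValueError("masses and molecular weights must be positive")
    protein_mol = protein_mg / 1000.0 / protein_mw_da      # mg -> g -> mol
    protease_mol = protease_mg / 1000.0 / protease_mw_da   # mg -> g -> mol
    return protein_mol / protease_mol


# Hypothetical figures: 10 mg of a 50 kDa protein, 0.1 mg of a 25 kDa protease.
ratio = protein_protease_ratio(10.0, 50000.0, 0.1, 25000.0)
print(round(ratio, 6))
```

With these assumed figures the protein amounts to 2e-7 mol and the protease to 4e-9 mol, so the recorded ratio would be approximately 50.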

ENTITY_SRC_GEN_FRACT

This category contains details for the fractionation steps used in the overall protein production process. Examples of fractionation steps are centrifugation and magnetic bead pull-down purification.

Item name

Description

_entity_src_gen_fract.entry_id

The value of _entity_src_gen_fract.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_fract.entity_id

The value of _entity_src_gen_fract.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_fract.step_id

This item is the unique identifier for this fractionation step.

_entity_src_gen_fract.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_fract.end_construct_id

This item is a pointer to pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product after the fractionation step.

_entity_src_gen_fract.robot_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_fract.date

The date of this production step.

_entity_src_gen_fract.method

This item describes the method of fractionation.

_entity_src_gen_fract.temperature

The temperature in degrees Celsius at which the fractionation was performed.

_entity_src_gen_fract.details

String value containing details of the fractionation.

_entity_src_gen_fract.protein_location

The fraction containing the protein of interest.

_entity_src_gen_fract.protein_volume

The volume of the fraction containing the protein.

_entity_src_gen_fract.protein_yield

The yield in milligrammes of protein from the fractionation.

_entity_src_gen_fract.protein_yield_method

The method used to determine the yield.
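The step_id and next_step_id items link production steps into a workflow that may have several entry points converging on one product. The sketch below shows one way such links could be followed once the values have been read into a Python dictionary. The step identifiers, the use of None for a step with no successor, and the function names are all assumptions for illustration; the dictionary itself does not prescribe them.

```python
def entry_points(next_step):
    """Return the steps that no other step names as its next_step_id."""
    targets = {n for n in next_step.values() if n is not None}
    return sorted(s for s in next_step if s not in targets)


def trace(next_step, start):
    """Follow next_step_id links from start until a step has no successor."""
    path, seen = [], set()
    step = start
    while step is not None:
        if step in seen:
            raise ValueError(f"cycle detected at step {step!r}")
        seen.add(step)
        path.append(step)
        step = next_step.get(step)
    return path


# Hypothetical step ids: two lysis branches feeding a single purification route.
next_step = {
    "lysis_A": "fract_1",
    "lysis_B": "fract_1",
    "fract_1": "chrom_1",
    "chrom_1": "pure_1",
    "pure_1": None,
}

print(entry_points(next_step))
print(trace(next_step, "lysis_B"))
```

Here both lysis steps are entry points, and tracing from either of them ends at the same final step, which is the situation the next_step_id description refers to.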

ENTITY_SRC_GEN_CHROM

This category contains details for the chromatographic steps used in the purification of the protein.

Item name

Description

_entity_src_gen_chrom.entry_id

The value of _entity_src_gen_chrom.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_chrom.entity_id

The value of _entity_src_gen_chrom.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_chrom.step_id

This item is the unique identifier for this chromatography step.

_entity_src_gen_chrom.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_chrom.end_construct_id

This item is a pointer to pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced sequence is expected to be the amino acid sequence of the expressed product after the chromatography step.

_entity_src_gen_chrom.robot_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_chrom.date

The date of this production step.

_entity_src_gen_chrom.column_type

The type of column used in this step.

_entity_src_gen_chrom.column_volume

The volume of the column used in this step.

_entity_src_gen_chrom.column_temperature

The temperature in degrees Celsius at which this column was run.

_entity_src_gen_chrom.equilibration_buffer_id

This item is a pointer to pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the column was equilibrated.

_entity_src_gen_chrom.flow_rate

The rate at which the equilibration buffer flowed through the column.

_entity_src_gen_chrom.elution_buffer_id

This item is a pointer to pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that with which the protein was eluted.

_entity_src_gen_chrom.elution_protocol

Details of the elution protocol.

_entity_src_gen_chrom.sample_prep_details

Details of the sample preparation prior to running the column.

_entity_src_gen_chrom.sample_volume

The volume of protein solution run on the column.

_entity_src_gen_chrom.sample_concentration

The concentration of the protein solution put onto the column.

_entity_src_gen_chrom.sample_conc_method

The method used to determine the concentration of the protein solution put onto the column.

_entity_src_gen_chrom.volume_pooled_fractions

The total volume of all the fractions pooled to give the purified protein solution.

_entity_src_gen_chrom.yield_pooled_fractions

The yield in milligrammes of protein recovered in the pooled fractions.

_entity_src_gen_chrom.yield_method

The method used to determine the yield.

_entity_src_gen_chrom.post_treatment

Details of any post-chromatographic treatment of the protein sample.

ENTITY_SRC_GEN_PURE

This category contains details for the final purified protein product. Note that this category does not contain the amino acid sequence of the protein. The sequence will be found in the ENTITY_POLY_SEQ entry with matching entity_id. Only one ENTITY_SRC_GEN_PURE category is allowed per entity, hence there is no step_id for this category.

Item name

Description

_entity_src_gen_pure.entry_id

The value of _entity_src_gen_pure.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_pure.entity_id

The value of _entity_src_gen_pure.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_pure.step_id

This item is the unique identifier for the production step.

_entity_src_gen_pure.product_id

When present, this item should be a globally unique identifier that identifies the final product. It is envisaged that this should be the same as any product code associated with the sample and would provide the key by which information about the production process may be extracted from the protein production facility. For files describing the protein production process (i.e. where _entity.type is 'P' or 'E') this should have the same value as _entry.id.

_entity_src_gen_pure.date

The date of this production step.

_entity_src_gen_pure.conc_device_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_pure.conc_details

Details of the protein concentration procedure.

_entity_src_gen_pure.conc_assay_method

The method used to measure the protein concentration.

_entity_src_gen_pure.protein_concentration

The final concentration of the protein.

_entity_src_gen_pure.protein_yield

The yield of protein in milligrammes.

_entity_src_gen_pure.protein_purity

The purity of the protein.

_entity_src_gen_pure.protein_oligomeric_state

The oligomeric state of the protein. Monomeric is 1, dimeric 2, etc.

_entity_src_gen_pure.storage_buffer_id

This item is a pointer to pdbx_buffer.id in the PDBX_BUFFER category. The referenced buffer is that in which the protein was stored.

_entity_src_gen_pure.storage_temperature

The temperature in degrees Celsius at which the protein was stored.

ENTITY_SRC_GEN_CHARACTER

This category contains details of protein characterisation. It refers to the characterisation of the product of a specific step.

Item name

Description

_entity_src_gen_character.entry_id

The value of _entity_src_gen_character.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_character.entity_id

The value of _entity_src_gen_character.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_character.step_id

This item is the unique identifier for the step whose product has been characterised.

_entity_src_gen_character.robot_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category.

_entity_src_gen_character.date

The date of the characterisation step.

_entity_src_gen_character.method

The method used for protein characterisation.

_entity_src_gen_character.result

The result from this method of protein characterisation.

_entity_src_gen_character.details

Any details associated with this method of protein characterisation.

ENTITY_SRC_GEN_PROD_OTHER

This category contains details for process steps that are not explicitly catered for elsewhere. It provides some basic details as well as placeholders for a list of parameters and values (the category ENTITY_SRC_GEN_PROD_OTHER_PARAMETER). Note that processes that have been modelled explicitly should not be represented using this category.

Item name

Description

_entity_src_gen_prod_other.entry_id

The value of _entity_src_gen_prod_other.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_entity_src_gen_prod_other.entity_id

The value of _entity_src_gen_prod_other.entity_id uniquely identifies each protein contained in the project target protein complex whose structure is to be determined. This data item is a pointer to _entity.id in the ENTITY category. This item may be a site dependent bar code.

_entity_src_gen_prod_other.step_id

This item is the unique identifier for this process step.

_entity_src_gen_prod_other.next_step_id

This item is the unique identifier for the next production step. This allows a workflow to have multiple entry points leading to a single product.

_entity_src_gen_prod_other.end_construct_id

This item is a pointer to pdbx_construct.id in the PDBX_CONSTRUCT category. The referenced nucleic acid sequence is that of the product of the process step.

_entity_src_gen_prod_other.robot_id

This data item is a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category. The referenced robot is the robot responsible for the process step.

_entity_src_gen_prod_other.date

The date of this process step.

_entity_src_gen_prod_other.process_name

Name of this process step.

_entity_src_gen_prod_other.details

Additional details of this process step.

ENTITY_SRC_GEN_PROD_OTHER_PARAMETER

This category contains parameters and values required to capture information about a particular process step.

Item name

Description

_entity_src_gen_prod_other_parameter.entry_id

The value of _entity_src_gen_prod_other_parameter.entry_id is a pointer to _entity_src_gen_prod_other.entry_id.

_entity_src_gen_prod_other_parameter.entity_id

The value of _entity_src_gen_prod_other_parameter.entity_id is a pointer to _entity_src_gen_prod_other.entity_id.

_entity_src_gen_prod_other_parameter.step_id

This item is a pointer to _entity_src_gen_prod_other.step_id.

_entity_src_gen_prod_other_parameter.parameter

The name of the parameter associated with the process step.

_entity_src_gen_prod_other_parameter.value

The value of the parameter.

_entity_src_gen_prod_other_parameter.details

Additional details about the parameter.
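Because each parameter row points back to its parent step through entry_id, entity_id and step_id, the parameters of a step can be collected by grouping on those three values. The following Python sketch illustrates this with rows held as plain dictionaries. The row contents (a dialysis step and its two parameters) and the function name are hypothetical examples, not values defined by the dictionary.

```python
from collections import defaultdict


def group_parameters(parameter_rows):
    """Group parameter/value pairs by their (entry_id, entity_id, step_id) key."""
    grouped = defaultdict(dict)
    for row in parameter_rows:
        key = (row["entry_id"], row["entity_id"], row["step_id"])
        grouped[key][row["parameter"]] = row["value"]
    return dict(grouped)


# Hypothetical ENTITY_SRC_GEN_PROD_OTHER row and its parameter rows.
step = {"entry_id": "E1", "entity_id": "1", "step_id": "7",
        "process_name": "dialysis"}
parameter_rows = [
    {"entry_id": "E1", "entity_id": "1", "step_id": "7",
     "parameter": "duration_h", "value": "16"},
    {"entry_id": "E1", "entity_id": "1", "step_id": "7",
     "parameter": "membrane_cutoff_kDa", "value": "10"},
]

params = group_parameters(parameter_rows)
step_key = (step["entry_id"], step["entity_id"], step["step_id"])
print(step["process_name"], params[step_key])
```

Values are kept as strings in this sketch, since the dictionary does not state a type for the value item.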

PDBX_BUFFER

Data items in the PDBX_BUFFER category record details of the sample buffer.

Item name

Description

_pdbx_buffer.id

The value of _pdbx_buffer.id must uniquely identify the sample buffer.

_pdbx_buffer.name

The name of each buffer.

_pdbx_buffer.details

Any additional details to do with the buffer.

PDBX_BUFFER_COMPONENTS

The constituents of the buffer in the sample.

Item name

Description

_pdbx_buffer_components.id

The value of _pdbx_buffer_components.id must uniquely identify a component of the buffer.

_pdbx_buffer_components.buffer_id

This data item is a pointer to _pdbx_buffer.id in the PDBX_BUFFER category.

_pdbx_buffer_components.name

The name of each buffer component.

_pdbx_buffer_components.volume

The volume of buffer component.

_pdbx_buffer_components.conc

The millimolar concentration of buffer component.

_pdbx_buffer_components.details

Any additional details to do with buffer composition.

_pdbx_buffer_components.conc_units

The concentration units of the component.

_pdbx_buffer_components.isotopic_labeling

The isotopic composition of each component, including the % labeling level, if known. For example:
1. Uniform (random) labeling with 15N: U-15N
2. Uniform (random) labeling with 13C, 15N at known labeling levels: U-95% 13C;U-98% 15N
3. Residue selective labeling: U-95% 15N-Thymine
4. Site specific labeling: 95% 13C-Ala18
5. Natural abundance labeling in an otherwise uniformly labeled biomolecule is designated by NA: U-13C; NA-K,H
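The buffer_id item ties each PDBX_BUFFER_COMPONENTS row to a PDBX_BUFFER row. A minimal Python sketch of resolving that pointer is given below, with rows held as plain dictionaries. The buffer name, component names, concentrations and the function name are assumed values chosen for illustration.

```python
def components_by_buffer(buffers, components):
    """Map each buffer id to a list of 'conc units name' strings.

    Raises KeyError if a component points to a buffer id that is not listed.
    """
    result = {b["id"]: [] for b in buffers}
    for comp in components:
        buffer_id = comp["buffer_id"]
        if buffer_id not in result:
            raise KeyError(f"component {comp['id']!r} refers to unknown buffer {buffer_id!r}")
        result[buffer_id].append(f"{comp['conc']} {comp['conc_units']} {comp['name']}")
    return result


# Hypothetical rows for one buffer with two components.
buffers = [{"id": "1", "name": "lysis buffer"}]
components = [
    {"id": "1", "buffer_id": "1", "name": "Tris", "conc": "50", "conc_units": "mM"},
    {"id": "2", "buffer_id": "1", "name": "NaCl", "conc": "300", "conc_units": "mM"},
]

recipe = components_by_buffer(buffers, components)
print(recipe["1"])
```

The same lookup would serve any of the *_buffer_id items in the production categories, since they all point to pdbx_buffer.id.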

PDBX_CONSTRUCT

Data items in the PDBX_CONSTRUCT category specify a sequence of nucleic acids or amino acids. It is a catch-all that may be used to provide details of sequences known to be relevant to the project as well as primers, plasmids, proteins and such like that are either used or produced during the protein production process. Molecules described here are not necessarily complete, so for instance it would be possible to include either a complete plasmid or just its insert. This category may be considered as an abbreviated form of _entity where the molecules described are not required to appear in the final co-ordinates. Note that the details provided here all pertain to a single entry as defined at deposition. It is anticipated that _pdbx_construct.id would also be composed of a sequence that is unique within a given site prefixed by a code that identifies that site and would, therefore, be GLOBALLY unique. Thus this category could also be used locally to store details about the different constructs used during protein production without reference to the entry_id (which only becomes a meaningful concept during deposition).

Item name

Description

_pdbx_construct.entry_id

The value of _pdbx_construct.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_pdbx_construct.id

The value of _pdbx_construct.id must uniquely identify a record in the PDBX_CONSTRUCT list and should be arranged so that it is composed of a site-specific prefix combined with a value that is unique within a given site. Note that this item need not be a number; it can be any unique identifier.

_pdbx_construct.name

_pdbx_construct.name provides a placeholder for the local name of the construct, for example the plasmid name if this category is used to list plasmids.

_pdbx_construct.organisation

_pdbx_construct.organisation describes the organisation in which the _pdbx_construct.id is unique. This will normally be the lab in which the construct originated. It is envisaged that this item will permit a globally unique identifier to be constructed in cases where this is not possible from the _pdbx_construct.id alone.

_pdbx_construct.entity_id

In cases where the construct IS found in the co-ordinates then this item provides a pointer to _entity.id in the ENTITY category for the corresponding molecule.

_pdbx_construct.robot_id

In cases where the sequence has been determined by a robot, this data item provides a pointer to pdbx_robot_system.id in the PDBX_ROBOT_SYSTEM category for the robot responsible.

_pdbx_construct.date

The date that the sequence was determined.

_pdbx_construct.details

Additional details about the construct that cannot be represented in the category _pdbx_construct_feature.

_pdbx_construct.class

The primary function of the construct. This should be considered as a guideline only.

_pdbx_construct.type

The type of nucleic acid sequence in the construct. Note that to find all the DNA molecules it is necessary to search for both DNA and cDNA, and to find all the RNA molecules it is necessary to search for RNA, mRNA and tRNA.

_pdbx_construct.seq

The sequence expressed as a string of one-letter base codes or one-letter amino acid codes. Unusual residues may be represented either using the appropriate one-letter wild card codes or by the three-letter code in parentheses.
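Since _pdbx_construct.seq may mix one-letter codes with three-letter codes in parentheses, software reading it needs to split the string into residues rather than characters. The Python sketch below shows one possible tokenizer. The function name, the choice to ignore whitespace, and the example sequence are assumptions for illustration; the dictionary does not define a parsing procedure.

```python
import re

# A three-character code in parentheses, or a single letter.
_TOKEN = re.compile(r"\(([A-Za-z0-9]{3})\)|([A-Za-z])")


def tokenize_sequence(seq):
    """Split a sequence string into one residue code per list element."""
    compact = "".join(seq.split())  # drop spaces and line breaks
    tokens = []
    pos = 0
    while pos < len(compact):
        match = _TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {compact[pos]!r}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


# Hypothetical peptide containing selenomethionine written as (MSE).
residues = tokenize_sequence("MK(MSE)LV")
print(residues, len(residues))
```

Counting tokens rather than characters gives the residue count, which matters when sequence positions are compared against this string.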

PDBX_CONSTRUCT_FEATURE

Data items in the PDBX_CONSTRUCT_FEATURE category may be used to specify various properties of a nucleic acid sequence used during protein production.

Item name

Description

_pdbx_construct_feature.id

The value of _pdbx_construct_feature.id must uniquely identify a record in the PDBX_CONSTRUCT_FEATURE list. Note that this item need not be a number; it can be any unique identifier.

_pdbx_construct_feature.construct_id

The value of _pdbx_construct_feature.construct_id uniquely identifies the construct with which the feature is associated. This is a pointer to _pdbx_construct.id. This item may be a site dependent bar code.

_pdbx_construct_feature.entry_id

The value of _pdbx_construct_feature.entry_id uniquely identifies a sample consisting of one or more proteins whose structure is to be determined. This is a pointer to _entry.id. This item may be a site dependent bar code.

_pdbx_construct_feature.start_seq

The sequence position at which the feature begins.

_pdbx_construct_feature.end_seq

The sequence position at which the feature ends.

_pdbx_construct_feature.type

The type of the feature.

_pdbx_construct_feature.details

Details that describe the feature.
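The start_seq and end_seq items delimit a feature within the construct sequence. The sketch below extracts the corresponding stretch of sequence in Python. It assumes that positions are counted from 1 and that end_seq is inclusive; the dictionary text above does not state the numbering convention, so this is an assumption, as are the function name and the example sequence.

```python
def feature_subsequence(seq, start_seq, end_seq):
    """Return the part of seq covered by a feature.

    Assumes 1-based positions with end_seq inclusive (not stated in the dictionary).
    """
    if not 1 <= start_seq <= end_seq <= len(seq):
        raise ValueError("feature positions fall outside the sequence")
    return seq[start_seq - 1:end_seq]


# Hypothetical construct with a hexahistidine tag at positions 2 to 7.
construct_seq = "MHHHHHHSSGLVPRGSMKT"
tag = feature_subsequence(construct_seq, 2, 7)
print(tag)
```

If a sequence contains three-letter codes in parentheses, the positions would need to be applied to a list of residues rather than to the raw string.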

PDBX_ROBOT_SYSTEM

The details about each robotic system used to collect data for this project.

Item name

Description

_pdbx_robot_system.id

Assign a numerical ID to each instrument.

_pdbx_robot_system.model

The model of the robotic system.

_pdbx_robot_system.type

The type of robotic system used in the production pathway.

_pdbx_robot_system.manufacturer

The name of the manufacturer of the robotic system.

 

 

EXPTL_CRYSTAL, EXPTL_CRYSTAL_GROW, EXPTL_GROW_COMP,

CELL and SYMMETRY

 

The details about crystallization and the characterization of any produced crystals.

 

       Dictionary Item Name

          Description

_exptl_crystal_grow.method

Crystallization method

_exptl_crystal_grow.apparatus

Apparatus

_exptl_crystal_grow.temp

_exptl_crystal_grow.temp_details

Temperature

_exptl_crystal_grow.pH

_exptl_crystal_grow.pdbx_pH_range

pH

Tabulated in mmCIF category

exptl_crystal_grow_comp

Crystallization solutions compositions

 

 

_exptl_crystal.preparation

Additional treatments (e.g. soaking, time in drop, annealing, cryoprotectant, etc)

_exptl_crystal.pdbx_crystal_image_url

_exptl_crystal.pdbx_crystal_image_format

 

Image of crystal

_exptl_crystal.size_*

Crystal size

_cell.length_a

_cell.length_b

_cell.length_c

_cell.angle_alpha

_cell.angle_beta

_cell.angle_gamma

 

Cell constants

_cell.length_a_esd

_cell.length_b_esd

_cell.length_c_esd

_cell.angle_alpha_esd

_cell.angle_beta_esd

_cell.angle_gamma_esd

 

ESD of Cell constants

_symmetry.space_group_name_H-M

Space Group

 

 

 

© RCSB PDB