PDB Parser Output
TEST CASES FOR PDBPARSER2.0.PL.
SCREEN CAPTURES SHOWING PARTIAL OUTPUT OF EACH TEST CASE:
("..." marks parts of the output that were omitted because they are not significant for the test)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% TEST for handling DUPLICATE ELEMENT NAMES %%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
<< program should print the duplicate element names and the line
numbers where they were found in the DTD file >>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
C:\users\cat\bioProject\bioworkarea\code>perl pdbparser2.0.pl
****************************************************************************************
*** PDBParser ***
*** Version: 2.0-1 ***
*** ***
*** PDBParser will convert a Bioinformatics's PDB file format to xml format, based ***
*** on a supplied DTD. ***
*** ***
*** Project: URI(tm) Universal Research Interchange Format ***
*** ***
*** Legal: Copyright (C) 2004, URI, Bioinformatics, CSC592 ***
*** ***
****************************************************************************************
The name of a Bioinformatics's PDB input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics PDB (ent): pdb12e8.ent pdb1mcp.ent
Please specify [pdb12e8.ent|pdb1mcp.ent](pdb12e8.ent):
The name of a Bioinformatics's DTD input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics DTD (dtd): DTD_URI.dtd DTD_URI_2_duplicate.dtd
Bioinformatics DTD (dtd): URI_DTD-04-15-04.dtd URI_DTD_err1_fixed.dtd
Bioinformatics DTD (dtd): URI_DTD_org.dtd
Please specify [](DTD_URI.dtd):DTD_URI_2_duplicate.dtd
******************************************
**** General DTD Information ****
******************************************
************************************
**** DTD File Declared Definitions:
************************************
Element Count: 261
Attributes Count: 128
Entity Count: 0
******************************************************************
**** Error: Errors were detected while reading the DTD file ****
******************************************************************
Duplicate Element (48): concaten (#PCDATA)
Duplicate Element (108): authors (#PCDATA)
Duplicate Element (109): title (#PCDATA)
Duplicate Element (135): authors (#PCDATA)
Duplicate Element (136): title (#PCDATA)
Duplicate Element (137): editors (#PCDATA)
Duplicate Element (138): to_be_pulished (#PCDATA)
Duplicate Element (139): journal_abbrev (#PCDATA)
Duplicate Element (140): journal_vol (#PCDATA)
Duplicate Element (141): first_page (#PCDATA)
Duplicate Element (142): year (#PCDATA)
Duplicate Element (143): publishers (#PCDATA)
Duplicate Element (144): journal_id_ASTM (#PCDATA)
Duplicate Element (145): country (#PCDATA)
...
...
...
******************************************
**** End of General DTD Information ****
******************************************
PDB Parser has exited due to DTD errors.
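
The duplicate check exercised by this test can be sketched as follows. This is a minimal illustration in Python, not the code of pdbparser2.0.pl: the function name find_duplicate_elements, the regular expression, and the sample DTD are all assumptions, and the sketch only handles declarations that fit on one line. It reports every declaration after the first one for a given element name, together with the line number where it was found.

```python
import re

# Matches a single-line declaration such as: <!ELEMENT title (#PCDATA)>
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]*)>")


def find_duplicate_elements(dtd_text):
    """Return (line_number, name, content_model) for each repeated declaration."""
    seen = {}        # element name -> line number of its first declaration
    duplicates = []
    for line_number, line in enumerate(dtd_text.splitlines(), start=1):
        match = ELEMENT_RE.search(line)
        if not match:
            continue
        name, content = match.group(1), match.group(2).strip()
        if name in seen:
            duplicates.append((line_number, name, content))
        else:
            seen[name] = line_number
    return duplicates


# Hypothetical DTD fragment, written only for this illustration.
sample_dtd = """\
<!ELEMENT citation (authors, title)>
<!ELEMENT authors (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT authors (#PCDATA)>
<!ELEMENT title (#PCDATA)>
"""

for line_number, name, content in find_duplicate_elements(sample_dtd):
    print(f"Duplicate Element ({line_number}): {name} {content}")
```

The print format mirrors the "Duplicate Element (line): name content" lines in the capture above so the two can be compared directly.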
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% TEST for handling ORPHAN ELEMENTS %%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
<< program should print the orphan element names and the line
numbers where they were found in the DTD file >>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
...
...
...
Please specify [pdb12e8.ent|pdb1mcp.ent](pdb12e8.ent):
The name of a Bioinformatics's DTD input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics DTD (dtd): DTD_URI.dtd DTD_URI_2_duplicate.dtd
Bioinformatics DTD (dtd): URI_DTD-04-15-04.dtd URI_DTD_err1_fixed.dtd
Bioinformatics DTD (dtd): URI_DTD_org.dtd
Please specify [](DTD_URI.dtd):URI_DTD_org.dtd
...
...
...
*********************
Element: hetnams
Attributes: non-polyer_seqs_type CDATA #FIXED "HETNAM"
Category: LIST
Child Elements: hetname*
*********************
******************************************
**** General DTD Information ****
******************************************
************************************
**** DTD File Declared Definitions:
************************************
Element Count: 483
Attributes Count: 274
Entity Count: 0
***************************
**** DTD Tree Definitions:
***************************
Root Element: URI_protein
Proclaimed Elements: 251
Associated Elements: 249
Orphaned Elements: 234
Associated Attributes: 222
Orphaned Attributes: 52
********************************************************************
**** Error: Errors were detected while building the DTD tree. ****
********************************************************************
Proclaimed Child (705): Parent Element "hetnams" proclaimed a child Element "hetname"
but none exist in the Element Declaration list.
******************************************
**** End of General DTD Information ****
******************************************
PDB Parser has exited due to DTD errors.
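
The "Proclaimed Child" error above means that a parent's content model names a child element that has no declaration of its own. A minimal Python sketch of such a check is given below. It is an assumption about how the check could work, not the parser's actual code: the names child_names and find_undeclared_children are hypothetical, the sample DTD is invented, and the reported line number is assumed to be that of the parent's declaration.

```python
import re

# Matches a single-line declaration such as: <!ELEMENT hetnams (hetname*)>
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]*)>")


def child_names(content_model):
    """Return the element names used in a content model, skipping keywords."""
    if content_model in ("EMPTY", "ANY"):
        return []
    tokens = re.findall(r"[#\w.-]+", content_model)
    return [token for token in tokens if not token.startswith("#")]


def find_undeclared_children(dtd_text):
    """Return (parent_line, parent, child) for each child with no declaration."""
    declared = {}    # element name -> (line number, content model)
    for line_number, line in enumerate(dtd_text.splitlines(), start=1):
        match = ELEMENT_RE.search(line)
        if match:
            declared.setdefault(match.group(1), (line_number, match.group(2).strip()))
    problems = []
    for parent, (line_number, content) in declared.items():
        for child in child_names(content):
            if child not in declared:
                problems.append((line_number, parent, child))
    return problems


# Hypothetical DTD fragment: "hetnams" names a child "hetname" that is never declared.
sample_dtd = """\
<!ELEMENT URI_protein (hetnams?)>
<!ELEMENT hetnams (hetname*)>
<!ELEMENT hetnam (#PCDATA)>
"""

for line_number, parent, child in find_undeclared_children(sample_dtd):
    print(f'Proclaimed Child ({line_number}): Parent Element "{parent}" '
          f'proclaimed a child Element "{child}" but none exist in the '
          f'Element Declaration list.')
```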
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%% TEST for handling ELEMENT ORPHANS, OR %%%%%%%%%%
%%%%%%% ELEMENTS DECLARED OVER MULTIPLE CR-LF LINES %%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
<< program should print the orphan element names and the line
numbers where they were found in the DTD file >>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
...
...
...
Please specify [pdb12e8.ent|pdb1mcp.ent](pdb12e8.ent):
The name of a Bioinformatics's DTD input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics DTD (dtd): DTD_URI.dtd DTD_URI_2_duplicate.dtd
Bioinformatics DTD (dtd): URI_DTD-04-15-04.dtd URI_DTD_err1_fixed.dtd
Bioinformatics DTD (dtd): URI_DTD_org.dtd
Please specify [](DTD_URI.dtd):URI_DTD_err1_fixed.dtd
...
...
...
*********************
Element: salt_atom_serial_num
Attributes: None
Category: #PCDATA
*********************
******************************************
**** General DTD Information ****
******************************************
************************************
**** DTD File Declared Definitions:
************************************
Element Count: 483
Attributes Count: 274
Entity Count: 0
***************************
**** DTD Tree Definitions:
***************************
Root Element: URI_protein
Proclaimed Elements: 462
Associated Elements: 461
Orphaned Elements: 22
Associated Attributes: 271
Orphaned Attributes: 3
****************************************************************************
**** Error: The DTD tree was built without all declared definitions. ****
**** ****
**** Element orphans exist when it or one of its ancestors are ****
**** not proclaimed by a parent Element. Note: The number of ****
**** Proclaimed and Associated Elements to determine the number ****
**** of disassociated trees branches possibly causing multiple ****
**** orphans. Attribute orphans exist when its Element is not ****
**** declared or a parent Element does not proclaim the ****
**** Attribute's Element as its child Element. ****
****************************************************************************
****************************
**** Unassociated Elements:
****************************
Orphan Element (136): "journal_id_ASTM (#PCDATA)"
Orphan Element (137): "country (#PCDATA)"
Orphan Element (138): "journal_id_ISSN (#PCDATA)"
Orphan Element (139): "journal_id_ISBN (#PCDATA)"
Orphan Element (140): "ccdc_pdb_code (#PCDATA)"
Orphan Element (603): "db_struct_ref_db_accession (#PCDATA)"
Orphan Element (604): "db_struct_ref_db_code (#PCDATA)"
Orphan Element (606): "db_struct_ref_auth_align_begin (#PCDATA)"
Orphan Element (607): "db_struct_ref_auth_insertion_begin (#PCDATA)"
Orphan Element (609): "db_struct_ref_auth_align_end (#PCDATA)"
Orphan Element (610): "db_struct_ref_auth_insertion_end (#PCDATA)"
Orphan Element (613): "struct_refs_seq_difs (struct_ref_seq_dif*)"
Orphan Element (616): "struct_ref_seq_dif (struct_ref_seq_pdb_id*, struct_ref_seq_aminoacid_id*, struct_ref_seq_pdb_strand_id*, struct_ref_seq_db_name*, struct_ref_seq_db_accession*, struct_ref_seq_details*)"
Orphan Element (620): "struct_ref_seq_pdb_id (#PCDATA)"
Orphan Element (621): "struct_ref_seq_aminoacid_id (#PCDATA)"
Orphan Element (622): "struct_ref_seq_pdb_strand_id (#PCDATA)"
Orphan Element (623): "struct_ref_seq_db_name (#PCDATA)"
Orphan Element (624): "struct_ref_seq_db_accession (#PCDATA)"
Orphan Element (625): "struct_ref_seq_details (#PCDATA)"
Orphan Element (1008): "atom_auth_comp_id (#PCDATA)"
Orphan Element (1009): "atom_auth_asym_id (#PCDATA)"
Orphan Element (1010): "atom_auth_atom_id (#PCDATA)"
*****************************************************
**** Attribute's Element or ancestor not proclaimed:
*****************************************************
Orphan Attribute (614): (struct_refs_seq_difs) "pdb_id CDATA #IMPLIED"
Orphan Attribute (617): (struct_ref_seq_dif) "struct_ref_seq_dif_id CDATA #REQUIRED"
Orphan Attribute (618): (struct_ref_seq_dif) "seq_num CDATA #IMPLIED"
******************************************
**** End of General DTD Information ****
******************************************
PDB Parser has exited due to DTD errors.
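
The error text above says an element is an orphan when it, or one of its ancestors, is not proclaimed by a parent element. One way to express that rule is a reachability walk from the root element: anything declared but never reached is an orphan. The Python sketch below illustrates this under stated assumptions. The function names and the sample DTD are hypothetical, and the declarations are matched across the whole text rather than line by line, so a declaration that spans several lines is still read as one unit.

```python
import re
from collections import deque

# [^>]* also matches newlines, so multi-line declarations are captured whole.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+([^>]*)>")


def child_names(content_model):
    """Return the element names used in a content model, skipping keywords."""
    if content_model in ("EMPTY", "ANY"):
        return []
    tokens = re.findall(r"[#\w.-]+", content_model)
    return [token for token in tokens if not token.startswith("#")]


def find_orphan_elements(dtd_text, root):
    """Return (line, name, content_model) for elements unreachable from root."""
    declared = {}    # element name -> (line number, content model)
    for match in ELEMENT_RE.finditer(dtd_text):
        line_number = dtd_text.count("\n", 0, match.start()) + 1
        content = " ".join(match.group(2).split())   # collapse line breaks
        declared.setdefault(match.group(1), (line_number, content))

    reachable = set()
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in reachable or name not in declared:
            continue
        reachable.add(name)
        queue.extend(child_names(declared[name][1]))

    return [(line, name, content)
            for name, (line, content) in declared.items()
            if name not in reachable]


# Hypothetical, shortened DTD; the root declaration spans two lines.
sample_dtd = """\
<!ELEMENT URI_protein (annotation?,
                       seq_data?)>
<!ELEMENT annotation (#PCDATA)>
<!ELEMENT seq_data (#PCDATA)>
<!ELEMENT struct_refs_seq_difs (struct_ref_seq_dif*)>
<!ELEMENT struct_ref_seq_dif (#PCDATA)>
"""

for line, name, content in find_orphan_elements(sample_dtd, "URI_protein"):
    print(f'Orphan Element ({line}): "{name} {content}"')
```

In the sample, struct_ref_seq_dif is proclaimed by struct_refs_seq_difs, but it is still reported because its parent is never reached from the root. A reader that processed one physical line at a time would also miss seq_data in the two-line root declaration and report it as an orphan, which is the kind of case the second half of this test's title refers to.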
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%% TEST for successfully handling valid DTDs %%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
****************************************************************************************
*** PDBParser ***
*** Version: 2.0-1 ***
*** ***
*** PDBParser will convert a Bioinformatics's PDB file format to xml format, based ***
*** on a supplied DTD. ***
*** ***
*** Project: URI(tm) Universal Research Interchange Format ***
*** ***
*** Legal: Copyright (C) 2004, URI, Bioinformatics, CSC592 ***
*** ***
****************************************************************************************
The name of a Bioinformatics's PDB input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics PDB (ent): pdb1mcp.ent
Please specify [pdb1mcp.ent](pdb1mcp.ent):
The name of a Bioinformatics's DTD input file will be needed.
Here is a list of input files in the current directory.
Bioinformatics DTD (dtd): URI_DTD-04-03-04.dtd URI_DTD-04-15-04.dtd
Bioinformatics DTD (dtd): URI_DTD_partially _fixed.dtd
Please specify [](URI_DTD-04-03-04.dtd):URI_DTD-04-15-04.dtd
*********************
Element: URI_protein
Attributes: pdb_id CDATA #REQUIRED
Category: LIST
Child Elements: attributes? annotation? seq_data? sites? crystal_cell?
Child Elements: orig_matrices? model_atoms? connections? bookkeep_informat
*********************
*********************
Element: attributes
Attributes: pdb_id CDATA #IMPLIED
Category: LIST
Child Elements: header* deletion* titles* error_warn* compounds* sources*
Child Elements: keywords*
*********************
*********************
Element: header
Attributes: classification CDATA #IMPLIED
Attributes: deposition_date CDATA #IMPLIED
Attributes: pdb_id CDATA #IMPLIED
Category: EMPTY
*********************
...
...
...
*********************
Element: num_connections
Attributes: None
Category: #PCDATA
*********************
******************************************
**** General DTD Information ****
******************************************
************************************
**** DTD File Declared Definitions:
************************************
Element Count: 491
Attributes Count: 275
Entity Count: 0
***************************
**** DTD Tree Definitions:
***************************
Root Element: URI_protein
Proclaimed Elements: 491
Associated Elements: 491
Orphaned Elements: 0
Associated Attributes: 275
Orphaned Attributes: 0
******************************************
**** End of General DTD Information ****
******************************************
New converted L Sequence:
DIVMTQSQKFMSTSVGDRVSITCKASQNVGTAVAWYQQKPGQSPKLMIYSASNRYTGVPDRFTGSGSGTDFTLTISNMQSEDLADYFCQQYSSYPLTFGA
GTKLELKRADAAPTVSIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSATDQDSKDSTYSMSSTLTLTKDEYERHNSYTCEATHKT
STSPIVKSFNRNEC
New converted H Sequence:
EVQLQQSGAEVVRSGASVKLSCTASGFNIKDYYIHWVKQRPEKGLEWIGWIDPEIGDTEYVPKFQGKATMTADTSSNTAYLQLSSLTSEDTAVYYCNAGH
DYDRGRFPYWGQGTLVTVSAAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQSDLYTLSSSVTVPSSTWPSETV
TCNVAHPASSTKVDKKIVPRD
New converted M Sequence:
DIVMTQSQKFMSTSVGDRVSITCKASQNVGTAVAWYQQKPGQSPKLMIYSASNRYTGVPDRFTGSGSGTDFTLTISNMQSEDLADYFCQQYSSYPLTFGA
GTKLELKRADAAPTVSIFPPSSEQLTSGGASVVCFLNNFYPKDINVKWKIDGSERQNGVLNSATDQDSKDSTYSMSSTLTLTKDEYERHNSYTCEATHKT
STSPIVKSFNRNEC
New converted P Sequence:
EVQLQQSGAEVVRSGASVKLSCTASGFNIKDYYIHWVKQRPEKGLEWIGWIDPEIGDTEYVPKFQGKATMTADTSSNTAYLQLSSLTSEDTAVYYCNAGH
DYDRGRFPYWGQGTLVTVSAAKTTPPSVYPLAPGSAAQTNSMVTLGCLVKGYFPEPVTVTWNSGSLSSGVHTFPAVLQSDLYTLSSSVTVPSSTWPSETV
TCNVAHPASSTKVDKKIVPRD
PDB Parser has completed with Success.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
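
The "New converted ... Sequence" lines in the last test show each chain as a string of one-letter amino acid codes. The capture does not say how the parser derives them. A plausible route, sketched below in Python, is to read the three-letter residue names from the SEQRES records of the PDB file and map each to its one-letter code. The function name seqres_to_sequences is hypothetical, and the two sample records are illustrative lines written to match the first residues of the L and H sequences shown above, not lines copied from pdb1mcp.ent.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]                  # column 12 holds the chain ID
        residues = line[19:70].split()       # columns 20-70 hold residue names
        codes = [THREE_TO_ONE.get(residue, "X") for residue in residues]
        chains.setdefault(chain_id, []).extend(codes)
    return {chain_id: "".join(codes) for chain_id, codes in chains.items()}


# Illustrative SEQRES records (assumed, not taken from the real input file).
sample_records = [
    "SEQRES   1 L  214  ASP ILE VAL MET THR GLN SER GLN LYS PHE MET SER THR",
    "SEQRES   1 H  221  GLU VAL GLN LEU GLN GLN SER GLY ALA GLU VAL VAL ARG",
]

for chain_id, sequence in seqres_to_sequences(sample_records).items():
    print(f"New converted {chain_id} Sequence:")
    print(sequence)
```

Residue names that are not in the table are mapped to "X" here. That choice is an assumption of the sketch, and the real parser may treat non-standard residues differently.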