First copy the Access database to your hard disk, and open it from there. If you open it from the CD, it will be read-only, and some operations and queries will not work.
To download the database you are interested in, click title below for:
You will notice several
tables and forms, which make up the fingerprinting database. Click first
on “tables” under the object list to see a list of tables
in this database. The following
tables are primary data tables, and I include an explanation of each (all
other tables are generated from the primary data, and not important here): Genotypes: Include all the information on each genotype, individual,
or population in the study.This
includes species, name, number, alternate name, and pedigree. I have not been putting pedigree, because I hope to be able to
link one day this database to ICIS/IWIS and have that part done automatically.
There is also a column for Genotype ID number.
I try to use the ICIS/IWIS number when possible, if not, I assign
it a random number that must be unique.
This is the number all the other tables use to refer to this
germplasm. You can use the ID
numbers of your germplasm if it exists, but if not, assign it one using
some consistent schema such as, country abbreviation and number combination
or “AMB” and a number.You
can put numbers and letters in this field. (actually, in all the fields
of this table, you can put numbers and letters). The column individual number is only when we
genotyped many individuals from the same population; if it is an inbred
line, leave it blank. Entering
data into this table may be done manually, or may be imported from excel. Please see the table, RajabGenosIN, as an example
of a table that was imported from excel. If the genotype is already in the database,
it does not have to be imported again, as long as you use the exact
same name. Marker Data, Inds: This is the actual data for the individuals or the populations
genotyped one individual at a time.
Data entered here include the data ID, which is a number that
Access assigns to the data as it is entered (so don’t include
this when importing data); the Genotype ID number, from the Genotypes
table; the Study ID number from the Studies table; the Marker Name from
the Markers table, and Allele 1 and Allele 2. Allele 1 and Allele 2 are the alleles that the
genotype had for that marker; if there is only one, put the value in
the column for Allele 1 and put the marker name (with no allele specification)
in the column for Allele 2. If
there is missing data for that marker, put the marker name in both Allele
1 and Allele 2. The alleles are named: marker name, followed
by a dash, followed by the size in base pairs of the allele. No spaces. Data
is entered into this table by importing from Excel; an example table
of data imported from excel is found in the table MarkerDataIN Marker Data, Pops: This table is the same as Marker Data, Inds, except that
instead of only having two allele columns, you may put as many as you
have alleles for in the population, and following each Allele column
is a Frequency column, where you enter the frequency of each allele
in that population. You must
enter a 0 for every allele not present in a population, and if new data
is to be entered where that allele was never even scored, you must enter
the zeros manually before importing the data. Missing data is a period in the case of this
table ONLY; do not use periods in the Marker Data, Inds table. Data is imported in the same way as with the
Marker Data, Inds table, except that you must have the same number of
columns as you have alleles. Markers: This table includes the marker name (must be a unique name
for each marker, and please make sure your markers all have the exact
name as the markers already in the table, unless they are new markers
not already in the database); marker type (maize or wheat), marker source,
and location, meaning genetic location. I haven’t entered that, either, because
I hope to link this table to Graingenes and MaizeGDB, where the data
is already stored, plus much more. Data
is generally entered into this table manually, or may be imported from
excel. If the marker is already in the database, it
does not have to be imported again, as long as you use the exact same
name. Studies: The Studies table is a brief description of each study,
which will allow people to know what data may be combined with what
other data. This table includes Study ID, which has to
be a number (just use the next number in line); study description (a
short text identifying the study); investigator (name of the person
who did most of the work, or the Service Lab, etc); date (could be date
the analysis completed, or date the data was input into the database);
If the data is available for public release (yes or no); conditions,
which is a memo field describing the lab conditions, and can be longer
than the other columns; error rate within the study, if calculated (for
example, by repeating the same genotype or genotypes more than once
for every marker); and control genotypes, if used. Data is generally entered into this table manually;
each study must be unique. Menu system The menu system will allow you to access different functionalities
of the database. You open it by clicking on “Forms” and
then double click on f_main. The following screen appears:
From here you can do the following:
Import Error File Here is screenshot of the Import
Error file functionality
This is the general format of the above example of Error
File:
To import the error file, you have to select the whole area from Excel that you want to import (it has to have the above format, e.g. “Allele” in the first cell) and then click CTRL+C or right click and then from the menu click “copy”. Generate Error Report for selected study Now you can go to the main menu of the Fingerprint database, select the corresponding study, and click “Import error file for selected study”. When this import is complete, you can use the next button “Generate Error Report for selected study” and you will get a report similar to this:
The above is a screenshot of the Error Report. Note that
you can generate combined error report for several studies if you type
in the other study numbers (apart from the selected study) in this message
box (here we have selected studies 2, 6 and 15):
If you don’t want to combine error reports for several
studies, just click OK in the above message box without filling in the
input line. Query data From the main menu you can click “Query data”
which will show the following selection screen for querying the error
data. There are lists of alleles, genotypes, genotype names, markers
and studies. Each list is present so you can select the specific values
you want. You can single click on each element you want selected for
the criteria. Note that if you haven’t selected any items in a
list, all items will be considered in the query.
From the Query menu you can query the error tables that have been imported. You can build the criteria from the menu by selecting specific alleles, genotypes, genotype names, markers and/or studies that you are interested in. Example would be to select from above the allele “dp023-228” and “dp036-163” from the Alleles column, and markers “dp023” and dp036”, and then study 1, 5 and 16. You can click the “Criteria summary” button and it will give you a summary of the selection criteria like this: Alleles
must be one of the following: dp023-228 dp036-163 Markers
must be one of the following: dp023 dp036 Studies
must be one of the following: (1) Omani
Wheat Landraces (5) M. Reynolds Wheat lines (16) MegaEnvironment
Wheats Note that the meaning of the above criteria is “Alleles
must be <dp023-228 or dp036-163> AND Markers must be <dp023
or dp036> AND Studies must be <1 or 5 or 16>. Here is the corresponding output after clicking the “Generate
Report” button:
Above each list in the query screen is a checkbox that has
to be selected if you want that entity in the output. As an example,
if we had deselected the “Alleles” checkbox in the Query
screen, then in the above output the columns “Allele1” and
“Allele2” would not be present. Note that it is possible to perform criteria for columns
that you don’t include in the output, for instance if you want
data only from the study 16, you might want to deselect columns Study
ID and Study Name, to exclude these two from the output, otherwise each
record would have Study ID 16 and Study Name “MegaEnvironment
Wheats”. Below each list is a “Clean” button which can
be used to deselect all the selected values in that particular list,
also the ones that are not visible, since the lists might have more
values than can be seen on the screen (scroll up and down with the scroll
bar in that list). And note that to view which values have been selected
in all of the list boxes, it is convenient to use the “Criteria
Summary” button. Export Template When generating an Excel file with “Export Template”
from the Main window, you have to select a study first from the study
select box, and then click “Export Template”. This will
generate an Excel file which will look like this:
Note that there are data in each of the sheets below “experiment”, “conditions”, “data_list”, “marker” and “accessions”. Common Alleles within studies If you click the “Common alleles within studies”
from the query screen, you will get the alleles that are present in
*all* of the studies that you have selected. For instance, if you have
studies 2, 5, and 11 selected, and allele dp023-228 was present in study
2 and 5, but not 11, then the allele dp023-228 would *not* appear in
the output. If you want the alleles that are present in *at least one*
of the studies you have selected, you could get that by selecting the
studies you want, select the alleles checkbox only, and then deselect
the checkboxes for genotypes, genotype names, markers, study ID and
study Name. In the above example, allele dp023-228 *would* appear in
the output. Example: Querying Markers that are present in at least one study If you want a list of all of the markers that are present in at least one study, deselect all other checkboxes than Markers, make sure all selections are clean (e.g. with the respective clean buttons, or by checking the criteria summary) and then click “Generate Report”. Importing and exporting data: To import data into Access from excel, there are two possibilities. You may either import an external file that
is already formatted exactly the same as the table in the FP database
you are importing into, or you may use the “Import Clipboard”
function from the main menu, which will import data in 0/1 format. To import an external file, close all tables, and use File,
Get External Data, Import. A
window will open where you will browse for your excel file.
Change the “Files of type” list to “Microsoft
Excel”. Double click the
file you wish to import, and an import wizard will open; Click the box
“first row contains column headings” (note, your first row
must have the EXACT same column headings names as the table in access
you are importing into). Click
Next. Check the box, “In an existing table”, and use the
pull down menu to choose the table you are adding data to.
Click Next. Click finish; if no error messages are displayed,
you did the import correctly! Open
the table in Access to find your data added; if correct, you may save
Access, and if not, exit without saving, or delete the rows you added
to the table and try again. To import data into the Marker Data, Ind. Table that is
in a 0/1 format using f_main: You must have an Excel file with “Allele” in
cell A1, and then the Genotypes in each row, and the Markers in each
column, like this:
A “0” in the grid means that this marker was
not present for this genotype, and a “1” means that the
marker was present for this genotype. We can see above that marker AMB-01
and AMB-02 were not present for genotype nc130-139, and that marker
AMB-03 was present for genotype nc130-139. A dot “.” or a “9” means missing
data, it wasn’t measured. You can see that markers AMB-02 and
AMB-03 were not measured for genotype nc130-148. Note that importing will insert the new genotypes and markers
if they didn’t exist already, but without the additional data
like Species etc. That means that it is better to define manually the
genotypes and markers with its respective data (species etc.) in tables
Genotypes and Markers before importing the 0/1 data. You must use the f_main form for importing from the clipboard. Click “forms” under the Objects
menu, and then double click f_main.
Before proceeding any further, open Excel, and select the data
you wish to import. Using the example table, dataIN, select all
data, or only a subset, if you wish.
From excel, click “copy” in order to put the selection
into the clipboard. Then, return
to Access and click the box, “import from clipboard”.
This may take a few minutes.
When finished, go to the Marker Data, Inds table to confirm that
your data was entered correctly. If you are not very familiar with how to select an area
in Excel, you could follow these steps. Suppose we want to select all
the data in the above Excel file (cells A1 to D5). Click on the A1 cell
with “Allele”. Now hold down the shift button and simultaneously
click first with the right arrow key (which will select the line with
“Allele” to “AMB-03” values) and then the down
key (selecting the whole area from “Allele” to “9”).
Now click CTRL+C. Note that this method will work even if you a much
bigger area in your Excel file, that can only be seen by scrolling.
When the selected area has passed to the Clipboard, the area in Excel
will be blinking. Whenever the selected area in Excel is *not* blinking
anymore, then it is no longer present in the clipboard. This would happen
if you copied something else to the clipboard for instance from another
application like Notepad. To output your data into a 0/1 table: You must use the f_main form for this.
Click “forms” under the Objects menu, and then double
click f_main. Select the study you wish to view as a 0/1 table,
then click “Create matrix of study”.
When this is finished (it may take quite awhile, in the larger
studies), click “show table matrix”.
Another matrix will open, which you can then save as a table
in Access, or Save As Excel file to another file. Clean up datasets before importing! If
you have multiple bands at an allele, you can remove some of them if
you believe they are a stutters, shadow bands, plus-A bands, and other
PCR and electrophoresis artifacts. The
software for scoring most SSR data from an automatic DNA sequencer often
have functions that allow you to do this automatically.
If not, you can remove them manually if the extra bands are of
lower intensity, characteristic of stutters for example (that is, one
strong band, followed by one or more less intense bands, each one is
one repeat unit smaller than the one before it. All stutter bands should be removed). Even if they do not behave like a stutter, if
only one or two bands are intense, and all other bands much less intense,
the other bands can be removed, if you know you are working with diploid
individuals. You
may also choose to remove bands that do not follow the repeat type,
within a certain error, which is usually: for
2 base repeats bands are every two base pairs, plus or minus 0.75 –
0.95 bases. for
3 base repeats, bands are every three base pairs, plus or minus 1.25
– 1.45 bases. for
4 base repeats, bands are every four base pairs, plus or minus 1.5 bases
- 1.75 bases). for
5 base repeats, bands are every five base pairs, plus or minus 2 bases. for
6 base repeats, bands are every six base pairs, plus or minus 2.5 bases. Note
the error, in relation to the separation between two alleles, get smaller
as the repeat unit gets larger, thus increasing your confidence (and
one main reason we do not recommend using 2 base repeats at all). Caution is warranted in assuming all polymorphisms
follow the repeat type, however, as some polymorphisms in SSRs have
been shown to fall outside the repeat unit, and therefore not follow
any pattern at all. Delete Study This functionality you can use by selecting a study from
the study listbox and then clicking on the wastebasket icon on the right
of the study listbox:
Since this may delete a lot of data, you have to confirm
a warning message that will also tell you how many records in the tables
Marker_Data_Inds, Marker_Data_Pops and ErrorCheck will be deleted, to
be able to make an informed decision. If you click “Yes”
the study will be deleted, if you click “No” nothing will
happen and the delete request is cancelled. Useful queriesSome of the queries can be used to check if your data is
correct. These queries are documented below. The rest of the queries
you should not try to run yourself, they were created for the import
functions only. To run a query, open the “Queries” tab and
then double click on the query name for the query you want. qry_check_missing_genotypes This query will give you a list of all the genotypes that
have been used in table Marker Data, Inds but has not been defined in
table Genotypes. There must be matching records from field Genotype
ID Number in table Genotypes to field Genotype ID Number in table Marker
Data, Inds. qry_check_missing_markers This query will similarly give you a list of all the Markers
that have been used in table Marker Data, Inds but have not been defined
in table Markers. The matching fields in the two tables have the name
Marker Name. If you don’t have any missing markers, the output
from the query will look like this:
If you had missing markers, they would be present in the
records in the output screen above. Your job would then be to add these
to the Markers table, so that data consistency is restored. Remember
to fill out all fields in the Markers table, for instance the Forward
primer, the Reverse primer, the Marker location etc.
|