Names: __________________________________ Section: ___________

Bio400

BIOINFORMATICS WORKSHEET

Each pair of students will turn in ONE Bioinformatics worksheet. Please write both of your names at the top of this sheet.

BLASTing (BLAST = Basic Local Alignment Search Tool)

Imagine that you have cloned and sequenced a portion of an Arabidopsis gene:

gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat

The questions you ask are:
- Has anyone else ever studied this gene?
- Is this gene unique to Arabidopsis or are there homologs in other species?

To answer these questions you need to search online genome databases:

1. Select "NCBI Blast" from the Bio 400 home page.
2. Select “Nucleotide BLAST”.
3. Paste your sequence into the “Enter Query Sequence” box. (The sequence can be copied and pasted from the Word document “BioinformaticsSequence” in the Assignments folder on the Bio 400 web page.)
4. Select a database to search from the dropdown menu. For the broadest search, use the nucleotide collection (nr/nt).
5. Click the ? button next to “choose a BLAST algorithm” and read the information. To answer your initial questions (at top of page) you will need to choose the least stringent search, "blastn".
6. Choose your algorithm and then click the blue "BLAST" button.
7. Your request is now being processed. The sequence you entered is being compared to all sequences in the database you selected, so it may take a few minutes.
8. Your results will be presented in graphic format. Scroll down to see the pair-wise alignments below the graph (or click on a bar inside the graph and you will be taken to that sequence alignment).
   a. What does the length of each line indicate? ______________________________
   b. What does the color of each line indicate? ________________________________
   c. Can you find any homologs of this sequence in organisms other than Arabidopsis? _____________________________
   (Note: if you didn’t identify any non-Arabidopsis sequences you can go back to the BLAST search page and broaden your search.)
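If you would like to check the query before pasting it into BLAST, the clean-up can be sketched in a few lines of Python. This is only a minimal sketch and is not part of the NCBI site: the function name `clean_query` and the FASTA header text are made up for illustration. It removes the spaces from the sequence printed above, confirms that only A, C, G and T remain, and prints the result in FASTA format, which the query box also accepts.

```python
# The worksheet sequence, exactly as printed (with spaces every 10 bases).
RAW_QUERY = "gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat"


def clean_query(raw):
    """Remove whitespace, convert to upper case, and allow only A/C/G/T."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


query = clean_query(RAW_QUERY)

# FASTA format: a header line starting with ">" followed by the sequence.
# The header text here is hypothetical; any short label will do.
print(f">worksheet_query length={len(query)}")
print(query)
```

The length printed in the header should agree with the number of base pairs you expect to see matched in your best alignment.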


9. From the NCBI website: "E Value (Expect Value) describes the likelihood that a sequence with a similar score will occur in the database by chance. The smaller the E Value, the more significant the alignment. For example, [an] alignment [with] a very low E value of e-117 [means] that a sequence with a similar score is very unlikely to occur simply by chance."
   Do alignments of your sequence with those in other species have higher or lower E-values than alignments within Arabidopsis species? _____________________
10. Click on the blue accession number for the entry that represents the Arabidopsis gene with an exact match over these 60 base pairs. This will take you to the GenBank record for this gene.
11. Arabidopsis has 5 chromosomes. Each gene in the Arabidopsis genome has a unique identifier (locus tag) in the format AtNgNNNNN. The At refers to “Arabidopsis thaliana”, the "N" in Ng refers to the chromosome number, and the final 5 digits refer to the location on the chromosome. The genes are numbered sequentially along the chromosome. Look through the GenBank record to identify the Arabidopsis locus tag for your sequence and write it below.
    Arabidopsis locus tag: ________________________
12. What is the gene’s three-letter acronym (usually followed by a number if the gene belongs to a family of genes)? ____________________
    This is the Arabidopsis gene that you will be studying ALL QUARTER in this class.
13. Looking at the GenBank record, answer these questions:
    a. What chromosome is this gene located on? ___________
    b. What is the function of this gene product? _______________________________
    c. Does this gene have introns and exons? _______________________
    (Hint: Does mRNA contain introns? Where did you find the information about genomic DNA in the GenBank record you looked at for the last worksheet?)
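The AtNgNNNNN locus tag format described in step 11 can be taken apart mechanically. The following is a minimal sketch, assuming only the format stated on this worksheet (chromosomes 1 to 5, five trailing digits); the function name `parse_locus_tag` is hypothetical, and tags for organelle genomes are deliberately not handled.

```python
import re

# "At", one chromosome digit (1-5), "g", then exactly five digits.
# Case is ignored because databases often print the tag in upper case.
LOCUS_TAG = re.compile(r"At([1-5])g(\d{5})", re.IGNORECASE)


def parse_locus_tag(tag):
    """Return (chromosome number, position number) for a tag like At1g01010."""
    match = LOCUS_TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not in AtNgNNNNN format: {tag!r}")
    return int(match.group(1)), int(match.group(2))


chromosome, number = parse_locus_tag("At1g01010")
print(f"chromosome {chromosome}, gene number {number}")
```

Because genes are numbered sequentially along the chromosome, comparing the second value for two tags on the same chromosome tells you which gene lies closer to the top of that chromosome.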


Protein information databases

- Plant Chromatin DataBase or ChromDB: Great for showing gene expression!

1. Select the "ChromDB" link from the Bio 400 webpage.
2. In the Search Box at the top of the page, select Locus and enter the Arabidopsis locus tag.
   - What protein group does your gene product belong to? _____________________
3. Select “expression” from the menu at the top of the page.
   - In what plant tissues is your gene expressed as determined by Northern analysis? ______________________________
   Note: Slq = silique or seed pod; Sdlg = 2-week-old seedling (entire plant)

- TAIR database: Great for getting a summary of all information known about your Arabidopsis gene!

4. Go back to the Summary page for your gene.
5. Click on the "Locus Link": it will take you to the TAIR information page for your gene.
6. Scroll down to the bottom of this page and notice there are several published papers that reference your gene. You will be reading one of these (Plant orthologs of p300/CBP: conservation of a core domain in metazoan p300/CBP acetyltransferase-related proteins) for class. There is a lot more information on this page about your gene!

***You should come back to this page throughout the quarter to learn more about your gene as you develop your research projects.***


SALK T-DNA database

One way to study a protein's function is to look at the “knock-out” phenotype. Several years ago, an incredible resource of Arabidopsis “knock-out” mutants was compiled at the Salk Institute in San Diego. Random insertion of the Agrobacterium transfer DNA (T-DNA) into the Arabidopsis genome created a library of seeds, each of which contains one or two insertions in its genome. The Salk Institute has set up a searchable database to allow you to find a mutant in your gene of interest.

1. Select the "Salk T-DNA" webpage from the Bio 400 homepage.
2. To make this page easier to read, use the arrows in the upper right corner of the screen to zoom in to 10 bases/pixel. The dark blue bar running horizontally represents the chromosome, with the “top” of the chromosome to the left. The first gene you see at the top of Chromosome 1 is called At1g01010, the next one is named At1g01020, and so on. The positions of the genes are shown as broken green arrows above the chromosome (the introns are shown as gaps in the arrow). You can click on the arrows on either end of the dark blue line to move along to the next region of the chromosome.
   - Name a gene that does not have any introns: ______________________
3. Underneath the diagrammed chromosome, there are indications of various insertions. For example, insert mutants generated at the Salk Institute lab are indicated in green or pink, and are called “SALK_NNNNNN”. The numbering of these mutant plants is unrelated to the gene in which they are inserted.
4. To search the database for a specific gene, scroll down to SEARCH and type the Arabidopsis locus name into “Query”. Click on “Search”. The display will now be centered on the gene you searched for. You should see SALK insert lines in your gene.
5. Write down the Salk numbers of two T-DNA insertion mutants that you believe will not express your gene. For each one, describe where in the gene the insertion lies (e.g. exon 3) and what effect you predict it will have on gene expression and/or function. The position of the insert determines in part how it affects gene function.

1)

2)

***Check with a TA/instructor to find out which Salk T-DNA line your group will be using. Then proceed to “identifying PCR primers” on the next page.***
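Describing where an insertion lies (for example "exon 3") amounts to comparing the insertion coordinate with the exon boundaries of the gene. The sketch below illustrates that comparison under stated assumptions: the coordinates are invented, the function name `locate_insertion` is hypothetical, and the gene is assumed to read left to right so that exon 1 is the leftmost segment. It does not query the Salk database.

```python
def locate_insertion(exons, position):
    """Classify a coordinate against exon boundaries.

    exons: list of (start, end) pairs, inclusive, sorted left to right.
    Returns "exon N", "intron N", or "outside the gene".
    """
    for number, (start, end) in enumerate(exons, start=1):
        if start <= position <= end:
            return f"exon {number}"
        # exons[number] is the NEXT exon, because number counts from 1.
        if number < len(exons) and end < position < exons[number][0]:
            return f"intron {number}"
    return "outside the gene"


# Made-up coordinates for a three-exon gene (not a real Arabidopsis gene).
example_exons = [(100, 200), (300, 400), (500, 650)]

for site in (150, 250, 520, 50):
    print(site, "->", locate_insertion(example_exons, site))
```

An insertion reported as falling in an exon is the kind you would normally predict to disrupt the coding sequence; one in an intron or outside the gene calls for a more cautious prediction.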


Identifying PCR Primers

We will be using PCR to confirm that the plants we obtained from the ABRC (Arabidopsis Biological Resource Center) do indeed have a T-DNA insertion. We want you to see how to design the primers needed for the PCR reaction.

1. Return to the Salk T-DNA website.
2. On the right side of the page, under Tools, select "- iSct primers". (You may have to scroll down.)
3. Scroll down to “2. Salk-TDNA verification Primer design”.
4. In the white box, type in the name of your SALK line (using the indicated format) and click on “submit”.
5. In the space below, write the sequences of your LP and RP primers (include 5' and 3' labels). By convention, DNA sequences are written in the 5’ to 3’ orientation.
   LP: ______________________________________________________
   RP: ______________________________________________________
6. Copy the information you obtain into a Word document and e-mail it to yourself in order to print it and SAVE it. You will need this information to design your PCR protocol in next week’s laboratory.

(Note: LP = left primer; RP = right primer; TM = melting temperature; GC = % GC content. The tool also gives you the predicted PCR product size if an insert is present (BP+RP product size).)

Note: Since you are joining an on-going research project, students from previous quarters have already ordered the LP and RP primers for the PCR reaction.
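The GC and TM values reported for each primer can be approximated by hand. The sketch below computes % GC content and a rough melting temperature using the Wallace rule (2 °C for each A or T, 4 °C for each G or C), a common rule of thumb for short oligonucleotides. It is an assumption that this differs from the method the Salk tool uses, so expect its TM values to disagree somewhat with yours. The primer shown is invented for illustration and is not a real LP or RP.

```python
def gc_percent(primer):
    """Percentage of bases in the primer that are G or C."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical primer, written 5' to 3' as the convention above requires.
example_primer = "ATGCCGTAGCTAGGCTTAACG"

print(f"GC content: {gc_percent(example_primer):.1f} %")
print(f"Wallace TM: {wallace_tm(example_primer)} degrees C")
```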


Double checking primers

Before you go to the expense of ordering primers, carrying out PCR reactions and analyzing results, it’s always a good idea to double check the primer sequences to see that they are specific for the sequence you’re interested in, and that they will amplify the correct sequence.

1. Go to the TAIR homepage from the Bio 400 homepage.
2. Under TOOLS select SeqViewer.
3. In the text box, paste (or type in) the LP primer sequence, select search by sequence and hit Submit.
4. Look carefully at the screen. The green lines represent the 5 Arabidopsis chromosomes. The position that matches your primer sequence will be indicated by a red #1.
5. Click on “select” (the button in the middle of the top of the page) then scroll down to “1”.
6. A new window will open with a view of the chromosome sequence indicating what region your primer is complementary to.
7. Look along the right side of the screen to see if the gene name written vertically matches your gene of interest.
8. Look at the sequence, and record the sequence position of the primer. Is the sequence italicized or not? (If the sequence is italicized, it is identical to the bottom strand of the DNA; if it is not, it is identical to the top strand of the DNA.)
9. Now repeat this process for the RP primer and answer the following questions.
   a. Which strand will the LP primer anneal to? _________________________
   b. Which strand will the RP primer anneal to? _________________________
   c. Do both primers fall within the correct gene? _________________________
   d. What would be the size of a PCR product created by amplifying Arabidopsis genomic DNA using these 2 primers? _________________________

Make sure that the answers make sense with respect to PCR. We will be going over PCR in lecture and lab next week.

YOU WILL NEED TO UNDERSTAND HOW TO DESIGN PRIMERS FOR THE LAB PRACTICAL.

Turn in your worksheet to a TA.
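The strand and product-size questions above follow from one idea: a primer that is identical to the top strand anneals to the bottom strand, and vice versa, and a product forms only when the two primers point toward each other. The sketch below works through that logic on a short invented template. All names (`reverse_complement`, `locate_primer`, `product_size`) and sequences are hypothetical, only exact matches are considered, and it is not how SeqViewer itself works.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def locate_primer(template, primer):
    """Find which strand the primer is IDENTICAL to.

    template is the top strand, written 5' to 3'.
    Returns (strand, start, end) with 0-based, end-exclusive coordinates
    on the top strand, or None if there is no exact match.
    """
    template, primer = template.upper(), primer.upper()
    start = template.find(primer)
    if start != -1:
        return ("top", start, start + len(primer))
    start = template.find(reverse_complement(primer))
    if start != -1:
        return ("bottom", start, start + len(primer))
    return None


def product_size(template, lp, rp):
    """Predicted product length in bp, or None if no product is expected."""
    hits = [locate_primer(template, lp), locate_primer(template, rp)]
    if None in hits or {hit[0] for hit in hits} != {"top", "bottom"}:
        return None  # a primer is missing, or both match the same strand
    forward = next(hit for hit in hits if hit[0] == "top")
    reverse = next(hit for hit in hits if hit[0] == "bottom")
    if forward[1] >= reverse[1]:
        return None  # primers point away from each other
    return reverse[2] - forward[1]


# Invented 39-base template and primers, for illustration only.
toy_template = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
toy_lp = toy_template[:6]                       # identical to the top strand
toy_rp = reverse_complement(toy_template[-6:])  # identical to the bottom strand

print(locate_primer(toy_template, toy_lp))
print(locate_primer(toy_template, toy_rp))
print(product_size(toy_template, toy_lp, toy_rp))
```

In SeqViewer terms, a "bottom" result corresponds to a match shown in italics. The product size is measured from the 5' end of one primer to the 5' end of the other, so both primer lengths are included.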
