Unix: Beyond the Basics



Login
• Requesting a tak account: http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php

• Windows → PuTTY → Need to set up X-windows for graphical display

• Macs: access the Terminal via Go → Utilities → Terminal


Login using Secure Shell
$ ssh -Y user@tak
• PuTTY on Windows
• Terminal on Macs

Command prompt:
user@tak ~$


Hot Topics website: http://jura.wi.mit.edu/bio/education/hot_topics/
• Log in to our tak server, create a directory for the exercises in your lab share, and use it as your working directory:
$ mkdir unix2
$ cd unix2

• Download all files into your working directory
• You should have the files below in your working directory:
– foo.txt, sample1.txt, exercise.txt, join2filesByFirstColumn.pl, datasets folder
– you can check with the ls command


Unix Commands Review (Unix Essentials)
$ sort -k2,3 foo.tab
-k2,3: the sort key starts at field 2 and ends at field 3
-n or -g: sort numerically; -n is recommended over -g, except for values in scientific notation or with a leading '+'
-r: reverse order

$ cut -f1,5 foo.tab
$ cut -f1-5 foo.tab
-f: select only these fields
-f1,5: select 1st and 5th fields
-f1-5: select 1st, 2nd, 3rd, 4th, and 5th fields

$ wc -l foo.txt
How many lines are in this file?
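To make the review concrete, here is a minimal sketch that runs the three commands on a small scratch file. The file name demo.tab and its contents are made up for illustration and are not part of the course files. The sort call uses two single-field numeric keys (-k2,2n -k3,3n) rather than -k2,3, so that ties in the 2nd field are broken numerically by the 3rd field.

```shell
# Scratch data (hypothetical): name, start, end, score; tab-delimited
tmpdir=$(mktemp -d)
tab=$(printf '\t')
printf 'geneC\t30\t40\t2.5\ngeneA\t10\t20\t7.0\ngeneB\t10\t15\t1.2\n' > "$tmpdir/demo.tab"

# Sort numerically by the 2nd field, then by the 3rd field
sorted=$(sort -t "$tab" -k2,2n -k3,3n "$tmpdir/demo.tab")

# Keep only the 1st and 4th fields (name and score)
name_score=$(cut -f1,4 "$tmpdir/demo.tab")

# Count the lines in the file
lines=$(wc -l < "$tmpdir/demo.tab")

printf '%s\n' "$sorted"
printf '%s\n' "$name_score"
echo "lines: $lines"

rm -r "$tmpdir"
```

Reading the file through `<` makes wc print only the count, without the file name.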

What you will learn…
• sed
• awk
• groupBy (bedtools)
• loops
• join files together
• scripting


sed: stream editor for filtering and transforming text
• Print lines 10 - 15:
$ sed -n '10,15p' bigFile > selectedLines

• Delete 5 header lines at the beginning of a file:
$ sed '1,5d' file > fileNoHeader

• Remove all version numbers (ex: '.1') from the end of a list of sequence accessions:
$ sed 's/\.[0-9]\+//g' accsWithVersion > accsOnly

s: substitute
g: global modifier (replace every match on the line)
d: delete line
p: print the current pattern space
-n: suppress automatic printing, so only lines selected with p are printed
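The three sed recipes can be tried on throwaway input, as in the sketch below. The file names and accession strings are hypothetical stand-ins. The substitution is a variant of the slide's command: it uses [0-9][0-9]* in place of \+, which is a GNU extension, and it adds a $ anchor so that only a trailing version number is removed.

```shell
tmpdir=$(mktemp -d)
seq 1 20 > "$tmpdir/bigFile"          # 20 lines containing 1 .. 20

# Print only lines 10 through 15
selected=$(sed -n '10,15p' "$tmpdir/bigFile")

# Drop the first 5 lines, keep the rest
no_header=$(sed '1,5d' "$tmpdir/bigFile")

# Made-up accessions with version suffixes
printf 'ACC0001.1\nACC0002.12\n' > "$tmpdir/accsWithVersion"

# Strip a trailing ".<digits>" from each line
accs_only=$(sed 's/\.[0-9][0-9]*$//' "$tmpdir/accsWithVersion")

printf '%s\n' "$selected"
printf '%s\n' "$accs_only"

rm -r "$tmpdir"
```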

awk
• Name comes from the original authors: Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan

• A simple programming language
• Good for short programs, including parsing files


awk
• Print the 2nd and 1st fields of the file:
$ awk '{ print $2"\t"$1 }' foo.tab
• Convert sequences from tab-delimited format to fasta format:
$ awk '{ print ">"$1"\n"$2 }' foo.tab > foo.fa
$ head -1 foo.tab
Seq1    ACTGCATCAC
$ head -2 foo.fa
>Seq1
ACTGCATCAC

#: comment, ignored by awk
By default, awk splits each line on whitespace (spaces and tabs)

Character   Description
\n          newline
\r          carriage return
\t          horizontal tab
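As a small worked example of the two awk one-liners, the sketch below builds a two-record tab-delimited file and applies both. The second sequence (Seq2) and the temporary directory are assumptions added for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'Seq1\tACTGCATCAC\nSeq2\tGGTTAACC\n' > "$tmpdir/foo.tab"

# Swap the two columns, joined by a tab
swapped=$(awk '{ print $2"\t"$1 }' "$tmpdir/foo.tab")

# One ">" header line plus one sequence line per record
fasta=$(awk '{ print ">"$1"\n"$2 }' "$tmpdir/foo.tab")

printf '%s\n' "$swapped"
printf '%s\n' "$fasta"

rm -r "$tmpdir"
```

Each input record becomes two output lines because "\n" inside the print expression inserts a newline between the header and the sequence.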


awk: field separator
• Issues with the default separator (white space):
– one field is a gene description with multiple words
– consecutive empty cells

• To use tab as the separator:
$ awk -F "\t" '{ print NF }' foo.txt
or
$ awk 'BEGIN {FS="\t"} { print NF }' foo.txt

BEGIN: action run before any input is read
NF: number of fields in the current record
FS: input field separator
OFS: output field separator
END: action run after all input has been read
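Both issues show up on a single made-up record with a multi-word description and an empty cell. The sketch below compares the field count under the default separator and under tab, and then sets OFS so the output is tab-delimited as well. The file name genes.txt and its contents are hypothetical.

```shell
tmpdir=$(mktemp -d)
# Four tab-separated cells: name, multi-word description, empty cell, value
printf 'geneA\tprotein kinase C\t\t5\n' > "$tmpdir/genes.txt"

# Default whitespace splitting: geneA, protein, kinase, C, 5
nf_default=$(awk '{ print NF }' "$tmpdir/genes.txt")

# Tab splitting keeps the description whole and preserves the empty cell
nf_tab=$(awk -F "\t" '{ print NF }' "$tmpdir/genes.txt")

# FS and OFS both set to tab: print the 1st and 4th cells, tab-delimited
picked=$(awk 'BEGIN { FS = OFS = "\t" } { print $1, $4 }' "$tmpdir/genes.txt")

echo "default NF: $nf_default, tab NF: $nf_tab"
printf '%s\n' "$picked"

rm -r "$tmpdir"
```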

awk: arithmetic operations
Append the average of the 4th and 5th fields to each line of the file:
$ awk '{ print $0"\t"($4+$5)/2 }' foo.tab
$0: the whole line (all fields)

Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
%           Modulo
^           Exponentiation
**          Exponentiation
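A quick way to check the averaging command is to run it on two made-up records, as sketched below. The file name and values are assumptions for illustration. The last line tries two of the operators from the table inside a BEGIN block, which needs no input file.

```shell
tmpdir=$(mktemp -d)
# Hypothetical records: id, chromosome, position, value1, value2
printf 'g1\tchr1\t100\t2\t4\ng2\tchr2\t200\t3\t6\n' > "$tmpdir/vals.tab"

# Append (field4 + field5) / 2 as a new last column
with_avg=$(awk '{ print $0"\t"($4+$5)/2 }' "$tmpdir/vals.tab")

# Exponentiation and modulo, evaluated without reading any input
ops=$(awk 'BEGIN { print 2^3, 7%3 }')

printf '%s\n' "$with_avg"
echo "$ops"

rm -r "$tmpdir"
```

The parentheses around $4+$5 matter: without them, division binds more tightly than addition and only $5 would be halved.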


awk: making comparisons
Print records whose value in the 4th or 5th field is above 4:
$ awk '{ if ( $4>4 || $5>4 ) print $0 }' foo.tab

Sequence

Description

>

Greater than