The Sockets API TELE3118 lecture notes  by Tim Moors, April 15, 2014

Copyright © 15/04/14, Tim Moors


Resources
2 books:
• Short & sweet (147 pages)
  o M. Donahoo and K. Calvert: TCP/IP Sockets in C: Practical Guide for Programmers, Elsevier, 2001
  o Free online access from within UNSW
  o Code etc: http://cs.baylor.edu/~donahoo/#series
• Definitive & exhaustive (991 pages)
  o W. Stevens et al: Unix Network Programming, Vol. 1: The Sockets Networking API, 3rd ed., Addison-Wesley 2004


Transport layer (in 1 crowded slide)
Many apps need more than network layer packet delivery service. Transport protocols fix the mismatch, e.g.

port #s to distinguish multiple processes
• Services need ‘well known’ port #s (e.g. http=80, dns=53, ntp=123)
• Clients can let operating system choose arbitrary “ephemeral” port # = return address, e.g. which web browser window

[Figure: protocol stack with web app and clock over http, dns, ntp; udp and tcp over ip; labels “port” and “protocol”]

Reliability
• check integrity of received info (What if router mangles payload held in its network layer buffer?)
  o One transport protocol, UDP, only does this.
• recover from loss
  o Another transport protocol, TCP, also does this.
  o TCP’s mechanism involves 2 ends agreeing on transfer state
    • TCP is “connection-oriented” (UDP “connectionless”)
    • TCP can only unicast (UDP can multicast/broadcast)
  o Flow control avoids loss due to excess speed, but may involve sending smaller data units => TCP may alter message boundaries (e.g. app writes “TELE3118”, TCP might send “TELE” “31” “18”)

=> TCP STREAM service vs UDP DGRAM service


Sockets history
Operating System (OS) should abstract away details of hardware devices
e.g. Unix provides consistent file interface for disk & keyboard & screen I/O

1983: 4.2BSD Unix extended I/O to support network access via “sockets”
• 1983: Before IP won over OSI, SNA etc => “multiprotocol” (but small addresses)
• 1983+Unix: C interface, e.g. pass pointers & typecast rather than overload functions. (e.g. (sockaddr*)sock_in)

[Figure: user space (web app, clock; http, dns, ntp) above sockets; OS (tcp, udp, ip) below sockets; hardware at the bottom]

1992: Microsoft Windows Sockets (“Winsock”) v1.0 simplifies porting network apps to Windows, with slight differences since Windows ≠ Unix.
1999: IPv6 requires sockets extensions [RFC 3493]

Bottom line: We want to write portable modern apps, not to read all apps using varied functions from past 25 years. For simplicity, we’ll omit error checking!

The sockets API faces challenges from modern systems, as described in G. Neville-Neil: “Whither Sockets?”, ACM Queue Magazine, 7(4):34-5, May 2009


Web client

// Web client. Copyright (C) 2009, Tim Moors.
// Compile with tim_sockets.cpp & link in Ws2_32.lib (Windows), libnsl (Unix)
#include "tim_sockets.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
# define snprintf _snprintf
#endif
#define MAX_REQ_LEN 1000
#define BUF_SIZE 1234

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "Usage: %s server-addr filename\n", argv[0]);
    exit(EXIT_FAILURE);
  }

  SOCKET s = socket(PF_INET, SOCK_STREAM, 0);
  struct sockaddr_in server;
  memset(&server, 0, sizeof(server));
  server.sin_family = AF_INET;
  server.sin_addr.s_addr = inet_addr(argv[1]);
  server.sin_port = htons(80);
  if (connect(s, (struct sockaddr *)&server, sizeof(server)) == 0) {
    char request[MAX_REQ_LEN];
    // HTTP lines end in CRLF; "Connection: close" asks the server to end the reply by closing
    snprintf(request, MAX_REQ_LEN,
             "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
             argv[2], argv[1]);
    send(s, request, strlen(request), 0);

    // receive file and write to stdout
    while (1) {
      char buf[BUF_SIZE];
      int bytes = recv(s, buf, BUF_SIZE, 0);
      if (bytes <= 0) break;  // 0 = server closed, <0 = error
      fwrite(buf, sizeof(char), bytes, stdout);
    }
  }
  return 0;
}

1st member IDs family & rest IDs endpoint.
Functions defined in generic terms => typecast: (sockaddr*)&sin

struct sockaddr {
  unsigned short sa_family;
  char sa_data[14];
};

struct sockaddr_in {
  short sin_family;
  unsigned short sin_port;
  struct in_addr sin_addr;
  char sin_zero[8];
};

in_addr gives access to IPv4 address, most usefully as a 32b word .sin_addr.s_addr

14B data is too small for IPv6 => generalise to sockaddr_storage, large enough to hold any address that the host supports.

struct sockaddr_storage {
  short ss_family;
  char __ss_pad1[_SS_PAD1SIZE];
  __int64 __ss_align;
  char __ss_pad2[_SS_PAD2SIZE];
};

struct sockaddr_in6 {
  short sin6_family;
  u_short sin6_port;
  u_long sin6_flowinfo;
  struct in6_addr sin6_addr;
  u_long sin6_scope_id;
};
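The typecast idiom might look like the sketch below. It assumes POSIX headers and plain int-free helpers; fill_ipv4() and family_of() are hypothetical names, not part of the sockets API. The endpoint is stored in a sockaddr_storage and the family is read back through the generic sockaddr pointer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: store an IPv4 endpoint in family-independent storage. */
void fill_ipv4(struct sockaddr_storage *ss, const char *dotted, unsigned short port)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)ss;

    assert(ss != NULL && dotted != NULL);
    memset(ss, 0, sizeof(*ss));
    sin->sin_family = AF_INET;                 /* 1st member IDs the family */
    sin->sin_port = htons(port);               /* rest IDs the endpoint */
    sin->sin_addr.s_addr = inet_addr(dotted);  /* 32b word, network order */
}

/* Hypothetical helper: generic code only inspects the 1st member. */
int family_of(const struct sockaddr *sa)
{
    assert(sa != NULL);
    return sa->sa_family;
}
```

A caller would pass (struct sockaddr *)&ss to functions such as connect(), together with the size of the family-specific struct.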


bind()
int bind(SOCKET s, const struct sockaddr *my_addr, socklen_t addrlen);
• Specifies the local sockaddr
  o Port: When app (e.g. client) doesn’t care about value, 0 asks OS to choose an ephemeral port
  o IP address: App that doesn’t know local address or doesn’t care which interface can specify INADDR_ANY (IN6ADDR_ANY_INIT for IPv6)
    • Interface may matter for multihomed machines (e.g. WiFi+Eth; firewall)
• bind() fails if sockaddr is in use.
  o e.g. bind() may fail when restarting program just after it closed since TCP conn. is in TIME_WAIT state.
  o Solve with SO_REUSEADDR socket option.
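A minimal sketch of the bind() points above, assuming a POSIX system (int descriptors instead of SOCKET). bind_any_ephemeral() is a hypothetical helper; getsockname() is a standard call, not covered on the slide, used here only to read back the port that the OS chose.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: bind a TCP socket to any interface and an
   OS-chosen port. Returns the socket (or -1) and stores the port. */
int bind_any_ephemeral(unsigned short *port_out)
{
    int s, on = 1;
    struct sockaddr_in me;
    socklen_t len = sizeof(me);

    assert(port_out != NULL);
    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* Permit rebinding while an old connection is in TIME_WAIT. */
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    memset(&me, 0, sizeof(me));
    me.sin_family = AF_INET;
    me.sin_addr.s_addr = htonl(INADDR_ANY);  /* don't care which interface */
    me.sin_port = htons(0);                  /* 0 = OS picks ephemeral port */
    if (bind(s, (struct sockaddr *)&me, sizeof(me)) != 0 ||
        getsockname(s, (struct sockaddr *)&me, &len) != 0) {
        close(s);
        return -1;
    }
    *port_out = ntohs(me.sin_port);
    return s;
}
```

A server would replace htons(0) with its well known port, e.g. htons(80).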


connect()
Specifies remote end. Informs OS of destination address for sending, and how to filter/route incoming packets.
int connect(SOCKET s, const struct sockaddr *serv_addr, socklen_t addrlen);

Typically used with SOCK_STREAM
• Starts TCP handshake with remote end (in addition to informing local OS). Returns when handshake has finished or failed (e.g. error may be ECONNREFUSED or ENETUNREACH)
• If client didn’t set local endpoint (e.g. with bind()) then OS will set it like INADDR_ANY & use ephemeral port

Sometimes used with SOCK_DGRAM:
• Specifies default destination for send()
• Restricts recv() to remote.
• Can change remote end by calling connect() again.
• Yes, connectionless UDP can connect()!: UDP is connectionless at the transport layer, but connect() can be used to set state info for the sockets API
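The SOCK_DGRAM case can be sketched as follows, assuming POSIX and a working loopback interface. udp_connected_roundtrip() is a hypothetical demo function: the sender connect()s a UDP socket so that plain send() works without naming the destination each time. The receive timeout is an added safeguard, not something the slide requires.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical demo: send msg over a connect()ed UDP socket on loopback.
   Returns the number of bytes received into out, or -1 on failure. */
ssize_t udp_connected_roundtrip(const char *msg, char *out, size_t outlen)
{
    int rx, tx;
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval tv = { 2, 0 };  /* give up after 2 s rather than hang */
    ssize_t n = -1;

    assert(msg != NULL && out != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    /* Receiver: loopback address, OS-chosen port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) != 0)
        goto done;

    /* Sender: no handshake here; connect() only records the destination. */
    if (connect(tx, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        goto done;
    if (send(tx, msg, strlen(msg), 0) < 0)
        goto done;
    n = recv(rx, out, outlen, 0);  /* one datagram, boundaries preserved */
done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```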


listen()ing for connections
By default, sockets support clients that initiate, rather than respond to, connection requests
  TCP: If receive SYN (= start handshake request) then send RST
listen() tells OS that app is a server & specifies backlog
  TCP: If receive SYN then send SYN+ACK
int listen(SOCKET s, int backlog);

Servers usually call bind() before listen() to specify well known port.
• OS queues clients when multiple try to connect => “backlog” = max # of “pending connections” = handshake in progress or complete.
• Backlog numerology: Often 5 (max for 4.2BSD). OS may add a fudge factor (Linux: +3) to include handshakes in progress when app treats backlog as handshakes complete. Hard to set properly [Stevens UNP pp. 106-7]
• listen() returns promptly, when OS is ready; accept() indicates when clients are ready


accept()ing connections
accept(): Respond to handshake request from client.
To concurrently serve multiple clients, must distinguish communication with each => create a new socket for each.
SOCKET accept(SOCKET s, struct sockaddr *addr, socklen_t *addrlen);
  o addr identifies client
  o Return value = new socket
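To see listen() and accept() together, here is a sketch that plays both client and server in one process over loopback (POSIX assumed; accept_demo() is a hypothetical name). It relies on the OS completing the handshake and queueing the client in the backlog, so connect() can return before accept() is called.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical demo: returns 1 if accept() yields a new socket whose
   addr matches the loopback client, else 0. */
int accept_demo(void)
{
    int lsock, csock, asock = -1, ok = 0;
    struct sockaddr_in srv, peer, cli;
    socklen_t len = sizeof(srv), plen = sizeof(peer), clen = sizeof(cli);

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0 || csock < 0)
        goto done;

    /* Server side: bind() then listen() with a backlog of 5. */
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    srv.sin_port = htons(0);
    if (bind(lsock, (struct sockaddr *)&srv, sizeof(srv)) != 0 ||
        listen(lsock, 5) != 0 ||
        getsockname(lsock, (struct sockaddr *)&srv, &len) != 0)
        goto done;

    /* Client side: handshake completes; client waits in the backlog. */
    if (connect(csock, (struct sockaddr *)&srv, sizeof(srv)) != 0)
        goto done;

    /* Server side: new socket for this client; lsock keeps listening. */
    asock = accept(lsock, (struct sockaddr *)&peer, &plen);
    if (asock < 0)
        goto done;
    assert(plen <= sizeof(peer));
    if (getsockname(csock, (struct sockaddr *)&cli, &clen) != 0)
        goto done;

    ok = asock != lsock
         && peer.sin_addr.s_addr == htonl(INADDR_LOOPBACK)
         && peer.sin_port == cli.sin_port;
done:
    if (asock >= 0) close(asock);
    if (csock >= 0) close(csock);
    if (lsock >= 0) close(lsock);
    return ok;
}
```

A real server would loop on accept() and hand each returned socket to its own thread or process.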


Number conversion
• In Jonathan Swift’s “Gulliver’s Travels”, Lilliputians and Blefuscudians fight a war over the (trivial) issue of which end to break their eggs at (big or small)
• Computing: When storing integers, which byte to store in lower address?
  A1: Intel et al: Little Endian (least significant)
  A2: IBM et al: Big Endian
• Networking: Send big or little end of integer first?
  A: Must standardise: Big Endian.

[Image from www.laputan.org/images/pictures/little-endian.jpg]

Sockets uses Big Endian network format, which may differ from host format. => functions to convert hton or ntoh for short (port) or long (32b IPv4 address) integers
htonl(), ntohl(), htons(), ntohs()
• Be careful: Compiler doesn’t usually type check, e.g. sa.sin_port=0xCAFE puts port 0xCAFE or 0xFECA on the wire depending on machine. Use sa.sin_port=htons(0xCAFE)
• Similarly, be careful with structs, since different hosts/compilers may align members differently. Write functions to send/receive all member values, rather than directly transferring memory occupied by struct.
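One way to follow the struct advice is sketched below. struct reading and its 6-byte wire layout are invented for illustration; the point is that each member is converted with htons()/htonl() and copied individually, so host byte order and padding never reach the network.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical record; the compiler may pad between the members. */
struct reading {
    uint16_t sensor;
    uint32_t value;
};

#define READING_WIRE_LEN 6  /* 2 + 4 bytes, no padding on the wire */

/* Convert each member to network order and copy it to a fixed offset. */
void reading_pack(const struct reading *r, unsigned char out[READING_WIRE_LEN])
{
    uint16_t s;
    uint32_t v;

    assert(r != NULL && out != NULL);
    s = htons(r->sensor);
    v = htonl(r->value);
    memcpy(out, &s, sizeof(s));
    memcpy(out + 2, &v, sizeof(v));
}

/* Reverse of reading_pack(): copy from fixed offsets, convert to host order. */
void reading_unpack(const unsigned char in[READING_WIRE_LEN], struct reading *r)
{
    uint16_t s;
    uint32_t v;

    assert(in != NULL && r != NULL);
    memcpy(&s, in, sizeof(s));
    memcpy(&v, in + 2, sizeof(v));
    r->sensor = ntohs(s);
    r->value = ntohl(v);
}
```

The packed bytes are the same on little and big endian hosts, so the buffer can be handed to send() as is.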


Presenting addr’s to humans
• sockaddr stores addresses in numeric (binary) form.
• Want to present sin*_addr members in human-readable form, e.g. “dotted decimal” for IPv4. (sin_port #s are readily presented as integers.)

inet_[np]to[pn](): (n = numeric, p = presentable/printable)
Protocol-independent & available under Linux and Windows Vista

const char *inet_ntop(int af, const void *src, char *dst, socklen_t cnt);
  o src points to in_addr or in6_addr
  o Return value (points to dst if successful) useful when including as an argument, e.g. in printf
  o Capitalised as “InetNtop()” under Windows Vista

int inet_pton(int af, const char *src, void *dst);

Older similar functions: inet_[na]to[an]() and inet_addr()
Windows also has its own functions, e.g. WSAStringToAddress() is analogous to inet_pton()
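A small round trip through both functions, as a sketch under POSIX assumptions. canonical_addr() is a hypothetical helper that parses a printable address with inet_pton() and prints it back with inet_ntop(), which also serves as a validity check.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: printable -> numeric -> printable.
   Returns dst on success, NULL if text is not a valid address for af. */
const char *canonical_addr(int af, const char *text, char *dst, socklen_t cnt)
{
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } numeric;  /* large enough for either family */

    assert(text != NULL && dst != NULL);
    if (af != AF_INET && af != AF_INET6)
        return NULL;
    if (inet_pton(af, text, &numeric) != 1)  /* 1 = parsed OK */
        return NULL;
    return inet_ntop(af, &numeric, dst, cnt);
}
```

INET_ADDRSTRLEN and INET6_ADDRSTRLEN give buffer sizes that are large enough for dst.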


Using names
Textual names are even more “presentable” than addresses (e.g. example.com vs 208.77.188.166)
• See lecture on Domain Name System (DNS)
• Sockets names services (e.g. “http”=80) as well as devices; ignore service name by passing NULL for serv(ice)

int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **res);
• Maps name to address
• hints: e.g. to request IPv4 or IPv6
• Release res with freeaddrinfo()

struct addrinfo {
  int ai_flags;              // e.g. AI_NUMERICHOST
  int ai_family;             // \
  int ai_socktype;           //  |- ala socket()
  int ai_protocol;           // /
  size_t ai_addrlen;
  struct sockaddr *ai_addr;
  char *ai_canonname;        // Canonical name
  struct addrinfo *ai_next;  // Allo…
};

int getnameinfo(const struct sockaddr *sa, socklen_t salen, char *host, size_t hostlen, char *serv, size_t servlen, int flags);
• Maps address to “name”
• flags: e.g. NI_NUMERICHOST for printable addr not DNS name

Before IPv6 and Windows XP, apps used gethostbyname() & gethostbyaddr()
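The two calls can be chained as in this sketch (POSIX assumed; resolve_numeric() is a hypothetical helper, and the _DEFAULT_SOURCE line is a glibc-specific assumption for strict compiler modes). It resolves a node and service, then prints the first result back in numeric form.

```c
#define _DEFAULT_SOURCE  /* assumption: exposes getaddrinfo() under strict -std on glibc */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: name -> address -> printable address and port.
   Returns 0 on success, else a getaddrinfo()/getnameinfo() error code. */
int resolve_numeric(const char *node, const char *service,
                    char *host, size_t hostlen, char *serv, size_t servlen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL && serv != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* as for socket() */

    rc = getaddrinfo(node, service, &hints, &res);
    if (rc != 0)
        return rc;

    /* Only the first entry is used; ai_next links any further results. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     host, hostlen, serv, servlen,
                     NI_NUMERICHOST | NI_NUMERICSERV);
    freeaddrinfo(res);
    return rc;
}
```

A client would normally walk the list, calling socket() and connect() with each entry's ai_family, ai_socktype, ai_protocol and ai_addr until one succeeds.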


Broadcast and multicast
Use SOCK_DGRAM to broadcast or multicast, since SOCK_STREAM only unicasts

Broadcast needs deliberate setup (to protect against mistake in address spamming network), through “setsockopt(s, SOL_SOCKET, SO_BROADCAST”

Multicast doesn’t spread everywhere.
• Receivers must tell routers to add/drop membership: App tells OS which uses IGMP† to tell routers:
  ip_mreq† struct identifies multicast address & interface
  setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, (char*)&mreq, sizeof(mreq));
  Then IP_DROP_MEMBERSHIP when done
• Sender may want to further limit propagation (TTL): “setsockopt(… IP_MULTICAST_TTL”
• Control whether you receive what you send to a multicast group that you’re a member of with “setsockopt(…IP_MULTICAST_LOOP”

† IPv6 uses ICMPv6, rather than IGMP, and ipv6_mreq
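The option calls above could be wrapped as in the following IPv4-only sketch (POSIX assumed; all three function names are hypothetical, and the _DEFAULT_SOURCE line is a glibc-specific assumption). Option values for the TTL and loop settings are passed as unsigned char here; some systems also accept int.

```c
#define _DEFAULT_SOURCE  /* assumption: exposes struct ip_mreq under strict -std on glibc */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Deliberately enable broadcast on a SOCK_DGRAM socket. */
int enable_broadcast(int s)
{
    int on = 1;
    return setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));
}

/* Sender: limit how far multicast packets travel and whether we hear our own. */
int configure_mcast_sender(int s, unsigned char ttl, unsigned char loop)
{
    if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) != 0)
        return -1;
    return setsockopt(s, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}

/* Receiver: join (add != 0) or leave a group on the default interface. */
int set_membership(int s, const char *group, int add)
{
    struct ip_mreq mreq;

    assert(group != NULL);
    memset(&mreq, 0, sizeof(mreq));
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1)
        return -1;                                  /* not a printable IPv4 address */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);  /* let the OS pick the interface */
    return setsockopt(s, IPPROTO_IP,
                      add ? IP_ADD_MEMBERSHIP : IP_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```

Joining a group only succeeds on a host with a multicast-capable interface, so set_membership() may fail in restricted environments.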


Cleaning up
close() tells the OS that the app is finished with a socket.
• For TCP sockets, OS usually closes connection by sending FIN.
• Important for servers that create new socket when accept()ing each new client. Without closing, socket resources would get used up.
• close() stops both send & recv to/from socket
  o Info in OS queue waiting to be sent will still be sent

SO_LINGER option => Whether TCP lingers after close() to ensure reliable transfer (e.g. wait for acks & retransmit)

shutdown(SOCKET s, int how)
Like close, but:
• allows specific types of access to be shut down
  how: SHUT_RD, SHUT_WR, SHUT_RDWR
  e.g. client stops writing to indicate end of request, but waits to read reply.
• Forces closure
  o close() maintains socket if still used by another process, e.g. server may fork() after accept() & one process close()s but other hasn’t finished
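The “stop writing, keep reading” pattern is sketched below. To stay self-contained it uses socketpair() with AF_UNIX stream sockets as a stand-in for a TCP connection, which is an assumption of this example rather than anything from the slide; half_close_demo() is a hypothetical name, and the request must be small enough to sit in the socket buffer since one thread plays both ends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical demo: returns the reply length, or -1 on error. */
ssize_t half_close_demo(const char *request, char *reply, size_t replylen)
{
    int sv[2];
    char buf[64], answer[32];
    size_t total = 0;
    ssize_t n;

    assert(request != NULL && reply != NULL && replylen > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* "Client": send the request, then shut down only the write side. */
    send(sv[0], request, strlen(request), 0);
    shutdown(sv[0], SHUT_WR);

    /* "Server": recv() returning 0 marks the end of the request. */
    while ((n = recv(sv[1], buf, sizeof(buf), 0)) > 0)
        total += (size_t)n;
    snprintf(answer, sizeof(answer), "got %lu bytes", (unsigned long)total);
    send(sv[1], answer, strlen(answer), 0);
    close(sv[1]);  /* finished: queued reply is still delivered */

    /* "Client": reading still works after SHUT_WR. */
    n = recv(sv[0], reply, replylen - 1, 0);
    if (n >= 0)
        reply[n] = '\0';
    close(sv[0]);
    return n;
}
```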


Summary (for reference)
SOCKET socket(int socket_family, int socket_type, int protocol);
int setsockopt(SOCKET s, int level, int optname, const void *optval, socklen_t optlen);
ssize_t send(SOCKET s, const void *buf, size_t len, int flags);
ssize_t recv(SOCKET s, void *buf, size_t len, int flags);
ssize_t sendto(SOCKET s, const void *buf, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
ssize_t recvfrom(SOCKET s, void *buf, size_t len, int flags, struct sockaddr *from, socklen_t *fromlen);
int bind(SOCKET s, const struct sockaddr *my_addr, socklen_t addrlen);
int connect(SOCKET s, const struct sockaddr *serv_addr, socklen_t addrlen);
int listen(SOCKET s, int backlog);
SOCKET accept(SOCKET s, struct sockaddr *addr, socklen_t *addrlen);
htonl(), ntohl(), htons(), ntohs()
const char *inet_ntop(int af, const void *src, char *dst, socklen_t cnt);
int inet_pton(int af, const char *src, void *dst);
int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **res);
int getnameinfo(const struct sockaddr *sa, socklen_t salen, char *host, size_t hostlen, char *serv, size_t servlen, int flags);
close()
shutdown(SOCKET s, int how)

struct sockaddr {
  unsigned short sa_family;
  char sa_data[14];
};

struct sockaddr_in {
  short sin_family;
  unsigned short sin_port;
  struct in_addr sin_addr;
  char sin_zero[8];
};


Links
• Socket options give control of TCP/IP, e.g. Nagle algorithm, IP TTL, IGMP etc
• Framing: Apps that use SOCK_STREAM must often frame their data; e.g. using length fields
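Length-field framing might be sketched like this. The 2-byte big endian length prefix and the names frame_encode() and frame_decode() are assumptions made for illustration; real protocols choose their own header layout. The decoder works on whatever bytes recv() has delivered so far and reports when a whole frame is available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define FRAME_HDR_LEN 2  /* assumed header: 16-bit length in network order */

/* Write header + payload into out. Returns bytes written, or 0 if out is too small. */
size_t frame_encode(const void *payload, uint16_t len,
                    unsigned char *out, size_t outcap)
{
    uint16_t be = htons(len);

    assert(out != NULL && (payload != NULL || len == 0));
    if (outcap < FRAME_HDR_LEN + (size_t)len)
        return 0;
    memcpy(out, &be, FRAME_HDR_LEN);
    if (len > 0)
        memcpy(out + FRAME_HDR_LEN, payload, len);
    return FRAME_HDR_LEN + (size_t)len;
}

/* Try to extract one frame from the first `have` received bytes.
   Returns bytes consumed, or 0 if more data must be recv()ed first.
   On success *payload points into buf and *len is the payload length. */
size_t frame_decode(const unsigned char *buf, size_t have,
                    const unsigned char **payload, uint16_t *len)
{
    uint16_t be;

    assert(buf != NULL && payload != NULL && len != NULL);
    if (have < FRAME_HDR_LEN)
        return 0;                       /* header incomplete */
    memcpy(&be, buf, FRAME_HDR_LEN);
    *len = ntohs(be);
    if (have < FRAME_HDR_LEN + (size_t)*len)
        return 0;                       /* payload incomplete */
    *payload = buf + FRAME_HDR_LEN;
    return FRAME_HDR_LEN + (size_t)*len;
}
```

A receiver would append each recv() result to a buffer, call frame_decode() in a loop, and discard the consumed bytes after each complete frame, so TCP splitting “TELE3118” into pieces no longer matters.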
