Blame - third_party/lksctp-tools/doc/ols.tex - RealtimeRoboticsGroup/test

blob: 1e476064d9520e3e07701d7241108440ad6c3f06 [file] [log] [blame]

Austin Schuh	8d0a285	2019-12-28 22:54:28 -0800	[diff] [blame]	1	% TEMPLATE for Usenix papers, specifically to meet requirements of
				2	% TCL97 committee.
				3	% originally a template for producing IEEE-format articles using LaTeX.
				4	% written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
				5	% adapted by David Beazley for his excellent SWIG paper in Proceedings,
				6	% Tcl 96
				7	% turned into a smartass generic template by De Clarke, with thanks to
				8	% both the above pioneers
				9	% use at your own risk. Complaints to /dev/null.
				10	% make it two column with no page numbering, default is 10 point
				11
				12	% Munged by Fred Douglis <douglis@research.att.com> 10/97 to separate
				13	% the .sty file from the LaTeX source template, so that people can
				14	% more easily include the .sty file into an existing document. Also
				15	% changed to more closely follow the style guidelines as represented
				16	% by the Word sample file.
				17	% This version uses the latex2e styles, not the very ancient 2.09 stuff.
				18
				19	% adapted for Ottawa Linux Symposium
				20
				21	\documentclass[twocolumn]{article}
				22	\usepackage{ols,epsfig}
				23	\begin{document}
				24
				25	%Remove this next line if your system defaults correctly.
				26	\special{papersize=8.5in,11in}
				27
				28	%don't want date printed
				29	\date{}
				30
				31	%make title bold and 14 pt font (Latex default is non-bold, 16 pt)
				32	\title{\Large \bf Linux Kernel SCTP : The Third Transport}
				33
				34	%for single author (just remove % characters)
				35	\author{
				36	La~Monte H.P.\ Yarroll \\
				37	%{\em Your Department} \\
				38	{\em Motorola GTSS}\\
				39	%{\em Your City, State, ZIP}\\
				40	% is there a standard format for email/URLs??
				41	% remember that ~ doesn't do what you expect, use \~{}.
				42	{\normalsize piggy@acm.org} \\
				43	%
				44	% copy the following lines to add more authors
				45	\and
				46	Karl Knutson \\
				47	{\em Motorola GTSS}\\
				48	%% is there a standard format for email/URLs??
				49	{\normalsize karl@athena.chicago.il.us}
				50	%
				51	} % end author
				52
				53	\maketitle
				54
				55	% You have to do this to suppress page numbers. Don't ask.
				56	\thispagestyle{empty}
				57	\renewcommand{\thefootnote}{\fnsymbol{footnote}}
				58
				59	\subsection*{Abstract}
				60	% nuked italics -- FD
				61
				62	The Stream Control Transmission Protocol (SCTP) is a reliable
				63	message-oriented protocol with transparent support for multihoming.
				64	It allows multiple independent complex exchanges which all share a
				65	single connection and congestion context.
				66
				67	We provide an overview of the protocol, the UDP-style API and the
				68	details of the Linux kernel reference implementation. The brief API
				69	discussion is intended for developers wishing to use SCTP. The
				70	detailed implementation discussion is for developers interested in
				71	contributing to the kernel development effort.
				72
				73	\section{Introduction}
				74
				75	The developers at the Linux 2.5 Kernel Summit in San Jose achieved a
				76	rough consensus that 2.5 should probably support SCTP, a new transport
				77	protocol from the IETF. This paper introduces the ongoing work on
				78	such an implementation, providing some details for both the
				79	application developer and the kernel developer.
				80
				81	The Stream Control Transmission Protocol (SCTP) is a reliable
				82	message-oriented protocol with transparent support for multihoming.
				83	It allows multiple independent complex exchanges which all share a
				84	single connection and congestion context.
				85
				86	\subsection{History of SCTP}
				87	The SIGTRAN (Signalling Transport) Working Group of the IETF is
				88	concerned with the transport of telephony signalling data over IP.
				89	Upon reviewing the available standard transport protocols, they
				90	concluded that none of them met the transport requirements of
				91	signalling data.
				92
				93	SIGTRAN concluded that they needed a new transport protocol which
				94	could provide reliable message delivery, tolerate network failures,
				95	and avoid the head-of-line-blocking problem. We will discuss this
				96	problem later.
				97
				98	The WG selected a proposal from Randall Stewart and Qiaobing Xie of
				99	Motorola as a starting point. Stewart and Xie had developed a
				100	Distributed Processing Environment, Quantix, aimed at telephony
				101	applications. This DPE had been successfully demonstrated at Geneva
				102	Telecom in 1999.
				103
				104	The Working Group took great care in constructing the new protocol,
				105	SCTP, incorporating many lessons learned from TCP, such as congestion
				106	control, selective ACK, message fragmentation and bundling.
				107
				108	The core transport protocol from Quantix brought support for
				109	multihoming, message framing, and streams. We discuss all of these
				110	features at length later.
				111
				112	The IESG decided that the resulting protocol was robust enough to be
				113	elevated from a specialised transport for telephony signalling to a new
				114	general purpose transport to stand beside UDP and TCP. To this end,
				115	they moved the work from SIGTRAN to TSVWG, the general transport
				116	group.
				117
				118	As of this writing, the core specification, \cite{rfc2960}, is at Proposed
				119	Standard. There have been three successful bakeoffs covering over 25
				120	separate implementations. Lessons learned from the most recent
				121	bakeoff are being written up in an ``Implementor's Guide'', \cite{impl}.
				122
				123	\subsection{SCTP in the Linux kernel}
				124
				125	Shortly before the first bakeoff, the IESG asked SIGTRAN to move SCTP
				126	from riding on UDP to riding directly on top of IP. The long term
				127	goal was clearly was to move SCTP from user space into the kernel.
				128
				129	Aside from the obvious performance gains, this has the effect of
				130	reducing the number of implementations to roughly one per operating
				131	system. This makes it easier to verify the stability of most of the
				132	implementations which appear on the Internet.
				133
				134	Randall Stewart saw the importance of this and started one of the
				135	authors of this paper working on a port of the user space
				136	implementation to the Linux kernel. This port was intended as a
				137	reference for developers of implementations for other kernels to
				138	examine. The Linux kernel implementation has since diverged
				139	significantly from the user space reference, but maintains the
				140	standards of a reference implementation (see Coding Standards, below).
				141
				142	\subsection{SCTP examples}
				143
				144	SCTP is a reliable message-oriented protocol with transparent support
				145	for multihoming. It allows multiple independent complex exchanges
				146	which all share a single connection and congestion context.
				147
				148	Many network applications operate by exchanging simultaneously, short,
				149	similar sequences of data continuously. The traffic produced by these
				150	operations can be characterised as MICE (Multiple Independent Complex
				151	Exchanges). It is also true that many applications which use MICE
				152	also have high network reliability requirements.
				153
				154	\subsubsection{A database app}
				155	One example is a client/server database application. Each request and
				156	each response is a message. Each transaction is a sequence of
				157	dependent request/response pairs.
				158
				159	Implemented over TCP, this application would have to provide its own
				160	message boundaries, since TCP sends bytes, not messages. How do we
				161	implement MICE with TCP? We have two ways of doing this: multiple
				162	connections, or a single multiplexed and reused connection.
				163
				164	With each transaction over a separate TCP connection, we gain the
				165	independence of transactions, but at a cost in performance. Since TCP
				166	(as a general purpose transport protocol) uses congestion control,
				167	each of the connections would have to go through slow-start and if
				168	most transactions were short, they would never get out of slow-start.
				169
				170	With all transactions over a single TCP connection, we make efficient
				171	use of the network bandwidth, but open ourselves up to the
				172	head-of-line blocking problem. This means that if one segment in one
				173	transaction is lost, this blocks all transactions, not just the one
				174	with the lost segment.
				175
				176	If we use SCTP for the same application we gain the benefits of using
				177	TCP, as well as advantages peculiar to SCTP. SCTP directly supports
				178	messages and guarantees TCP-like levels of bandwidth efficiency via
				179	bundling and fragmentation. Each database transaction can be
				180	represented as an ordered stream of messages, which are independent in
				181	SCTP for retransmission purposes. This means that while SCTP has the
				182	same congestion control mechanisms as TCP, it does not have to resort
				183	to multiple connections nor is it vulnerable to the head-of-line
				184	blocking problem.
				185
				186	\subsubsection{A free clinic}
				187
				188	Another example of SCTP use is for a free\footnote{Free as in ``free
				189	beer''.} clinic which needs a reliable way to use its IP-networked
				190	patient monitoring software.
				191
				192	This has many similarities to the example above in that different
				193	monitoring devices would need to send simultaneous
				194	information---multiple independent complex exchanges. The main
				195	difference is in the higher network reliability requirements.
				196
				197	A reasonable way to improve the network reliability is to set up a
				198	parallel network and use multihoming for the client and server
				199	applications. However, if the application is TCP-based, the
				200	multihoming needs to be added to the application. With SCTP, the
				201	multihoming ability is built into the protocol. All that is necessary
				202	is to make the appropriate socket calls and SCTP will take advantage
				203	of the addresses available in the existing network. This also applies
				204	if one side of the connection has more addresses than the other.
				205
				206	\section{The UDP-style API}
				207
				208	Any new protocol needs an API. In particular for an Internet
				209	protocol, it's important to have the API match the API normally used
				210	for IP networks. This is the Berkeley sockets model---the SCTP
				211	version is defined in the Internet Draft ``Sockets API Extensions for
				212	SCTP''\cite{api}. The API draft defines two complementary interfaces
				213	to SCTP--one for compatibility with older TCP-based applications, and
				214	another for new applications designed expressly to use SCTP. The
				215	Linux Kernel SCTP stack does not yet implement the former, so we
				216	discuss only the UDP-style interface.
				217
				218	The conceptual model of the UDP-style API is (naturally) that of plain
				219	UDP. To send a message in UDP, you create a socket, bind an address
				220	to it and send your message using \texttt{sendmsg()}. To receive a
				221	message in UDP, you create a socket, bind an address to it and use
				222	\texttt{recvmsg()}. It's much the same with the UDP-style API for
				223	SCTP. To send a message, you create a socket, bind \textit{addresses}
				224	to it and use \texttt{sendmsg()}. The SCTP stack underlying the API
				225	handles association startup and shutdown automatically. The same goes
				226	for message reception. To receive a message in UDP-style, you create
				227	a socket, bind \textit{addresses} to it and use \texttt{recvmsg()}.
				228
				229	The important API differences between UDP and UDP-style SCTP are:
				230	multihoming; ancillary data; and the option of notifications from the
				231	SCTP stack.
				232
				233	\subsection{Multihoming and \texttt{bindx()}}
				234
				235	There are three ways to work with multihoming with SCTP. One is to
				236	ignore multihoming and use one address. Another way is to bind all
				237	your addresses through the use of \texttt{INADDR\_ANY} or
				238	\texttt{IN6ADDR\_ANY}. This will ``associate the endpoint with the
				239	optimal subset of available local interfaces.''(Section 3.1.2,
				240	\cite{api}) The most flexible way is through the use of
				241	\texttt{sctp\_bindx()}, which allows additional addresses to be
				242	added to a socket after the first one is bound with \texttt{bind()},
				243	but before the socket is used to transfer or receive data. The
				244	function \texttt{sctp\_bindx()} is further described in section 8.1 of
				245	\cite{api}.
				246
				247	\subsection{Ancillary data}
				248
				249	To use streams with the UDP-style API, you use ancillary data in the
				250	\texttt{struct~cmsghdr} part of the \texttt{struct~msghdr} argument to
				251	both \texttt{sendmsg()} and \texttt{recvmsg()}. Ancillary data is
				252	used for initialisation data (\texttt{struct~sctp\_initmsg} and for
				253	header data (\texttt{struct~sctp\_sndrcvinfo}).
				254
				255	Ancillary data are manipulated with the macros \texttt{CMSG\_FIRSTHDR,
				256	CMSG\_NEXTHDR, CMSG\_DATA, CMSG\_SPACE, \textnormal{and} CMSG\_LEN}.
				257	These are all defined in \cite{rfc2292}. \cite{api} provides a nice
				258	example in section 5.4.2.
				259
				260	{\tt \small
				261	\begin{verbatim}
				262	struct sctp_initmsg {
				263	uint16_t sinit_num_ostreams;
				264	uint16_t sinit_max_instreams;
				265	uint16_t sinit_max_attempts;
				266	uint16_t sinit_max_init_timeo;
				267	};
				268	\end{verbatim}
				269	}
				270
				271	The initialisation ancillary data sets information for starting
				272	new associations.
				273
				274	{\tt \small
				275	\begin{verbatim}
				276	struct sctp_sndrcvinfo {
				277	uint16_t sinfo_stream;
				278	uint16_t sinfo_ssn;
				279	uint16_t sinfo_flags;
				280	uint32_t sinfo_ppid;
				281	uint32_t sinfo_context;
				282	uint8_t sinfo_dscp;
				283	sctp_assoc_t sinfo_assoc_id;
				284	};
				285	\end{verbatim}
				286	}
				287
				288	The header ancillary data reports information gleaned from the SCTP
				289	headers. If requested with the \texttt{SCTP\_RECVDATAIOEVNT} socket
				290	option, this ancillary data is provided with every inbound data
				291	message. There is a handy key (\texttt{sinfo\_assoc\_id}) which
				292	identifies the association for this particular message. It also
				293	provides the flags needed to implement partial delivery of very large
				294	messages.
				295
				296	Outbound messages should include an \texttt{sctp\_sndrcvinfo} ancillary
				297	data structure to tell SCTP which SCTP stream to put this datagram
				298	into. It is also possible to set a default stream so that this
				299	ancillary data may be omitted.
				300
				301	\subsection{Notifications}
				302
				303	SCTP provides for the concept of optional notifications. These are
				304	messages delivered in-band about events inside the SCTP stack, such as
				305	a destination transport address failure or a new association coming
				306	up. The notifications are marked with the \texttt{MSG\_NOTIFICATION}
				307	flag in the \texttt{msg\_flags} field of the \texttt{sctp\_sendrcvinfo}
				308	ancillary data. The notification is delivered as the body of the
				309	message returned by \texttt{recvmsg()}.
				310
				311	In \ref{notifications} we find a table of notifications. Each
				312	notification delivers its own data structure which shares the same
				313	name (lower case, naturally) as the notification type itself. The first
				314	field of every notification is a \texttt{uint16\_t} which caries the
				315	notification type.
				316
				317	\begin{figure*}[t]
				318	\begin{center}
				319	{\tt
				320	\begin{tabular}{ l l l }
				321	\hline
				322	\textnormal{Type} & \textnormal{Socket Option} & \textnormal{Description} \\
				323	\hline
				324	SCTP\_ASSOC\_CHANGE & SCTP\_RECVASSOCEVNT & \textnormal{Change of association} \\
				325	SCTP\_PEER\_ADDR\_CHANGE & SCTP\_RECVADDREVNT & \textnormal{Change in status of a given address} \\
				326	SCTP\_REMOTE\_ERROR & SCTP\_RECVPEERERR & \textnormal{An error received from a peer} \\
				327	SCTP\_SEND\_FAILED & SCTP\_RECVSENDFAILEVNT & \textnormal{A failure to send} \\
				328	SCTP\_SHUTDOWN\_EVENT & SCTP\_RECVDOWNEVNT & \textnormal{The reception of a \texttt{SHUTDOWN} chunk} \\
				329	\end{tabular}
				330	}
				331	\end{center}
				332	\caption{\label{notification}Useful notifications for an SCTP socket}
				333	\end{figure*}
				334
				335	\section{The lksctp Project}
				336
				337	A critical factor in the success of any new IETF protocol is of course
				338	a Linux implementation. Fortunately, key personnel at Motorola
				339	recognised this and encouraged us to tackle such a project. Months
				340	later, we have a core implementation with an ever-expanding feature
				341	set. We now have significant participation from developers at IBM and
				342	Intel and the pace is picking up.
				343
				344	\subsection{Coding standards}
				345
				346	In addition to the usual requirements of kernel code, our code seeks
				347	to be a useful reference for people making their own kernel
				348	implementations of SCTP. If a reader has some question about how to
				349	implement a particular section of the RFC, they need only grep for the
				350	relevant text in our code and they can find an example. As much as
				351	practical, we draw names directly from the RFC. We made the state
				352	machine into an explicit table (see \ref{states} for an
				353	excerpt) with names that refer directly back to the relevant section
				354	numbers. Clarity is a compelling requirement for our code.
				355
				356	\subsection{Extreme Programming}
				357
				358	As the project grew and we added developers, we clearly needed some
				359	way of coordinating our work. We decided to experiment with Extreme
				360	Programming, \cite{xp}.
				361
				362	XP is a collection of practices aimed at controlling risk in a small
				363	to medium-sized software development project. One important principle
				364	is that you should do the simplest thing that could possibly work. A
				365	second important principle is to take advantage of the fact that
				366	programmers like to code.
				367
				368	We use a range of XP practices, but the practices which are most
				369	visible to anybody who reads or works on lksctp are the tests and the
				370	metaphors.
				371
				372	\section{The Tests}
				373
				374	One of the XP practices we use is code-to-the-test. XP asks, ``If
				375	testing is good, why don't we do it all the time?'' Instead of writing
				376	tests for working code, write tests first, and then write code to pass
				377	the tests. This practice leads to a large automated test suite which
				378	runs several times per day.
				379
				380	We use three kinds of test, unit tests, test frame functional tests,
				381	and live kernel functional tests.
				382
				383	The most basic form of test is the unit test. Unit tests exercise all
				384	the interfaces of a particular object and confirm that it behaves
				385	correctly. They also encode regression checks for fixed bugs. These
				386	tests all have names beginning with \texttt{test\_}.
				387
				388	The second form of test is the test frame functional tests. These are
				389	the tests with names beginning with \texttt{ft\_frame\_}. These tests
				390	check for external behaviours of the system, but with a simulated
				391	kernel. The simulated kernel is very light weight and gives us very
				392	fine control over things like timing and network properties.
				393
				394	Ideally, functional tests should be written by the customer for a
				395	system---they encode the behaviours that the customer expects. In our
				396	case, we play the role of customer on behalf of the RFC. We also use
				397	test frame functional tests to define work items for off-site
				398	development groups. The off-site group writes tests which describe the
				399	feature they intend to implement and submits those tests as a
				400	proposal. This has proven an excellent medium for describing work.
				401
				402	The final form of test we use is the live kernel functional test. We
				403	have many fewer of these than we would like---they are difficult to run
				404	since we must install and boot a kernel to test. This is much more
				405	work than simply running \texttt{make unit\_test}. We are exploring UML as
				406	a possible way to automate our kernel functional tests. These tests
				407	have names beginning with \texttt{ft\_kern\_}.
				408
				409	Code-to-the-test is a practice which you can introduce at any point in
				410	a project. When you first start, it seems that you are spending more
				411	time writing tests than writing code, but once you begin to have a
				412	critical mass of interacting tests you begin to see significant
				413	payoffs in both code quality and development velocity.
				414
				415	We have had several incidents where interactions between unit tests
				416	and functional tests have uncovered complimentary masking bugs.
				417
				418	Tests are not a substitute for understanding code---they are a
				419	mechanism for encoding that understanding to share with other
				420	developers, including future versions of yourself. You can learn
				421	nearly as much about our code by reading our tests as by reading the
				422	code itself.
				423
				424	Lately, we have begun using functional tests to encode major bugs.
				425	These are among the best of all possible bug reports---they describe
				426	the failure precisely and tell exactly when the problem is gone.
				427	After the bugs are fixed the tests serve as part of the regression
				428	suite.
				429
				430	\section{The Metaphors}
				431
				432	XP projects are built around a unifying metaphor rather than an
				433	elaborate architecture. In our case, we chose two metaphors which
				434	could serve quite well for nearly any protocol development project.
				435
				436	Our metaphors are the state machine and the smart pipe. Most readers
				437	are probably familiar with the state machine, but the smart pipe is a
				438	twist on a familiar concept. The idea behind a smart pipe\footnote{An
				439	alternate term may be ``oven''.} is that raw stuff goes in one end and
				440	cooked stuff comes out the other end.
				441
				442	\subsection{The State Machine}
				443
				444	The state machine in our implementation is quite literal. We have an
				445	explicit state table which keys to specific state functions which are
				446	tied directly back to parts of the RFC. The core of the state machine
				447	(found in \texttt{sctp\_do\_sm()}) is almost purely functional---only header
				448	conversions are permitted. Each state function produces a description
				449	of the side effects (in the form of a \texttt{struct~sctp\_sm\_retval})
				450	needed to handle the particular event. A separate side effect
				451	processor, \texttt{sctp\_side\_effects()}, converts this structure into
				452	actions.
				453
				454	Events fall into four categories. The RFC is very explicit about
				455	state transitions associated with arriving chunks. The RFC discusses
				456	transitions due to primitive requests from upper layers, but many of
				457	these are implementation dependent. The third category of events is
				458	timeouts. The final category is a catch-all for odd events like
				459	queues emptying.
				460
				461	\begin{figure*}[t]
				462	\begin{center}
				463	{\tt
				464	\begin{tabular}{ l l l l l }
				465	\hline
				466	\textnormal{State:} & CLOSED & COOKIE-WAIT & COOKIE-ECHOED & ESTABLISHED \\
				467	\hline
				468	\hline
				469	\textnormal{Chunks} & & & & \\
				470	\hline
				471
				472	INIT & do\_5\_1B\_init & do\_5\_2\_1\_siminit & do\_5\_2\_1\_siminit & do\_5\_2\_2\_dupinit \\
				473	INIT ACK & discard(5.2.3) & do\_5\_1C\_ack & discard(5.2.3) & discard(5.2.3) \\
				474	COOKIE ECHO & do\_5\_1D\_ce & do\_5\_2\_4\_dupcook& do\_5\_2\_4\_dupcook& do\_5\_2\_4\_dupcook \\
				475	COOKIE ACK & discard & discard(5.2.5) & do\_5\_1E\_ca & discard(5.2.5) \\
				476	DATA & tabort\_8\_4\_8 & discard(6.0) & discard(6.0) & eat\_data\_6\_2 \\
				477	SACK & tabort\_8\_4\_8 & discard(6.0) & eat\_sack\_6\_2\_1 & eat\_sack\_6\_2\_1 \\
				478
				479	\hline
				480	\textnormal{Timeouts} & & & & \\
				481	\hline
				482
				483	T1-INIT TO & bug & do\_4\_2\_reinit & bug & bug \\
				484	T3-RTX TO & bug & bug & do\_6\_3\_3\_retx & do\_6\_3\_3\_retx \\
				485
				486	\hline
				487	\textnormal{Primitives} & & & & \\
				488	\hline
				489
				490	PRM\_ASSOCIATE & do\_PRM\_ASOC & error & error & error \\
				491	PRM\_SEND & error & do\_PRM\_SENDQ6.0 & do\_PRM\_SENDQ6.0 & do\_PRM\_SEND \\
				492
				493	\end{tabular}
				494	}
				495	\end{center}
				496	\caption{\label{states}Portion of SCTP state table showing association initialisation}
				497	\end{figure*}
				498
				499	In order to create an explicit state machine, it was necessary to
				500	first create an explicit state table. The process of creating this
				501	table uncovered a few minor contradictions in one of the drafts of the
				502	RFC. These mostly involved conflicting catch-all cases. In Figure 1
				503	we have an excerpt which shows the state functions involved in
				504	initialising a new association.
				505
				506	\subsection{The Smart Pipes}
				507
				508	Each smart pipe has one or more structures which define its internal
				509	data, and a set of functions which define its external interactions.
				510	In this respect these smart pipes can be considered a type of object,
				511	in the OO sense. All of these definitions can be found in the include
				512	file \texttt{<net/sctp/sctpStructs.h>}.
				513
				514	Most of our smart pipes have push inputs---external objects explictly
				515	put things in by calling methods directly. A pull input is
				516	possible---the smart pipe would need to have a way to register a
				517	callback function which can fetch more input in response to some other
				518	stimulus.
				519
				520	Some of our pipes use pull outputs. E.g. \texttt{SCTP\_ULPqueue} passes
				521	data and notifications up the protocol stack through explicit calls to
				522	the socket functions, usually \texttt{readmsg(2)}. Some of our smart
				523	pipes use push outputs. E.g. \texttt{SCTP\_outqueue} has a set of
				524	callback functions which it invokes when it needs to send chunks out
				525	toward the wire.
				526
				527	There are four smart pipes in lksctp. They are
				528	\texttt{SCTP\_inqueue}, \texttt{SCTP\_ULPqueue},
				529	\texttt{SCTP\_outqueue}, and \texttt{SCTP\_packet}. The first two
				530	carry information up the stack from the wire to the user; the second
				531	two carry information back down the stack.
				532
				533	\subsubsection{\texttt{SCTP\_inqueue}}
				534
				535	\texttt{SCTP\_inqueue} accepts packets and provides chunks. It is
				536	responsible for reassembling fragments, unbundling, tracking received
				537	TSN's for acknowledgement, and managing rwnd for congestion control.
				538	There is an \texttt{SCTP\_inqueue} for each endpoint (to handle chunks
				539	not related to a specific association) and one for each association.
				540
				541	The function \texttt{sctp\_v4\_rcv()} (which is the receiving function
				542	for SCTP registered with IPv4) calls \texttt{sctp\_push\_inqueue()} to
				543	push packets into the input queue for the appropriate association or
				544	endpoint. The function \texttt{sctp\_push\_inqueue()} schedules
				545	either \texttt{sctp\_bh\_rcv\_asoc()} or \texttt{sctp\_bh\_rcv\_ep()}
				546	on the immediate queue to complete delivery. These functions call
				547	\texttt{sctp\_pop\_inqueue()} to pull data out of the
				548	\texttt{SCTP\_inqueue}. This function does most of the work for this
				549	smart pipe.
				550
				551	The functions \texttt{sctp\_bh\_rcv\_ep()} and
				552	\texttt{sctp\_bh\_rcv\_asoc()} run the state machine on incoming
				553	chunks. Among many other side effects, the state machine can generate
				554	events for an upper-layer-protocol (ULP), and/or chunks to go back out
				555	on the wire.
				556
				557	\subsubsection{\texttt{SCTP\_ULPqueue}}
				558
				559	\texttt{SCTP\_ULPqueue} is the smart pipe which accepts events (either
				560	user data messages or notifications) from the state machine and
				561	delivers them to the ULP through the sockets layer. It is responsible
				562	for delivering streams of messages in order. There is one
				563	\texttt{SCTP\_ULPqueue} for every endpoint, but this is likely to
				564	change at some point to one \texttt{SCTP\_ULPqueue} for each socket.
				565	This smart pipe uses a data structure distributed between the
				566	\texttt{struct~SCTP\_endpoint} and the
				567	\texttt{struct~SCTP\_association}.
				568
				569	The state machine, \texttt{sctp\_do\_sm()}, pushes data into an
				570	\texttt{SCTP\_ULPqueue} by calling
				571	\texttt{sctp\_push\_chunk\_ULPqueue()}. It pushes notifications with
				572	\texttt{sctp\_push\_event\_ULPqueue()}. The sockets layer extracts
				573	events from an \texttt{SCTP\_ULPqueue} with
				574	\texttt{sctp\_pop\_ULPqueue()}.
				575
				576	\subsubsection{\texttt{SCTP\_outqueue}}
				577
				578	\texttt{SCTP\_outqueue} is responsible for bundling logic, transport
				579	selection, outbound congestion control, fragmentation, and any
				580	necessary data queueing. It knows whether or not data can go out onto
				581	the wire yet. With one exception noted below, every outbound chunk
				582	goes through an \texttt{SCTP\_outqueue} attached to an association.
				583	The state machine injects chunks into an \texttt{SCTP\_outqueue} with
				584	\texttt{sctp\_push\_outqueue()}. They automatically push out the other
				585	end through a small set of callbacks which are normally attached to an
				586	\texttt{SCTP\_packet}.
				587
				588	The state machine is capable of putting a fully-formed packet directly
				589	on the wire. At this point only \texttt{ABORT} uses this feature. It is
				590	likely that we will refactor \texttt{INIT ACK} generation again to use
				591	this feature.
				592
				593	\subsubsection{\texttt{SCTP\_packet}}
				594
				595	An \texttt{SCTP\_packet} is a lazy packet transmitter associated with a
				596	specific transport. The upper layer pushes data into the packet,
				597	usually with \texttt{sctp\_transmit\_chunk()}. The packet blindly
				598	bundles the chunks. If the it fills (hits the PMTU for its transport),
				599	it transmits the packet to make room for the new chunk.
				600	\texttt{SCTP\_packet} rejects packets which need fragmenting. It is
				601	possible to force a packet to transmit immediately with
				602	\texttt{sctp\_transmit\_packet()}. \texttt{SCTP\_packet} tracks the
				603	congestion counters, but handles none of the congestion logic.
				604
				605	\section{More Data Structures}
				606
				607	Not everything is a state table or a smart pipe---after all, this is
				608	the kernel and we ARE programming in C. Here again, we have followed
				609	the RFC very closely. Most of the key concepts in the RFC manifest
				610	themselves as explicit data structures. For convenience, we refer to
				611	these data structures as ``nouns''.
				612
				613	Nearly all of the ``noun'' structures are designed for use with the
				614	\texttt{sk\_buff} macros for list manipulation. These macros provide a
				615	doubly-linked list with locking.
				616
				617	\subsection{\texttt{struct~SCTP\_proto}}
				618
				619	The entire lksctp universe is grounded in an instance of \texttt{
				620	struct~SCTP\_proto} accessible through \texttt{sctp\_get\_protocol()}.
				621	This structure holds system-wide defaults for things like the maximum
				622	number of permitted retransmissions. It contains a list of all
				623	endpoints on the system.
				624
				625	\subsection{\texttt{struct~SCTP\_endpoint}}
				626
				627	Each UDP-style SCTP socket has an endpoint, represented as a
				628	\texttt{struct~SCTP\_endpoint}. Once we implement high-bandwidth sockets and
				629	TCP-style sockets, it will be possible for multiple sockets to share a
				630	single endpoint structure. The endpoint structure contains a local
				631	SCTP socket number and a list of local IP addresses. These two items
				632	define the endpoint uniquely. In addition to endpoint-wide default
				633	values and statistics, the endpoint maintains a list of associations.
				634
				635	\subsection{\texttt{struct~SCTP\_association}}
				636
				637	Each association structure, \texttt{struct~SCTP\_association}) is defined
				638	by a local endpoint (a pointer to a \texttt{struct~SCTP\_endpoint}), and
				639	a remote endpoint (an SCTP port number and a list of transport
				640	addresses). This is one of the most complicated structures in the
				641	implementation as it includes a great deal of information mandated by
				642	the RFC. Among many other things, this structure holds the state of
				643	the state machine. The list of transport addresses for the remote
				644	endpoint is more elaborate than the simple list of IP addresses in the
				645	local endpoint data structure since SCTP needs to maintain congestion
				646	information about each of the remote transport addresses.
				647
				648	\subsection{\texttt{struct~SCTP\_transport}}
				649
				650	A \texttt{struct~SCTP\_transport} is defined by a remote SCTP port number
				651	and an IP address. The structure holds congestion and reachability
				652	information for the given address. This is also where we get the list
				653	of functions to call to manipulate the specific address family. For
				654	TCP you would find this information way up in the socket, but this is
				655	not possible for SCTP.
				656
				657	\subsection{\texttt{struct~SCTP\_chunk}}
				658
				659	Possibly the most fundamental data structure in lksctp is
				660	\texttt{struct~SCTP\_chunk}. This holds SCTP chunks both inbound and
				661	outbound. It is essentially an extension to \texttt{struct~sk\_buff}.
				662	It adds pointers to the various possible SCTP subheaders and a few
				663	flags needed specifically for SCTP. One strict convention is that
				664	\texttt{chunk->skb->data} is the demarcation line between headers in
				665	network byte order and headers in host byte order. All outbound
				666	chunks are ALWAYS in network byte order. The first function which
				667	needs a field from an inbound chunk converts that full header to host
				668	byte order {\it in situ}.
				669
				670	\section{Acknowledgements}
				671
				672	The authors are members of a team at Motorola dedicated to producing
				673	open source implementations in support of IETF standardisation. We
				674	would like to thank the people who make these efforts possible,
				675	specifically Maureen~Govern, Stephen~Spear, Qiaobing~Xie, and
				676	Irfan~Ali. We are of course deeply indebted to Randall Stewart and
				677	Qiaobing Xie for having created SCTP and for starting the Linux Kernel
				678	SCTP Implementation Project. We wish to recognizee the ongoing and
				679	significant contributions from developers outside Motorola, especially
				680	Jon Grimm and Daisy Chang of IBM, and Xingang Guo of Intel.
				681
				682	\section{Availability}
				683
				684	All the code discussed in this paper is available from the lksctp
				685	project on Source Forge:
				686
				687	\begin{center}
				688	\texttt{http://sourceforge.net/projects/lksctp/}
				689	\end{center}
				690
				691	\begin{thebibliography}{2001}
				692
				693	\bibitem[RFC2960]{rfc2960} R.~Stewart, Q.~Xie, K.~Morneault, C.~Sharp,
				694	H.~J.~Schwarzbauer, T.~Taylor, I.~Rytina, M.~Kalla, L.~Zhang, and,
				695	V.~Paxson, {\em Stream Control Transmission Protocol}, RFC~2960 (Oct~2000).
				696
				697	\bibitem[SCTPAPI]{api} R.~Stewart, Q.~Xie, L.~H.~P.~Yarroll, J.~Wood,
				698	K.~Poon, K.~Fujita., {\em Sockets API Extensions for SCTP}, Work In
				699	Progress, \texttt{draft-ietf-tsvwg-sctpsocket-00.txt} (Jun~2001).
				700
				701	\bibitem[SCTPIMPL]{impl} R.~Stewart. {\it et al},
				702	{\em SCTP Implementor's Guide}, Work In Progress,
				703	\texttt{draft-ietf-tsvwg-sctpimpguide-00.txt} (Jun~2001).
				704
				705	\bibitem[SCTPMIB]{mib} J.~Pastor, M.~Belinchon. {\em Stream Control
				706	Transmission Protocol Management Information Base using SMIv2}, Work
				707	In Progress, \texttt{draft-ietf-sigtran-sctp-mib-03.txt} (Feb~2001).
				708
				709	\bibitem[XP]{xp} K.~Beck. {\em Extreme Programming Explained: Embrace
				710	Change}, Addison-Wesley Publishers (2000).
				711
				712	\bibitem[SCTPORG]{sctporg}{\em Randall Stewart's SCTP site},\\
				713	\texttt{http://www.sctp.org}, (2001).
				714
				715	\bibitem[SCTPDE]{sctpde}{\em T\"uxen/Jungmeier SCTP site},\\
				716	\texttt{http://www.sctp.de}, (2001).
				717
				718	\end{thebibliography}
				719
				720	\end{document}