
 % TEMPLATE for Usenix papers, specifically to meet requirements of
 %  TCL97 committee.
 % originally a template for producing IEEE-format articles using LaTeX.
 %   written by Matthew Ward, CS Department, Worcester Polytechnic Institute.
 % adapted by David Beazley for his excellent SWIG paper in Proceedings,
 %   Tcl 96
 % turned into a smartass generic template by De Clarke, with thanks to
 %   both the above pioneers
 % use at your own risk.  Complaints to /dev/null.
 % make it two column with no page numbering, default is 10 point

 % Munged by Fred Douglis <douglis@research.att.com> 10/97 to separate
 % the .sty file from the LaTeX source template, so that people can
 % more easily include the .sty file into an existing document.  Also
 % changed to more closely follow the style guidelines as represented
 % by the Word sample file.
 % This version uses the latex2e styles, not the very ancient 2.09 stuff.

 % adapted for Ottawa Linux Symposium

 \documentclass[twocolumn]{article}
 \usepackage{ols,epsfig}
 \begin{document}

 %Remove this next line if your system defaults correctly.
 \special{papersize=8.5in,11in}

 %don't want date printed
 \date{}

 %make title bold and 14 pt font (Latex default is non-bold, 16 pt)
 \title{\Large \bf Linux Kernel SCTP : The Third Transport}

 %for single author (just remove % characters)
 \author{
 La~Monte H.P.\ Yarroll \\
 %{\em Your Department} \\
 {\em Motorola GTSS}\\
 %{\em Your City, State, ZIP}\\
 % is there a standard format for email/URLs??
 % remember that ~ doesn't do what you expect, use \~{}.
 {\normalsize piggy@acm.org} \\
 %
 % copy the following lines to add more authors
 \and
 Karl Knutson \\
 {\em Motorola GTSS}\\
 %% is there a standard format for email/URLs??
 {\normalsize karl@athena.chicago.il.us}
 %
 } % end author

 \maketitle

 % You have to do this to suppress page numbers.  Don't ask.
 \thispagestyle{empty}
 \renewcommand{\thefootnote}{\fnsymbol{footnote}}

 \subsection*{Abstract}
 % nuked italics -- FD

 The Stream Control Transmission Protocol (SCTP) is a reliable
 message-oriented protocol with transparent support for multihoming.
 It allows multiple independent complex exchanges which all share a
 single connection and congestion context.

 We provide an overview of the protocol, the UDP-style API and the
 details of the Linux kernel reference implementation.  The brief API
 discussion is intended for developers wishing to use SCTP.  The
 detailed implementation discussion is for developers interested in
 contributing to the kernel development effort.

 \section{Introduction}

 The developers at the Linux 2.5 Kernel Summit in San Jose achieved a
 rough consensus that 2.5 should probably support SCTP, a new transport
 protocol from the IETF.  This paper introduces the ongoing work on
 such an implementation, providing some details for both the
 application developer and the kernel developer.

 The Stream Control Transmission Protocol (SCTP) is a reliable
 message-oriented protocol with transparent support for multihoming.
 It allows multiple independent complex exchanges which all share a
 single connection and congestion context.

 \subsection{History of SCTP}
 The SIGTRAN (Signalling Transport) Working Group of the IETF is
 concerned with the transport of telephony signalling data over IP.
 Upon reviewing the available standard transport protocols, they
 concluded that none of them met the transport requirements of
 signalling data.

 SIGTRAN concluded that they needed a new transport protocol which
 could provide reliable message delivery, tolerate network failures,
 and avoid the head-of-line-blocking problem.  We will discuss this
 problem later.

 The WG selected a proposal from Randall Stewart and Qiaobing Xie of
 Motorola as a starting point.  Stewart and Xie had developed a
 Distributed Processing Environment, Quantix, aimed at telephony
 applications.  This DPE had been successfully demonstrated at Geneva
 Telecom in 1999.

 The Working Group took great care in constructing the new protocol,
 SCTP, incorporating many lessons learned from TCP, such as congestion
 control, selective ACK, message fragmentation and bundling.

 The core transport protocol from Quantix brought support for
 multihoming, message framing, and streams.  We discuss all of these
 features at length later.

 The IESG decided that the resulting protocol was robust enough to be
 elevated from a specialised transport for telephony signalling to a new
 general purpose transport to stand beside UDP and TCP.  To this end,
 they moved the work from SIGTRAN to TSVWG, the general transport
 group.

 As of this writing, the core specification, \cite{rfc2960}, is at Proposed
 Standard.  There have been three successful bakeoffs covering over 25
 separate implementations.  Lessons learned from the most recent
 bakeoff are being written up in an ``Implementor's Guide'', \cite{impl}.

 \subsection{SCTP in the Linux kernel}

 Shortly before the first bakeoff, the IESG asked SIGTRAN to move SCTP
 from riding on UDP to riding directly on top of IP.  The long term
 goal was clearly to move SCTP from user space into the kernel.

 Aside from the obvious performance gains, this has the effect of
 reducing the number of implementations to roughly one per operating
 system.  This makes it easier to verify the stability of most of the
 implementations which appear on the Internet.

 Randall Stewart saw the importance of this and started one of the
 authors of this paper working on a port of the user space
 implementation to the Linux kernel.  This port was intended as a
 reference for developers of implementations for other kernels to
 examine.  The Linux kernel implementation has since diverged
 significantly from the user space reference, but maintains the
 standards of a reference implementation (see Coding Standards, below).

 \subsection{SCTP examples}

 SCTP is a reliable message-oriented protocol with transparent support
 for multihoming.  It allows multiple independent complex exchanges
 which all share a single connection and congestion context.

 Many network applications operate by continuously and simultaneously
 exchanging short, similar sequences of data.  The traffic produced by
 these operations can be characterised as MICE (Multiple Independent
 Complex Exchanges).  Many applications which use MICE also have high
 network reliability requirements.

 \subsubsection{A database app}
 One example is a client/server database application.  Each request and
 each response is a message.  Each transaction is a sequence of
 dependent request/response pairs.

 Implemented over TCP, this application would have to provide its own
 message boundaries, since TCP sends bytes, not messages.  How do we
 implement MICE with TCP?  We have two ways of doing this:  multiple
 connections, or a single multiplexed and reused connection.

 With each transaction over a separate TCP connection, we gain the
 independence of transactions, but at a cost in performance. Since TCP
 (as a general purpose transport protocol) uses congestion control,
 each of the connections would have to go through slow-start and if
 most transactions were short, they would never get out of slow-start.

 With all transactions over a single TCP connection, we make efficient
 use of the network bandwidth, but open ourselves up to the
 head-of-line blocking problem.  This means that if one segment in one
 transaction is lost, this blocks all transactions, not just the one
 with the lost segment.

 If we use SCTP for the same application we gain the benefits of using
 TCP, as well as advantages peculiar to SCTP.  SCTP directly supports
 messages and guarantees TCP-like levels of bandwidth efficiency via
 bundling and fragmentation.  Each database transaction can be
 represented as an ordered stream of messages, which are independent in
 SCTP for retransmission purposes.  This means that while SCTP has the
 same congestion control mechanisms as TCP, it does not have to resort
 to multiple connections nor is it vulnerable to the head-of-line
 blocking problem.
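
 The difference can be made concrete with a toy model.  The following
 sketch is purely illustrative and is not part of lksctp: the type
 \texttt{demo\_msg} and both functions are hypothetical, and the model
 assumes that a lost message simply blocks everything ordered behind
 it.  It counts how many messages a receiver could deliver under one
 shared ordering (as over a single TCP connection) and under
 per-stream ordering (as in SCTP).

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>

 #define DEMO_MAX_STREAMS 8

 /* One message: its stream and whether it was lost. */
 struct demo_msg {
     int stream;
     int lost;
 };

 /* One shared ordering: nothing at or after the
  * first lost message can be delivered. */
 size_t demo_deliverable_single(
     const struct demo_msg *m, size_t n)
 {
     size_t i;

     for (i = 0; i < n; i++)
         if (m[i].lost)
             break;
     return i;
 }

 /* Per-stream ordering: a loss only blocks later
  * messages on the same stream. */
 size_t demo_deliverable_streams(
     const struct demo_msg *m, size_t n)
 {
     int blocked[DEMO_MAX_STREAMS] = {0};
     size_t i, count = 0;

     for (i = 0; i < n; i++) {
         int s = m[i].stream;

         if (s < 0 || s >= DEMO_MAX_STREAMS)
             continue;
         if (m[i].lost)
             blocked[s] = 1;
         else if (!blocked[s])
             count++;
     }
     return count;
 }
 \end{verbatim}
 }

 For five messages on streams 0, 1, 0, 2, 1 where the second message
 is lost, the shared ordering delivers one message while the
 per-stream ordering delivers three.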

 \subsubsection{A free clinic}

 Another example of SCTP use is for a free\footnote{Free as in ``free
 beer''.} clinic which needs a reliable way to use its IP-networked
 patient monitoring software.

 This has many similarities to the example above in that different
 monitoring devices would need to send simultaneous
 information---multiple independent complex exchanges.  The main
 difference is in the higher network reliability requirements.

 A reasonable way to improve the network reliability is to set up a
 parallel network and use multihoming for the client and server
 applications. However, if the application is TCP-based, the
 multihoming needs to be added to the application.  With SCTP, the
 multihoming ability is built into the protocol.  All that is necessary
 is to make the appropriate socket calls and SCTP will take advantage
 of the addresses available in the existing network.  This also applies
 if one side of the connection has more addresses than the other.

 \section{The UDP-style API}

 Any new protocol needs an API.  In particular for an Internet
 protocol, it's important to have the API match the API normally used
 for IP networks.  This is the Berkeley sockets model---the SCTP
 version is defined in the Internet Draft ``Sockets API Extensions for
 SCTP''\cite{api}.  The API draft defines two complementary interfaces
 to SCTP: one for compatibility with older TCP-based applications, and
 another for new applications designed expressly to use SCTP.  The
 Linux Kernel SCTP stack does not yet implement the former, so we
 discuss only the UDP-style interface.

 The conceptual model of the UDP-style API is (naturally) that of plain
 UDP.  To send a message in UDP, you create a socket, bind an address
 to it and send your message using \texttt{sendmsg()}. To receive a
 message in UDP, you create a socket, bind an address to it and use
 \texttt{recvmsg()}.  It's much the same with the UDP-style API for
 SCTP.  To send a message, you create a socket, bind \textit{addresses}
 to it and use \texttt{sendmsg()}.  The SCTP stack underlying the API
 handles association startup and shutdown automatically.  The same goes
 for message reception.  To receive a message in UDP-style, you create
 a socket, bind \textit{addresses} to it and use \texttt{recvmsg()}.
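
 As a point of comparison, here is a minimal sketch of the plain UDP
 pattern described above, using only the standard sockets calls over
 the loopback interface.  The helper names \texttt{demo\_bound\_socket()},
 \texttt{demo\_send()} and \texttt{demo\_recv()} are hypothetical.  The
 UDP-style SCTP API follows the same shape; the socket creation
 arguments and the binding of multiple addresses are defined in
 \cite{api} and are not shown here.

 {\tt \small
 \begin{verbatim}
 #include <string.h>
 #include <unistd.h>
 #include <arpa/inet.h>
 #include <netinet/in.h>
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/uio.h>

 /* Create a UDP socket bound to 127.0.0.1 on an
  * ephemeral port.  Returns the descriptor, or -1.
  * The bound address is stored in *addr. */
 int demo_bound_socket(struct sockaddr_in *addr)
 {
     socklen_t len = sizeof(*addr);
     int fd = socket(AF_INET, SOCK_DGRAM, 0);

     if (fd < 0)
         return -1;
     memset(addr, 0, sizeof(*addr));
     addr->sin_family = AF_INET;
     addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
     if (bind(fd, (struct sockaddr *)addr,
              sizeof(*addr)) < 0 ||
         getsockname(fd, (struct sockaddr *)addr,
                     &len) < 0) {
         close(fd);
         return -1;
     }
     return fd;
 }

 /* Send one message to 'to' with sendmsg(). */
 ssize_t demo_send(int fd,
     const struct sockaddr_in *to, const char *text)
 {
     struct iovec iov;
     struct msghdr msg;

     iov.iov_base = (void *)text;
     iov.iov_len = strlen(text);
     memset(&msg, 0, sizeof(msg));
     msg.msg_name = (void *)to;
     msg.msg_namelen = sizeof(*to);
     msg.msg_iov = &iov;
     msg.msg_iovlen = 1;
     return sendmsg(fd, &msg, 0);
 }

 /* Receive one message with recvmsg(). */
 ssize_t demo_recv(int fd, char *buf, size_t cap)
 {
     struct iovec iov;
     struct msghdr msg;

     iov.iov_base = buf;
     iov.iov_len = cap;
     memset(&msg, 0, sizeof(msg));
     msg.msg_iov = &iov;
     msg.msg_iovlen = 1;
     return recvmsg(fd, &msg, 0);
 }
 \end{verbatim}
 }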

 The important API differences between UDP and UDP-style SCTP are:
 multihoming; ancillary data; and the option of notifications from the
 SCTP stack.

 \subsection{Multihoming and \texttt{bindx()}}

 There are three ways to work with multihoming with SCTP.  One is to
 ignore multihoming and use one address.  Another way is to bind all
 your addresses through the use of \texttt{INADDR\_ANY} or
 \texttt{IN6ADDR\_ANY}.  This will ``associate the endpoint with the
 optimal subset of available local interfaces.'' (Section 3.1.2,
 \cite{api})  The most flexible way is through the use of
 \texttt{sctp\_bindx()}, which allows additional addresses to be
 added to a socket after the first one is bound with \texttt{bind()},
 but before the socket is used to transfer or receive data.  The
 function \texttt{sctp\_bindx()} is further described in section 8.1 of
 \cite{api}.

 \subsection{Ancillary data}

 To use streams with the UDP-style API, you use ancillary data in the
 \texttt{struct~cmsghdr} part of the \texttt{struct~msghdr} argument to
 both \texttt{sendmsg()} and \texttt{recvmsg()}.  Ancillary data is
 used for initialisation data (\texttt{struct~sctp\_initmsg}) and for
 header data (\texttt{struct~sctp\_sndrcvinfo}).

 Ancillary data are manipulated with the macros \texttt{CMSG\_FIRSTHDR,
 CMSG\_NXTHDR, CMSG\_DATA, CMSG\_SPACE, \textnormal{and} CMSG\_LEN}.
 These are all defined in \cite{rfc2292}.  \cite{api} provides a nice
 example in section 5.4.2.

 {\tt \small
 \begin{verbatim}
     struct sctp_initmsg {
        uint16_t sinit_num_ostreams;
        uint16_t sinit_max_instreams;
        uint16_t sinit_max_attempts;
        uint16_t sinit_max_init_timeo;
     };
 \end{verbatim}
 }

 The initialisation ancillary data sets information for starting
 new associations.

 {\tt \small
 \begin{verbatim}
     struct sctp_sndrcvinfo {
        uint16_t sinfo_stream;
        uint16_t sinfo_ssn;
        uint16_t sinfo_flags;
        uint32_t sinfo_ppid;
        uint32_t sinfo_context;
        uint8_t sinfo_dscp;
        sctp_assoc_t sinfo_assoc_id;
     };
 \end{verbatim}
 }

 The header ancillary data reports information gleaned from the SCTP
 headers.  If requested with the \texttt{SCTP\_RECVDATAIOEVNT} socket
 option, this ancillary data is provided with every inbound data
 message.  There is a handy key (\texttt{sinfo\_assoc\_id}) which
 identifies the association for this particular message.  It also
 provides the flags needed to implement partial delivery of very large
 messages.

 Outbound messages should include an \texttt{sctp\_sndrcvinfo} ancillary
 data structure to tell SCTP which SCTP stream to put this datagram
 into.  It is also possible to set a default stream so that this
 ancillary data may be omitted.
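
 The following sketch shows how the \texttt{CMSG\_*} macros fit
 together when attaching a stream number to an outbound message and
 reading it back.  It is an illustration under stated assumptions, not
 the interface from \cite{api}: \texttt{struct~demo\_sndrcvinfo} is a
 local stand-in which mirrors only some of the fields above, and the
 \texttt{cmsg\_type} value \texttt{DEMO\_SCTP\_SNDRCV} is hypothetical.
 No socket is needed, since the macros only manipulate the control
 buffer.

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
 #include <sys/socket.h>
 #include <netinet/in.h>

 #ifndef IPPROTO_SCTP
 #define IPPROTO_SCTP 132
 #endif
 #define DEMO_SCTP_SNDRCV 1  /* hypothetical value */

 /* Local stand-in for struct sctp_sndrcvinfo. */
 struct demo_sndrcvinfo {
     uint16_t sinfo_stream;
     uint16_t sinfo_ssn;
     uint16_t sinfo_flags;
     uint32_t sinfo_ppid;
     uint32_t sinfo_context;
 };

 /* Control buffer, aligned for struct cmsghdr. */
 union demo_cbuf {
     struct cmsghdr align;
     char buf[CMSG_SPACE(
         sizeof(struct demo_sndrcvinfo))];
 };

 /* Attach a stream number to msg. */
 void demo_set_stream(struct msghdr *msg,
     union demo_cbuf *cbuf, uint16_t stream)
 {
     struct demo_sndrcvinfo info;
     struct cmsghdr *cmsg;

     memset(cbuf, 0, sizeof(*cbuf));
     msg->msg_control = cbuf->buf;
     msg->msg_controllen = sizeof(cbuf->buf);

     cmsg = CMSG_FIRSTHDR(msg);
     cmsg->cmsg_level = IPPROTO_SCTP;
     cmsg->cmsg_type = DEMO_SCTP_SNDRCV;
     cmsg->cmsg_len = CMSG_LEN(sizeof(info));

     memset(&info, 0, sizeof(info));
     info.sinfo_stream = stream;
     memcpy(CMSG_DATA(cmsg), &info, sizeof(info));
 }

 /* Return the stream number found in msg, or -1. */
 int demo_get_stream(struct msghdr *msg)
 {
     struct demo_sndrcvinfo info;
     struct cmsghdr *cmsg;

     for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
          cmsg = CMSG_NXTHDR(msg, cmsg)) {
         if (cmsg->cmsg_level != IPPROTO_SCTP ||
             cmsg->cmsg_type != DEMO_SCTP_SNDRCV)
             continue;
         memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
         return info.sinfo_stream;
     }
     return -1;
 }
 \end{verbatim}
 }

 The payload is copied with \texttt{memcpy()} rather than accessed
 through a cast, which avoids any assumption about the alignment of
 \texttt{CMSG\_DATA()}.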

 \subsection{Notifications}

 SCTP provides for the concept of optional notifications.  These are
 messages delivered in-band about events inside the SCTP stack, such as
 a destination transport address failure or a new association coming
 up.  The notifications are marked with the \texttt{MSG\_NOTIFICATION}
 flag in the \texttt{msg\_flags} field of the \texttt{struct~msghdr}
 passed to \texttt{recvmsg()}.  The notification itself is delivered as
 the body of the message returned by \texttt{recvmsg()}.

 In Figure~\ref{notification} we find a table of notifications.  Each
 notification delivers its own data structure which shares the same
 name (lower case, naturally) as the notification type itself.  The first
 field of every notification is a \texttt{uint16\_t} which carries the
 notification type.

 \begin{figure*}[t]
 \begin{center}
 {\tt
 \begin{tabular}{ l  l  l }
    \hline
    \textnormal{Type} & \textnormal{Socket Option} & \textnormal{Description} \\
    \hline
    SCTP\_ASSOC\_CHANGE 		& SCTP\_RECVASSOCEVNT	&  \textnormal{Change of association} \\
    SCTP\_PEER\_ADDR\_CHANGE	& SCTP\_RECVADDREVNT	& \textnormal{Change in status of a given address} \\
    SCTP\_REMOTE\_ERROR		& SCTP\_RECVPEERERR	& \textnormal{An error received from a peer} \\
    SCTP\_SEND\_FAILED		& SCTP\_RECVSENDFAILEVNT & \textnormal{A failure to send} \\
    SCTP\_SHUTDOWN\_EVENT	& SCTP\_RECVDOWNEVNT	& \textnormal{The reception of a \texttt{SHUTDOWN} chunk} \\
 \end{tabular}
 }
 \end{center}
 \caption{\label{notification}Useful notifications for an SCTP socket}
 \end{figure*}
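
 Because every notification starts with a \texttt{uint16\_t} type
 field, a receiver can dispatch on that field before interpreting the
 rest of the message body.  The sketch below illustrates the idea for
 the types in Figure~\ref{notification}.  The numeric codes in
 \texttt{enum~demo\_ntype} and the function name are hypothetical; real
 values come from the headers which accompany \cite{api}.

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>

 /* Hypothetical type codes, for illustration. */
 enum demo_ntype {
     DEMO_ASSOC_CHANGE = 1,
     DEMO_PEER_ADDR_CHANGE,
     DEMO_REMOTE_ERROR,
     DEMO_SEND_FAILED,
     DEMO_SHUTDOWN_EVENT
 };

 /* Name the notification held in a message body. */
 const char *demo_notification_name(
     const void *body, size_t len)
 {
     uint16_t type;

     if (len < sizeof(type))
         return "truncated";
     /* The type is the first field of the body. */
     memcpy(&type, body, sizeof(type));
     switch (type) {
     case DEMO_ASSOC_CHANGE:
         return "SCTP_ASSOC_CHANGE";
     case DEMO_PEER_ADDR_CHANGE:
         return "SCTP_PEER_ADDR_CHANGE";
     case DEMO_REMOTE_ERROR:
         return "SCTP_REMOTE_ERROR";
     case DEMO_SEND_FAILED:
         return "SCTP_SEND_FAILED";
     case DEMO_SHUTDOWN_EVENT:
         return "SCTP_SHUTDOWN_EVENT";
     default:
         return "unknown";
     }
 }
 \end{verbatim}
 }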

 \section{The lksctp Project}

 A critical factor in the success of any new IETF protocol is of course
 a Linux implementation.  Fortunately, key personnel at Motorola
 recognised this and encouraged us to tackle such a project.  Months
 later, we have a core implementation with an ever-expanding feature
 set.  We now have significant participation from developers at IBM and
 Intel and the pace is picking up.

 \subsection{Coding standards}

 In addition to the usual requirements of kernel code, our code seeks
 to be a useful reference for people making their own kernel
 implementations of SCTP.  If a reader has some question about how to
 implement a particular section of the RFC, they need only grep for the
 relevant text in our code and they can find an example.  As much as
 practical, we draw names directly from the RFC.  We made the state
 machine into an explicit table (see Figure~\ref{states} for an
 excerpt) with names that refer directly back to the relevant section
 numbers.  Clarity is a compelling requirement for our code.

 \subsection{Extreme Programming}

 As the project grew and we added developers, we clearly needed some
 way of coordinating our work.  We decided to experiment with Extreme
 Programming, \cite{xp}.

 XP is a collection of practices aimed at controlling risk in a small
 to medium-sized software development project.  One important principle
 is that you should do the simplest thing that could possibly work.  A
 second important principle is to take advantage of the fact that
 programmers like to code.

 We use a range of XP practices, but the practices which are most
 visible to anybody who reads or works on lksctp are the tests and the
 metaphors.

 \section{The Tests}

 One of the XP practices we use is code-to-the-test.  XP asks, ``If
 testing is good, why don't we do it all the time?'' Instead of writing
 tests for working code, write tests first, and then write code to pass
 the tests.  This practice leads to a large automated test suite which
 runs several times per day.

 We use three kinds of test: unit tests, test frame functional tests,
 and live kernel functional tests.

 The most basic form of test is the unit test.  Unit tests exercise all
 the interfaces of a particular object and confirm that it behaves
 correctly.  They also encode regression checks for fixed bugs.  These
 tests all have names beginning with \texttt{test\_}.

 The second form of test is the test frame functional test.  These
 tests have names beginning with \texttt{ft\_frame\_}.  They check
 external behaviours of the system, but with a simulated kernel.  The
 simulated kernel is very lightweight and gives us very fine control
 over things like timing and network properties.

 Ideally, functional tests should be written by the customer for a
 system---they encode the behaviours that the customer expects.  In our
 case, we play the role of customer on behalf of the RFC.  We also use
 test frame functional tests to define work items for off-site
 development groups.  The off-site group writes tests which describe the
 feature they intend to implement and submits those tests as a
 proposal.  This has proven an excellent medium for describing work.

 The final form of test we use is the live kernel functional test.  We
 have many fewer of these than we would like---they are difficult to run
 since we must install and boot a kernel to test.  This is much more
 work than simply running \texttt{make unit\_test}.  We are exploring
 User-Mode Linux (UML) as
 a possible way to automate our kernel functional tests.  These tests
 have names beginning with \texttt{ft\_kern\_}.

 Code-to-the-test is a practice which you can introduce at any point in
 a project.  When you first start, it seems that you are spending more
 time writing tests than writing code, but once you begin to have a
 critical mass of interacting tests you begin to see significant
 payoffs in both code quality and development velocity.

 We have had several incidents where interactions between unit tests
 and functional tests have uncovered complementary masking bugs.

 Tests are not a substitute for understanding code---they are a
 mechanism for encoding that understanding to share with other
 developers, including future versions of yourself.  You can learn
 nearly as much about our code by reading our tests as by reading the
 code itself.

 Lately, we have begun using functional tests to encode major bugs.
 These are among the best of all possible bug reports---they describe
 the failure precisely and tell exactly when the problem is gone.
 After the bugs are fixed the tests serve as part of the regression
 suite.

 \section{The Metaphors}

 XP projects are built around a unifying metaphor rather than an
 elaborate architecture.  In our case, we chose two metaphors which
 could serve quite well for nearly any protocol development project.

 Our metaphors are the state machine and the smart pipe.  Most readers
 are probably familiar with the state machine, but the smart pipe is a
 twist on a familiar concept.  The idea behind a smart pipe\footnote{An
 alternate term may be ``oven''.} is that raw stuff goes in one end and
 cooked stuff comes out the other end.

 \subsection{The State Machine}

 The state machine in our implementation is quite literal.  We have an
 explicit state table which keys to specific state functions which are
 tied directly back to parts of the RFC.  The core of the state machine
 (found in \texttt{sctp\_do\_sm()}) is almost purely functional---only header
 conversions are permitted.  Each state function produces a description
 of the side effects (in the form of a \texttt{struct~sctp\_sm\_retval})
 needed to handle the particular event.  A separate side effect
 processor, \texttt{sctp\_side\_effects()}, converts this structure into
 actions.

 Events fall into four categories.  The RFC is very explicit about
 state transitions associated with arriving chunks.  The RFC discusses
 transitions due to primitive requests from upper layers, but many of
 these are implementation dependent.  The third category of events is
 timeouts.  The final category is a catch-all for odd events like
 queues emptying.

 \begin{figure*}[t]
 \begin{center}
 {\tt
 \begin{tabular}{ l  l  l  l  l }
    \hline
    \textnormal{State:}	& CLOSED	& COOKIE-WAIT	& COOKIE-ECHOED	& ESTABLISHED \\
    \hline
    \hline
    \textnormal{Chunks}	&		&		&		& \\
    \hline

    INIT         & do\_5\_1B\_init & do\_5\_2\_1\_siminit & do\_5\_2\_1\_siminit & do\_5\_2\_2\_dupinit \\
    INIT ACK     & discard(5.2.3) & do\_5\_1C\_ack	& discard(5.2.3) & discard(5.2.3)	\\
    COOKIE ECHO	& do\_5\_1D\_ce	& do\_5\_2\_4\_dupcook& do\_5\_2\_4\_dupcook& do\_5\_2\_4\_dupcook \\
    COOKIE ACK	& discard	& discard(5.2.5) & do\_5\_1E\_ca & discard(5.2.5)    		\\
    DATA		& tabort\_8\_4\_8 & discard(6.0) & discard(6.0) & eat\_data\_6\_2		\\
    SACK		& tabort\_8\_4\_8 & discard(6.0) & eat\_sack\_6\_2\_1 & eat\_sack\_6\_2\_1      \\

    \hline
    \textnormal{Timeouts} &	&		&		& \\
    \hline

    T1-INIT TO	& bug           & do\_4\_2\_reinit & bug	& bug \\
    T3-RTX TO	& bug           & bug		& do\_6\_3\_3\_retx & do\_6\_3\_3\_retx \\

    \hline
    \textnormal{Primitives} &	&		&		& \\
    \hline

    PRM\_ASSOCIATE & do\_PRM\_ASOC & error	& error		& error \\
    PRM\_SEND	& error		& do\_PRM\_SENDQ6.0 & do\_PRM\_SENDQ6.0 & do\_PRM\_SEND \\

 \end{tabular}
 }
 \end{center}
 \caption{\label{states}Portion of SCTP state table showing association initialisation}
 \end{figure*}

 In order to create an explicit state machine, it was necessary to
 first create an explicit state table.  The process of creating this
 table uncovered a few minor contradictions in one of the drafts of the
 RFC.  These mostly involved conflicting catch-all cases.  In
 Figure~\ref{states} we have an excerpt which shows the state functions
 involved in initialising a new association.
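
 The separation between a side-effect-free state function lookup and a
 separate side effect processor can be sketched in a few lines of C.
 This is a simplified illustration, not the lksctp code: every name
 beginning with \texttt{demo\_} is hypothetical, the table covers only
 the \texttt{INIT ACK} and \texttt{COOKIE ACK} rows of
 Figure~\ref{states}, and the ``side effects'' are reduced to a state
 change and a count of chunks to send.

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>

 enum demo_state {
     DEMO_CLOSED, DEMO_COOKIE_WAIT,
     DEMO_COOKIE_ECHOED, DEMO_ESTABLISHED,
     DEMO_NUM_STATES
 };
 enum demo_event {
     DEMO_EV_INIT_ACK, DEMO_EV_COOKIE_ACK,
     DEMO_NUM_EVENTS
 };

 /* Description of side effects; nothing is done
  * until demo_side_effects() interprets it. */
 struct demo_retval {
     enum demo_state next_state;
     int send_reply;
 };

 typedef struct demo_retval (*demo_state_fn)(
     enum demo_state cur);

 static struct demo_retval demo_discard(
     enum demo_state cur)
 {
     struct demo_retval r = { cur, 0 };
     return r;
 }

 /* 5.1 C: INIT ACK received in COOKIE-WAIT. */
 static struct demo_retval demo_do_5_1C_ack(
     enum demo_state cur)
 {
     struct demo_retval r = { DEMO_COOKIE_ECHOED, 1 };
     (void)cur;
     return r;
 }

 /* 5.1 E: COOKIE ACK received in COOKIE-ECHOED. */
 static struct demo_retval demo_do_5_1E_ca(
     enum demo_state cur)
 {
     struct demo_retval r = { DEMO_ESTABLISHED, 0 };
     (void)cur;
     return r;
 }

 static const demo_state_fn
 demo_table[DEMO_NUM_EVENTS][DEMO_NUM_STATES] = {
     [DEMO_EV_INIT_ACK] = {
         demo_discard, demo_do_5_1C_ack,
         demo_discard, demo_discard },
     [DEMO_EV_COOKIE_ACK] = {
         demo_discard, demo_discard,
         demo_do_5_1E_ca, demo_discard },
 };

 struct demo_assoc {
     enum demo_state state;
     int chunks_sent;
 };

 /* Table lookup only; no state is modified here. */
 struct demo_retval demo_do_sm(enum demo_event ev,
                               enum demo_state st)
 {
     return demo_table[ev][st](st);
 }

 /* Turn the description into actual changes. */
 void demo_side_effects(struct demo_assoc *a,
                        struct demo_retval r)
 {
     a->state = r.next_state;
     if (r.send_reply)
         a->chunks_sent++;
 }
 \end{verbatim}
 }

 Keeping the lookup free of side effects means each state function can
 be unit tested by comparing the returned description against the
 behaviour the RFC section calls for.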

 \subsection{The Smart Pipes}

 Each smart pipe has one or more structures which define its internal
 data, and a set of functions which define its external interactions.
 In this respect these smart pipes can be considered a type of object,
 in the OO sense.  All of these definitions can be found in the include
 file \texttt{<net/sctp/sctpStructs.h>}.

 Most of our smart pipes have push inputs---external objects explicitly
 put things in by calling methods directly.  A pull input is
 possible---the smart pipe would need to have a way to register a
 callback function which can fetch more input in response to some other
 stimulus.

 Some of our pipes use pull outputs.  For example,
 \texttt{SCTP\_ULPqueue} passes
 data and notifications up the protocol stack through explicit calls to
 the socket functions, usually \texttt{recvmsg(2)}.  Some of our smart
 pipes use push outputs.  For example, \texttt{SCTP\_outqueue} has a
 set of callback functions which it invokes when it needs to send
 chunks out toward the wire.

 There are four smart pipes in lksctp.  They are
 \texttt{SCTP\_inqueue}, \texttt{SCTP\_ULPqueue},
 \texttt{SCTP\_outqueue}, and \texttt{SCTP\_packet}.  The first two
 carry information up the stack from the wire to the user; the last
 two carry information back down the stack.

 \subsubsection{\texttt{SCTP\_inqueue}}

 \texttt{SCTP\_inqueue} accepts packets and provides chunks.  It is
 responsible for reassembling fragments, unbundling, tracking received
 TSNs for acknowledgement, and managing rwnd for congestion control.
 There is an \texttt{SCTP\_inqueue} for each endpoint (to handle chunks
 not related to a specific association) and one for each association.

 The function \texttt{sctp\_v4\_rcv()} (which is the receiving function
 for SCTP registered with IPv4) calls \texttt{sctp\_push\_inqueue()} to
 push packets into the input queue for the appropriate association or
 endpoint.  The function \texttt{sctp\_push\_inqueue()} schedules
 either \texttt{sctp\_bh\_rcv\_asoc()} or \texttt{sctp\_bh\_rcv\_ep()}
 on the immediate queue to complete delivery.  These functions call
 \texttt{sctp\_pop\_inqueue()} to pull data out of the
 \texttt{SCTP\_inqueue}.  This function does most of the work for this
 smart pipe.

 The functions \texttt{sctp\_bh\_rcv\_ep()} and
 \texttt{sctp\_bh\_rcv\_asoc()} run the state machine on incoming
 chunks.  Among many other side effects, the state machine can generate
 events for an upper-layer-protocol (ULP), and/or chunks to go back out
 on the wire.
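
 Unbundling relies on the packet layout from \cite{rfc2960}: a 12-byte
 common header followed by chunks, each of which begins with a one-byte
 type, one byte of flags and a 16-bit length in network byte order
 which includes the chunk header, with each chunk padded to a multiple
 of four bytes.  The following sketch walks that layout and counts the
 chunks in a packet.  It is not the lksctp code; the function name and
 constants are hypothetical, and it accepts a final chunk whose padding
 is missing.

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>
 #include <stdint.h>

 #define DEMO_COMMON_HDR_LEN 12
 #define DEMO_CHUNK_HDR_LEN   4

 /* Count the chunks bundled in an SCTP packet.
  * Returns -1 if the packet is malformed. */
 int demo_count_chunks(const uint8_t *pkt, size_t len)
 {
     size_t off = DEMO_COMMON_HDR_LEN;
     int count = 0;

     if (len < DEMO_COMMON_HDR_LEN)
         return -1;
     while (off < len) {
         size_t left = len - off;
         size_t clen, padded;

         if (left < DEMO_CHUNK_HDR_LEN)
             return -1;
         /* 16-bit length, network byte order. */
         clen = ((size_t)pkt[off + 2] << 8)
              | pkt[off + 3];
         if (clen < DEMO_CHUNK_HDR_LEN || clen > left)
             return -1;
         /* Round up to a multiple of four. */
         padded = (clen + 3) & ~(size_t)3;
         off += padded < left ? padded : left;
         count++;
     }
     return count;
 }
 \end{verbatim}
 }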

 \subsubsection{\texttt{SCTP\_ULPqueue}}

 \texttt{SCTP\_ULPqueue} is the smart pipe which accepts events (either
 user data messages or notifications) from the state machine and
 delivers them to the ULP through the sockets layer.  It is responsible
 for delivering streams of messages in order.  There is one
 \texttt{SCTP\_ULPqueue} for every endpoint, but this is likely to
 change at some point to one \texttt{SCTP\_ULPqueue} for each socket.
 This smart pipe uses a data structure distributed between the
 \texttt{struct~SCTP\_endpoint} and the
 \texttt{struct~SCTP\_association}.

 The state machine, \texttt{sctp\_do\_sm()}, pushes data into an
 \texttt{SCTP\_ULPqueue} by calling
 \texttt{sctp\_push\_chunk\_ULPqueue()}.  It pushes notifications with
 \texttt{sctp\_push\_event\_ULPqueue()}.  The sockets layer extracts
 events from an \texttt{SCTP\_ULPqueue} with
 \texttt{sctp\_pop\_ULPqueue()}.

 \subsubsection{\texttt{SCTP\_outqueue}}

 \texttt{SCTP\_outqueue} is responsible for bundling logic, transport
 selection, outbound congestion control, fragmentation, and any
 necessary data queueing.  It knows whether or not data can go out onto
 the wire yet.  With one exception noted below, every outbound chunk
 goes through an \texttt{SCTP\_outqueue} attached to an association.
 The state machine injects chunks into an \texttt{SCTP\_outqueue} with
 \texttt{sctp\_push\_outqueue()}.  They automatically push out the other
 end through a small set of callbacks which are normally attached to an
 \texttt{SCTP\_packet}.

 The state machine is capable of putting a fully-formed packet directly
 on the wire.  At this point only \texttt{ABORT} uses this feature.  It is
 likely that we will refactor \texttt{INIT ACK} generation again to use
 this feature.

 \subsubsection{\texttt{SCTP\_packet}}

 An \texttt{SCTP\_packet} is a lazy packet transmitter associated with a
 specific transport.  The upper layer pushes data into the packet,
 usually with \texttt{sctp\_transmit\_chunk()}.  The packet blindly
 bundles the chunks.  If it fills (hits the PMTU for its transport),
 it transmits the packet to make room for the new chunk.
 \texttt{SCTP\_packet} rejects chunks which need fragmenting.  It is
 possible to force a packet to transmit immediately with
 \texttt{sctp\_transmit\_packet()}.  \texttt{SCTP\_packet} tracks the
 congestion counters, but handles none of the congestion logic.
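
 The lazy bundling behaviour can be sketched as follows.  This is a
 simplified model rather than the lksctp implementation: the names
 beginning with \texttt{demo\_} are hypothetical, header overhead is
 ignored, and ``transmitting'' only increments a counter.

 {\tt \small
 \begin{verbatim}
 #include <stddef.h>

 struct demo_packet {
     size_t pmtu;          /* limit for the transport */
     size_t used;          /* bytes bundled so far */
     unsigned transmitted; /* packets sent so far */
 };

 enum demo_xmit { DEMO_XMIT_OK, DEMO_XMIT_TOO_BIG };

 /* Force out whatever has been bundled so far. */
 void demo_transmit_packet(struct demo_packet *p)
 {
     if (p->used > 0) {
         p->transmitted++;
         p->used = 0;
     }
 }

 /* Bundle a chunk, transmitting first if it would
  * not fit.  Oversized chunks are rejected so the
  * caller can fragment them. */
 enum demo_xmit demo_transmit_chunk(
     struct demo_packet *p, size_t chunk_len)
 {
     if (chunk_len > p->pmtu)
         return DEMO_XMIT_TOO_BIG;
     if (p->used + chunk_len > p->pmtu)
         demo_transmit_packet(p);
     p->used += chunk_len;
     return DEMO_XMIT_OK;
 }
 \end{verbatim}
 }

 With a PMTU of 1500, three 600-byte chunks produce one transmitted
 packet holding the first two chunks, and leave the third waiting in
 the packet until more data arrives or the packet is forced out.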

 \section{More Data Structures}

 Not everything is a state table or a smart pipe---after all, this is
 the kernel and we ARE programming in C.  Here again, we have followed
 the RFC very closely.  Most of the key concepts in the RFC manifest
 themselves as explicit data structures.  For convenience, we refer to
 these data structures as ``nouns''.

 Nearly all of the ``noun'' structures are designed for use with the
 \texttt{sk\_buff} macros for list manipulation.  These macros provide a
 doubly-linked list with locking.

 \subsection{\texttt{struct~SCTP\_proto}}

 The entire lksctp universe is grounded in an instance of
 \texttt{struct~SCTP\_proto} accessible through \texttt{sctp\_get\_protocol()}.
 This structure holds system-wide defaults for things like the maximum
 number of permitted retransmissions.  It contains a list of all
 endpoints on the system.

 \subsection{\texttt{struct~SCTP\_endpoint}}

 Each UDP-style SCTP socket has an endpoint, represented as a
 \texttt{struct~SCTP\_endpoint}.  Once we implement high-bandwidth sockets and
 TCP-style sockets, it will be possible for multiple sockets to share a
 single endpoint structure.  The endpoint structure contains a local
 SCTP port number and a list of local IP addresses.  These two items
 define the endpoint uniquely.  In addition to endpoint-wide default
 values and statistics, the endpoint maintains a list of associations.

 \subsection{\texttt{struct~SCTP\_association}}

 Each association structure (\texttt{struct~SCTP\_association}) is defined
 by a local endpoint (a pointer to a \texttt{struct~SCTP\_endpoint}), and
 a remote endpoint (an SCTP port number and a list of transport
 addresses).  This is one of the most complicated structures in the
 implementation as it includes a great deal of information mandated by
 the RFC.  Among many other things, this structure holds the state of
 the state machine.  The list of transport addresses for the remote
 endpoint is more elaborate than the simple list of IP addresses in the
 local endpoint data structure since SCTP needs to maintain congestion
 information about each of the remote transport addresses.

 \subsection{\texttt{struct~SCTP\_transport}}

 A \texttt{struct~SCTP\_transport} is defined by a remote SCTP port number
 and an IP address.  The structure holds congestion and reachability
 information for the given address.  This is also where we get the list
 of functions to call to manipulate the specific address family.  For
 TCP you would find this information way up in the socket, but this is
 not possible for SCTP.

 \subsection{\texttt{struct~SCTP\_chunk}}

 Possibly the most fundamental data structure in lksctp is
 \texttt{struct~SCTP\_chunk}.  This holds SCTP chunks both inbound and
 outbound.  It is essentially an extension to \texttt{struct~sk\_buff}.
 It adds pointers to the various possible SCTP subheaders and a few
 flags needed specifically for SCTP.  One strict convention is that
 \texttt{chunk->skb->data} is the demarcation line between headers in
 network byte order and headers in host byte order.  All outbound
 chunks are ALWAYS in network byte order.  The first function which
 needs a field from an inbound chunk converts that full header to host
 byte order {\it in situ}.
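
 As a small illustration of converting a header in place, the sketch
 below uses the chunk header layout from \cite{rfc2960} (type, flags
 and a 16-bit length).  The structure and function names are
 hypothetical and are not the lksctp definitions; only the length
 field needs conversion because the other two fields are single bytes.

 {\tt \small
 \begin{verbatim}
 #include <stdint.h>
 #include <arpa/inet.h>

 struct demo_chunkhdr {
     uint8_t  type;
     uint8_t  flags;
     uint16_t length;  /* network order on the wire */
 };

 /* Convert an inbound chunk header to host byte
  * order, overwriting the wire representation. */
 void demo_chunkhdr_ntoh(struct demo_chunkhdr *h)
 {
     h->length = ntohs(h->length);
 }
 \end{verbatim}
 }

 A conversion of this kind must happen exactly once per header, which
 is why a strict convention about where it takes place matters.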

 \section{Acknowledgements}

 The authors are members of a team at Motorola dedicated to producing
 open source implementations in support of IETF standardisation.  We
 would like to thank the people who make these efforts possible,
 specifically Maureen~Govern, Stephen~Spear, Qiaobing~Xie, and
 Irfan~Ali.  We are of course deeply indebted to Randall Stewart and
 Qiaobing Xie for having created SCTP and for starting the Linux Kernel
 SCTP Implementation Project.  We wish to recognise the ongoing and
 significant contributions from developers outside Motorola, especially
 Jon Grimm and Daisy Chang of IBM, and Xingang Guo of Intel.

 \section{Availability}

 All the code discussed in this paper is available from the lksctp
 project on SourceForge:

 \begin{center}
 \texttt{http://sourceforge.net/projects/lksctp/}
 \end{center}

 \begin{thebibliography}{2001}

 \bibitem[RFC2960]{rfc2960} R.~Stewart, Q.~Xie, K.~Morneault, C.~Sharp,
 H.~J.~Schwarzbauer, T.~Taylor, I.~Rytina, M.~Kalla, L.~Zhang, and
 V.~Paxson, {\em Stream Control Transmission Protocol}, RFC~2960 (Oct~2000).

 \bibitem[RFC2292]{rfc2292} W.~Stevens and M.~Thomas, {\em Advanced
 Sockets API for IPv6}, RFC~2292 (Feb~1998).

 \bibitem[SCTPAPI]{api} R.~Stewart, Q.~Xie, L.~H.~P.~Yarroll, J.~Wood,
 K.~Poon, and K.~Fujita, {\em Sockets API Extensions for SCTP}, Work In
 Progress, \texttt{draft-ietf-tsvwg-sctpsocket-00.txt} (Jun~2001).

 \bibitem[SCTPIMPL]{impl} R.~Stewart {\it et al.},
 {\em SCTP Implementor's Guide}, Work In Progress,
 \texttt{draft-ietf-tsvwg-sctpimpguide-00.txt} (Jun~2001).

 \bibitem[SCTPMIB]{mib} J.~Pastor and M.~Belinchon, {\em Stream Control
 Transmission Protocol Management Information Base using SMIv2}, Work
 In Progress, \texttt{draft-ietf-sigtran-sctp-mib-03.txt} (Feb~2001).

 \bibitem[XP]{xp} K.~Beck, {\em Extreme Programming Explained: Embrace
 Change}, Addison-Wesley Publishers (2000).

 \bibitem[SCTPORG]{sctporg}{\em Randall Stewart's SCTP site},\\
 \texttt{http://www.sctp.org}, (2001).

 \bibitem[SCTPDE]{sctpde}{\em T\"uxen/Jungmeier SCTP site},\\
 \texttt{http://www.sctp.de}, (2001).

 \end{thebibliography}

 \end{document}