traffic on the test network, though selective collection may be used if a test involves subjecting the network to a heavy load. To monitor live networks, we collect all the traffic from a small subnetwork. Collecting all the data from a network provider's live network would require monitoring several thousand communication links, and is beyond the capacity of our monitoring equipment.
In North America, a protocol called Signaling System Number 7 (SS7, ANSI (1987)) governs the format of data carried by the CCS network.¹ The major components of the CCS network are telephone switches (SSP's), database servers (SCP's), and SS7 packet switches (STP's). STP's are responsible for routing traffic, while SSP's and SCP's can only send and receive. STP's are deployed in pairs for redundancy. Each SSP and SCP is connected to at least one pair of STP's. The (digital) communications links between nodes run at a maximum speed of 56,000 bits per second. When extra communications capacity is required, parallel links are used. A rate of 56,000 bits per second is relatively slow, but there are many links in a CCS network, and many seconds in a day. Anyone trying to collect and analyze SS7 data from a CCS network must soon deal with large datasets.
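As a rough illustration of the scale implied by these figures, the sketch below computes an upper bound on the data that 56,000 bit-per-second links can carry in one direction in a day. The function name and the `utilization` parameter are our own illustrative choices, and full utilization is an assumption made only to obtain a bound, not a measured value.

```python
LINK_SPEED_BPS = 56_000          # maximum link speed stated in the text
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

def max_bytes_per_day(n_links=1, utilization=1.0):
    """Upper bound on bytes carried in one direction over n_links parallel links.

    utilization is a hypothetical fraction of capacity actually used (0 to 1).
    """
    bits = LINK_SPEED_BPS * SECONDS_PER_DAY * n_links * utilization
    return bits / 8

# Megabytes per link per day, in one direction, at full load
print(max_bytes_per_day() / 1e6)
```

Under these assumptions a single fully loaded link carries at most about 600 megabytes per day in each direction, so even a small subnetwork of a few dozen links can produce gigabytes of raw data daily.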
To date, our main data-collection tool for both live and test networks has been a Network Services Test System (NSTS). The NSTS is about the size of a household washing machine. Each NSTS can monitor 16 bi-directional communication links, and store a maximum of 128 megabytes of data. This represents two to four million SS7 messages, which corresponds to anywhere from one hour of SS7 traffic to and from a large access tandem to approximately four days of traffic to and from a small end office. We usually collect data by placing one NSTS at each STP of a pair. Thus, our datasets from live networks are usually 256 megabytes in size. Datasets from test networks tend to be smaller, depending on the length of the test. Along with every message that it saves, the NSTS saves a header containing a millisecond timestamp, the number of the link that carried the message, and some other status information. The timestamps are synchronized with a central time source, and so are comparable between monitoring sites.
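Because the timestamps are comparable between monitoring sites, the captures from the two units of an STP pair can be interleaved into a single time-ordered stream. The following is a minimal sketch of that idea, not the NSTS file format: the `Record` layout, the `site` label, and the function `merge_captures` are hypothetical, and each capture is assumed to be already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class Record(NamedTuple):
    """Hypothetical in-memory form of one saved message plus its header."""
    timestamp_ms: int   # millisecond timestamp from the header
    site: str           # which monitoring unit saved the message (our label)
    link: int           # number of the link that carried the message
    payload: bytes      # raw SS7 message bytes

def merge_captures(*captures: Iterable[Record]) -> Iterator[Record]:
    """Merge per-site captures, each assumed time-ordered, into one stream."""
    return heapq.merge(*captures, key=lambda r: r.timestamp_ms)

# Toy data standing in for the captures from the two units of an STP pair
site_a = [Record(1000, "A", 3, b"\x01"), Record(1040, "A", 5, b"\x02")]
site_b = [Record(1010, "B", 3, b"\x03"), Record(1035, "B", 7, b"\x04")]

for rec in merge_captures(site_a, site_b):
    print(rec.timestamp_ms, rec.site, rec.link)
```

A streaming merge of this kind avoids loading both 128-megabyte captures into memory at once, which matters once several collection runs are combined.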
Our SS7 datasets have many of the "standard" features of the large datasets discussed in this volume. Inhomogeneity and non-stationarity in time are the rule. For example, Figure 1 shows the call arrival process on a communication link to a small switch from 22:45 Wednesday to 14:15 Sunday. There are 31,500 points in this plot, joined by line segments. Each point represents the number of calls received during a ten-second interval, expressed in units of calls per second. Evidently there is a time-of-day effect, and the effect is different on the weekend. This dataset is