The memory limit is the most commonly exceeded limit on the CAS. CAS capacity and performance depend heavily on third-party applications such as SQL Server and the Java Virtual Machine.

The CAS uses memory caches to:

  • Enable fast report serving

  • Hold aggregated data for the current day and for the last reporting interval

  • Hold the most important dimensions (including URLs, queries, applications, transactions, tiers, servers, and sites)

  • Hold combinations of dimensions (for example, the number of users for every URL or query)

CAS memory limits cap the number of server-URL pairs, sites, and users that are held in memory, which prevents the CAS from reaching the Java memory limit.

Because the CAS splits resources between reporting and data processing, these two tasks influence each other, which affects the overall system performance.

CAS Database Capacity

A large or complex CAS database will result in slower report execution, but the individual system capacity figures depend on the type of monitored traffic. CAS database complexity is approximately related to the number of monitored sessions, as explained below. Three factors are involved in analyzing the database capacity:

Session (in the context of a report server)

(Not to be confused with a TCP session.) A unique combination of server, URL, client, location, interface (AMD), application, server port, protocol, browser, and operating system. Depending on server configuration, the client may be represented by a client IP address or user name, or by an aggregate that groups a number of client IP addresses.

The number of monitored users is usually the most important factor for the overall number of sessions. If the CAS is deployed in a large environment (such as more than 50,000 users), do not store single-user information in the CAS database. Keep the number of unique users that the CAS is tracking in its database below 50,000. This number includes aggregated and individual users.
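The session definition above can be illustrated with a minimal Python sketch that counts distinct dimension combinations and unique clients in a handful of made-up records. The `Record` type, the field names, and the sample values are assumptions chosen for illustration; they do not reflect the actual CAS data model.

```python
from typing import NamedTuple

USER_GUIDELINE = 50_000  # unique-user guideline quoted in the text


class Record(NamedTuple):
    """Hypothetical measurement record carrying the session dimensions."""
    server: str
    url: str
    client: str
    location: str
    amd: str
    application: str
    server_port: int
    protocol: str
    browser: str
    operating_system: str


def count_sessions(records):
    """A session is one unique combination of all dimensions."""
    return len(set(records))


def count_users(records):
    """Unique clients (individual or aggregated) seen in the records."""
    return len({r.client for r in records})


def within_user_guideline(records, limit=USER_GUIDELINE):
    return count_users(records) < limit


def make(url, client):
    # Only URL and client vary in this sample; other dimensions are fixed.
    return Record("srv1", url, client, "NYC", "amd1", "shop",
                  443, "HTTPS", "Firefox", "Linux")


records = [
    make("/home", "alice"),
    make("/home", "alice"),  # same combination: no new session
    make("/cart", "alice"),  # new URL: new session
    make("/home", "bob"),    # new client: new session
]

print(count_sessions(records), count_users(records))
```

Because the client is one of the dimensions, every additional tracked user multiplies the possible combinations, which is why user aggregation has such a strong effect on the session count.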

Database size

Database size refers to the size of the database in GB. The size of the CAS database depends mostly on the number of days for which data is stored (default: 10 days), but there is no precise formula. The largest known CAS databases are 400-500 GB, and databases of this size can cause performance problems. A good practice is to keep the database size below 200-300 GB.

It is difficult to estimate the size of the CAS database because it is not determined directly by the amount of traffic monitored by the AMDs that feed the CAS, or by the number of sessions in the CAS database. A database with a relatively low number of sessions may be large if all sessions are active all the time. A database with a large number of sessions may not be very large if each session is active for only short periods.

Note:

If the size of a CAS database exceeds the predefined threshold, which is 80% of the available space, the database is automatically purged of old data and an alert is generated.
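The threshold rule in the note can be expressed as a simple check. This is a sketch of the arithmetic only: the function name and parameters are hypothetical, and the CAS performs the purge itself rather than exposing such a call.

```python
PURGE_THRESHOLD = 0.80  # 80% of the available space, as stated in the note


def exceeds_purge_threshold(database_gb, available_gb,
                            threshold=PURGE_THRESHOLD):
    """Return True when the database size exceeds the threshold share
    of the available space (hypothetical helper for illustration)."""
    if available_gb <= 0:
        raise ValueError("available_gb must be positive")
    return database_gb / available_gb > threshold


# With 500 GB available, the threshold is crossed above 400 GB.
print(exceeds_purge_threshold(410, 500))
print(exceeds_purge_threshold(300, 500))
```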

Database complexity

Database complexity refers to the number of structures stored in the database, such as sessions, servers, clients, locations, and operation names (such as URLs and queries). The number of sessions is a good guide to database complexity, because the number of items in the other dimensions is usually related in some way to the number of sessions. To find out the current number of sessions in the database, use the Number of Sessions report accessible through Tools ► Diagnostics ► Capacity Status on the CAS.

Monitor the number of sessions closely, because a very large or complex CAS database will result in slower report execution. For current capacity estimates, see Estimated Report Server Capacity. Remember, however, that the maximum number of sessions that can be managed in the database depends strongly on the type of traffic being monitored. Before you accept these guidelines, make your own observations for your particular case and monitor the values closely, especially if you expand the system by adding AMDs or by monitoring additional traffic.

User activity details and server statistics on demand

You can configure the CAS to bypass the database and store user activity details separately on disk. This option is useful when user details are not needed on a regular basis but need to be available for occasional inspection. With this option configured, aggregated values are still shown on CAS reports, but user-level details can be accessed through drill-downs.

If this option is used in the ISP Extended mode, all user information is saved. If this option is used in the PVU mode, user information is saved for all distinct user identifiers and optionally also for selected IP address ranges.

The User activity details and server statistics on demand mode limits filtering: when you drill down from the All Users report to the List of users (on demand) report, you lose the contextual application and tier filters.

Consider using this configuration option to reduce database size and memory usage if it is acceptable for user details to be available through drill-downs only, with aggregates shown on higher-level reports. Remember that, depending on traffic characteristics, large amounts of disk storage may be required for user data. For more information, see Enabling User Activity Details and Server Statistics on Demand and List of Users On Demand Report.

The concept of a monitored session is important for understanding the CAS capacity limits, because it reflects the three-dimensional nature of CAS scalability space, with the retention time of raw measurement data as the third dimension:


Figure: Factors determining CAS capacity

The volume of the cube represents the CAS capacity limit. Thanks to three-dimensional scalability, the CAS is able to monitor virtually any environment. This is because CAS capacity does not depend on the number of loaded pages, but on the granularity of measurements, which is defined by the number of monitored operations (for example, URLs or queries), the number of monitored clients, and data retention time.

  • Data retention time can vary between 1 and 90 days, with a default of 10 days. This parameter has the lowest influence on the CAS capacity limit, and it makes a difference only when the number of monitored operations and clients cannot be reduced any further. However, the disk space that is occupied by the CAS database depends linearly on the data retention time.

  • The number of monitored operations depends on the structure and needs of the particular website. The AMD can monitor all operations served by the site, including dynamically created ones and the virtual URLs reflecting HTTP POST/GET parameters that are sent by clients filling in Web forms. Theoretically, the number of unique URLs can reach millions, because each dynamically served Web page can have a unique URL. However, it usually does not exceed several thousand for most URL monitoring purposes. The AMD can also monitor only selected HTTP GET/POST parameters that reflect the business purpose of dynamic pages. Additionally, a top N limit can be set for auto-learned URLs. Note that regardless of the number of monitored URLs, an AMD always analyzes and reports on all the pages loaded from a Web server by all the website users.

  • The number of unique clients is determined by the client aggregation method used by the CAS. It is essential to arrange for user aggregation when monitoring large environments. However, note that the User activity details and server statistics on demand configuration option allows you to view user details, while still performing user aggregation in the CAS database. For more information, see Configuring CAS Scalability Options.
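The linear relationship between retention time and disk space mentioned in the list above can be sketched as a proportional estimate. This is a rough illustration only, since the document states that there is no precise formula for database size; the function name and sample figures are assumptions.

```python
MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 90  # retention range stated in the text


def scale_disk_estimate(current_gb, current_days, new_days):
    """Scale an observed database size linearly to a new retention time.

    Hypothetical helper: assumes traffic characteristics stay the same.
    """
    for days in (current_days, new_days):
        if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
            raise ValueError("retention must be between 1 and 90 days")
    return current_gb * new_days / current_days


# A database that occupies 120 GB at the default 10 days would be
# expected to grow to roughly three times that size at 30 days.
estimate = scale_disk_estimate(120, 10, 30)
print(estimate)  # 360.0
```

In this sample, tripling the retention time would push the estimate above the 200-300 GB good-practice range quoted earlier, even though retention has the lowest influence on the capacity limit itself.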

Monitoring Interval in CAS

The length of the monitoring interval is particularly important for the CAS. Shorter monitoring intervals mean that the CAS has to process more records, because for each client-server session there are more measurement points in a given period such as a day. However, a longer monitoring interval means a longer time lag between actual page load occurrence and its reflection in CAS reports. In addition, longer monitoring intervals mean that the AMD may need more RAM for the client-server session data storage before data is summarized and recorded on disk at the end of each monitoring interval.

The default five-minute monitoring interval should balance sufficiently current data (which favors a shorter interval) against a manageable number of CAS database records (which favors a longer interval), but you can change the interval as needed.
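The effect of the interval length on record volume can be shown with simple arithmetic. The sketch below computes the number of measurement points per day for one session and an upper bound on daily records; it assumes every session is active in every interval, which real traffic rarely is, and the function names and the session count are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def points_per_day(interval_minutes):
    """Measurement points per day for one continuously active session."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return MINUTES_PER_DAY // interval_minutes


def max_records_per_day(sessions, interval_minutes):
    """Upper bound: every session produces a record in every interval."""
    return sessions * points_per_day(interval_minutes)


for minutes in (1, 5, 15):
    print(minutes, points_per_day(minutes),
          max_records_per_day(100_000, minutes))
```

With the default five-minute interval, a session can contribute at most 288 measurement points per day; a one-minute interval raises that ceiling fivefold.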

Monitoring Interval in ADS

For the ADS, the AMD prepares one record per HTTP hit and HTTP page loaded by each user. The monitoring interval therefore does not change the amount of data that the ADS processes; the ADS always provides an individual measurement record for each hit and page. It is therefore possible to increase the monitoring interval when the CAS approaches its capacity limit while still keeping the ADS data granularity at the highest possible level.

Smart Data Aggregation Based on Monitoring Settings

In practice, even a set of several AMDs analyzing gigabytes of traffic to a Virtual IP (VIP) address, such as the front door of a load balancer, can together produce a relatively small amount of data for the CAS to store and process. Monitoring settings defined on the CAS (which operations should be monitored and for what parameters) greatly influence the amount of data produced by the AMD for further analysis by the CAS.

For example, if a large website's VIP is serving several tens of millions of pages daily from tens of thousands of URLs, it does not necessarily mean that each individual page URL needs to be monitored. AMDs can be configured to monitor only a few important URLs and report other page loads only on the server/site level.

In this way, CAS capacity measured in hits or pages loaded per day may reach hundreds of millions. The actual capacity of DC RUM depends on the level of granularity expected from the measurement data (which URLs are monitored and how many, which users, and how users are aggregated).
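The idea of monitoring a few important URLs individually and reporting everything else at the server level can be sketched as follows. This is an illustration of the aggregation principle, not the AMD's actual algorithm; the function name, the record layout, and the sample data are assumptions.

```python
from collections import Counter


def aggregate_page_loads(page_loads, monitored_urls):
    """Count page loads per (server, URL) for monitored URLs only.

    Loads of any other URL are rolled up into one per-server bucket,
    represented here by the key (server, None).
    """
    totals = Counter()
    for server, url in page_loads:
        key = (server, url) if url in monitored_urls else (server, None)
        totals[key] += 1
    return totals


page_loads = [
    ("web1", "/checkout"),
    ("web1", "/checkout"),
    ("web1", "/img/a.png"),
    ("web1", "/img/b.png"),
    ("web2", "/login"),
    ("web2", "/promo?id=17"),
]
monitored = {"/checkout", "/login"}

totals = aggregate_page_loads(page_loads, monitored)
for key, count in sorted(totals.items(), key=str):
    print(key, count)
```

Every page load is still counted, but the number of distinct keys stored is bounded by the number of monitored URLs plus one bucket per server, regardless of how many unique URLs the site serves.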

Bandwidth Requirements

Bandwidth requirements for the CAS and ADS vary depending on which software services the AMD monitors. Every monitoring interval (five minutes by default), the CAS downloads a ZIP-compressed CSV (comma-separated values) file that is typically about 3 MB and can reach 30 MB. The download occupies all available bandwidth for about a second, after which there is no network traffic from the report server until the next interval. Averaged over a five-minute interval, this amounts to approximately 10 to 100 kB/s (80 to 800 kbps) of network utilization. ADS files are about ten times larger than those downloaded by the CAS, so the ADS may require ten times more bandwidth.
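The average network utilization follows directly from the file size and the monitoring interval. The sketch below works through that arithmetic, assuming decimal megabytes (1 MB = 1,000,000 bytes) and the default five-minute interval; the function name is hypothetical.

```python
def average_rate(file_mb, interval_seconds=300):
    """Return the average transfer rate as (kilobytes/s, kilobits/s).

    Assumes decimal units: 1 MB = 1,000,000 bytes, 1 kB = 1,000 bytes.
    """
    bytes_per_second = file_mb * 1_000_000 / interval_seconds
    kilobytes_per_second = bytes_per_second / 1000
    kilobits_per_second = kilobytes_per_second * 8
    return kilobytes_per_second, kilobits_per_second


for size_mb in (3, 30):
    kbytes, kbits = average_rate(size_mb)
    print(f"{size_mb} MB every 5 min: {kbytes:.0f} kB/s, {kbits:.0f} kbit/s")
```

For the 3 MB and 30 MB file sizes this gives averages of 10 and 100 kilobytes per second, which is 80 and 800 kilobits per second. The actual traffic is bursty, so the peak rate during the download is much higher than the average.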

 
