Securing an ASP.Net application

30. January 2003 15:24 by Chris in dev  //  Tags:   //   Comments (0)

Note that this article was first published on 02/01/2003. The original article is available on DotNetJohn, where the code is also available for download. 

Introduction

This article considers and develops a reasonably secure login facility for use within an Internet application utilizing the inbuilt features of ASP.Net. This login facility is intended to protect an administrative section of an Internet site where there are only a limited number of users who will have access to that section of the site. The rest of the site will be accessible to unauthorized users. This problem specification will guide our decision-making.

Also presented are suggestions as to how this security could be improved if you cross the boundary of ASP.Net functionality into supporting technologies. Firstly, however I'll provide an overview of web application security and the features available in ASP.Net, focusing particularly on forms based authentication, as this is the approach we shall eventually use as the basis for our login facility.

Pre-requisites for this article include some prior knowledge of ASP.Net (web.config, security, etc.) and related technologies (e.g. IIS) as well as a basic understanding of general web and security related concepts, e.g. HTTP, cookies.

Web application security: authentication and authorization

Different web sites require different levels of security. Some portions of a web site commonly require password-protected areas and there are many ways to implement such security, the choice largely dependent on the problem domain and the specific application requirements.

Security for web applications comprises two processes: authentication and authorization. Authentication is the process of identifying your user and verifying that they are who they claim to be. Authorization is the process of determining whether the authenticated user has access to the resource they are attempting to access.

The authentication process requires validation against an appropriate data store, commonly called an authority, for example an instance of Active Directory.

ASP.Net provides authorization services based on both the URL and the file of the requested resource. Both checks must be successful for the user to be allowed to access the resource.

Authentication via ASP.Net

ASP.Net arrives complete with the following authentication providers that provide interfaces to other levels of security existing within and/ or external to the web server computer system:

  • integrated windows authentication using NTLM or Kerberos.
  • forms based authentication
  • passport authentication

As with other configuration requirements web.config is utilized to define security settings such as:

  • the authentication method to use
  • the users who are permitted to use the application
  • how sensitive data should be encrypted

Looking at each authentication method in turn with a view to their use in our login facility:

Integrated Windows

This is a secure method but it is only supported by Internet Explorer and therefore most suited to intranet situations where browser type can be controlled. In fact it is the method of choice for Intranet applications. Typically it involves authentication against a Windows domain authority such as Active Directory or the Security Accounts Manager (SAM) using Windows NT Challenge/ Response (NTLM).

Integrated Windows authentication uses the domain, username and computer name of the client user to generate a ‘challenge’. The client must enter the correct password, which causes the correct response to be generated and returned to the server.

In order for integrated Windows authentication to be used successfully in ASP.Net the application needs to be properly configured to do so via IIS – you will commonly want to remove anonymous access so users are not automatically authenticated via the machine's IUSR account. You should also configure the directory where the protected resource is located as an application, though this may already be the case if this is the root directory of your web application.

Consideration of suitability

As integrated Windows authentication is specific to Internet Explorer it is not a suitable authentication method for use with our login facility that we have specified we wish to use for Internet applications. In such a scenario a variety of browser types and versions may provide the client for our application and we would not wish to exclude a significant percentage of our possible user population from visiting our site.

Forms based authentication

This is cookie-based authentication by another name and with a nice wrapper of functionality around it. Such authentication is commonly deemed sufficient for large, public Internet sites. Forms authentication works by redirecting unauthenticated requests to a login page (typically username and a password are collected) via which the credentials of the user are collected and validated. If validated a cookie is issued which contains information subsequently used by ASP.Net to identify the user. The longevity of the cookie may be controlled: for example you may specify that the cookie is valid only for the duration of the current user session.

Forms authentication is flexible in the authorities against which it can validate. For example, it can validate credentials against a Windows based authority, as per integrated Windows, or other data sources such as a database or a simple text file. A further advantage over integrated Windows is that you have control over the login screen used to authenticate users.

Forms authentication is enabled in the applications web.config file, for example:

 <configuration>
   <system.web>
     <authentication mode="Forms">
       <forms name=".AUTHCOOKIE" loginUrl="login.aspx" protection="All" />
     </authentication>
     <machineKey validationKey="AutoGenerate" decryptionKey="AutoGenerate" validation="SHA1" />
     <authorization>
       <deny users="?" />
     </authorization>
   </system.web>
 </configuration> 

This is mostly self-explanatory. The name attribute refers to the name of the cookie. The machineKey section controls the keys used for validation and encryption/decryption of the cookie. In a web farm scenario with multiple web servers the key would be hard-coded to enable authentication to work; otherwise different machines would be using different validation keys! The ‘?’ in the authorization section above, by the way, represents the anonymous user. An ‘*’ would indicate all users.

Within the login page you could validate against a variety of data sources. This might be an XML file of users and passwords. This is an insecure solution, however, so it should not be used for sensitive data, though you could increase security by encrypting the passwords.
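For illustration only, a check against such a file might look like the following in the login page's submit handler. This is a minimal sketch assuming a hypothetical users.xml file, the control names UserName, UserPass and Msg, and that System.Data is imported – none of which are part of the article's code download:

 ' Sketch: validate submitted credentials against a hypothetical users.xml
 ' file of the form <users><user name="..." password="..." /></users>.
 ' A real implementation should also validate/escape the submitted values.
 Dim ds As New DataSet()
 ds.ReadXml(Server.MapPath("users.xml"))
 Dim matches() As DataRow = ds.Tables("user").Select( _
   "name = '" & UserName.Value & "' AND password = '" & UserPass.Value & "'")
 If matches.Length > 0 Then
   FormsAuthentication.RedirectFromLoginPage(UserName.Value, False)
 Else
   Msg.Text = "credentials not valid"
 End If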

Alternatively you can use the credentials element of the web.config file, which is a sub-element of the <forms> element, as follows:

 <credentials passwordFormat="Clear">
   <user name="Chris" password="Moniker" />
   <user name="Maria" password="Petersburg" />
 </credentials> 

Using this method means there is very little coding for the developer to undertake due to the support provided by the .NET Framework, as we shall see a little later when we revisit this method.

Note also the passwordFormat attribute is required, and can be one of the following values:

Clear
Passwords are stored in clear text. The user password is compared directly to this value without further transformation.

MD5
Passwords are stored using a Message Digest 5 (MD5) hash digest. When credentials are validated, the user password is hashed using the MD5 algorithm and compared for equality with this value. The clear-text password is never stored or compared when using this value. This algorithm produces better performance than SHA1.

SHA1
Passwords are stored using the SHA1 hash digest. When credentials are validated, the user password is hashed using the SHA1 algorithm and compared for equality with this value. The clear-text password is never stored or compared when using this value. Use this algorithm for best security.

What is hashing? Hash algorithms map binary values of an arbitrary length to small binary values of a fixed length, known as hash values. A hash value is a unique and extremely compact numerical representation of a piece of data. The hash size for the SHA1 algorithm is 160 bits. SHA1 is more secure than the alternate MD5 algorithm, at the expense of performance.

At this time there is no ASP.Net tool for creating hashed passwords for insertion into configuration files. However, there are classes and methods that make it easy for you to create them programmatically, in particular the FormsAuthentication class. Its HashPasswordForStoringInConfigFile method can do the hashing. At a lower level, you can use the System.Security.Cryptography classes, as well. We'll be looking at the former method later in this article.

The flexibility of the authentication provider for forms authentication continues, as we can select SQLServer as our data source, though the developer then needs to write bespoke code to validate user credentials against the database. Typically you will then have a registration page to allow users to register their login details, which are stored in SQLServer for use when the user returns to a protected resource and is redirected to the login page by forms authentication, assuming the corresponding cookie is no longer in existence.
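To give a flavour of the bespoke code involved, here is a minimal sketch of such a check. The connection string, the Users table and its columns are illustrative assumptions (and System.Data.SqlClient is assumed imported), not the article's actual schema:

 ' Sketch: validate submitted credentials against a hypothetical Users table
 ' that stores SHA1 password hashes rather than clear text passwords.
 Dim hash As String = _
   FormsAuthentication.HashPasswordForStoringInConfigFile(UserPass.Value, "SHA1")
 Dim conn As New SqlConnection("server=localhost;uid=sa;pwd=;database=MyApp")
 Dim cmd As New SqlCommand( _
   "SELECT COUNT(*) FROM Users WHERE UserName = @name AND PasswordHash = @hash", conn)
 cmd.Parameters.Add("@name", UserName.Value)
 cmd.Parameters.Add("@hash", hash)
 conn.Open()
 Dim found As Integer = CInt(cmd.ExecuteScalar())
 conn.Close()
 If found > 0 Then
   FormsAuthentication.RedirectFromLoginPage(UserName.Value, False)
 Else
   Msg.Text = "credentials not valid"
 End If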

This raises a further feature - we would want to give all users access to the registration page so that they may register but other resources should be protected. Additionally, there may be a third level of security, for example an admin page to list all users registered with the system. In such a situation we can have multiple system.web sections in our web.config file to support the different levels of authorization, as follows:

 <configuration>
   <system.web>
     <authentication mode="Forms">
       <forms name=".AUTHCOOKIE" loginUrl="login.aspx" protection="All" />
     </authentication>
     <machineKey validationKey="AutoGenerate" decryptionKey="AutoGenerate" validation="SHA1" />
     <authorization>
       <deny users="?" />
     </authorization>
   </system.web>
 
   <location path="register.aspx">
     <system.web>
       <authorization>
         <allow users="*,?" />
       </authorization>
     </system.web>
   </location>
 
   <location path="admin.aspx">
     <system.web>
       <authorization>
         <allow users="admin " />
         <deny users="*" />
       </authorization>
     </system.web>
   </location>
 </configuration> 

Thus only the admin user can access admin.aspx, whilst all users can access register.aspx so if they don't have an account already they can register for one. Any other resource request will cause redirection to login.aspx, if a valid authentication cookie by the name of .AUTHCOOKIE isn't detected within the request. On the login page you would provide a link to register.aspx for users who require the facility.

Alternatively you can have multiple web.config files, with that for a sub-directory overriding that for the application as a whole, an approach that we shall implement later for completeness.

Finally, you may also perform forms authentication in ASP.Net against a Web Service, which we won’t consider any further as this could form an article in itself, and against Microsoft Passport. Passport uses standard web technologies such as SSL, cookies and Javascript and uses strong symmetric key encryption using Triple DES (3DES) to deliver a single sign in service where a user can register once and then has access to any passport enabled site.

Consideration of suitability

Forms based authentication is a flexible mechanism supporting a variety of techniques of various levels of security. Some of the available techniques may be secure enough for implementation if extended appropriately. Some of the techniques are more suited to our problem domain than others, as we’ll discuss shortly.

In terms of specific authorities:

Passport is most appropriately utilized where your site will be used in conjunction with other Passport enabled sites and where you do not wish to maintain your own user credentials data source. This is not the case in our chosen problem domain where Passport would both be overkill and inappropriate.

SQLServer would be the correct solution for the most common web site scenario where you have many users visiting a site where the majority of content is protected. There, an automated registration facility is the obvious solution, with a configuration as per the web.config file just introduced. In our chosen problem domain we have stated that we potentially have only a handful of user accounts accessing a small portion of the application functionality, and hence SQLServer is not necessarily the best solution, though it is perfectly viable.

Use of the credentials section of the forms element of web.config or a simple text/ XML file would seem most suitable for this problem domain. The extra security and simplicity of implementation offered by the former makes this the method of choice.

Authorization via ASP.Net

As discussed earlier this is the second stage of gaining access to a site: determining whether an authenticated user should be permitted access to a requested resource.

File authorization utilizes Windows security services access control lists (ACLs), checked against the authorized identity. Further, ASP.Net allows refinement based on the URL requested, as you may have recognized in the examples already introduced, as well as the HTTP request method attempted via the verb attribute, valid values of which are: GET, POST, HEAD or DEBUG. I can't think of many occasions on which you'd want to use this feature but you may have other ideas! You may also refer to Windows roles as well as named users.

A few examples to clarify:

 <authorization>
   <allow users="Chris" />
   <deny users="Chris" />
   <deny users="*" />
 </authorization> 

You might logically think this would deny all users access. In fact Chris still has access, as when ASP.Net finds a conflict such as this it will use the earlier declaration.

 <authorization>
   <allow roles="Administrators" />
   <deny users="*" />
 </authorization>
 
 <authorization>
   <allow verbs="GET, POST" />
 </authorization> 

Impersonation

Impersonation is the concept whereby an application executes under the context of the identity of the client that is accessing the application. This is achieved by using the access token provided by IIS. You may well know that by default the ASPNET account is used to access ASP.Net resources via the Aspnet_wp.exe process. This, by necessity, has a little more power than the standard guest account for Internet access, IUSR, but not much more. Sometimes you may wish to use a more powerful account to access system resources that your application needs. This may be achieved via impersonation as follows:

 <system.web>
   <identity impersonate="true" />
 </system.web> 

or you may specify a particular account:

 <system.web>
   <identity impersonate="true" userName="domain\sullyc" password="password" />
 </system.web> 

Of course you will need to provide the involved accounts with the necessary access rights to achieve the goals of the application. Note also that if you don’t remove IUSR from the ACLs then this is the account that will be used – this is unlikely to meet your needs as this is a less powerful account than ASPNET.

ASP.Net will only impersonate during the request handler - tasks such as executing the compiler and reading configuration data occur as the default process account. This is configurable via the <processModel> section of your system configuration file (machine.config). Care should be taken however not to use an inappropriate (too powerful) account which exposes your system to the threat of attacks.
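If you are experimenting with these settings, a quick way to confirm which account your code actually executes under is to write it out from a test page – a minimal sketch:

 ' Writes out the Windows account the current request is executing as; this
 ' reflects any <identity> impersonation settings in web.config.
 Dim identityName As String = _
   System.Security.Principal.WindowsIdentity.GetCurrent().Name
 Response.Write("Executing as: " & Server.HtmlEncode(identityName))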

The situation is further complicated by extra features available in IIS6 … but we’ll leave those for another article perhaps as the situation is complex enough!

Let’s move onto developing a login solution for our chosen problem domain.

Our Chosen Authentication Method – how secure is it?

We've chosen forms based authentication utilizing the web.config file as our authority. How secure is the mechanism involved? Let's consider this by examining the process in a little more detail. As a reminder, our application scenario is one of a web site where we've put content which we want to enable restricted access to in a sub-directory named secure. We have configured our web.config files to restrict access to the secure sub-directory, as described above. We deny access to the anonymous users (i.e. unauthenticated users) to the secure sub-directory:

 <authorization>
   <deny users="?" />
 </authorization> 

If someone requests a file in the secure sub-directory then ASP.Net URL authentication kicks in - ASP.Net checks to see if a valid authentication cookie is attached to the request. If the cookie exists, ASP.Net decrypts it, validates it to ensure it hasn't been tampered with, and extracts identity information that it assigns to the current request. Encryption and validation can be turned off but are enabled by default. If the cookie doesn't exist, ASP.Net redirects the request to the login page. If the login is successful, the authentication cookie is created and passed to the user’s browser. This can be configured to be a permanent cookie or a session-based cookie. Possibly slightly more secure is a session-based cookie where the cookie is destroyed when the user leaves the application or the session times out. This prevents someone else accessing the application from the user’s client machine without having to login.
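On the subject of session-based cookies, a logout facility is easily provided. A minimal sketch follows; the handler name is an assumption, not part of the code download:

 ' Sketch of a logout handler: removes the forms authentication cookie so the
 ' next request to a protected resource is redirected back to the login page.
 Sub btnLogout_Click(ByVal sender As Object, ByVal e As EventArgs)
   FormsAuthentication.SignOut()
   Response.Redirect("login.aspx")
 End Sub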

Given the above scenario we have two security issues for further consideration:

    1. How secure is the cookie based access? Note above that encryption and validation are used by default. How secure are these in reality?

      Validation works exactly the same for authentication cookies as it does for view state: the <machineKey> element's validationKey is appended to the cookie, the resulting value is hashed, and the hash is appended to the cookie. When the cookie is returned in a request, ASP.Net verifies that it wasn't tampered with by rehashing the cookie and comparing the new hash to the one accompanying the cookie. Encryption works by encrypting the cookie, hash value and all with <machineKey>'s decryptionKey attribute. Validation consumes less CPU time than encryption and prevents tampering. It does not, however, prevent someone from intercepting an authentication cookie and reading its contents.

      Encrypted cookies can't be read or altered, but they can be stolen and used illicitly. Time-outs are the only protection a cookie offers against replay attacks, and they apply to session cookies only. The most reliable way to prevent someone from spoofing your site with a stolen authentication cookie is to use an encrypted communications link (HTTPS). Talking of which, this is one situation when you might want to turn off both encryption and validation. There is little point encrypting the communication again if you are already using HTTPS.

      Whilst on the subject of cookies, remember also that cookie support can be turned off via the client browser. This should also be borne in mind when designing your application.

    2. How secure is the logging on procedure to a web form? Does it use clear text username and password transmission that could be susceptible to observation, capture and subsequent misuse?

The answer is yes. Thus if you want a secure solution but don't want the overhead of encrypting communications to all parts of your site, consider at least submitting user names and passwords over HTTPS, assuming your web hosting service provides this.

To reiterate, the forms security model allows us to configure keys to use for encryption and decryption of forms authentication cookie data. Here we have a problem - this only encrypts the cookie data - the initial login screen data, i.e. email / password is not encrypted. We are using standard HTTP transmitting data in clear text which is susceptible to interception. The only way around this is to go to HTTPS and a secure communication channel.
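As an illustrative sketch only – assuming your host has an SSL certificate configured in IIS – the login page itself could refuse to run over plain HTTP by redirecting to its HTTPS equivalent:

 ' Sketch: force the login page onto HTTPS so credentials are not submitted
 ' in clear text. Assumes an SSL certificate is configured for the site.
 Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
   If Not Request.IsSecureConnection Then
     Response.Redirect("https://" & Request.Url.Host & Request.RawUrl)
   End If
 End Sub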

Which perhaps begs the question – what is the point of encrypting the cookie data if our access is susceptible anyway if we are using an unsecured communication channel? Well, if we enable cookie authentication when we first login then subsequent interaction with the server will be more secure. After that initial login a malicious attacker could not easily gain our login details and gain access to the site simply by examining the contents of the packets of information passed to and from the web server. However, note the earlier comments on cookie theft. It is important to understand these concepts and the impact our decisions have on the overall security of our application data.

It is perhaps unsurprising given the above that for the most secure applications:

  1. A secure HTTPS channel is used whenever dealing with username/ password/ related data.
  2. Cookies are not exclusively relied upon: often, though recall of certain information is cookie-based, important transactions still require authorization via an encrypted password or number.

It is up to the application architect/ programmer to decide whether this level of security is appropriate to their system.

Finally, before we actually come up with some code remember that forms based security secures only ASP.Net resources. It doesn’t protect HTML files, for example. Just because you have secured a directory using web.config / ASP.Net doesn’t mean you have secured all files in that directory. To do this you could look at features available via IIS.

The 'Application'

Finally to the code and making our ASP.Net application as secure as possible using the facilities ASP.Net provides. We take the scenario described above, where we have a secure sub-directory whose files we wish to protect. We anticipate there will only be a handful of users who will need access to the directory, and hence this is a suitable problem domain to be addressed with a web.config based authority solution, as decided earlier.

Starting with our web.config file: we could secure the sub-directory via the location element, as described above, but just to demonstrate the alternative approach using two web.config files, here is the web.config at the root level:

 <configuration>
   <system.web>
     <authentication mode="Forms">
       <forms name=".AUTHCOOKIE" loginUrl="login_credentials.aspx" protection="All">
         <credentials passwordFormat="Clear">
           <user name="chris" password="password" />
         </credentials>
       </forms>
     </authentication>
     <machineKey validationKey="AutoGenerate" decryptionKey="AutoGenerate" validation="SHA1" />
     <authorization>
       <allow users="*" />
     </authorization>
   </system.web>
 </configuration> 

You can see that this sets up forms based security enabling validation and encryption and specifies a credentials list of one user, currently in Cleartext format but shortly we'll see how to encrypt the password via SHA1. You'll also see that this file doesn’t actually restrict user access at all so URL based authentication will not be used at the root level of our application. However, if we extend the configuration for the secure sub-directory via an additional web.config file:

 <configuration>
   <system.web>
     <authorization>
       <deny users="?" />
     </authorization>
   </system.web>
 </configuration> 

Then if a user attempts to access an ASP.Net resource in secure they will be dealt with according to the combination of directives in this web.config file and those inherited from the parent web.config file (and machine.config, for that matter).

Onto the login file: you will need form fields to allow entry of username and password data. Note that security will be further improved by enforcing minimum standards on passwords (e.g. length), which can be achieved by validation controls. There is only minimal validation in the example. Note that there is no facility to request a ‘persistent cookie’ as this provides a minor security risk. It is up to you to decide whether a permanent cookie is acceptable in your application domain.

Then in the login file, login_credentials.aspx, after allowing the user to enter username and password data, the sub executed on the server when the submit button is clicked validates the entered data against the web.config credentials data, achieved simply as follows:

 If FormsAuthentication.Authenticate(UserName.Value, UserPass.Value) Then 
   FormsAuthentication.RedirectFromLoginPage(UserName.Value, False) 
 Else 
   Msg.Text = "credentials not valid" 
 End If 

Could it be any simpler? The FormsAuthentication object knows what authority it needs to validate against as this has been specified in the web.config file. If the user details match, the code proceeds to redirect back to the secured resource and also sets the cookie for the user session based on the user name entered. The parameter 'false' indicates that the cookie should not be permanently stored on the client machine. Its lifetime will be the duration of the user session by default. This can be altered if so desired.

Back to web.config to improve the security. The details are being stored unencrypted – we can encrypt them with the aforementioned HashPasswordForStoringInConfigFile of the FormsAuthentication class, achieved simply as follows:

 Private Function encode(ByVal cleartext As String) As String
   Return FormsAuthentication.HashPasswordForStoringInConfigFile(cleartext, "SHA1")
 End Function 

This is the key function of the encode.aspx file provided with the code download, which accepts a text string (the original password – ‘password’ in this case) and outputs a SHA1 encoded version care of the above function.

Thus, our new improved configuration section of our root web.config file becomes:

 <credentials passwordFormat="SHA1">
   <user name="chris" password="5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8" />
 </credentials> 

To summarize the involved files:

  • Root/web.config – root web.config file
  • Root/webform1.aspx – test page
  • Root/login_credentials.aspx – login page
  • Root/encode.aspx – form to SHA1 encode a password for <credentials>
  • Root/secure/web.config – directives to override security for this sub-directory to deny anonymous access
  • Root/secure/webform1.aspx – test page

Conclusions

We’ve looked at the new security features of ASP.Net focusing particularly on an application scenario where forms based authentication uses the credentials section of web.config, but presenting this in the context of wider security issues.

In summary you should consider forms based authentication when:

  • User names and passwords are stored somewhere other than Windows Accounts (it is possible to use forms authentication with Windows Accounts but in this case Integrated Windows authentication may well be the best choice).
  • You are deploying your application over the Internet and hence you need to support all browsers and client operating systems.
  • You want to provide your own user interface form as a logon page.

You should not consider forms based authentication when:

  • You are deploying an application on a corporate intranet and can take advantage of the more secure Integrated Windows authentication.
  • You are unable to perform programmatic access to verify the user name and password.

Further security considerations for forms based authentication:

  • If users are submitting passwords via the logon page, you can (should?) secure the channel using SSL to prevent passwords from being easily obtained by hackers.
  • If you are using cookies to maintain the identity of the user between requests, you should be aware of the potential security risk of a hacker "stealing" the user's cookie using a network-monitoring program. To ensure the site is completely secure when using cookies you must use SSL for all communications with the site. This will be an impractical restriction for most sites due to the significant performance overhead. A compromise available within ASP.Net is to have the server regenerate cookies at timed intervals. This policy of cookie expiration is designed to prevent another user from accessing the site with a stolen cookie.

Finally, different authorities are appropriate for forms-based authentication in different problem domains. For our considered scenario, where the number of users was limited as we were only protecting a specific administrative resource, credentials or XML file based authorities are adequate. For a scenario where all site information is ‘protected’, a database authority is most likely to be the optimal solution.

References

ASP.Net: Tips, Tutorial and Code
Scott Mitchell et al.
Sams

.Net SDK documentation

Various online articles, in particular:

ASP.Net Security: An Introductory Guide to Building and Deploying More Secure Sites with ASP.Net and IIS -- MSDN Magazine, April 2002
http://msdn.microsoft.com/msdnmag/issues/02/04/ASPSec/default.aspx
An excellent and detailed introduction to IIS and ASP.Net security issues.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/authaspdotnet.asp
Authentication in ASP.Net: .Net Security Guidance

You may download the code here.

Page and Data Caching in .Net

15. January 2003 15:17 by Chris in dev  //  Tags:   //   Comments (0)

Note that this article was first published on 02/01/2003. The original article is available on DotNetJohn, where the code is also available for download and execution.

 

Introduction

In this article we’re going to take a look at the features available to the ASP.NET programmer that enable performance improvement via the use of caching. Caching is the keeping of frequently used data in memory for ready access by your ASP.NET application. As such, caching is a trade-off between the resources needed to obtain the data and the resources needed to store it. You should be aware of this trade-off: there is little point caching data that is going to be requested infrequently, as this is simply wasteful of memory and may have a negative impact on your system performance. On the other hand, if there is data that is required every time a user visits the home page of your application and this data is only going to change once a day, then there are big resource savings to be made by storing this data in memory rather than retrieving it every time a user hits that homepage – even considering that the DBMS is likely to be doing its own caching. Typically you will want to minimise requests to your data store as, again typically, these will be the most resource-hungry operations associated with your application.

In ASP.NET there are two areas where caching techniques arise:

  • Caching of rendered pages, page fragments or WebService output: termed ‘output caching’. Output caching can be implemented either declaratively or programmatically.
  • Caching of data / data objects programmatically via the cache class.

We'll return to the Cache class later in the article, but let’s focus on Output Caching to start with and Page Output Caching in particular.

You can either declaratively use the Output Caching support available to web forms/pages, page fragments and WebServices as part of their implementation or you can cache programmatically using the HttpCachePolicy class exposed through the HttpResponse.Cache property available within the .NET Framework. I'll not look at WebServices options in any detail here, only mentioning that the WebMethod attribute that is assigned to methods to enable them as Webservices has a CacheDuration attribute which the programmer may specify.
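For completeness, a minimal sketch of what that looks like on a Web Service method; the method itself is illustrative and assumes a class derived from System.Web.Services.WebService:

 ' Sketch: the output of this Web Service method is cached for 60 seconds via
 ' the CacheDuration property of the WebMethod attribute.
 <WebMethod(CacheDuration:=60)> _
 Public Function GetServerTime() As String
   Return DateTime.Now.ToString()
 End Function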

Page Output Caching

Let’s consider a base example and then examine in a little detail the additional parameters available to us programmers. To minimally enable caching for a web forms page (or user controls) you can use either of the following:

1. Declarative specification via the @OutputCache directive e.g.:

 <%@ OutputCache Duration="120" VaryByParam="none" %>  
 

 

2. Programmatic specification via the Cache property of the HttpResponse class, e.g.:

 Response.Cache.SetExpires(DateTime.Now.AddMinutes(2))
 Response.Cache.SetCacheability(HttpCacheability.Public) 

 

These are equivalent and will cache the page for 2 minutes. What does this mean exactly? When the document is initially requested the page is cached. Until the specified expiration all page requests for that page will be served from the cache. On cache expiration the page is removed from the cache. On the next request the page will be recompiled, and again cached.

In fact @OutputCache is a higher-level wrapper around the HttpCachePolicy class exposed via the HttpResponse class so rather than just being equivalent they ultimately resolve to exactly the same IL code.

Looking at the declarative example and explaining VaryByParam="none": HTTP supports two methods of passing parameters between pages: POST and GET. GET requests are characterised by the use of the query string to pass parameters, e.g. default.aspx?id=1&name=chris, whereas POST indicates that the parameters were passed in the body of the HTTP request. In the example above, caching based on such parameters is disabled. To enable it, you would set VaryByParam to be ‘name’, for example – or whichever parameters you wish to cache on the basis of. This would cause the creation of different cache entries for different parameter values. For example, the output of default.aspx?id=2&name=maria would also be cached. Note that the VaryByParam attribute is mandatory.
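For example, to cache separate copies of the page for each combination of the id and name parameters mentioned above, you might use something like:

 <%@ OutputCache Duration="120" VaryByParam="id;name" %>

Multiple parameter names are separated by semicolons, as we shall see in the full directive specification below.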

Returning to the programmatic example and considering when you would choose this second method over the first. Firstly, as it’s programmatic, you would use this option when the cache settings needed to be set dynamically. Secondly, you have more flexibility in option setting with HttpCachePolicy as exposed by the HttpResponse.cache property.

You may be wondering exactly what

 Response.Cache.SetCacheability(HttpCacheability.Public) 

achieves. This sets the cache control HTTP header - here to public - to specify that the response is cacheable by clients and shared (proxy) caches - basically everybody can cache it. The other options are nocache, private and server.
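For example, you might instead choose one of those other options:

 ' Prevent caching of a page containing sensitive data:
 Response.Cache.SetCacheability(HttpCacheability.NoCache)
 ' Or allow caching only on the requesting client, not on shared proxy caches:
 Response.Cache.SetCacheability(HttpCacheability.Private)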

We’ll return to Response.Cache after looking at the directive option in more detail.

The @OutputCache Directive

First an example based on what we've seen thus far: output caching based on querystring parameters:

Note this example requires connectivity to a standard SQLServer installation, in particular the Northwind sample database. You may need to change the string constant strConn to an appropriate connection string for your system for the sample code presented in this article to work. If you have no easy access to SQLServer, you could load some data in from an XML file, or simply pre-populate a datalist (for example) and bind the datagrid to this data structure.

output_caching_directive_example.aspx

 <%@ OutputCache Duration="30" VaryByParam="number" %>
 <%@ Import Namespace="System.Data" %>
 <%@ Import Namespace="System.Data.SqlClient" %>
 
 <html>
 <head></head>
 <body>
 
 <a href="output_caching_directive_example.aspx?number=1">1</a>-
 <a href="output_caching_directive_example.aspx?number=2">2</a>-
 <a href="output_caching_directive_example.aspx?number=3">3</a>-
 <a href="output_caching_directive_example.aspx?number=4">4</a>-
 <a href="output_caching_directive_example.aspx?number=5">5</a>-
 <a href="output_caching_directive_example.aspx?number=6">6</a>-
 <a href="output_caching_directive_example.aspx?number=7">7</a>-
 <a href="output_caching_directive_example.aspx?number=8">8</a>-
 <a href="output_caching_directive_example.aspx?number=9">9</a>
 
 <p>
 <asp:Label id="lblTimestamp" runat="server" maintainstate="false" />
 <p>
 <asp:DataGrid id="dgProducts" runat="server" maintainstate="false" />
 
 </body>
 </html>
 

 <script language="vb" runat="server">
 
 const strConn = "server=localhost;uid=sa;pwd=;database=Northwind"
 
 Sub Page_Load(sender as Object, e As EventArgs)
 
   If Not Request.QueryString("number") = Nothing Then
     lblTimestamp.Text = DateTime.Now.TimeOfDay.ToString()
 
     dim SqlConn as new SqlConnection(strConn)
     dim SqlCmd as new SqlCommand("SELECT TOP " _
       & Request.QueryString("number") & _
       " * FROM Products", SqlConn)
     SqlConn.Open()
 
     dgProducts.DataSource = SqlCmd.ExecuteReader(CommandBehavior.CloseConnection)
     Page.DataBind()
 
   End If
 End Sub
 
 </script> 

Thus, if you click through some of the links to the parameterised pages and then return to them, you will see the timestamp remains the same for each parameter setting until the 30 seconds have elapsed, when the data is loaded again. Further, caching is performed per parameter value, as indicated by the different timestamps.

The full specification of the OutputCache directive is:

 <%@ OutputCache Duration="#ofseconds" 
                   Location="Any | Client | Downstream | Server | None" 
                   VaryByControl="controlname" 
                   VaryByCustom="browser | customstring" 
                   VaryByHeader="headers" 
                   VaryByParam="parametername" %> 

Examining these attributes in turn:

Duration
This is the time, in seconds, that the page or user control is cached. Setting this attribute on a page or user control establishes an expiration policy for HTTP responses from the object and will automatically cache the page or user control output. Note that this attribute is required. If you do not include it, a parser error occurs.

Location
This allows control of from where the client receives the cached document and should be one of the OutputCacheLocation enumeration values. The default is Any. This attribute is not supported for @OutputCache directives included in user controls. The enumeration values are:
Any: the output cache can be located on the browser client (where the request originated), on a proxy server (or any other server) participating in the request, or on the server where the request was processed.
Client: the output cache is located on the browser client where the request originated.
Downstream: the output cache can be stored in any HTTP 1.1 cache-capable devices other than the origin server. This includes proxy servers and the client that made the request.
None: the output cache is disabled for the requested page.
Server: the output cache is located on the Web server where the request was processed.

VaryByControl
A semicolon-separated list of strings used to vary the output cache. These strings represent fully qualified names of properties on a user control. When this attribute is used for a user control, the user control output is varied to the cache for each specified user control property. Note that this attribute is required in a user control @OutputCache directive unless you have included a VaryByParam attribute. This attribute is not supported for @OutputCache directives in ASP.NET pages.

VaryByCustom
Any text that represents custom output caching requirements. If this attribute is given a value of browser, the cache is varied by browser name and major version information. If a custom string is entered, you must override the HttpApplication.GetVaryByCustomString method in your application's Global.asax file. For example, if you wanted to vary caching by platform you would set the custom string to be ‘Platform’ and override GetVaryByCustomString to return the platform used by the requester via HttpContext.Request.Browser.Platform.
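A sketch of what that override might look like within Global.asax, assuming the ‘Platform’ custom string from the example above:

 <script language="vb" runat="server">
 
 ' Sketch: supports VaryByCustom="Platform" by returning the requesting
 ' browser's platform; other custom strings fall through to the base class.
 Public Overrides Function GetVaryByCustomString( _
     ByVal context As HttpContext, ByVal custom As String) As String
   If custom = "Platform" Then
     Return context.Request.Browser.Platform
   End If
   Return MyBase.GetVaryByCustomString(context, custom)
 End Function
 
 </script>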

VaryByHeader
A semicolon-separated list of HTTP headers used to vary the output cache. When this attribute is set to multiple headers, the output cache contains a different version of the requested document for each specified header. Example headers you might use are: Accept-Charset, Accept-Language and User-Agent but I suggest you consider the full list of header options and consider which might be suitable options for your particular application. Note that setting the VaryByHeader attribute enables caching items in all HTTP/1.1 caches, not just the ASP.NET cache. This attribute is not supported for @OutputCache directives in user controls.
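For example, to keep a separate cached version of a page for each language requested by clients, you might use something like:

 <%@ OutputCache Duration="60" VaryByParam="none" VaryByHeader="Accept-Language" %>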

VaryByParam
As already introduced this is a semicolon-separated list of strings used to vary the output cache. By default, these strings correspond to a query string value sent with GET method attributes, or a parameter sent using the POST method. When this attribute is set to multiple parameters, the output cache contains a different version of the requested document for each specified parameter. Possible values include none, *, and any valid query string or POST parameter name. Note that this attribute is required when you output cache ASP.NET pages. It is required for user controls as well unless you have included a VaryByControl attribute in the control's @OutputCache directive. A parser error occurs if you fail to include it. If you do not want to specify a parameter to vary cached content, set the value to none. If you want to vary the output cache by all parameter values, set the attribute to *.

Returning now to the programmatic alternative for Page Output Caching:

Response.Cache

As stated earlier @OutputCache is a higher-level wrapper around the HttpCachePolicy class exposed via the HttpResponse class. Thus all the functionality of the last section is also available via HttpResponse.Cache. For example, our previous code example can be translated as follows to deliver the same functionality:

 

output_caching_programmatic_example.aspx

 <%@ Import Namespace="System.Data" %>
 <%@ Import Namespace="System.Data.SqlClient" %>
 
 <html>
 <head></head>
 <body>
 
 <a href="output_caching_programmatic_example.aspx?number=1">1</a>-
 <a href="output_caching_programmatic_example.aspx?number=2">2</a>-
 <a href="output_caching_programmatic_example.aspx?number=3">3</a>-
 <a href="output_caching_programmatic_example.aspx?number=4">4</a>-
 <a href="output_caching_programmatic_example.aspx?number=5">5</a>-
 <a href="output_caching_programmatic_example.aspx?number=6">6</a>-
 <a href="output_caching_programmatic_example.aspx?number=7">7</a>-
 <a href="output_caching_programmatic_example.aspx?number=8">8</a>-
 <a href="output_caching_programmatic_example.aspx?number=9">9</a>
 
 <p>
 <asp:Label id="lblTimestamp" runat="server" maintainstate="false" />
 
 <p>
 
 <asp:DataGrid id="dgProducts" runat="server" maintainstate="true" />
 
 </body>
 </html>
 

 <script language="vb" runat="server">
 
 const strConn = "server=localhost;uid=sa;pwd=;database=Northwind"
 
 Sub Page_Load(sender as Object, e As EventArgs)
 
   Response.Cache.SetExpires(dateTime.Now.AddSeconds(30))
   Response.Cache.SetCacheability(HttpCacheability.Public)
   Response.Cache.VaryByParams("number")=true
 
   If Not Request.QueryString("number") = Nothing Then
 
     lblTimestamp.Text = DateTime.Now.TimeOfDay.ToString()
 
     dim SqlConn as new SqlConnection(strConn)
     dim SqlCmd as new SqlCommand("SELECT TOP " _
       & Request.QueryString("number") & " * FROM Products", SqlConn)
     SqlConn.Open()
 
     dgProducts.DataSource = SqlCmd.ExecuteReader(CommandBehavior.CloseConnection)
     Page.DataBind()
 
   End If
 End Sub
 
 </script> 

The three lines of importance are:

Response.Cache.SetExpires(dateTime.Now.AddSeconds(30))
Response.Cache.SetCacheability(HttpCacheability.Public)
Response.Cache.VaryByParams("number")=true

It is only the third line you’ve not seen before. This is equivalent to VaryByParam="number" in the directive example. Thus you can see that the various options of the OutputCache directive are equivalent to properties and methods exposed by Response.Cache. Apart from the method of access, the pertinent information is, unsurprisingly, very similar to that presented above for the directive version.

Thus, in addition to the VaryByParams property there is a VaryByHeaders property as well as a SetVaryByCustom method. If you are interested in the extra functionality exposed via these and associated classes I would suggest you peruse the relevant sections of the .NET SDK documentation.
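As a brief sketch of those programmatic equivalents (the ‘Platform’ custom string assumes a GetVaryByCustomString override in Global.asax, as described earlier for the directive version):

 ' Vary the cached output by the Accept-Language request header:
 Response.Cache.VaryByHeaders("Accept-Language") = True
 ' Vary by a custom string handled by GetVaryByCustomString in Global.asax:
 Response.Cache.SetVaryByCustom("Platform")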

Fragment Caching

Fragment caching is really a minor variation of page caching and almost all of what we’ve described already is relevant. The ‘fragment’ referred to is actually one or more user controls included on a parent web form. Each user control can have different cache durations. You simply specify the @OutputCache for the user controls and they will be cached as per those specifications. Note that any caching in the parent web form overrides any specified in the included user controls. So, for example, if the page is set to 30 secs and the user control to 10 the user control cache will not be refreshed for 30 secs.

It should be noted that of the standard options only the VaryByParam attribute is valid for controlling caching of controls. An additional attribute is available within user controls: VaryByControl, as introduced above, allowing multiple representations of a user control dependent on one or more of its exposed properties. So, extending our example above, if we implemented a control that exposed the SQL query used to generate the datareader which is bound to the datagrid we could cache on the basis of the property which is the SQL string. Thus we can create powerful controls with effective caching of the data presented.
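A minimal sketch of such a user control follows; the file name, property name and data access details are illustrative assumptions rather than code from the download:

 <%@ Control Language="vb" %>
 <%@ OutputCache Duration="60" VaryByControl="Query" %>
 <%@ Import Namespace="System.Data" %>
 <%@ Import Namespace="System.Data.SqlClient" %>
 
 <asp:DataGrid id="dgResults" runat="server" />
 
 <script language="vb" runat="server">
 
 ' The cached output of this control is varied per distinct value of Query.
 Private _query As String = "SELECT TOP 10 * FROM Products"
 
 Public Property Query() As String
   Get
     Return _query
   End Get
   Set(ByVal Value As String)
     _query = Value
   End Set
 End Property
 
 Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
   Dim conn As New SqlConnection("server=localhost;uid=sa;pwd=;database=Northwind")
   Dim cmd As New SqlCommand(_query, conn)
   conn.Open()
   dgResults.DataSource = cmd.ExecuteReader(CommandBehavior.CloseConnection)
   dgResults.DataBind()
 End Sub
 
 </script>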

Programmatic Caching: using the Cache Class to Cache Data

ASP.NET output caching is a great way to increase performance in your web applications. However, it does not give you control over caching data or objects that can be shared, e.g. sharing a dataset from page to page. The Cache class, part of the System.Web.Caching namespace, enables you to implement application-wide caching of objects, rather than page-wide as with the HttpCachePolicy class. Note that the lifetime of the cache is equivalent to the lifetime of the application: if the IIS web application is restarted, current cache contents will be lost.

The public properties and methods of the cache class are:

Public Properties

Count: gets the number of items stored in the cache.

Item: gets or sets the cache item at the specified key.

Public Methods

Add: adds the specified item to the Cache object with dependencies, expiration and priority policies, and a delegate you can use to notify your application when the inserted item is removed from the Cache.

Equals: determines whether two object instances are equal.

Get: retrieves the specified item from the Cache object.

GetEnumerator: retrieves a dictionary enumerator used to iterate through the key settings and their values contained in the cache.

GetHashCode: serves as a hash function for a particular type, suitable for use in hashing algorithms and data structures like a hash table.

GetType: gets the type of the current instance.

Insert: inserts an item into the Cache object. Use one of the versions of this method to overwrite an existing Cache item with the same key parameter.

Remove: removes the specified item from the application's Cache object.

ToString: returns a String that represents the current Object.

We'll now examine some of the above to varying levels of detail, starting with the most complex, the insert method:

Insert

Data is inserted into the cache with the Insert method of the cache object. Cache.Insert has 4 overloaded methods with the following signatures:

Overloads Public Sub Insert(String, Object)

Inserts an item into the Cache object with a cache key to reference its location and using default values provided by the CacheItemPriority enumeration.

Overloads Public Sub Insert(String, Object, CacheDependency)

Inserts an object into the Cache that has file or key dependencies.

Overloads Public Sub Insert(String, Object, CacheDependency, DateTime, TimeSpan)

Inserts an object into the Cache with dependencies and expiration policies.

Overloads Public Sub Insert(String, Object, CacheDependency, DateTime, TimeSpan, CacheItemPriority, CacheItemRemovedCallback)

Inserts an object into the Cache object with dependencies, expiration and priority policies, and a delegate you can use to notify your application when the inserted item is removed from the Cache.

Summary of parameters:

String – the name reference to the object to be cached
Object – the object to be cached
CacheDependency – file or cache key dependencies for the new item
DateTime – indicates absolute expiration
TimeSpan – sliding expiration: the object is removed if more than this timespan elapses after last access
CacheItemPriority – an enumeration that decides the order of item removal under heavy load
CacheItemPriorityDecay – an enumeration; items with a fast decay value are removed if not used frequently
CacheItemRemovedCallback – a delegate that is called when an item is removed from the cache

Picking out one of these options for further mention: CacheDependency. This attribute allows the validity of the cache to be dependent on a file or another cache item. If the target of such a dependency changes, this can be detected. Consider the following scenario: an application reads data from an XML file that is periodically updated. The application processes the data in the file and represents this via an aspx page. Further, the application caches that data and inserts a dependency on the file from which the data was read. The key aspect is that when the file is updated .NET recognizes the fact as it is monitoring this file. The programmer can interrogate the CacheDependency object to check for any updates and handle the situation accordingly in code.
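A sketch of that scenario in code, assuming a hypothetical products.xml data file (and that System.Data and System.Web.Caching are imported):

 ' Sketch: cache a DataSet read from products.xml and let the cache entry be
 ' invalidated automatically whenever that file changes on disk.
 Dim ds As DataSet = CType(Cache("ProductData"), DataSet)
 If ds Is Nothing Then
   ds = New DataSet()
   ds.ReadXml(Server.MapPath("products.xml"))
   Cache.Insert("ProductData", ds, _
     New CacheDependency(Server.MapPath("products.xml")))
 End If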

Remove

Other methods of the cache class expose a few less parameters than Insert. Cache.Remove expects a single parameter – the string reference value to the Cache object you want to remove.

 Cache.Remove("MyCacheItem") 

Get

You can either use the get method to obtain an item from the cache or use the item property. Further, as the item property is the default property, you do not have to explicitly request it. Thus the latter three lines below are equivalent:

 Cache.Insert("MyCacheItem", Object)
 Dim obj as object
 obj = Cache.Get("MyCacheItem")
 obj = Cache.Item("MyCacheItem")
 obj = Cache("MyCacheItem") 

GetEnumerator

Returns a dictionary (key/ value pair) enumerator enabling you to enumerate through the collection, adding and removing items as you do so if so inclined. You would use it as follows:

 dim myEnumerator as IDictionaryEnumerator
 myEnumerator = Cache.GetEnumerator()
 
 While (myEnumerator.MoveNext)
   Response.Write(myEnumerator.Key.ToString() & "<br>")
   'do other manipulation here if so desired
 End While 

An Example

To finish off with an example, we’ll cache a subset of the data from our earlier examples using a cache object.

cache_class_example.aspx

 <%@ Import Namespace="System.Data" %>
 <%@ Import Namespace="System.Data.SqlClient" %>
 
 <html>
 <head></head>
 <body>
 <asp:datagrid id="dgProducts" runat="server" maintainstate="false" />
 </body>
 </html>
 

 <script language="vb" runat="server">
 
 public sub Page_Load(sender as Object, e as EventArgs)
 
   const strConn = "server=localhost;uid=sa;pwd=;database=Northwind"
 
   dim dsProductsCached as object = Cache.Get("dsProductsCached")
 
   if dsProductsCached is nothing then
 
     Response.Write("Retrieved from database:")
     dim dsProducts as new DataSet()
     dim SqlConn as new SqlConnection(strConn)
     dim sdaProducts as new SqlDataAdapter("select Top 10 * from products", SqlConn)
     sdaProducts.Fill(dsProducts, "Products")
     dgProducts.DataSource = dsProducts.Tables("Products").DefaultView
 
     Cache.Insert("dsProductsCached", dsProducts, nothing, _
       DateTime.Now.AddMinutes(1), TimeSpan.Zero)
 
   else
 
     Response.Write("Cached:")
 
     dgProducts.DataSource = dsProductsCached
 
   end if
 
   DataBind()
 
 end sub </script> 

The important concept here is that if you view the above page, then within 1 minute save a copy of the page under a different name and view that copy, you will receive the cached version of the data. Thus the cached data is shared between pages/ visitors to your web site.

Wrapping matters up

A final few pointers for using caching, largely reinforcing concepts introduced earlier, with the latter two applying to the use of the cache class:

  • Don't cache everything: caching uses memory resources - could these be better utilized elsewhere? You need to trade-off whether to regenerate items, or store them in memory.
  • Prioritise items in the cache: if memory is becoming a limited system resource .NET may need to release items from the cache to free up memory. Each time you insert something into the cache, you can use the overloaded version of Insert that allows you to indicate how important it is that the item is cached to your application. This is achieved using one of the CacheItemPriority enumeration values.
  • Configure centrally. To maximize code clarity and ease of maintenance store your cache settings, and possibly also instantiate your cache objects, in a key location, for example within global.asax.

I hope this article has served as a reasonably complete consideration of the caching capabilities available within ASP.NET and you are now aware, if you were not before, of the considerable possible performance savings available reasonably simply via the provided functionality. If you have any comments on the article, particularly if you believe there are any errors that should be corrected let me know at chris.sully@cymru-web.net.

References

ASP.NET: Tips, Tutorial and Code
Scott Mitchell et al.
Sams

Professional ASP.NET
Sussman et al.
Wrox

.NET SDK documentation

Various online articles

You may run output_caching_directive_example.aspx here.
You may run output_caching_programmatic_example.aspx here.
You may run cache_class_example.aspx here.
You may download the code here.

Understanding How to Use XSL Transforms

2. January 2003 15:11 by Chris in dev  //  Tags:   //   Comments (0)

Note that this article was first published on 02/01/2003. The original article is available on DotNetJohn, where the code is also available for download and execution.

Original abstract: XSLT, XPATH and how to apply the concepts in .NET. Examines the concept of transformation and how an XSLT stylesheet defines a transformation by describing the relationship between an input tree and an output tree. Continues to look at the structure of a stylesheet, its main sub-components and introduces examples of what you might expect to see therein. Finally, the article examines how to utilise XSLT stylesheets in .NET.

Knowledge assumed: reasonable understanding of XML and ASP.NET / VB.NET.

Introduction

XML represents a widely accepted mechanism for representing data in a platform-neutral manner. XSLT is the XML based language that has been designed to allow transformation of XML into other structures and formats, such as HTML or other XML documents. XSLT is a template-based language that works in collaboration with the XPath language to transform XML documents.

Note that not all applications are suited to such an approach though there are benefits to be derived in all but the simplest problem domains. Suitable applications for implementation with XML/ XSLT are

  • those that require different views of the same data – hence delivering economies of scale to the developer/ organisation.
  • those where maintaining the distinction between data and User Interface elements (UI) is an important consideration – for example for facilitating productivity through specialisation within a development team.

 

.NET provides an XSLT processor which can take as input XML and XSLT documents and, via matching nodes with specified output templates, produce an output document with the desired structure and content.

I’ll examine the processor and the supporting classes as far as XSLT within .NET is concerned in the latter half of this article. First, XSLT:

XSLT

I’m only going to be able to scratch the surface of the XSLT language here but I shall attempt to highlight a few of the key concepts. It is important to remember that XSLT is a language in its own right and, further, it is one in transition, only having been around for a few years now. It’s also a little different in mechanism to most languages you may have previously come across. XSLT is basically a declarative pattern matching language, and as such requires a different mindset and a little getting used to. It’s (very!) vaguely like SQL or Prolog in this regard. Saying that, there are ways to ‘hook in’ more conventional procedural code.

If it’s not too late, now would be a good time to state what the acronym XSLT stands for: eXtensible Stylesheet Language: Transformations. XSLT grew from a bigger language called XSL – as it developed, the decision was made to split XSL into XSLT, which defines the structural transformations, and ‘the rest’, which is the formatting process of rendering the output. This may commonly be as pixels on a screen, for example, but there are several other alternatives. ‘The rest’ is still officially called XSL, though it has also come to be known as XSL-FO (XSL Formatting Objects). That’s the last time we’ll mention XSL-FO.

As XSLT developed it became apparent that there was overlap between the expression syntax in XSLT for selecting parts of a document (XPath) and the XPointer language being developed for linking one document to another. The sensible decision was made to define a single language to serve both purposes. XPath acts as a sub-language within an XSLT stylesheet. An XPath expression may be used for a variety of functions, but typically it is employed to identify parts of the input XML document for subsequent processing. I’ll make no significant further effort in the following discourse to emphasise the somewhat academic distinction between XPath and XSLT, the former being such an important and integral component of the latter.

A typical XSLT stylesheet consists of a sequence of template rules, defining how elements should be processed when encountered in the XML input file. In keeping with the declarative nature of the XSLT language, you specify what outputs should be produced by particular input patterns, as distinct from a procedural model where you define the sequence of tasks to be performed.

A tree model similar to the XML DOM is employed by both XSLT and XPath. The different item types found in an XML document are represented by different types of node in the tree. In XPath the root node is not an element: the root is the parent of the outermost element, and represents the document as a whole. The XSLT tree model can represent every well-formed XML document, as well as some documents that are not well formed according to the W3C definition.

An XPath tree is made up of 7 types of node, largely corresponding to the items in the XML source document: root, element, text, attribute, comment, processing instruction and namespace. Each node has metadata created from the source document, in accordance with the type of node under consideration. Considering the node types in a little more detail:

As already mentioned, the root node is a singular node that should not be confused with the document element – the outermost element that contains all the other elements in a valid XML document.

Element and attribute nodes correspond to the elements and attributes of your XML source, e.g.

 <product id='1' type='book'>XSLT for Beginners</product> 

product is an element and id and type are attributes.

Comment nodes represent comments in the XML source written between <!-- and -->. Similarly, processing instruction nodes represent processing instructions, written in the XML source between <? and ?> delimiters. Note, however, that the XML declaration commonly found on the first line of an XML document is only impersonating a processing instruction – it is not represented as a node in the tree.

A text node is a sequence of characters in the PCDATA (‘parsed character data’) part of an element.
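
To make the node types concrete, here is a small annotated fragment (illustrative only, not part of the downloadable code):

 <?xml version="1.0"?>                          <!-- XML declaration: not a node -->
 <!-- catalogue snapshot -->                    <!-- comment node -->
 <?render mode="draft"?>                        <!-- processing instruction node -->
 <product id="1">XSLT for Beginners</product>   <!-- element, attribute and text nodes -->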

The XML source tree is converted to a result tree via transformation by the XSLT processor using the instructions of the XSL stylesheet. Time for an example or two:

Most stylesheets contain a number of template rules of the form:

 <xsl:template match="/">
   <xsl:message>Started!</xsl:message>
   <html>
     . . . do other stuff . . .
   </html>
 </xsl:template> 

where the . . . do other stuff . . . might contain further template bodies to undertake further processing, e.g.

 <xsl:template match="/">
   <xsl:message>Started!</xsl:message>
   <html>
     <head></head>
     <body>
        <xsl:apply-templates/>
     </body>
   </html>
 </xsl:template> 

As previously stated, both the input document and output document are represented by a tree structure. So, the <body> element above is a literal element that is simply copied over from the stylesheet to the result tree.

<xsl:apply-templates/> means: select all the children of the current node in the source tree, find the matching template rule for each one in the stylesheet, and apply it. The results depend on the content of both the stylesheet and the XML document under consideration. In fact, if there is no template for the root node, the built-in template is invoked, which simply processes all the children of the root node.

Thus, the simplest way to process an XML document is to write a template rule for each kind of node that might be encountered, or at least for those we are interested in and want to process. This is an example of ‘push’ processing and is logically similar to Cascading Style Sheets (CSS), where one document defines the structure (HTML/XML) and the second (the stylesheet) defines the appearance applied to that structure. The output is conditional on the structure of the XML document.

Push processing works well when the output is to have the same structure and sequence of data as the input, and the input data is predictable.

Listing 1: simple XML file: books.xml

 <?xml version="1.0"?>
 <Library>
   <Book>
     <Title>XSLT Programmers Reference</Title>
     <Publisher>Wrox</Publisher>
     <Edition>2</Edition>
     <Authors>
       <Author>Kay, Michael</Author>
     </Authors>
     <PublishedDate>April 2001</PublishedDate>
     <ISBN>1-816005-06-7</ISBN>
   </Book>
   <Book>
     <Title>Dynamical systems and fractals</Title>
     <Publisher>Cambridge University Press</Publisher>
     <Authors>
       <Author>Becker, Karl-Heinz</Author>
       <Author>Dorfler, Michael</Author>
       <Author>David Sussman</Author>
     </Authors>
     <PublishedDate>1989</PublishedDate>
     <ISBN>0-521-36910-X</ISBN>
   </Book>
 </Library> 

Listing 2: Example of push processing of books.xml: example1.xslt

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
 <xsl:template match="Library">
   <html>
   <head></head>
   <body>
   <h1>Library</h1>
   <table border="1">
   <tr>
   <td><b>Title</b></td>
   <td><b>PublishedDate</b></td>
   <td><b>Publisher</b></td>
   </tr>
   <xsl:apply-templates/>
   </table>
   </body>
   </html>
 </xsl:template>
 
 <xsl:template match="Book">
   <tr>
     <xsl:apply-templates select="Title"/>
     <xsl:apply-templates select="PublishedDate"/>
     <xsl:apply-templates select="Publisher"/>
   </tr>
 </xsl:template>
 
 <xsl:template match="Title | PublishedDate | Publisher ">
   <td><xsl:value-of select="."/></td>
 </xsl:template>
 
 </xsl:stylesheet>

Note that in the Book template I’ve used <xsl:apply-templates select="…"/> rather than just <xsl:apply-templates/>. This is because there is data in the source XML in which we are not interested, and if we just let the built-in template rules do their stuff the additional data would be copied across to the output tree. I’ve already mentioned the existence of built-in template rules: when apply-templates is invoked to process a node and there is no matching template rule in the stylesheet, a built-in rule is used according to the type of the node. For example, for element nodes apply-templates is called on the child nodes, while for text and attribute nodes the text is copied over to the result tree. Try making the modification and viewing the results – a sketch of the modified template follows.
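
For illustration, this is the modification just described (a sketch only): with the select attributes removed, every child of Book is processed, so the built-in rules also copy the text of Edition, Authors and ISBN straight into the output row.

 <!-- Book template without select attributes: ALL children are processed, -->
 <!-- so text from elements we have no template for leaks into the row -->
 <xsl:template match="Book">
   <tr>
     <xsl:apply-templates/>
   </tr>
 </xsl:template>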

Using the select attribute of apply-templates is one solution – being more careful about which nodes to process (rather than just saying ‘process all children of the current node’). Another is to be more precise about how to process them (rather than just saying ‘choose the best-fit template rule’). This is termed ‘pull’ processing and is achieved using the value-of command:

<xsl:value-of select="price"/>

In this alternative pull model the stylesheet provides the structure and the document acts wholly as a data source. Thus a ‘pull’ version of the above example would be:

Listing 3: Example of pull processing of books.xml: example2.xslt

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
 <xsl:template match="/">
   <xsl:apply-templates/>
 </xsl:template>
 
 <xsl:template match="Library">
   <html>
   <head></head>
   <body>
   <h1>Library</h1>
   <table border="1">
   <tr>
   <td><b>Title</b></td>
   <td><b>PublishedDate</b></td>
   </tr>
     <xsl:apply-templates/>
   </table>
   </body>
   </html>
 </xsl:template>
 
 <xsl:template match="Book">
   <tr>
   <td><xsl:value-of select="Title"/></td>
   <td><xsl:value-of select="PublishedDate"/></td>
   </tr>
 </xsl:template>
 
 </xsl:stylesheet> 

These two examples are not hugely different, but it is important to understand the small distinctions between them for future situations where you encounter more complex source documents and stylesheets. You can rely on the structure of the XML source document using template matching (push), or explicitly select elements, pulling them into the output document.

Other commands/ points worthy of note at this juncture (there are hundreds more for you to explore) are:

<xsl:for-each>, which, as you might guess, performs explicit processing of each of the specified nodes in turn.

<xsl:call-template>, which invokes a specific template by name, rather than relying on pattern matching.

<xsl:apply-templates>, which can also take a mode attribute, allowing you to make multiple passes through the XML data representation (a brief combined sketch follows).
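
To make these concrete, here is a minimal sketch (not part of the downloadable code) that processes books.xml with xsl:for-each and a named template invoked via xsl:call-template; the template name bookSummary is purely illustrative:

 <xsl:template match="Library">
   <ul>
     <!-- explicit iteration over each Book child -->
     <xsl:for-each select="Book">
       <li>
         <!-- invoke a template by name rather than by pattern matching; -->
         <!-- the current node (this Book) is unchanged inside the named template -->
         <xsl:call-template name="bookSummary"/>
       </li>
     </xsl:for-each>
   </ul>
 </xsl:template>

 <xsl:template name="bookSummary">
   <xsl:value-of select="Title"/> (<xsl:value-of select="Publisher"/>)
 </xsl:template>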

I’ve briefly introduced the basic XSLT concepts and, in particular, the push and pull models. The pull model is characterised by a few large templates and use of the <xsl:value-of> element, so that the stylesheet controls the order of items in the output. In comparison, the push model tends towards smaller templates, with the output largely following the structure of the XML source document.

I mentioned earlier that XSLT is often thought of as a declarative language. However, it also contains the flow-control and looping instructions consistent with a procedural language. Typically, a push-model stylesheet emphasises the declarative aspects of the language, while a pull-model stylesheet emphasises the procedural aspects.

Note the use of the word ‘typical’ - most stylesheets will contain elements of both push and pull models. However, it is useful to keep the two models in mind as it can make your stylesheet development simpler.

There you have it – we’ve scratched the surface of the XSLT and XPath languages and I’ll leave you to explore further. Both Wrox and O’Reilly have several books on the subject that have been well reviewed … take your pick if you want to delve deeper. Let me know if you’d like me to write another article on XSLT, building on what I’ve introduced here.

Time to see what .NET has to offer.

XSLT in .NET

First point of note: you can perform XSLT processing on the server or on the client (assuming your client browser has an XSLT processor). The usual client vs. server arguments apply here: chiefly, you’d like to utilise the processing power of the client machine rather than tying up server resources, but can you be sure the client browser population is fit for purpose? If the answer is yes – the main requirement being that the XSLT you’ve written doesn’t generate errors in the client browser’s processor – then you can simply reference the XSLT stylesheet from the XML file and the specified transformation will be undertaken.
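
For the client-side route, the stylesheet reference is a simple xml-stylesheet processing instruction at the top of the XML document. A minimal sketch using the file names from Listings 1 and 3 above (text/xsl being the type the browsers of the day expect):

 <?xml version="1.0"?>
 <!-- ask the browser's XSLT processor to apply the stylesheet on load -->
 <?xml-stylesheet type="text/xsl" href="example2.xslt"?>
 <Library>
   <!-- ... rest of books.xml as per Listing 1 ... -->
 </Library>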

Returning to the server-side processing options: you won’t be surprised to learn that it is the System.Xml namespace where the classes and child namespaces relating to XSLT are found. The main ones are:

1. XPathDocument (System.Xml.XPath)
This provides the fastest option for XSLT transformation, offering read-only, cursor-style access to the XML data. It has no public properties or methods to remember, but it does have several constructors, accepting an XmlReader (such as XmlTextReader), a TextReader, a Stream or a string path to an XML document (a one-line example of the latter is sketched after this list).

2. XslTransform (System.Xml.Xsl)
This is the XSLT processor and hence the key class of interest to us. There are three main steps in its use: instantiate the XslTransform object, load the XSLT document into it, and then transform the required XML document (accessed via an XPathDocument object created for the purpose).

3. XsltArgumentList (System.Xml.Xsl)
This allows the provision of parameters to XslTransform. XSLT defines an xsl:param element that can be used to hold information passed into the stylesheet from the XSLT processor; XsltArgumentList is used to supply these values (an example appears after Listing 4).

Also of direct relevance are XmlDocument and XmlDataDocument, but I won’t be considering them further here … I’ll leave those to your own investigation.
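
As a side note on XPathDocument, the string-path constructor means the reader plumbing used in Listing 4 below can be collapsed to a single line. A minimal sketch, equivalent in effect:

 'Build the XPath DOM directly from a file path rather than via a chain of readers
 Dim doc as XPathDocument = new XPathDocument(Server.MapPath("books.xml"))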

I’m going to go straight to a simple example showing 1 and 2 above in action:

Listing 4: .NET example: Transform.aspx

 <%@ Page language="vb" trace="false" debug="false"%>
 <%@ Import Namespace="System.Xml" %>
 <%@ Import Namespace="System.Xml.Xsl" %>
 <%@ Import Namespace="System.Xml.XPath" %>
 <%@ Import Namespace="System.IO" %>
 
 <script language="VB" runat="server">
   public sub Page_Load(sender as Object, e as EventArgs)
     Dim xmlPath as string = Server.MapPath("books.xml")
     Dim xslPath as string = Server.MapPath("example2.xslt")
 
     Dim fs as FileStream = new FileStream(xmlPath,FileMode.Open, FileAccess.Read)
     Dim reader as StreamReader = new StreamReader(fs,Encoding.UTF8)
     Dim xmlReader as XmlTextReader = new XmlTextReader(reader)
 
     'Instantiate the XPathDocument Class
     Dim doc as XPathDocument = new XPathDocument(xmlReader)
 
     'Instantiate the XslTransform Class
     Dim xslDoc as XslTransform = new XslTransform()
     xslDoc.Load(xslPath)
     xslDoc.Transform(doc,nothing,Response.Output)
 
     'Close Readers
     reader.Close()
     xmlReader.Close()
   end sub
 </script> 

As you can see, this example uses the stylesheet example2.xslt introduced earlier. Describing the code briefly: on page load, strings are defined holding the paths to the two input files in the local directory. A FileStream object is instantiated and the XML document loaded into it; from this a StreamReader is instantiated, and in turn an XmlTextReader from that. The DOM can then be constructed within the XPathDocument object from the XML source via the objects so far defined. We then instantiate the XslTransform object, load the stylesheet identified by the string xslPath, and call the Transform method. Its parameters are the XPathDocument object complete with the DOM constructed from the XML document, any parameters passed to the stylesheet (none in this case), and the output destination of the result tree.
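
Had the stylesheet expected a parameter, an XsltArgumentList would replace the nothing argument in the call above. A minimal sketch, assuming the stylesheet declares a top-level <xsl:param name="heading"/> – the parameter name and value are illustrative only:

 'Build an argument list and supply a value for the stylesheet's 'heading' parameter
 Dim args as XsltArgumentList = new XsltArgumentList()
 args.AddParam("heading", "", "Library contents")

 'Pass the arguments in place of nothing; the stylesheet can then use $heading
 xslDoc.Transform(doc, args, Response.Output)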

ASP.NET also comes complete with the asp:Xml web control, making it easy to perform simple XSLT transformations in your ASP.NET pages. Use it as you would any other web control: you simply supply the two input properties (DocumentSource and TransformSource), either declaratively or programmatically. Here’s an example that does both, purely for demonstration and clarification purposes:

Listing 5: ASP:xml web control: Transform2.aspx

 <%@ Page language="vb" trace="true" debug="true"%>
 
 <script language="vb" runat="server">
 sub page_load()
   xslTrans.DocumentSource="books.xml"
   xslTrans.TransformSource="example2.xslt"
 end sub
 </script>
 
 <html>
 <body>
 <asp:xml id="xslTrans" runat="server" 
             DocumentSource="books.xml" TransformSource="example2.xslt" />
 
 </body>
 </html> 

Lastly, I’ll leave you with the thought that the place of XML/XSLT technology in the ASP.NET model is not clear-cut, as the server controls generate their own HTML. Does this leave XSLT redundant? Well, no … but we may need to be a little more creative in our thinking. For example, the flexibility of XML/XSLT can be combined with the power of ASP.NET server controls by using XSLT to generate the server controls dynamically, thus leveraging the best of both worlds. Perhaps I’ll leave this for another article. Let me know if you are interested.

References:

ASP.NET: Tips, Tutorials and Code – Sams
XSLT Programmer’s Reference, 2nd Edition – Wrox
Professional XSL – Wrox
Various online sources


About the author

I am Dr Christopher Sully (MCPD, MCSD), a Cardiff, UK-based IT consultant/developer. I have been involved in the industry since 1996, though I started programming considerably earlier than that. During the intervening period I’ve worked mainly on web application projects utilising Microsoft products and technologies, principally ASP.NET and SQL Server, and on all phases of the project lifecycle. If you might like to utilise some of the aforementioned experience, I would strongly recommend that you contact me. I am also trying to improve my Welsh, so am likely to blog about this as well as IT matters.
