Searching Custom Aspects via CMIS

I didn’t immediately see any simple, working examples of how to search custom aspect properties via CMIS so I’ll post a brief one here.

The Aspect

In this example we’re using a CMIS query to search for the IPTC caption property provided by the IPTC/EXIF project. After installing that module the properties are applied using an aspect which is defined in $ALFRESCO_HOME/WEB-INF/classes/alfresco/module/iptcexif/model/iptcModel.xml.

Within that file you’ll see:

    <aspects>
      <aspect name="iptc:iptc">

The Query

In our case the target Alfresco repository is running Enterprise 3.3.1, where aspects are queried via JOIN syntax, as very briefly stated in the wiki.

So the resulting query for this particular search of the IPTC caption field ends up looking like:

    SELECT D.*, IPTC.*
    FROM cmis:document AS D
    JOIN iptc:iptc AS IPTC ON D.cmis:objectId = IPTC.cmis:objectId
    WHERE IPTC.iptc:caption LIKE '%searchTerm%'
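
If the search term comes from user input, it helps to build the query in code so that quotes and LIKE wildcards in the term get escaped. Here's a minimal Java sketch of that. The class and method names are made up for illustration, and the backslash escaping assumes the repository follows the CMIS 1.0 escaping rules, which is worth checking against your own version.

```java
// Hypothetical helper (not part of Alfresco or any CMIS client library):
// builds the aspect JOIN query from this post for an arbitrary search term.
final class AspectQueryBuilder {

    // Escapes characters that are special inside a quoted LIKE string.
    // Assumption: backslash escaping as described in the CMIS 1.0 spec.
    static String escapeLikeTerm(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '\\' || c == '\'' || c == '%' || c == '_') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Returns the IPTC caption query with the term wrapped in % wildcards.
    static String captionQuery(String searchTerm) {
        return "SELECT D.*, IPTC.* FROM cmis:document AS D "
             + "JOIN iptc:iptc AS IPTC ON D.cmis:objectId = IPTC.cmis:objectId "
             + "WHERE IPTC.iptc:caption LIKE '%" + escapeLikeTerm(searchTerm) + "%'";
    }

    public static void main(String[] args) {
        System.out.println(captionQuery("O'Hare sunset"));
    }
}
```

The resulting string can be handed to whatever CMIS client you're using to run queries.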

The Changes

There were big changes in Alfresco 3.4 in terms of metadata extraction, and I haven’t yet had a chance to update the forge projects (or determine if they’re still even needed). There are also proposals for aspects in the next CMIS spec, so I’ll be back to update this post.

Alfresco Video Thumbnails

It seems like every 6 months to never I hear: “The Alfresco thumbnails forge project used to be cool man, what happened?”  Chill Winston, it’s still cool.

Now that Alfresco has a native implementation of thumbnails is the forge project still needed?  Well, ‘needed’ is relative I suppose, but check out what the project gets you beyond the basic Alfresco distribution:

  • Thumbnails when browsing, searching, or viewing a document in Alfresco Explorer (i.e. not Share)
  • CoolIris view when browsing or searching in Alfresco Explorer (which is sweet)
  • Actions to force the creation or update of thumbnails using Alfresco’s native thumbnail service
  • A video transformer based on ffmpeg with the ability to specify an offset (also sweet)
  • A patch to delete and regenerate thumbnails created by old versions of the forge project (Admittedly, not that awesome, but useful nonetheless.  No one wants a messy bloated repo.)

There are also several goodies on the roadmap for future releases of the project, including Share mods.

So what are you waiting for? Go manage your modules and pop that amp into your war (for the non-geeks, I promise that’s not gross).

Alfresco 3 Thumbnails and the Forge Project

The thumbnails forge project was started quite a while ago (like Alfresco 0.9 or so… old school) and the first public release was put on the forge in late 2006 (which would have been around Alfresco 1.3 I guess).

The project gained the attention of the Alfresco team and they contacted me about making a few changes to bring it more in line with what they had in mind for an implementation, notably creating a public thumbnail service.  Those changes were made and I continued to collaborate with Alfresco on the forge project and gave input on their implementation.

Alfresco 3 now has a native implementation of thumbnails and version 1.0 of the forge project migrated to the new architecture, but it’s still cool.

Here’s a look at some of the differences to be aware of between the old thumbnails project implementation and Alfresco’s new native thumbnails:

Concept: The service responsible for generating and retrieving thumbnails

  • Pre 1.0 versions of forge project: ThumbnailService. The generateThumbnails method would create thumbnails for all ThumbnailSpecifications defined. A specific thumbnail can be generated with overriding options (possibly defined by a user in the web interface) which are merged with the defaults of the ThumbnailSpecification (useful in defining the offset for a video thumbnail, for example).
  • Alfresco 3 native: ThumbnailService. The createThumbnail and updateThumbnail methods are given the specific TransformationOptions to use in generating the thumbnail. The content property from which the source data will be read must be specified when generating the thumbnail; the default cm:content will not be assumed.

Concept: A common size and destination mimetype for thumbnails

  • Pre 1.0 versions of forge project: ThumbnailSpecification. ContentTransformers and corresponding command line options are explicitly defined in config.
  • Alfresco 3 native: ThumbnailDefinition. The ContentTransformer is determined by the ContentService from the given TransformationOptions.

Concept: The detailed resizing specifications

  • Pre 1.0 versions of forge project: TransformerSpecification. The specific ContentTransformer to use and the necessary command line options are defined.
  • Alfresco 3 native: TransformationOptions. The resizing details are specified using classes containing generic definitions (for example ImageResizeOptions used by the ImageTransformationOptions class) which are then translated to appropriate command line code by the ContentTransformer found by the ContentService.

Concept: The model type

  • Pre 1.0 versions of forge project: tn:thumbnail, with <parent>cm:cmobject</parent>. Not being of type content prevents most rules from being run on thumbnail nodes, lest we do something unnecessary and resource intensive like extract metadata or even create another thumbnail of a thumbnail node, and prevents thumbnail nodes from being indexed or returned in a search.
  • Alfresco 3 native: cm:thumbnail, with <parent>cm:content</parent>. Care must be taken to not run rules on or index thumbnail nodes since they extend content. One suggestion has been to add the following to the thumbnail type: <aspect name="rule:ignoreInheritedRules"><title>Ignore Inherited Rules</title></aspect>

Concept: The model property name of the common size and mimetype destination definition

  • Pre 1.0 versions of forge project: tn:specificationName. The specificationName was the name of the ThumbnailSpecification used to generate the thumbnail.
  • Alfresco 3 native: cm:thumbnailName. The new CreateThumbnailActionExecuter uses the user-chosen thumbnailName to retrieve the ThumbnailDefinition, but the relationship between the ThumbnailDefinition and its thumbnailName is not enforced by the ThumbnailService.

Concept: Default thumbnail sizes (max length/width)

  • Pre 1.0 versions of forge project: SMALL (80px JPEG), MEDIUM (160px JPEG)
  • Alfresco 3 native: medium (100px JPEG), doclib (100px PNG), webpreview (SWF), imgpreview (480px PNG), avatar (64px PNG)

  1. Strikethroughs represent deprecated items.
  2. Avatar thumbnail size has nothing to do with what I can only assume is an incredibly overhyped movie about pissed off blue tree people since I haven’t been to a theater in like 10 years.

This information has been available in the thumbnails project’s 1.0 release readme but as it’s not shown by default I figured I’d post here as well.

Alfresco, SSO, and LDAP Expiring Date Attribute

Deciding where to handle authorization in a setup where Alfresco’s authentication is handled by CAS which itself authenticates against LDAP may not be as easy as it sounds. This post goes through some of the possibilities and one solution.

LDAP Authentication with an Expiration Date Attribute

On Alfresco version 2.x we extended the LDAPAuthenticationComponentImpl class to evaluate an LDAP filter string, configurable via Spring. Its primary purpose was to determine whether the user’s access to Alfresco had expired, with the expiration date being stored as an LDAP attribute. So our ldap-authentication.properties file would contain something like:

ldap.authentication.authenticationFilterFormat=(&(uid=%s)(alfrescoExpirationDate>=%sZ))

and the extended Java class LDAPAuthenticationComponentImpl contained:

// Bind with the supplied credentials
ctx = ldapInitialContextFactory.getInitialDirContext(bindDn, new String(password));

// Fill the user name and current timestamp into the configured filter format
String authenticationFilter = String.format(
        authenticationFilterFormat, new Object[]{userName, timestamp});

// The user is only signed in if the filter matches, i.e. access has not expired
NamingEnumeration<SearchResult> answer = ctx.search(
        authenticationSearchBase, authenticationFilter, userSearchCtls);
if (answer != null && answer.hasMoreElements()) {
    setCurrentUser(escapeUserName(userName, escapeCommasInUid));
}

Once the expiration date was reached the user was denied access to the application.
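
For anyone wanting to play with the filter outside of Alfresco, here's a rough standalone Java sketch of how such a filter string can be built. It's not the code from our extended class: the class name, the UTC YYYYMMDDhhmmss timestamp format, and the RFC 4515 escaping of the user name are all assumptions added for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical standalone helper, not the actual extended Alfresco class.
final class ExpirationFilter {

    // Same format string as the properties entry shown earlier.
    static final String FILTER_FORMAT = "(&(uid=%s)(alfrescoExpirationDate>=%sZ))";

    // Assumption: the attribute holds a UTC timestamp as YYYYMMDDhhmmss plus "Z".
    static final DateTimeFormatter LDAP_TIME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    // Escapes the characters RFC 4515 treats as special in a filter value.
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the filter that only matches a user whose access has not expired.
    static String build(String userName, Instant now) {
        return String.format(FILTER_FORMAT, escapeFilterValue(userName), LDAP_TIME.format(now));
    }

    public static void main(String[] args) {
        System.out.println(build("jdoe", Instant.now()));
    }
}
```

Escaping the user name matters here because it ends up inside the filter; an unescaped * or ) could change what the filter matches.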

Moving to Single Sign On

We’re now migrating to Alfresco 3.x and also have the need to move to a single sign on solution.  Jasig’s Central Authentication Service (CAS) was chosen as the implementation, requiring some changes to our expiration strategy.

In our previous configuration, authentication (is this user who they say they are) and authorization (is this user allowed to do x) were intertwined in our single custom LDAPAuthenticationComponentImpl class.  Any other applications authenticating against the same LDAP server could implement their own authentication/authorization methods.

CAS however is only concerned with authentication, not authorization, and applications utilizing it need to be able to decouple those functions, which is a better architecture in most cases anyway.

Our task then is to determine where in the authentication/authorization strategy we should handle this expiration authorization.

Choosing Where to Handle Authorization

Customize the CAS Server?

Modifying the CAS LDAP authentication handler wouldn’t be an appropriate place for our expiration authorization since all CAS services share the same authentication handler and other applications that have no notion of the Alfresco service or its expiration could be denied authentication.

We could try implementing a CAS authorizing serviceRegistryDao which would in theory allow the CAS server to determine if the user were authorized to use the service requested beyond just the Enabled and SSO Participant settings.  An authorization component could be defined that would make the authorization decision based on the service being requested and the user requesting it.  Each service could choose its authorization component as an option during setup in the CAS Service Management application.

CAS really wants the client/application to handle authorization though, so customizing the server doesn’t look like a recommended approach.

Customize Alfresco?

We could set up Alfresco’s authentication chain to talk to CAS directly. However, in version 3.2 Alfresco has changed its authentication configuration to use the notion of Authentication Subsystems (which seems nicely implemented), and most documentation indicates that integration with something like CAS is best achieved by using mod_cas or mod_auth_cas in Apache web server. Since we’re already using Apache in front of Tomcat with mod_proxy_ajp, that approach seems like a good fit.

Wait, Can a Hacker Just Spoof HTTP Headers?

It looks like AJP between Tomcat and the web server is responsible for parsing these protocol specific (non-HTTP) headers to find the remote user value and set that in Tomcat’s HttpServletRequest, so a hacker can’t just modify the headers in their browser and have that be translated into an authenticated user.

However, it is extremely important that your Tomcat connector be behind a firewall and only accept connections from known web servers.  Otherwise a hacker could set up their own web server with a bogus authentication mechanism which would add an authenticated user header in the AJP message, which in turn would translate to that user being signed in to a protected application.

mod_cas or mod_auth_cas?

With proper security in place we should be OK to use an Apache CAS module for authentication, but which one?

mod_cas:

  • hasn’t been maintained in quite a while
  • has missing links all over the site
  • the documentation incorrectly states that “CAS stands for Common Authorization Service”
  • Jasig recommends against it

so, um, we’ll go with mod_auth_cas.

There’s a detailed wiki article on setting up Alfresco authentication through mod_auth_cas but our challenge is to determine where in the flow to inject our authorization.

Option 1: Extend Alfresco’s Request Authn to Grab CAS Attribute

The CAS server can return attributes to the client and if mod_auth_cas passes those attributes through mod_proxy_ajp to Tomcat and Alfresco then we may be able to modify the component which checks that header for the authenticated username, HTTPRequestAuthenticationFilter, to also look for the expiration date attribute and authorize or deny the user.

After modifying Tomcat’s example snoop application to show all headers and attributes:

<%
    java.util.Enumeration eH = request.getHeaderNames();
    while (eH.hasMoreElements()) {
        String headerName = (String) eH.nextElement();
        out.print("Header <b>" + headerName + ": </b>");
        out.print(request.getHeader(headerName) + "<br>");
    }
    out.print("<br><br>");
    java.util.Enumeration eA = request.getAttributeNames();
    while (eA.hasMoreElements()) {
        String attributeName = (String) eA.nextElement();
        out.print("Attribute <b>" + attributeName + ": </b>");
        out.print(request.getAttribute(attributeName) + "<br>");
    }
%>

it doesn’t look like the additional attributes defined to be available to the CAS service are passed through mod_auth_cas and mod_proxy_ajp to the secured application.  We might be able to pass the data through SAML attributes but it seems that would require POST requests.

Option 2: Extend Alfresco’s Request Authn to query LDAP directly

If we can’t get the expiration date attribute from the request headers we still may be able to modify the request authentication filter to make a trip directly to the LDAP server to query for the attribute and confirm or deny authorization, but maybe we can do everything at the web server.
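
Whichever way the attribute would be fetched, the authorization decision itself is just a date comparison. Here's a minimal Java sketch of that check, assuming the attribute value looks like 20301231235959Z (UTC, YYYYMMDDhhmmss plus a trailing Z). The class and method names are hypothetical, and the LDAP lookup is left out entirely.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: decides authorization from an expiration attribute value.
final class ExpirationCheck {

    // Assumption: the attribute is UTC in the form YYYYMMDDhhmmss followed by "Z".
    private static final DateTimeFormatter LDAP_TIME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // True if the expiration date is at or after 'now', mirroring the
    // (alfrescoExpirationDate>=now) filter. Missing or malformed values deny access.
    static boolean isAuthorized(String expirationValue, Instant now) {
        if (expirationValue == null || !expirationValue.endsWith("Z")) {
            return false;
        }
        String digits = expirationValue.substring(0, expirationValue.length() - 1);
        try {
            LocalDateTime expires = LocalDateTime.parse(digits, LDAP_TIME);
            return !expires.toInstant(ZoneOffset.UTC).isBefore(now);
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAuthorized("20301231235959Z", Instant.now()));
    }
}
```

Failing closed on bad data seems like the safer default for something guarding access.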

Option 3: Handle Authorization at Apache

The ‘auth’ in mod_auth_cas appears to only include authentication but there may be hope in allowing mod_auth_cas to handle just the authentication part then using mod_authnz_ldap or the third-party mod_authz_ldap module to handle the authorization side.

mod_authnz_ldap’s Require ldap-filter looks promising, but we would need to insert a dynamic date (today’s) for comparison to the user’s expiration date. LDAP syntax doesn’t seem to have any keyword for the current date to be handled on the server side and it doesn’t look like mod_authnz_ldap has any special tags for injecting it in the filter string so the source code would have to be modified.

mod_authz_ldap however has this in its filter configuration:

%t
The current time in the format YYYYMMDDhhmmss

that may just do it.

Unfortunately, upon further investigation mod_authz_ldap does not support secure SSL/TLS communication with the LDAP server and hasn’t been maintained in quite a while, so that won’t work.

Dare I try modifying the mod_authnz_ldap source to allow for a dynamic replace of a date tag?  Sure, why not?

The last time I wrote C code it was probably on a machine running Mac OS 7.6, but after several hours and as many espressos I’ve taken the relevant code from mod_authz_ldap’s source, tweaked it a bit, and found the appropriate place to inject it in mod_authnz_ldap’s authn_ldap_build_filter.

You can compile and install a single module on the server with:

apxs -i -a -c mod_authnz_ldap.c

then add the appropriate config to your location directive.  So we have CAS performing authentication and a modified mod_authnz_ldap performing authorization on our Tomcat examples app:

<Location /examples>
AuthType CAS
AuthName "CAS"
CASScope /examples
AuthLDAPURL "ldaps://server.company.com/ou=users,dc=company,dc=com?uid?sub?(alfrescoExpirationDate>=$tZ)"
AuthLDAPBindDN "uid=reader,ou=users,dc=company,dc=com"
AuthLDAPBindPassword "*****"
require valid-user
</Location>

Again, my C chops are extremely rusty, but I’ve submitted these mod_authnz_ldap changes to Apache as Bug 47838.

Other Alfresco SSO Caveats

We can secure the Alfresco app in the same manner as /examples above but we’ll have to secure Alfresco Share with SSO as well and may run into issues with ticket parameter clashes.

As stated in one of the forum posts above, the header Alfresco needs to look for is CAS-User.  Before Alfresco 3.2 the web.xml changes would look like:

<filter-class>org.alfresco.web.app.servlet.HTTPRequestAuthenticationFilter</filter-class>
<!-- Name of HTTP header containing UserID. -->
<init-param>
    <param-name>httpServletRequestAuthHeaderName</param-name>
    <param-value>CAS-User</param-value>
</init-param>
...

Other Spring Apps with CAS

In some of our other apps we use Spring Security and will want to integrate them with CAS.  There’s a good write-up on the details of the communication between Spring Security and CAS and a forum post on CAS with LDAP authorization that should help.

Hopefully this account of a mad man in pursuit of expiring authorization will be of use to someone, somewhere, someday.

Changing the Alfresco LDAP Group Identifier

We were originally using Apple OS X Server as our LDAP store for our Alfresco instance.

Apple’s OS X Server uses OpenLDAP but adds custom schema for many things including users and groups.  As a result we ended up using the description LDAP attribute for Alfresco’s ldap.synchronisation.groupIdAttributeName.

We’ve since migrated to a generic OpenLDAP server (with a bit of our own custom schema) so we’re now able to use the more common and unchanging cn attribute for the group id.

When we change ldap.synchronisation.groupIdAttributeName in ldap-synchronisation.properties, Alfresco imports the new groups properly, but group permissions on spaces will retain the old group name, so we need to change those to use the new cn attribute.

What we did was to create a temporary table in the Alfresco database, import the mapping of the cn attribute to the description attribute, then run a query to replace the old authorities with the new.

The following assumes Alfresco version 3.x.

Create the Temp Table

CREATE TABLE alfresco.t_ldap_groups (
`dn` VARCHAR( 255 ) NULL ,
`cn` VARCHAR( 255 ) NULL ,
`description` VARCHAR( 255 ) NULL
);

Import the LDAP Group Data

We used phpLDAPAdmin to export our groups subtree as CSV with only the cn and description attributes, then imported that file into the t_ldap_groups table just created.

Replace the Old Authorities

I’m by no means an SQL expert but the query below does the following:

  • Strips GROUP_ from the current stored group long name
  • Searches the temporary LDAP table for that group long name and corresponding group short name
  • Updates the alf_authority.authority field with GROUP_group short name

UPDATE alf_authority
SET authority = CONCAT('GROUP_',
    (SELECT cn FROM t_ldap_groups
     WHERE description = SUBSTRING(alf_authority.authority, 7) LIMIT 1))
WHERE authority LIKE 'GROUP_%' AND
    (SELECT cn FROM t_ldap_groups
     WHERE description = SUBSTRING(alf_authority.authority, 7) LIMIT 1) IS NOT NULL;
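
To make the intent of that UPDATE a bit more obvious, here's the same mapping as a small Java sketch. It's only an illustration of the logic, not something we ran against the database: the class name is made up, and the map stands in for the t_ldap_groups table (description to cn).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of what the UPDATE does to each authority value.
final class AuthorityRemap {

    static final String PREFIX = "GROUP_";

    // Returns the new authority, or empty if the value is not a group
    // or has no matching row in the description-to-cn mapping.
    static Optional<String> remap(String authority, Map<String, String> cnByDescription) {
        if (!authority.startsWith(PREFIX)) {
            return Optional.empty();
        }
        // Same as SUBSTRING(authority, 7): drop the six characters of "GROUP_".
        String oldName = authority.substring(PREFIX.length());
        String cn = cnByDescription.get(oldName);
        return cn == null ? Optional.empty() : Optional.of(PREFIX + cn);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("Marketing Team", "marketing"); // made-up sample row
        System.out.println(remap("GROUP_Marketing Team", mapping));
    }
}
```

Authorities without a match are left alone, which is what the IS NOT NULL check in the SQL is for.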

In Alfresco 2.x the authority is stored directly in the alf_access_control_entry table as well so the update statement would be a bit more complicated.

Drop the Temp Table

DROP TABLE t_ldap_groups;

So far we haven’t had any adverse effects on our development server doing things this way but if anyone has a better method or potential issues with this one let us know.