Rigel Group

They shoot Yaks, don't they?

Who Needs MongoDB? Capped Collections in Postgres.

MongoDB has a construct called Capped Collections: DB tables that store a maximum number of records, removing the oldest records to make way for new ones.

It turns out that Postgres (or other SQL databases) can accomplish this as well, and more easily than you might think.

First, let’s create our table. We are going to have an id column, and a data column, but you can have whatever columns you need for your particular situation.

CREATE TABLE capped_collection (
  id integer NOT NULL,
  data text);

Now we create a sequence for our primary key, set MAXVALUE to the desired size of our capped collection, and use the CYCLE option so that the sequence counts up to MAXVALUE and then restarts at 1.

CREATE SEQUENCE capped_collection_id_seq
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 10000
    CACHE 1
    CYCLE
    OWNED BY capped_collection.id;

In the next step (not shown here), you should pre-seed the table with MAXVALUE records. We do this so that every id value has a row, and we never have to decide between an INSERT and an UPDATE operation, which keeps things simple and performant.
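For reference, one way to do that pre-seed is with generate_series, which emits one row per id. This is just a sketch, using the 10,000 MAXVALUE from the sequence above; adjust it to match your real columns:

```sql
-- Seed one placeholder row per id, so every later write can be an UPDATE.
INSERT INTO capped_collection (id, data)
SELECT seq, NULL
FROM generate_series(1, 10000) AS seq;
```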

Now that our table is set up, whenever we want to INSERT a new row, we instead do an UPDATE, like so:

UPDATE capped_collection
SET data = 'your data here'
WHERE id = (SELECT nextval('capped_collection_id_seq'))

The nextval function gets us the next id, and because we specified CYCLE, it wraps back around to 1 once it reaches MAXVALUE. The end result is that you will only ever have MAXVALUE rows in the table, with the oldest entries being aged out.
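One caveat worth noting: with only an id column there is no way to tell, after the sequence wraps, which rows are newest. If you also maintain a timestamp column (an addition on my part, not part of the schema above), you can read the collection back in recency order:

```sql
-- Hypothetical: assumes an extra updated_at column set on every write.
UPDATE capped_collection
SET    data = 'your data here', updated_at = now()
WHERE  id = (SELECT nextval('capped_collection_id_seq'));

SELECT data FROM capped_collection ORDER BY updated_at DESC LIMIT 10;
```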

So go forth, and cap those collections!

Working With Complex SQL Statements in Rails and ActiveRecord

Rails (via ActiveRecord) makes it super easy to work with your data, and for most simple tasks insulates you from having to think too hard about SQL. But eventually, everyone hits the ORM wall, and needs to start casting SQL spells around to get the data that they want. And writing SQL can really feel like that sometimes, like an incantation that you speak over your database — and magically the data rearranges itself into the form that you need.

Take, for example, the problem of calculating a duration in business hours. Let’s say we have a Tickets table with opened_at and closed_at timestamps. Our challenge is to calculate the total time the Ticket was open, based not on calendar time but on business hours, like 8am-5pm PST Mon-Fri. Now, you could come up with some Ruby code to calculate that for each row, but we want to do it in the database for all records. So, we are going to build an AR scope that will add an additional column called duration, calculated at query time.

(NOTE: This post assumes you are one of the cool kids and are using Postgres as your DB.)

To start out, let’s get a duration of simple calendar time.

# id integer
# opened_at timestamp
# closed_at timestamp
class Ticket < ActiveRecord::Base
  def self.with_calendar_duration
    select("tickets.*, (closed_at - opened_at) as duration")
  end
end

Now, you can say

Ticket.with_calendar_duration

which will add an additional column called duration that you can access in Ruby as usual. Note that there is nothing you need to do in your Rails model file (no need for an attr_accessor, etc.) for this to happen. ActiveRecord will simply add any extra columns selected to the model automagically. (However, since Rails has no type information for the column, it treats the Postgres interval type as a String.)

Ticket.with_calendar_duration.first.duration  #=> "10 days 11:00:00.000164"

So, that was easy. Now for the hard (fun) part. We need to calculate the time difference between opened_at and closed_at, taking into account only M-F, 8a-5p PDT. We are going to construct a raw SQL query and take advantage of Common Table Expressions (CTEs) in Postgres, which are underused but full of awesome.
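If you haven't used a CTE before, the idea is simple: name a subquery up front with WITH, then select from it as if it were a table. A minimal sketch (the WHERE clause here is made up just for illustration):

```sql
WITH recent AS (
  SELECT * FROM tickets
  WHERE  opened_at > now() - interval '7 days'
)
SELECT count(*) FROM recent;
```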

The SQL (the explanation of which I will defer until another blog post) we need is wrapped up in an AR scope:

# id integer
# opened_at timestamp
# closed_at timestamp
class Ticket < ActiveRecord::Base
  def self.with_biz_duration
    query = <<-SQL
      WITH
        var AS (SELECT '08:00'::time as v_start, '17:00'::time as v_end)

      SELECT tickets.*,
        (date_part('epoch',
          COALESCE (h
          - CASE WHEN EXTRACT(ISODOW FROM opened_at) < 6
                  AND opened_at::time > v_start
                  AND opened_at::time < v_end THEN
               opened_at - date_trunc('hour',  opened_at)
             ELSE '0'::interval END
          + CASE WHEN EXTRACT(ISODOW FROM closed_at) < 6
                  AND closed_at::time > v_start
                  AND closed_at::time < v_end THEN
               (closed_at - date_trunc('hour', closed_at)) - interval '1h'
            ELSE '0'::interval END
          , '0'::interval)
        )/60)::integer AS duration --minutes
      FROM  (tickets CROSS JOIN var)
      LEFT JOIN  (
        SELECT sub1.id, count(*)::int * interval '1h' AS h
        FROM  (
          SELECT id, v_start, v_end
                ,generate_series (date_trunc('hour', opened_at)
                                , date_trunc('hour', closed_at)
                                , interval '1h') AS h
          FROM   tickets, var
          ) sub1
        WHERE  EXTRACT(ISODOW FROM h) < 6
        AND    h::time >= v_start
        AND    h::time <  v_end
        GROUP  BY 1
      ) sub2 USING (id)
    SQL

    self.find_by_sql(query)
  end
end

So, now we can say

Ticket.with_biz_duration.map(&:duration)  #=>  [4760, 2700, 3320, 15980, 13500]

But this ‘scope’ is not really an AR scope, as it does not return a chainable AR Relation, so you always have to use it at the end of the chain. And as it currently stands, the query doesn’t know anything about any existing relations, so this won’t work:

Ticket.where(:something => 'else').with_biz_duration

It would be cool if we could capture the records in the existing scope and only use those in our query. We can achieve this with another CTE:

# id integer
# opened_at timestamp
# closed_at timestamp
class Ticket < ActiveRecord::Base
  def self.with_biz_duration
    query = <<-SQL
      WITH
       existing_scope AS (#{existing_scope_sql}),
       tickets_scoped AS (SELECT tickets.*
                                       FROM tickets
                                       INNER JOIN existing_scope ON existing_scope.id = tickets.id),

        var AS (SELECT '08:00'::time as v_start, '17:00'::time as v_end)

      SELECT tickets_scoped.*,
        (date_part('epoch',
          COALESCE (h
          - CASE WHEN EXTRACT(ISODOW FROM opened_at) < 6
                  AND opened_at::time > v_start
                  AND opened_at::time < v_end THEN
               opened_at - date_trunc('hour',  opened_at)
             ELSE '0'::interval END
          + CASE WHEN EXTRACT(ISODOW FROM closed_at) < 6
                  AND closed_at::time > v_start
                  AND closed_at::time < v_end THEN
               (closed_at - date_trunc('hour', closed_at)) - interval '1h'
            ELSE '0'::interval END
          , '0'::interval)
        )/60)::integer AS duration --minutes
      FROM  (tickets_scoped CROSS JOIN var)
      LEFT JOIN  (
        SELECT sub1.id, count(*)::int * interval '1h' AS h
        FROM  (
          SELECT id, v_start, v_end
                ,generate_series (date_trunc('hour', opened_at)
                                , date_trunc('hour', closed_at)
                                , interval '1h') AS h
          FROM   tickets_scoped, var
          ) sub1
        WHERE  EXTRACT(ISODOW FROM h) < 6
        AND    h::time >= v_start
        AND    h::time <  v_end
        GROUP  BY 1
      ) sub2 USING (id)
    SQL

    self.find_by_sql(query)
  end

  private
  def self.existing_scope_sql
      # have to do this to get the binds interpolated. remove any ordering and just grab the ID
      self.connection.unprepared_statement {self.reorder(nil).select("id").to_sql}
   end
end

So we basically converted the current scope to a SQL statement and used it as a CTE to run the query against, thus limiting the rows we are operating on. We can use other scopes or where clauses, as long as we call our with_biz_duration scope at the end of the chain.

Ticket.where(:id => 1).with_biz_duration.map(&:duration)  #=>  [4760]

Now to take it to the bitter end, let’s add the ability to pass in the business hours we want, as well as the timezone.

One issue with our tickets table is that the opened_at and closed_at fields were created as timestamp fields, which in Postgres carry no timezone information. If we assume our DB server is configured to use UTC as the default timezone, then we need to cast the fields into timezone-aware fields in UTC, which we then cast again into the timezone we want.
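That double cast is the easy part to get wrong, so here it is in isolation, with a made-up timestamp just for illustration:

```sql
-- The first AT TIME ZONE tags the naive timestamp as UTC (giving a timestamptz);
-- the second converts it to local wall-clock time (back to a plain timestamp).
SELECT ('2014-06-01 20:00:00'::timestamp AT TIME ZONE 'UTC')
         AT TIME ZONE 'America/Los_Angeles';
-- => 2014-06-01 13:00:00 (PDT is UTC-7 in June)
```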

# id integer
# opened_at timestamp
# closed_at timestamp
class Ticket < ActiveRecord::Base
  def self.with_biz_duration(start_time='08:00', end_time='17:00', rails_timezone='America/Los_Angeles')
    pg_timezone = ActiveSupport::TimeZone[rails_timezone].tzinfo.name
    query = <<-SQL
      WITH
       existing_scope AS (#{existing_scope_sql}),
       tickets_scoped AS (SELECT tickets.*,
                                 (tickets.opened_at at time zone 'UTC') at time zone '#{pg_timezone}' as opened_at_tz,
                                 (tickets.closed_at at time zone 'UTC') at time zone '#{pg_timezone}' as closed_at_tz
                                       FROM tickets
                                       INNER JOIN existing_scope ON existing_scope.id = tickets.id),

        var AS (SELECT '#{start_time}'::time as v_start, '#{end_time}'::time as v_end)

      SELECT tickets_scoped.*,
        (date_part('epoch',
          COALESCE (h
          - CASE WHEN EXTRACT(ISODOW FROM opened_at_tz) < 6
                  AND opened_at_tz::time > v_start
                  AND opened_at_tz::time < v_end THEN
               opened_at_tz - date_trunc('hour',  opened_at_tz)
             ELSE '0'::interval END
          + CASE WHEN EXTRACT(ISODOW FROM closed_at_tz) < 6
                  AND closed_at_tz::time > v_start
                  AND closed_at_tz::time < v_end THEN
               (closed_at_tz - date_trunc('hour', closed_at_tz)) - interval '1h'
            ELSE '0'::interval END
          , '0'::interval)
        )/60)::integer AS duration --minutes
      FROM  (tickets_scoped CROSS JOIN var)
      LEFT JOIN  (
        SELECT sub1.id, count(*)::int * interval '1h' AS h
        FROM  (
          SELECT id, v_start, v_end
                ,generate_series (date_trunc('hour', opened_at_tz)
                                , date_trunc('hour', closed_at_tz)
                                , interval '1h') AS h
          FROM   tickets_scoped, var
          ) sub1
        WHERE  EXTRACT(ISODOW FROM h) < 6
        AND    h::time >= v_start
        AND    h::time <  v_end
        GROUP  BY 1
      ) sub2 USING (id)
    SQL

    self.find_by_sql(query)
  end

  private
  def self.existing_scope_sql
      # have to do this to get the binds interpolated. remove any ordering and just grab the ID
      self.connection.unprepared_statement {self.reorder(nil).select("id").to_sql}
   end
end

So, now we can say

Ticket.where(:id => 1).with_biz_duration('01:00', '03:00', 'America/New_York').map(&:duration) #=> [960]

Tune in next time as we delve into the mysteries of generate_series and CROSS JOIN which are at the heart of this powerful incantation. Happy SQLing!

Test Objects for Blankness With This One Little Trick!

Rails adds a lot of syntactic sugar to plain old Ruby. A lot of this is wrapped up in the ActiveSupport module. If you work with Rails and have not read ActiveSupport’s core_ext from cover to cover, stop now, and go do it. Go on, I’ll wait.

OK, I sense a few lightbulbs coming on out there! “So that’s why …”

It seems most Rails folks know about (and use) Object#blank? and Object#present?. These are handy ways to test whether something is, well, blank or not. Ruby treats nil as falsey, which is great, but when dealing with web apps and user-entered data, you might get an empty string or a bunch of space characters, which you also want to treat as blank, perhaps so you can set a default value or something. Continually having to check for different types of blankness is annoying, so instead you can use Object#blank?

user.country = params[:country].blank? ? 'USA' : params[:country]
# or, alternatively
user.country = params[:country].present? ? params[:country] : 'USA'

Now, that still looks a bit ugly, to a rubyist’s eyes, so we can use Object#presence (github) to clean it up even more:

user.country = params[:country].presence || 'USA'

#presence will either return the value (if it is not blank) or nil. This works with arrays and hashes as well.
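To see why this works beyond strings, here is a rough reimplementation of the blank?/presence pair (method names suffixed with _ish to make clear this is an illustrative sketch, not ActiveSupport's actual code):

```ruby
class Object
  # Roughly what ActiveSupport's #blank? does: things that can be empty
  # are blank when empty; everything else is blank when falsey.
  def blank_ish?
    respond_to?(:empty?) ? !!empty? : !self
  end

  # Return self unless blank, otherwise nil -- ideal with ||.
  def presence_ish
    self unless blank_ish?
  end
end

class String
  # Whitespace-only strings also count as blank.
  def blank_ish?
    self !~ /\S/
  end
end

"   ".presence_ish || 'USA'  # => "USA"
[].presence_ish              # => nil
```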

It’s a small thing, but it neatly encapsulates a common pattern, and isn’t that pretty much what we get paid to do all day?

Frequently Used Commands

If you spend a lot of time at the *nix command line, you probably use your shell history quite a bit to avoid typing commands in over and over. Or maybe you define shell aliases for your most precious incantations. Shell history is great, and I have mine set to save a bajillion entries, ‘cause we have more than 640k these days. But even with a nearly infinite shell history, I find myself wanting to know the few important commands I usually need in the specific directory I happen to be in.

Frequently Used Commands (fuc)

So, this little shell script will let you save off the important commands to a .fuc file in the local directory, and then easily recall those commands in a little menu.

Let’s say you are working away, and you want to remember a particular command:

$ fuc gulp --require coffee-script watch

This saves the command to the .fuc file, and also runs it. Later, you can just type fuc from the same directory, and get this:

$ fuc
fuc - Frequently Used Commands
1) gulp --require coffee-script watch
2) echo "(╯°□°)╯ ┻━┻"
3) cat log/server.log | grep ERROR
Select command to run (q to Quit):

What I really wanted to do was put the selected command on the command line, ready to be edited, then hit enter to run it. But I couldn’t be bothered to figure out how to do that. So selecting a command just runs it immediately.

(I know there are a lot of much more complicated implementations that tweak your actual command history to be directory-specific, but I wanted to be able to specify which commands were the important ones, and just remember those.)

Here is the script. Put this somewhere in your path and name it fuc, or if that’s too NSFW then I’m sure you can come up with a better name. Maybe dangit or something.

#!/bin/bash

# Split the saved command list on newlines, not spaces
IFS=$'\r\n'

PS3="Select command to run (q to Quit):"

if [ $# -eq 0 ]
then
  echo "fuc - Frequently Used Commands"
  cmdList=$(cat .fuc | sort -u)
  select cmdName in $cmdList; do
    if [ -n "$cmdName" ]; then
      echo "$cmdName"
      eval ${cmdName}
      break
    fi
  done
  exit 1
fi

echo "Saving and executing command: $*"
echo "$*" >> .fuc
eval $*

React JSX Transformer in Rails Middleware

Recently I’ve been having a blast playing around with React, and I found this neat hack from @ssorallen called Reactize.

What he is doing is grabbing the HTML response from the server, and then in the browser running the JSXTransformer on the HTML, and mounting the whole document body as a React component. Very clever!

So, to riff on that theme a little bit, here is a Rails middleware that will take the HTML page the server was going to send to the client and replace it with the JSXTransformed version, which is basically a JavaScript snippet. So the “heavy lifting” of the JSXTransformer is done server-side.

Another thing we can do is hash the result and throw it in the Rails cache, so we aren’t doing more work than we need to.

class JsxMiddleware
  def initialize(app)
    @jsxcode = File.read("#{Rails.root}/app/assets/javascripts/JSXTransformer.js")
    @app = app
  end

  def call(env)
    status, headers, response = @app.call(env)
    if env['HTTP_X_JSX'].present?
      response.body = convert_to_jsx(response.body[/<body>(.*)<\/body>/m,1])
      headers['X-JSX'] = 'true'
    end
    [status, headers, response]
  end

  def jsx_context
    # Use a Thread-local variable to store the JS context, with the JSXTransformer code loaded.
    # That way each thread will have its own and we are thread-safe.
    Thread.current[:jsx_context] ||= begin
      ExecJS.compile("global={};" + @jsxcode)
    end
  end

  def convert_to_jsx(html="")
    snippet = "/** @jsx React.DOM */\n" + html
    hash = Digest::MD5.hexdigest(snippet)
    Rails.cache.fetch "jsx:#{hash}" do
      jsx_context.call("global.JSXTransformer.transform", snippet)['code']
    end
  end
end

So on the client, have a link that looks like <a href="/thepage" data-behavior="getViaJSX">Click Me</a> and you could do something like this to request a JSXTransformed page…

$ ->
  $("[data-behavior='getViaJSX']").on "click", (e) ->
    e.preventDefault()
    $.ajax
      url: e.target.href
      # The middleware only kicks in if this header exists
      headers: {'X-JSX': true}
      success: (data) ->
        component = eval(data)
        React.renderComponent(component, document.body)

Another option would be to bake it in to TurboLinks itself by patching it to make the request with the X-JSX header.

That’s it! Not sure exactly what it is good for, but a fun exercise anyway.

React JS Roundup

I recently gave a talk on React at FullStack, which is the coolest meetup North of the Wall. You can find the slides here.

React is a game-changer, and will fundamentally change the way we build apps on the web. To learn more about it, here are some great resources:

Rethinking Best Practices

Pete Hunt gives a talk introducing React at JSConf EU 2013. slides

Functional DOM Programming

A blog post by Pete Hunt which explains the basics of React components.

Real-time frosted glass effect on mobile

Great walkthrough on how to create fast animations on modern mobile devices.

React + TurboLinks

Fun proof-of-concept where Rails TurboLinks is married with React with some mind-bending results.

Om

Take the red pill, and check out Om, which is the combination of ClojureScript with its immutable data structures, and React. This is where things get really exciting! (Follow David Nolen for more interesting tidbits.)

Makona, the Block-Style Editor

Because the world desperately needs another Rich Text editor for the web, I give you Makona. Makona is the Hawaiian word for a mason, which is someone who works with blocks. Makona lets you edit text by working with blocks — Markdown blocks, text blocks, image blocks, code blocks, etc.

The blocks can be saved back to the server as a blob of HTML, or in a JSON structure that contains all the data for the blocks, which opens up some neat possibilities for reusing that content in different contexts.

This project was started mainly as an excuse to learn React.js, but with some elbow grease I think it can become a useful tool in the open-source universe. Feel free to pitch in!

Using React.js With CoffeeScript

If you haven’t heard of the latest front-end hotness, head on over to React.js and prepare to have your mind blown. Brought to us by the fine folks over at Facebook, it presents a new take on building browser apps. Once you get your head around it, it really makes a lot of sense, especially if you need something lightweight to add JavaScript components to your existing site. It doesn’t have things like routers or data models. It just concerns itself with building interactive view components in a highly composable (and performant!) way.

But the examples are all in POS (Plain Ole JavaScript), which is a problem for me. I much prefer CoffeeScript. And due to the weirdness that is JSX, it is not easy to get React to work with CoffeeScript out of the box.

The first thing to note is that you need to get your workflow pipeline set up correctly, because you will need to compile your CoffeeScript to JSX-style JavaScript, and then compile that JSX-style JavaScript to regular old JavaScript. (I know, it sounds crazy and I wouldn’t blame you if you bounced right now. But if you stick with me, enlightenment will come.)

I set up a Grunt workflow that does this, and the relevant parts of the Gruntfile.coffee look like this:

module.exports = (grunt) ->
  grunt.initConfig
    pkg: grunt.file.readJSON("package.json")

    srcDir: "./src"
    testDir: "./test"
    outputDir: "./dist"

    # Compile to JS first, then we will compile the JSX in another task and move to /dist
    coffee:
      options:
        # This is IMPORTANT, because the first line has to be a JSX comment
        bare: true
      all:
        files: [
          expand: true
          cwd: 'src/'
          src: ['**/*.coffee']
          dest: 'src/'
          ext: '.js'
        ]

    react:
      all:
        files:
          "<%= outputDir %>": "<%= srcDir %>"

    regarde:
      coffee:
        files: "<%= srcDir %>/**/*.coffee"
        tasks: ["coffee", "spawn_react"]

    # Set up a static file server
    connect:
      server:
        options:
          hostname: "0.0.0.0"
          port: 9292
          base: "."
          keepalive: true

    # Clean up artifacts
    clean:
      output: "<%= outputDir %>"

    # Execute server script
    exec:
      server:
        cmd: "./server.js"

  grunt.loadNpmTasks "grunt-contrib-coffee"
  grunt.loadNpmTasks "grunt-regarde"
  grunt.loadNpmTasks "grunt-contrib-connect"
  grunt.loadNpmTasks "grunt-contrib-clean"
  grunt.loadNpmTasks "grunt-exec"
  grunt.loadNpmTasks 'grunt-react'

  # Make sure we get an error on compilation instead of a hang
  grunt.registerTask 'spawn_react', 'Run React in a subprocess', () ->
    done = this.async()
    grunt.util.spawn grunt: true, args: ['react'], opts: {stdio: 'inherit'}, (err) ->
      if err
        grunt.log.writeln(">> Error compiling React JSX file!")
      done()

  grunt.registerTask "server", ["exec:server"]
  grunt.registerTask "build", ["coffee", "spawn_react"]

You will also need the server.js file, which is here:

#!/usr/bin/env node

var spawn = require("child_process").spawn,
    watcher = spawn("grunt", ["regarde", "--force"]),
    server = spawn("grunt", ["build", "connect:server"]);

watcher.stdout.on("data", function(data) {
  var importantOutput = data.toString().split(/\r?\n/).filter(function(str) {
    return />>|Done|Warning|Running/.test(str);
  });

  process.stdout.write(importantOutput.join("\n"));
  // process.stdout.write(data);
});

server.stdout.on("data", function(data) {
  process.stdout.write(data);
});

watcher.on("exit", function(code, signal) {
  server.kill();
  process.exit();
});

server.on("exit", function(code, signal) {
  watcher.kill();
  process.exit();
});

process.on("exit", function() {
  watcher.kill();
  server.kill();
});

Now you can do a grunt server and start writing React code.

Here are some (contrived) code snippets that might help you out if you are struggling with how to reconcile React with CoffeeScript syntax. The secret is to shell out to JavaScript with the ` operator when necessary, so the code is intact when JSX transpiles it.

`/** @jsx React.DOM */`
# The above line HAS to be the first line in the file for JSX to know to process it.
MySimpleComponent = React.createClass
  render: ->  `<pre>{this.props.mytext}</pre>`

MyComponent = React.createClass
  render: ->
    `(
      <ul>
        {this.props.items.map(
          function(item){
            return (
              <li><a href="#" onClick={this.props.handleClick}>{item}</a></li>
            )
          }, this)
        }
      </ul>
    )`

A big thanks to Facebook and everyone who worked to bring this project to life. I look forward to using it in my projects.

[UPDATE] Vjeux has a blog post about how to actually use CoffeeScript instead of shelling out to JS.

Exporting SharePoint User Profiles to CSV Using Powershell

You may have the (mis-)fortune of working with Sharepoint, and you may also need to gain access to the User Profile data contained therein. You may also want to try out Microsoft’s PowerShell scripting language. If so, you came to the right place, my friend!

This seems like a common enough task, but the code I found in my Googling just wasn’t doing it for me, so I am adding this version to the interwebs in the hopes someone else will find it useful.

#
# Export Sharepoint User Profiles to CSV file
# John Lynch 2013
# MIT License

$siteUrl = "http://YOUR_HOSTNAME_HERE"
$outputFile = "c:\temp\sharepoint_user_profiles.csv"


Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$serviceContext = Get-SPServiceContext -Site $siteUrl
$profileManager = New-Object Microsoft.Office.Server.UserProfiles.UserProfileManager($serviceContext);
$profiles = $profileManager.GetEnumerator()

$fields = @(
            "SID",
            "ADGuid",
            "AccountName",
            "FirstName",
            "LastName",
            "PreferredName",
            "WorkPhone",
            "Office",
            "Department",
            "Title",
            "Manager",
            "AboutMe",
            "UserName",
            "SPS-Skills",
            "SPS-School",
            "SPS-Dotted-line",
            "SPS-Peers",
            "SPS-Responsibility",
            "SPS-PastProjects",
            "SPS-Interests",
            "SPS-SipAddress",
            "SPS-HireDate",
            "SPS-Location",
            "SPS-TimeZone",
            "SPS-StatusNotes",
            "Assistant",
            "WorkEmail",
            "SPS-ClaimID",
            "SPS-ClaimProviderID",
            "SPS-ClaimProviderType",
            "CellPhone",
            "Fax",
            "HomePhone",
            "PictureURL"
           )

$collection = @()

foreach ($profile in $profiles) {
   $user = "" | select $fields
   foreach ($field in $fields) {
     if($profile[$field].Property.IsMultivalued) {
       $user.$field = $profile[$field] -join "|"
     } else {
       $user.$field = $profile[$field].Value
     }
   }
   $collection += $user
}

$collection | Export-Csv $outputFile -NoTypeInformation
$collection |  Out-GridView

Validating SAML Tickets in JRuby Redux

A while back I had the pleasure o_O of implementing SAML in JRuby. At that time I was working with Java 1.7.0u17, and all was right with the world.

Recently I wanted to upgrade to Java 1.7.0u40, and the Validate class stopped working, throwing this error:

Exception:javax.xml.crypto.URIReferenceException: com.sun.org.apache.xml.internal.security.utils.resolver.ResourceResolverException: Cannot resolve element with ID _673ef297-23ab-428c-8e11-7fed395a7daf

Hmm. Something has obviously changed. A Google session later, this bug report points me in the right direction. It turns out that Java used to assume any XML node with an attribute named “ID” was in fact an ID node and could be found with getElementById. But newer versions conform more closely to the XML spec and require the node to be “tagged” as an ID node via a schema.

OK, so we change the code (original here) to apply the correct schema:

Validator.java

// Snip...

SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(new URL("http://docs.oasis-open.org/security/saml/v2.0/saml-schema-protocol-2.0.xsd"));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setSchema(schema);
Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(samlResponse)));

// Snip..

This actually works and validates the SAML XML response, but it takes 30+ seconds to do it. Maybe that’s because it’s trying to grab the schema from the web? So I try using a local copy, and still it takes 30+ seconds to run. Drat.

Since there is more than one way to shave a yak, instead of using a schema you can also programmatically tag nodes as ID nodes. So let’s see what that looks like:

Validator.java

// Snip...

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(samlResponse)));

// Loop through the doc and tag every element with an ID attribute as an XML ID node.
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile("//*[@ID]");
NodeList nodeList = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i=0; i<nodeList.getLength() ; i++) {
  Element elem = (Element) nodeList.item(i);
  Attr attr = (Attr) elem.getAttributes().getNamedItem("ID");
  elem.setIdAttributeNode(attr, true);
}

// Snip..

Voila! This works, and of course it’s super fast.