Historical Login Reporting on macOS with the Elastic Stack

Recently, we were looking at ways to determine the usage of resources (in this case laptop and desktop devices) to provide insights toward utilization. Often, we take an educated guess at how many computers are needed in a classroom or cart of laptops based on class size. This is a decent indicator and ensures there is a device for each student. Inevitably, a "we need more ________" comes along and we wonder whether the way devices are deployed is really the most efficient. For example, how often are all twenty-four machines in a room used? Is there really enough usage to warrant moving five devices into a room? There are tools we can use via the command line (last) on macOS to see login information, but the output isn't always the most informative. Plus, how do we parse the information? For many years, I've set up login/logout hooks to record information. At first this was just the username and the date and time. Later, IP address information was added, and eventually we wanted to track both when a user logged in and when a user logged out. That turned into data looking like this:

LOGIN,admin,Tue Oct 10 19:48:17 CDT 2017, 192.168.2.194
LOGOUT,admin,Tue Oct 10 20:38:04 CDT 2017, 192.168.2.192

This told us our admin user was logged in for about 50 minutes, and sometime during that period there was a DHCP lease renewal. Mainly, the IP was used for internal tracking. With ARD, we could send a UNIX command, retrieve the files from machines (that were online), work with them, and end up with a CSV where the number of logins could be pulled and graphed via Excel. If we ever needed to track a user, we could tail the log file, grep for "admin", and see where the account logged in. The problem is these are all metrics we have to retrieve and format by hand. There had to be a better way. At Penn State Mac Admins this past year, I did a presentation on the Elastic Stack for monitoring systems via syslog and sending in log files from various sources. Suddenly two lightbulbs went off in my head:

  • We process access point connections by running commands against our Cisco Wireless Controller and format the results as a CSV (which means we already have an example of processing a CSV)
  • We use filebeat to send plaintext logs to logstash (which means we already have an example of processing a log file)

With that in mind, I started looking at installing filebeat (and a corresponding LaunchDaemon) to process the file containing our login info. Next, I'd need to create a filter in logstash to map the items in the CSV into fields. After that, I'd need to make the @timestamp of the event be the date and time the login occurred, which is more accurate for searching and reflects the actual login time instead of when logstash receives the event. Lastly, I'd explore the data to build dashboards and better visualizations.

How to get started? First, we need to create a login/logout hook on a Mac. There is an Apple Developer note on how to do this. It is important to format the login hook output in a specific way to make processing easier. Just make sure yours has the relevant data and write down your format to reference later. Here are the lines of code from ours:

echo "LOGIN,"$1","`date`","`ifconfig | grep "inet " | grep -v 127.0.0.1 | cut -d\ -f2` << /Library/Logs/login.log
echo "LOGOUT,"$1","`date`","`ifconfig | grep "inet " | grep -v 127.0.0.1 | cut -d\ -f2` << /Library/Logs/login.log

Next, we install filebeat (ours is installed into /usr/local) and create a corresponding LaunchDaemon (in /Library/LaunchDaemons) called co.elastic.filebeat:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
        <string>co.elastic.filebeat</string>
    <key>ProgramArguments</key>
        <array>
            <string>/usr/local/filebeat/filebeat</string>
            <string>-c</string>
            <string>/usr/local/filebeat/filebeat.yml</string>
        </array>
    <key>KeepAlive</key>
        <true/>
</dict>
</plist>
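
Once filebeat and its config (below) are in place, the daemon can be loaded so filebeat starts shipping now and at every boot. This assumes the plist above was saved as /Library/LaunchDaemons/co.elastic.filebeat.plist:

sudo launchctl load /Library/LaunchDaemons/co.elastic.filebeat.plist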

Filebeat also needs a config file (the path is specified in the LaunchDaemon above as /usr/local/filebeat/filebeat.yml):

filebeat.prospectors:
  - paths:
      - /Library/Logs/login.log
    input_type: log
    tags: ["macLogin"]

output.logstash:
  hosts: ["loghost:5044"]

This establishes a prospector to process /Library/Logs/login.log and tags the events with "macLogin". The destination is our logstash host running on loghost:5044.
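
For this to work, logstash needs a beats input listening on that port. If you don't already have one, it is a small input config on the logstash host; the port just needs to match the filebeat output above:

input {
  beats {
    port => 5044
  }
}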

Now that we are writing data to a log file and sending it to logstash via filebeat, we need to configure logstash to process the incoming data into a more readable format and pull key data from each line into fields. A filter config is created on our logstash host in the conf.d directory with the following contents:

filter {
  if "macLogin" in [tags] {
    if "," not in [message] {
      drop { }
    }
    else {
      csv {
        add_field => [ "received_at", "%{@timestamp}" ]
        columns => [ "action" , "username", "eventTime", "ip" ]
        separator => ","
      }
      date { match => [ "eventTime", "EEE MMM d HH:mm:ss z yyyy" ] }
      mutate {
        gsub => [ "host", ".local", "" ]  
        add_tag => [ "19-filter-computerLogins" ]
      }
    }
  }
}

  • First, we use a conditional on the tags so that only events tagged with "macLogin" are processed by this filter
  • Next, we make sure the line contains a comma. If it does not, then we will drop the message.
  • Next, we act on the message using the csv plugin. We add a field to the event called received_at containing the @timestamp of the event, which corresponds to when logstash received it. Our data is formatted with four fields:
    • action (this corresponds to LOGIN or LOGOUT)
    • username of the user
    • eventTime (this is the output of the Unix command date)
    • ip address the machine has at the time of the event
  • We separate the fields in the message using the comma delimiter
  • We work with the date of the event. We take eventTime and match it in the date filter using the format EEE MMM d HH:mm:ss z yyyy, so the event's timestamp reflects the login time
  • We mutate our data with two items. A gsub is performed on the host field to match ".local" and replace it with nothing, stripping the suffix from hostnames
  • Finally, for troubleshooting purposes, a tag with the filter's file name is added

Why do I add a tag with the config file name? That way as a message flows through all the different filters in logstash, I can identify which filter acted on the data for troubleshooting. As your config builds and gets more complex this can help determine where something is going wrong.
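
Put together, a single LOGIN line from the log ends up in elasticsearch as a structured event roughly along these lines (values taken from the earlier example; the hostname and received_at are made up for illustration, and I've omitted the raw message and filebeat metadata):

{
  "@timestamp": "2017-10-11T00:48:17.000Z",
  "action": "LOGIN",
  "username": "admin",
  "eventTime": "Tue Oct 10 19:48:17 CDT 2017",
  "ip": "192.168.2.194",
  "received_at": "2017-10-11T00:52:03.000Z",
  "host": "lab-L21-01",
  "tags": ["macLogin", "19-filter-computerLogins"]
}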

With the data all formatted, our last step is storing it in elasticsearch. We tell logstash to store this event with:

output {
  if "macLogin" in [tags] {
    elasticsearch { index => "login-%{+YYYY.MM.dd}" }
  }
}

Again, using our conditional check for a specific tag, we store this data in an elasticsearch index named login-, followed by the year, month, and day. We store this data in a separate index to allow for longer retention periods. (Curator is a tool for closing and deleting old indices within elasticsearch.) For example, we only keep network data for two weeks, whereas we want to keep login data for much longer.
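
As a rough sketch of what that retention policy could look like, here is a hypothetical Curator action file (the 365-day value is just an example, not necessarily what we run):

actions:
  1:
    action: delete_indices
    description: "Prune old macOS login indices (example retention)"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: login-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 365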

Great! We are almost done with our configuration. We have shipped our log file as a CSV via filebeat to logstash, set up a filter in logstash to process our data and format it in a way we can work with, and stored our login data in indices matching login-*. Our last step will be to visualize the data via Kibana.

In Kibana, navigate to the Management area and select Index Patterns. We are going to create a new index pattern called login-*. This will match all indices beginning with "login-". As you may remember, in the elasticsearch output of logstash above we create an index for each day; the pattern lets us match them all. We also set the time field of the event to eventTime, so the time of the login is used anywhere we show data over a period of time. This also allows devices that were not on the network at the time of login to ship their results later and still have the data be accurate. Finally, click Create to connect Kibana to our elasticsearch indices.
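
If the new pattern doesn't match anything, it is worth confirming on the command line that daily indices are actually arriving. Assuming elasticsearch is reachable at loghost:9200, a quick check looks like this:

# List any login-* indices along with their document counts
curl -s 'http://loghost:9200/_cat/indices/login-*?v'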

With Kibana knowing about our data, we can now work on visualizing it. In the Discover area of Kibana, select the login index. By default, Kibana shows the last 15 minutes of data. We can change this by clicking in the top right corner and choosing a larger time window, such as 12 hours. You should now see a series of login events.

We can add filters to exclude terms we don't want to include. For example, as we look through logins, we know that when we log in as admin we are troubleshooting a device, and that shouldn't appear in our login counts. All we need to do to remove these logins is click the small - symbol next to the username field where we see admin, or any other user we would like to exclude.

We do the same for the action field, as we don't want to count both the login and the logout of a session; we only want to see the LOGIN. To do this we click the + next to the action field where we see LOGIN. Now we are filtering only for logins by people other than admin.
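
The same filtering can also be typed directly into the Discover search bar using Kibana's Lucene query syntax. The field names below are the ones our filter created (depending on your mappings you may need the .keyword variants):

action:LOGIN AND NOT username:admin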

We determine usage by the number of logins on a machine rather than the duration of the login. For us, this is a better indicator of how a device is used than the sheer number of minutes. Plus, even though someone is logged into a machine, there is a chance they forgot to log out, which we felt skewed the metrics. We can further display relevant data by clicking add next to fields such as username and host. Once you have built the search you would like, make sure to save it in the Discover tab. The saved search can then be used to create visualizations without needing to re-write the query.

Now we can visualize.

In Kibana, go to the Visualize tab and create a new visualization. Let's use a vertical bar chart and build it from the saved search we just created. We can look at logins over a period of time by choosing an X-axis bucket of Date Histogram on the field eventTime. To see what this looks like, click the play button. You'll see a nice visual of the logins over time.

We can further break down the data by adding a sub-bucket and splitting the series. We could split by the top 5 users, but that gets a bit noisy. Instead, let's split the series based on a filter. This allows us to use the query language to create a grouping of devices. Since our machines are named in a specific format for their room location or cart, we have an easy way to group spaces. For our data we use two filters:

  • host:*L21*
  • host:*L04*

This shows us the logins over time based on the room the machines are located in.

We could also look at this data another way. Instead of creating a date histogram, let's group data on the X-axis by the same two filters. Now we see total logins by room. We can then split the series by host to get the top machines in each room. This gives us a general sense of how a resource (such as a lab or a cart) is used.

If we want to drill down into resource usage in a specific space, we can build a different chart. This time, let's create a horizontal bar chart and again use the search we previously created. Again, we create a filter for the X-axis, but this time I'll use host:*L21* to drill down into the specific room, and split the series on the term host.keyword. By default, Kibana shows the top five results in descending order. If, for example, there are twenty machines in a specific lab/room/cart and you want to see how much everything is used, switch this number to 20. The displayed data will then indicate which devices, if any, are underutilized relative to the others. In my example, one machine is used much less than all the others, with only two logins in the last 30 days. This may be an indicator the device could be better utilized in another space.

Feel free to play around with different visualizations to find what fits your data.

As an added bonus, you not only have a way to visualize login data, you also have a fast, centralized way to search data across your enterprise fleet, including user logins tracked by machine as well as the internal IP addresses assigned to machines at the time of a user login.

To make this easier for us to work with, we created an install package for filebeat that installs the necessary files, copies the LaunchDaemon into place, and loads it to begin processing data. Using a tool such as Munki, this can easily be deployed to your entire organization and quickly start collecting data in a centralized location.
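
If you want to build something similar yourself, a bare-bones package can be produced with pkgbuild. This is only a sketch, assuming a payload directory that mirrors the paths above and a postinstall script that runs the launchctl load command:

# Build a simple installer package from an example payload directory
pkgbuild --root ./payload \
         --scripts ./scripts \
         --identifier co.elastic.filebeat \
         --version 1.0 \
         filebeat-login.pkg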

Happy login processing!