Rutgers has a mobile device privacy violation strategy

Rutgers decided to switch everyone over to an Office 365 system for email. All “official Rutgers business” has to be conducted through our new email accounts. If you try to sync mail to your phone, you are prompted to install a Microsoft app which will manage your account. According to the Rutgers Mobile Device Management Policy we “will be prompted by a notice that states administrators will be allowed to make a number of changes to your device but the University will not utilize those features as they are beyond policy.”

I Am Not A Lawyer, but it seems a little bad to sign a contract with someone who says “oh don’t worry about those clauses, we will never use them.” So what are we agreeing to let IT admins do?

What IT cannot see:

  • Call and web history
  • Location
  • Email and text messages
  • Contacts
  • Passwords
  • Calendar
  • Camera roll

What IT can see:

  • Model
  • Serial number
  • Operating system
  • App names
  • Owner
  • Device name

So apparently what apps you have is something that your boss should know about. I suppose you can construct a reason for that, but I don’t really know why it’s anyone’s business. I can see it as being rather dangerous — who are they sharing this information with? Also, Rutgers wants to:

  • Reset your device back to manufacturer’s default settings if the device is lost or stolen.
  • Require you to have a password or PIN on the device.
  • Require you to accept terms and conditions.

Hmmm, abstract “terms and conditions.” Ok then… the features they say are out of scope (for now) are:

  • Remove all installed company-related data and business apps. Your personal data and settings aren’t removed.
  • Enable or disable the camera on your device to prevent you from taking pictures of sensitive company data.
  • Enable or disable web browsing on your device.
  • Enable or disable backup to iCloud.
  • Enable or disable document sync to iCloud.
  • Enable or disable Photo Stream to iCloud.
  • Enable or disable data roaming on your device. If data roaming is allowed, you might incur roaming charges.
  • Enable or disable voice roaming on your device. If voice roaming is allowed, you might incur roaming charges.
  • Enable or disable automatic file synchronization while in roaming mode on your device. If automatic file synchronization is allowed, you might incur roaming charges.

Seems like a lot for the dubious value of checking my work email on my phone. I guess I have some startup funds that need spending. Perhaps I can get a “just for work” device that Rutgers can snoop on as much as they like.

Subscribing to the NSF CIF Listserv

Want to get emails from the NSF’s CIF Program?

  • Compose an email to LISTSERV@listserv.nsf.gov
  • Leave the subject blank
  • In the body of the message, just write “SUBSCRIBE CIF-Announce Firstname Lastname” (without the quotes and replacing Firstname and Lastname with your name). Alternatively, you can subscribe anonymously by writing “SUBSCRIBE CIF-Announce ANONYMOUS” (without the quotes).
  • Send the message. You will receive a confirmation email that you have subscribed. Please read the confirmation email since you may need to respond to it.
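If you prefer to script the steps above, the same message can be built programmatically. Here is a minimal sketch using Python’s standard library; `you@example.edu` is a placeholder, and actually sending it requires access to an SMTP relay:

```python
from email.message import EmailMessage

def subscribe_message(first, last, sender):
    """Build the LISTSERV subscription email described in the steps above."""
    msg = EmailMessage()
    msg["To"] = "LISTSERV@listserv.nsf.gov"
    msg["From"] = sender
    msg["Subject"] = ""  # subject is left blank; LISTSERV ignores it
    # Use "SUBSCRIBE CIF-Announce ANONYMOUS" to subscribe anonymously.
    msg.set_content(f"SUBSCRIBE CIF-Announce {first} {last}")
    return msg

msg = subscribe_message("Firstname", "Lastname", "you@example.edu")
# To send: smtplib.SMTP("your.smtp.server").send_message(msg)
```

Don’t forget to watch for the confirmation email either way.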

Problems with the KDDCup99 Data Set

I’ve used the KDDCup99 data set in a few papers for experiments, primarily because it has a large sample size and preprocessing is not too onerous. However, I recently learned (from Rebecca Wright) that for applications to network security, this data set has been discredited as unrepresentative. The paper by John McHugh in ACM TISSEC details the charges: essentially, little validation was ever done to check how representative the data set actually is.

Why do I bring this up? Firstly, I suppose I should stop using this data set to make claims about anomaly detection (which may be a problem for AISec coming up at the end of the month). However, it’s not clear, from a machine learning perspective, whether the claims one can make about a particular application will generalize within an application domain, given the lack of standardization of data sets even within a particular application. I could do a bunch of experiments on mixtures of Gaussians which might tell me that the convergence rate is what the theory said it should be, but validating on a variety of “non-synthetic” data sets can at least show how performance varies with data set properties (regardless of the accuracy with respect to the application). So should I stop using the data set entirely?
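The synthetic sanity check I have in mind is the usual one: on data you generated yourself, you can verify a convergence rate directly. A minimal sketch (plain Gaussian mean estimation rather than a full mixture-plus-EM experiment; the parameters are illustrative):

```python
import random
import statistics

def mean_estimation_error(n, mu=3.0, sigma=1.0, trials=200, seed=0):
    """Average |sample mean - mu| over several trials of n Gaussian draws."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        errs.append(abs(statistics.fmean(xs) - mu))
    return statistics.fmean(errs)

# Theory says the error shrinks like 1/sqrt(n), so quadrupling the sample
# size should roughly halve the average error.
e100 = mean_estimation_error(100)
e400 = mean_estimation_error(400)
```

The point is that this kind of check validates the theory, not the application; only real (if imperfect) data sets like KDDCup99 can say anything about the latter.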

Secondly, if we want to develop new models and algorithms for machine learning on security applications, we need data sets, and preferably public data sets. This is a real challenge for anyone trying to develop theoretical frameworks that don’t sound too bogus: practice could drive theory, but there is a kind of security through obscurity model in the data gathering/sharing world which makes it hard to understand what the problems are.

Linkage

Cheating: The List Of Things I Never Want To Hear Again. This is an almost definitive list of plagiarism/cheating excuses. I both love and loathe the idea of making students sign a pledge, but there’s that saying about a horse and water… (h/t Daniel Hsu)

This note on data journalism comes with a longer report about how to integrate data journalism into curricula. It strikes me that many statistics and CS departments are missing the boat here on creating valuable pedagogical material for improving data analytics in journalism. (h/t Meredith Broussard)

Speaking of which, ProPublica has launched version 2.0 of its Data Store!

Of course, data isn’t everything: The Perils of Using Technology to Solve Other People’s Problems.

DARPA just launched a podcast series, Voices from DARPA, where DARPA PMs talk about what they’re doing and what they’re interested in. The first one is on molecular synthesis. It’s more for a popular audience than a technical one, but also seems like a smart public-facing move by DARPA.

My friend Steve Severinghaus won The Metropolitan Society of Natural Historians Photo Contest!

My friend (acquaintance?) Yvonne Lai co-authored this nice article on teaching high school math teachers and the importance of “mathematical knowledge for teaching.”

What’s the proper BibTeX type for arXiv papers?

I like to use @techreport, like

@techreport{ShakeriBS:16ks_dict,
  author      = {Z. Shakeri and W.U. Bajwa and A.D. Sarwate},
  title       = {Minimax Lower Bounds on Dictionary Learning for Tensor Data},
  number      = {arXiv:1608.02792 [cs.IT]},
  month       = {August},
  year        = {2016},
  institution = {arXiv},
  url         = {http://arxiv.org/abs/1608.02792},
}

but the handy-dandy arXiv-to-BibTeX converter uses @misc (which makes it less handy-dandy, TBH):

@misc{1608.02792,
  Author = {Zahra Shakeri and Waheed U. Bajwa and Anand D. Sarwate},
  Title  = {Minimax Lower Bounds on Dictionary Learning for Tensor Data},
  Year   = {2016},
  Eprint = {arXiv:1608.02792},
}

I’d ask this on the TeX Stack Exchange but it seems more of a matter of taste.
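(If you use biblatex rather than plain BibTeX, there is a third option: biblatex has first-class eprint support, so the arXiv identifier stays structured instead of being stuffed into the number field. A sketch, using the biblatex eprint fields, which classic BibTeX styles will simply ignore:

```bibtex
@misc{ShakeriBS:16ks_dict,
  author      = {Z. Shakeri and W.U. Bajwa and A.D. Sarwate},
  title       = {Minimax Lower Bounds on Dictionary Learning for Tensor Data},
  year        = {2016},
  eprint      = {1608.02792},
  eprinttype  = {arxiv},
  eprintclass = {cs.IT},
}
```

But that assumes you’ve already bought into biblatex, which is its own matter of taste.)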

Quantifying the professoriate

Faculty at Rutgers are unionized, and currently the union is trying to fight the university administration over their (secretive) use of Academic Analytics to rate the “scholarly productivity” of faculty and departments. For example, last year they produced a ranking of Rutgers departments (pdf). It’s so great to be reduced to a single number!

As the statistical adage goes, garbage in, garbage out, and it’s entirely unclear what AA is using to produce these numbers (although one could guess). It’s a proprietary system and the university refuses to give access to the “confidential innards” — perhaps they don’t want others to see how the sausage is made. If we take just one likely feature, impact factor, we can already see the poverty of single-index measures of productivity. Impact factor can vary widely across indexing systems: Scopus, Web of Knowledge, and Google all produce different numbers because they index different databases. At some point I went through and lumped together papers in my Google profile if they were the same result (e.g. a journal version of a conference paper) but then I was told that this is a bad idea, not because it would lower my impact factor (which it would), but because manipulating an index is bad form. If the index sucks, it’s the index-maker’s fault.
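For concreteness, merging entries really can move a citation index. The h-index (one of the numbers a Google Scholar profile reports) makes this easy to see; a minimal sketch with made-up citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Three papers with 10 citations each: h = 3.
separate = h_index([10, 10, 10])
# Merge the conference and journal versions into one 20-citation entry: h = 2.
merged = h_index([20, 10])
```

Same body of work, same total citations, different number — which is exactly the problem with treating any single index as “productivity.”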

I wonder how many other universities are going through this process. Within one department the levels of “productivity” vary widely, and across disciplines the problem is only harder. The job faced by administrators is tough — how do they know where things can improve? But relying on opaque analytics is at best “statist-ism” and at worst reading entrails.

Protips for Allerton Presentations

So you’re going to be presenting at the Annual Allerton Conference on Communication, Control, and Computing, you say? This annual conference, sometimes called the “Burning Man for EE Systems” by the younger set, has a much older pedigree. This year is the 54th anniversary, and you want to make an impression, so make sure you dress (your slides) for success! Here are some tips from old hands on how to make sure your talk is the Sun Singer and not the Death of the Last Centaur.

  • Library: You landed a slot in the library, the crème-de-la-crème of venues at the Allerton Mansion! There is ample seating, so even a moderate audience may seem sparse. Although your slides will be visible, your voice may be inaudible, especially for those “attendees” who are actually checking Facebook against the back wall. Invest in a stage-acting class or perhaps bring a megaphone.
  • Solarium: Afraid that early fall in the Midwest may be a bit chilly? Fear no more, for the Solarium is surrounded by glass and boasts a climate that is more Humboldt than Piatt. Even if the forecast is for rain, beware of light colors on your slides — they will be overwhelmed by the kiss of Phoebus. If the forecast is particularly sunny, consider polarizing the projector and handing out sunglasses to the audience.
  • Butternut/Pine: Regardless of how esoteric your paper may be, you are nearly guaranteed a packed house: expect your talk to be punctuated by the door opening, a slight breeze filtering in, and the faint sound of light swearing. In order to maximize visibility, try to not obscure the view of your slides. Recommended places to position yourself include on a bookshelf or behind the screen.
  • Lower Level: Unlike a horror house, the basement of Allerton Mansion is where the fun is — the pool table and ice maker! Here you will be cool, comfortable, and nigh invisible. For some pre-Halloween fun, bring a flashlight and make your presentation a ghost story! Regardless, make sure your slides only have important information on the top quarter of the screen so that session attendees beyond the first row can get the gist.
  • Visitor Center: (Discontinued this year!) In the past, adventurous Allerton attendees would trek across the wild groves and through a manicured garden to the Visitor Center to attend special sessions al fresco (this was a change from the Tent, which may have been more properly au naturel). Hard times on the road make for famished session attendees. Consider offering complimentary snacks and beverages to boost attendance.