Wednesday, September 16, 2015

Clocking in

Today the tech world has been abuzz with the story of Ahmed Mohamed who has arrested for bringing a clock to school, and still hasn't gotten it back from police.

Whether it's racism, anti-Muslim sentient, or just the drive to criminalize and punish innocent behavior in the name of safety, the general consensus, at least among non conservatives, seems to be that reaction to him by his school and the police was way way out of line.

Mark Zuckerberg and Obama, as well as a host of luminaries and companies involved science and tech have reached out to show support for the boy.  And that's good, and to be applauded.

But we in Tech are reacting to what feels like an egregious violation, and ignoring the fact that this is an issue that goes beyond one kid in Irving, Texas's experience.  We have a job beyond just being nice to one kid.

First, while the specifics of this issue will likely be called into question, we have to acknowledge that children of color face harsher discipline in general, even for the same infractions. Not just high schoolers, even preschool black children face harsher punishment.  This is in part because of increases in zero tolerance policies that have lead to even minor offenses leading to suspension, expulsion, or event arrest over minor offenses like truancy or smoking.

So first of all,  discipline is unfair.  However, second, tolerance is now a bad thing.  I'm not sure what the argument is that says kicking out a kid for misbehaving teaches them to behave better.  And doing it for minor actions seems ludicrous to me.

Many of us in tech and science grew up privileged.  One of those privileges was likely that if we did something cool at home, we felt safe showing our teachers and mentors.  If we did something problematic, our intentions, whether to cause harm or not, were considered.  Now, that privilege likely wouldn't have been afforded to people of color... but the solution is not a broken system where we take that privilege away... we need schools and communities where we extend the privilege to tinker, to make, to receive the benefit of the doubt to everyone.  Great minds that could be inventing better futures don't get the education they deserve, and everyone loses.  But even if it's just a normal kid that doesn't get their education... someone that could work a job and be a good citizen might instead end up in prison.

It's not enough to do things for Mohamed.  We need to do things to make life better, proactively, for young people who want to learn, bright or not.  We should talk to PTA meetings and neighborhood groups about the freedom to make and tinker.  We should write to our legislators and local politicians.  Not that we have to do everything, but we should try to do something.

Like I said, we have a job.  We want to disrupt things, make the world a better place, we need to think not just about server processes and operating systems, but about justice processes and school systems.


Thursday, August 13, 2015

Fisher-Yates Shuffle, Reservoir Sampling, and Random permutations of N

I recently had a conversation with someone who introduced me to the concept of Reservoir Sampling, which is a technique of generation for a combination of k randomly selected elements when there are N elements total, where N is unknown or larger then available memory, and you must select elements without replacement.


Here's a version I just wrote:

import random

"""
iterator is an Iterator/Generator object for a stream

and N >= k
"""
def reservoir(iterator, return_count):

    arr = []
    current = 0
    for item in iterator:
        if current < return_count:
            arr.append(item)
        else:
            rand = random.randint(0, current)
            if rand <= return_count:
                arr[rand] = item
        current += 1
    return arr


Once I read it, I noticed that it is strikingly similar to an inside out Fisher-Yates Shuffle (which is how you you return a random permutation of a list from a list)

That would look like:


import random

def shuffle(cards):
    deck = []
    current = 0
    for card in cards:
        rand = random.randint(0, current)
        if rand != current:
            deck.append(deck[rand])
        deck[rand] = card
    return deck

That said, with reservoir, you won't have a random permutation of the elements.  And shuffle uses extra memory if the length of the card pile is very long, and you only need some k cards.

Thus:
        
 import random

 def resshuffle(iterator, return_count):
    arr = []
    current = 0
    while (current < return_count):
        rand = random.randint(0, current)
        if current < return_count:
            arr.append(arr[rand])
            arr[rand] = iterator.next() 
         current += 1   
   for item in iterator:
            if rand <= return_count:
                arr[rand] = item
        current += 1

    return arr

Sunday, February 8, 2015

Nurse Scheduling Problem and Optimization

I had an interview on Friday, and one of the questions has been on my mind, because I didn't really like how vague my answer was. It was presented as a design problem, but really it's a software engineering problem.  And it's a problem that I might be working on, if hired.

It turns out that it's something of an instance of the Nurse Scheduling Problem.  Nurses need to work a certain number of shifts, shouldn't have back to back shifts, should work about equally each week, then you have to consider different skill levels, shift preferences etc.

This isn't the exact question, of course... I'm not getting hired at a hospital.  But considering I've been scheduling a lot of interviews, it strikes me that having software engineers do technical interviews of candidates is something of the same problem.  Less busy, but still, a structure lots of people in tech can understand.

It's easy to think of additional constraints, either hard constraints which if violated make a schedule invalid, and soft constraints which can be used to rate valid solutions.  Maybe you have an open office, and two people shouldn't interview simultaneously if they work next to each other.  Soft constraint. Some people can interview for front end, some for back end, some for mobile, some for embedded, some for multiple.  Maybe you have all of those at your company, and need to assign individuals to particular candidates.  You may not give a good interview if you assign the embedded guy to interview someone to see if they actually know android development.  Hard Constraint.

Let's say we have d developers who can interview, i individuals interviewing, and t  time slots that a candidate says they are available and lets say we want to organize based on week.

 Even with just d and i  and the hard constraints, the naive solution will be in polynomial time. Now finding the optimal solution, with the soft constraints, the time slot issue... this is the nurse scheduling problem, and it's NP hard.

Imagine you are a manager assigning people, that's a lot of things to keep track of.  Add in sick days, projects that are priorities, and a manual system is going to be pretty stressful and complicated.

So obviously I'm not looking for an optimal solution.   But I've been thinking on heuristics that might be helpful.

Time slots where only one person is scheduled are easier to sub for.
Everyone may be equal, but multiple certifications can sub for more people, and can interview more people.

It strikes me that there are a variety of ways to simplify the problem.   So I'll try to solve the general problem programmatically and handle corner cases in an adhoc manner.
list available time slots in a calendar.  Have the the Candidates pick one time slot to commit to, then reschedule if needed. If a time slot overlaps, have a rough indication of interviewers available.  Green for multiple interviewers, red for 1, disabled for 0.

Limit time slots. A morning slot and an afternoon slot.  If none of the time slots work, the candidate can inform the company of when they are available by emailing.

Keep track of how many interviews a person does in a week, and when randomly selecting individuals, prefer selecting people who have done less.

I don't need to have an optimal solution, just a reasonably good one that makes finding workable solutions easier.   And it's okay that I didn't have perfect answers.  I don't need to design a perfect system, just one better then what they have currently.  Optimization isn't always about finding the optimum way to do things.  As long as you can make something better, you have improved it.