Various stakeholders have called for human oversight of algorithmic processes as a means to mitigate the risk of automated discrimination and other social harms. This is all the more crucial in light of the democratization of AI, where data and algorithms, such as Cognitive Services, are deployed into diverse applications and socio-cultural contexts. Inspired by previous work proposing human-in-the-loop governance mechanisms, we run a feasibility study involving image tagging services. Specifically, we ask whether micro-task crowdsourcing can be an effective means of collecting a diverse pool of data for evaluating fairness, in a hypothetical scenario of analyzing professional profile photos in a later phase. In this work-in-progress paper, we present our proposed oversight approach and a framework for analyzing the diversity of the images provided. Given the subjectivity of fairness judgements, we first aimed to recruit a diverse crowd from three distinct regions. This study lays the groundwork for expanding the approach to offer developers a means of evaluating Cognitive Services before and/or during deployment.