Project Instructions
After you’ve sourced and vetted the right annotators, giving them clear instructions is vital to the success of the project and directly affects the cost, speed, and quality of the data. At micro1, we review and provide feedback on your project instructions before any project begins, because teams should avoid updating or changing instructions mid-project. Adding clarifications to the instructions is still encouraged, but substantive updates require re-checking all previously completed tasks to confirm they still adhere to the new guidelines. If you pay data vendors per task, that rework is typically passed on with a mark-up, leading to higher costs; if you pay hourly plus a percentage fee, you will eventually see the direct cost of each instruction change. We highly recommend keeping a dated changelog (for example, an entry noting the date, what changed, and whether previously completed tasks need rework) so annotators and teams can easily stay up to date on changes.
Research teams should be specific about what counts as unsafe or inappropriate language rather than relying on annotators to make these decisions. Annotator workforces come from many countries with very different cultures and cultural norms; what is considered taboo or inappropriate to discuss in one culture may be completely fine, or even encouraged, in another. As such, you need to make sure everyone is aligned on the same guidelines and judgments. A strong review pipeline is one way to ensure consensus amongst your annotators, but we found two other cost-effective ways to accomplish this:
- If you are fine-tuning a model for consumption in the US & Canada, staff the project with talent from the US & Canada or culturally similar locations (your data vendors can help with this).
- Be very explicit about what constitutes a taboo topic or undesired behavior from an AI model.
Otherwise, you’ll get poor post-training results from inconsistent human data, or even worse, a model whose ethics diverge sharply from those of its end users, which can severely limit its utility.
At micro1, to help annotators get familiar with the platform, we build an FAQ for non-project-specific matters. On our biggest project, with 800 micro1-sourced developers working every day, our Client Success Managers became overwhelmed with the number of queries from developers. We therefore developed an evolution of question-answer bots we dubbed RAG 2.0, which features improved information retrieval techniques and only replies when it is confident in its response. We deployed this AI bot in our Slack team channel to provide immediate support to our annotators. We wouldn’t recommend having a bot answer project-instruction questions, since a model hallucination could be costly and all annotators should read the instructions line by line. This strikes a good balance: automation increases productivity while minimizing the risk of costly mistakes. If you’d like to read more about our AI bot, which saved us $20,000/month, you can view our blog post here.
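To make the "only reply when confident" behaviour concrete, here is a minimal sketch of a confidence-gated FAQ lookup. The function names, the similarity measure, and the threshold are illustrative assumptions, not micro1's actual RAG 2.0 implementation; a production version would use embedding-based retrieval and the Slack API to post replies.

```python
# Minimal sketch of a confidence-gated FAQ bot in the spirit of RAG 2.0.
# Names, thresholds, and the similarity function are illustrative assumptions,
# not micro1's actual implementation.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FAQEntry:
    question: str
    answer: str

def answer_if_confident(
    query: str,
    faq: list[FAQEntry],
    similarity: Callable[[str, str], float],
    threshold: float = 0.85,  # assumed cut-off; tune on held-out questions
) -> Optional[str]:
    """Return the best FAQ answer only when retrieval confidence is high.

    If the top match scores below the threshold, return None so the bot
    stays silent and a human (e.g. a Client Success Manager) handles it.
    """
    if not faq:
        return None
    best = max(faq, key=lambda e: similarity(query, e.question))
    if similarity(query, best.question) < threshold:
        return None  # not confident enough; do not reply
    return best.answer

# Toy usage with a crude word-overlap similarity; in practice you would use
# embeddings from your retrieval stack of choice.
def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

faq = [FAQEntry("How do I reset my platform password?",
                "Use the 'Forgot password' link on the login page.")]
# The toy similarity is coarse, so a lower threshold is used for this demo.
print(answer_if_confident("how can I reset my password?", faq, word_overlap, threshold=0.5))
```

The key design choice is that silence is a valid output: when confidence is low the bot defers to a human rather than risking a hallucinated answer.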