Introduction and overview
NeuralSeek’s API can support a wide range of applications like chatbots, search, and website question-answering. While we expect that the API will mostly create benefits for customers and end-users, it also creates safety concerns that are important to characterize and mitigate.
Data Ownership, Retention and Content Responsibility
All data and responses generated by NeuralSeek are, by design, based on the users’s content and training. As such all data generated by the NeuralSeek service is the sole property of the end user, and the user’s content or generated answers will not be used to improve the overall NeuralSeek service.
Users have the ability to delete selected or all data from their NeuralSeek account at any time, via the API or the UI.
NeuralSeek will retain user data for a minimum of 30 days after the last “touch” of that data. EG: if an intent is matched by either NeuralSeek answering a question, or via connection to round-trip logging with a Virtual Agent platform – then the user question, generated training questions, answer and analytics will be retained a minimum of 30 days from that point, and will continue to be extended for every additional touch.
Generated answers, their relevancy to the user’s question and adherence to company priorities are the sole responsibility of the user. Users should leverage NeuralSeek’s Curation and Analytics features to monitor for any lines of questioning that deviate from expected corporate responses, and adjust content in the knowledge repository, edited answers on the Curate tab, as well as NeuralSeek settings on the Configure tab to adjust the outputs.
We prohibit the NeuralSeek API from being used to generate certain content.
We prohibit users from knowingly generating—or allowing others to knowingly generate—the following categories of content:
- Hate: content that expresses, incites, or promotes hate based on identity.
- Harassment: content that intends to harass, threaten, or bully an individual.
- Violence: content that promotes or glorifies violence or celebrates the suffering or humiliation of others.
- Self-harm: content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
- Sexual: content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
- Political: content attempting to influence the political process or to be used for campaigning purposes.
- Spam: unsolicited bulk content.
- Deception: content that is false or misleading, such as attempting to defraud individuals or spread disinformation.
- Malware: content that attempts to generate ransomware, keyloggers, viruses, or other software intended to impose some level of harm.
We have further requirements for certain uses of our service:
- Consumer-facing uses of NeuralSeek in medical, financial, and legal industries, and where else warranted, must provide a disclaimer to users informing them that AI is being used and of its potential limitations.
- Uses of NeuralSeek that simulate another living person must either have that person’s explicit consent or be clearly labeled as “simulated” or “parody.”
Even when the goal of an application is good, users can still interact with it in ways that violate NeuralSeek’s policies. We consider this misuse, and it falls into two categories:
Content violations: An end-user violating our content policy using the application, for example:
- Using a copywriting app to generate disinformation.
- Using a generic chatbot app to engage in erotic chat.
Repurposing: An end-user redirecting the functionality of the application towards a different, unapproved, use-case, for example:
- Prompting a chatbot to behave more like a copywriting service.
- Using a system designed to summarize notes to commit academic plagiarism.
- Pass context (user-id) for all requests
- Do not permit automated posting to other sites (including social media).
- End-users are not allowed to have API access / automation. All end-user input must be thru a GUI.
- Requests are rate-limited. Our default limit is 10 requests per second. For increased limits please contact firstname.lastname@example.org
- Capture User Feedback & keep a human in the “loop”. Use the “Curate” tab of NeuralSeek to monitor trending of user input. If you see instances of user manipulation or unintended use contact us at email@example.com so we may help mitigate.
- Utilize the minimum and warning thresholds. Use extreme care and consideration before allowing NeuralSeek to answer questions to external users with “0” confidence. These have equal chance of being on-point and completely wrong answers that may conflict with your brand.
Safety challenges for high-bandwidth, open-ended machine learning systems
Our guidance for API safety is informed by considerations particular to systems with machine learning (ML) components that can have high-bandwidth, open-ended interactions with people (e.g. via natural language).
- ML components have limited robustness. ML components can only be expected to provide reasonable outputs when given inputs similar to the ones present in the training data. Even if an ML system is considered safe when operated under conditions similar to training data, human operators can provide unfamiliar inputs that put the system into an unsafe state, and it is often not obvious to an operator what inputs will or won’t lead to unsafe behavior. Open-ended ML systems that interact with human operators in the general public (e.g. in a question-answering application) are also susceptible to adversarial inputs from malicious operators who deliberately try to put the system into an undesired state. As one mitigation to this, among others, customers should thus manually evaluate model outputs for each use case being considered, with those outputs being generated over a range of representative inputs as well as some adversarial inputs.
- ML components are biased. ML components reflect the values and biases present in the training data, as well as those of their developers. Systems using ML components—especially systems that interact with people in open-ended ways—can perpetuate or amplify those values. Safety concerns arise when the values embedded into ML systems are harmful to individuals, groups of people, or important institutions. For ML components like the API that are trained on massive amounts of value-laden training data collected from public sources, the scale of the training data and complex social factors make it impossible to completely excise harmful values.
- Open-ended systems have large surface areas for risk. Systems that have high-bandwidth interactions with end-users, like natural language dialogue or question-answering, can be used for virtually any purpose. This makes it impossible to exhaustively enumerate and mitigate all potential safety risks in advance. Instead, we recommend an approach focused on considering broad categories and contexts of potential harm, continuous detection and response for incidents of harm, and continuous integration of new mitigations as needs become apparent.
- Safety is a moving target for ML systems. The safety characteristics of ML systems change every time the ML components are updated, for example if they are retrained with new data, or if new components are trained from scratch with novel architectures. Since ML is an area of active research and new levels of performance are routinely unlocked as research advances, ML system designers should anticipate frequent updates to the ML components and make plans to perform continuous safety analysis.
Harms to consider in risk analysis
Below, we give examples of potential harms (or paths to harm) that may arise in systems involving the API as a component. This list is non-exhaustive, and not every category will apply to every system that uses the API: use cases vary in the extent to which they are open-ended and high stakes. When identifying potential harms, customers should consider their systems in context, include both those who use the system and those subject to it, and examine sources of both allocative and representational harms.
- Providing false information. The system may present false information to users on matters that are safety-critical or health-critical, e.g. giving an incorrect response to a user asking whether they are experiencing a medical emergency and should seek care. Intentionally producing and disseminating misleading information via the API is strictly prohibited.
- Perpetuating discriminatory attitudes. The system may persuade users to believe harmful things about groups of people, e.g. by using racist, sexist, or ableist language.
- Individual distress. The system may create outputs that are distressing to an individual, e.g. by encouraging self-destructive behavior (like gambling, substance abuse, or self-harm) or damaging their self-esteem.
- Incitement to violence. The system may persuade users to engage in violent behavior against any other person or group.
- Physical injury, property damage, or environment damage. In some use cases, e.g. if a system using the API is connected to physical actuators with the potential to cause harm, the system is safety-critical, and physically-damaging failures could result from unanticipated behavior in the API.
The importance of robustness
“Robustness” here refers to a system reliably working as intended and expected in a given context. API customers should ensure that their application is as robust as required for safe use, and should ensure that such robustness is maintained over time.
Robustness is challenging. Language models such as those included in the API are useful for a range of purposes, but can fail in unexpected ways due to, e.g., limited world knowledge. These failures might be visible, such as generation of irrelevant or obviously incorrect text, or invisible, such as failing to find a relevant result when using API-powered search. The risks associated with using the API will vary substantially across use cases, though some general categories of robustness failure to consider include: generation of text that is irrelevant to the context (providing more context will make this less likely); generation of inaccurate text due to a gap in the API’s world knowledge; continuation of an offensive context; and inaccurate classification of text.
Context matters a lot. Customers should bear in mind that API outputs are heavily dependent on the context provided to the model. Providing additional context to the model (such as by giving a few high-quality examples of desired behavior prior to the new input) can make it easier to steer model outputs in desired directions.
Encourage human oversight. Even with substantial efforts to increase robustness, some failures will likely still occur. As such, API customers should encourage end-users to review API outputs carefully before taking any action based on them (e.g. disseminating those outputs).
Keep testing. One way in which the API might not perform as intended, despite initially promising performance, is if the input distribution shifts over time. Additionally, NeuralSeek may alter and improve the underlying models and architecture over time, and customers should ensure that such versions continue to perform well in a given context.
The importance of fairness
“Fairness” here means ensuring that the API does not either have degraded performance for users based on their demographics, or produce text that is prejudiced against certain demographic groups. API customers should take reasonable steps to identify and reduce foreseeable harms associated with demographic biases in the API.
Fairness in ML systems is very challenging. Due to the API being trained on human data, our models exhibit various biases, including but not limited to biases related to gender, race, and religion. For example: the API is trained largely on text in the English language, and is best suited for classifying, searching, summarizing, or generating such text. The API will by default perform worse on inputs that are different from the data distribution it is trained on, including non-English languages as well as specific dialects of English that are not as well-represented in our training data. NeuralSeek has provided baseline information on some biases we have found, though such analysis is not comprehensive; customers should consider fairness issues that may be especially salient in the context of their use case, even if they are not discussed in our baseline analysis. Note that context matters greatly here: providing the API with insufficient context to guide its generations, or providing it with context that relates to sensitive topics, will make offensive outputs more likely.
Characterize fairness risks before deployment. Users should consider their customer base and the range of inputs that they will be using with the API, and should evaluate the performance of the API on a wide range of potential inputs in order to identify cases where the API’s performance might drop.
Filtration tools can help but aren’t panaceas. NeuralSeek has built-in filtration for blocking potentially sensitive outputs. Customers should bear in mind that these measures are not a panacea for eliminating all potentially offensive outputs–offensive outputs using other “safe” words may still be generated, and legitimate outputs might be blocked.