Classifies whether text and/or image inputs are potentially harmful. Learn more in the [moderation guide](/docs/guides/moderation).
POST /moderations
Authorizations
Bearer authentication: pass your API key as a Bearer token in the `Authorization` header.
Request Body required
`input` (string or array) required
The input (or inputs) to classify. One of the following:
A string of text to classify for moderation.
An array of strings to classify for moderation.
An array of multi-modal inputs to the moderation model, where each element is one of the two object shapes below.
An object describing an image to classify, with the following properties:
`type` (string): The type of the input. Always `image_url`.
`image_url` (object): Contains either an image URL or a data URL for a base64-encoded image.
`image_url.url` (string): Either a URL of the image or the base64-encoded image data.
An object describing text to classify, with the following properties:
`type` (string): The type of the input. Always `text`.
`text` (string): A string of text to classify.
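As a worked example of the request body, here is a minimal sketch in Python using the `requests` library. The full endpoint URL (`https://api.openai.com/v1/moderations`), the `OPENAI_API_KEY` environment variable, and the example input values are assumptions for illustration; the body shapes follow the schema above.

```python
# Minimal moderation request sketch. The base URL and the
# OPENAI_API_KEY environment variable are assumptions for illustration.
import os

import requests

API_URL = "https://api.openai.com/v1/moderations"
headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}

# Variant 1: a single string of text to classify.
text_body = {"input": "sample text to classify"}

# Variant 2: an array of multi-modal inputs, using the text and
# image_url object shapes documented above.
multimodal_body = {
    "input": [
        {"type": "text", "text": "sample text to classify"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ]
}

response = requests.post(API_URL, headers=headers, json=multimodal_body)
response.raise_for_status()
print(response.json())
```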
Responses
200 OK
Represents whether a given input is potentially harmful. An object with the following properties:
`id` (string): The unique identifier for the moderation request.
`model` (string): The model used to generate the moderation results.
`results` (array): A list of moderation objects.
Each moderation object has the following properties:
`flagged` (boolean): Whether any of the categories below are flagged.
`categories` (object): A list of the categories, and whether they are flagged or not. Each property is a boolean flag:
`hate`: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
`hate/threatening`: Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
`harassment`: Content that expresses, incites, or promotes harassing language towards any target.
`harassment/threatening`: Harassment content that also includes violence or serious harm towards any target.
`illicit`: Content that includes instructions or advice that facilitate the planning or execution of wrongdoing, or that gives advice or instruction on how to commit illicit acts. For example, “how to shoplift” would fit this category.
`illicit/violent`: Content that includes instructions or advice that facilitate the planning or execution of wrongdoing that also includes violence, or that gives advice or instruction on the procurement of any weapon.
`self-harm`: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
`self-harm/intent`: Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
`self-harm/instructions`: Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
`sexual`: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
`sexual/minors`: Sexual content that includes an individual who is under 18 years old.
`violence`: Content that depicts death, violence, or physical injury.
`violence/graphic`: Content that depicts death, violence, or physical injury in graphic detail.
`category_scores` (object): A list of the categories along with their scores as predicted by the model (see the thresholding sketch after this list). Each property is a number:
The score for the category ‘hate’.
The score for the category ‘hate/threatening’.
The score for the category ‘harassment’.
The score for the category ‘harassment/threatening’.
The score for the category ‘illicit’.
The score for the category ‘illicit/violent’.
The score for the category ‘self-harm’.
The score for the category ‘self-harm/intent’.
The score for the category ‘self-harm/instructions’.
The score for the category ‘sexual’.
The score for the category ‘sexual/minors’.
The score for the category ‘violence’.
The score for the category ‘violence/graphic’.
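Each score is the model's per-category confidence, so a common pattern is to apply your own threshold rather than relying only on the boolean `categories` flags. A minimal sketch, assuming `result` is one parsed element of the `results` array; the 0.5 threshold is an arbitrary illustration.

```python
# Sketch: flag categories using a custom score threshold instead of
# the built-in boolean flags. `result` is one parsed element of the
# `results` array; the 0.5 threshold is an arbitrary illustration.
def categories_over_threshold(result: dict, threshold: float = 0.5) -> list[str]:
    """Return the category names whose predicted score exceeds `threshold`."""
    return [
        category
        for category, score in result["category_scores"].items()
        if score > threshold
    ]
```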
`category_applied_input_types` (object): A list of the categories along with the input type(s) that the score applies to (a worked example follows this list). Each property is an array of strings:
The applied input type(s) for the category ‘hate’.
The applied input type(s) for the category ‘hate/threatening’.
The applied input type(s) for the category ‘harassment’.
The applied input type(s) for the category ‘harassment/threatening’.
The applied input type(s) for the category ‘illicit’.
The applied input type(s) for the category ‘illicit/violent’.
The applied input type(s) for the category ‘self-harm’.
The applied input type(s) for the category ‘self-harm/intent’.
The applied input type(s) for the category ‘self-harm/instructions’.
The applied input type(s) for the category ‘sexual’.
The applied input type(s) for the category ‘sexual/minors’.
The applied input type(s) for the category ‘violence’.
The applied input type(s) for the category ‘violence/graphic’.
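Putting the response schema together, here is a sketch that walks one parsed response and prints each flagged category with its score and applied input types. The literal values are illustrative placeholders, not real model output.

```python
# Sketch: inspect a parsed moderation response. The values below are
# illustrative placeholders (truncated to two categories), not real
# model output.
response = {
    "id": "modr-...",
    "model": "...",
    "results": [
        {
            "flagged": True,
            "categories": {"violence": True, "hate": False},
            "category_scores": {"violence": 0.97, "hate": 0.01},
            "category_applied_input_types": {"violence": ["image"], "hate": ["text"]},
        }
    ],
}

for result in response["results"]:
    if not result["flagged"]:
        continue
    for category, is_flagged in result["categories"].items():
        if is_flagged:
            score = result["category_scores"][category]
            input_types = result["category_applied_input_types"][category]
            print(f"{category}: score={score:.2f}, applies to {input_types}")
```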