Classifies whether text and/or image inputs are potentially harmful. Learn more in the [moderation guide](/docs/guides/moderation).

POST
/moderations

Authorizations

Request Body required

object
input
required
One of:

A string of text to classify for moderation.

string
""
I want to kill them.
model
Any of:
string
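
For example, a text-only moderation request can be sent as a plain JSON POST. This is a minimal sketch, assuming the API is served at `https://api.openai.com/v1`, authenticated with a Bearer API key from the `OPENAI_API_KEY` environment variable, and that `omni-moderation-latest` is an available model name; the `requests` library stands in for any HTTP client.

```python
import os
import requests

# Assumed base URL and Bearer-key authentication; adjust for your deployment.
API_URL = "https://api.openai.com/v1/moderations"
headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    # `input` can be a single string of text to classify for moderation.
    "input": "I want to kill them.",
    # `model` accepts a model name string; the value below is an assumption.
    "model": "omni-moderation-latest",
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```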

Responses

200

OK

Represents whether a given text input is potentially harmful.

object
id
required

The unique identifier for the moderation request.

string
model
required

The model used to generate the moderation results.

string
results
required

A list of moderation objects.

Array<object>
object
flagged
required

Whether any of the categories below are flagged.

boolean
categories
required

A list of the categories and whether each is flagged.

object
hate
required

Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.

boolean
hate/threatening
required

Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.

boolean
harassment
required

Content that expresses, incites, or promotes harassing language towards any target.

boolean
harassment/threatening
required

Harassment content that also includes violence or serious harm towards any target.

boolean
illicit
required

Content that includes instructions or advice that facilitate the planning or execution of wrongdoing, or that gives advice or instruction on how to commit illicit acts. For example, “how to shoplift” would fit this category.

boolean
illicit/violent
required

Content that includes instructions or advice that facilitate the planning or execution of wrongdoing that also includes violence, or that gives advice or instruction on the procurement of any weapon.

boolean
self-harm
required

Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.

boolean
self-harm/intent
required

Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.

boolean
self-harm/instructions
required

Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.

boolean
sexual
required

Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).

boolean
sexual/minors
required

Sexual content that includes an individual who is under 18 years old.

boolean
violence
required

Content that depicts death, violence, or physical injury.

boolean
violence/graphic
required

Content that depicts death, violence, or physical injury in graphic detail.

boolean
category_scores
required

A list of the categories along with their scores as predicted by the model.

object
hate
required

The score for the category ‘hate’.

number
hate/threatening
required

The score for the category ‘hate/threatening’.

number
harassment
required

The score for the category ‘harassment’.

number
harassment/threatening
required

The score for the category ‘harassment/threatening’.

number
illicit
required

The score for the category ‘illicit’.

number
illicit/violent
required

The score for the category ‘illicit/violent’.

number
self-harm
required

The score for the category ‘self-harm’.

number
self-harm/intent
required

The score for the category ‘self-harm/intent’.

number
self-harm/instructions
required

The score for the category ‘self-harm/instructions’.

number
sexual
required

The score for the category ‘sexual’.

number
sexual/minors
required

The score for the category ‘sexual/minors’.

number
violence
required

The score for the category ‘violence’.

number
violence/graphic
required

The score for the category ‘violence/graphic’.

number
category_applied_input_types
required

A list of the categories along with the input type(s) that the score applies to.

object
hate
required

The applied input type(s) for the category ‘hate’.

Array<string>
Allowed values: text
hate/threatening
required

The applied input type(s) for the category ‘hate/threatening’.

Array<string>
Allowed values: text
harassment
required

The applied input type(s) for the category ‘harassment’.

Array<string>
Allowed values: text
harassment/threatening
required

The applied input type(s) for the category ‘harassment/threatening’.

Array<string>
Allowed values: text
illicit
required

The applied input type(s) for the category ‘illicit’.

Array<string>
Allowed values: text
illicit/violent
required

The applied input type(s) for the category ‘illicit/violent’.

Array<string>
Allowed values: text
self-harm
required

The applied input type(s) for the category ‘self-harm’.

Array<string>
Allowed values: text, image
self-harm/intent
required

The applied input type(s) for the category ‘self-harm/intent’.

Array<string>
Allowed values: text, image
self-harm/instructions
required

The applied input type(s) for the category ‘self-harm/instructions’.

Array<string>
Allowed values: text, image
sexual
required

The applied input type(s) for the category ‘sexual’.

Array<string>
Allowed values: text, image
sexual/minors
required

The applied input type(s) for the category ‘sexual/minors’.

Array<string>
Allowed values: text
violence
required

The applied input type(s) for the category ‘violence’.

Array<string>
Allowed values: text, image
violence/graphic
required

The applied input type(s) for the category ‘violence/graphic’.

Array<string>
Allowed values: text, image
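
Putting the response schema together, the fields above can be inspected like this. A minimal sketch continuing the request example from earlier; the key names mirror the schema described in this section.

```python
# Inspect the first moderation result from the JSON response.
result = response.json()["results"][0]

# `flagged` is True if any category below was flagged.
if result["flagged"]:
    # `categories`, `category_scores`, and `category_applied_input_types`
    # share the same keys, e.g. "hate", "harassment/threatening", "self-harm/intent".
    for category, is_flagged in result["categories"].items():
        if is_flagged:
            score = result["category_scores"][category]
            input_types = result["category_applied_input_types"][category]
            print(f"{category}: score={score:.4f}, applied to {input_types}")
else:
    print("Input was not flagged.")
```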