Cyber Guardian

By Samurai Labs | Updated 2 months ago | Text Analysis



Each message is classified into one or more categories of abusive or potentially unwelcome behavior, which gives you more transparency and insight than a simple binary verdict would. The detailed output also makes the API more configurable: you choose which detected categories have the highest priority for your community.

The API currently detects 6 main violence categories:

  • Personal Attack - “You are as stupid as a cow”, “These guys are complete assholes”
  • Sexual Harassment - “How long is your penis?”, “I’m gonna jizz on your face”
  • Soliciting Photos - “How about you post some nudes?”
  • Sexism - “stalking is just a compliment relax girl”
  • Bad Wish / Threat - “Imma shoot u when you come to school”, “His days are numbered”
  • Rejection - “No one cares about you”, “She should fuck right off”

And 2 lexical categories:

  • Profanity - “fuck”, “whore”, “wetback”
  • Sexual Remark - “boobies”, “pussy”, “schlong”

Any of these categories, violence or lexical, may co-occur with any other, depending on the content of the message.
The violence categories are divided based on:

  • Object of the attack - interlocutor, third party, other (where applicable)
  • Severity of the attack - mild, strong, severe

The lexical categories are divided based on:

  • Object - only applicable to profanity, where it denotes the context in which the profanity appears, e.g. racist, misogynistic, xenophobic
  • Severity - mild, strong, severe

For a more detailed breakdown of the categories and their subdivisions please refer to the PDF document in the spotlights section. All of this information can be used for further configuration and allows you to act on a precise and fine-grained level to make sure that the intended moderation behavior matches your needs perfectly.
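To make the category, object and severity breakdown concrete, here is a minimal Python sketch of how a client might model detections on its own side. This page does not document the response schema, so the `Detection` class, its field names, the `Severity` enum and the `at_least` helper are all assumptions made for illustration; the severities assigned to the sample detections are likewise invented.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    """Severity levels named on this page, ordered so they can be compared."""
    MILD = 1
    STRONG = 2
    SEVERE = 3


# Category names as listed on this page.
VIOLENCE_CATEGORIES = {
    "Personal Attack", "Sexual Harassment", "Soliciting Photos",
    "Sexism", "Bad Wish / Threat", "Rejection",
}
LEXICAL_CATEGORIES = {"Profanity", "Sexual Remark"}


@dataclass(frozen=True)
class Detection:
    """Hypothetical client-side record; NOT the API's documented schema."""
    category: str
    severity: Severity
    target: Optional[str] = None  # "object" of the attack, where applicable

    def is_lexical(self) -> bool:
        return self.category in LEXICAL_CATEGORIES


def at_least(detections: List[Detection], minimum: Severity) -> List[Detection]:
    """Keep only detections whose severity is at or above `minimum`."""
    return [d for d in detections if d.severity >= minimum]


# Illustrative data: one message can carry several co-occurring detections.
example = [
    Detection("Personal Attack", Severity.STRONG, "interlocutor"),
    Detection("Profanity", Severity.MILD),
]
strong_or_worse = at_least(example, Severity.STRONG)
```

Ordering the severities as integers keeps threshold checks such as "strong or worse" to a single comparison.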

Integrate Cyber Guardian into your suite of moderation tools on platforms such as Discord, Twitch or YouTube.

Moderation Functionalities

  • Actions: Removing user messages, kicking, muting, and banning. The actions can be selected in the moment by the moderators or set to be performed automatically.
  • Real-time notifications: The API sends moderation notifications and provides information on moderation actions that were taken through designated channels.
  • Configuration: The moderation behavior of the API is fully customizable thanks to its external Configuration Dashboard. This lets you use all of the detection information provided by the API (types of violence, lexical categories, severity and combinations thereof) and adjust the desired moderation behavior by setting automatic actions, keeping them at the level of manual moderation or turning them off completely.
  • Customisable features: You can modify the behavior of the Cyber Guardian even further by adding your own keyword lists, whitelisting phrases, preparing automatic messages to be sent upon moderation actions and more.
  • Strike System: Moderation actions can also be triggered automatically through a gradually escalating strike system that ramps up the moderation action with each repeated offense. Define which actions are taken at which point and what triggers them with a high degree of customisability.
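The escalating strike system described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's implementation: the `StrikeTracker` class, the action names and the order of the escalation ladder are hypothetical, and the real behavior is defined in the Configuration Dashboard.

```python
from collections import defaultdict
from typing import Dict, List


class StrikeTracker:
    """Hypothetical per-user strike counter with an escalation ladder."""

    def __init__(self, ladder: List[str]) -> None:
        if not ladder:
            raise ValueError("ladder must contain at least one action")
        self.ladder = ladder
        self.strikes: Dict[str, int] = defaultdict(int)

    def record_offense(self, user_id: str) -> str:
        """Add one strike and return the action for the new strike count.

        Once the ladder is exhausted, the last action keeps being returned.
        """
        self.strikes[user_id] += 1
        index = min(self.strikes[user_id], len(self.ladder)) - 1
        return self.ladder[index]


# Assumed escalation order, using the actions named on this page.
tracker = StrikeTracker(["remove_message", "mute", "kick", "ban"])
actions = [tracker.record_offense("user-1") for _ in range(5)]
```

Each user has an independent counter, so a first offense by another user starts again at the bottom of the ladder.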

Using these functionalities, you can strengthen your current moderation setup with a versatile tool that adapts to your needs and those of your community. If your community has a 13+ audience, it is likely wise to keep all detections at least at the level of manual moderation and to add automated actions for the most egregious offenses. More adult-oriented communities might choose to turn off the detection of profanities, sexual remarks and some milder categories completely, since such content is likely to be seen as acceptable there and these detections may add little useful information. You can adjust your configuration at any time, customize moderation actions and automate them to reduce manual moderation work. Based on these principles, we have prepared a series of moderation presets for the API. You can review the presets in the Configuration Dashboard and select the one that best addresses your needs.
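As a rough illustration of the two audience profiles described above, the following Python sketch maps a detection to one of three moderation modes: automatic, manual or off. The preset contents, the dictionary layout and the `resolve_mode` function are assumptions made for this example and do not reproduce the actual presets in the Configuration Dashboard.

```python
from typing import Dict, Optional, Tuple

# A rule key is (category, severity); severity None matches every severity.
RuleKey = Tuple[str, Optional[str]]

# Hypothetical preset for a 13+ community: everything is at least manual,
# with automatic action on the most severe detections.
TEEN_PRESET = {
    "default": "manual",
    "rules": {
        ("Bad Wish / Threat", "severe"): "auto",
        ("Sexual Harassment", "severe"): "auto",
    },
}

# Hypothetical preset for an adult-oriented community: lexical
# detections are switched off entirely.
ADULT_PRESET = {
    "default": "manual",
    "rules": {
        ("Profanity", None): "off",
        ("Sexual Remark", None): "off",
    },
}


def resolve_mode(preset: Dict, category: str, severity: str) -> str:
    """Return "auto", "manual" or "off" for a detection.

    An exact (category, severity) rule wins over a category-wide rule,
    which in turn wins over the preset default.
    """
    rules: Dict[RuleKey, str] = preset["rules"]
    if (category, severity) in rules:
        return rules[(category, severity)]
    if (category, None) in rules:
        return rules[(category, None)]
    return preset["default"]


teen_threat = resolve_mode(TEEN_PRESET, "Bad Wish / Threat", "severe")
adult_profanity = resolve_mode(ADULT_PRESET, "Profanity", "strong")
```

Resolving from the most specific rule to the least specific keeps a preset short: only the exceptions to the default need to be listed.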

Some of the features mentioned here are platform-dependent, and the API currently supports only English. Other languages and new features are continuously under development.