Response Evaluation

Rate each model’s response to record how effective and relevant it is to your prompt. The app keeps track of your ratings.

The rating system is a simple way to score model responses. The 👍 button adds one to the running total, the 🤷‍♀️ button adds zero, and the 💩 button subtracts one. You can assign your own meaning to these values based on your evaluation criteria.

Each model’s cumulative rating for the session is displayed under its response. This value resets to zero if you refresh the page.
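
The running total described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the app’s actual code: the class name SessionRatings, the button keys "up", "shrug", and "down", and the model names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical button names standing in for 👍, 🤷‍♀️ and 💩.
# The scores (+1, 0, -1) are the ones described in the text.
RATING_VALUES = {"up": 1, "shrug": 0, "down": -1}


class SessionRatings:
    """Running per-model totals for a single session (illustrative only)."""

    def __init__(self):
        # Every model starts at zero; a new object models a page refresh.
        self.totals = defaultdict(int)

    def rate(self, model, button):
        """Add the button's score to the model's total and return the new total."""
        self.totals[model] += RATING_VALUES[button]
        return self.totals[model]


ratings = SessionRatings()
ratings.rate("model_a", "up")     # model_a total: 1
ratings.rate("model_a", "up")     # model_a total: 2
ratings.rate("model_a", "down")   # model_a total: 1
ratings.rate("model_b", "shrug")  # model_b total: 0
```

Creating a fresh SessionRatings object corresponds to refreshing the page: all totals start again at zero.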

Your response ratings also update in the session history table. The number reported in the session history table is not cumulative. Instead, you will see one of the values -1, 0, or 1, representing the rating you selected for the model’s generated response, where 👍 equals 1, 🤷‍♀️ equals 0, and 💩 equals -1. This value should auto-populate as soon as you make your rating selection. If you make a mistake, click the button for the rating you intended and the incorrect value will be replaced.
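
To make the difference from the cumulative total concrete, here is a minimal sketch of how a single history row might hold its rating. The HistoryRow type, its field names, and the set_rating helper are hypothetical; the only behavior taken from the text is that the stored value is one of -1, 0, or 1 and that a new selection replaces the old one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryRow:
    """One row of a session history table (hypothetical field names)."""
    model: str
    prompt: str
    response: str
    rating: Optional[int] = None  # None until the response has been rated


def set_rating(row: HistoryRow, value: int) -> None:
    """Store the selected rating, replacing any earlier selection."""
    if value not in (-1, 0, 1):
        raise ValueError("rating must be -1, 0, or 1")
    row.rating = value  # overwrite, do not accumulate


row = HistoryRow("model_a", "Summarize this article.", "Here is a summary.")
set_rating(row, -1)  # clicked 💩 by mistake
set_rating(row, 1)   # clicked 👍 to correct it; the row now holds 1
```

Because the row stores the latest selection rather than a sum, correcting a mistake leaves no trace of the earlier value in the table.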

Saving Comments

You can attach a comment to a model’s response. Type the comment in the box and click the Save Comment button. The comment should auto-populate in the session history table at the bottom of the page; check the table to verify that it was saved.

If you make a mistake, type the corrected comment in the text input box and click Save Comment again. The updated comment should replace the previous one in the session history table.

If you choose to export your session chat history, the full text of your comment will be in the CSV file.
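
As an illustration of why the full comment text survives the export, the sketch below writes history rows to CSV with Python’s standard csv module, which quotes fields containing commas, quotation marks, or line breaks. The column names and the export_history function are assumptions for the example; the app’s actual export format may differ.

```python
import csv
import io

# Hypothetical column names for the exported session history.
FIELDNAMES = ["model", "prompt", "response", "rating", "comment"]


def export_history(rows):
    """Return the session history as CSV text (illustrative sketch)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


history = [
    {
        "model": "model_a",
        "prompt": "Summarize this article.",
        "response": "Here is a summary.",
        "rating": 1,
        # Commas, quotes and line breaks are preserved by CSV quoting.
        "comment": 'Accurate, but too long.\nMissed the "why".',
    }
]

csv_text = export_history(history)

# Reading the file back recovers the comment exactly as it was saved.
restored = list(csv.DictReader(io.StringIO(csv_text, newline="")))
```

Note that values read back from a CSV file are strings, so a rating of 1 comes back as "1" and needs converting if you want to do arithmetic on it.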

Refreshing the page erases your saved comments along with your other session chat history.