This repository contains the code for the thesis "Explaining Graph Neural Networks using High Level Concepts" at the University of Paderborn. The goal of the project is to detect high-level concepts, use them to enrich an OWL ontology, and investigate how the concept length can be reduced. This README provides instructions for running experiments on structured and text-based datasets using the provided Python script.
## Overview
The script allows for running experiments on two types of datasets: structured and text-based.
## Requirements
- Python 3.6 or higher
- OpenAI API key
- Dependencies are listed in the `requirements.txt` file.
## Setup Instructions
### 1. Virtual Environment Setup
First, create and activate a virtual environment:
```bash
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows
venv\Scripts\activate
# On Unix or MacOS
source venv/bin/activate
```
### 2. Install Dependencies
After activating the virtual environment, install the required packages:
```bash
pip install -r requirements.txt
```
Ensure that you have Python and pip installed on your system. If you encounter any issues during the installation, please check that your Python environment is correctly set up.
## Configuration
The script uses a JSON configuration file to determine which datasets are available for experiments. The structure of the configuration file should be as follows:
```json
{
  "structured": [
    {
      "datasetName": "BA2Motif"
    },
    {
      "datasetName": "MultiShape"
    },
    {
      "datasetName": "MUTAG"
    }
  ],
  "text": [
    {
      "datasetName": "dblp",
      "grouped_keyword_dir": "rawData/dblp/groups",
      "entity_name": "author"
    },
    {
      "datasetName": "imdb",
      "grouped_keyword_dir": "rawData/imdb/groups",
      "entity_name": "movie"
    }
  ]
}
```
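For illustration, here is a minimal sketch of reading the dataset names from this file using the standard `json` module; it is not the project's actual loader.

```python
import json

# Load the dataset configuration (path matches the default -c/--config value)
with open("config.json") as f:
    config = json.load(f)

# Dataset names available per type, as defined in the file above
structured = [entry["datasetName"] for entry in config["structured"]]
text = [entry["datasetName"] for entry in config["text"]]
print("structured:", structured)  # ['BA2Motif', 'MultiShape', 'MUTAG']
print("text:", text)              # ['dblp', 'imdb']
```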
### 3. Environment Configuration
Create a `.env` file in the root directory and add your OpenAI API key:
```
OPEN_AI_API_KEY="your-api-key-here"
```
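The script reads this key at runtime. As a hedged illustration (assuming the `python-dotenv` package is used, which this README does not confirm), the key could be loaded like this:

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads the .env file in the current directory
api_key = os.getenv("OPEN_AI_API_KEY")
if not api_key:
    raise RuntimeError("OPEN_AI_API_KEY is not set in the .env file")
```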
### 4. Data Setup
Create a `rawData` folder in the root directory and add the following datasets (a quick check of the expected layout is sketched after this list):
1. For DBLP dataset:
- Create a `dblp` folder inside `rawData`
- Download data from [DBLP Dataset](https://github.com/Jhy1993/HAN/tree/master/data/DBLP_four_area)
- Place all files in the `rawData/dblp` folder
2. For IMDB dataset:
- Create an `imdb` folder inside `rawData`
- Download data from [IMDB Dataset](https://github.com/Jhy1993/HAN/tree/master/data/imdb)
- Place all files in the `rawData/imdb` folder
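As a purely illustrative sanity check (not part of the project), the expected folders can be verified like this:

```python
from pathlib import Path

# The steps above expect this layout:
#   rawData/dblp/  <- files from the DBLP dataset link
#   rawData/imdb/  <- files from the IMDB dataset link
for folder in (Path("rawData/dblp"), Path("rawData/imdb")):
    print(f"{folder}: {'found' if folder.is_dir() else 'missing'}")
```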
### 5. Run Tests
Before running experiments, verify the setup by running tests:
```bash
pytest -v
```
## How to Run Experiments
1. **Set up the configuration file**: Ensure your `config.json` file is set up as described in the Configuration section and is located in the same directory as the script, or provide the path to it when running the script.
2. **Run the experiments**:
   - Ensure your virtual environment is activated.
   - Use the command-line interface to specify the type of dataset and optionally target a specific dataset within that type.
   - You can pass additional parameters to customize the experiment execution.
### Command-Line Arguments
The script accepts the following arguments (a sketch of a matching `argparse` setup follows the list):
- `-c, --config`: Path to the configuration file. Defaults to `config.json`.
- `-i, --iterations`: Number of times to run the experiment. Must be a positive integer. Defaults to 5.
- `-t, --type`: Type of dataset to run the experiments on. Choices are:
- `s` or `structured`: For structured datasets
- `t` or `text`: For text datasets
- If not specified, runs experiments on both types
- `-d, --dataset`: Specific dataset name to run. Optional.
- `-n, --num_groups`: List of group sizes to run experiments with. Space-separated integers. Optional.
- Default: [0, 5, 10, 15, 20, 25]
- `-l, --labels`: List of labels to run experiments with. Space-separated integers. Optional.
- `-b, --boolean_concepts`: Whether to create high-level concepts as boolean values.
- Must explicitly state 'true' or 'false'
- Defaults to 'true'
- `-e, --use_experimented_groups`: Flag to use experimented groups instead of creating new grouped keywords.
- No value needed, just include the flag to enable
- `-p, --penalty`: Set the penalty value for EvoLearner. Defaults to 1.
- `--title`: Title to append to the results folder name. Optional.
- Will be converted to lowercase and stripped of non-alphanumeric characters
- Added with an underscore prefix to the folder name
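The following is a hedged sketch of how these options might be declared with `argparse`; it mirrors the documentation above but is not the project's actual `main.py`.

```python
import argparse

# Illustrative CLI declaration based on the argument list above; the real script may differ.
parser = argparse.ArgumentParser(description="Run concept-explanation experiments")
parser.add_argument("-c", "--config", default="config.json",
                    help="Path to the configuration file")
parser.add_argument("-i", "--iterations", type=int, default=5,
                    help="Number of experiment repetitions (positive integer)")
parser.add_argument("-t", "--type", choices=["s", "structured", "t", "text"],
                    help="Dataset type; if omitted, both types are run")
parser.add_argument("-d", "--dataset", help="Specific dataset name to run")
parser.add_argument("-n", "--num_groups", type=int, nargs="+",
                    default=[0, 5, 10, 15, 20, 25],
                    help="Group sizes to run experiments with")
parser.add_argument("-l", "--labels", type=int, nargs="+",
                    help="Labels to run experiments with")
parser.add_argument("-b", "--boolean_concepts", choices=["true", "false"],
                    default="true",
                    help="Create high-level concepts as boolean values")
parser.add_argument("-e", "--use_experimented_groups", action="store_true",
                    help="Reuse experimented groups instead of creating new grouped keywords")
parser.add_argument("-p", "--penalty", type=int, default=1,
                    help="Penalty value for EvoLearner")
parser.add_argument("--title", help="Title appended to the results folder name")

args = parser.parse_args()
print(args)
```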
### Example
```bash
python main.py --type text --dataset dblp --use_experimented_groups --penalty 3
```
### Output
The results will be saved in a folder whose name is based on the experiment parameters and any provided title. For example, if you run an experiment with the title "batch1", the results will be saved in a folder named in the format `{timestamp}_batch1`.
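A minimal sketch of the documented naming scheme; the exact timestamp format is an assumption, not taken from the project.

```python
import re
from datetime import datetime

def results_folder_name(title=None):
    # Timestamp format is assumed; the actual script may use a different one
    name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    if title:
        # Lowercase and strip non-alphanumeric characters, as documented above
        name += "_" + re.sub(r"[^a-z0-9]", "", title.lower())
    return name

print(results_folder_name("batch1"))  # e.g. 2024-01-01_12-00-00_batch1
```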
---
**Note**: Parts of the DiscriminativeExplainer and ConvertToOWL files were adapted and modified from the [PG-XGNN project](https://git.cs.uni-paderborn.de/pg-xgnn/pg-xgnn).