"C:\\Users\\Jayesh\\AppData\\Local\\Temp\\ipykernel_14208\\44085670.py:5: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
"C:\\Users\\Jayesh\\AppData\\Local\\Temp\\ipykernel_13724\\44085670.py:5: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.\n",
"Component_ (0.16) Technology_ (0.06) Connector_Data_ (0.05) Connector_ (0.05) Pattern_ (0.02) ProgrammingConcept_ (0.02) SoftwareArtifact_ (0.02) DigitalResource_ (0.02) Quality_Attribute_ (0.02) run (0.01) Requirement_ (0.01) state (0.01) start (0.01) time (0.01) need (0.01) use (0.01) restart (0.01) web (0.01) rest (0.01) provide (0.01)\n",
"Component_ (0.10) Pattern_ (0.06) Connector_ (0.06) Technology_ (0.05) SoftwareArtifact_ (0.04) Connector_Data_ (0.04) Requirement_ (0.03) ProgrammingConcept_ (0.02) DigitalResource_ (0.02) Quality_Attribute_ (0.02) info (0.01) use (0.01) timeline (0.01) need (0.01) time (0.01) rest (0.01) web (0.01) ui (0.01) provide (0.01) make (0.01)\n",
"\n",
"Topic #1: \n",
"Pattern_ (0.19) Connector_ (0.04) Component_ (0.03) DigitalResource_ (0.02) Requirement_ (0.02) info (0.02) SoftwareArtifact_ (0.02) configuration (0.02) scheduler (0.02) Technology_ (0.02) capacity (0.01) support (0.01) create (0.01) set (0.01) parent (0.01) Quality_Attribute_ (0.01) use (0.01) limit (0.01) federation (0.01) rest (0.01)\n",
"Pattern_ (0.08) Component_ (0.05) Requirement_ (0.02) DigitalResource_ (0.02) Dominant Topic (0.02) support (0.02) configuration (0.02) capacity (0.02) set (0.02) scheduler (0.01) use (0.01) Connector_ (0.01) cpu (0.01) different (0.01) property (0.01) type (0.01) parent (0.01) memory (0.01) Quality_Attribute_ (0.01) need (0.01)\n",
"\n",
"Topic #2: \n",
"Technology_ (0.10) SoftwareArtifact_ (0.04) Requirement_ (0.03) DigitalResource_ (0.03) Component_ (0.02) Connector_ (0.02) container (0.02) use (0.02) support (0.02) ProgrammingConcept_ (0.01) Quality_Attribute_ (0.01) type (0.01) need (0.01) work (0.01) provide (0.01) implementation (0.01) Connector_Data_ (0.01) local (0.01) launch (0.01) disk (0.01)\n",
" with open(\"E:\\DSSE\\DSSE-Group-7\\Assignment_2\\Week 2\\yarn_vectorizer.pkl\", \"rb\") as f:\n"
]
},
{
"name": "stdout",
"output_type": "stream",
...
...
@@ -1495,7 +1479,7 @@
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": 55,
"metadata": {},
"outputs": [
{
...
...
@@ -1512,7 +1496,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Jayesh\\AppData\\Local\\Temp\\ipykernel_14208\\986355751.py:23: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n",
"C:\\Users\\Jayesh\\AppData\\Local\\Temp\\ipykernel_13724\\986355751.py:23: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`\n",
" dtm['Dominant Topic'] = dominant_topics\n"
]
}
...
...
@@ -1548,10 +1532,376 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": []
"source": [
"import pickle\n",
"\n",
"# load vectorizer\n",
"with open(\"E:\\DSSE\\DSSE-Group-7\\Assignment_2\\Week 2\\yarn_vectorizer.pkl\", \"rb\") as f:\n",
```python
# Rename the document-term matrix columns to their ontology class
dtm.rename(columns=ontology_dict, inplace=True)
# Aggregate the document-term matrix by ontology class (sum columns that share a label)
dtm = dtm.groupby(by=dtm.columns, axis=1).sum()
```
%% Output
C:\Users\Jayesh\AppData\Local\Temp\ipykernel_14208\44085670.py:5: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.
C:\Users\Jayesh\AppData\Local\Temp\ipykernel_13724\44085670.py:5: FutureWarning: DataFrame.groupby with axis=1 is deprecated. Do `frame.T.groupby(...)` without axis instead.
Component_ (0.16) Technology_ (0.06) Connector_Data_ (0.05) Connector_ (0.05) Pattern_ (0.02) ProgrammingConcept_ (0.02) SoftwareArtifact_ (0.02) DigitalResource_ (0.02) Quality_Attribute_ (0.02) run (0.01) Requirement_ (0.01) state (0.01) start (0.01) time (0.01) need (0.01) use (0.01) restart (0.01) web (0.01) rest (0.01) provide (0.01)
Component_ (0.10) Pattern_ (0.06) Connector_ (0.06) Technology_ (0.05) SoftwareArtifact_ (0.04) Connector_Data_ (0.04) Requirement_ (0.03) ProgrammingConcept_ (0.02) DigitalResource_ (0.02) Quality_Attribute_ (0.02) info (0.01) use (0.01) timeline (0.01) need (0.01) time (0.01) rest (0.01) web (0.01) ui (0.01) provide (0.01) make (0.01)
Topic #1:
Pattern_ (0.19) Connector_ (0.04) Component_ (0.03) DigitalResource_ (0.02) Requirement_ (0.02) info (0.02) SoftwareArtifact_ (0.02) configuration (0.02) scheduler (0.02) Technology_ (0.02) capacity (0.01) support (0.01) create (0.01) set (0.01) parent (0.01) Quality_Attribute_ (0.01) use (0.01) limit (0.01) federation (0.01) rest (0.01)
Pattern_ (0.08) Component_ (0.05) Requirement_ (0.02) DigitalResource_ (0.02) Dominant Topic (0.02) support (0.02) configuration (0.02) capacity (0.02) set (0.02) scheduler (0.01) use (0.01) Connector_ (0.01) cpu (0.01) different (0.01) property (0.01) type (0.01) parent (0.01) memory (0.01) Quality_Attribute_ (0.01) need (0.01)
Topic #2:
Technology_ (0.10) SoftwareArtifact_ (0.04) Requirement_ (0.03) DigitalResource_ (0.03) Component_ (0.02) Connector_ (0.02) container (0.02) use (0.02) support (0.02) ProgrammingConcept_ (0.01) Quality_Attribute_ (0.01) type (0.01) need (0.01) work (0.01) provide (0.01) implementation (0.01) Connector_Data_ (0.01) local (0.01) launch (0.01) disk (0.01)
C:\Users\Jayesh\AppData\Local\Temp\ipykernel_14208\986355751.py:23: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
C:\Users\Jayesh\AppData\Local\Temp\ipykernel_13724\986355751.py:23: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
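
Both warnings in the output above point at pandas idioms that have documented alternatives. The following is a minimal sketch on a toy matrix, not the notebook's actual data: the terms, the contents of `ontology_dict`, and the `dominant_topics` values are made up for illustration. It aggregates same-named columns by transposing and grouping rows, as the `FutureWarning` suggests, and attaches the topic label with a single `pd.concat`, as the `PerformanceWarning` suggests.

```python
import pandas as pd

# Toy document-term matrix (two documents); terms and counts are hypothetical.
dtm = pd.DataFrame(
    {"resourcemanager": [1, 0], "nodemanager": [2, 1], "hdfs": [0, 3], "use": [1, 1]}
)

# Hypothetical term -> ontology class mapping; unmapped terms keep their name.
ontology_dict = {
    "resourcemanager": "Component_",
    "nodemanager": "Component_",
    "hdfs": "Technology_",
}

# Rename term columns to their ontology class.
dtm = dtm.rename(columns=ontology_dict)

# Sum columns that share a label without axis=1:
# transpose, group the rows by their label, sum, and transpose back.
dtm = dtm.T.groupby(level=0).sum().T

# Attach the per-document label in one concat instead of item assignment.
dominant_topics = pd.Series([0, 2], index=dtm.index, name="Dominant Topic")
result = pd.concat([dtm, dominant_topics], axis=1)

print(result)
```

Keeping the label in a separate frame (`result`) leaves `dtm` as pure term counts. The output above lists `Dominant Topic` among the topic terms, which may indicate that the label column was still part of the matrix when a model was fitted; whether that is the case depends on cells not shown here.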