\documentclass[a4paper, final]{article}
%\usepackage{literat} % proper fonts
\usepackage[14pt]{extsizes} % to set the non-standard 14pt font size
\usepackage{tabularx}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
% \usepackage[russian]{babel}
\usepackage{amsmath}
\usepackage[left=25mm, top=20mm, right=20mm, bottom=20mm, footskip=10mm]{geometry}
\usepackage{ragged2e} % for full justification
\usepackage{setspace} % for line spacing
\usepackage{moreverb} % for listings and printing program source code
\usepackage{indentfirst} % for first-paragraph indentation
\usepackage{graphicx}
\usepackage{array}
\usepackage{multirow}
\renewcommand\verbatimtabsize{4\relax}
\renewcommand\listingoffset{0.2em} % offset from line numbers in listings
\renewcommand{\arraystretch}{1.4} % increase table row height
\usepackage[font=small, singlelinecheck=false, justification=centering, format=plain, labelsep=period]{caption} % for configuring table captions
\usepackage{listings} % listings
\usepackage{xcolor} % colors
\usepackage{hyperref} % for hyperlinks
\usepackage{enumitem} % for lists
\newtheorem{theorem}{Theorem} % define a new theorem environment
\setlist[enumerate,itemize]{leftmargin=1.2cm} % indentation in lists
\hypersetup{colorlinks,
allcolors=[RGB]{010 090 200}} % nicer hyperlinks (not red)
% languages to load; see the listings documentation for details (all of this is for listings)
\lstloadlanguages{ C++}
% enable Cyrillic and add a few options
\lstset{tabsize=2,
breaklines,
basicstyle=\footnotesize,
columns=fullflexible,
flexiblecolumns,
numbers=left,
numberstyle={\footnotesize},
keywordstyle=\color{blue},
inputencoding=cp1251,
extendedchars=true
}
\lstdefinelanguage{MyC}{
language=C++,
% ndkeywordstyle=\color{darkgray}\bfseries,
% identifierstyle=\color{black},
% morecomment=[n]{/**}{*/},
% commentstyle=\color{blue}\ttfamily,
% stringstyle=\color{red}\ttfamily,
% morestring=[b]",
% showstringspaces=false,
% morecomment=[l][\color{gray}]{//},
keepspaces=true,
escapechar=\%,
texcl=true
}
\textheight=24cm % text height
\textwidth=16cm % text width
\oddsidemargin=0pt % left margin offset
\topmargin=-1.5cm % top margin offset
\parindent=24pt % paragraph indent
\parskip=5pt % spacing between paragraphs
\tolerance=2000 % tolerance for loose lines
\flushbottom % equalize page heights
% Listing settings
\lstset{
language=C++,
extendedchars=true,
inputencoding=utf8,
keepspaces=true,
% captionpos=b,
}
\begin{document} % start of document
% START OF TITLE PAGE
\begin{center}
\hfill \break
\hfill \break
\normalsize{MINISTRY OF SCIENCE AND HIGHER EDUCATION OF THE RUSSIAN FEDERATION\\
Federal State Autonomous Educational Institution of Higher Education Peter the Great St. Petersburg Polytechnic University\\[10pt]}
\normalsize{Institute of Computer Science and Cybersecurity}\\[10pt]
\normalsize{Higher School of Artificial Intelligence Technology}\\[10pt]
\normalsize{Direction 02.03.01 Mathematics and Computer Science}\\
\hfill \break
\hfill \break
\hfill \break
\hfill \break
\large{\textbf{Literature Review}}\\
\large{\textit{Machine learning approaches for assessing drug resistance in cancer treatment}}\\
\hfill \break
\hfill \break
\end{center}
\small{
\begin{tabular}{lrrl}
\!\!\!Student, & \hspace{2cm} & & \\
\!\!\!group 5130201/20102 & \hspace{2cm} & \underline{\hspace{3cm}} &Tishenko A. A. \\\\
\!\!\!Supervisor, Ph.D. & \hspace{2cm} & \underline{\hspace{3cm}} & Motorin D. E. \\\\
&&\hspace{4cm}
\end{tabular}
\begin{flushright}
<<\underline{\hspace{1cm}}>>\underline{\hspace{2.5cm}} 2024
\end{flushright}
}
\hfill \break
% \hfill \break
\begin{center} \small{Saint-Petersburg, 2024} \end{center}
\thispagestyle{empty} % suppress the page number on this page
% END OF TITLE PAGE
\newpage
% \tableofcontents
% \newpage
\section*{Introduction}
\addcontentsline{toc}{section}{Introduction}
Chemotherapy drugs have improved, but drug resistance remains a major challenge in cancer treatment and the main cause of cancer progression and even death. There are still no clear indicators for predicting a patient's risk of drug resistance. Existing drug sensitivity assessment methods have limitations such as low modeling success rates, high cost, and long turnaround times. Machine learning is a rapidly expanding and evolving field of computing that may significantly help with the problem of chemotherapy resistance. This review surveys how different studies apply machine learning algorithms to predict and understand chemotherapy resistance in various cancer types. We also consider the strengths and limitations of each approach and discuss the reported results.
\newpage
\section{Machine learning and chemotherapy resistance}
Machine learning has been widely applied to classification, regression, feature extraction, and many other problems in biology and medicine. Cancer treatment is no exception: machine learning has recently been actively used in research on the chemotherapy resistance of cancer cells.
The authors of~\cite{paclitaxel} applied and compared five machine learning algorithms for classifying cancer cells by their level of drug resistance. They extracted 112 morphological features from a dataset of nearly 3000 single-cell quantitative phase images of epithelial ovarian cancer (EOC) cells. They then employed five supervised algorithms, namely decision tree, Naive Bayes, K-nearest neighbors (KNN), support vector machine (SVM), and neural network (NN), to perform multi-class classification of four types of drug-resistant cancer cells. The optimal algorithm was selected by comparing the test accuracy for each cell type and the confusion matrices. The selected trained model was then used for further interpretability analysis.
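As a point of reference (the notation here is ours and is not taken from~\cite{paclitaxel}), let $C$ be the $K \times K$ confusion matrix whose entry $C_{ij}$ counts test cells of true class $i$ that were predicted as class $j$, with $K = 4$ in this study. The overall accuracy and the per-class accuracy (recall) of class $k$ can then be written as
\begin{equation*}
\mathrm{Acc} = \frac{\sum_{k=1}^{K} C_{kk}}{\sum_{i=1}^{K}\sum_{j=1}^{K} C_{ij}},
\qquad
\mathrm{Acc}_k = \frac{C_{kk}}{\sum_{j=1}^{K} C_{kj}}.
\end{equation*}
Whether the per-cell-type accuracy reported in~\cite{paclitaxel} corresponds exactly to $\mathrm{Acc}_k$ is an assumption of this sketch.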
Another study evaluates the potential of mitochondria-related chemoradiotherapy (CRT) resistance (MRCRTR) genes for predicting esophageal cancer prognosis using machine learning~\cite{mitochondria}. The authors used machine learning for both classification and regression tasks. For classification they applied seven algorithms: generalized linear model (GLM), K-nearest neighbors (KNN), least absolute shrinkage and selection operator (LASSO) regression, neural network (NN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB). The task is similar to that in~\cite{paclitaxel}, but here the authors distinguished only two classes: CRT response and CRT non-response. Beyond classification, the authors also trained 10 machine learning algorithms, namely random survival forest (RSF), elastic net (Enet), LASSO, ridge, stepwise Cox, CoxBoost, partial least squares regression for Cox (plsRcox), supervised principal components (SuperPC), generalized boosted regression modeling (GBM), and survival support vector machine (survival-SVM), to build a consensus prognostic model that predicts the MRCRTR score. Within a leave-one-out cross-validation (LOOCV) framework, a total of 101 algorithm combinations were fitted as candidate prognostic models.
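For orientation, the LASSO estimator that appears in both the classification and the prognostic pipelines can be sketched in its standard linear-regression form. The symbols below are generic and are not taken from~\cite{mitochondria}. Given $n$ samples with $p$-dimensional feature vectors $x_i$ and responses $y_i$, LASSO solves
\begin{equation*}
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top} \beta \bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|,
\end{equation*}
where $\lambda \ge 0$ controls the strength of the $\ell_1$ penalty. Larger values of $\lambda$ drive more coefficients exactly to zero, which is why LASSO is commonly used for gene selection. For classification or survival outcomes the squared-error term is replaced by the corresponding negative log-likelihood (logistic or Cox partial likelihood, respectively).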
Machine learning was also successfully applied to the same kind of classification task as in~\cite{paclitaxel} and~\cite{mitochondria} by the authors of~\cite{sers}. They employed a machine learning algorithm based on principal component analysis and linear discriminant analysis (PCA-LDA) to extract features from blood surface-enhanced Raman spectroscopy (SERS) data and to build predictive models for distinguishing radiotherapy-resistant subjects from radiotherapy-sensitive ones, and nasopharyngeal cancer (NPC) subjects from healthy ones.
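The two stages of PCA-LDA can be summarized as follows, in generic notation that is assumed here rather than taken from~\cite{sers}. PCA first projects each mean-centered spectrum $x$ onto the leading $d$ eigenvectors of the sample covariance matrix, collected as the columns of $W_d$, giving scores $z = W_d^{\top} x$. LDA then seeks the direction $w$ in score space that maximizes the Fisher criterion
\begin{equation*}
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},
\end{equation*}
where $S_B$ and $S_W$ are the between-class and within-class scatter matrices of the scores. For two classes with score means $\mu_1$ and $\mu_2$, the maximizer is proportional to $S_W^{-1}(\mu_1 - \mu_2)$.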
The authors of~\cite{heterogeneity} took a different approach, applying machine learning algorithms from the specialized software CellProfiler~\cite{cellprofile} to extract quantitative image features. They then used bioinformatics analysis to explore the relationship between these features of intra-tumor heterogeneity (ITH) and drug resistance. Notably, the authors did not train new models but relied on the pre-trained algorithms provided by CellProfiler. Unlike~\cite{paclitaxel}, \cite{mitochondria}, and~\cite{sers}, where algorithms were employed for regression and classification, this study focused on extracting quantitative features from images. Using CellProfiler, the authors built a pipeline for extracting and analyzing these features, which allowed them to draw conclusions about the connection between the features and drug resistance in cancer cells.
\section{Feature analysis}
\section{Datasets}
\section{Results}
\begin{table}[h!]
\centering
\caption{Methods used in research papers.}
\footnotesize
\begin{tabularx}{\textwidth}{|X|p{2cm}|X|X|X|}
\hline
\textbf{Article} & \textbf{Cancer type} & \textbf{Machine learning algorithms} & \textbf{Datasets} & \textbf{Feature importance analysis} \\
\hline
Classification of paclitaxel-resistant ovarian cancer cells using holographic flow cytometry through interpretable machine learning~\cite{paclitaxel} & Epithelial ovarian cancer (EOC) & Tree, Naive Bayes, K-nearest neighbors
(KNN), support vector machine (SVM), and neural network (NN) & Self-produced dataset of 2998 quantitative phase images (QPIs) of EOC cells & SHapley Additive
exPlanations (SHAP), Pearson coefficient, Kruskal-Wallis test \\
\hline
Heterogeneity of computational pathomic signature predicts drug resistance and intra-tumor heterogeneity of ovarian cancer~\cite{heterogeneity} & Epithelial ovarian cancer (EOC) & CellProfiler~\cite{cellprofile}, least absolute shrinkage and selection operator (LASSO) regression & 494 ovarian and 70 paracarcinoma tissue images from The Cancer Genome Atlas (TCGA) database~\cite{tcga} & Statistical analysis using R~\cite{r-lang}. Various visualizations, including heatmaps, Venn diagrams, ROC curves, and survival curves. \\
\hline
Mitochondria-related chemoradiotherapy resistance genes-based machine learning model associated with immune cell infiltration on the prognosis of esophageal cancer and its value in pan-cancer~\cite{mitochondria} & Esophageal cancer & Generalized linear model (GLM), K-nearest neighbors (KNN), least absolute shrinkage and selection operator (LASSO) regression, neural network (NN), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB) & Nearly 500 tissue samples, RNA sequences, and other clinical data from the Gene Expression Omnibus (GEO) database~\cite{geo}; information on 183 esophageal cancer patients from The Cancer Genome Atlas (TCGA) database~\cite{tcga} & Statistical analysis using the DALEX package~\cite{dalex} for~R~\cite{r-lang} \\
\hline
Molecular separation-assisted label-free SERS combined with machine learning for nasopharyngeal cancer screening and radiotherapy resistance prediction~\cite{sers} & Nasopharyn\-geal cancer & Principal component analysis and linear discriminant analysis (PCA-LDA) & Self-produced dataset of 120 plasma samples: 60 from healthy volunteers, 30 from radiotherapy-sensitive patients, and 30 from radiotherapy-resistant patients & -- \\
\hline
\end{tabularx}
\end{table}
\newpage
\begin{table}[h!]
\centering
\caption{Results obtained in research papers.}
\footnotesize
\begin{tabularx}{\textwidth}{|X|X|X|X|}
\hline
\textbf{Article} & \textbf{Key results} & \textbf{Best algorithms} & \textbf{Metrics} \\
\hline
Classification of paclitaxel-resistant ovarian cancer cells using holographic flow cytometry through interpretable machine learning~\cite{paclitaxel} & Demonstrated that morphological changes in epithelial ovarian cancer (EOC) cells correlate with drug sensitivity, highlighting the potential for monitoring drug resistance.
& Support vector machine (SVM) and neural network (NN) & Accuracy of 94.5\% for SVM and 93.4\% for NN \\
\hline
Heterogeneity of computational pathomic signature predicts drug resistance and intra-tumor heterogeneity of ovarian cancer~\cite{heterogeneity} & Demonstrated a strong correlation between intra-tumor heterogeneity (ITH) and drug resistance in epithelial ovarian cancer (EOC) cells & Least absolute shrinkage and selection operator (LASSO) regression & Area under the curve (AUC) of 0.601, 0.594, and 0.589 for 1-, 3-, and 5-year survival, respectively \\
\hline
Mitochondria-related chemoradiotherapy resistance genes-based machine learning model associated with immune cell infiltration on the prognosis of esophageal cancer and its value in pan-cancer~\cite{mitochondria} & Proposed a model that incorporates mitochondria-related chemoradiotherapy resistance (MRCRTR) genes. Identified six mitochondria-related genes that affect CRT and the prognosis of esophageal cancer. & Neural network (NN) and least absolute shrinkage and selection operator (LASSO) regression & Root mean squared error (RMSE) of 0.001 for NN and 0.003 for LASSO \\
\hline
Molecular separation-assisted label-free SERS combined with machine learning for nasopharyngeal cancer screening and radiotherapy resistance prediction~\cite{sers} & Developed a novel approach using label-free surface-enhanced Raman spectroscopy (SERS) to profile molecular patterns in the blood of nasopharyngeal cancer (NPC) patients, distinguishing those with radiotherapy sensitivity from those with resistance & Principal component analysis and linear discriminant analysis (PCA-LDA) & Accuracy of 96.7\% for distinguishing radiotherapy-resistant subjects from radiotherapy-sensitive ones and 100\% for distinguishing nasopharyngeal cancer (NPC) subjects from healthy ones \\
\hline
\end{tabularx}
\end{table}
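The regression and ranking metrics in the table of results follow their usual definitions, restated here in generic notation as an aid to the reader (the cited papers may differ in details such as time-dependent variants of the AUC). For $n$ observations with true values $y_i$ and predictions $\hat{y}_i$,
\begin{equation*}
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\end{equation*}
and for a binary classifier with score $s$, the AUC is, ignoring ties, the probability that a randomly chosen positive case $x^{+}$ is scored higher than a randomly chosen negative case $x^{-}$,
\begin{equation*}
\mathrm{AUC} = \Pr\bigl( s(x^{+}) > s(x^{-}) \bigr),
\end{equation*}
so that $0.5$ corresponds to random ranking and $1$ to perfect separation.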
% \section*{Conclusion}
% \addcontentsline{toc}{section}{Conclusion}
% Conclusion text
\newpage
\phantom{text}
\newpage
\phantom{text}
\newpage
% \section*{Literature}
% \addcontentsline{toc}{section}{Literature}
\vspace{-1.5cm}
\begin{thebibliography}{0}
\bibitem{paclitaxel}
Lu Xin, Wen Xiao, Huanzhi Zhang, Yakun Liu, Xiaoping Li, Pietro Ferraro, Feng Pan, Classification of paclitaxel-resistant ovarian cancer cells using holographic flow cytometry through interpretable machine learning, 2024.
\bibitem{heterogeneity}
Qiuli Zhu, Hua Dai, Feng Qiu, Weiming Lou, Xin Wang, Libin Deng, Chao Shi, Heterogeneity of computational pathomic signature predicts drug resistance and intra-tumor heterogeneity of ovarian cancer, 2024.
\bibitem{mitochondria}
Ziyu Liu, Zahra Zeinalzadeh, Tao Huang, Yingying Han, Lushan Peng, Dan Wang, Zongjiang Zhou, Ousmane Diabate, Junpu Wang, Mitochondria-related chemoradiotherapy resistance genes-based machine learning model associated with immune cell infiltration on the prognosis of esophageal cancer and its value in pan-cancer, 2024.
\bibitem{sers}
Jun Zhang, Youliang Weng, Yi Liu, Nan Wang, Shangyuan Feng, Sufang Qiu, Duo Lin, Molecular separation-assisted label-free SERS combined with machine learning for nasopharyngeal cancer screening and radiotherapy resistance prediction, 2024.
\bibitem{cellprofile}
C. McQuin, A. Goodman, V. Chernyshev, L. Kamentsky, B.A. Cimini, et al., CellProfiler 3.0: next-generation image processing for biology, 2018.
\bibitem{tcga}
The Cancer Genome Atlas (TCGA) database. Available at \url{https://www.cancer.gov/ccg/research/genome-sequencing/tcga}. Accessed October 8, 2024.
\bibitem{geo}
Gene Expression Omnibus (GEO) database. Available at \url{https://www.ncbi.nlm.nih.gov/geo/}. Accessed October 8, 2024.
\bibitem{r-lang}
The R Project for Statistical Computing. Available at \url{https://www.r-project.org/}. Accessed October 8, 2024.
\bibitem{dalex}
Przemyslaw Biecek, DALEX: explainers for complex predictive models, 2018.
\end{thebibliography}
\end{document}