METHODOLOGICAL ASPECTS OF IDENTIFICATION OF TISSUE-SPECIFIC PROTEINS AND PEPTIDES FORMING THE CORRECTIVE PROPERTIES OF INNOVATIVE MEAT PRODUCTS

One of the ways to address the food quality issues facing the industry is the development of standardized and certified methods for in-depth studies of biochemical indicators of the quality and safety of meat and meat products. World laboratory practice in the field of food quality and safety shows a constant expansion of the list of controlled indicators of food products. An important feature of the modern period in the development of biomedical and biotechnological raw materials and research is the introduction of a whole complex of postgenomic technologies, which are based on a systematic approach to the study

FOR CITATION: Vostrikova N.L., Chernukha I.M., Khvostov D.V. Methodological aspects of identification of tissue-specific proteins and peptides forming the corrective properties of innovative meat products. Theory and practice of meat processing. 2018;3(3): 36-55 (In Russ.). DOI 10.21323/2414-438X-2018-3-3-36-55

Like other biological macromolecules, such as polysaccharides and nucleic acids, proteins are crucial parts of organisms and participate in almost all processes in cells. Many proteins are enzymes that catalyze biochemical reactions and are essential for metabolism. Proteins also have structural and mechanical functions: actin and myosin in muscles, and the proteins of the cytoskeleton, which form the system of microfilaments that maintains cell shape. Other proteins play an important role in cell signaling, immune reactions, cell adhesion and the cell cycle. Proteins are necessary in animals' diets to provide the essential amino acids, which cannot be synthesized. Digestion breaks proteins down for use in metabolism [6].
To perform their function, proteins interact with other substances on the molecular or ionic level, or with other proteins. Proteins interact in different contexts and with different results. Several proteins activate or deactivate other proteins by binding to them or by their (de)phosphorylation. In the process of (de)phosphorylation, a phosphate group is added to (or removed from) a protein, which activates or deactivates the protein. Other proteins are linked with each other, forming the so-called protein complexes. They play important roles throughout the cell, for example, in DNA replication. Another class of proteins is linked with each other to form structural complexes, which give a cell its three-dimensional structure. Understanding these main interactions is vital for understanding the internal processes of the vital activity of an organism.
Tissue-specific gene expression can lead to the presence or absence of certain protein interactions and complexes, resulting in deep functional differences between biological processes in tissues [7].

Introduction
Proteins are large biomolecules, or macromolecules, that consist of one or several long chains of amino acid residues. Proteins have a wide spectrum of functions inside organisms, including catalyzing metabolic reactions, DNA replication, responding to stimuli and transporting molecules from one place to another. Proteins differ from each other primarily in their amino acid sequence, which is determined by the nucleotide sequence of their genes. As a rule, this leads to protein folding into a specific three-dimensional structure, which determines its activity [1]. A linear chain of amino acid residues is called a polypeptide. A protein contains at least one long polypeptide. Short polypeptides that contain fewer than 20-30 residues are seldom recognized as proteins and are usually called peptides or sometimes oligopeptides. Individual amino acid residues are linked to adjacent residues by peptide bonds. The sequence of amino acid residues in a protein is determined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; however, in some organisms the genetic code can include selenocysteine and in some organisms pyrrolysine. Shortly after or even during synthesis, post-translational modifications of the protein residues often occur, which change the physical and chemical properties, folding, stability, activity and, finally, the protein function [2]. Sometimes proteins contain non-peptide groups, which can be called prosthetic groups or co-factors. Proteins can also act together to achieve a certain function, and they often associate to form stable protein complexes [3,4].
After formation, proteins exist only for a particular period of time; then they are degraded and recycled by the cellular machinery through the process of protein turnover. The lifespan of a protein is measured by its half-life and covers a wide range. They can exist for

There are experimental methods (for example, for detection, extraction and purification of proteins, as well as for deciphering protein structure and function; purification of the initial protein is often required). Computational methods usually use computer programs for protein analysis. However, many experimental methods (for example, mass spectrometry) require computational analysis of raw data.

Genetic methods
Experimental analysis usually requires protein expression and purification. Expression is achieved by manipulating the DNA that encodes the protein of interest. Therefore, protein analysis usually requires DNA methods, especially cloning. Several proteins have never been directly sequenced; however, the majority of protein molecules have been analyzed by translating codons from known mRNA sequences into amino acids, a method known as «conceptual translation». Site-directed mutagenesis selectively introduces mutations that change the protein structure. The function of protein parts can be better understood by analyzing the changes in phenotype that result from this intervention. Fusion proteins are obtained by inserting protein tags, such as a His-tag, to obtain a modified protein that is easier to trace. An example is Snf2H-GFP labeled (chromatin-remodeling) complexes, which consist of the protein linked with the green fluorescent label [14].
Analysis can identify DNA alleles associated with diseases, for example, through linkage between Mendelian traits (or between a trait and a marker, or between two markers) [15].
There are also other possibilities. For example, immunohistochemistry usually uses an antibody to one or several proteins of interest, conjugated with enzymes that give either luminescent or chromogenic signals; these signals can be compared between samples and provide information about localization. Another useful method is cofractionation in gradients of sucrose (or another substance) using isopycnic centrifugation [16].
Through another application in the field of genetic engineering, known as site-directed mutagenesis, researchers can change the protein sequence and, therefore, its structure, cellular localization and susceptibility to regulation. This method even allows incorporation of non-natural amino acids into proteins using modified tRNA [17] and rational design of new proteins with new properties [18].
Understanding the character of these interactions is important for studying how and why cells function. The interactions between all proteins in a cell comprise the so-called protein-protein interaction (PPI) network. Not all proteins are present in all cells and tissue types; therefore, protein interactions are limited to the cell and tissue types in which both interacting proteins exist. These tissue-dependent interactions form tissue-specific PPIs (TSPPIs) [8].
To this end, the methodological aspects of analysis are studied, and effective algorithms that give the most significant results are chosen or developed. In addition, the main properties of TSPPIs are studied to gain insight into the structure of interactions and the properties of particular protein groups. Finally, clustering algorithms are used to reveal tissue-specific functional modules within the framework of TSPPIs [9].
It is particularly interesting to study the formation mechanisms of substances of protein and peptide nature that determine bio-corrective and quality characteristics, and the pathways of their biosynthesis [10].
More and more attention is paid to bioinformatics as an instrument for studying the proteome from the viewpoint of the hypothetical presence of certain biologically active peptides and marker proteins. According to the world databases, the muscle proteins of productive animals contain amino acid sequences that have various biological properties [11].
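The «conceptual translation» mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code, reads the mRNA from its first base in a single frame, and stops at the first stop codon; the function name and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; amino acids are listed in UCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def conceptual_translation(mrna: str) -> str:
    """Translate an mRNA sequence codon by codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon ends the reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: start codon, two sense codons, stop codon.
print(conceptual_translation("AUGGCCAAGUAA"))
```

Real pipelines also have to locate the correct reading frame and handle non-standard codes; this sketch deliberately leaves that out.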
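The idea that an interaction can only occur in a tissue where both partners are present can be sketched as a simple filter. This is an illustration under stated assumptions, not the method of the cited work: the protein identifiers, tissue names and the set-based expression model are all hypothetical.

```python
def tissue_specific_ppis(interactions, expression):
    """For each tissue, keep only interactions whose two partners are both expressed there."""
    return {
        tissue: [(a, b) for a, b in interactions if a in present and b in present]
        for tissue, present in expression.items()
    }


# Hypothetical toy data: protein identifiers and tissues are placeholders.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
expression = {
    "muscle": {"P1", "P2"},
    "liver": {"P2", "P3"},
}

tsppis = tissue_specific_ppis(interactions, expression)
for tissue, pairs in tsppis.items():
    print(tissue, pairs)
```

In practice expression is quantitative rather than present/absent, so a threshold on abundance would be needed before building such sets.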

Approaches to studying proteins
Protein activity and structure can be studied in vitro, in vivo and in silico. In vitro investigations of purified proteins in a controlled environment are useful for studying how a protein executes its function: for example, studies of enzyme kinetics reveal the chemical mechanism of the catalytic activity of an enzyme and its relative affinity for different possible substrate molecules. In contrast, in vivo experiments can give information about the physiological role of proteins in the context of a cell or even the whole organism. In in silico investigations, computational methods are used to study proteins.
In vivo investigations of proteins are often connected with protein synthesis and localization in a cell. The specifics of how proteins are targeted to specific organelles or cellular structures are often unclear, although it is known that many intracellular proteins are synthesized in the cytoplasm, and membrane-bound or secreted proteins in the endoplasmic reticulum [4]. A useful method for assessing cellular localization applies genetic engineering to express in a cell a fused protein, or chimera, that consists of the natural protein of interest linked with a «reporter», such as the green fluorescent protein (GFP) [12]. The location of the fused protein in a cell can be clearly and effectively visualized by microscopy [13].
the final possible protein conformations are determined by solving the distance geometry problem. Dual-polarization interferometry is a quantitative analytical method for measuring the general conformation of proteins and conformational changes due to interactions or another stimulus. Circular dichroism is another laboratory method, used to determine the internal β-sheet/α-helix composition of proteins. Electron cryomicroscopy is used to obtain lower-resolution structural information about very large protein complexes, including assembled viruses; a variant known as electron crystallography can also give high-resolution information in some cases, especially for two-dimensional crystals of membrane proteins. Determined protein structures are usually deposited in the Protein Data Bank (PDB), a freely accessible resource from which structural data about thousands of proteins can be obtained as Cartesian coordinates for each atom in a protein.
Methods for protein structure prediction strive to provide credible structures for proteins whose structures have not been determined experimentally [22,23].

Chromatographic methods
The most universal method for assessing peptide and protein purity is reversed-phase chromatography (RP-HPLC), based on the use of a broad spectrum of hydrophobic stationary phases [24]. To separate the components of pharmaceutical preparations of peptide and protein nature, reversed phases based on silica gel bonded to C8 and C18 alkyl groups with an average pore size of 300 Å are most often employed. In RP-HPLC, pharmaceutical preparations of peptide and protein nature, or their fragments obtained by enzymatic degradation, are usually eluted in gradient mode at pH 2-3 and room temperature, as partial or complete protein denaturation is possible when the temperature is increased, which also affects chromatographic mobility [25,26].
Understanding the mechanism of polypeptide interaction with the reversed-phase surface is important for understanding their separation in RP-HPLC. Separation of small molecules is connected with their continual partitioning between the mobile phase and the hy-

Proteomics
The main experimental methods of proteomics are two-dimensional electrophoresis [19], which makes it possible to separate a great number of proteins; mass spectrometry, which allows rapid protein identification and high-throughput peptide sequencing (most often after in-gel digestion); protein microarrays, which enable detection of the relative levels of many proteins present in a cell; and two-hybrid screening, which allows a systematic study of protein-protein interactions. The systematic endeavor to determine protein structures is known as structural proteomics [20,21].

Bioinformatics
For analysis of the structure, functions and evolution of proteins, a wide spectrum of computational methods has been created.
The development of such instruments was driven by the large amount of genomic and proteomic data available for different organisms, including the human genome. It is impossible to study all proteins experimentally; thus, only several of them are subjected to laboratory experiments, and computational tools are used to extrapolate to similar proteins. These homologous proteins can be effectively identified in distant relatives by sequence alignment. Genomic and gene sequences can be searched with different tools for certain traits. Sequence profiling tools can find restriction enzyme sites and predict secondary structures. Phylogenetic trees can be built, and evolutionary hypotheses about the origin of modern organisms and their expressed genes developed, using special software such as ClustalW. The bioinformatics field is now irreplaceable for gene and protein analysis.

The «hydrophobic foot» of the polypeptide adsorbs to the hydrophobic surface of the reversed-phase material, where it remains until the concentration of the organic modifier increases to the critical concentration [27]

using the same principle of separation twice or by combination of two different principles of MS separation. Both MALDI and electrospray can be coupled to any of these three methods of separation. The fact that MALDI produces short bursts of ions in vacuum, while electrospray creates a continuous beam of ions at atmospheric pressure, as a rule leads to coupling MALDI to TOF MS and electrospray to quadrupole and ion trap instruments [29].

Identification of peptides and proteins
One of the main tasks in quality assessment of pharmaceutical preparations with a peptide or protein structure is protein identification. Proteins are identified mostly by their amino acid sequence. There are three approaches to mass spectrometric analysis of peptides and proteins.
The most common method is still «bottom-up» proteomics, in which a protein is digested into peptides that can be effectively analyzed using a broad spectrum of LC-MS or MALDI-TOF-MS instruments. Sample preparation for a «bottom-up» experiment requires several laborious stages to bring proteins to the peptide level. The most important step in these approaches is protein digestion, which is always a bottleneck as it requires a lot of time. Therefore, a significant increase in throughput can be achieved by accelerating the digestion process. Modern methods reduce digestion time from overnight incubation (~15 hours) to minutes or even seconds. This achievement also makes integration into on-line systems possible, thereby reducing the number of tedious sample-processing steps and the risk of sample loss [31]. This section gives an overview of the available digestion strategies and recent advances in accelerating the digestion process.
Protein digestion (both enzymatic and non-enzymatic) is an important and (almost) irreplaceable tool for identification, characterization and quantitative assessment of proteins using a proteomics strategy [32].
Proteomics plays a crucial role in the main fields of investigation, including the detection of disease biomarkers and the study of biosystems and, as such, significantly contributes to the understanding of vital biological processes [33].
The choice of proteomics approach should be based on the type of question to be answered. Global proteomics, for example, finding a biomarker in a very complex sample, deals with the identification of an often low-abundant unknown protein that is present in a complex sample with a broad dynamic range of proteins. This requires a completely different approach from targeted protein analysis, such as characterization of post-translational modification (PTM) states, which requires a detailed study and complete mapping of a known protein sequence to localize the possibly low-abundant PTMs. In addition, proteins (or protein classes) with certain characteristics may need special preparation.

drophobic stationary phase. However, polypeptides are too large to partition into the hydrophobic phase; they adsorb to the hydrophobic surface after they enter the column and remain adsorbed until the concentration of the organic modifier reaches the critical concentration necessary for desorption (Figure 1). Then they desorb and, interacting only slightly with the surface, elute down the column [27].
Polypeptides can be regarded as «sitting» on the stationary phase; most of the molecule is exposed to the mobile phase and only a part of the molecule (the «hydrophobic foot») is in contact with the reversed-phase surface. RP-HPLC separates polypeptides based on slight differences in the «hydrophobic foot» of the polypeptides being separated. The differences in the «hydrophobic foot» arise from differences in amino acid sequence as well as in conformation [27].
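A minimal in silico digestion sketch may help make the digestion step concrete. It assumes trypsin as the protease (this text mentions tryptic peptides elsewhere) and the commonly used rule of cleavage after K or R unless the next residue is P; missed cleavages and modifications are ignored. The function names and the example sequence are hypothetical, and the default mass window is the ~500-3000 Da «bottom-up» range cited in this text.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the free N- and C-termini


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the next residue is P (assumed trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def in_mass_window(peptides, low=500.0, high=3000.0):
    """Keep peptides whose mass falls inside the analysis window (Da)."""
    return [p for p in peptides if low <= peptide_mass(p) <= high]


# Hypothetical example sequence.
peptides = tryptic_peptides("MKRPAGKLLR")
print(peptides)
print(in_mass_window(peptides))
```

The window filter shows why very short cleavage products contribute little to identification: they fall below the mass range that is usually analyzed.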

Mass spectrometry methods
At present, high-performance liquid chromatography coupled with mass spectrometry (MS) is one of the most common methods for protein and peptide analysis [28,29].
Recent advances illustrate the role of mass spectrometry-based proteomics as an irreplaceable instrument for molecular and cellular systems biology, as well as for new fields. They include analysis of protein-protein interactions through affinity-based isolations on a small and a large scale, mapping of multiple organelles, simultaneous description of the genome of the malaria parasite and generation of quantitative protein profiles in different species. It can be expected that the ability of mass spectrometry to identify, and increasingly often to quantify, thousands of proteins in complex samples will profoundly affect biology and medicine.
Mass spectrometric measurements are carried out in the gas phase on charged ions. By definition, a mass spectrometer consists of an ion source, a mass analyzer, which measures the mass-to-charge ratio (m/z) of the ionized analytes, and a detector, which records the number of ions at each m/z value.
At present, four main types of mass analyzers are used in proteomic investigations. These are the ion-trap, time-of-flight (TOF), quadrupole and Fourier transform ion cyclotron resonance (FT-MS) analyzers. They differ from each other in design and performance, and each of them has its own strengths and weaknesses. These analyzers can be stand-alone or, in some cases, combined in tandem to take advantage of the strengths of each [30].
To achieve mass separation, three different principles can be used: separation based on time of flight (TOF MS), separation by quadrupole electric fields generated by metal rods (quadrupole MS) or separation by selective ejection of ions from a three-dimensional trapping field (the ion trap or Fourier transform ion cyclotron resonance). For structural analysis, such as peptide sequencing, two stages of mass spectrometry are carried out in tandem (tandem mass spectrometry or MS/MS), which can be achieved

chromatography coupled to electrospray ionization MS (LC-ESI-MS) or matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Protein identification is carried out based on peptide mass fingerprints or peptide sequences [34]. Peptide sequencing can be successfully performed by collision-induced dissociation (CID) in more widely available ESI-ion trap and ESI-QTOF mass spectrometry or MALDI-TOF/TOF equipment [30].
Digestion of a complex protein sample, for example, the whole proteome, using the «bottom-up» approach gives a large number of peptides, even more than can be analyzed by the most powerful instrument. The complexity of the peptide set can be reduced without loss of information content by producing a smaller number of larger peptides. This is the so-called «middle-down» proteomic approach, which combines the best of the «top-down» and «bottom-up» approaches, using the advantages of improved MS measuring equipment and the availability of electron-based fragmentation methods [42] while maintaining the sensitivity level associated with peptide analysis [37]. Peptides of the middle range (~3000-20000 Da) show improved separation by LC and, after ESI, carry a higher number of charges, which enhances fragmentation by CID [43], ETD [44] or ECD [45] in Orbitrap MS, quadrupole-linear ion trap (QTrap) MS and quadrupole-FTICR MS instruments. Compared to smaller peptides, more reliable peptide identification is obtained, resulting in improved protein sequence coverage and PTM identification [46].
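The mass-to-charge ratio can be made concrete with a short sketch. It assumes positive ions formed by adding protons, [M + zH]^z+, which is typical for electrospray but is an assumption here rather than something stated in the text; the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule of `mass` Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass + charge * PROTON_MASS) / charge


def neutral_mass(mz_value, charge):
    """Recover the neutral mass from an observed m/z and an assigned charge state."""
    return charge * (mz_value - PROTON_MASS)


# The same 2500 Da peptide appears at different m/z values depending on its charge.
for z in (1, 2, 3):
    print(z, round(mz(2500.0, z), 4))
```

This is why larger, more highly charged peptides can still be measured by analyzers with a limited m/z range.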
The predictable nature of peptide fragmentation allows the experimental MS/MS spectra to be compared with spectra predicted in silico from known protein sequences for protein identification [47]. In this regard, advanced bioinformatics tools play a key role in all proteomics strategies. However, these tools are based on the assumption that the digestion process (including reduction and alkylation of disulfide bridges) is optimal. If this is not true, for example, when peptides are bonded through the intact

Approaches in proteomics can be distinguished by the level of the analysis (Fig. 2). Achievements in the field of mass spectrometry now allow direct protein analysis. In this «top-down» experiment, purified proteins are detected intact and after fragmentation using collisionally activated dissociation (CAD), electron capture dissociation (ECD) and electron transfer dissociation (ETD), giving information about the intact protein mass and amino acid sequence [34,35]. Analysis of intact proteins by the «top-down» method minimizes sample preparation and preserves information that can be lost in other proteomics strategies, such as the connectivity of several PTMs, but it is comparatively insensitive [36]. Since the analytes are large, the requirements on the MS equipment in terms of resolution and mass accuracy are met only by high-performance mass spectrometers, such as the Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS) and Orbitrap MS [37]. Nevertheless, intact protein analysis has also been carried out using more widely available equipment, such as tandem quadrupole and quadrupole time-of-flight mass spectrometers (QTOF MS). In practice, the protein mass range that can be analyzed using «top-down» proteomics is limited to ~50 kDa, that is, about 500 amino acids [38]. Otherwise, only the C-terminal and N-terminal ends are sequenced [39,40]. Despite the obvious advantages of «top-down» proteomics, further development of MS equipment is necessary before it becomes the main technology.
The overwhelming majority of proteomic experiments involve digesting proteins into peptides before MS analysis. Analysis of peptides has several advantages over analysis of proteins, including more effective separation by liquid chromatography (LC), lower molecular mass and a lower number of charge states, resulting in increased sensitivity [41]. Depending on the size of the obtained peptides, the approach is called «bottom-up» or «middle-down». In the «bottom-up» strategy a protein is hydrolyzed to peptides in a range of ~500-3000 Da. Then, these peptides are analyzed by liquid
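The correspondence between the ~50 kDa limit and a chain length of roughly 500 residues can be checked with a rough estimate. The average residue mass of about 110 Da used below is a common rule of thumb and is an assumption, not a value taken from this text:

```latex
N_{\text{residues}} \approx \frac{M_{\text{protein}}}{\bar{m}_{\text{residue}}}
                    \approx \frac{50\,000\ \text{Da}}{110\ \text{Da}}
                    \approx 455
```

This is of the same order as the figure of about 500 amino acids given above.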

Protein databases
In practice, peptides are, as a rule, identified by searching protein databases using search systems such as Mascot [49], SEQUEST [50], X! Tandem [51], InsPecT [52], OMSSA [53], MassMatrix [54], Crux [55], MyriMatch [56], MS-GFDB [57] and others.
The most common search systems are the first three in the above list.
The search system Mascot is based on the MOWSE algorithm (MOlecular Weight SEarch), proposed by Pappin et al. in 1993 [58].
This algorithm uses the search by peptide mass fingerprints. At first, the peptide masses from a database are compared with the experimental peptide masses, taking the specified error into account. Then, for each match, the Score value (the so-called confidence level) is calculated according to the equation:

$$\mathrm{Score} = \frac{5000}{M_{prot} \times \prod_{n} m_{i,j}}$$

disulfide bridge, unpredictable peptides and/or complex fragmentation spectra are generated, which will not be identified in an automated database search. Protein digestion can lead to loss of information, for example, the presence and connectivity of PTMs [39], or the ability to distinguish closely related proteins, due to a failure to detect certain parts of a protein sequence because of the inadequate size or unfavorable ionization properties of the generated peptides. Finally, the quality of the obtained protein identifications and modifications can be controlled by false discovery rates and protein and peptide score thresholds; however, a critical review of the obtained results is necessary. Protein digestion is an important step in the «bottom-up» and «middle-down» proteomics strategies and significantly affects protein identification quality [48]. For many years, protein digestion has been improved through the development of new methods for increasing throughput and reproducibility [31].

(6) scoring peptide sequences; (7) creating an XML output file that reflects the best scoring sequences and several statistical distributions relevant to the scoring process [51].

The described algorithms are not without drawbacks. For example, Mascot assigns high score values to short peptides, which are not always unique for the studied protein; also, in analysis of proteolytic peptide mixtures, many sequences with close score values are present in the list of candidates, but the program leaves only one sequence in the final report.

A disadvantage of the Sequest algorithm is the high complexity of calculations and generalization of spectra. X! Tandem, as a probabilistic approach, in certain cases does not allow reliable identification. Brief descriptions of more than 100 different algorithms and software packages for processing mass spectrometry data on peptides and proteins are presented on the sites http://en.wikipedia.org/wiki/Mass_spectrometry_software and http://www.ms-utils.org.
The use of databases for protein and peptide identification allows mass spectra of complex mixtures to be deciphered over a short period of time [66]. Almost all known amino acid sequences of proteins and peptides are collected in open-access web databases. Each of them has its own format for data storage, its own degree of redundancy, and its own interaction with related or similar databases. All databases can be divided into five types. The first type is archive databases, where information is entered by users. These databases include GenBank, EMBL and PDB. The second type is curated databases, whose content is curated by specialists, such as Swiss-Prot. The third type is automated databases, where entries are generated by computer programs; they include, for example, TrEMBL. The next type is derivative databases, which are built by processing data from databases of the first two types. They include SCOP, PFAM, GO and others. The fifth type is integrated databases, which combine information from different databases, such as ENTREZ [67].

Determination of the amino acid sequence of peptides and proteins without using search programs and databases is called de novo sequencing. This approach is used for identification of proteins that have not been described earlier, in the presence of unstudied mutations, post-translational modifications and so on. The algorithms used for de novo sequencing are based on different mathematical methods. The first algorithms for determination of amino acid sequence [68,69] searched all possible combinations of amino acids that add up to the mass of the parent ion, whose predicted fragmentation was compared with the experimental mass spectrum. It is obvious that an error in measuring the mass of the parent ion leads to an increase in the number of corresponding combinations.
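The effect of mass error on the number of candidate combinations can be illustrated with a small enumeration. This is a sketch under stated assumptions, not the algorithm of the cited works: it enumerates amino acid compositions (order is ignored) of an unmodified peptide, treats the parent mass as a neutral monoisotopic mass, and merges isoleucine with leucine because they have identical masses. The function name and example mass are hypothetical.

```python
# Monoisotopic residue masses in Da; I is omitted because it is isobaric with L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def candidate_compositions(parent_mass, tolerance):
    """List residue compositions whose peptide mass is within +/- tolerance of parent_mass."""
    names = sorted(RESIDUE_MASS)
    found = []

    def extend(start, remaining, chosen):
        if abs(remaining) <= tolerance:
            found.append("".join(chosen))
        # Only add residues at or after `start` so each composition is produced once.
        for i in range(start, len(names)):
            mass = RESIDUE_MASS[names[i]]
            if mass <= remaining + tolerance:
                extend(i, remaining - mass, chosen + [names[i]])

    extend(0, parent_mass - WATER, [])
    return found


# Hypothetical parent mass corresponding to the dipeptide Gly-Ala (146.06913 Da).
print(candidate_compositions(146.06913, 0.01))
print(candidate_compositions(146.06913, 0.05))
```

With the tighter tolerance the parent mass is already ambiguous, because Gln has the same mass as Gly + Ala; widening the tolerance also admits Lys, which shows how measurement error enlarges the candidate set.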

Conclusion
A t present, new pharmaceutical preparations of protein and peptide nature, the methods for control o f specific pro teins emerge in food sector.To detect complex, m ulti-com where M prot -molecular mass of each matching protein, П п -is a product, which is calculated from the Mowseweight matrix, m‫؛‬ j-for each matching between the experi mental data and peptide masses calculated from the entries in the genomic database.
This algorithm can be used for an M S /M S search.In this case, in the equation for Score, a peptide plays a role o f a protein, and a fragment plays a role o f a peptide.The sum o f the peptide scores gives the Protein Score.Also, for each candidate, the species o rigin is given, w hich can become decisive for interpretation, and references to the personal pages (final result) that contain the comprehensive in fo r m ation about a potential protein (values o f its m olecular mass and isoelectric point, decryption o f a tryptic peptide, a num ber o f matches,% o f coverage o f the complete am ino acid sequence o f a protein b y peptides and so on) [60,61].This algorithm can be used for an M S /M S search.
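The two steps of the fingerprint search described in this section (matching masses within a specified error, then scoring) can be sketched as follows. This is a simplified illustration, not the Mascot implementation: the weights `match_weights` stand in for the entries $m_{i,j}$ taken from the Mowse weight matrix and are hypothetical here, as are the function names and example masses. The normalising constant is exposed as a parameter and defaults to the value printed in the equation in this text.

```python
from math import prod


def match_masses(experimental, theoretical, tolerance):
    """Pair each experimental mass with every theoretical mass within +/- tolerance (Da)."""
    return [
        (e, t)
        for e in experimental
        for t in theoretical
        if abs(e - t) <= tolerance
    ]


def mowse_like_score(protein_mass, match_weights, constant=5000.0):
    """Score = constant / (M_prot * product of per-match weights)."""
    return constant / (protein_mass * prod(match_weights))


# Hypothetical example: two experimental peptide masses, one database protein.
matches = match_masses([500.30, 800.45], [500.28, 900.00], tolerance=0.05)
print(matches)
print(mowse_like_score(50000.0, [0.5, 0.5]))
```

Because the weights are frequencies below one, each additional matching peptide shrinks the product and raises the score, while dividing by the protein mass compensates for large proteins producing more peptides by chance.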
The assessment algorithm is based on probability, w hich has several advantages: (a) a sim ple rule can be used for assessment whether a result is significant or not.This is especially useful for protection from false responses.(b) Scores can be compared w ith the results o f other types of search, such as sequence homology.(c) Search parameters can be easily optim ized by iteration [62].
The Sequest search system is based on the in divid u al identification o f each mass spectrum [50].This resource can be found on the site [63,64].In this method, the n u cleotide databases are translated in six reading frames and obtained am ino acid sequences are searched during the process to id en tify and fit the linear sequences to the frag m entation templates found in the tandem mass-spectra o f peptides.Then the m utual correlation function is used for m easuring the sim ilarity between the mass-to-charge ratio for the ions o f fragments predicted by the am ino acid se quences translated from the nucleotide database and the ions o f fragments observed in the tandem mass spectrum.Generally, the difference o f 0.1 to 2 between norm ed m u tual correlation functions for the results o f the search for the first and second ranking indicates a successful m atch ing between a sequence and a spectrum [45].
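The six-frame translation step can be sketched as below. This is an illustration assuming the standard genetic code and an unambiguous DNA alphabet (A, C, G, T only); it is not the Sequest code, and the function names and example sequence are hypothetical. Stop codons are kept as `*` so that open reading frames can be split afterwards.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids are listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate one reading frame starting at `offset` (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three frames of the given strand, then three of its reverse complement."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate_frame(strand, offset) for strand in strands for offset in range(3)]


# Hypothetical example sequence.
print(six_frame_translation("ATGAAA"))
```

Each of the six translations would then be digested in silico and compared with the observed tandem mass spectra.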
The X! Tandem search system is the most developed, as it is open-source software. Information about the resource can be found on the site [65].
TANDEM was created to be run from the command line, with the name of the input XML file as the single command-line parameter. The code was developed using a set of classes that accomplish tasks including: (1) reading XML input parameter files; (2) reading protein sequences from FASTA files; (3) reading MS/MS spectra in common ASCII formats (DTA, PKL and Matrix Science); (4) conditioning MS/MS spectra to remove noise and common artifacts; (5) processing peptide sequences with cleavage reagents and with post-translational and chemical modifications.
Recently, proteomics has become widely used in the field of biotechnology. Using proteomic technologies, the Gorbatov Research Center for Food Systems has developed methodological approaches for identifying the protein profile of meat products, experimental meat samples and specially produced sausage products, and has determined tissue-specific proteins that can serve as individual biomarkers when checking meat products for compliance with the declared composition. Soya and chicken proteins, which are markers of adulteration, were also detected [70,71].

Acknowledgments
The study was funded by a grant of the Russian Science Foundation (project No. 16-16-10073).
[...]component changes occurring in meat products, it is necessary to use methodological approaches that allow hundreds and thousands of proteins to be registered. One of these approaches is proteomics, which makes it possible to identify and reveal quantitative and qualitative changes in the protein composition of cells and tissues. Advances in proteomics help researchers to address post-translational modifications, cell signaling, the functional and structural homology of proteins, and the detection of gene expression levels. The most significant instrument of proteomics is the investigation of the protein maps of human and animal tissues.
Proteomic technologies are considered quite promising and effective for detecting biochemical changes in meat products, for example, changes in thermally stable and species-specific proteins, which can serve as the corresponding biomarkers.
[Search report screenshot. Sample ID (comment): Enter_Comment. Database searched: SwissProtdeer.fasta. Number of database entries: 60632. PMF search selects 25 entries. Column headings: Rank; Dynamic/Static Probability Score; # (%) Masses Matched; Mass Error Mean (Std Dev) (ppm); Protein Coverage; Protein MW (Da)/pI; Species; Accession #; Protein Name.]
[...] calculating a LOD score, developed by Newton Morton. It is a statistical test used for linkage analysis in human, animal and plant populations. The LOD score compares the likelihood of obtaining the test data if the two loci are really linked with the likelihood of observing the same data by chance. Positive LOD scores favor the presence of linkage, while negative LOD scores indicate that linkage is less probable. Computerized analysis of a LOD score is a simple method for analyzing complex family pedigrees to detect a linkage between [...]

Figure 1. The "hydrophobic foot" of the polypeptide adsorbs to the hydrophobic surface of the reversed-phase material, where it remains until the concentration of the organic modifier increases to the critical concentration [27]

Figure 2. Overview of proteomics approaches [31]

Figure 3. Protein identification using the international database of the National Center for Biotechnology Information (NCBI, USA) with the Mascot software (Matrix Science, USA) [59]: (1) list of potential candidate proteins; (2) decoding of tryptic peptides with a certain number of matches; (3) distribution of the identified peptides along the amino acid sequence; (4) final result