
Distributed storage system



Introduction

A distributed storage system stores data across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all data. That server becomes the bottleneck for system performance and the focal point for reliability and security, so it cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture: it uses multiple storage servers to share the storage load and location servers to locate stored data. This improves the reliability, availability, and access efficiency of the system, and also makes the system easy to expand.

Key technology

Metadata management

In a big data environment, the volume of metadata is also very large, and metadata access performance is key to the performance of the entire distributed file system. Common metadata management architectures are either centralized or distributed. The centralized architecture uses a single metadata server, which is simple to implement but suffers from problems such as a single point of failure. The distributed architecture spreads metadata across multiple nodes. This removes the performance bottleneck of the single metadata server and improves scalability, but the implementation is more complicated and it introduces the problem of metadata consistency. There is also a distributed architecture without any metadata server, which organizes data through online algorithms and does not require a dedicated metadata server. With this architecture, however, data consistency is very difficult to guarantee, the implementation is more complicated, directory traversal is inefficient, and global monitoring and management of the file system are lacking.
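
As an illustration of the architecture without a metadata server, the sketch below computes the location of an object from its name instead of looking it up. It assumes consistent hashing as the placement algorithm, which the text does not specify; the class name HashRing, the node names, and the vnodes parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit value derived from MD5 (used for placement, not security).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that maps object names to nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def locate(self, name: str) -> str:
        # The owner is the first ring point clockwise from the name's hash.
        idx = bisect.bisect_right(self._points, _hash(name)) % len(self._points)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.locate("/data/logs/2024/01.log")
```

Any client can run locate on its own, so no lookup service is needed. The trade-off named above is visible here too: the ring knows nothing about directories, so listing a directory would mean asking every node.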

System elastic expansion technology

In a big data environment, data scale and complexity grow very rapidly, which demands high scalability from the system. To make the storage system highly scalable, two important issues must be solved first: the distribution of metadata and the transparent migration of data. The former is mainly realized through static subtree partitioning, while the latter focuses on optimizing data migration algorithms. In addition, a big data storage system is huge and its node failure rate is high, so it needs certain adaptive management functions. The system must be able to estimate the number of nodes required based on the data volume and the computational workload, and dynamically move data between nodes to achieve load balancing. At the same time, when a node fails, its data must be recoverable through a mechanism such as replication, without affecting upper-layer applications.
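
Static subtree partitioning can be pictured as a fixed table that assigns each directory subtree to one metadata server, with a path resolved by its longest matching directory prefix. The following is a minimal sketch under that assumption; the table contents, the server names, and the function metadata_server_for are hypothetical.

```python
# Hypothetical static assignment of directory subtrees to metadata servers.
SUBTREE_TABLE = {
    "/": "mds-0",
    "/home": "mds-1",
    "/home/alice": "mds-2",
    "/var": "mds-3",
}


def metadata_server_for(path: str, table=SUBTREE_TABLE) -> str:
    """Return the server owning the deepest subtree that contains path."""
    current = path.rstrip("/") or "/"
    while True:
        if current in table:
            return table[current]
        if current == "/":
            raise KeyError(f"no subtree covers {path!r}")
        # Step up one directory level; the parent of "/x" is "/".
        current = current.rsplit("/", 1)[0] or "/"


server = metadata_server_for("/home/alice/notes.txt")
```

Because the table is static, a subtree that becomes hot cannot be rebalanced without editing the table, which is why the text pairs this technique with data migration.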

Optimization technology in the storage hierarchy

When building a storage system, both cost and performance must be considered. Storage systems therefore usually combine several layers of storage devices with different cost-performance ratios to form a storage hierarchy. Because big data is large in scale, building an efficient and reasonable storage hierarchy that exploits the locality of data access can reduce energy consumption and construction costs while preserving system performance. The hierarchy can be optimized from two aspects. To improve performance, analyze application characteristics, identify hot data and cache or prefetch it, and improve access performance through efficient cache prefetch algorithms and reasonable cache capacity ratios. To reduce cost, use information lifecycle management to migrate cold data with low access frequency to slow, cheap storage devices; this can greatly reduce construction costs and energy consumption, at some cost to overall system performance.
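
The two directions above, keeping hot data on the fast tier and demoting cold data to the cheap tier, can be sketched with a small two-tier store. This is only an illustration: it assumes least-recently-used order as the measure of "cold", and the class TieredStore and its fast_capacity parameter are hypothetical.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: a small fast tier over a large slow tier."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # hot tier, most recently used entry last
        self.slow = {}             # cold tier, stands in for cheap storage

    def put(self, key, value):
        self.slow.pop(key, None)   # keep a single copy of each key
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._demote_cold_data()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh recency on a hit
            return self.fast[key]
        value = self.slow.pop(key)      # raises KeyError for unknown keys
        self.put(key, value)            # promote data that became hot again
        return value

    def _demote_cold_data(self):
        # Move least recently used entries down until the fast tier fits.
        while len(self.fast) > self.fast_capacity:
            cold_key, cold_value = self.fast.popitem(last=False)
            self.slow[cold_key] = cold_value


store = TieredStore(fast_capacity=2)
store.put("a", 1)
store.put("b", 2)
store.get("a")     # "a" is now hotter than "b"
store.put("c", 3)  # the fast tier is full, so "b" is demoted
```

The fast_capacity value plays the role of the cache capacity ratio mentioned above: a larger fast tier raises the hit rate but also the cost.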

Storage optimization technology for applications and loads

A traditional data storage model must support as many applications as possible, so it has to be general-purpose. Big data is characterized by large scale, high dynamics, and fast processing, and a general-purpose storage model is usually not the one that improves application performance the most. A big data storage system cares far more about the performance of upper-layer applications than about generality. Optimizing storage for applications and loads means coupling data storage with applications: simplifying or extending the functions of the distributed file system, and customizing and deeply optimizing the file system for specific applications, loads, and computing models so that applications achieve the best performance. This type of optimization is used in the internal storage systems of Internet companies such as Google and Facebook to manage data exceeding petabytes, and it can achieve very high performance.

Factors to consider

Consistency

A distributed storage system uses multiple servers to store data together. As the number of servers increases, so does the probability that a server fails. To keep the system available when a server fails, the general practice is to make multiple copies of each piece of data and store them on different servers. However, because of failures and parallel storage operations, the copies of the same data may become inconsistent. The property that all copies of the data are fully identical is referred to here as consistency.
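
One common way to keep reads consistent even when some copies are stale is a quorum scheme, which the text does not describe and is assumed here purely for illustration. With n copies, a write must reach w of them and a read must consult r of them; when r + w > n the two sets always overlap, so a read sees at least one copy of the latest write. The class QuorumStore and the explicit version numbers are hypothetical.

```python
class QuorumStore:
    """Hypothetical in-memory model of n replicas with quorum reads and writes."""

    def __init__(self, n=3, w=2, r=2):
        if r + w <= n:
            raise ValueError("need r + w > n so read and write sets overlap")
        self.replicas = [{} for _ in range(n)]
        self.w = w
        self.r = r

    def write(self, key, value, version):
        # Simulate a write that only reaches the first w replicas.
        for replica in self.replicas[: self.w]:
            replica[key] = (version, value)

    def read(self, key):
        # Simulate a read that contacts the last r replicas.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        if not answers:
            raise KeyError(key)
        # The copy with the highest version number wins.
        _, value = max(answers, key=lambda answer: answer[0])
        return value


store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1", version=1)
store.replicas[2]["k"] = (1, "v1")  # a replica that missed the next write
store.write("k", "v2", version=2)
latest = store.read("k")            # contacts replicas 1 and 2, returns "v2"
```

The replica sets are chosen deterministically here to make the overlap easy to see; a real system would contact whichever replicas respond first.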

Availability

A distributed storage system requires multiple servers to work at the same time. As the number of servers increases, it is inevitable that some of them will fail. Ideally, such failures should have little impact on the system as a whole. The property that client read/write requests are still served after some of the nodes in the system fail is called availability.
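
From the client's point of view, availability can be sketched as a read that falls back to another copy when a node is down. The example below is a simplified model rather than a real client library; Replica, NodeDown, and read_with_failover are hypothetical names.

```python
class NodeDown(Exception):
    """Raised when a replica cannot be reached."""


class Replica:
    """Hypothetical storage node holding a full copy of the data."""

    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = dict(data)
        self.alive = alive

    def read(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn and return the first successful read."""
    failed = []
    for replica in replicas:
        try:
            return replica.read(key)
        except NodeDown as exc:
            failed.append(str(exc))
    raise NodeDown("all replicas unavailable: " + ", ".join(failed))


data = {"report.txt": b"quarterly numbers"}
replicas = [
    Replica("node-a", data, alive=False),  # simulated failure
    Replica("node-b", data),
    Replica("node-c", data),
]
content = read_with_failover(replicas, "report.txt")  # served by node-b
```

The request succeeds although node-a is down, and it fails only when every copy is unreachable.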

Partition fault tolerance

The servers in a distributed storage system are connected through a network, but the network cannot be guaranteed to be available at all times. Distributed systems need to be fault-tolerant to deal with problems caused by network failures. Ideally, when a failure splits the network into multiple partitions, the distributed storage system can still work.
